Summary of machine learning + pattern recognition learning (VI) -- feature selection and feature extraction
2022-06-12 07:28:00 【Nidiya is trying】
One. Task definition of feature extraction and selection: after obtaining a set of raw measurements from an actual object, produce from these original features a set that is most effective for classification and recognition and as small as possible, so that in the lowest-dimensional feature space the points of different classes lie far apart while points of the same class lie close together.
Two. Background of feature extraction and selection: ① too few feature measurements are obtained, so too little information is available; ② too many measurements are obtained, causing the curse of dimensionality (beyond a certain number of features, performance degrades); ③ the features contain much useless information, or the useful information does not reflect the essence of the problem, and more meaningful quantities can only be obtained through a transformation.
Three. Two basic approaches to feature selection and extraction
(I) Direct selection (feature selection): select d features directly from the n original features already obtained. Main methods: statistical testing, branch and bound, genetic algorithms, etc.
1. Optimal search algorithm: branch and bound (the BAB algorithm). Exploiting the monotonicity of the separability criterion, a branch-and-bound strategy searches a tree arranged so that criterion values are smaller on the left and larger on the right, which allows some feature combinations to be skipped entirely without affecting global optimality. A fast search method with these properties is called branch and bound.
(1) Why the branch and bound method is efficient: ① the search tree is constructed so that, among the children of the same parent, values on the right are smaller than on the left, and the tree is structurally simpler on the right; ② within the same level, the node J values are smaller on the left and larger on the right, and the search proceeds from right to left; ③ by the monotonicity of J, the J value of any node in the tree is greater than the J value of every node in the subtree rooted at it. From ①, ②, and ③, many feature combinations can be left unevaluated while the global optimum is still obtained.
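A minimal sketch of the pruning idea. As a stand-in assumption, the separability criterion J here is simply the sum of per-feature scores, which is monotone (removing a feature never increases J); a real criterion such as a class-scatter ratio would be used in practice, and the function and variable names are illustrative:

```python
def bb_select(scores, d):
    """Pick d of n features maximizing a monotone criterion J
    (hypothetically: the sum of per-feature scores)."""
    n = len(scores)
    best_val = [float("-inf")]
    best_set = [None]

    def J(subset):
        # assumed monotone separability criterion
        return sum(scores[i] for i in subset)

    def search(current, start):
        # bound: by monotonicity, every subset of `current` has J <= J(current),
        # so if J(current) cannot beat the best found, prune the whole subtree
        if J(current) <= best_val[0]:
            return
        if len(current) == d:
            best_val[0], best_set[0] = J(current), set(current)
            return
        # branch: remove one more feature (indices >= start avoid duplicates)
        for i in range(start, n):
            if i in current:
                search(current - {i}, i + 1)

    search(frozenset(range(n)), 0)
    return best_set[0], best_val[0]
```

Despite pruning, the result is guaranteed globally optimal, because only subtrees that provably cannot contain a better d-subset are skipped.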
2. Suboptimal search algorithms
(1) Single best feature selection: compute the criterion value of each feature used on its own, sort in descending order, and select the d features with the best individual classification performance.
(2) Sequential forward selection (SFS): at each step, from the features not yet selected, add the one whose combination with the already-selected features maximizes the separability criterion J.
(3) Sequential backward selection (SBS): start from the full feature set and eliminate one feature at a time; the eliminated feature is the one whose removal leaves the remaining combination with the largest criterion value.
(4) Plus-l-minus-r method: combines (2) and (3), adding local backtracking.
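The greedy procedures (2) and (3) can be sketched as follows. The criterion J is passed in as a function; in the usage example it is, purely as an assumption, a sum of per-feature scores (a real separability criterion would account for feature interactions):

```python
def sfs(J, n, d):
    """Sequential forward selection: start empty, greedily add the
    feature that maximizes J over the chosen subset."""
    chosen = []
    while len(chosen) < d:
        best_f = max((f for f in range(n) if f not in chosen),
                     key=lambda f: J(chosen + [f]))
        chosen.append(best_f)
    return chosen

def sbs(J, n, d):
    """Sequential backward selection: start full, greedily drop the
    feature whose removal keeps J of the remainder largest."""
    chosen = list(range(n))
    while len(chosen) > d:
        worst = max(chosen, key=lambda f: J([g for g in chosen if g != f]))
        chosen.remove(worst)
    return chosen

# illustrative usage with a toy additive criterion
scores = [5, 1, 3, 2]
J = lambda subset: sum(scores[i] for i in subset)
```

Both are suboptimal: each step is locally greedy, so a feature combination that is only good jointly can be missed, which is what the plus-l-minus-r backtracking in (4) tries to mitigate.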
3. Genetic algorithm (GA): a search algorithm based on natural selection and population genetics, simulating the reproduction, crossover, and mutation that occur during natural selection and heredity. To solve a problem with a genetic algorithm: ① each candidate solution of the problem is encoded as a "chromosome", i.e. an individual, and a number of individuals form a population (a set of candidate solutions); ② at the start of the algorithm, some individuals (initial solutions) are generated at random; ③ each individual is evaluated with a predefined objective function and assigned a fitness; based on this fitness, some individuals are selected to produce the next generation, "bad" individuals are eliminated, and the higher the fitness, the more likely an individual is to be selected; ④ the selected individuals are recombined by crossover and mutation operators to generate a new generation, which inherits some good traits of the previous generation and therefore outperforms it, so the population gradually evolves toward the optimal solution.
(1) Algorithm steps: encode; generate initial solutions; evaluate fitness; select (the higher the fitness, the more likely selection); apply crossover and mutation; generate the next generation; evaluate fitness again, and repeat. Stop when the number of generations exceeds a threshold, or when no better solution has appeared for several consecutive generations. [Population size and number of generations are two important parameters.]
(2) Genetic operations: modeled on the behavior of biological genes, their task is to apply certain operations to individuals according to fitness, realizing a survival-of-the-fittest evolutionary process in which the solutions improve generation by generation and approach the optimum. Genetic operation comprises three basic operators: selection, crossover, and mutation:
① selection and crossover perform most of the search in a genetic algorithm: selection is driven by fitness, and crossover is the main means of producing good individuals;
② mutation improves the genetic algorithm's ability to find the optimal solution: it avoids the permanent loss of information that selection and crossover alone can cause, ensures the effectiveness of the algorithm, and gives it a local random-search capability.
(3) Five basic elements of genetic algorithm design: parameter encoding, initial population design, fitness function design, genetic operator design, and control parameter settings.
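The steps ①–④ above can be sketched for feature selection as follows. Assumptions: a chromosome is a bit vector over the n features, tournament selection stands in for fitness-proportional selection, and the fitness function is supplied by the caller; all parameter values are illustrative defaults, not prescribed by the text:

```python
import random

def ga_select(fitness, n, pop_size=30, gens=50, pc=0.8, pm=0.05, seed=0):
    rng = random.Random(seed)
    # ① encode: each individual is a bit vector (1 = feature kept)
    # ② random initial population
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(gens):
        new_pop = []
        while len(new_pop) < pop_size:
            # ③ selection: fitter individuals are more likely to reproduce
            p1 = max(rng.sample(pop, 2), key=fitness)
            p2 = max(rng.sample(pop, 2), key=fitness)
            c1, c2 = p1[:], p2[:]
            if rng.random() < pc:              # ④ one-point crossover
                cut = rng.randrange(1, n)
                c1 = p1[:cut] + p2[cut:]
                c2 = p2[:cut] + p1[cut:]
            for c in (c1, c2):                 # ④ bit-flip mutation
                for i in range(n):
                    if rng.random() < pm:
                        c[i] ^= 1
            new_pop += [c1, c2]
        pop = new_pop[:pop_size]
        best = max(pop + [best], key=fitness)  # elitism: keep the best seen
    return best
```

The stopping rule here is a fixed generation count; the "no improvement for several generations" rule from (1) could be added by tracking when `best` last changed.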
(II) Transformation method (feature extraction): transform the n original features to reduce dimensionality, i.e. apply a coordinate transformation and then take a subspace. Main methods: feature extraction based on separability criteria, feature extraction based on misclassification probability, the discrete K-L transform (DKLT), and decision-boundary-based feature extraction.
1. Principal component analysis (PCA): represent the data set with fewer "effective" features. The idea is to find the "principal" elements and structure, remove noise and redundancy, reduce the dimensionality of the originally complex data, and reveal the simple structure hidden behind it. [Find a mapping from the original d-dimensional input space to a new k-dimensional space with minimum information loss.] It is based on maximizing variance.
2. Feature extraction based on the K-L transform: PCA is the most basic form of the K-L transform. Its essence is a rotation of the coordinate axes.
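A minimal NumPy sketch of the d-to-k mapping described above: center the data, eigendecompose the sample covariance, and project onto the k eigenvectors with the largest eigenvalues (the maximum-variance directions). Variable names are illustrative:

```python
import numpy as np

def pca(X, k):
    """PCA sketch: rows of X are samples, columns are the d original
    features; returns the k-dimensional projections and the d x k
    projection matrix W (columns are orthonormal eigenvectors)."""
    Xc = X - X.mean(axis=0)                 # center the data
    cov = np.cov(Xc, rowvar=False)          # d x d sample covariance
    evals, evecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    order = np.argsort(evals)[::-1][:k]     # indices of the top-k components
    W = evecs[:, order]
    return Xc @ W, W
```

Because W is a rotation (orthonormal columns), taking k = d loses nothing and exactly recovers the centered data, which matches the remark that the K-L transform is in essence a rotation of the coordinate axes.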
(III) Comparison between feature selection and feature transformation:
1. Feature selection picks d of the n original features, so the selected features keep their original physical meaning;
2. Feature transformation maps the n original features into d new features; the new features no longer have their original physical meaning, but in general correlation between features is removed and information irrelevant to classification is reduced.