Summary of machine learning + pattern recognition learning (VI) -- feature selection and feature extraction
2022-06-12 07:28:00 【Nidiya is trying】
I. Task definition of feature extraction and selection: starting from the raw measured features of an actual object, derive the features that are most effective for classification and fewest in number, so that in the lowest-dimensional feature space the points of different classes are far apart while points of the same class are close together.
II. Background of feature extraction and selection: ① too few feature measurements are obtained, so too little information is available; ② too many measurements are obtained, causing the curse of dimensionality (beyond a certain number of features, performance actually degrades); ③ the features contain much useless information, or the useful information does not reflect the essence of the problem and only becomes meaningful after a transformation.
III. Two basic approaches to feature selection and extraction
(I) Direct selection (feature selection): select d features directly from the n original features. Main methods: statistical test methods, the branch and bound method, genetic algorithms, etc.
1. Optimal search algorithm: the branch and bound method (BAB algorithm). Using the monotonicity of the separability criterion, a branch-and-bound strategy builds a search tree whose criterion values are smaller on the left and larger on the right, so that some feature combinations can be skipped without computation while the global optimum is still guaranteed. A fast search method with these properties is called branch and bound.
(1) Why the branch and bound method is efficient: ① when the search tree is constructed, among the child nodes of the same parent, subtrees rooted further to the right have fewer branches, so the tree is structurally simpler on the right; ② at the same level, the criterion value J is smaller on the left and larger on the right, and the search proceeds from right to left; ③ by the monotonicity of J, the J value of any node on the tree is greater than the J value of every node in the subtree rooted at that node. From ①, ② and ③, many feature combinations can be left unevaluated while the global optimal solution is still obtained.
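The pruning idea above can be sketched in Python. This is a minimal illustration, not the textbook tree layout: it assumes a monotone criterion J (the J value of a subset never exceeds that of its superset), searches over which features to remove, and cuts a branch as soon as the current subset already scores no better than the best complete subset found. The function name and interface are illustrative.

```python
def bab_select(features, d, J):
    """Branch-and-bound selection of d features under a monotone
    criterion J: since removing features can never increase J, a branch
    is pruned as soon as the current (still too large) subset scores no
    better than the best size-d subset found so far."""
    n = len(features)
    best = [float("-inf"), None]  # [best score, best subset]

    def recurse(current, start):
        score = J(current)
        if score <= best[0]:        # prune: descendants cannot do better
            return
        if len(current) == d:       # complete subset reached
            best[0], best[1] = score, tuple(sorted(current))
            return
        # branch: try removing each remaining candidate feature in turn
        for i in range(start, n):
            f = features[i]
            current.remove(f)
            recurse(current, i + 1)
            current.add(f)          # restore before trying the next branch

    recurse(set(features), 0)
    return best[1], best[0]
```

With an additive (hence monotone) criterion such as a sum of per-feature scores, the pruned search still returns the globally best subset.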
2. Suboptimal search algorithms
(1) Single best feature selection: compute the criterion value of each feature used alone, sort in descending order, and keep the d features with the best individual classification performance.
(2) Sequential forward selection (SFS): at each step, add the one unselected feature whose combination with the already selected features maximizes the separability criterion value J.
(3) Sequential backward selection (SBS): start from the full feature set and eliminate one feature at a time; the eliminated feature is the one whose removal maximizes the criterion value of the remaining combination.
(4) Plus-l minus-r method: combines (2) and (3), adding local backtracking.
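A minimal sketch of sequential forward selection, assuming the criterion J is supplied as a function that scores any feature subset (the function name sfs and the index-based interface are illustrative, not from the original text):

```python
def sfs(n_features, d, J):
    """Sequential forward selection: greedily add, one at a time,
    the feature that maximizes the separability criterion J of the
    selected subset, until d features have been chosen."""
    selected, remaining = [], list(range(n_features))
    while len(selected) < d:
        # pick the unselected feature that helps the current subset most
        best_f = max(remaining, key=lambda f: J(selected + [f]))
        selected.append(best_f)
        remaining.remove(best_f)
    return selected
```

SBS is the mirror image: start from all n features and repeatedly drop the feature whose removal leaves the highest J. Neither greedy order is guaranteed to find the globally best subset, which is why the plus-l minus-r method adds local backtracking.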
3. Genetic algorithm (GA): a search algorithm based on natural selection and population genetics, simulating the reproduction, crossover and mutation that occur during natural selection and heredity. To solve a problem with a genetic algorithm: ① each possible solution is encoded as a "chromosome", i.e. an individual; a number of individuals form a population (a set of candidate solutions); ② at the start, some individuals (initial solutions) are generated at random; ③ each individual is evaluated with a predefined objective function and assigned a fitness; individuals are selected on this fitness to produce the next generation, "bad" individuals are eliminated, and the higher the fitness, the more likely an individual is to be selected; ④ the selected individuals are recombined by crossover and mutation operators to produce a new generation, which inherits some good traits of the previous one and therefore outperforms it, so the population gradually evolves toward the optimal solution.
(1) Algorithm steps: encode, generate initial solutions, evaluate fitness, select (the higher the fitness, the more likely to be chosen), apply crossover and mutation, produce the next generation, then evaluate fitness again and repeat. The algorithm stops when the number of generations exceeds a threshold or no better solution appears for several consecutive generations. [Population size and number of generations are two important parameters.]
(2) Genetic operators: they mimic the operations on biological genes; their task is to apply certain operations to individuals according to their fitness, realizing an evolutionary process of survival of the fittest, so that the solutions improve generation by generation and approach the optimum. There are three basic genetic operators: selection, crossover and mutation.
① Selection and crossover carry out most of the search in a genetic algorithm: selection is driven by fitness, and crossover is the main means of producing good individuals.
② Mutation improves the algorithm's ability to find the optimal solution: it can avoid the permanent loss of information caused by the selection and crossover operators, keeps the algorithm effective, and gives it a local random search capability.
(3) The five basic elements of genetic algorithm design: parameter encoding, initial population design, fitness function design, genetic operator design, and control parameter settings.
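The steps above can be sketched as a compact binary-coded GA for feature selection. This is a minimal illustration under stated assumptions: a chromosome is a bit list (gene i set means feature i is kept), the caller supplies a fitness function that must return positive values (required by the roulette-wheel selection used here), and all names and default parameters are illustrative:

```python
import random

def ga_feature_select(n_features, fitness, pop_size=20, generations=30,
                      crossover_rate=0.8, mutation_rate=0.02, seed=0):
    """Minimal genetic algorithm for feature selection with binary
    encoding, roulette-wheel selection, one-point crossover and
    bit-flip mutation. Tracks the best individual ever evaluated."""
    rng = random.Random(seed)
    # initial population: random bit strings (random initial solutions)
    pop = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(pop_size)]

    def roulette(pop, fits):
        # fitness-proportional selection: fitter individuals chosen more often
        r, acc = rng.uniform(0, sum(fits)), 0.0
        for ind, f in zip(pop, fits):
            acc += f
            if acc >= r:
                return ind
        return pop[-1]

    best, best_fit = None, float("-inf")
    for _ in range(generations):
        fits = [fitness(ind) for ind in pop]
        for ind, f in zip(pop, fits):
            if f > best_fit:
                best, best_fit = ind[:], f
        new_pop = []
        while len(new_pop) < pop_size:
            a, b = roulette(pop, fits)[:], roulette(pop, fits)[:]
            if rng.random() < crossover_rate:        # one-point crossover
                cut = rng.randrange(1, n_features)
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            for child in (a, b):
                for i in range(n_features):          # bit-flip mutation
                    if rng.random() < mutation_rate:
                        child[i] ^= 1
                new_pop.append(child)
        pop = new_pop[:pop_size]
    return best, best_fit
```

The stopping rule here is a fixed generation count; the "no better solution for several consecutive generations" criterion from the text could be added by tracking how long best_fit has been unchanged.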
(II) Transformation method (feature extraction): apply a transformation to the n original features to reduce dimensionality, i.e. a coordinate transformation followed by taking a subspace. Main methods: feature extraction based on a separability criterion, feature extraction based on misclassification probability, the discrete K-L transform (DKLT), and feature extraction based on decision boundaries.
1. Principal component analysis (PCA): make the data set consist of a smaller number of "effective" features. The idea is to find the "principal" elements and structure, remove noise and redundancy, reduce the dimensionality of the original complex data, and reveal the simple structure hidden behind it. [Find a mapping from the original d-dimensional input space to a new k-dimensional space with minimum information loss.] It is based on maximizing variance.
2. Feature extraction based on the K-L transform: PCA is the most basic form of the K-L transform. In essence it is a rotation of the coordinate axes.
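A minimal NumPy sketch of PCA as described above: center the data, diagonalize the covariance matrix (this is the coordinate rotation), and keep the k directions of largest variance (the function name and interface are illustrative):

```python
import numpy as np

def pca(X, k):
    """PCA: project the data onto the k directions of maximum variance,
    i.e. the eigenvectors of the covariance matrix with the largest
    eigenvalues. Returns the projected data and the projection matrix."""
    Xc = X - X.mean(axis=0)                 # center the data
    cov = np.cov(Xc, rowvar=False)          # d x d covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh: ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:k]   # top-k by variance
    W = eigvecs[:, order]                   # d x k projection matrix
    return Xc @ W, W
```

Projecting the centered data onto W yields the new k-dimensional features; since eigenvector signs are arbitrary, the projection is determined only up to sign.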
(III) Comparison between feature selection and feature transformation:
1. Feature selection picks d of the D original features, keeping their original physical meaning;
2. Feature transformation maps the D original features into d new features; the resulting features no longer have the original physical meaning, but in general the correlation between features is eliminated and the information irrelevant to classification is reduced.