Chapter XI feature selection
2022-07-08 01:07:00 【Intelligent control and optimization decision Laboratory of Cen】
1. Briefly describe the purpose of feature selection.

Feature selection is an important "data preprocessing" step. In real machine learning tasks, feature selection is usually performed after the data are obtained, and the learner is trained afterwards. Feature selection serves two main purposes:

- Mitigate the curse of dimensionality

The curse of dimensionality is frequently encountered in real tasks because there are too many attributes. If the important features can be selected so that the subsequent learning process only needs to build a model on a subset of the features, the curse of dimensionality is greatly alleviated. In this sense, the motivation of feature selection is similar to that of the dimensionality reduction methods introduced in Chapter 10.

- Reduce the difficulty of the learning task

Removing irrelevant features often makes the learning task easier. It is like a detective solving a case: once the tangle of irrelevant factors is stripped away and only the key elements remain, the truth is usually easier to see.
2. Compare the similarities and differences between feature selection and the dimensionality reduction methods introduced in Chapter 10.

Both dimensionality reduction and feature selection aim to reduce the dimensionality of the data, but they are essentially quite different. The main differences are as follows:

- Dimensionality reduction

Dimensionality reduction is essentially a mapping from one space to another. The original features all take part in the mapping rather than being discarded, but their values change in the process. For example, suppose the data currently have 1000 dimensions and we want to reduce them to 500. Dimensionality reduction finds a mapping from the 1000-dimensional space to a 500-dimensional space, and the 1000 original features are transformed into values in the new 500-dimensional space. If one of the original feature values is 9, the corresponding value after dimensionality reduction may be, say, 3.

- Feature selection

Feature selection keeps a subset of the original features as the training features. The values of the selected features do not change, but the dimensionality after selection is necessarily smaller than before, since only part of the features is kept. For example, suppose the data have 1000 features and we select 500 of them: the values of these 500 features are exactly the same as those of the corresponding original features, and the other 500 unselected features are simply discarded. If one of the original feature values is 9 and that feature is selected, its value is still 9 after selection.
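A minimal sketch may make the contrast concrete; it assumes scikit-learn and NumPy are available and uses a small synthetic data set with illustrative dimensions (20 features reduced or selected down to 5 instead of 1000 to 500):

```python
# Contrast: dimensionality reduction transforms feature values, feature
# selection keeps a subset of columns unchanged. Data and sizes are toy values.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))                 # 200 samples, 20 original features
y = (X[:, 0] + X[:, 1] > 0).astype(int)        # labels driven by the first two features

# Dimensionality reduction: maps the 20-dim space to a new 5-dim space;
# the resulting columns are new combinations, so original values are not preserved.
X_pca = PCA(n_components=5).fit_transform(X)

# Feature selection: keeps 5 of the original 20 columns; their values are unchanged.
selector = SelectKBest(f_classif, k=5).fit(X, y)
X_sel = selector.transform(X)

print(X_pca.shape, X_sel.shape)                # both (200, 5)
print(selector.get_support(indices=True))      # indices of the retained original columns
```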
3. Into what categories can feature selection methods be divided according to the selection strategy? Describe the characteristics of each.

Filter selection

Filter methods perform feature selection on the data set first and then train the learner; the feature selection process is independent of the subsequent learner. This is equivalent to first "filtering" the initial features and then training the model with the filtered features.

Filter selection methods include:

1. Removing low-variance features;
2. Correlation-coefficient ranking: compute the correlation coefficient between each feature and the output, set a threshold, and keep the features whose correlation coefficient exceeds the threshold;
3. Hypothesis tests that measure the correlation between a feature and the output, such as the chi-square test, t-test, and F-test;
4. Mutual information, which measures the relevance from the viewpoint of information entropy (see the sketch after this list).
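A hedged sketch of these four filter ideas, assuming scikit-learn utilities and synthetic data (the original text names the ideas, not any particular library, and the thresholds below are illustrative):

```python
# Filter-style selection: score features independently of any downstream learner.
import numpy as np
from sklearn.feature_selection import (VarianceThreshold, SelectKBest,
                                       chi2, mutual_info_classif)

rng = np.random.default_rng(0)
X = rng.random((300, 10))                     # non-negative features (required by chi2)
y = (X[:, 0] > 0.5).astype(int)

# 1. Remove low-variance features.
X_var = VarianceThreshold(threshold=0.01).fit_transform(X)

# 2. Correlation-coefficient ranking against the output; keep |r| above a threshold.
corr = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])
X_corr = X[:, corr > 0.2]

# 3. Hypothesis test (here chi-square) between each feature and the label.
X_chi2 = SelectKBest(chi2, k=5).fit_transform(X, y)

# 4. Mutual information: relevance from an information-entropy viewpoint.
X_mi = SelectKBest(mutual_info_classif, k=5).fit_transform(X, y)

print(X_var.shape, X_corr.shape, X_chi2.shape, X_mi.shape)
```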
Wrapper selection

Wrapper methods repeatedly generate candidate feature subsets from the initial feature set, train the learner on each subset, and evaluate the subset by the learner's performance, until the best subset is found. Wrapper feature selection is therefore optimized directly for the given learner.

Advantage: judged by the performance of the final learner, wrapper selection is usually better than filter selection.

Disadvantage: because the learner has to be trained many times during feature selection, the computational cost of wrapper selection is usually much higher than that of filter selection.
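As one concrete illustration (an assumption, since the text describes only the general idea), recursive feature elimination repeatedly trains the learner and discards the weakest feature; a minimal sketch with scikit-learn:

```python
# Wrapper-style selection: candidate subsets are evaluated by actually training
# the learner. RFE is used here purely as an example wrapper; data are synthetic.
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           random_state=0)

learner = LogisticRegression(max_iter=1000)
# Repeatedly train the learner and drop the weakest feature until 5 remain.
rfe = RFE(estimator=learner, n_features_to_select=5).fit(X, y)

print(rfe.support_)    # boolean mask of the selected features
print(rfe.ranking_)    # rank 1 = selected; larger ranks were eliminated earlier
```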
Embedded selection

Embedded feature selection integrates feature selection with learner training: both are completed in the same optimization process, so feature selection is carried out automatically while the learner is trained.

The main idea is to identify, during model training, the attributes that are important to the model.
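A typical embedded example is L1 regularization, which drives many weights to exactly zero so that the surviving features are selected as a by-product of training; a minimal sketch (data and the regularization strength are illustrative):

```python
# Embedded selection: the learner's own optimization yields sparse weights,
# so feature selection happens inside training. Lasso (L1) is used as an example.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

X, y = make_regression(n_samples=300, n_features=20, n_informative=5,
                       noise=1.0, random_state=0)

model = Lasso(alpha=1.0).fit(X, y)
selected = np.flatnonzero(model.coef_)    # features with non-zero weight survive
print(selected)
```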
4. Write the algorithm description of Relief-F.

The Relief algorithm is simple and efficient, but it can only handle binary classification; the improved Relief-F algorithm can handle multi-class problems. When dealing with a multi-class problem, Relief-F repeatedly draws a sample R at random from the training set, finds the k nearest-neighbor samples of the same class as R (near hits) and the k nearest-neighbor samples in each different class (near misses), and then updates the weight of every feature according to the following formula:
$$W(A)=W(A)-\sum_{j=1}^{k}diff(A,R,H_j)/(mk)+\sum_{C\notin class(R)}\left[\frac{p(C)}{1-p(class(R))}\sum_{j=1}^{k}diff(A,R,M_j(C))\right]/(mk)$$
where $diff(A,R_1,R_2)$ denotes the difference between samples $R_1$ and $R_2$ on feature $A$, and $M_j(C)$ denotes the $j$-th nearest-neighbor sample in class $C\notin class(R)$, as follows:
$$diff(A,R_1,R_2)=\begin{cases}\dfrac{|R_1[A]-R_2[A]|}{max(A)-min(A)} & \text{if $A$ is continuous}\\ 0 & \text{if $A$ is discrete and $R_1[A]=R_2[A]$}\\ 1 & \text{if $A$ is discrete and $R_1[A]\neq R_2[A]$}\end{cases}$$
The meaning of the weight update is that the within-class differences of a feature are subtracted while its between-class differences are added. (If a feature is relevant to the classification, samples of the same class should take similar values on that feature, while samples of different classes should take different values.)
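A minimal, unoptimized Python sketch of the update above; it assumes purely continuous features, plain Euclidean distance for the nearest-neighbor search, and toy data (m, k and the class structure are placeholders):

```python
# Relief-F weight update following the formula above; for illustration only.
import numpy as np

def relieff(X, y, m=50, k=5, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    span = X.max(axis=0) - X.min(axis=0) + 1e-12      # max(A) - min(A) per feature
    classes, counts = np.unique(y, return_counts=True)
    prior = dict(zip(classes, counts / n))
    W = np.zeros(d)

    for _ in range(m):
        i = rng.integers(n)
        R = X[i]
        # k near hits: nearest samples of the same class (excluding R itself)
        same = np.flatnonzero((y == y[i]) & (np.arange(n) != i))
        hits = same[np.argsort(np.linalg.norm(X[same] - R, axis=1))[:k]]
        W -= (np.abs(X[hits] - R) / span).sum(axis=0) / (m * k)
        # k near misses from every other class, weighted by the class prior
        for c in classes:
            if c == y[i]:
                continue
            other = np.flatnonzero(y == c)
            miss = other[np.argsort(np.linalg.norm(X[other] - R, axis=1))[:k]]
            w_c = prior[c] / (1 - prior[y[i]])
            W += w_c * (np.abs(X[miss] - R) / span).sum(axis=0) / (m * k)
    return W

# Toy usage: three classes, four continuous features.
rng = np.random.default_rng(0)
X = rng.normal(size=(150, 4))
y = (X[:, 0] > 0).astype(int) + (X[:, 1] > 0).astype(int)
print(relieff(X, y))
```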
5. Describe the basic frameworks of the filter and wrapper selection methods.

1. Filter selection designs a filtering criterion for feature selection and then trains the learner on the filtered features. A typical criterion is a "relevance statistic" computed for every feature, with a threshold used for the final selection. The relevance statistic can be computed as follows: for each sample $x_i$, find its nearest neighbor $x_1$ within the same class and its nearest neighbor $x_2$ in a different class. If $x_i$ is closer to $x_1$, the feature helps to separate same-class samples from different-class samples, and the corresponding statistic is increased; conversely, if $x_i$ is closer to $x_2$, the feature has a negative effect, and the statistic is decreased.

2. In wrapper selection, the feature selection process is tied to the learner: the learner's performance is used directly as the evaluation criterion, and the feature subset that is most beneficial to the learner's performance is selected, as sketched below.
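A simplified random-search loop in the spirit of this wrapper framework (a sketch, not the textbook's specific algorithm): repeatedly sample a feature subset, train the learner, and keep the subset with the best cross-validated performance.

```python
# Wrapper framework sketch: evaluate random feature subsets by learner performance.
# The learner, budget and data are placeholders.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, n_features=15, n_informative=4,
                           random_state=0)
rng = np.random.default_rng(0)
learner = LogisticRegression(max_iter=1000)

best_score, best_subset = -np.inf, None
for _ in range(30):                                    # evaluation budget
    size = int(rng.integers(1, X.shape[1] + 1))
    subset = rng.choice(X.shape[1], size=size, replace=False)
    score = cross_val_score(learner, X[:, subset], y, cv=5).mean()
    if score > best_score:
        best_score, best_subset = score, subset

print(sorted(best_subset), best_score)
```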