
05_ Feature Engineering - Dimensionality Reduction

2022-06-11 17:21:00 IT-cute

1. Dimensionality Reduction

  • Feature dimensionality reduction can only be carried out after feature selection.

  • Once feature selection is complete, you can train the model directly, but the feature matrix may still be too large, leading to heavy computation and long training times. It is therefore often necessary to reduce the dimensionality of the feature matrix.

  • Apart from models based on an L1 penalty term, common dimensionality reduction methods also include principal component analysis (PCA) and linear discriminant analysis (LDA). The essence of both methods is to map the original data into a lower-dimensional sample space;

  • They differ in how they do this: PCA tries to give the mapped samples the greatest possible spread (variance), while LDA tries to give the mapped samples the best possible classification performance.

  • Besides PCA and LDA, topic models can also be used to achieve a dimensionality reduction effect.

1.1 Necessity of dimensionality reduction

In real machine learning projects, feature selection / dimensionality reduction must be carried out, because the data typically suffers from the following problems:

  • Multicollinearity: feature attributes are correlated with each other. Multicollinearity makes the solution space unstable, which weakens the model's generalization ability;
  • Samples in high-dimensional space are sparse, which makes it hard for the model to find patterns in the data;
  • Too many variables hinder the model from finding regularities;
  • Considering only the influence of individual variables on the target attribute may overlook potential relationships between variables.

1.2 Purpose of dimensionality reduction

The purpose of dimensionality reduction is to:

  • Reduce the number of feature attributes.
  • Ensure that the feature attributes are independent of one another.

2. Dimensionality Reduction - PCA (Unsupervised)

Principal component analysis (PCA): combines high-dimensional feature vectors into lower-dimensional feature attributes; it is an unsupervised dimensionality reduction method.

  • n_components: the number of new features (components) to keep.
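A minimal sketch of how this parameter is used, assuming the scikit-learn implementation of PCA (the random data is only illustrative):

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(100, 10)        # 100 samples with 10 feature attributes
pca = PCA(n_components=3)          # keep 3 new features (principal components)
X_reduced = pca.fit_transform(X)   # fit on X, then project it
print(X_reduced.shape)             # (100, 3)
```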


2.1 PCA principle

  • PCA (Principal Component Analysis) is a commonly used linear dimensionality reduction method and an unsupervised algorithm. Its goal is to map high-dimensional data into a lower-dimensional space through a linear projection, chosen so that the variance of the data along the projected dimensions is as large as possible (the maximum-variance view). In this way fewer data dimensions are used while as many characteristics of the original data points as possible are retained.
  • Intuitively, if all points were mapped onto a single point, the dimensionality would certainly be reduced, but almost all information (including the distances between points) would be lost. **If, instead, the mapped data has a large variance, the points stay spread out and more information is preserved.** From this perspective, PCA is an unsupervised linear dimensionality reduction method that loses as little of the original information as possible.
  • In PCA dimensionality reduction, the data is transformed from the original coordinate system into a new one, and the new coordinate system is determined by the data itself. The first axis is chosen along the direction of largest variance in the original data, which is statistically the most important direction; the second axis is chosen perpendicular (orthogonal) to the first; the third axis perpendicular (orthogonal) to the first and second; and so on, until the number of axes in the new coordinate system matches that of the original. The data features represented by these directions are called the **"principal components"**.


2.2 PCA Calculation

Suppose X is a data matrix that has already been centered (z-score standardized), with one sample per column (one feature per row). The projection of a sample point xi onto the hyperplane of the new space is W^T xi. If the projections of all sample points are to be separated as much as possible, the variance along each dimension after projection should be maximized, so the objective is to maximize the sum of the variances of the projected sample points over all dimensions.
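In conventional notation (assuming the projection matrix W is constrained to be orthonormal), this objective can be written as:

$$\max_{W}\ \operatorname{tr}\!\left(W^{\top} X X^{\top} W\right) \qquad \text{s.t. } W^{\top} W = I$$

Applying Lagrange multipliers turns this into the eigenvalue problem $X X^{\top} w_i = \lambda_i w_i$, so the columns of W are the eigenvectors of the covariance matrix belonging to the largest eigenvalues, which is exactly the procedure described in section 2.3 below.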

2.3 PCA implementation steps

  • Input: sample set X = {x1, x2, …, xn}; each sample has m features, so X is a matrix with m rows and n columns.

  • Steps (a NumPy sketch of these five steps follows the list):

    • 1. Center the data: for each row of X (one feature attribute), perform zero-mean normalization by subtracting that row's mean (standardization).

    • 2. Compute the covariance matrix of the centered X (i.e. the matrix formed by the covariances between the features).

    • 3. Solve for the eigenvalues and eigenvectors of the covariance matrix.

    • 4. Arrange the eigenvectors as columns in descending order of their eigenvalues to form a matrix, and take the first k columns as the matrix W.

    • 5. Multiply the matrix W with the sample set X to obtain the final data matrix reduced to k dimensions (with W of size m x k, this is the product W^T X).
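A minimal NumPy sketch of these five steps, following the same convention (one feature per row, one sample per column); the function name and the tiny example matrix are illustrative assumptions:

```python
import numpy as np

def pca(X, k):
    """Reduce X (m features x n samples) to k dimensions using the steps above."""
    # 1. Center the data: subtract each row's (feature's) mean
    X_centered = X - X.mean(axis=1, keepdims=True)
    # 2. Covariance matrix between the features (m x m)
    cov = np.cov(X_centered)
    # 3. Eigenvalues and eigenvectors of the covariance matrix (eigh: it is symmetric)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # 4. Sort eigenvectors by eigenvalue, largest first, and keep the top k columns as W
    order = np.argsort(eigvals)[::-1]
    W = eigvecs[:, order[:k]]
    # 5. Project the samples: W^T X is the final k x n data matrix
    return W.T @ X_centered

# Tiny illustrative run: 3 features, 5 samples, reduced to 2 dimensions
X = np.array([[2.5, 0.5, 2.2, 1.9, 3.1],
              [2.4, 0.7, 2.9, 2.2, 3.0],
              [1.0, 0.3, 1.4, 1.1, 1.6]])
print(pca(X, 2).shape)  # (2, 5)
```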

2.4 PCA Case study

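As an illustrative case (the iris data set here is an assumed example), PCA compresses the 4 iris features into 2 while reporting how much variance each component keeps:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, y = load_iris(return_X_y=True)     # 150 samples, 4 features
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)
print(X_2d.shape)                     # (150, 2)
print(pca.explained_variance_ratio_)  # share of total variance kept by each component
```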

2.5 The SVD solution for PCA dimensionality reduction

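The idea in brief: for a centered data matrix, the left singular vectors from its SVD are exactly the eigenvectors of the covariance matrix, so the principal directions can be obtained without forming the covariance matrix at all (scikit-learn's PCA also works this way internally). A NumPy sketch under the same feature-per-row convention as above, with the function name assumed:

```python
import numpy as np

def pca_svd(X_centered, k):
    """PCA via SVD: X_centered is m features x n samples, each row already zero-mean."""
    # Left singular vectors of X are the eigenvectors of the covariance X X^T / (n - 1)
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    W = U[:, :k]              # top-k principal directions; singular values in S are sorted descending
    return W.T @ X_centered   # k x n reduced data, equal to np.diag(S[:k]) @ Vt[:k]
```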

3. Dimensionality Reduction - LDA (Supervised)

Linear discriminant analysis (LDA): LDA merges feature attributes based on a classification model; it is a supervised dimensionality reduction method.
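A minimal supervised sketch, assuming scikit-learn's LinearDiscriminantAnalysis (the iris data is only an illustrative choice):

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)            # 3 classes, so at most 3 - 1 = 2 LDA dimensions
lda = LinearDiscriminantAnalysis(n_components=2)
X_lda = lda.fit_transform(X, y)              # the class labels y are required, unlike PCA
print(X_lda.shape)                           # (150, 2)
```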

3.1 LDA principle

  • LDA stands for Linear Discriminant Analysis; it is a supervised learning algorithm.
  • The idea of LDA is to project labeled data (points) into a lower-dimensional space so that the projected points form clusters by class, with points of the same class ending up closer together in the projected space. In one sentence: "after projection, minimize the within-class variance and maximize the between-class variance."


3.2 Solving the LDA problem

  • Suppose the transformation vector is w; then the linear transformation is x' = w^T x, and the transformed data is one-dimensional.
  • Consider the binary-classification case: if the transformed value is greater than some threshold it belongs to one class, and if it is less than or equal to the threshold it belongs to the other. Using the center point (mean) of each class's samples to represent the class, we want the distance between the two projected centers to be as large as possible:

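With $\mu_0$ and $\mu_1$ denoting the means of the two classes (standard textbook notation, assumed here):

$$\max_{w}\ \left\| w^{\top}\mu_0 - w^{\top}\mu_1 \right\|_2^{2}$$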

  • At the same time, samples of the same class should stay as close together as possible after projection, i.e. the covariance of each class's projected points should be as small as possible.

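With $\Sigma_0$ and $\Sigma_1$ denoting the covariance (scatter) matrices of the two classes, in the standard form this is:

$$\min_{w}\ \left( w^{\top}\Sigma_0\, w + w^{\top}\Sigma_1\, w \right)$$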

  • Combining the two, the final objective function is:

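In the standard textbook form, defining the between-class scatter $S_B=(\mu_0-\mu_1)(\mu_0-\mu_1)^{\top}$ and the within-class scatter $S_W=\Sigma_0+\Sigma_1$ (notation assumed as above), the objective to maximize is:

$$J(w)=\frac{\left\| w^{\top}\mu_0 - w^{\top}\mu_1 \right\|_2^{2}}{w^{\top}\Sigma_0\, w + w^{\top}\Sigma_1\, w}=\frac{w^{\top} S_B\, w}{w^{\top} S_W\, w}$$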

  • Transform the objective function (where A and B are square matrices and A is positive definite):

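This ratio is a generalized Rayleigh quotient; since its value does not change when w is rescaled, the maximization reduces to a generalized eigenvalue problem (standard notation assumed, as above):

$$S_B\, w = \lambda\, S_W\, w \quad\Longleftrightarrow\quad S_W^{-1} S_B\, w = \lambda\, w$$

For the two-class case this yields the closed-form direction $w \propto S_W^{-1}(\mu_0 - \mu_1)$.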

4. Similarities and Differences Between PCA and LDA

Similarities:

  • Both can reduce the dimensionality of the data.
  • Both use the idea of matrix decomposition to reduce dimensionality.
  • Both assume that the data follows a Gaussian distribution.

Differences:

  • LDA is a supervised dimensionality reduction algorithm, while PCA is unsupervised.
  • LDA can reduce the dimensionality to at most k-1, where k is the number of classes, while PCA has no such restriction.
  • Besides dimensionality reduction, LDA can also be used for classification.
  • LDA chooses the projection with the best classification performance, while PCA chooses the direction in which the projected sample points have the maximum variance.

Copyright notice
This article was created by [IT-cute]; please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/162/202206111711145333.html