
05_ Feature Engineering - Dimensionality Reduction

2022-06-11 17:21:00 IT-cute

1. Dimensionality Reduction

  • Feature dimensionality reduction can only be carried out after feature selection.

  • Once feature selection is complete, you can train the model directly, but the feature matrix may still be too large, leading to heavy computation and long training times. It is therefore often necessary to reduce the dimensionality of the feature matrix.

  • Apart from models based on an L1 penalty term, common dimensionality reduction methods also include principal component analysis (PCA) and linear discriminant analysis (LDA). The essence of both methods is to map the original data into a lower-dimensional sample space;

  • They differ in how they do this: PCA tries to give the mapped samples the greatest possible spread (variance), while LDA tries to give the mapped samples the best possible classification performance.

  • Besides PCA and LDA, topic models can also be used to achieve a dimensionality reduction effect.

1.1 Necessity of dimensionality reduction

In real machine learning projects, feature selection / dimensionality reduction must be carried out, because the data typically suffers from the following problems:

  • Multicollinearity: feature attributes are correlated with each other. Multicollinearity makes the solution space unstable, which weakens the model's generalization ability;
  • Samples in high-dimensional space are sparse, which makes it hard for the model to find patterns in the data;
  • Too many variables hinder the model from finding regularities;
  • Considering only the influence of individual variables on the target attribute may overlook potential relationships between variables.

1.2 Purpose of dimensionality reduction

The purpose of dimensionality reduction is to:

  • Reduce the number of feature attributes.
  • Ensure that the feature attributes are independent of one another.

2. Dimensionality Reduction - PCA (Unsupervised)

Principal component analysis (PCA): combines high-dimensional feature vectors into lower-dimensional feature attributes; it is an unsupervised dimensionality reduction method.

  • n_components: the number of new features (components) to keep.
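A minimal sketch of how this parameter is used, assuming the scikit-learn implementation of PCA (the random data is only illustrative):

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(100, 10)        # 100 samples with 10 feature attributes
pca = PCA(n_components=3)          # keep 3 new features (principal components)
X_reduced = pca.fit_transform(X)   # fit on X, then project it
print(X_reduced.shape)             # (100, 3)
```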


2.1 PCA principle

  • PCA (Principal Component Analysis) is a commonly used linear dimensionality reduction method and an unsupervised algorithm. Its goal is to map high-dimensional data into a lower-dimensional space through a linear projection, chosen so that the variance of the data along the projected dimensions is as large as possible (the maximum-variance view). In this way fewer data dimensions are used while as many characteristics of the original data points as possible are retained.
  • Intuitively, if all points were mapped onto a single point, the dimensionality would certainly be reduced, but almost all information (including the distances between points) would be lost. **If, instead, the mapped data has a large variance, the points stay spread out and more information is preserved.** From this perspective, PCA is an unsupervised linear dimensionality reduction method that loses as little of the original information as possible.
  • In PCA dimensionality reduction, the data is transformed from the original coordinate system into a new one, and the new coordinate system is determined by the data itself. The first axis is chosen along the direction of largest variance in the original data, which is statistically the most important direction; the second axis is chosen perpendicular (orthogonal) to the first; the third axis perpendicular (orthogonal) to the first and second; and so on, until the number of axes in the new coordinate system matches that of the original. The data features represented by these directions are called the **"principal components"**.


2.2 PCA Calculation

Suppose X is a data matrix that has already been centered (z-score standardized), with one sample per column (one feature per row). The projection of a sample point xi onto the hyperplane of the new space is W^T xi. If the projections of all sample points are to be separated as much as possible, the variance along each dimension after projection should be maximized, so the objective is to maximize the sum of the variances of the projected sample points over all dimensions.
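In conventional notation (assuming the projection matrix W is constrained to be orthonormal), this objective can be written as:

$$\max_{W}\ \operatorname{tr}\!\left(W^{\top} X X^{\top} W\right) \qquad \text{s.t. } W^{\top} W = I$$

Applying Lagrange multipliers turns this into the eigenvalue problem $X X^{\top} w_i = \lambda_i w_i$, so the columns of W are the eigenvectors of the covariance matrix belonging to the largest eigenvalues, which is exactly the procedure described in section 2.3 below.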

2.3 PCA implementation steps

  • Input: sample set X = {x1, x2, …, xn}; each sample has m features, so X is a matrix with m rows and n columns.

  • Steps (a NumPy sketch of these five steps follows the list):

    • 1. Center the data: for each row of X (one feature attribute), perform zero-mean normalization by subtracting that row's mean (standardization).

    • 2. Compute the covariance matrix of the centered X (i.e. the matrix formed by the covariances between the features).

    • 3. Solve for the eigenvalues and eigenvectors of the covariance matrix.

    • 4. Arrange the eigenvectors as columns in descending order of their eigenvalues to form a matrix, and take the first k columns as the matrix W.

    • 5. Multiply the matrix W with the sample set X to obtain the final data matrix reduced to k dimensions (with W of size m x k, this is the product W^T X).
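A minimal NumPy sketch of these five steps, following the same convention (one feature per row, one sample per column); the function name and the tiny example matrix are illustrative assumptions:

```python
import numpy as np

def pca(X, k):
    """Reduce X (m features x n samples) to k dimensions using the steps above."""
    # 1. Center the data: subtract each row's (feature's) mean
    X_centered = X - X.mean(axis=1, keepdims=True)
    # 2. Covariance matrix between the features (m x m)
    cov = np.cov(X_centered)
    # 3. Eigenvalues and eigenvectors of the covariance matrix (eigh: it is symmetric)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # 4. Sort eigenvectors by eigenvalue, largest first, and keep the top k columns as W
    order = np.argsort(eigvals)[::-1]
    W = eigvecs[:, order[:k]]
    # 5. Project the samples: W^T X is the final k x n data matrix
    return W.T @ X_centered

# Tiny illustrative run: 3 features, 5 samples, reduced to 2 dimensions
X = np.array([[2.5, 0.5, 2.2, 1.9, 3.1],
              [2.4, 0.7, 2.9, 2.2, 3.0],
              [1.0, 0.3, 1.4, 1.1, 1.6]])
print(pca(X, 2).shape)  # (2, 5)
```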

2.4 PCA Case study

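As an illustrative case (the iris data set here is an assumed example), PCA compresses the 4 iris features into 2 while reporting how much variance each component keeps:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, y = load_iris(return_X_y=True)     # 150 samples, 4 features
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)
print(X_2d.shape)                     # (150, 2)
print(pca.explained_variance_ratio_)  # share of total variance kept by each component
```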

2.5 The SVD solution for PCA dimensionality reduction

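The idea in brief: for a centered data matrix, the left singular vectors from its SVD are exactly the eigenvectors of the covariance matrix, so the principal directions can be obtained without forming the covariance matrix at all (scikit-learn's PCA also works this way internally). A NumPy sketch under the same feature-per-row convention as above, with the function name assumed:

```python
import numpy as np

def pca_svd(X_centered, k):
    """PCA via SVD: X_centered is m features x n samples, each row already zero-mean."""
    # Left singular vectors of X are the eigenvectors of the covariance X X^T / (n - 1)
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    W = U[:, :k]              # top-k principal directions; singular values in S are sorted descending
    return W.T @ X_centered   # k x n reduced data, equal to np.diag(S[:k]) @ Vt[:k]
```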

3. Dimensionality Reduction - LDA (Supervised)

Linear discriminant analysis (LDA): LDA merges feature attributes based on a classification model; it is a supervised dimensionality reduction method.
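A minimal supervised sketch, assuming scikit-learn's LinearDiscriminantAnalysis (the iris data is only an illustrative choice):

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)            # 3 classes, so at most 3 - 1 = 2 LDA dimensions
lda = LinearDiscriminantAnalysis(n_components=2)
X_lda = lda.fit_transform(X, y)              # the class labels y are required, unlike PCA
print(X_lda.shape)                           # (150, 2)
```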

3.1 LDA principle

  • LDA stands for Linear Discriminant Analysis; it is a supervised learning algorithm.
  • The idea of LDA is to project labeled data (points) into a lower-dimensional space so that the projected points form clusters by class, with points of the same class ending up closer together in the projected space. In one sentence: "after projection, minimize the within-class variance and maximize the between-class variance."


3.2 Solving the LDA problem

  • Suppose the transformation vector is w; then the linear transformation is x' = w^T x, and the transformed data is one-dimensional.
  • Consider the binary-classification case: if the transformed value is greater than some threshold it belongs to one class, and if it is less than or equal to the threshold it belongs to the other. Using the center point (mean) of each class's samples to represent the class, we want the distance between the two projected centers to be as large as possible:

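With $\mu_0$ and $\mu_1$ denoting the means of the two classes (standard textbook notation, assumed here):

$$\max_{w}\ \left\| w^{\top}\mu_0 - w^{\top}\mu_1 \right\|_2^{2}$$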

  • At the same time, samples of the same class should stay as close together as possible after projection, i.e. the covariance of each class's projected points should be as small as possible.

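With $\Sigma_0$ and $\Sigma_1$ denoting the covariance (scatter) matrices of the two classes, in the standard form this is:

$$\min_{w}\ \left( w^{\top}\Sigma_0\, w + w^{\top}\Sigma_1\, w \right)$$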

  • Combining the two, the final objective function is:

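In the standard textbook form, defining the between-class scatter $S_B=(\mu_0-\mu_1)(\mu_0-\mu_1)^{\top}$ and the within-class scatter $S_W=\Sigma_0+\Sigma_1$ (notation assumed as above), the objective to maximize is:

$$J(w)=\frac{\left\| w^{\top}\mu_0 - w^{\top}\mu_1 \right\|_2^{2}}{w^{\top}\Sigma_0\, w + w^{\top}\Sigma_1\, w}=\frac{w^{\top} S_B\, w}{w^{\top} S_W\, w}$$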

  • Transform the objective function (where A and B are square matrices and A is positive definite):

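This ratio is a generalized Rayleigh quotient; since its value does not change when w is rescaled, the maximization reduces to a generalized eigenvalue problem (standard notation assumed, as above):

$$S_B\, w = \lambda\, S_W\, w \quad\Longleftrightarrow\quad S_W^{-1} S_B\, w = \lambda\, w$$

For the two-class case this yields the closed-form direction $w \propto S_W^{-1}(\mu_0 - \mu_1)$.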

4. Similarities and Differences Between PCA and LDA

Similarities:

  • Both can reduce the dimensionality of the data.
  • Both use the idea of matrix decomposition to reduce dimensionality.
  • Both assume that the data follows a Gaussian distribution.

Differences:

  • LDA is a supervised dimensionality reduction algorithm, while PCA is unsupervised.
  • LDA can reduce the dimensionality to at most k-1, where k is the number of classes, while PCA has no such restriction.
  • Besides dimensionality reduction, LDA can also be used for classification.
  • LDA chooses the projection with the best classification performance, while PCA chooses the direction in which the projected sample points have the maximum variance.

Copyright notice
This article was created by [IT-cute]; please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/162/202206111711145333.html