MATLAB / ENVI principal component analysis: implementation and result analysis

2022-07-07 06:24:00 PeanutbutterBoh

Background: I recently used principal component analysis (PCA) to screen variables. The goal was to calculate the loading of each environmental variable on the different principal components, but my results looked inconsistent with those in other papers, so I went through some literature to understand why.

1 Principal component loadings

Baidu Encyclopedia says: the principal component loading (load of principal component) is the correlation coefficient between the original variables and the principal components in principal component analysis.
For a more detailed explanation, see this document: Principal component analysis

So it is easy to understand: the principal component loadings are the coefficients applied to the original variables. Each principal component is obtained by multiplying the original data by the corresponding loadings, that is, it is a weighted sum of the original variables.
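A minimal sketch of this relationship, using synthetic random data rather than the environmental rasters (the variable names X, Xz and pc1 are made up for illustration). It checks that the first column of score returned by pca equals the standardized data multiplied by the first column of coeff:

```matlab
rng(0);                              % reproducible synthetic data (assumption)
X  = randn(200, 4) * randn(4, 4);    % 200 observations, 4 correlated variables
Xz = normalize(X);                   % z-score each column

[coeff, score] = pca(Xz, 'Centered', false);

% Each principal component is a linear combination of the variables,
% weighted by one column of coeff.
pc1 = Xz * coeff(:, 1);
maxDiff = max(abs(pc1 - score(:, 1)));
fprintf('max |Xz*coeff(:,1) - score(:,1)| = %.2e\n', maxDiff);
```

The printed difference should be at the level of floating-point rounding error.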

2 MATLAB principal component analysis experiment

Following the idea above, I ran an experiment in both ENVI and MATLAB.
I used 26 environmental variables, all at the same resolution. I want to use principal component analysis to calculate the loadings of the variables, determine which variables are important, and keep only those to run the model. Here is the MATLAB code:

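clear; clc
% Select the environmental GeoTIFF files (multiple selection allowed)
[tifname,tifpath] = uigetfile('*.tif','Select environmental tif data','MultiSelect','on');
tifname = cellstr(tifname);  % A single selection is returned as char, so force a cell array

for i = 1:numel(tifname)
    [A,~] = geotiffread(fullfile(tifpath,tifname{i}));
    Environ_Var(:,:,i) = double(A);  % Stack the environmental variables into one 3-D array
end

% Change the 3-D data into 2-D: rows are pixels, columns are variables
E = reshape(Environ_Var,size(Environ_Var,1)*size(Environ_Var,2),...
    size(Environ_Var,3));
E(E == -9999) = nan;         % Mark the no-data value as NaN
E = E(~any(isnan(E),2),:);   % Remove pixels that are NaN in any variable
E_Norm = normalize(E);       % Standardize each variable (z-score)

% The data are already standardized, so pca does not need to center them again
[coeff,score,latent,~,explained,~] = pca(E_Norm,'Centered',false);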

Notes:
1. My variables have different units, so they must be standardized first. Because the standardized data already have zero mean, pca is then called without centering.
2. The no-data value in my environmental variables is -9999, so I removed those pixels before standardizing.
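Note 1 can be checked with a small sketch on synthetic data (an assumption, not the environmental rasters; all names here are illustrative). After normalize, the column means are essentially zero, so pca with and without centering should return the same coefficients up to rounding error:

```matlab
rng(4);                              % reproducible synthetic data (assumption)
X  = randn(250, 4) * randn(4, 4);    % 250 observations, 4 correlated variables
Xz = normalize(X);                   % z-score: zero mean, unit standard deviation

colMeans = mean(Xz);                 % should be ~0 for every column

coeffCentered   = pca(Xz);                     % default: pca centers the data itself
coeffUncentered = pca(Xz, 'Centered', false);  % as in the code above

coeffDiff = max(abs(coeffCentered(:) - coeffUncentered(:)));
fprintf('largest column mean: %.2e, largest coeff difference: %.2e\n', ...
    max(abs(colMeans)), coeffDiff);
```

This sketch only compares coeff; it makes no claim about how latent is scaled in the two calls.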

From the results:
coeff holds the principal component loadings, and each column corresponds to one principal component (the MATLAB help calls them principal component coefficients).
Why? According to the MATLAB help (link: Principal component analysis of raw data), score*coeff' restores the original data. (As for why coeff is transposed: score holds the principal component scores, with rows corresponding to observations and columns to components, and each column of coeff contains the coefficients of one principal component. So in the product score*coeff', each row is one observation.)
I tried this myself and it holds: the restored data are the same as my standardized data.
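The reconstruction check can be sketched as follows. The data here are synthetic (an assumption), and E_demo stands in for the standardized matrix E_Norm from the code above:

```matlab
rng(5);                                        % reproducible synthetic data (assumption)
E_demo = normalize(randn(100, 6) * randn(6, 6));   % stand-in for E_Norm

[coeff, score] = pca(E_demo, 'Centered', false);

% Rows of score are observations, columns of coeff are components,
% so score*coeff' has the same shape as the input data.
recovered = score * coeff';

err    = abs(recovered - E_demo);
maxErr = max(err(:));
fprintf('size of recovered: %d x %d, max abs error: %.2e\n', ...
    size(recovered, 1), size(recovered, 2), maxErr);
```

Because pca was called with 'Centered',false, no mean has to be added back before comparing with the input.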
From the relation $A^{T}X = PC$ at the beginning, it follows that the loadings are the principal component coefficients A.
But there is one more issue: in my case, the coeff calculated by MATLAB has to be multiplied by -1 (that is, the signs are reversed) to give the real variable loadings. For the reason, see the ENVI experiment below.

3 ENVI principal component analysis experiment

For the specific steps, see the earlier blog post "ENVI5.3.1 Use Landsat 8 Image PCA example operation".
That post only lists the steps and does not analyze the results. Also pay attention to the choice between the correlation matrix and the covariance matrix: if the variables differ greatly in units or scale, use the correlation matrix; if the differences are small or you have already standardized the data, use the covariance matrix. In addition, the ENVI input must not contain NaN values, so apply a mask if necessary.
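The difference between the two choices can be illustrated in MATLAB with synthetic data (an assumption; the scale factors 1, 50 and 1000 are made up). PCA on standardized data matches the eigenvectors of the correlation matrix up to sign, while PCA on the raw data (covariance matrix) is dominated by the variable with the largest scale:

```matlab
rng(1);                                           % reproducible synthetic data (assumption)
Z = randn(300, 3) * [1 0.5 0.2; 0 1 0.5; 0 0 1];  % 3 correlated variables
X = Z .* [1 50 1000];                             % give them very different scales

% Correlation-matrix PCA done by hand
[V, D] = eig(corrcoef(X));
[~, idx] = sort(diag(D), 'descend');
V = V(:, idx);

% pca on standardized data should give the same directions (up to sign)
coeffCorr = pca(normalize(X));
dirDiff = max(abs(abs(V(:)) - abs(coeffCorr(:))));

% Covariance-matrix PCA on the raw data
coeffCov = pca(X);
fprintf('corr vs standardized pca, max difference: %.2e\n', dirDiff);
fprintf('|weight of variable 3 on PC1| (covariance PCA): %.4f\n', abs(coeffCov(3, 1)));
```

In the covariance case, the first component points almost entirely along the third variable, simply because its numbers are the largest.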
With that in mind, look directly at the ENVI results. (There is no screenshot of the raw output; I saved the results in Excel.)
ENVI reports the eigenvalues, the eigenvectors and so on. In the eigenvector table, each row represents one component and each column represents one variable.
Now take the loadings of the first principal component (the ENVI eigenvector and the first column of MATLAB's coeff) and compare them:
Remarkably, the two sets of values are almost identical, but the signs are opposite. How do you decide which sign is correct? Mathematically, an eigenvector and its negation are equally valid, so the choice has to come from the meaning of the loading. Take SST year mean as an example (another detailed example is at https://wenku.baidu.com/view/5dc0b7c1514de518964bcf84b9d528ea81c72f6f.html): I already know that rising SST is bad for the survival of my organism, so the loading of SST should be negative.
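A minimal sketch of the sign flip, on synthetic data. The vector enviVec1 is a hypothetical stand-in for the first ENVI eigenvector (here simply the negated MATLAB column), not real ENVI output. Flipping the sign of a column of coeff together with the matching column of score leaves the reconstruction unchanged, which is why either sign is a valid PCA result:

```matlab
rng(2);                                   % reproducible synthetic data (assumption)
Xz = normalize(randn(150, 5) * randn(5, 5));
[coeff, score] = pca(Xz, 'Centered', false);

% Hypothetical reference: same direction as MATLAB's PC1, opposite sign
enviVec1 = -coeff(:, 1);

% Flip component 1 if it points against the reference vector
s = sign(coeff(:, 1)' * enviVec1);        % +1 keeps the sign, -1 flips it
coeffAligned = coeff;
scoreAligned = score;
coeffAligned(:, 1) = s * coeff(:, 1);
scoreAligned(:, 1) = s * score(:, 1);

% The reconstruction is identical before and after the flip
d = abs(score * coeff' - scoreAligned * coeffAligned');
reconDiff = max(d(:));
fprintf('sign factor: %d, reconstruction difference: %.2e\n', s, reconDiff);
```

With real results, enviVec1 would be replaced by the eigenvector exported from ENVI, or the sign could be fixed from a variable whose direction of effect is known, as with SST above.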
To screen the variables, plot the loadings in MATLAB: the closer a variable lies to the origin, the less it contributes, so those variables are the ones to eliminate.
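One possible way to draw such a plot, assuming it is a loading plot of the first two components (the original figure is not available, and the data and variable names below are hypothetical). The distance of each variable from the origin is computed with vecnorm, and biplot draws the loadings:

```matlab
rng(3);                                   % reproducible synthetic data (assumption)
Xz = normalize(randn(150, 5) * randn(5, 5));
varNames = {'var1', 'var2', 'var3', 'var4', 'var5'};   % hypothetical variable names

coeff = pca(Xz, 'Centered', false);

% Distance of each variable from the origin in the PC1-PC2 plane
distFromOrigin = vecnorm(coeff(:, 1:2), 2, 2);
[~, order] = sort(distFromOrigin, 'descend');
for k = order'
    fprintf('%s: %.3f\n', varNames{k}, distFromOrigin(k));
end

% Loading plot of the first two components
biplot(coeff(:, 1:2), 'VarLabels', varNames);
```

Variables at the bottom of the printed list are the closest to the origin and would be the first candidates for removal under the rule above.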

4 Summary

To calculate the loadings of the variables on the principal components: in ENVI, use the eigenvectors; in MATLAB, use the principal component coefficients coeff multiplied by -1 (sign flipped).

Copyright notice
This article was written by [PeanutbutterBoh]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/188/202207070141276372.html