MATLAB / ENVI principal component analysis: implementation and result analysis
2022-07-07 06:24:00 【PeanutbutterBoh】
Background: I have recently been using principal component analysis (PCA) to screen variables. The goal is to compute the loading of each environmental variable on the different principal components, but my results looked inconsistent with those in other papers, so I went through some literature to try to understand why.
1 Principal component loading
Baidu Encyclopedia says: the principal component loading (load of principal component) is the correlation coefficient between an original variable and a principal component in principal component analysis.
For further understanding, see this particularly detailed document: Principal component analysis
Put simply, the principal component loading is the coefficient applied to each original variable: multiplying the loadings by the original data gives the principal component.
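The two descriptions above (a correlation coefficient, and a coefficient in front of the original data) can be related with a short derivation. This is a sketch under the assumption that the variables are standardized; the symbols R, a_k and \lambda_k are notation introduced here, not taken from the linked document.

```latex
% R: correlation matrix of the standardized variables x_1, ..., x_p
% a_k: unit-length eigenvector of component k, \lambda_k: its eigenvalue
\begin{aligned}
R\,a_k &= \lambda_k a_k, \qquad \lVert a_k \rVert = 1,\\
PC_k &= a_k^{T} x = \sum_{j=1}^{p} a_{jk}\,x_j,\\
\operatorname{corr}(x_j, PC_k) &= a_{jk}\sqrt{\lambda_k}.
\end{aligned}
```

Within one component the two quantities differ only by the positive factor \sqrt{\lambda_k}, so the sign of each variable and the ordering of variables on that component are the same under either definition.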
2 MATLAB principal component analysis experiment
Following this idea, I ran one experiment in ENVI and one in MATLAB.
I use 26 environmental variables here, all at the same resolution. I want to use PCA to compute the loadings of the variables, decide which ones are important, and keep those to run the model. Here is the MATLAB code:
clear all; clc
[tifname,tifpath] = uigetfile('*.tif','Select environmental tif data','MultiSelect','on');
if ischar(tifname), tifname = {tifname}; end % a single selection is returned as char
for i = 1:numel(tifname)
    [A,~] = geotiffread(fullfile(tifpath,tifname{i}));
    Environ_Var(:,:,i) = double(A); % stack the environmental variables into one 3-D array
end
E = reshape(Environ_Var,size(Environ_Var,1)*size(Environ_Var,2),...
    size(Environ_Var,3)); % reshape 3-D data to 2-D; columns are variables
E(E == -9999) = nan; % -9999 is the no-data value
E = E(~any(isnan(E),2),:); % remove pixels that are NaN in any variable
E_Norm = normalize(E); % standardize each variable (z-score)
[coeff,score,latent,~,explained,~] = pca(E_Norm,'Centered',false);
Notes: 1. My variables have different units, so they must be standardized first; pca is then called without centering ('Centered', false) because the standardized data are already centered.
2. The no-data value in my environmental variables is -9999, so I removed those values before standardizing.
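The two notes above can be sketched in a few lines. This is a minimal Python illustration using only the standard library, not the MATLAB workflow itself; the pixel values and helper names are hypothetical, and the sample standard deviation (n - 1) is assumed for the z-score.

```python
import statistics

NODATA = -9999.0  # no-data marker mentioned in the post

def drop_nodata_rows(rows, nodata=NODATA):
    """Keep only observations in which every variable has a valid value."""
    return [row for row in rows if all(v != nodata for v in row)]

def zscore_columns(rows):
    """Standardize each column to zero mean and unit sample standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.mean(c) for c in columns]
    stds = [statistics.stdev(c) for c in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

# Hypothetical pixels: rows are observations, columns are two variables.
pixels = [
    [12.0, 300.0],
    [14.0, -9999.0],  # dropped: one variable is no-data
    [16.0, 500.0],
    [20.0, 400.0],
]
valid = drop_nodata_rows(pixels)
standardized = zscore_columns(valid)
```

Removing the no-data rows first matters: a -9999 left in a column would distort that column's mean and standard deviation, and therefore every standardized value.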
From the results:
coeff holds the principal component loadings, one column per principal component (the MATLAB help calls them principal component coefficients).
The reason is that, according to the MATLAB help (link: Principal component analysis of raw data), score*coeff' restores the original data. (As for why coeff is transposed: score holds the principal component scores, with rows corresponding to observations and columns to components, and each column of coeff contains the coefficients of one principal component. So a row of score multiplied by coeff' gives one observation.)
I tried this myself and it holds: the restored data are identical to my standardized data.
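The reconstruction check can be reproduced on a toy case. The sketch below is Python (standard library only) rather than MATLAB, uses two hypothetical, already centered variables, and solves the 2x2 eigenproblem in closed form; the function name pca_2d and the n - 1 scaling of the eigenvalues are assumptions made for this illustration.

```python
import math

def pca_2d(data):
    """PCA of centered two-column data via the closed-form 2x2 eigenproblem."""
    n = len(data)
    # Entries of X'X / (n - 1); no centering is applied here.
    sxx = sum(x * x for x, _ in data) / (n - 1)
    syy = sum(y * y for _, y in data) / (n - 1)
    sxy = sum(x * y for x, y in data) / (n - 1)
    mid = (sxx + syy) / 2
    half = math.hypot((sxx - syy) / 2, sxy)
    latent = [mid + half, mid - half]  # eigenvalues, largest first
    vectors = []
    for lam in latent:
        vx, vy = sxy, lam - sxx  # eigenvector of [[sxx, sxy], [sxy, syy]]; needs sxy != 0
        norm = math.hypot(vx, vy)
        vectors.append((vx / norm, vy / norm))
    # coeff[j][k]: coefficient of variable j on component k (columns are components)
    coeff = [[vectors[k][j] for k in range(2)] for j in range(2)]
    # score = data * coeff (rows are observations, columns are components)
    score = [[sum(row[j] * coeff[j][k] for j in range(2)) for k in range(2)]
             for row in data]
    return coeff, score, latent

data = [(-1.0, -0.8), (0.0, 0.1), (1.0, 0.7)]  # hypothetical, already centered
coeff, score, latent = pca_2d(data)
# restored = score * coeff'
restored = [[sum(score[i][k] * coeff[j][k] for k in range(2)) for j in range(2)]
            for i in range(len(data))]
```

The reconstruction works because the columns of coeff are orthonormal, so coeff*coeff' is the identity matrix when all components are kept.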
From the earlier relation A^T X = PC, the loading is the principal component coefficient matrix A.
One more issue: in my case the coeff computed by MATLAB had to be multiplied by -1 (that is, its signs reversed) to give the actual variable loadings. For the reason, see the ENVI experiment below.
3 ENVI principal component analysis experiment
For the specific steps, see the earlier blog post: ENVI5.3.1 Use Landsat 8 Image PCA example operation
That post only lists the steps and does not analyze the results. Also note the choice between the correlation matrix and the covariance matrix: if the variables differ greatly in units or scale, use the correlation matrix; if the differences are small, or you have already standardized the data, use the covariance matrix. In addition, the ENVI input must contain no NaN values; apply a mask if necessary.
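The reason the two options coincide for standardized data can be checked numerically: the covariance of z-scored variables equals the correlation of the raw variables. The sketch below is a Python illustration (standard library only); the variable names and values are hypothetical.

```python
import statistics

def covariance(a, b):
    """Sample covariance (n - 1 in the denominator)."""
    ma, mb = statistics.mean(a), statistics.mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def correlation(a, b):
    """Pearson correlation coefficient."""
    return covariance(a, b) / (statistics.stdev(a) * statistics.stdev(b))

def zscore(a):
    """Standardize to zero mean and unit sample standard deviation."""
    m, s = statistics.mean(a), statistics.stdev(a)
    return [(x - m) / s for x in a]

sst = [18.2, 19.5, 21.0, 22.4, 24.1]          # hypothetical values, degrees C
depth = [1200.0, 950.0, 700.0, 400.0, 150.0]  # hypothetical values, metres

raw_cov = covariance(sst, depth)                  # dominated by the units of depth
raw_corr = correlation(sst, depth)                # unit-free
std_cov = covariance(zscore(sst), zscore(depth))  # matches raw_corr
```

On raw data the covariance matrix is dominated by whichever variable has the largest numeric range, which is why the correlation matrix is the safer choice when units differ.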
With that in mind, look directly at the ENVI results. (There is no screenshot of the results; I saved them in Excel.)
ENVI reports the eigenvalues, the eigenvectors and so on. In the eigenvector table, each row represents a component and each column represents a variable.
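Because ENVI lists one component per row while MATLAB's coeff stores one component per column, comparing the two needs a transpose, and the comparison should tolerate an overall sign difference. A minimal Python sketch with hypothetical 2x2 values (not my actual results):

```python
def column(matrix, k):
    """Return column k of a row-major matrix."""
    return [row[k] for row in matrix]

def relative_sign(u, v, tol=1e-6):
    """Return +1 if u equals v, -1 if u equals -v (within tol), otherwise 0."""
    if all(abs(a - b) <= tol for a, b in zip(u, v)):
        return 1
    if all(abs(a + b) <= tol for a, b in zip(u, v)):
        return -1
    return 0

# Hypothetical values: ENVI rows are components, MATLAB columns are components.
envi_eigvec = [[-0.6, -0.8],
               [0.8, -0.6]]
matlab_coeff = [[0.6, 0.8],
                [0.8, -0.6]]

signs = [relative_sign(envi_eigvec[k], column(matlab_coeff, k)) for k in range(2)]
```

A result of -1 for a component means the two programs found the same axis but oriented it in opposite directions.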
Now take the loadings of the first principal component (the ENVI eigenvector and MATLAB's coeff) and compare the results:
Remarkably, the two sets of values are almost identical, but with opposite signs. How do we decide which sign is correct? That depends on the meaning of the loading. Take SST year mean as an example (see also https://wenku.baidu.com/view/5dc0b7c1514de518964bcf84b9d528ea81c72f6f.html, which is very detailed): I know that rising SST is bad for the survival of my organism, so the loading of SST should be negative.
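Mathematically, if v is an eigenvector then so is -v, so the sign of a component is a convention and either program's output is a valid solution. Flipping a column of coeff together with the matching column of score leaves the reconstructed data unchanged. The Python sketch below illustrates this and orients a component by a reference variable; the helper names, the numbers, and the choice of variable 0 as "SST" are hypothetical.

```python
def reconstruct(coeff, score):
    """Compute score * coeff' (rows are observations, columns are variables)."""
    n_comp = len(coeff[0])
    return [[sum(s[k] * c[k] for k in range(n_comp)) for c in coeff] for s in score]

def flip_component(coeff, score, k):
    """Return copies of coeff and score with the sign of component k reversed."""
    new_coeff = [[-v if col == k else v for col, v in enumerate(row)] for row in coeff]
    new_score = [[-v if col == k else v for col, v in enumerate(row)] for row in score]
    return new_coeff, new_score

def orient_component(coeff, score, k, ref_var, want_negative=True):
    """Flip component k unless the loading of ref_var already has the wanted sign."""
    if (coeff[ref_var][k] < 0) != want_negative:
        return flip_component(coeff, score, k)
    return coeff, score

# Hypothetical orthonormal coefficients (rows: variables, columns: components).
coeff = [[0.6, -0.8],
         [0.8, 0.6]]
score = [[1.0, 0.5],
         [-2.0, 0.25]]

# Variable 0 plays the role of SST; its PC1 loading is expected to be negative.
oriented_coeff, oriented_score = orient_component(coeff, score, k=0, ref_var=0)
before = reconstruct(coeff, score)
after = reconstruct(oriented_coeff, oriented_score)
```

Choosing the orientation from domain knowledge, as done with SST above, only changes how the component is read; it does not change the fitted axes or the variance they explain.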
Then plot the loadings in MATLAB: the closer a variable lies to the origin, the stronger the case for eliminating it.
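One way to make "close to the origin" concrete is to rank variables by their distance from the origin in the PC1-PC2 loading plane. This is a hedged Python sketch: the variable names and loading values are hypothetical, and restricting the distance to the first two components is an assumption of this illustration.

```python
import math

# Hypothetical (PC1, PC2) loadings for three variables.
loadings = {
    "SST_year_mean": (-0.42, 0.10),
    "salinity": (0.05, 0.03),
    "depth": (0.30, -0.35),
}

def rank_by_distance(loadings):
    """Sort variables by distance from the origin in the PC1-PC2 plane, nearest first."""
    dist = {name: math.hypot(pc1, pc2) for name, (pc1, pc2) in loadings.items()}
    return sorted(dist.items(), key=lambda item: item[1])

ranking = rank_by_distance(loadings)
weakest_variable = ranking[0][0]
```

The distance is unaffected by the sign convention of either component, so this ranking is the same whether the ENVI or the MATLAB orientation is used.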
4 Summary
To compute the loadings of the variables on the principal components: in ENVI, use the eigenvectors; in MATLAB, use the principal component coefficients coeff multiplied by -1 to flip the sign.