SAS principal component analysis (correlation matrix, eigenvalues, unit eigenvectors, principal component expressions, contribution rate and cumulative contribution rate, and data interpretation)
2022-06-11 01:26:00 【I have a clear idea】
Thirty middle school students from one grade were randomly sampled, and their height (X1), weight (X2), chest circumference (X3), and sitting height (X4) were measured. The data are shown in the table below.
n | X1 | X2 | X3 | X4 | n | X1 | X2 | X3 | X4 |
1 | 148 | 41 | 72 | 78 | 16 | 152 | 35 | 73 | 79 |
2 | 139 | 34 | 71 | 76 | 17 | 149 | 47 | 82 | 79 |
3 | 160 | 49 | 77 | 86 | 18 | 145 | 35 | 70 | 77 |
4 | 149 | 36 | 67 | 79 | 19 | 160 | 47 | 74 | 87 |
5 | 159 | 45 | 80 | 86 | 20 | 156 | 44 | 78 | 85 |
6 | 142 | 31 | 66 | 76 | 21 | 151 | 42 | 73 | 82 |
7 | 153 | 43 | 76 | 83 | 22 | 147 | 38 | 73 | 78 |
8 | 150 | 43 | 77 | 79 | 23 | 157 | 39 | 68 | 80 |
9 | 151 | 42 | 77 | 80 | 24 | 147 | 30 | 65 | 75 |
10 | 139 | 31 | 68 | 74 | 25 | 157 | 48 | 80 | 88 |
11 | 140 | 29 | 64 | 74 | 26 | 151 | 36 | 74 | 80 |
12 | 161 | 47 | 78 | 84 | 27 | 144 | 36 | 68 | 76 |
13 | 158 | 49 | 78 | 83 | 28 | 141 | 30 | 67 | 76 |
14 | 140 | 33 | 67 | 77 | 29 | 139 | 32 | 68 | 73 |
15 | 137 | 31 | 66 | 73 | 30 | 148 | 38 | 70 | 78 |
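The table above can also be entered directly in a DATA step, which avoids depending on a local Excel file. This is only a sketch: the data set name temp2 is chosen here so that it matches the name used by the program in this post, and the trailing `@@` is what lets each data line carry two observations, mirroring the two-panel layout of the table.

```sas
/* Assumption: temp2 is reused as the data set name so later steps run unchanged */
data temp2;
    /* The trailing @@ holds the input line, so two observations are read per row */
    input n X1 X2 X3 X4 @@;
    datalines;
1 148 41 72 78 16 152 35 73 79
2 139 34 71 76 17 149 47 82 79
3 160 49 77 86 18 145 35 70 77
4 149 36 67 79 19 160 47 74 87
5 159 45 80 86 20 156 44 78 85
6 142 31 66 76 21 151 42 73 82
7 153 43 76 83 22 147 38 73 78
8 150 43 77 79 23 157 39 68 80
9 151 42 77 80 24 147 30 65 75
10 139 31 68 74 25 157 48 80 88
11 140 29 64 74 26 151 36 74 80
12 161 47 78 84 27 144 36 68 76
13 158 49 78 83 28 141 30 67 76
14 140 33 67 77 29 139 32 68 73
15 137 31 66 73 30 148 38 70 78
;
run;

/* Restore the order of the serial number n (rows were read pairwise) */
proc sort data=temp2;
    by n;
run;
```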
Use SAS to answer the following questions:
- Find the sample correlation matrix R, its eigenvalues, and the corresponding orthonormal eigenvectors;
- Compute the first two principal components of the standardized sample and their cumulative contribution rate;
- Interpret the body measurement data of the middle school students.
Experimental code :
proc import out=temp2
datafile="C:\Users\86166\Desktop\IT\SAS experiment \ experiment 7\1.xls"
DBMS=EXCEL2000 replace;
run;
/*proc standard mean=0 std=1 data=temp1 out=temp2;
var X1-X4;
run;*/
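proc princomp data=temp2 prefix=S out=temp3 outstat=temp4;/* optional: STD standardizes the component scores; COV analyzes the covariance matrix instead of the default correlation matrix */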
var X1-X4;
run;
options ps=32 ls=85;
proc plot data=temp3;
plot S2*S1 $ n="*"/
href=-1 2 vref=0;/* Draw vertical reference lines at S1=-1 and S1=2 on the horizontal axis, and a horizontal reference line at S2=0 on the vertical axis */
/*title ' Principal component scatter plot ';*/
run;
proc sort data=temp3; /* Sort by S1 score in ascending order */
by S1;
run;
proc print data=temp3;
var n S1 S2 X1-X4;
run;
proc sort data=temp3; /* Sort by S2 score in ascending order */
by S2;
run;
proc print data=temp3;
var n S1 S2 X1-X4;
run;
proc print data=temp4;
run;





Interpretation and analysis of the results:
1. The sample correlation matrix R, its eigenvalues, and the corresponding orthonormal eigenvectors:
Correlation matrix:
Eigenvalues:
Orthonormal eigenvectors:
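These three quantities can also be cross-checked independently of PROC PRINCOMP. The sketch below assumes that SAS/IML is licensed and that the data set temp2 holds the variables X1-X4; the matrix names X, R, lambda and V are illustrative only.

```sas
proc iml;
    /* Read the four measurements into a 30 x 4 matrix */
    use temp2;
    read all var {X1 X2 X3 X4} into X;
    close temp2;

    R = corr(X);                /* sample correlation matrix */
    call eigen(lambda, V, R);   /* eigenvalues in descending order; unit eigenvectors in the columns of V */

    print R, lambda, V;
quit;
```

An eigenvector is only determined up to sign, so a column of V may appear with all signs flipped relative to the PROC PRINCOMP output.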

2. The first two principal components of the standardized sample and their cumulative contribution rate:
3. Interpretation of the students' body measurement data:
The PRINCOMP procedure performs the principal component analysis starting from the correlation matrix. From the eigenvalues of the correlation matrix in Output 7.2.1, the contribution rate of the first principal component is 88.53%, and the cumulative contribution rate of the first two principal components is 96.36%. Two principal components are therefore enough to summarize this data set.
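The contribution rate of a component is its eigenvalue divided by the sum of all eigenvalues. For a correlation matrix of four variables that sum is 4, so a rate of 88.53% corresponds to a first eigenvalue of about 3.54. As a rough sketch, the rates can be recomputed from the `_TYPE_='EIGENVAL'` row of the OUTSTAT= data set temp4, where the eigenvalues are stored under the variable names X1-X4; the data set name contrib and the variables k, eigenvalue, proportion and cumulative are made up for this example.

```sas
data contrib;
    set temp4;
    where _TYPE_ = 'EIGENVAL';      /* the row holding the eigenvalues */
    array ev{4} X1-X4;              /* eigenvalues, largest first */
    total = sum(of ev{*});          /* equals 4 for a 4-variable correlation matrix */
    cumulative = 0;
    do k = 1 to 4;
        eigenvalue = ev{k};
        proportion = eigenvalue / total;        /* contribution rate */
        cumulative = cumulative + proportion;   /* cumulative contribution rate */
        output;
    end;
    keep k eigenvalue proportion cumulative;
run;

proc print data=contrib noobs;
run;
```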
In addition, the third and fourth eigenvalues are close to 0, so the four standardized body measurement variables (Xi, i=1,2,3,4) are approximately linearly related (this is called collinearity), for example:
0.505747X1-0.690844X2+0.461488X3-0.232343X4≈c (constant).
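One way to check this near-constant relation numerically is to evaluate the combination on the standardized variables and look at its spread. When the coefficients form a unit eigenvector of R, the sample variance of the combination equals the matching eigenvalue, so a small variance confirms the collinearity. The names zstd, combo and c below are hypothetical, and the coefficients are copied from the relation above.

```sas
/* Standardize X1-X4 to mean 0 and standard deviation 1 */
proc standard data=temp2 mean=0 std=1 out=zstd;
    var X1-X4;
run;

data combo;
    set zstd;
    /* Linear combination quoted in the text, applied to standardized values */
    c = 0.505747*X1 - 0.690844*X2 + 0.461488*X3 - 0.232343*X4;
run;

/* The variance of c should be close to the small eigenvalue, and its mean close to 0 */
proc means data=combo n mean std var min max;
    var c;
run;
```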
The first and second principal components can be written down from the eigenvectors corresponding to the two largest eigenvalues:
S1=0.4970X1+0.5146X2+0.4809X3+0.5069X4
S2=-0.5432X1+0.2102X2+0.7246X3-0.3683X4
Both principal components are linear combinations of the standardized variables Xi (i=1,2,3,4), and the combination coefficients are the components of the corresponding eigenvectors.
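To see that these expressions really act on the standardized variables, the scores can be recomputed by hand and compared with the S1 and S2 columns that PROC PRINCOMP wrote to temp3. This is a sketch only: zchk, S1_hand, S2_hand, diff1 and diff2 are illustrative names, and it assumes the STD option was not used in PROC PRINCOMP, so the scores are left unstandardized.

```sas
/* Standardize the raw measurements; n and S1-S4 are carried along unchanged */
proc standard data=temp3 mean=0 std=1 out=zchk;
    var X1-X4;
run;

data zchk;
    set zchk;
    /* Expressions from the text, using the four-decimal coefficients */
    S1_hand = 0.4970*X1 + 0.5146*X2 + 0.4809*X3 + 0.5069*X4;
    S2_hand = -0.5432*X1 + 0.2102*X2 + 0.7246*X3 - 0.3683*X4;
    diff1 = S1 - S1_hand;
    diff2 = S2 - S2_hand;
run;

/* Differences should be near zero, up to rounding of the coefficients */
proc means data=zchk min max;
    var diff1 diff2;
run;
```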
Because the first principal component has approximately equal loadings on all four variables, it can be regarded as an overall measure of body size. Every component of its eigenvector is close to 0.5 and positive, so it reflects how sturdily built a student is: a large student tends to have large values on all four measurements, and a small student tends to have small values on all four. We therefore call the first principal component the size factor.
For the second principal component, the first component of the eigenvector (the coefficient, or loading, of height X1) and the fourth component (the coefficient of sitting height X4) are negative, while the second (the coefficient of weight X2) and the third (the coefficient of chest circumference X3) are positive, with chest circumference X3 carrying the largest loading. It therefore directly reflects how fat or thin a student is, so the second principal component is called the fat-thin factor.
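The text uses "loading" for the eigenvector coefficients. A closely related quantity is the correlation between each original variable and each component score, which for a correlation-matrix analysis equals the eigenvector coefficient multiplied by the square root of the eigenvalue. A minimal way to tabulate these correlations, assuming temp3 still contains X1-X4 together with S1 and S2:

```sas
/* Correlations of the original variables (rows) with the first two component scores (columns) */
proc corr data=temp3 nosimple noprob;
    var S1 S2;
    with X1-X4;
run;
```

The sign pattern in the S1 and S2 columns should match the pattern described above: all positive for S1, and negative for X1 and X4 but positive for X2 and X3 for S2.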
Sorting by S1: the lower the score, the smaller the build, that is, the shorter and slighter the student. Student 11, who ranks first, is short and small. The higher the score, the opposite holds: student 25, who ranks last, is tall and has a large build.
Similarly, when sorting by S2, the lower the score the thinner the student, and the higher the score the fatter. (This follows because weight and chest circumference have positive loadings on this factor, so a higher score goes with larger weight and chest circumference, that is, a stouter build.)
The plot produced by the PLOT procedure shows that, by body size, the 30 students can be divided into three groups (using first principal component scores of -1 and 2 as the cut points).
The members of each group can be read from the serial number printed next to each point. More detailed information is available in the listing produced by the PRINT procedure.
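On SAS releases that include ODS Graphics, a similar scatter plot can be drawn with PROC SGPLOT instead of the line-printer PROC PLOT. This is offered as an alternative sketch rather than as the method used in this post; it labels each point with the serial number n and draws the same reference lines.

```sas
proc sgplot data=temp3;
    /* Second component against the first, each point labelled with the student number */
    scatter x=S1 y=S2 / datalabel=n;
    refline -1 2 / axis=x;   /* cut points on the first principal component */
    refline 0 / axis=y;      /* zero line for the second principal component */
run;
```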
The listing above shows the 30 observations sorted in ascending order of the first principal component. From it, the three groups are: G1={11,15,29,10,28,6,24,14,2,27,18}, G2={4,30,22,1,16,26,23,21,8,9,7,17}, G3={20,13,19,12,5,3,25}. Clustering on S1 and S2 together is the principal component clustering method. The listing sorted by S2 is not shown here; it can be read in the same way.
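The grouping can also be produced programmatically from the cut points -1 and 2 on S1. In the sketch below the data set name grouped and the variable group are hypothetical. If the scores match those in the post, the group sizes should agree with the 11, 12 and 7 students listed above.

```sas
data grouped;
    set temp3;
    length group $2;
    /* Cut points taken from the text: S1 = -1 and S1 = 2 */
    if S1 < -1 then group = 'G1';
    else if S1 < 2 then group = 'G2';
    else group = 'G3';
run;

/* Group sizes */
proc freq data=grouped;
    tables group;
run;

/* Members of each group, ordered by first principal component score */
proc sort data=grouped;
    by group S1;
run;

proc print data=grouped noobs;
    by group;
    var n S1 S2;
run;
```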