SAS principal component analysis (correlation matrix, eigenvalues, unit eigenvectors, principal component expressions, contribution rate and cumulative contribution rate, and data interpretation)
2022-06-11 01:26:00 【I have a clear idea】
A random sample of 30 middle school students from one grade was taken, and their height (X1), weight (X2), chest circumference (X3) and sitting height (X4) were measured. The data are shown in the table below.
n | X1 | X2 | X3 | X4 | n | X1 | X2 | X3 | X4 |
1 | 148 | 41 | 72 | 78 | 16 | 152 | 35 | 73 | 79 |
2 | 139 | 34 | 71 | 76 | 17 | 149 | 47 | 82 | 79 |
3 | 160 | 49 | 77 | 86 | 18 | 145 | 35 | 70 | 77 |
4 | 149 | 36 | 67 | 79 | 19 | 160 | 47 | 74 | 87 |
5 | 159 | 45 | 80 | 86 | 20 | 156 | 44 | 78 | 85 |
6 | 142 | 31 | 66 | 76 | 21 | 151 | 42 | 73 | 82 |
7 | 153 | 43 | 76 | 83 | 22 | 147 | 38 | 73 | 78 |
8 | 150 | 43 | 77 | 79 | 23 | 157 | 39 | 68 | 80 |
9 | 151 | 42 | 77 | 80 | 24 | 147 | 30 | 65 | 75 |
10 | 139 | 31 | 68 | 74 | 25 | 157 | 48 | 80 | 88 |
11 | 140 | 29 | 64 | 74 | 26 | 151 | 36 | 74 | 80 |
12 | 161 | 47 | 78 | 84 | 27 | 144 | 36 | 68 | 76 |
13 | 158 | 49 | 78 | 83 | 28 | 141 | 30 | 67 | 76 |
14 | 140 | 33 | 67 | 77 | 29 | 139 | 32 | 68 | 73 |
15 | 137 | 31 | 66 | 73 | 30 | 148 | 38 | 70 | 78 |
Use SAS to answer the following questions:
- Find the sample correlation matrix R, its eigenvalues, and the corresponding orthonormal (unit) eigenvectors;
- Compute the first two principal components of the standardized sample and their cumulative contribution rate;
- Interpret the body index data of the middle school students.
Experiment code:
/* Import the data table from the Excel sheet into data set temp2 */
proc import out=temp2
datafile="C:\Users\86166\Desktop\IT\SAS experiment \ experiment 7\1.xls"
DBMS=EXCEL2000 replace;
run;
/* Optional: standardize X1-X4 first. Not needed here, because PROC PRINCOMP
   analyzes the correlation matrix by default.
proc standard mean=0 std=1 data=temp2 out=temp2;
var X1-X4;
run;*/
/* PREFIX=S names the scores S1-S4; OUT= stores the scores, OUTSTAT= the statistics.
   Add the COV option to analyze the covariance matrix, or STD to scale the scores to unit variance. */
proc princomp data=temp2 prefix=S out=temp3 outstat=temp4;
var X1-X4;
run;
options ps=32 ls=85; /* page size 32 lines, line size 85 columns */
proc plot data=temp3;
plot S2*S1 $ n="*"/
href=-1 href=2 vref=0; /* vertical reference lines at S1=-1 and S1=2, horizontal reference line at S2=0 */
/*title 'Principal component scatter plot';*/
run;
proc sort data=temp3; /* sort by S1 score in ascending order */
by S1;
run;
proc print data=temp3;
var n S1 S2 X1-X4;
run;
proc sort data=temp3; /* sort by S2 score in ascending order */
by S2;
run;
proc print data=temp3;
var n S1 S2 X1-X4;
run;
proc print data=temp4; /* correlation matrix, eigenvalues and eigenvectors saved by OUTSTAT= */
run;





Interpretation and analysis of the results:
1. Sample correlation matrix R, its eigenvalues, and the corresponding orthonormal eigenvectors:
Correlation matrix:
Eigenvalues:
Orthonormal eigenvectors:

2. The first two principal components of the standardized sample and their cumulative contribution rate:
3. Interpretation of the body index data of the middle school students:
PROC PRINCOMP performs the principal component analysis starting from the correlation matrix. According to the eigenvalues of the correlation matrix in Output 7.2.1, the first principal component has a contribution rate of 88.53%, and the first two principal components have a cumulative contribution rate of 96.36%. Two principal components are therefore enough to summarize this data set.
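As a cross-check outside SAS, the eigenvalues and contribution rates can be recomputed from the data table. The following is a minimal sketch in plain Python, not the method PROC PRINCOMP uses internally: it builds the sample correlation matrix and finds its eigenvalues with cyclic Jacobi rotations. The function names `correlation_matrix` and `jacobi_eigenvalues` are illustrative choices, not part of the original program.

```python
import math

# (X1 height, X2 weight, X3 chest circumference, X4 sitting height)
# for students 1..30, copied from the data table.
DATA = [
    (148, 41, 72, 78), (139, 34, 71, 76), (160, 49, 77, 86),
    (149, 36, 67, 79), (159, 45, 80, 86), (142, 31, 66, 76),
    (153, 43, 76, 83), (150, 43, 77, 79), (151, 42, 77, 80),
    (139, 31, 68, 74), (140, 29, 64, 74), (161, 47, 78, 84),
    (158, 49, 78, 83), (140, 33, 67, 77), (137, 31, 66, 73),
    (152, 35, 73, 79), (149, 47, 82, 79), (145, 35, 70, 77),
    (160, 47, 74, 87), (156, 44, 78, 85), (151, 42, 73, 82),
    (147, 38, 73, 78), (157, 39, 68, 80), (147, 30, 65, 75),
    (157, 48, 80, 88), (151, 36, 74, 80), (144, 36, 68, 76),
    (141, 30, 67, 76), (139, 32, 68, 73), (148, 38, 70, 78),
]


def correlation_matrix(rows):
    """Sample correlation matrix of the columns of `rows`."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    dev = [[r[j] - means[j] for j in range(p)] for r in rows]
    # Sums of squares and cross-products of the deviations.
    ss = [[sum(d[i] * d[j] for d in dev) for j in range(p)] for i in range(p)]
    return [[ss[i][j] / math.sqrt(ss[i][i] * ss[j][j]) for j in range(p)]
            for i in range(p)]


def jacobi_eigenvalues(matrix, sweeps=50, tol=1e-12):
    """Eigenvalues of a symmetric matrix, largest first (cyclic Jacobi)."""
    a = [row[:] for row in matrix]
    p = len(a)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(p) for j in range(p) if i != j)
        if off < tol:
            break
        for i in range(p - 1):
            for j in range(i + 1, p):
                if abs(a[i][j]) < 1e-15:
                    continue
                # Rotation angle that zeroes the (i, j) entry.
                theta = 0.5 * math.atan2(2 * a[i][j], a[i][i] - a[j][j])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(p):  # rotate columns i and j
                    aki, akj = a[k][i], a[k][j]
                    a[k][i] = c * aki + s * akj
                    a[k][j] = -s * aki + c * akj
                for k in range(p):  # rotate rows i and j
                    aik, ajk = a[i][k], a[j][k]
                    a[i][k] = c * aik + s * ajk
                    a[j][k] = -s * aik + c * ajk
    return sorted((a[i][i] for i in range(p)), reverse=True)


R = correlation_matrix(DATA)
eigenvalues = jacobi_eigenvalues(R)
total = sum(eigenvalues)  # trace of R, i.e. the number of variables
running = 0.0
for k, lam in enumerate(eigenvalues, start=1):
    running += lam
    print(f"PC{k}: eigenvalue={lam:.4f} "
          f"contribution={lam / total:.4f} cumulative={running / total:.4f}")
```

The contribution rate of a component is its eigenvalue divided by the sum of all eigenvalues, which for a correlation matrix equals the number of variables (4 here). The printed rates for the first two components can be compared with the 88.53% and 96.36% quoted from the SAS output.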
In addition, the third and fourth eigenvalues are close to 0, which indicates an approximate linear relationship (collinearity) among the four standardized body index variables Xi (i=1,2,3,4), for example
0.505747X1-0.690844X2+0.461488X3-0.232343X4≈c (a constant).
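The reason a near-zero eigenvalue signals collinearity can be written out in a short derivation. This is a sketch that assumes the coefficients in the relation above are the components of a unit eigenvector $a$ of $R$ belonging to a small eigenvalue $\lambda$, and that $Z = (Z_1,\dots,Z_4)^\top$ denotes the standardized variables (the article writes them as $X_i$):

```latex
\operatorname{Var}(a^\top Z) = a^\top R\, a = \lambda\, a^\top a = \lambda ,
\qquad
\operatorname{E}(a^\top Z) = 0 .
```

A linear combination with variance $\lambda \approx 0$ barely changes from student to student, so it stays close to its mean. For standardized variables that mean is 0, so in standardized units the constant $c$ is approximately 0.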
The first and second principal components can be written down from the eigenvectors belonging to the two largest eigenvalues:
S1=0.4970X1+0.5146X2+0.4809X3+0.5069X4
S2=-0.5432X1+0.2102X2+0.7246X3-0.3683X4
Both are linear combinations of the standardized variables Xi (i=1,2,3,4), with the components of the corresponding eigenvector as coefficients.
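Because the $X_i$ in these expressions are standardized, computing a score from raw measurements takes one extra step. As a sketch, writing the raw measurement as $x_i$, with sample mean $\bar{x}_i$ and sample standard deviation $s_i$ (assumed here to use the $n-1$ divisor), and $a_{ki}$ for the $i$-th component of the $k$-th eigenvector:

```latex
Z_i = \frac{x_i - \bar{x}_i}{s_i}, \qquad
S_k = \sum_{i=1}^{4} a_{ki}\, Z_i , \quad k = 1, 2 .
```

Each eigenvector has unit length, $\sum_{i} a_{ki}^2 = 1$, and the sample variance of the scores $S_k$ equals the $k$-th eigenvalue of $R$.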
The first principal component has nearly equal loadings on all four variables, so it can be read as an overall measure of body size. Every component of its eigenvector is close to 0.5 and positive, so it reflects how sturdily built a student is: a large student tends to be large in all four measurements, and a small student tends to be small in all four. The first principal component is therefore called the size factor.
In the second principal component, the first component of the eigenvector (the loading of height X1) and the fourth (the loading of sitting height X4) are negative, while the second (the loading of weight X2) and the third (the loading of chest circumference X3) are positive, with chest circumference X3 carrying the largest loading. It therefore contrasts weight and girth with height and reflects how fat or thin a student is, so the second principal component is called the fat-thin factor.
Sorting by S1, the lower the score, the smaller the build. Student 11, who comes first in ascending order, is short and slight. The higher the score, the larger the build: student 25, who comes last, is tall and heavily built.
Sorting by S2 works the same way: the lower the score, the thinner the student, and the higher the score, the fatter. (This follows from the positive loadings on weight and chest circumference: a high S2 score means a large weight and chest circumference relative to height and sitting height.)
In the plot produced by PROC PLOT, the 30 students fall into three groups by body size, with first principal component scores of -1 and 2 as the dividing points.
The serial number printed next to each point shows which students belong to each group. More detail is available in the listing produced by PROC PRINT.
That listing shows the 30 observations sorted in ascending order of the first principal component. It gives the three groups as G1={11,15,29,10,28,6,24,14,2,27,18}, G2={4,30,22,1,16,26,23,21,8,9,7,17}, G3={20,13,19,12,5,3,25}. Clustering on S1 and S2 together is the principal component clustering method. The listing sorted by S2 is not discussed here; it can be read in the same way.
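The grouping rule can be sketched in a few lines of Python. The scores below are hypothetical values chosen only to show the rule; they are not taken from the SAS output. How a score exactly equal to a dividing point is assigned is also an assumption, since the article does not say.

```python
def size_group(s1, low=-1.0, high=2.0):
    """Return 1, 2 or 3 (small, medium, large build) from the S1 score.

    Assumption: a score equal to a dividing point goes to the higher group.
    """
    if s1 < low:
        return 1
    if s1 < high:
        return 2
    return 3


# Hypothetical S1 scores for four made-up students, for illustration only.
scores = {"A": -2.4, "B": -0.3, "C": 1.1, "D": 2.7}

groups = {}
for student, s1 in sorted(scores.items(), key=lambda item: item[1]):
    groups.setdefault(size_group(s1), []).append(student)

print(groups)  # {1: ['A'], 2: ['B', 'C'], 3: ['D']}
```

Applied to the real S1 column of the data set written by OUT=temp3, the same rule should reproduce the groups G1, G2 and G3 listed above.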