SAS principal component analysis (finding correlation matrix, eigenvalue, unit eigenvector, principal component expression, contribution rate and cumulative contribution rate, and data interpretation)

2022-06-11 01:26:00 【I have a clear idea】

A random sample of 30 middle school students from one grade was taken, and their height (X1), weight (X2), bust (X3), and sitting height (X4) were measured. The data are shown in the table below.

n	X1	X2	X3	X4	n	X1	X2	X3	X4
1	148	41	72	78	16	152	35	73	79
2	139	34	71	76	17	149	47	82	79
3	160	49	77	86	18	145	35	70	77
4	149	36	67	79	19	160	47	74	87
5	159	45	80	86	20	156	44	78	85
6	142	31	66	76	21	151	42	73	82
7	153	43	76	83	22	147	38	73	78
8	150	43	77	79	23	157	39	68	80
9	151	42	77	80	24	147	30	65	75
10	139	31	68	74	25	157	48	80	88
11	140	29	64	74	26	151	36	74	80
12	161	47	78	84	27	144	36	68	76
13	158	49	78	83	28	141	30	67	76
14	140	33	67	77	29	139	32	68	73
15	137	31	66	73	30	148	38	70	78

Use SAS to answer the following questions:

1. Find the sample correlation matrix R, its eigenvalues, and the corresponding orthonormal eigenvectors.
2. Calculate the first two principal components of the standardized sample and their cumulative contribution rate.
3. Interpret the body index data of the middle school students.

Experimental code:

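/* Import the data from Excel; replace the path with the location of your own file. */
proc import out=temp2
    datafile="C:\Users\86166\Desktop\IT\SAS experiment \ experiment 7\1.xls"
    dbms=EXCEL2000 replace;
run;

/* Optional: standardize the variables first. Not needed here, because
   PROC PRINCOMP analyzes the correlation matrix by default.
proc standard mean=0 std=1 data=temp1 out=temp2;
    var X1-X4;
run;
*/

/* Principal component analysis: the scores S1-S4 are written to temp3,
   the statistics (correlations, eigenvalues, eigenvectors) to temp4. */
proc princomp data=temp2 prefix=S out=temp3 outstat=temp4; /* optional: std (type=cov/corr) */
    var X1-X4;
run;

options ps=32 ls=85;

/* Scatter plot of the first two principal component scores, labeled with the student number n. */
proc plot data=temp3;
    plot S2*S1 $ n="*" /
        href=-1 href=2 vref=0; /* vertical reference lines at S1=-1 and S1=2, horizontal reference line at S2=0 */
    /* title 'Principal component scatter plot'; */
run;

/* Sort by the S1 score in ascending order and print. */
proc sort data=temp3;
    by S1;
run;
proc print data=temp3;
    var n S1 S2 X1-X4;
run;

/* Sort by the S2 score in ascending order and print. */
proc sort data=temp3;
    by S2;
run;
proc print data=temp3;
    var n S1 S2 X1-X4;
run;

/* Print the statistics data set produced by OUTSTAT=. */
proc print data=temp4;
run;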

Experimental interpretation and analysis:

1. The sample correlation matrix R, its eigenvalues, and the corresponding orthonormal eigenvectors:

Correlation matrix:

Eigenvalues and orthonormal eigenvectors:

2. The first two principal components of the standardized sample and their cumulative contribution rates:

3. Interpretation of the body index data of the middle school students:

PROC PRINCOMP performs principal component analysis starting from the correlation matrix. According to the eigenvalues of the correlation matrix in Output 7.2.1, the contribution rate of the first principal component is 88.53%, and the cumulative contribution rate of the first two principal components is 96.36%. Two principal components are therefore enough to summarize this data set.
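As a rough cross-check of these percentages outside SAS, the sketch below recomputes the correlation matrix from the data table in plain Python (standard library only) and extracts its eigenvalues with a small Jacobi rotation routine. The helper names `load_rows`, `correlation_matrix`, and `jacobi_eigen` are illustrative and are not part of SAS. The contribution rate of a component is its eigenvalue divided by the sum of all eigenvalues, which is 4 for a 4×4 correlation matrix.

```python
import math

# Data table from the article: two students per row (n X1 X2 X3 X4, twice).
TABLE = """
1 148 41 72 78 16 152 35 73 79
2 139 34 71 76 17 149 47 82 79
3 160 49 77 86 18 145 35 70 77
4 149 36 67 79 19 160 47 74 87
5 159 45 80 86 20 156 44 78 85
6 142 31 66 76 21 151 42 73 82
7 153 43 76 83 22 147 38 73 78
8 150 43 77 79 23 157 39 68 80
9 151 42 77 80 24 147 30 65 75
10 139 31 68 74 25 157 48 80 88
11 140 29 64 74 26 151 36 74 80
12 161 47 78 84 27 144 36 68 76
13 158 49 78 83 28 141 30 67 76
14 140 33 67 77 29 139 32 68 73
15 137 31 66 73 30 148 38 70 78
"""


def load_rows():
    """Return the 30 observations [X1, X2, X3, X4], ordered by student number."""
    students = {}
    for line in TABLE.strip().splitlines():
        f = [int(t) for t in line.split()]
        students[f[0]] = f[1:5]
        students[f[5]] = f[6:10]
    return [students[k] for k in sorted(students)]


def correlation_matrix(rows):
    """Pearson sample correlation matrix of the columns of rows."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    dev = [[r[j] - means[j] for j in range(m)] for r in rows]
    ss = [[sum(d[i] * d[j] for d in dev) for j in range(m)] for i in range(m)]
    return [[ss[i][j] / math.sqrt(ss[i][i] * ss[j][j]) for j in range(m)]
            for i in range(m)]


def jacobi_eigen(matrix, sweeps=50, tol=1e-20):
    """Eigenvalues (descending) and unit eigenvectors of a symmetric matrix."""
    n = len(matrix)
    a = [row[:] for row in matrix]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(i)) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                # Angle that zeroes a[p][q], folded into [-pi/4, pi/4].
                theta = 0.5 * math.atan2(2 * a[p][q], a[q][q] - a[p][p])
                if theta > math.pi / 4:
                    theta -= math.pi / 2
                elif theta < -math.pi / 4:
                    theta += math.pi / 2
                c, s = math.cos(theta), math.sin(theta)
                for mat in (a, v):  # column update: M <- M J
                    for k in range(n):
                        mp, mq = mat[k][p], mat[k][q]
                        mat[k][p], mat[k][q] = c * mp - s * mq, s * mp + c * mq
                for k in range(n):  # row update: A <- J^T A
                    ap, aq = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * ap - s * aq, s * ap + c * aq
    order = sorted(range(n), key=lambda i: -a[i][i])
    values = [a[i][i] for i in order]
    vectors = [[v[k][i] for k in range(n)] for i in order]
    return values, vectors


R = correlation_matrix(load_rows())
eigvals, eigvecs = jacobi_eigen(R)
total = sum(eigvals)  # equals the trace of R, i.e. the number of variables
proportions = [lam / total for lam in eigvals]
cumulative = [sum(proportions[:i + 1]) for i in range(len(proportions))]
for lam, prop, cum in zip(eigvals, proportions, cumulative):
    print(f"eigenvalue {lam:.4f}  contribution {prop:.2%}  cumulative {cum:.2%}")
```

The sign of an eigenvector is arbitrary, so the unit eigenvectors in `eigvecs` may differ from the SAS output by a factor of -1.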

In addition, the third and fourth eigenvalues are close to 0. This indicates that the 4 standardized body index variables Xi (i=1,2,3,4) satisfy an approximate linear relationship (called collinearity), such as

0.505747X1-0.690844X2+0.461488X3-0.232343X4≈c (constant).
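One way to see this near-collinearity is to evaluate the combination for every student and check how little it varies. The sketch below does this in plain Python (standard library only) rather than SAS. It assumes the coefficients apply to the variables standardized with the sample standard deviation, and the helper name `standardized_rows` is illustrative.

```python
from statistics import mean, stdev, variance

# Data table from the article: two students per row (n X1 X2 X3 X4, twice).
TABLE = """
1 148 41 72 78 16 152 35 73 79
2 139 34 71 76 17 149 47 82 79
3 160 49 77 86 18 145 35 70 77
4 149 36 67 79 19 160 47 74 87
5 159 45 80 86 20 156 44 78 85
6 142 31 66 76 21 151 42 73 82
7 153 43 76 83 22 147 38 73 78
8 150 43 77 79 23 157 39 68 80
9 151 42 77 80 24 147 30 65 75
10 139 31 68 74 25 157 48 80 88
11 140 29 64 74 26 151 36 74 80
12 161 47 78 84 27 144 36 68 76
13 158 49 78 83 28 141 30 67 76
14 140 33 67 77 29 139 32 68 73
15 137 31 66 73 30 148 38 70 78
"""


def standardized_rows():
    """Return the 30 observations with each variable scaled to mean 0, SD 1."""
    students = {}
    for line in TABLE.strip().splitlines():
        f = [int(t) for t in line.split()]
        students[f[0]] = f[1:5]
        students[f[5]] = f[6:10]
    rows = [students[k] for k in sorted(students)]
    columns = list(zip(*rows))
    mu = [mean(col) for col in columns]
    sd = [stdev(col) for col in columns]  # sample SD (n - 1 denominator)
    return [[(x - m) / s for x, m, s in zip(row, mu, sd)] for row in rows]


# Coefficients of the near-constant combination quoted in the article.
COEFFS = [0.505747, -0.690844, 0.461488, -0.232343]

combo = [sum(b * z for b, z in zip(COEFFS, row)) for row in standardized_rows()]
print(f"mean     = {mean(combo):.6f}")
print(f"variance = {variance(combo):.6f}")
```

Each standardized variable has variance 1, so a combination whose variance is close to 0 is nearly constant across students. When the coefficients form a unit eigenvector of R, this variance equals the corresponding eigenvalue.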

The first and second principal components can be written from the eigenvectors corresponding to the two largest eigenvalues:

S1=0.4970X1+0.5146X2+0.4809X3+0.5069X4

S2=-0.5432X1+0.2102X2+0.7246X3-0.3683X4

Both principal components are linear combinations of the standardized variables Xi (i=1,2,3,4), and the combination coefficients are the components of the corresponding eigenvectors.

Since the first principal component has approximately equal loadings on all variables, it can be regarded as an overall measure of a student's body size. Every component of its eigenvector is close to 0.5 and positive, so it reflects how large the student's build is: a tall student tends to be large in all 4 measurements, and a short student tends to be small in all 4. We therefore call the first principal component the size factor.

In the second principal component, the first component of the eigenvector (the loading of height X1) and the fourth component (the loading of sitting height X4) are negative, while the second component (the loading of weight X2) and the third component (the loading of bust X3) are positive, with bust X3 having the largest loading. This component therefore reflects whether a student is fat or thin, so the second principal component is called the fat-thin factor.

Sorting by S1, the lower the score, the smaller the build. For example, student 11 ranks first, which means he is short and slightly built. The higher the score, the larger the build: student 25, at the end of the list, is tall and heavily built.

Similarly, when sorting by S2, the lower the score, the thinner the student, and the higher the score, the fatter the student. (This follows from the positive loadings of weight and bust on this factor: a higher score corresponds to a larger weight and bust, and hence a fatter build.)

From the plot produced by PROC PLOT, the 30 students can be divided into three groups according to body size, using first principal component scores of -1 and 2 as the dividing points.

The number next to each point in the scatter plot shows which students belong to each group. More detailed information can be obtained from the listing produced by PROC PRINT.

The listing above shows the 30 observations sorted in ascending order of the first principal component. From it, the members of the three groups are:

G1={11,15,29,10,28,6,24,14,2,27,18}
G2={4,30,22,1,16,26,23,21,8,9,7,17}
G3={20,13,19,12,5,3,25}

If the students are clustered using both S1 and S2, this is the principal component clustering method. The listing sorted by S2 is not analyzed here; it can be read in the same way.
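The grouping can also be sketched outside SAS. The Python snippet below (standard library only) rebuilds the S1 scores from the rounded coefficients of the first principal component, applied to variables standardized with the sample standard deviation, and splits the students at the cut points -1 and 2. The names `s1_scores` and `groups` are illustrative, and because the coefficients are rounded to four decimals, the scores may differ slightly from the SAS listing.

```python
from statistics import mean, stdev

# Data table from the article: two students per row (n X1 X2 X3 X4, twice).
TABLE = """
1 148 41 72 78 16 152 35 73 79
2 139 34 71 76 17 149 47 82 79
3 160 49 77 86 18 145 35 70 77
4 149 36 67 79 19 160 47 74 87
5 159 45 80 86 20 156 44 78 85
6 142 31 66 76 21 151 42 73 82
7 153 43 76 83 22 147 38 73 78
8 150 43 77 79 23 157 39 68 80
9 151 42 77 80 24 147 30 65 75
10 139 31 68 74 25 157 48 80 88
11 140 29 64 74 26 151 36 74 80
12 161 47 78 84 27 144 36 68 76
13 158 49 78 83 28 141 30 67 76
14 140 33 67 77 29 139 32 68 73
15 137 31 66 73 30 148 38 70 78
"""

# Coefficients of S1 as quoted in the article (rounded to four decimals).
A1 = [0.4970, 0.5146, 0.4809, 0.5069]


def s1_scores():
    """Map each student number to its first principal component score."""
    students = {}
    for line in TABLE.strip().splitlines():
        f = [int(t) for t in line.split()]
        students[f[0]] = f[1:5]
        students[f[5]] = f[6:10]
    ids = sorted(students)
    columns = list(zip(*(students[i] for i in ids)))
    mu = [mean(col) for col in columns]
    sd = [stdev(col) for col in columns]  # sample SD (n - 1 denominator)
    return {
        i: sum(a * (x - m) / s for a, x, m, s in zip(A1, students[i], mu, sd))
        for i in ids
    }


scores = s1_scores()
ordered = sorted(scores, key=scores.get)  # student numbers, ascending S1

# Split at the cut points used in the article: S1 = -1 and S1 = 2.
groups = {
    "G1 (S1 < -1)": [i for i in ordered if scores[i] < -1],
    "G2 (-1 <= S1 <= 2)": [i for i in ordered if -1 <= scores[i] <= 2],
    "G3 (S1 > 2)": [i for i in ordered if scores[i] > 2],
}
for name, members in groups.items():
    print(name, members)
```

With the table as given, this should reproduce the membership of G1, G2, and G3 listed above. Clustering on S1 and S2 together would need a distance-based method on both scores, which this sketch does not attempt.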

Original site

Copyright notice
This article was written by [I have a clear idea]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/162/202206110018150343.html
