[learning notes] Introduction to principal component analysis
2022-06-28 20:15:00 【Burger jingle】
One: Brief introduction
Principal component analysis (PCA) is a multivariate statistical method that uses dimensionality reduction to transform many indicators into a few composite indicators, so that a smaller number of variables explains most of the information in the original data.
Two: Basic principles and steps
1. Basic principle: omitted.
Two points should be noted:
(1) The results of principal component analysis depend on the units (scales) of the variables, so the data should first be made dimensionless (standardized); the covariance matrix or the correlation coefficient matrix is then used for the analysis.
(2) In practical studies, usually no more than 6 principal components are selected; it is enough that their cumulative contribution rate exceeds 85%.
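The 85% rule in note (2) can be sketched in a few lines. This is only an illustration under stated assumptions: the helper name `choose_n_components` and the synthetic data are hypothetical, and the correlation coefficient matrix is used so that the variables are treated as already standardized.

```python
import numpy as np

def choose_n_components(X, threshold=0.85):
    """Smallest k whose cumulative contribution rate reaches `threshold`.

    Hypothetical helper: X has one row per sample and one column per indicator.
    """
    R = np.corrcoef(X, rowvar=False)           # correlation matrix of the indicators
    eigvals = np.linalg.eigvalsh(R)[::-1]      # eigvalsh is ascending, so reverse it
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    k = int(np.searchsorted(cumulative, threshold)) + 1
    return min(k, len(eigvals)), cumulative

# Synthetic data (assumption): 6 indicators driven by 2 latent factors plus small noise
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
X = latent @ rng.normal(size=(2, 6)) + 0.05 * rng.normal(size=(200, 6))

k, cumulative = choose_n_components(X)
print("components kept:", k)
print("cumulative contribution rates:", np.round(cumulative, 4))
```

The same threshold can be applied to `explained_variance_ratio_` from sklearn's `PCA` after standardizing the data.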
2. Basic steps:

Three: Example

# Program file Pex11_7.py
import numpy as np
from sklearn.decomposition import PCA

a = np.loadtxt("Pdata11_7.txt")
# np.r_ stacks the two blocks vertically (one on top of the other),
# so both blocks must have the same number of columns.
b = np.r_[a[:, 1:4], a[:, -3:]]  # construct the data matrix
md = PCA().fit(b)  # build and fit the model
print("Eigenvalues:", md.explained_variance_)
print("Contribution rate of each principal component:", md.explained_variance_ratio_)
print("Singular values:", md.singular_values_)
print("Coefficients of each principal component:\n", md.components_)  # each row is one principal component

# Next, compute the eigenvalues and eigenvectors directly and compare with the library results
cf = np.cov(b.T)  # covariance matrix
c, d = np.linalg.eig(cf)  # eigenvalues and eigenvectors (each column of d is an eigenvector)
print("Eigenvalues:", c)
print("Eigenvectors:\n", d)
print("Contribution rate of each principal component:", c / np.sum(c))
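As shown above, when the library class PCA is used, the signs of the principal components cannot be controlled, so we also compute the eigenvalues and eigenvectors directly. An eigenvector is determined only up to its sign, so the two sets of results may differ by a factor of -1.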
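The sign ambiguity can be illustrated with NumPy alone. The sketch below checks that a negated eigenvector is still an eigenvector and applies one possible sign convention (make the largest-magnitude entry of each vector positive). The helper `fix_signs` and the data are hypothetical, and this convention is just one choice, not necessarily the one any library uses.

```python
import numpy as np

def fix_signs(vectors):
    """Flip each column so that its largest-magnitude entry is positive."""
    vectors = np.asarray(vectors, dtype=float)
    idx = np.argmax(np.abs(vectors), axis=0)                  # row of the largest entry per column
    signs = np.sign(vectors[idx, np.arange(vectors.shape[1])])
    return vectors * signs                                    # broadcast one sign per column

# Synthetic data (assumption): 50 samples, 3 correlated variables
rng = np.random.default_rng(1)
mix = np.array([[2.0, 0.5, 0.0],
                [0.0, 1.0, 0.3],
                [0.0, 0.0, 0.5]])
X = rng.normal(size=(50, 3)) @ mix

S = np.cov(X, rowvar=False)
w, V = np.linalg.eigh(S)       # columns of V are eigenvectors of S
flipped = -V                   # the same directions with opposite signs

# Both V and -V satisfy S v = w v, so neither sign is "the" correct one
print(np.allclose(S @ V, V * w), np.allclose(S @ flipped, flipped * w))
# After applying the convention, the two versions agree
print(np.allclose(fix_signs(V), fix_signs(flipped)))
```

Applying the same convention to both the library output and the hand-computed eigenvectors makes them directly comparable.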
Also note that PCA here performs the analysis on the covariance matrix. We can instead use the correlation coefficient matrix, which is equivalent to standardizing the data first.
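The equivalence between the correlation coefficient matrix and standardized data can be checked numerically. This is a small sketch on synthetic data (an assumption, not the data from the example): the covariance matrix of the standardized variables equals the correlation matrix of the original ones, while the contribution rates from the raw covariance matrix are dominated by the variable with the largest scale.

```python
import numpy as np

# Synthetic data (assumption): 4 variables on very different scales
rng = np.random.default_rng(2)
X = rng.normal(size=(100, 4)) * np.array([1.0, 10.0, 100.0, 0.1])
X[:, 1] += 5.0 * X[:, 0]       # introduce some correlation

# Standardize with the sample standard deviation (ddof=1) to match np.cov's default
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
cov_of_Z = np.cov(Z, rowvar=False)
corr_of_X = np.corrcoef(X, rowvar=False)
print("cov(Z) equals corr(X):", np.allclose(cov_of_Z, corr_of_X))

def contribution_rates(M):
    """Contribution rates from a symmetric matrix, largest first."""
    vals = np.linalg.eigvalsh(M)[::-1]
    return vals / vals.sum()

rate_cov = contribution_rates(np.cov(X, rowvar=False))
rate_corr = contribution_rates(corr_of_X)
print("rates from covariance matrix: ", np.round(rate_cov, 4))
print("rates from correlation matrix:", np.round(rate_corr, 4))
```

If the data are standardized with the population standard deviation instead (ddof=0, the default of `scipy.stats.zscore`), the covariance matrix differs from the correlation matrix only by the constant factor n/(n-1), so the eigenvectors and contribution rates are unchanged.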
Four: Application of principal component analysis
General steps for comprehensive evaluation:

Example 11.8
# Program file Pex11_8.py
import numpy as np
from scipy.stats import zscore

a = np.loadtxt("Pdata11_8.txt")
print("Correlation coefficient matrix:\n", np.corrcoef(a.T))
# r12 = r21 = 1, so x1 and x2 are perfectly linearly correlated;
# the two indicators carry the same information, so keep only one of them
b = np.delete(a, 0, axis=1)  # delete the first column
c = zscore(b)  # standardize the data
r = np.corrcoef(c.T)  # correlation coefficient matrix
d, e = np.linalg.eig(r)  # eigenvalues and eigenvectors (each column of e is an eigenvector)
order = np.argsort(d)[::-1]  # eig does not guarantee any ordering, so sort from largest to smallest
d, e = d[order], e[:, order]
rate = d / d.sum()  # contribution rate of each principal component
print("Eigenvalues:", d)
print("Eigenvectors:\n", e)
print("Contribution rate of each principal component:", rate)
k = 1  # number of principal components retained
F = e[:, :k]
score_mat = c.dot(F)  # principal component score matrix
score1 = score_mat.dot(rate[0:k])  # composite score of each evaluation object
# The sign is adjusted after inspecting the results: the eigenvector above was not negated.
# If the eigenvector itself had been negated, this step would not be needed.
score2 = -score1
print("Score of each evaluation object:", score2)
index = score1.argsort() + 1  # 1-based object numbers, ordered by score2 from high to low
print("Number of each city from high to low:", index)
Note: first inspect the correlation coefficient matrix. If any off-diagonal element equals 1, the variable selection is unreasonable: the two variables are perfectly linearly related, and one of the indicators should be deleted.
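The check described in the note can be automated. The following is a minimal sketch on synthetic data; the helper `perfectly_correlated_pairs` and the tolerance are assumptions, and it also flags correlations of -1, which is a small extension of the note.

```python
import numpy as np

def perfectly_correlated_pairs(X, tol=1e-8):
    """Return 0-based column pairs whose correlation is +1 or -1 up to `tol`."""
    R = np.corrcoef(X, rowvar=False)
    i, j = np.triu_indices_from(R, k=1)        # off-diagonal entries above the diagonal
    mask = np.abs(R[i, j]) >= 1.0 - tol
    return [(int(p), int(q)) for p, q in zip(i[mask], j[mask])]

# Synthetic data (assumption): column 1 is an exact linear function of column 0
rng = np.random.default_rng(3)
x1 = rng.normal(size=30)
X = np.column_stack([x1, 2.0 * x1 + 1.0, rng.normal(size=30)])

pairs = perfectly_correlated_pairs(X)
print("perfectly correlated column pairs:", pairs)

# Keep only the first column of each flagged pair, as in the example above
drop = sorted({q for _, q in pairs})
X_reduced = np.delete(X, drop, axis=1)
print("shape after removing redundant columns:", X_reduced.shape)
```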

Note: the principal component score matrix and the scores of the evaluation objects are shown in the figure below. Because the indicators in this problem are benefit-type values (larger is better), a higher evaluation value means a higher ranking. Otherwise, the ranking is reversed.
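The ranking rule can be written as a small helper. This is only a sketch: the function name `rank_objects`, its `benefit` flag, and the scores are made up for illustration and are not the results of the example.

```python
import numpy as np

def rank_objects(scores, benefit=True):
    """Return 1-based object numbers ordered from best to worst.

    benefit=True: larger score is better; benefit=False: smaller score is better.
    """
    scores = np.asarray(scores, dtype=float)
    key = -scores if benefit else scores
    return np.argsort(key, kind="stable") + 1

scores = np.array([0.3, -1.2, 2.1, 0.0])   # hypothetical composite scores
best_first = rank_objects(scores)
cost_first = rank_objects(scores, benefit=False)
print("benefit-type ranking:", best_first)   # object 3 has the highest score
print("cost-type ranking:   ", cost_first)   # object 2 has the lowest score
```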