Data dimensionality reduction: principal component analysis
2022-07-02 19:19:00 【Lu 727】
1. Purpose
Principal component analysis (PCA) forms linear combinations of several correlated indicators. It reduces dimensionality by explaining as much of the information in the original data as possible with as few dimensions as possible. The resulting variables are mutually uncorrelated, and each new variable is a linear combination of the original variables. The smaller a principal component's share of the total variance, the weaker its ability to summarize the original information. PCA differs from factor analysis: factor analysis uses a few common factors to explain the relationships among many observed variables, and it is not a recombination of the original variables.
2. Input/output description
Input: two or more quantitative variables (say N variables).
Output: the data can be reduced to as few as 1 dimension (a single variable, generally used for comprehensive evaluation) or kept at up to N variables (generally used for data desensitization). The weight with which each original variable enters each new variable is also obtained; these weights show how much of each original variable is retained.
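To make the input and output shapes concrete, here is a minimal NumPy sketch. It assumes the data arrive as an array with one row per sample and one column per variable; the function `pca_reduce` and the random input are hypothetical illustrations, not the author's code. Choosing `k=1` gives a single composite score, while `k=N` keeps every component.

```python
import numpy as np

def pca_reduce(X, k):
    """Reduce an (m samples x N variables) array to k principal components.

    Returns (scores, weights): scores is m x k; weights is N x k, and column j
    holds the weight of each standardized original variable in component j.
    """
    X = np.asarray(X, dtype=float)
    n_vars = X.shape[1]
    if not 1 <= k <= n_vars:
        raise ValueError("k must be between 1 and the number of variables")
    # Standardize each column: zero mean, unit sample standard deviation
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Covariance of the standardized data (= correlation matrix of X)
    R = np.cov(Z, rowvar=False)
    # eigh returns ascending eigenvalues for a symmetric matrix; reverse them
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]
    weights = eigvecs[:, order[:k]]
    scores = Z @ weights
    return scores, weights

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))            # hypothetical data: 200 samples, N = 5
scores1, w1 = pca_reduce(X, 1)           # one composite score per sample
scores5, w5 = pca_reduce(X, 5)           # keep all N components
print(scores1.shape, w1.shape)           # (200, 1) (5, 1)
print(scores5.shape, w5.shape)           # (200, 5) (5, 5)
```

The weight matrix has orthonormal columns, so keeping all N components is a rotation of the standardized data rather than a loss of information.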
3. Example case
Suppose a bank has a user attribute table with 100 variables and needs to desensitize the data and reduce its dimensionality while losing as little of the original information as possible.
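For a case like this, one common way to decide how many components to keep is the cumulative contribution rate: keep the smallest k whose cumulative share of the variance reaches a chosen threshold. The sketch below assumes a 95% threshold and uses synthetic data (100 variables driven by 10 hidden factors plus small noise) in place of the bank's real table; the threshold, the data and the helper name `choose_num_components` are all assumptions for illustration.

```python
import numpy as np

def choose_num_components(X, threshold=0.95):
    """Smallest k whose cumulative contribution rate reaches `threshold`."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Eigenvalues of the standardized covariance matrix, largest first
    eigvals = np.linalg.eigvalsh(np.cov(Z, rowvar=False))[::-1]
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    # First index where the cumulative share is >= threshold, converted to a count
    k = int(np.searchsorted(cumulative, threshold)) + 1
    k = min(k, len(eigvals))
    return k, cumulative

# Synthetic stand-in for a 100-variable table (hypothetical data)
rng = np.random.default_rng(42)
latent = rng.normal(size=(1000, 10))
loadings = rng.normal(size=(10, 100))
X = latent @ loadings + 0.1 * rng.normal(size=(1000, 100))

k, cumulative = choose_num_components(X, threshold=0.95)
print(k, round(float(cumulative[k - 1]), 3))
```

Because the synthetic variables are driven by only 10 factors, at most 10 components are needed here; on real data the chosen k depends entirely on how correlated the variables are.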

4. Modeling steps
Principal component analysis applies the idea of dimensionality reduction: it is a multivariate statistical method that transforms multiple indicators into a few composite indicators, and these composite indicators are the principal components. Each principal component is a linear combination of the original variables; the components are mutually uncorrelated and together retain most of the information in the original variables. In essence, PCA uses the correlations among the original variables to find composite substitutes for them while keeping the information lost in the transformation as small as possible.
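Standardize each original variable, then compute the covariance matrix R of the standardized data set (for standardized variables this equals the correlation matrix of the original variables):

$$R = (r_{ij})_{n \times n}, \qquad r_{ij} = \frac{1}{m-1}\sum_{k=1}^{m} \tilde{x}_{ki}\,\tilde{x}_{kj}$$

where $m$ is the number of samples and $\tilde{x}_{ki} = (x_{ki} - \bar{x}_i)/s_i$ is the standardized value of variable $i$ for sample $k$.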
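The covariance step can be sketched in NumPy as follows. This is a minimal sketch assuming the data are an m x n array (rows are samples, columns are variables) and that standardization uses the sample standard deviation; the function name `standardized_covariance` and the small data matrix are hypothetical. The final check confirms that R coincides with the correlation matrix of the original variables.

```python
import numpy as np

def standardized_covariance(X):
    """Covariance matrix R of the column-standardized data (m samples x n variables)."""
    X = np.asarray(X, dtype=float)
    m = X.shape[0]
    # x~_ki = (x_ki - mean_i) / s_i, with the sample standard deviation (ddof=1)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # r_ij = 1/(m-1) * sum_k x~_ki * x~_kj
    return Z.T @ Z / (m - 1)

X = np.array([[2.0, 4.1, 1.0],
              [3.0, 6.2, 0.5],
              [4.0, 7.9, 0.8],
              [5.0, 10.1, 0.2]])
R = standardized_covariance(X)
print(np.allclose(R, np.corrcoef(X, rowvar=False)))  # True
```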
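Compute the eigenvalues of $R$, $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n \ge 0$, and the corresponding eigenvectors $u_1, u_2, \dots, u_n$, where $u_j = (u_{1j}, u_{2j}, \dots, u_{nj})^{T}$ and $u_{ij}$ is the $i$-th weight of the $j$-th eigenvector. The eigenvectors define $n$ new indicator variables, where $\tilde{x}_i$ denotes the $i$-th standardized original variable:

$$y_j = u_{1j}\tilde{x}_1 + u_{2j}\tilde{x}_2 + \dots + u_{nj}\tilde{x}_n, \qquad j = 1, 2, \dots, n$$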
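The eigendecomposition and the construction of the $y_j$ can be sketched as follows, assuming NumPy and a hypothetical helper `principal_components` applied to synthetic, deliberately correlated data. `np.linalg.eigh` is used because R is symmetric; it returns eigenvalues in ascending order, so they are reordered to match $\lambda_1 \ge \dots \ge \lambda_n$. Eigenvectors are only determined up to sign, so other tools may report some columns of U negated.

```python
import numpy as np

def principal_components(X):
    """Eigen-decompose R and return (eigvals, U, Y).

    eigvals: eigenvalues in descending order, lambda_1 >= ... >= lambda_n
    U: n x n matrix whose j-th column is u_j = (u_1j, ..., u_nj)
    Y: m x n matrix whose j-th column is y_j = sum_i u_ij * x~_i
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = np.cov(Z, rowvar=False)
    eigvals, U = np.linalg.eigh(R)        # ascending order for symmetric R
    order = np.argsort(eigvals)[::-1]     # reorder to descending
    eigvals, U = eigvals[order], U[:, order]
    Y = Z @ U                             # column j is the j-th principal component
    return eigvals, U, Y

# Hypothetical data: four noisy copies of one underlying signal
rng = np.random.default_rng(1)
base = rng.normal(size=(500, 1))
X = np.hstack([base + 0.3 * rng.normal(size=(500, 1)) for _ in range(4)])

eigvals, U, Y = principal_components(X)
# The components are uncorrelated, and the variance of y_j equals lambda_j
print(np.allclose(np.cov(Y, rowvar=False), np.diag(eigvals)))  # True
```

Since $\mathrm{Cov}(Y) = U^{T} R U$ and the columns of U are orthonormal eigenvectors of R, the covariance of the components is the diagonal matrix of eigenvalues.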
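In the formula for the new indicator variables, $y_1$ is the 1st principal component, $y_2$ is the 2nd principal component, …, and $y_n$ is the $n$-th principal component. Compute the contribution rate $b_j$ of each principal component $y_j$ $(j = 1, 2, \dots, n)$ and the cumulative contribution rate of the first $p$ components $y_1, y_2, \dots, y_p$ $(p \le n)$:

$$b_j = \frac{\lambda_j}{\sum_{k=1}^{n}\lambda_k}, \qquad \alpha_p = \sum_{j=1}^{p} b_j = \frac{\sum_{j=1}^{p}\lambda_j}{\sum_{k=1}^{n}\lambda_k}$$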
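The contribution rates follow directly from the eigenvalues. A minimal sketch, using hypothetical eigenvalues for a four-variable problem; they sum to 4, as they must for the correlation matrix of 4 standardized variables, whose trace equals the number of variables. The helper name `contribution_rates` is an assumption.

```python
import numpy as np

def contribution_rates(eigvals):
    """Return (b, alpha): b_j = lambda_j / sum(lambda), alpha_p = b_1 + ... + b_p."""
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]  # largest first
    b = lam / lam.sum()
    alpha = np.cumsum(b)
    return b, alpha

b, alpha = contribution_rates([2.5, 1.0, 0.4, 0.1])
print(b)      # approximately [0.625, 0.25, 0.1, 0.025]
print(alpha)  # approximately [0.625, 0.875, 0.975, 1.0]
```

Here the first two components reach a cumulative contribution rate of 87.5%, and the first three reach 97.5%.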