Data Dimensionality Reduction: Principal Component Analysis
2022-07-02 19:19:00 【Lu 727】
1、 Purpose
Principal component analysis (PCA) takes several indicators that are correlated with one another and recombines them linearly. It reduces dimensionality by explaining as much of the information in the original data as possible with as few dimensions as possible. The new variables are mutually uncorrelated, and each one is a linear combination of the original variables. The smaller a principal component's share of the total variance, the weaker its ability to summarize the original information. PCA differs from factor analysis: factor analysis uses a few common factors to explain the relationships among many observed variables, and it is not a recombination of the original variables.
2、 Inputs and outputs
Input: two or more quantitative variables (assume there are N of them).
Output: the data can be reduced to as few as 1 dimension (a single variable, generally used for comprehensive evaluation) or kept at up to N variables (generally used for data desensitization). PCA also gives the weight of each original variable in each new variable, which shows how much of each original variable is retained.
3、 Example
Suppose a bank has a table of user attributes containing 100 variables. It needs to desensitize the data and reduce its dimensionality while losing as little of the original information as possible.
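A minimal end-to-end sketch of this scenario in Python with NumPy. The data are synthetic, not real bank data: 1,000 hypothetical users and 100 variables driven by 5 random latent factors plus noise. The function name `pca_reduce` and the 95% cumulative-contribution threshold are illustrative assumptions. The returned component scores could be shared in place of the original attributes.

```python
import numpy as np

def pca_reduce(X, threshold=0.95):
    """Keep the fewest principal components whose cumulative contribution rate
    reaches `threshold`. Returns (scores, weights, cumulative rate kept)."""
    X = np.asarray(X, dtype=float)
    # Standardize each variable (assumes no constant columns)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = Z.T @ Z / (Z.shape[0] - 1)              # covariance of standardized data
    eigvals, eigvecs = np.linalg.eigh(R)        # ascending order for symmetric R
    order = np.argsort(eigvals)[::-1]           # re-sort in descending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    # First index whose cumulative rate >= threshold, converted to a count p
    p = min(int(np.searchsorted(cumulative, threshold)) + 1, X.shape[1])
    weights = eigvecs[:, :p]                    # n x p weights of the original variables
    scores = Z @ weights                        # m x p component scores
    return scores, weights, cumulative[p - 1]

# Synthetic stand-in for the bank table: 1,000 users x 100 variables
rng = np.random.default_rng(42)
latent = rng.normal(size=(1000, 5))
X = latent @ rng.normal(size=(5, 100)) + 0.3 * rng.normal(size=(1000, 100))

scores, weights, kept = pca_reduce(X, threshold=0.95)
print(scores.shape, weights.shape, round(float(kept), 3))
```

Choosing p by a cumulative-contribution threshold is one common convention. A fixed p, such as p = 1 for a single composite score, works with the same code if you slice `eigvecs[:, :p]` directly.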
4、 Modeling steps
Principal component analysis applies the idea of dimensionality reduction. It is a multivariate statistical method that transforms many indicators into a few composite indicators, and these composite indicators are the principal components. Each principal component is a linear combination of the original variables. The components are mutually uncorrelated, and together they retain most of the information in the original variables. In essence, PCA uses the correlations among the original variables to find composite substitutes for them, while keeping the information lost in the transformation as small as possible.
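First standardize the data. With $m$ samples and $n$ variables, let $x_{ki}$ be the value of variable $i$ for sample $k$, with sample mean $\bar{x}_i$ and sample standard deviation $s_i$:

$$\tilde{x}_{ki} = \frac{x_{ki} - \bar{x}_i}{s_i}, \qquad k = 1, \dots, m, \quad i = 1, \dots, n$$

Then compute the covariance matrix $R = (r_{ij})$ of the standardized data set. For standardized data, this is also the correlation matrix of the original variables:

$$r_{ij} = \frac{1}{m-1} \sum_{k=1}^{m} \tilde{x}_{ki}\,\tilde{x}_{kj}, \qquad i, j = 1, \dots, n$$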
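Standardization and the covariance matrix of the standardized data can be sketched in Python with NumPy as follows. The sketch assumes an m × n array with one column per variable and no constant columns, since a constant column would make $s_i$ zero. The function names and the toy data are hypothetical. For standardized data the result should match NumPy's `np.corrcoef`, which makes a convenient sanity check.

```python
import numpy as np

def standardize(X):
    """Z-score each column (variable) of an m x n data matrix X."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)   # sample standard deviation s_i
    return (X - mean) / std

def covariance_matrix(Z):
    """Sample covariance of standardized data (columns already have mean 0)."""
    m = Z.shape[0]
    return Z.T @ Z / (m - 1)      # r_ij = (1/(m-1)) * sum_k z_ki * z_kj

# Hypothetical toy data: 5 samples, 3 variables
X = np.array([[2.5, 2.4, 1.0],
              [0.5, 0.7, 2.1],
              [2.2, 2.9, 0.9],
              [1.9, 2.2, 1.5],
              [3.1, 3.0, 0.4]])
Z = standardize(X)
R = covariance_matrix(Z)
print(np.round(R, 3))
```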
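Compute the eigenvalues of $R$, $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n \ge 0$, and the corresponding unit eigenvectors $u_1, u_2, \dots, u_n$. Here $u_j = (u_{1j}, u_{2j}, \dots, u_{nj})^\top$, and $u_{ij}$ is the $i$-th weight of the $j$-th eigenvector. The eigenvectors define $n$ new indicator variables, where $\tilde{x}_i$ denotes the $i$-th standardized variable:

$$y_j = u_{1j}\tilde{x}_1 + u_{2j}\tilde{x}_2 + \cdots + u_{nj}\tilde{x}_n, \qquad j = 1, 2, \dots, n$$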
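A minimal sketch of this step in Python with NumPy. It assumes an m × n data array with no constant columns; the function name and the synthetic data are hypothetical. `np.linalg.eigh` is used because $R$ is symmetric. It returns eigenvalues in ascending order, so the sketch reverses them to match $\lambda_1 \ge \dots \ge \lambda_n$. Eigenvectors are only determined up to sign, so the signs of individual components may differ from other software.

```python
import numpy as np

def principal_components(X):
    """Return eigenvalues (descending), eigenvectors U (column j = u_j),
    and principal component scores Y (column j = y_j)."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardized data
    R = Z.T @ Z / (Z.shape[0] - 1)                     # covariance matrix R
    eigvals, eigvecs = np.linalg.eigh(R)               # ascending for symmetric R
    order = np.argsort(eigvals)[::-1]                  # λ1 ≥ λ2 ≥ … ≥ λn
    eigvals = eigvals[order]
    U = eigvecs[:, order]                              # U[i, j] = u_ij
    Y = Z @ U                                          # y_j = Σ_i u_ij * x̃_i
    return eigvals, U, Y

# Synthetic data: two strongly correlated variables plus one independent one
rng = np.random.default_rng(0)
base = rng.normal(size=200)
X = np.column_stack([
    base + 0.1 * rng.normal(size=200),
    2 * base + 0.1 * rng.normal(size=200),
    rng.normal(size=200),
])
lam, U, Y = principal_components(X)
print(np.round(lam, 3))                        # eigenvalues, descending
print(np.round(np.cov(Y, rowvar=False), 3))    # covariance of the scores
```

The covariance matrix of the scores $Y$ should be diagonal, with the eigenvalues on the diagonal. This reflects that the principal components are mutually uncorrelated and that the variance of $y_j$ is $\lambda_j$.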
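Here $y_1$ is the first principal component, $y_2$ the second, …, and $y_n$ the $n$-th. Compute the contribution rate of each principal component $y_j$:

$$b_j = \frac{\lambda_j}{\sum_{k=1}^{n} \lambda_k}, \qquad j = 1, 2, \dots, n$$

Then compute the cumulative contribution rate of the first $p$ components $y_1, y_2, \dots, y_p$ ($p \le n$):

$$\sum_{k=1}^{p} b_k = \frac{\sum_{k=1}^{p} \lambda_k}{\sum_{k=1}^{n} \lambda_k}$$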
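The contribution rates can be computed directly from the eigenvalues sorted in descending order. This pure-Python sketch uses made-up eigenvalues. The helper `components_needed` and its 0.85 default threshold are illustrative assumptions, since this section does not fix a threshold. The helper shows one common way to choose $p$.

```python
from itertools import accumulate

def contribution_rates(eigenvalues):
    """Return (b_1..b_n, cumulative rates) for eigenvalues sorted in descending order."""
    total = sum(eigenvalues)
    rates = [lam / total for lam in eigenvalues]   # b_j = λ_j / Σ λ_k
    cumulative = list(accumulate(rates))           # Σ_{k≤p} b_k for p = 1..n
    return rates, cumulative

def components_needed(eigenvalues, threshold=0.85):
    """Hypothetical helper: smallest p whose cumulative contribution rate reaches threshold."""
    _, cumulative = contribution_rates(eigenvalues)
    for p, c in enumerate(cumulative, start=1):
        if c >= threshold:
            return p
    return len(eigenvalues)  # guard against rounding leaving the total just below 1

rates, cumulative = contribution_rates([2.0, 1.0, 0.5, 0.5])
print(rates)        # [0.5, 0.25, 0.125, 0.125]
print(cumulative)   # [0.5, 0.75, 0.875, 1.0]
print(components_needed([2.0, 1.0, 0.5, 0.5]))  # 3
```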