Data dimensionality reduction: principal component analysis
2022-07-02 19:19:00 【Lu 727】
1. Purpose
Principal component analysis (PCA) forms linear combinations of multiple correlated indicators and reduces dimensionality by explaining as much of the information in the original data as possible with as few dimensions as possible. The resulting variables are linearly uncorrelated with one another, and each is a linear combination of the original variables. The smaller a principal component's share of the total variance, the weaker its ability to summarize the original information. This differs from factor analysis, which uses a small number of common factors to explain the relationships among many observed variables and is not a recombination of the original variables.
2. Input / output description
Input: two or more quantitative variables (N variables, say).
Output: at minimum the data are reduced to 1 dimension (a single variable, generally used for comprehensive evaluation); at maximum N variables are kept (generally used for data desensitization). The composition weights of each reduced variable are also obtained, which indicate how much of the original variables' information is retained.
3. Case example
A bank has a user attribute table with 100 variables. The data need to be desensitized and reduced in dimension while keeping the loss of original information as small as possible.
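As a rough sketch of this case, the snippet below builds a synthetic 100-variable table and finds the smallest number of principal components whose cumulative contribution rate reaches a target. The real bank data is not available, so the data, the 95% retention target, and the helper name `min_components_for` are all assumptions made for illustration.

```python
import numpy as np

def min_components_for(X, target=0.95):
    """Return (p, cumulative): p is the smallest number of principal components
    whose cumulative contribution rate reaches `target` (hypothetical helper)."""
    R = np.corrcoef(X, rowvar=False)           # correlation matrix of the columns
    eigvals = np.linalg.eigvalsh(R)[::-1]      # eigenvalues, largest first
    eigvals = np.clip(eigvals, 0.0, None)      # drop tiny negative round-off
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    # First position where the cumulative rate is >= target, as a 1-based count.
    p = int(np.searchsorted(cumulative, target)) + 1
    return min(p, len(cumulative)), cumulative

# Synthetic stand-in for the user attribute table (assumption, not real data):
# 1000 users, 100 correlated variables driven by 10 hidden factors plus noise.
rng = np.random.default_rng(0)
factors = rng.normal(size=(1000, 10))
mixing = rng.normal(size=(10, 100))
X = factors @ mixing + 0.1 * rng.normal(size=(1000, 100))

p, cumulative = min_components_for(X, target=0.95)
print(p, round(float(cumulative[p - 1]), 4))
```

Because the synthetic variables are driven by only 10 hidden factors, a handful of components is enough here; on real data the required number depends on how strongly the variables are correlated.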

4. Modeling steps
Principal component analysis is a multivariate statistical method that applies the idea of dimension reduction to transform multiple indicators into a few composite indicators; these composite indicators are the principal components. Each principal component is a linear combination of the original variables, the components are mutually uncorrelated, and together they retain most of the information in the original variables. In essence, PCA uses the correlation among the original variables to find composite substitutes for the correlated variables while minimizing the information lost in the transformation.
Calculate the covariance matrix R from the standardized data set; for standardized variables this is the same as the correlation matrix of the original variables.
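In one common formulation, with m samples of n variables, the standardization and the entries of R can be written as follows. The symbols m, x_{ki}, \bar{x}_i, s_i and \tilde{x}_{ki} are notation assumed here for illustration; \bar{x}_i and s_i are the sample mean and sample standard deviation of the i-th variable.

```latex
\tilde{x}_{ki} = \frac{x_{ki} - \bar{x}_i}{s_i}, \qquad
R = (r_{ij})_{n \times n}, \qquad
r_{ij} = \frac{1}{m-1} \sum_{k=1}^{m} \tilde{x}_{ki}\,\tilde{x}_{kj},
\quad i, j = 1, 2, \ldots, n
```

With this scaling every diagonal entry r_{ii} equals 1.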

Calculate the eigenvalues λ1 ≥ λ2 ≥ … ≥ λn ≥ 0 of the matrix R and the corresponding eigenvectors u1, u2, …, un, where uj = (u1j, u2j, …, unj) and unj denotes the n-th component of the j-th eigenvector. The eigenvectors define n new indicator variables: the j-th new variable yj is the linear combination of the standardized original variables whose weights are the components of uj.
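Written out, and using \tilde{x}_i for the i-th standardized variable (a notation assumed here), the new indicator variables would take the form:

```latex
\begin{aligned}
y_1 &= u_{11}\tilde{x}_1 + u_{21}\tilde{x}_2 + \cdots + u_{n1}\tilde{x}_n \\
y_2 &= u_{12}\tilde{x}_1 + u_{22}\tilde{x}_2 + \cdots + u_{n2}\tilde{x}_n \\
&\;\;\vdots \\
y_n &= u_{1n}\tilde{x}_1 + u_{2n}\tilde{x}_2 + \cdots + u_{nn}\tilde{x}_n
\end{aligned}
```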

Here y1 is the 1st principal component, y2 is the 2nd, …, and yn is the n-th. Calculate the contribution rate bj (j = 1, 2, …, n) of each principal component yj, which is its eigenvalue divided by the sum of all eigenvalues, bj = λj / (λ1 + λ2 + … + λn), and the cumulative contribution rate of the first p components y1, y2, …, yp (p ≤ n), which is (λ1 + … + λp) / (λ1 + … + λn).
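The steps above can be sketched in a few lines of NumPy. This is only a minimal illustration under stated assumptions: the function name `pca_steps` and the small data set are made up, and `np.linalg.eigh` is used because R is symmetric.

```python
import numpy as np

def pca_steps(X):
    """Follow the modeling steps on an (m samples x n variables) array."""
    X = np.asarray(X, dtype=float)
    # Standardize each column: zero mean, unit sample standard deviation.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Covariance matrix of the standardized data (equals the correlation matrix).
    R = (Z.T @ Z) / (Z.shape[0] - 1)
    # eigh returns eigenvalues in ascending order, so reorder them to get
    # lambda_1 >= lambda_2 >= ... >= lambda_n with matching eigenvectors.
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]
    eigvals, U = eigvals[order], eigvecs[:, order]   # column j of U is u_j
    Y = Z @ U                                        # column j of Y is y_j
    b = eigvals / eigvals.sum()                      # contribution rates b_j
    return eigvals, U, Y, b, np.cumsum(b)

# Small made-up data set (assumption, for illustration only): 6 samples, 3 variables.
X = [[2.5, 2.4, 1.0],
     [0.5, 0.7, 3.1],
     [2.2, 2.9, 0.8],
     [1.9, 2.2, 1.5],
     [3.1, 3.0, 0.4],
     [2.3, 2.7, 1.1]]
eigvals, U, Y, b, cumulative = pca_steps(X)
print(np.round(b, 4), np.round(cumulative, 4))
```

The sample covariance matrix of the columns of `Y` is diagonal with the eigenvalues on the diagonal, which is the sense in which the principal components are mutually uncorrelated. Keeping only the first p columns of `Y` gives the reduced data.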

