Data dimensionality reduction: principal component analysis
2022-07-02 19:19:00 【Lu 727】
1、 Purpose
Principal component analysis (PCA) forms linear combinations of multiple correlated indicators and reduces dimensionality by explaining as much of the information in the original data as possible with as few dimensions as possible. The reduced variables are linearly uncorrelated with each other, and each one is a linear combination of the original variables. The smaller a principal component's share of the total variance, the weaker its ability to summarize the original information. This differs from factor analysis, which uses a few common factors to explain the relationships among many observed variables and does not recombine the original variables.
2、 Input / output description
Input: two or more quantitative variables (say N variables).
Output: at minimum, the data are reduced to 1 dimension (a single variable, generally used for comprehensive evaluation); at maximum, all N variables are kept (generally used for data desensitization). The weight of each original variable in every reduced variable is also obtained, and it indicates how much of that original variable is retained.
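The two extremes of the output can be illustrated with a minimal sketch. It assumes NumPy, synthetic stand-in data, and a hypothetical helper `reduce_dimensions`, none of which come from the original article. It obtains the weights from a singular value decomposition of the standardized data.

```python
import numpy as np


def reduce_dimensions(X, k):
    """Project X (m samples x N variables, m >= N) onto k components, 1 <= k <= N."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    Z = (X - mean) / std                      # standardized variables
    # Rows of Vt are unit-length, mutually orthogonal weight vectors,
    # ordered from largest to smallest explained variance.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    weights = Vt[:k].T                        # shape (N, k): weight of each original variable
    return Z @ weights, weights, mean, std


# Synthetic stand-in data (an assumption): 50 samples of N = 3 correlated variables.
rng = np.random.default_rng(1)
mixing = np.array([[1.0, 0.8, 0.2],
                   [0.0, 0.5, 0.9],
                   [0.0, 0.0, 0.3]])
X = rng.normal(size=(50, 3)) @ mixing + np.array([10.0, 20.0, 30.0])

# Minimum output: one composite variable per sample.
score_1d, w1, _, _ = reduce_dimensions(X, k=1)

# Maximum output: N transformed variables. No information is lost, so the
# original table can be rebuilt from the scores, weights, means and stds.
score_full, w_full, mean, std = reduce_dimensions(X, k=3)
X_back = (score_full @ w_full.T) * std + mean
print(score_1d.shape, score_full.shape, np.allclose(X_back, X))
```

With all N components kept, the output is an invertible rotation of the standardized data. It hides the original columns only if the weights, means and standard deviations are kept private.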
3、 Case example
A bank has a user-attribute data table with 100 variables. The table needs to be desensitized and reduced in dimensionality while minimizing the loss of the original information.

4、 Modeling steps
Principal component analysis is a multivariate statistical method that applies the idea of dimensionality reduction to transform multiple indicators into a few composite indicators, which are called principal components. Each principal component is a linear combination of the original variables, the components are mutually uncorrelated, and together they retain most of the information in the original variables. In essence, the method exploits the correlation among the original variables to find composite substitutes for the correlated variables while minimizing the information lost in the transformation.
Compute the covariance matrix R of the standardized data set. For standardized data this coincides with the correlation matrix of the original variables.
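One common way to write this matrix is sketched below. The number of samples m, the raw value x_{ki} of variable i in sample k, and the sample mean and standard deviation of variable i are notation introduced here for illustration, not taken from the original article.

```latex
\[
R = (r_{ij})_{n \times n}, \qquad
r_{ij} = \frac{1}{m-1} \sum_{k=1}^{m} \tilde{x}_{ki}\,\tilde{x}_{kj}, \qquad
\tilde{x}_{ki} = \frac{x_{ki} - \bar{x}_i}{s_i}
\]
```

Under this convention every diagonal entry r_{ii} equals 1, so the eigenvalues of R sum to n.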

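Compute the eigenvalues λ1 ≥ λ2 ≥ … ≥ λn ≥ 0 of the matrix R and the corresponding eigenvectors u1, u2, …, un, where uj = (u1j, u2j, …, unj) and unj is the n-th component of the j-th eigenvector. The eigenvectors define n new indicator variables:

$$
\begin{aligned}
y_1 &= u_{11}\tilde{x}_1 + u_{21}\tilde{x}_2 + \cdots + u_{n1}\tilde{x}_n \\
y_2 &= u_{12}\tilde{x}_1 + u_{22}\tilde{x}_2 + \cdots + u_{n2}\tilde{x}_n \\
&\;\;\vdots \\
y_n &= u_{1n}\tilde{x}_1 + u_{2n}\tilde{x}_2 + \cdots + u_{nn}\tilde{x}_n
\end{aligned}
$$

where $\tilde{x}_i$ denotes the i-th standardized variable.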
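Here, y1 is the first principal component, y2 is the second principal component, …, and yn is the n-th principal component. Compute the contribution rate bj (j = 1, 2, …, n) of each principal component yj, and the cumulative contribution rate of the first p components y1, y2, …, yp (p ≤ n), written here as $\alpha_p$:

$$
b_j = \frac{\lambda_j}{\sum_{k=1}^{n} \lambda_k}, \qquad
\alpha_p = \frac{\sum_{k=1}^{p} \lambda_k}{\sum_{k=1}^{n} \lambda_k}
$$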
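The modeling steps can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the author's implementation: the function name `pca_eig`, the 0.85 cumulative-contribution cutoff and the synthetic data are all choices made here for the example.

```python
import numpy as np


def pca_eig(X, threshold=0.85):
    """PCA via eigen-decomposition of the correlation matrix.

    X: array of shape (m samples, n variables).
    threshold: assumed cumulative-contribution cutoff (not from the article).
    """
    X = np.asarray(X, dtype=float)
    # Standardize each column: zero mean, unit sample standard deviation.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    m = Z.shape[0]
    # Covariance matrix of the standardized data, i.e. the correlation matrix R.
    R = Z.T @ Z / (m - 1)
    # eigh is meant for symmetric matrices; it returns ascending eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]              # sort in descending order
    eigvals = np.clip(eigvals[order], 0.0, None)   # guard against tiny negative round-off
    eigvecs = eigvecs[:, order]                    # column j is eigenvector u_j
    # Contribution rate b_j and cumulative contribution rate.
    contribution = eigvals / eigvals.sum()
    cumulative = np.cumsum(contribution)
    # Smallest p whose cumulative contribution reaches the threshold.
    p = min(int(np.searchsorted(cumulative, threshold)) + 1, len(eigvals))
    scores = Z @ eigvecs[:, :p]                    # principal components y_1..y_p
    return eigvals, eigvecs, contribution, cumulative, p, scores


# Synthetic data (an assumption): 4 variables driven by 2 underlying factors.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 2))
noise = 0.1 * rng.normal(size=(200, 4))
X = np.column_stack([base[:, 0], base[:, 0], base[:, 1], base[:, 1]]) + noise

eigvals, eigvecs, contribution, cumulative, p, scores = pca_eig(X)
print("components kept:", p)
print("cumulative contribution:", np.round(cumulative, 3))
```

The sample covariance matrix of the returned scores is diagonal, with the retained eigenvalues on the diagonal. This is a quick way to check that the components are uncorrelated.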

