Data dimensionality reduction: factor analysis
2022-07-02 19:19:00 【Lu 727】
1. Purpose
Factor analysis is based on the idea of dimensionality reduction. While losing as little of the original information as possible, it aggregates many correlated variables into a few independent common factors. These common factors capture the main information carried by the original variables, so the number of variables is reduced while the internal relationships among them are revealed. Factor analysis generally serves three purposes: first, reducing the dimensionality of the data; second, computing factor weights; and third, combining the factors with those weights into a comprehensive score.
2. Input/output description
Input: two or more quantitative variables (assume there are N variables).
Output: the data can be reduced to as few as 1 dimension (a single variable, generally used for comprehensive evaluation) or kept at up to N dimensions (generally used for data desensitization). The weight of each original variable in each reduced dimension is also obtained, indicating how much of that variable's information is retained.
3. Case example
Using indicators such as each region's 2021 per capita GDP and per capita disposable income, quantitatively evaluate the ranking of the economic development level of multiple provinces, cities, and regions, or the weight of each indicator.
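The third purpose listed in Section 1, and the ranking in this case, are often handled by weighting each factor's score by its share of the retained variance, $w_k = \lambda_k / (\lambda_1 + \cdots + \lambda_m)$, and summing. The article does not spell out this formula, so treat it as an assumed convention. The sketch below assumes the factor scores and eigenvalues have already been obtained with the modeling steps in Section 4; the helper names `composite_scores` and `rank_descending` and the toy numbers are hypothetical.

```python
import numpy as np

def composite_scores(scores, eigenvalues):
    """Weighted composite score per sample (hypothetical helper).

    scores:      (n, m) factor scores, one row per region
    eigenvalues: (m,) eigenvalues of the m retained factors
    Weights are each factor's share of the retained variance (an assumed convention).
    """
    scores = np.asarray(scores, dtype=float)
    lam = np.asarray(eigenvalues, dtype=float)
    weights = lam / lam.sum()       # w_k = lambda_k / sum of retained lambdas
    return scores @ weights         # one composite score per row

def rank_descending(values):
    """Row indices ordered from the highest composite score to the lowest."""
    return np.argsort(-np.asarray(values, dtype=float))

# Toy scores for 3 regions on 2 retained factors (illustrative numbers only)
scores = np.array([[1.2, -0.3],
                   [-0.5, 0.8],
                   [0.1, 0.1]])
eigenvalues = np.array([3.0, 1.0])   # weights become 0.75 and 0.25
composite = composite_scores(scores, eigenvalues)
print(rank_descending(composite))    # [0 2 1]
```

Because the sign of each eigenvector (and hence of each factor) is arbitrary, it is worth checking that a higher score really means "more developed" before reading off the ranking.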
4. Modeling steps
Factor analysis is a multivariate statistical method that, based on the correlations among variables, reduces many variables to a few common factors and then analyzes the data through those factors. The basic idea is to split each original variable into two parts: one part is a linear combination of the common factors, which condenses most of the information in the original variables; the other part is a special factor unrelated to the common factors, which reflects the gap between that linear combination and the original variable. For a p-dimensional variable x = [x1, x2, …, xp]^T, the factor analysis model is:

$$x_i = a_{i1}f_1 + a_{i2}f_2 + \cdots + a_{im}f_m + \varepsilon_i,\qquad i = 1, 2, \ldots, p$$

or, in matrix form,

$$\mathbf{x} = A\mathbf{f} + \boldsymbol{\varepsilon}$$

where f = [f1, f2, …, fm]^T is the extracted common factor vector, representing m (m < p) mutually independent common influencing factors that cannot be observed directly but objectively exist in the original variables; A = (a_ik) is the p × m factor loading matrix, whose element a_ik is the loading of variable x_i on common factor f_k and reflects the correlation coefficient between the two (the greater the absolute value, the stronger the correlation); and ε = [ε1, ε2, …, εp]^T is the special factor vector.
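To make the model concrete, the following sketch generates data from x = Af + ε using a made-up loading matrix `A` and made-up special-factor variances `psi` (both hypothetical). It assumes the common factors are independent with unit variance and that ε is independent of f; under those assumptions the covariance of x is AAᵀ + Ψ, which the sample covariance should approach for large n.

```python
import numpy as np

def simulate_factor_model(A, psi, n, seed=0):
    """Draw n samples of x = A f + eps (rows are samples).

    A:   (p, m) factor loading matrix
    psi: (p,) variances of the special factors eps_i
    """
    rng = np.random.default_rng(seed)
    p, m = A.shape
    f = rng.standard_normal((n, m))                    # independent, unit-variance common factors
    eps = rng.standard_normal((n, p)) * np.sqrt(psi)   # special factors, independent of f
    return f @ A.T + eps                               # each row is one observation of x

# Hypothetical loadings: 3 observed variables driven by 1 common factor
A = np.array([[0.9], [0.8], [0.7]])
psi = 1.0 - (A ** 2).sum(axis=1)     # chosen so that each x_i has unit variance
X = simulate_factor_model(A, psi, n=100_000)

implied = A @ A.T + np.diag(psi)     # model-implied covariance A A^T + Psi
print(np.round(np.cov(X, rowvar=False), 2))
print(np.round(implied, 2))
```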
For a multidimensional variable x, the key to building the factor analysis model is to solve for the factor loading matrix A and the common factor vector f. The steps are as follows:
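1. To eliminate the influence of the variables' different units (dimensions), standardize the sample matrix X = [x1, x2, …, xn], which contains n observations of the p-dimensional variable. After standardization, every variable has mean 0 and variance 1. For convenience, the standardized data are still denoted X; its elements are:

$$x_{ij} \leftarrow \frac{x_{ij}-\bar{x}_i}{\sigma_i},\qquad \bar{x}_i = \frac{1}{n}\sum_{j=1}^{n} x_{ij},\qquad \sigma_i = \sqrt{\frac{1}{n-1}\sum_{j=1}^{n}\left(x_{ij}-\bar{x}_i\right)^2}$$

where i = 1, …, p indexes variables and j = 1, …, n indexes samples.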
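2. Compute the sample covariance matrix S, whose elements are:

$$s_{ik} = \frac{1}{n-1}\sum_{j=1}^{n} x_{ij}\,x_{kj},\qquad i, k = 1, \ldots, p$$

Because the data are standardized, S is also the sample correlation matrix of the original variables.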
3. Perform an eigenvalue decomposition of the sample covariance matrix S to obtain p eigenvalues λ1 ≥ λ2 ≥ … ≥ λp ≥ 0 with corresponding eigenvectors γ1, γ2, …, γp, and use the eigenvectors of the m largest eigenvalues to estimate the factor loading matrix. The j-th principal component γj^T x has variance λj, so to give each component of the common factor vector unit variance it is divided by its standard deviation √λj; correspondingly, the eigenvector γj is multiplied by √λj in the factor loading matrix. The factor loading matrix is therefore:

$$A = \left[\sqrt{\lambda_1}\,\boldsymbol{\gamma}_1,\ \sqrt{\lambda_2}\,\boldsymbol{\gamma}_2,\ \ldots,\ \sqrt{\lambda_m}\,\boldsymbol{\gamma}_m\right]$$
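Steps 1 to 3 can be sketched with NumPy as below. This is a minimal illustration, not the author's code: it stores samples as rows (an n × p array, i.e. the transpose of the article's X), uses `ddof=1` to match the 1/(n−1) in the formulas, and takes m as an input. The function names and the small data set are hypothetical.

```python
import numpy as np

def standardize(X):
    """Step 1: z-score each column (rows = samples, columns = variables).

    Uses ddof=1 to match the 1/(n-1) in the formulas; a constant column
    would cause division by zero and should be removed beforehand.
    """
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def sample_covariance(Z):
    """Step 2: S = Z^T Z / (n - 1); Z is already centered, so S is the correlation matrix."""
    return Z.T @ Z / (Z.shape[0] - 1)

def factor_loadings(S, m):
    """Step 3: principal-component estimate of the p x m loading matrix A.

    Column j of A is sqrt(lambda_j) * gamma_j for the m largest eigenvalues.
    """
    eigvals, eigvecs = np.linalg.eigh(S)          # ascending order for symmetric S
    order = np.argsort(eigvals)[::-1]             # re-sort so lambda_1 >= ... >= lambda_p
    eigvals = np.clip(eigvals[order], 0.0, None)  # guard against tiny negative round-off
    eigvecs = eigvecs[:, order]
    A = eigvecs[:, :m] * np.sqrt(eigvals[:m])     # scale each eigenvector column by sqrt(lambda_j)
    return A, eigvals, eigvecs

# Made-up data: 6 samples of 3 correlated indicators
X = np.array([[5.1, 3.0, 1.2],
              [6.3, 3.4, 1.9],
              [4.8, 2.7, 1.0],
              [7.0, 3.9, 2.4],
              [5.9, 3.1, 1.5],
              [6.6, 3.6, 2.1]])
Z = standardize(X)
S = sample_covariance(Z)
A, eigvals, eigvecs = factor_loadings(S, m=1)
print(eigvals)   # lambda_1 >= lambda_2 >= lambda_3, summing to p = 3
print(A)         # loadings on the first factor; the column's sign is arbitrary
```

`np.linalg.eigh` is used because S is symmetric; it returns eigenvalues in ascending order, hence the re-sort.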
The parameter m is determined by the cumulative variance contribution rate of the common factors:

$$\eta_m = \frac{\sum_{j=1}^{m}\lambda_j}{\sum_{j=1}^{p}\lambda_j}$$

It is generally held that when the cumulative variance contribution rate of the first m common factors exceeds 90%, a linear combination of those m common factors can essentially restore the information in the original variables.
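A small sketch of this rule, assuming the eigenvalues are already sorted in descending order and reading "exceeds 90%" as "reaches at least 90%". The function name and the eigenvalues in the example are made up for illustration.

```python
import numpy as np

def choose_num_factors(eigvals, threshold=0.9):
    """Smallest m whose cumulative variance contribution reaches `threshold`.

    eigvals must be non-negative and sorted in descending order.
    """
    eigvals = np.asarray(eigvals, dtype=float)
    cumulative = np.cumsum(eigvals) / eigvals.sum()   # eta_1, eta_2, ..., eta_p
    # first index with cumulative >= threshold, converted to a factor count
    m = int(np.searchsorted(cumulative, threshold)) + 1
    return min(m, len(eigvals)), cumulative          # cap at p against round-off

m, cumulative = choose_num_factors([2.5, 1.0, 0.3, 0.2])
print(m)   # 3  (cumulative rates 0.625, 0.875, 0.95, 1.0)
```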
The common factor vector f, that is, the specific scores of the original variables on the common factors, can be estimated by the regression method:

$$\hat{\mathbf{f}} = A^{\mathsf T} S^{-1}\mathbf{x}$$
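The regression estimate can be sketched as follows, assuming standardized data stored as rows (n × p) and loadings computed as in step 3. Rather than forming S⁻¹ explicitly, the sketch solves S B = A with `np.linalg.solve`, which is the usual numerically preferable choice. The random data, the seed and the function name are hypothetical.

```python
import numpy as np

def regression_factor_scores(Z, S, A):
    """Regression factor scores: f_hat = A^T S^{-1} z for each sample.

    Z: (n, p) standardized data, rows = samples
    S: (p, p) covariance (here correlation) matrix of Z
    A: (p, m) factor loading matrix
    Returns an (n, m) array; row j holds the scores of sample j.
    """
    # Row form: f_hat^T = z^T S^{-1} A = z^T B, where B solves S B = A
    B = np.linalg.solve(S, A)
    return Z @ B

# Illustrative setup with random data (not from the original article)
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 4))
X[:, 1] += X[:, 0]                 # make variables 0 and 1 correlated
X[:, 3] += 0.5 * X[:, 2]           # make variables 2 and 3 correlated
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
S = Z.T @ Z / (len(Z) - 1)
eigvals, eigvecs = np.linalg.eigh(S)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
m = 2
A = eigvecs[:, :m] * np.sqrt(eigvals[:m])   # principal-component loadings

F = regression_factor_scores(Z, S, A)
print(F.shape)                  # (200, 2)
print(F.std(axis=0, ddof=1))    # both close to 1 with principal-component loadings
```

With principal-component loadings, S⁻¹γj = γj/λj, so each score column reduces to Zγj/√λj: uncorrelated columns with unit variance.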
After the factor loading matrix and the common factor vector have been obtained through the steps above, the special factor vector of the original variables is:

$$\hat{\boldsymbol{\varepsilon}} = \mathbf{x} - A\hat{\mathbf{f}}$$
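A sketch of the residual computation under the same assumptions as the earlier sketches (standardized rows, principal-component loadings, regression scores). The random data and the function name are hypothetical. Keeping all p factors reproduces the data exactly, so the special factors vanish; keeping fewer leaves a non-zero residual.

```python
import numpy as np

def special_factors(Z, F, A):
    """Special factor estimates eps_hat = z - A f_hat, one row per sample.

    Z: (n, p) standardized data; F: (n, m) factor scores; A: (p, m) loadings.
    """
    return Z - F @ A.T

# Illustrative setup with random data (not from the original article)
rng = np.random.default_rng(3)
X = rng.normal(size=(100, 3))
X[:, 2] += X[:, 0] - X[:, 1]
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
S = Z.T @ Z / (len(Z) - 1)
eigvals, eigvecs = np.linalg.eigh(S)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

for m in (1, 3):
    A = eigvecs[:, :m] * np.sqrt(eigvals[:m])   # principal-component loadings
    F = Z @ np.linalg.solve(S, A)                # regression factor scores
    E = special_factors(Z, F, A)
    print(m, np.abs(E).max())   # residual shrinks to ~0 when m = p = 3
```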