Data dimensionality reduction: factor analysis
2022-07-02 19:19:00 【Lu 727】
1、 Purpose
Factor analysis builds on the idea of dimension reduction: while losing as little of the original information as possible, it condenses a large set of correlated variables into a few mutually independent common factors. These common factors carry the main information of the original variables, so the number of variables shrinks while the internal relationships between variables remain visible. Factor analysis generally serves three purposes: reducing dimensionality, computing factor weights, and computing a weighted composite score from the factors.
2、 Input / output description
Input: two or more quantitative variables (say N variables).
Output: the data can be reduced to as few as 1 dimension (a single variable, generally used for comprehensive evaluation) or kept at as many as N dimensions (generally used for data desensitization). The composition weight of each original variable in the reduced dimensions is also obtained, indicating how much of that variable's information is retained.
3、 Case example
Using indicators such as a region's 2021 per capita GDP and per capita disposable income, quantitatively rank the economic development level of several provinces, cities and regions, or determine the weight of each indicator.
4、 Modeling steps
Factor analysis is a multivariate statistical method that uses the correlations between variables to reduce many variables to a few common factors and then analyzes those factors. The basic idea is to split each original variable into two parts: a linear combination of common factors, which condenses most of the information in the original variables, and a special factor that is unrelated to the common factors and captures the gap between that linear combination and the original variable. For a p-dimensional variable $x = [x_1, x_2, \dots, x_p]^T$ the factor analysis model is:
$$x_i = a_{i1} f_1 + a_{i2} f_2 + \cdots + a_{im} f_m + \varepsilon_i, \quad i = 1, 2, \dots, p$$
Or, in matrix form:
$$x = A f + \varepsilon$$
where $f = [f_1, f_2, \dots, f_m]^T$ is the extracted common factor vector, representing m (m < p) mutually independent common influences that cannot be observed directly in the original variables but objectively exist; $A = (a_{ik})_{p \times m}$ is the factor load matrix, whose element $a_{ik}$ is the load of variable $x_i$ on common factor $f_k$ and reflects the correlation between the two (the larger its absolute value, the stronger the correlation); and $\varepsilon = [\varepsilon_1, \varepsilon_2, \dots, \varepsilon_p]^T$ is the special factor vector.
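To make the model concrete, the following minimal sketch simulates data from the matrix form x = A f + ε and compares the sample covariance with the covariance the model implies. The load matrix `A`, the specific variances `psi`, and the sizes are hypothetical values chosen for illustration, and the sketch assumes the special factors are mutually independent as well as independent of the common factors.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000                      # hypothetical number of samples

# Hypothetical factor load matrix: p = 5 variables, m = 2 common factors
A = np.array([[0.9, 0.0],
              [0.8, 0.1],
              [0.7, 0.3],
              [0.1, 0.8],
              [0.0, 0.9]])
# Specific variances chosen so that every variable has unit variance
psi = 1.0 - (A ** 2).sum(axis=1)

f = rng.normal(size=(2, n))                              # independent, unit-variance common factors
eps = rng.normal(size=(5, n)) * np.sqrt(psi)[:, None]    # special factors, independent of f
x = A @ f + eps                                          # the model x = A f + eps

# Under these assumptions Cov(x) = A A^T + diag(psi)
implied_cov = A @ A.T + np.diag(psi)
sample_cov = np.cov(x)                                   # rows are variables
```

With this many samples, `sample_cov` should agree with `implied_cov` up to sampling noise, which is the sense in which the load matrix encodes the correlations between variables.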
To build the factor analysis model for a multidimensional variable x, the key is to solve for the factor load matrix A and the common factor vector f. The steps are as follows:
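1. To remove the effect of different units of measurement, standardize the sample X = [x_1, x_2, …, x_n], which contains n observations of the p-dimensional variable. After standardization each variable has mean 0 and variance 1. For convenience the standardized data are still written X, with elements:
$$x_{ij} = \frac{\tilde{x}_{ij} - \bar{x}_i}{s_i}, \quad i = 1, \dots, p, \; j = 1, \dots, n$$
where $\tilde{x}_{ij}$ is the raw value of variable i in sample j, and $\bar{x}_i$ and $s_i$ are the sample mean and sample standard deviation of variable i.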
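2. Compute the covariance matrix S of the standardized sample, with elements:
$$s_{ik} = \frac{1}{n-1} \sum_{j=1}^{n} x_{ij} x_{kj}, \quad i, k = 1, \dots, p$$
where $x_{ij}$ is the standardized value of variable i in sample j (the sample means are 0 after standardization).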
3. Perform an eigenvalue decomposition of the sample covariance matrix S to obtain p eigenvalues λ1 ≥ λ2 ≥ … ≥ λp ≥ 0 with corresponding eigenvectors γ1, γ2, …, γp, and estimate the factor load matrix from the eigenvectors of the m largest eigenvalues. To make each component of the common factor vector have variance 1, the component is divided by its standard deviation $\sqrt{\lambda_j}$, and the corresponding eigenvector γj in the factor load matrix is multiplied by $\sqrt{\lambda_j}$. The factor load matrix is therefore:
$$A = [\sqrt{\lambda_1}\,\gamma_1, \sqrt{\lambda_2}\,\gamma_2, \dots, \sqrt{\lambda_m}\,\gamma_m]$$
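Steps 1 to 3 can be sketched in NumPy as follows. This is a minimal illustration, not a reference implementation: the function name `estimate_loadings`, the (p, n) data layout with one row per variable, the 1/(n-1) covariance normalization, and the random test data are all assumptions made for the example, and m is taken as given.

```python
import numpy as np

def estimate_loadings(X, m):
    """X: (p, n) array, one row per variable and one column per sample."""
    # Step 1: standardize each variable to mean 0 and variance 1
    Z = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, ddof=1, keepdims=True)
    # Step 2: sample covariance matrix of the standardized data
    S = Z @ Z.T / (Z.shape[1] - 1)
    # Step 3: eigendecomposition; eigh returns ascending eigenvalues, so reorder
    eigvals, eigvecs = np.linalg.eigh(S)
    order = np.argsort(eigvals)[::-1]
    eigvals = np.clip(eigvals[order], 0.0, None)   # guard against tiny negative round-off
    eigvecs = eigvecs[:, order]
    # Load matrix: the first m eigenvectors, each scaled by sqrt(lambda_j)
    A = eigvecs[:, :m] * np.sqrt(eigvals[:m])
    return Z, S, eigvals, A

# Hypothetical data: 4 variables, 200 samples, two of them correlated
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 200))
X[1] += X[0]
Z, S, eigvals, A = estimate_loadings(X, m=2)
```

Each column of `A` has squared norm equal to its eigenvalue, and with m = p the product `A @ A.T` reproduces S exactly; a smaller m keeps only the dominant part of that decomposition.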
The parameter m is determined by the cumulative variance contribution rate of the common factors, namely
$$\frac{\sum_{j=1}^{m} \lambda_j}{\sum_{j=1}^{p} \lambda_j}$$
It is generally considered that when the cumulative variance contribution rate of the first m common factors exceeds 90%, a linear combination of those m common factors can essentially restore the information in the original variables.
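The selection rule for m amounts to a cumulative sum over the sorted eigenvalues. A minimal sketch, where the function name `choose_m` and the example eigenvalues are hypothetical:

```python
import numpy as np

def choose_m(eigvals, threshold=0.90):
    """Smallest m whose cumulative variance contribution rate exceeds threshold."""
    eigvals = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    # side="right" finds the first position where cumulative > threshold
    m = int(np.searchsorted(cumulative, threshold, side="right")) + 1
    return min(m, len(eigvals)), cumulative

# Hypothetical eigenvalues of a 5 x 5 covariance matrix
m, cumulative = choose_m([2.6, 1.3, 0.7, 0.3, 0.1])
print(m)  # 3, since the first three factors reach 92% and the first two only 78%
```

The 90% threshold is the article's rule of thumb; it is exposed as a parameter because other cut-offs are sometimes preferred.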
The common factor vector f, that is, the score of the original variable on each common factor, can be estimated by the regression method:
$$\hat{f} = A^T S^{-1} x$$
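The regression estimate can be sketched as below, assuming the usual form f̂ = Aᵀ S⁻¹ x applied to every standardized sample at once. The data, the mixing matrix `mix`, and the choice m = 2 are hypothetical; the loadings are computed from the eigendecomposition of S as in the steps above.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 500, 2
# Hypothetical data: 4 observed variables driven by 2 latent signals plus noise
latent = rng.normal(size=(2, n))
mix = np.array([[1.0, 0.0], [0.8, 0.2], [0.1, 0.9], [0.0, 1.0]])
X = mix @ latent + 0.3 * rng.normal(size=(4, n))

# Standardize, form the covariance matrix, and take the top-m loadings
Z = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, ddof=1, keepdims=True)
S = Z @ Z.T / (n - 1)
eigvals, eigvecs = np.linalg.eigh(S)
top = np.argsort(eigvals)[::-1][:m]
A = eigvecs[:, top] * np.sqrt(eigvals[top])

# Regression estimate of the factor scores; solve() avoids forming S^{-1} explicitly
F = A.T @ np.linalg.solve(S, Z)      # shape (m, n): one column of scores per sample
```

With loadings built this way, the estimated scores come out with unit variance and zero correlation between factors, which matches the requirement that each common factor component has variance 1.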
After obtaining the factor load matrix and the common factor vector through the steps above, the special factor vector of the original variable follows as:
$$\varepsilon = x - A f$$
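As a rough end-to-end illustration, the special factors are simply what remains after subtracting the common-factor part from the standardized data. The helper `factor_fit` and the random data are hypothetical, and the sketch assumes eigenvector-based loadings together with regression-estimated scores.

```python
import numpy as np

def factor_fit(X, m):
    """Return standardized data Z, load matrix A and regression factor scores F."""
    Z = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, ddof=1, keepdims=True)
    S = np.cov(Z)                                  # rows are variables
    eigvals, eigvecs = np.linalg.eigh(S)
    top = np.argsort(eigvals)[::-1][:m]
    A = eigvecs[:, top] * np.sqrt(eigvals[top])
    F = A.T @ np.linalg.solve(S, Z)
    return Z, A, F

# Hypothetical data: 5 variables, 300 samples, two correlated pairs
rng = np.random.default_rng(2)
X = rng.normal(size=(5, 300))
X[1] += X[0]
X[3] += X[2]

Z, A, F = factor_fit(X, m=2)
E = Z - A @ F                                      # special factors, one column per sample
specific_var = E.var(axis=1, ddof=1)               # variance left unexplained per variable
communality = (A ** 2).sum(axis=1)                 # variance explained by the common factors
```

For each variable, `specific_var` and `communality` add up to 1 here, so a small specific variance means the m common factors retain most of that variable's information.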