Data Dimensionality Reduction: Factor Analysis
2022-07-02 19:19:00 【Lu 727】
1、 Purpose
Factor analysis is built on the idea of dimensionality reduction. It condenses many correlated variables into a few independent common factors while losing as little of the original information as possible. These common factors capture the main information carried by the original variables, so the number of variables drops and the internal relationships among them become visible. In general, factor analysis serves three purposes: (1) reducing dimensionality, (2) computing factor weights, and (3) combining the weighted factors into a composite score.
2、 Input / output description
Input: two or more quantitative variables (say, N variables).
Output: the data can be reduced to as few as 1 dimension (a single variable, typically used for composite evaluation) or kept at up to N dimensions (typically used for data desensitization). The weight of each original variable in each new dimension is also obtained. It indicates how much of that variable's information is retained.
3、 Example
Using indicators such as each region's 2021 per-capita GDP and per-capita disposable income, rank the economic development level of multiple provinces, municipalities, and regions quantitatively, or derive the weight of each indicator.
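The ranking use case can be sketched end to end in Python with NumPy. This is a minimal illustration under several assumptions and not the author's implementation. The data are randomly generated stand-ins, not real 2021 statistics. The table has one row per region and one column per indicator. Factors are extracted with the eigendecomposition method described in section 4, and scores are estimated with the regression method. The composite score weights each factor by its share of the retained variance, a common convention that the article does not spell out. The helper name `factor_composite_scores` and the sign-flipping rule are also hypothetical.

```python
import numpy as np

def factor_composite_scores(X, threshold=0.90):
    """Hypothetical helper. X is an n x p array: one row per region, one
    column per indicator. Returns (composite scores, m, factor weights)."""
    X = np.asarray(X, dtype=float)
    # Standardize each indicator to mean 0 and sample variance 1.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Covariance of the standardized data = correlation matrix of X.
    S = np.corrcoef(X, rowvar=False)
    # Eigendecomposition; eigh returns ascending eigenvalues, so re-sort descending.
    eigvals, eigvecs = np.linalg.eigh(S)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Smallest m whose cumulative variance contribution reaches the threshold.
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    m = min(int(np.searchsorted(cumulative, threshold)) + 1, len(eigvals))
    # Loadings: column j is sqrt(lambda_j) * gamma_j. Eigenvector signs are
    # arbitrary, so (assumed convention) flip each factor so its loadings sum > 0.
    A = eigvecs[:, :m] * np.sqrt(eigvals[:m])
    A = A * np.where(A.sum(axis=0) < 0, -1.0, 1.0)
    # Regression factor scores: row r is z_r^T S^{-1} A (n x m overall).
    F = Z @ np.linalg.solve(S, A)
    # Assumed weighting: each factor's share of the retained variance.
    weights = eigvals[:m] / eigvals[:m].sum()
    return F @ weights, m, weights

# Randomly generated stand-in data: 8 regions, 4 indicators driven by one
# hidden "development level" plus noise.
rng = np.random.default_rng(3)
level = rng.normal(size=8)
X = np.column_stack([level + 0.3 * rng.normal(size=8) for _ in range(4)])
score, m, weights = factor_composite_scores(X)
ranking = np.argsort(score)[::-1]  # region indices, highest composite score first
print(m, np.round(weights, 3), ranking)
```

The sign flip matters for ranking. Without it, an eigenvector returned with the opposite sign would silently reverse the order of the regions.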

4、 Modeling steps
Factor analysis is a multivariate statistical method that uses the correlations among variables to reduce many variables to a few common factors. The basic idea is to split each original variable into two parts. The first part is a linear combination of the common factors and condenses most of the information in the original variables. The second part is a special (specific) factor that is unrelated to the common factors. It captures the gap between that linear combination and the original variable. For a p-dimensional variable $x = [x_1, x_2, \ldots, x_p]^{\mathsf T}$, the factor analysis model is:

$$x_i = a_{i1} f_1 + a_{i2} f_2 + \cdots + a_{im} f_m + \varepsilon_i, \qquad i = 1, 2, \ldots, p$$

or, in matrix form,

$$x = A f + \varepsilon$$
where $f = [f_1, f_2, \ldots, f_m]^{\mathsf T}$ is the vector of extracted common factors. It represents $m$ ($m < p$) mutually independent common influences that exist in the original variables but cannot be observed directly. $A = (a_{ik})_{p \times m}$ is the factor loading matrix, and its element $a_{ik}$ is the loading of variable $x_i$ on common factor $f_k$. For standardized variables and independent, unit-variance factors, $a_{ik}$ is the correlation coefficient between $x_i$ and $f_k$: the larger its absolute value, the stronger the relationship. $\varepsilon = [\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_p]^{\mathsf T}$ is the special factor vector.
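To make the model concrete, the Python/NumPy sketch below generates data from $x = Af + \varepsilon$ using a made-up loading matrix. It then checks one consequence of the model's assumptions (independent, unit-variance common factors that are uncorrelated with the special factors): the covariance of $x$ equals $AA^{\mathsf T}$ plus a diagonal matrix of special-factor variances. The loading values, the sample size, and the name `psi` are illustrative assumptions.

```python
import numpy as np

# Hypothetical loadings for p = 4 observed variables and m = 2 common factors.
A = np.array([[0.9, 0.1],
              [0.8, 0.3],
              [0.2, 0.85],
              [0.1, 0.9]])
p, m = A.shape
# Special-factor variances chosen so that each x_i has unit variance:
# var(x_i) = sum_k a_ik^2 + psi_i = 1.
psi = 1.0 - (A ** 2).sum(axis=1)

rng = np.random.default_rng(0)
n = 200_000
f = rng.standard_normal((m, n))                            # independent factors, variance 1
eps = rng.standard_normal((p, n)) * np.sqrt(psi)[:, None]  # special factors
x = A @ f + eps                                            # matrix form of the model, p x n

# Covariance implied by the model vs. the empirical covariance of the samples.
implied = A @ A.T + np.diag(psi)
empirical = np.cov(x)  # rows are variables (rowvar=True by default)
print(np.round(implied, 3))
print(np.round(empirical, 3))
```

With this many samples, the two matrices agree to roughly two decimal places.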
For a multidimensional variable $x$, the key to building the factor analysis model is solving for the factor loading matrix $A$ and the common factor vector $f$. The steps are as follows:
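1. To remove the influence of the variables' different units, standardize the sample $X = [x_1, x_2, \ldots, x_n]$, which contains $n$ observations of the $p$-dimensional variable. After standardization, each variable has mean 0 and variance 1. For convenience, the standardized data are still denoted $X$. With $x_{ij}$ the value of variable $i$ in sample $j$, its elements are:

$$x_{ij} \leftarrow \frac{x_{ij} - \bar{x}_i}{\sigma_i}, \qquad \bar{x}_i = \frac{1}{n}\sum_{j=1}^{n} x_{ij}, \qquad \sigma_i = \sqrt{\frac{1}{n-1}\sum_{j=1}^{n} \left(x_{ij} - \bar{x}_i\right)^2}$$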
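2. Compute the sample covariance matrix $S = (s_{il})_{p \times p}$ of the standardized data, whose elements are:

$$s_{il} = \frac{1}{n-1}\sum_{j=1}^{n} x_{ij}\, x_{lj}, \qquad i, l = 1, 2, \ldots, p$$

Because the data are standardized, $S$ is also the correlation matrix of the original variables.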
3. Perform an eigendecomposition of $S$ to obtain $p$ eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p \ge 0$ with corresponding eigenvectors $\gamma_1, \gamma_2, \ldots, \gamma_p$. Estimate the factor loading matrix from the eigenvectors of the $m$ largest eigenvalues. To give each component of the common factor vector a variance of 1, divide each component by its standard deviation $\sqrt{\lambda_j}$, and multiply the corresponding eigenvector $\gamma_j$ in the loading matrix by $\sqrt{\lambda_j}$. The factor loading matrix is therefore:

$$A = \left[\sqrt{\lambda_1}\,\gamma_1,\ \sqrt{\lambda_2}\,\gamma_2,\ \ldots,\ \sqrt{\lambda_m}\,\gamma_m\right]$$
The number of factors $m$ is determined by the cumulative variance contribution rate of the common factors:

$$\eta_m = \frac{\sum_{j=1}^{m} \lambda_j}{\sum_{j=1}^{p} \lambda_j}$$
It is generally held that once the cumulative variance contribution rate of the first $m$ common factors exceeds 90%, a linear combination of those $m$ common factors can essentially recover the information in the original variables.
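The rule for choosing $m$ can be sketched in a few lines of Python with NumPy. The function name `choose_num_factors` and the example eigenvalues are hypothetical. The sketch treats "exceeds 90%" as "reaches at least 90%" (a `>=` comparison), which is an assumption about boundary handling.

```python
import numpy as np

def choose_num_factors(eigvals, threshold=0.90):
    """Return the smallest m whose cumulative variance contribution
    (lambda_1 + ... + lambda_m) / (lambda_1 + ... + lambda_p) reaches
    `threshold`, together with the cumulative contribution curve."""
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]  # descending order
    cumulative = np.cumsum(lam) / lam.sum()
    # First index where cumulative >= threshold; +1 turns it into a count.
    m = int(np.searchsorted(cumulative, threshold)) + 1
    return min(m, len(lam)), cumulative

# Hypothetical eigenvalues of a 4 x 4 correlation matrix (they sum to p = 4).
m, cumulative = choose_num_factors([2.4, 0.9, 0.5, 0.2])
print(m)  # 3: cumulative contributions are 0.6, 0.825, 0.95, 1.0
```

Lowering the threshold to 80% would keep only 2 factors in this example. The choice of threshold is therefore a real modeling decision, not a formality.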
The common factor vector $f$, i.e., the score of each sample on the common factors, can be estimated by the regression method:

$$\hat{f} = A^{\mathsf T} S^{-1} x$$

where $x$ is a standardized observation (one column of $X$).
Once these steps have produced the factor loading matrix and the common factor vector, the special factor vector of the original variables is:

$$\varepsilon = x - A\hat{f}$$