Data dimensionality reduction: factor analysis
2022-07-02 19:19:00 【Lu 727】
1、 Purpose
Factor analysis is based on the idea of dimension reduction. With as little loss of the original data's information as possible, it aggregates a large set of interrelated variables into a few mutually independent common factors. These common factors capture the main information of the many variables, so they reduce the number of variables while also revealing the internal relationships among them. Factor analysis generally serves three purposes: first, dimension reduction; second, computing factor weights; third, computing a weighted composite score from the factors.
2、 Input/output description
Input: two or more quantitative variables (say N variables).
Output: at minimum, the data is reduced to 1 dimension (a single variable, generally used for comprehensive evaluation); at maximum, it stays at N variables (generally used for data desensitization). The composition weight of each variable after dimension reduction is also obtained, and indicates how much of each original variable's information is retained.
3、 Case example
Given regional indicators for 2021 such as per capita GDP and per capita disposable income, quantitatively rank the economic development level of multiple provinces, cities, and regions, or determine the weight of each indicator.

4、 Modeling steps
Factor analysis is a multivariate statistical method that, based on the correlations between variables, reduces multidimensional variables to a few common factors and then analyzes them. The basic idea is to split each original variable into two parts: one part is a linear combination of the common factors, which condenses most of the information in the original variables; the other part is a special factor that is unrelated to the common factors and reflects the gap between the linear combination of common factors and the original variable. For a p-dimensional variable $x = [x_1, x_2, \dots, x_p]^T$, the factor analysis model is:

$$x_i = a_{i1} f_1 + a_{i2} f_2 + \dots + a_{im} f_m + \varepsilon_i, \quad i = 1, 2, \dots, p$$

Or, in matrix form,

$$x = A f + \varepsilon$$

where $f = [f_1, f_2, \dots, f_m]^T$ is the extracted common factor vector, representing m (m < p) mutually independent common influencing factors that cannot be observed directly in the original variables but objectively exist; $A = (a_{ik})_{p \times m}$ is the factor loading matrix, whose element $a_{ik}$ is the loading of variable $x_i$ on common factor $f_k$. The loading reflects the correlation coefficient between the two: the greater its absolute value, the stronger the correlation. $\varepsilon = [\varepsilon_1, \varepsilon_2, \dots, \varepsilon_p]^T$ is the special factor vector.
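To make the model concrete, here is a minimal simulation sketch of $x = A f + \varepsilon$ in Python with NumPy. The loading matrix `A`, the sample count `n`, and the noise level `noise_std` are all made-up values chosen for illustration; they do not come from the article.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000  # number of simulated samples (hypothetical)

# Hypothetical loading matrix A: p = 5 variables, m = 2 common factors.
A = np.array([
    [0.9, 0.1],
    [0.8, 0.2],
    [0.7, 0.3],
    [0.2, 0.8],
    [0.1, 0.9],
])
p, m = A.shape
noise_std = 0.3  # assumed standard deviation of each special factor

f = rng.standard_normal((m, n))                # independent common factors, unit variance
eps = noise_std * rng.standard_normal((p, n))  # special factors, independent of f
x = A @ f + eps                                # x = A f + eps, one column per sample

# Under these assumptions the covariance of x is A A^T + noise_std^2 * I.
implied_cov = A @ A.T + noise_std**2 * np.eye(p)
empirical_cov = np.cov(x)
print(np.abs(empirical_cov - implied_cov).max())  # small for large n
```

The first three variables load mostly on the first factor and the last two on the second, so the simulated variables fall into two correlated groups, which is the kind of structure factor analysis tries to recover.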
To establish the factor analysis model for a multidimensional variable x, the key is to solve for the factor loading matrix A and the common factor vector f. The steps are as follows:
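1. To eliminate the influence of the variables' different units of measurement, standardize the sample $X = [x_1, x_2, \dots, x_n]$ of n observations of the p-dimensional variable. After standardization, each variable has mean 0 and variance 1. For convenience, the standardized variables are still denoted X, with elements:

$$x_{ij} \leftarrow \frac{x_{ij} - \bar{x}_j}{s_j}, \quad \bar{x}_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}, \quad s_j = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2}$$

where $x_{ij}$ is the value of the j-th variable in the i-th sample (i = 1, …, n; j = 1, …, p).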
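A minimal sketch of the standardization step, assuming the data is stored as a NumPy array with one row per sample and one column per variable. The function name `standardize` and the tiny input array are illustrative only.

```python
import numpy as np

def standardize(X):
    """Z-score each column of X (shape (n, p), rows are samples)."""
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)  # sample standard deviation (divides by n - 1)
    return (X - mean) / std

# Made-up values on two very different scales.
X_raw = np.array([
    [1.0, 200.0],
    [2.0, 220.0],
    [3.0, 260.0],
    [4.0, 240.0],
])
Z = standardize(X_raw)
print(Z.mean(axis=0), Z.std(axis=0, ddof=1))
```

Using `ddof=1` keeps the variance definition consistent with a covariance matrix that divides by n - 1.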
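2. Compute the sample covariance matrix S, with elements:

$$s_{jk} = \frac{1}{n-1}\sum_{i=1}^{n} x_{ij}\, x_{ik}, \quad j, k = 1, 2, \dots, p$$

where $x_{ij}$ is the standardized value of the j-th variable in the i-th sample (the means are zero after standardization).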
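The covariance step can be sketched as follows, again assuming rows are samples. Because the data is standardized first, the result should coincide with the correlation matrix of the raw data, which the example checks against `np.corrcoef`. The random data and the name `sample_covariance` are illustrative.

```python
import numpy as np

def sample_covariance(Z):
    """Covariance of standardized data Z (shape (n, p)); column means are assumed to be 0."""
    n = Z.shape[0]
    return Z.T @ Z / (n - 1)

rng = np.random.default_rng(1)
X_raw = rng.normal(size=(50, 3))
X_raw[:, 1] += 0.8 * X_raw[:, 0]  # make two columns correlated

Z = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0, ddof=1)
S = sample_covariance(Z)
print(np.allclose(S, np.corrcoef(X_raw, rowvar=False)))
```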
3. Perform an eigenvalue decomposition of the sample covariance matrix S to obtain p eigenvalues $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p \ge 0$ with corresponding eigenvectors $\gamma_1, \gamma_2, \dots, \gamma_p$. The eigenvectors of the m largest eigenvalues are used to estimate the factor loading matrix. To ensure that each component of the common factor vector has variance 1, each component is divided by its standard deviation $\sqrt{\lambda_j}$; correspondingly, the eigenvector $\gamma_j$ in the factor loading matrix is multiplied by $\sqrt{\lambda_j}$. The factor loading matrix is therefore:

$$A = \left[\sqrt{\lambda_1}\,\gamma_1,\ \sqrt{\lambda_2}\,\gamma_2,\ \dots,\ \sqrt{\lambda_m}\,\gamma_m\right]$$
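A sketch of this loading estimate using `np.linalg.eigh`, which is intended for symmetric matrices and returns eigenvalues in ascending order, so they are re-sorted here. The function name `factor_loadings` and the small matrix `S` are made up for illustration.

```python
import numpy as np

def factor_loadings(S, m):
    """Estimate the p x m loading matrix from the m largest eigenpairs of S."""
    eigvals, eigvecs = np.linalg.eigh(S)        # ascending order for symmetric S
    order = np.argsort(eigvals)[::-1]           # re-sort to descending
    eigvals = np.clip(eigvals[order], 0, None)  # guard against tiny negative values
    eigvecs = eigvecs[:, order]
    A = eigvecs[:, :m] * np.sqrt(eigvals[:m])   # scale column j by sqrt(lambda_j)
    return A, eigvals

# Hypothetical correlation matrix of three standardized variables.
S = np.array([
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.2],
    [0.1, 0.2, 1.0],
])
A, eigvals = factor_loadings(S, m=2)
print(A.shape, eigvals)
```

With this scaling, the sum of squared loadings in column j equals $\lambda_j$, and keeping all p factors reproduces S exactly as $A A^T$.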
The parameter m is determined by the cumulative variance contribution rate of the common factors, namely

$$\frac{\sum_{j=1}^{m} \lambda_j}{\sum_{j=1}^{p} \lambda_j}$$
It is generally held that when the cumulative variance contribution rate of the first m common factors exceeds 90%, a linear combination of the first m common factors can essentially restore the information in the original variables.
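The selection rule can be sketched as a small helper that returns the smallest m whose cumulative contribution reaches the threshold. The name `choose_num_factors` and the eigenvalues in the example are hypothetical; the 90% default follows the rule of thumb above.

```python
import numpy as np

def choose_num_factors(eigvals, threshold=0.90):
    """Smallest m whose cumulative variance contribution reaches the threshold."""
    eigvals = np.sort(np.asarray(eigvals, dtype=float))[::-1]  # descending
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    m = int(np.argmax(cumulative >= threshold)) + 1  # first index that qualifies
    return m, cumulative

# Made-up eigenvalues of a 4 x 4 covariance matrix.
m, cumulative = choose_num_factors([2.5, 1.2, 0.2, 0.1])
print(m, cumulative)
```

Here the first factor alone explains 62.5% of the variance and the first two explain 92.5%, so two factors are kept.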
The common factor vector f, that is, the scores of the original variables on the common factors, can be estimated by the regression method:

$$f = A^T S^{-1} x$$
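A sketch of regression-style scoring, assuming the common form $f = A^T S^{-1} x$ and data stored with one row per sample. The synthetic data, the mixing matrix `mix`, and the function name `factor_scores` are assumptions for illustration. Solving a linear system avoids forming $S^{-1}$ explicitly.

```python
import numpy as np

def factor_scores(Z, A, S):
    """Regression-method scores; row i is (A^T S^{-1} x_i)^T."""
    W = np.linalg.solve(S, A)  # W = S^{-1} A, shape (p, m)
    return Z @ W               # S is symmetric, so Z @ W stacks A^T S^{-1} x_i

# Synthetic data: 2 latent drivers mixed into 4 observed variables plus noise.
rng = np.random.default_rng(2)
base = rng.normal(size=(200, 2))
mix = np.array([[1.0, 0.8, 0.1, 0.2],
                [0.1, 0.3, 1.0, 0.7]])
X_raw = base @ mix + 0.5 * rng.normal(size=(200, 4))

Z = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0, ddof=1)
S = Z.T @ Z / (len(Z) - 1)

eigvals, eigvecs = np.linalg.eigh(S)
top = np.argsort(eigvals)[::-1][:2]          # indices of the 2 largest eigenvalues
A = eigvecs[:, top] * np.sqrt(eigvals[top])  # loading matrix from step 3

F = factor_scores(Z, A, S)
print(np.cov(F, rowvar=False).round(6))      # close to the identity matrix
```

With loadings built from eigenvectors as in step 3, the resulting scores are uncorrelated and have unit variance, matching the model's assumption about the common factors.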
After the factor loading matrix and the common factor vector have been obtained through the steps above, the special factor vector of the original variables is:

$$\varepsilon = x - A f$$
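The whole procedure, ending with the special factors, can be sketched end to end as follows. This is one possible arrangement under the assumptions used above (rows are samples, loadings from the eigen-decomposition, regression-method scores); the synthetic data and m = 2 are illustrative choices.

```python
import numpy as np

# Synthetic data: 2 latent drivers mixed into 4 observed variables plus noise.
rng = np.random.default_rng(3)
base = rng.normal(size=(300, 2))
mix = np.array([[1.0, 0.8, 0.1, 0.2],
                [0.1, 0.3, 1.0, 0.7]])
X_raw = base @ mix + 0.5 * rng.normal(size=(300, 4))

# Steps 1 and 2: standardize, then compute the covariance matrix.
Z = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0, ddof=1)
S = Z.T @ Z / (len(Z) - 1)

# Step 3: eigen-decomposition, sorted in descending order of eigenvalue.
eigvals, eigvecs = np.linalg.eigh(S)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

m = 2
A = eigvecs[:, :m] * np.sqrt(eigvals[:m])  # factor loading matrix
F = Z @ np.linalg.solve(S, A)              # regression-method factor scores
E = Z - F @ A.T                            # special factors: row i is x_i - A f_i

# The variance left in the special factors equals the discarded eigenvalues.
residual_variance = np.trace(np.cov(E, rowvar=False))
print(residual_variance, eigvals[m:].sum())
```

In this construction the special factors are uncorrelated with the factor scores, and their total variance is exactly the part of the variance not covered by the cumulative contribution rate of the first m factors.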