Regional economic research in R: principal component analysis (PCA), factor analysis, and cluster analysis of Chongqing's economic indicators

2022-07-07 04:42:00 Extension Research Office

Link to the original text: http://tecdat.cn/?p=27515

Source of the original text: the Tecdat official account

 

This article builds an economic development index system for Chongqing. Taking Chongqing's one-hour economic circle as the sample, it applies factor analysis in an empirical study. Based on the relevant evaluation theories and methods, it extracts 3 main factors (economic scale, per capita development level, and economic development potential) from an index system of 8 indicators selected from the Chongqing Statistical Yearbook, uses these eight indicators to analyze the basic economic development of Chongqing's 38 districts and counties, and then runs a cluster analysis of the 38 districts and counties on the main factor score matrix. The results are as follows. By comprehensive score, the top three districts in social and economic development are Yuzhong District, Yubei District, and Jiulongpo District, and the three lowest are Wushan County, Wuxi County, and Chengkou County. The overall analysis shows that Yuzhong District and Jiulongpo District lead in overall economic scale and in the construction industry. The areas surrounding Chongqing have weak economic strength and a poor investment environment, and their construction industry in particular is underdeveloped, so their economic development is relatively weak. On every measure, Yuzhong District has the strongest economy of all districts and counties in Chongqing.

Establishment of evaluation indicators

Evaluating the level of economic development across regions requires an appropriate index system. Considering the complexity, diversity, and operability of regional economic indicators, this article establishes a relatively complete regional economic evaluation index system that lends itself to quantitative analysis and reflects the characteristics of regional economic development from different angles.

The index system contains 8 indicators that reflect regional economic development in terms of economic scale, per capita development level, economic development potential, and related aspects. The specific indicators are as follows:

Gross regional product (10,000 yuan) (X1)

Total retail sales of consumer goods (10,000 yuan) (X2)

Total industrial output value (10,000 yuan) (X3)

Total output value of the construction industry (10,000 yuan) (X4)

High-tech GDP (10,000 yuan) (X5)

Total fixed-asset investment (10,000 yuan) (X6)

Per capita disposable income (yuan) (X7)

GDP per capita (yuan) (X8)

Application of factor analysis in Regional Economic Research

Factor analysis model and its steps

Factor analysis is a data-reduction technique. It studies the internal dependencies among many variables, explores the basic structure of the observed data, and uses a few hypothetical variables to represent that structure. These hypothetical variables capture the main information in the original variables. The original variables are observable manifest variables; the hypothetical variables are unobservable latent variables, called factors. Given p variables $X_1, \ldots, X_p$, the mathematical model of factor analysis can be expressed as:

$$X_i = a_{i1}F_1 + a_{i2}F_2 + \cdots + a_{im}F_m + \varepsilon_i, \qquad i = 1, 2, \ldots, p$$

Here $F_1, \ldots, F_m$ are called the common factors and are unobservable variables; their coefficients $a_{ij}$ are called factor loadings. $\varepsilon_i$ is the specific factor, the part of $X_i$ that cannot be explained by the first m common factors. The steps of factor analysis are as follows:

(1) Standardize the raw data, still denoted X; (2) compute the correlation coefficient matrix R; (3) solve the characteristic equation to obtain the eigenvalues and eigenvectors, and once the cumulative contribution rate reaches at least 85%, extract m principal components to replace the original p indicators and compute the factor loading matrix A; (4) apply a varimax (variance-maximizing) orthogonal rotation to A; (5) name and interpret the main factors. To rank the samples, compute the score on each main factor, then use the contribution rates as weights to compute the weighted comprehensive factor score.
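Steps (1) to (3) can be sketched in base R roughly as follows. This is only an illustration: the article's data table is not reproduced here, so the 38 x 8 matrix `X` is synthetic, and the names `latent`, `W`, `contrib`, and `m` are assumptions introduced for the example.

```r
# Synthetic stand-in for the 38 x 8 indicator table (NOT the article's data).
set.seed(1)
n <- 38; p <- 8
latent <- matrix(rnorm(n * 3), n, 3)          # three hypothetical latent factors
W <- matrix(runif(3 * p, -1, 1), 3, p)        # hypothetical mixing weights
X <- latent %*% W + matrix(rnorm(n * p, sd = 0.3), n, p)
colnames(X) <- paste0("X", 1:p)

Z <- scale(X)                                 # (1) standardize each indicator
R <- cor(Z)                                   # (2) correlation coefficient matrix
e <- eigen(R)                                 # (3) eigenvalues and eigenvectors
contrib <- e$values / sum(e$values)           # variance contribution rate per component
m <- which(cumsum(contrib) >= 0.85)[1]        # smallest m with cumulative rate >= 85%

# Initial (unrotated) loading matrix: eigenvectors scaled by sqrt(eigenvalue).
A <- e$vectors[, 1:m, drop = FALSE] %*% diag(sqrt(e$values[1:m]), m)
rownames(A) <- colnames(X)

print(round(cumsum(contrib), 3))
print(round(A, 3))
```

The column sums of `A^2` reproduce the retained eigenvalues, which is a quick check that the loadings were scaled correctly.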

Sample selection and data source

This article takes Chongqing's 38 districts and counties as the sample. The aim is to explore how factor analysis and cluster analysis in the R statistical software can be used to study regional economic development. The specific data are as follows:

Data analysis process

Load the original data into R and select the variables: gross regional product (10,000 yuan) (X1), total retail sales of consumer goods (10,000 yuan) (X2), total industrial output value (10,000 yuan) (X3), total output value of the construction industry (10,000 yuan) (X4), high-tech GDP (10,000 yuan) (X5), total fixed-asset investment (10,000 yuan) (X6), per capita disposable income (yuan) (X7), and GDP per capita (yuan) (X8).

Before the factor analysis, we inspect the correlation coefficient matrix and use the KMO measure and Bartlett's test to check whether the data are suitable for factor analysis. We then run the descriptive step (Analysis-factor-description) to obtain the initial communalities, the factors, the eigenvalues, and the percentage and cumulative percentage of variance explained by each factor. The analysis results are as follows:

library(psych)
cortest.bartlett(COR, n = 38)  # Bartlett's test of sphericity; n = 38 assumes one row per district or county

 

The p-value of Bartlett's sphericity test (the significance probability, sig) is below 0.05, so the test is passed: the hypothesis that the correlation matrix is an identity matrix is rejected, the variables are sufficiently correlated, and factor analysis can be carried out.
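For readers who want to see what these checks compute, here is a base R sketch of Bartlett's statistic and the overall KMO measure, written out from their standard formulas. The data are synthetic (the article's table is not available), and the names `chi2`, `p_value`, `Q`, and `kmo` are chosen for illustration only.

```r
# Synthetic, deliberately correlated data: 38 rows, 8 columns (NOT the article's data).
set.seed(2)
n <- 38; p <- 8
f <- rnorm(n)                                           # one shared hypothetical factor
X <- sapply(1:p, function(j) f + rnorm(n, sd = 0.8))
R <- cor(X)

# Bartlett's test of sphericity: H0 says R is an identity matrix.
chi2 <- -(n - 1 - (2 * p + 5) / 6) * log(det(R))
df <- p * (p - 1) / 2
p_value <- pchisq(chi2, df, lower.tail = FALSE)

# Overall KMO: squared correlations versus squared partial correlations.
R_inv <- solve(R)
Q <- -R_inv / sqrt(outer(diag(R_inv), diag(R_inv)))    # partial correlation matrix
off <- row(R) != col(R)                                 # off-diagonal mask
kmo <- sum(R[off]^2) / (sum(R[off]^2) + sum(Q[off]^2))

cat("chi-square =", round(chi2, 2), " df =", df, " p =", signif(p_value, 3), "\n")
cat("KMO =", round(kmo, 3), "\n")
```

A small p-value together with a KMO well above 0.5 is the usual signal that the indicators share enough common variance for factor analysis.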

screeplot(PCA, type = "lines")  # scree plot of the eigenvalues

 

The table shows that the cumulative variance contribution rate of the first 3 factors reaches 89.854% > 85%, so only 10.146% of the information is lost. From the 4th factor onward, each factor's variance contribution rate is below 5%, so a factor analysis with 3 common factors works well. The scree plot likewise shows that from the 4th factor onward the differences between eigenvalues are small. In summary, with the criterion that eigenvalues exceed 0.5, the three extracted factors pass the test and describe the 8 indicators well, so the first 3 eigenvalues are used to build the factor loading matrix.
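The selection rules described above (cumulative contribution of at least 85%, later factors each below 5%) can be read off a PCA object along the following lines. This sketch assumes `PCA` was produced by `princomp(..., cor = TRUE)`, which the article does not state, and it runs on synthetic data, so the printed numbers will not match the article's table.

```r
# Synthetic 38 x 8 data with three correlated blocks (NOT the article's data).
set.seed(3)
n <- 38
base <- matrix(rnorm(n * 3), n, 3)
X <- cbind(base[, 1] + matrix(rnorm(n * 3, sd = 0.4), n, 3),
           base[, 2] + matrix(rnorm(n * 3, sd = 0.4), n, 3),
           base[, 3] + matrix(rnorm(n * 2, sd = 0.4), n, 2))

PCA <- princomp(X, cor = TRUE)        # PCA on the correlation matrix
eig <- PCA$sdev^2                     # eigenvalues
contrib <- eig / sum(eig)             # variance contribution rate
cum <- cumsum(contrib)                # cumulative contribution rate

k <- which(cum >= 0.85)[1]            # first component count reaching 85%
print(round(data.frame(eigenvalue = eig, contrib = contrib, cumulative = cum), 4))
cat("components needed for 85%:", k, "\n")

screeplot(PCA, type = "lines")        # look for the elbow where eigenvalues level off
```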

The table shows the initial factor loading matrix, where F1, F2, and F3 are the first, second, and third common factors. The purpose of factor analysis is not only to find the common factors and group the variables; it is more important to understand what each common factor means so that further analysis is possible. If the meaning of a common factor is unclear, it is hard to interpret against the real-world background. Because the factor loading matrix is not unique, it should be rotated. The goal is to simplify the structure of the loading matrix so that the squared loadings in each column or row polarize toward 0 and 1. There are three main orthogonal rotation methods: quartimax (fourth-power maximization), varimax (variance maximization), and equamax (equal maximization).

Factor rotation is therefore needed so that each factor's contribution to the variables becomes polarized. We use the varimax orthogonal rotation, which makes each variable load highly on one factor and only weakly on the others. The rotated factor loading matrix is shown in the following table:
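A minimal sketch of the varimax step using `stats::varimax` on an unrotated loading matrix. The data are synthetic and the choice of 3 factors simply mirrors the article; `A`, `B`, and `rot` are illustrative names.

```r
# Synthetic data and unrotated loadings for 3 factors (NOT the article's data).
set.seed(4)
n <- 38; p <- 8; m <- 3
latent <- matrix(rnorm(n * m), n, m)
W <- matrix(runif(m * p, -1, 1), m, p)
X <- latent %*% W + matrix(rnorm(n * p, sd = 0.3), n, p)
e <- eigen(cor(X))
A <- e$vectors[, 1:m] %*% diag(sqrt(e$values[1:m]))   # initial loading matrix
rownames(A) <- paste0("X", 1:p)

rot <- varimax(A)                 # orthogonal rotation (Kaiser-normalized by default)
B <- unclass(rot$loadings)        # rotated loading matrix as a plain matrix

# An orthogonal rotation redistributes variance among factors but leaves
# each variable's communality (row sum of squared loadings) unchanged.
h2_before <- rowSums(A^2)
h2_after <- rowSums(B^2)

print(round(B, 3))
print(round(cbind(h2_before, h2_after), 4))
```

Because the communalities are unchanged, the rotation affects only how the explained variance is shared among the factors, which is what makes the rotated loadings easier to name.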

The table and the rotated factor plot show that the rotated common factors explain the original data better. Common factor F1 has very large loadings on X1 (gross regional product), X6 (total fixed-asset investment), and X8 (GDP per capita). We therefore interpret the first common factor as the comprehensive economic strength factor. It reflects, at the macro level, the overall scale of regional economic development: the higher the score on this factor, the better the overall state of economic development.

The variance contribution rate of each common factor is used to compute the comprehensive score: comprehensive score = variance contribution rate of factor 1 * score on factor 1 + variance contribution rate of factor 2 * score on factor 2 + variance contribution rate of factor 3 * score on factor 3. Sorting the data in descending order of comprehensive score gives the partial factor scores and comprehensive scores shown in the figure below:
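The weighted comprehensive score can be sketched as below. Two assumptions are worth flagging: factor scores are estimated with the regression method, which the article does not specify, and the district labels `D1` to `D38` and the data are synthetic placeholders.

```r
# Synthetic data (NOT the article's data); 3 factors as in the article.
set.seed(7)
n <- 38; p <- 8; m <- 3
latent <- matrix(rnorm(n * m), n, m)
W <- matrix(runif(m * p, -1, 1), m, p)
X <- latent %*% W + matrix(rnorm(n * p, sd = 0.3), n, p)
rownames(X) <- paste0("D", 1:n)          # hypothetical district labels

Z <- scale(X)
R <- cor(Z)
e <- eigen(R)
A <- e$vectors[, 1:m] %*% diag(sqrt(e$values[1:m]))
B <- unclass(varimax(A)$loadings)        # rotated loadings

scores <- Z %*% solve(R, B)              # regression-method factor scores (assumed)
w <- colSums(B^2) / p                    # variance contribution rate of each rotated factor
composite <- drop(scores %*% w)          # sum over factors of rate * score
ranked <- sort(composite, decreasing = TRUE)

print(round(w, 4))
print(round(head(ranked, 5), 4))         # five highest comprehensive scores
```

Some authors divide the weights by their sum so that they add up to 1; that rescales the comprehensive score but does not change the ranking.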

Result discussion

From the factor scores above, the economic development of Chongqing's 38 districts and counties in 2012 can be summarized as follows:

1. On the economic strength factor F1, the districts scoring above 1 are Yuzhong District, Yubei District, Jiulongpo District, Jiangbei District, and Wanzhou District, with scores of 4.4211, 1.8967, 1.7808, 1.201, and 1.2804 respectively. In terms of overall economic scale and the construction industry, these five districts are the best of Chongqing's 38 districts and counties: they are large in scale, have the strongest economies and very good prospects, and are regions with strong economic development.

2. On the economic development potential factor F2, the districts scoring above 1 are Shapingba District and Yubei District, with scores of 3.7052 and 3.4396. They are relatively developed in high technology, science, and industry, and have the largest fixed-asset investment. Both are in the main urban area, are highly open, and do well in scientific and technological innovation. Each has its own industrial base and has largely formed its own industrial structure, makes full use of its geographical, resource, and environmental advantages, and has great potential for development.

Cluster analysis based on principal factor scores

Cluster analysis  

Cluster analysis, also called group analysis, groups data into several classes. Objects in the same class are highly similar, while objects in different classes differ greatly. Classification problems are common in the social and economic fields. For example, a survey of prices in some large cities involves many price indexes: the agricultural production price index, the service price index, the food consumer price index, the retail price index of building materials, and so on. Because there are so many indexes to examine, they are usually classified first. Since so many problems require classification, cluster analysis has attracted growing attention as a useful tool and is widely used in many fields.
Cluster analysis covers a rich set of methods, including hierarchical (systematic) clustering, ordered-sample clustering, dynamic clustering, fuzzy clustering, graph-theoretic clustering, and cluster prediction. The most commonly used and most successful is hierarchical clustering. Its basic idea is to treat each of the n samples as its own class and to define the "distance" between samples and the distance between classes. The two nearest classes are merged into a new class, the distances between the new class and the other current classes are computed, and the two nearest classes are merged again. Each merge reduces the number of classes by one, until all samples fall into a single class.
The basic steps of hierarchical clustering:
1、 Compute the pairwise distances between the n samples.
2、 Construct n classes, each containing a single sample.
3、 Merge the two closest classes into a new class.
4、 Compute the distance between the new class and each current class.
5、 Repeat steps 3 and 4, merging the two closest classes each time, until all classes are merged into one.
6、 Draw the cluster dendrogram.
7、 Determine the number of classes and the members of each class.
Hierarchical clustering methods: 1、 the shortest distance (single linkage) method; 2、 the longest distance (complete linkage) method; 3、 the median distance method; 4、 the centroid method; 5、 the class average (average linkage) method; 6、 the sum of squared deviations method (Ward's method).
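In R, `hclust` carries out the merge steps above, and the six methods listed correspond to its `method` argument. The mapping in the sketch below is my reading of the translated method names, and the score matrix is synthetic. The median and centroid methods are run on squared Euclidean distances, as the `hclust` documentation recommends.

```r
# Synthetic 38 x 3 "factor score" matrix (NOT the article's scores).
set.seed(5)
scores <- matrix(rnorm(38 * 3), 38, 3)
d <- dist(scores)                         # step 1: pairwise Euclidean distances

# Assumed mapping from the article's method names to hclust() methods.
methods <- c(shortest = "single", longest = "complete", middle = "median",
             barycenter = "centroid", class_average = "average", ward = "ward.D2")

trees <- lapply(methods, function(m) {
  dm <- if (m %in% c("median", "centroid")) d^2 else d   # squared distances where required
  hclust(dm, method = m)                  # steps 2 to 5: merge until one class remains
})

# Step 7: cut each tree into 3 classes and compare the class sizes.
sizes <- sapply(trees, function(tr) table(cutree(tr, k = 3)))
print(sizes)
```

Single linkage tends to chain samples into one large class, while Ward's method tends to give more balanced classes, so the size table is a quick way to see how much the choice of method matters.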

The economic development of Chongqing's 38 districts and counties is then clustered on the main factor scores. The clustering method is between-groups (average) linkage, the distance is the squared Euclidean distance, and the data are standardized to z-scores before processing. The results are as follows:

rect.hclust(hc, k = 6, border = "red")  # outline the 6 clusters on the dendrogram
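For context, a call like the one above needs a fitted tree `hc` and a dendrogram that has already been drawn. The following sketch shows one way to get there under the settings described (z-scores, squared Euclidean distance, average linkage). The scores and the labels `D1` to `D38` are synthetic placeholders, not the article's results.

```r
# Synthetic factor scores for 38 hypothetical districts (NOT the article's data).
set.seed(6)
scores <- matrix(rnorm(38 * 3), 38, 3)
rownames(scores) <- paste0("D", 1:38)

z <- scale(scores)                        # z-score standardization
d2 <- dist(z)^2                           # squared Euclidean distance
hc <- hclust(d2, method = "average")      # between-groups (average) linkage

plot(hc, hang = -1, main = "Cluster dendrogram", xlab = "", sub = "")
rect.hclust(hc, k = 6, border = "red")    # draw boxes around the 6 clusters

groups <- cutree(hc, k = 6)               # class membership for each district
print(table(groups))
print(split(names(groups), groups)[[1]])  # members of the first class
```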

The dendrogram shows that Chongqing's districts and counties can be divided into six categories according to their actual economic situation and strength:

The first category contains only Yuzhong District. Yuzhong District is the central urban district of Chongqing: it is the city's political, economic, and cultural center and a stronghold of basic education, with special location advantages and a prominent strategic position. Its industrial structure is dominated by the tertiary industry, with finance, commerce, and intermediary services as the leading industries. It is an area with strong economic development.

The second category contains only Yubei District. Yubei District has successively launched projects such as the Chongqing Science and Technology Industrial Park (with a master plan of 65 square kilometers), the Chongqing Modern Agricultural Park, and the Eastern Chongqing development area, and has been named "Chongqing Agricultural Science and Technology Park" by the municipal government. The district therefore contributes heavily to high-tech GDP and has a superior investment environment. Most of its areas have their own central business zone, it is highly open, its location advantage is clear, and its industrial structure is reasonable. It is an area with strong economic development.

Copyright notice
This article was created by [Extension Research Office]. Please include the original link when reposting. Thank you.
https://yzsam.com/2022/188/202207062212479629.html