Regional economic analysis in R: principal component analysis (PCA), factor analysis, and cluster analysis of Chongqing's economic indicators
2022-07-07 04:42:00 【Extension Research Office】
Link to the original text: http://tecdat.cn/?p=27515
Source of the original text: the Tribe official account
This paper establishes an economic indicator system for Chongqing and, taking Chongqing's one-hour economic circle as the sample, carries out an empirical study with factor analysis. Based on the relevant evaluation theories and methods, it extracts 3 main factors: economic scale, per capita development level, and economic development potential. An index system of 8 indicators selected from the Chongqing Statistical Yearbook is used to analyze the basic economic development of Chongqing's 38 districts and counties, and the districts and counties are then clustered on the main factor score matrix. The results show that, by comprehensive score, the top three districts in social and economic development are Yuzhong District, Yubei District, and Jiulongpo District, while the three lowest scores belong to Wushan County, Wuxi County, and Chengkou County. Taken together, Yuzhong District and Jiulongpo District perform better in overall economic scale and in the construction industry. The outlying areas of Chongqing are economically weaker and have a poorer investment environment, with an especially underdeveloped construction industry, so their economic development is relatively weak. On every measure, Yuzhong District has the strongest economy among all districts and counties in Chongqing.
Establishment of evaluation indicators
To evaluate the level of economic development across regions, an appropriate index system must be established. Taking into account the complexity, diversity, and operability of regional economic indicators, this paper establishes a relatively complete regional economic evaluation index system that lends itself to quantitative analysis and reflects the characteristics of regional economic development from different angles.
The index system includes 8 indicators, which reflect regional economic development in terms of economic scale, per capita development level, and economic development potential. The specific indicators are as follows:
Gross regional product (10,000 yuan) (X1)
Total retail sales of consumer goods (10,000 yuan) (X2)
Total industrial output value (10,000 yuan) (X3)
Total output value of the construction industry (10,000 yuan) (X4)
High-tech GDP (10,000 yuan) (X5)
Total fixed-asset investment (10,000 yuan) (X6)
Per capita disposable income (yuan) (X7)
GDP per capita (yuan) (X8)
Application of factor analysis in Regional Economic Research
Factor analysis model and its steps
Factor analysis is a data reduction technique. It studies the internal dependencies among many variables, explores the basic structure of the observed data, and represents that structure with a small number of hypothetical variables that carry the main information of the original variables. The original variables are observable manifest variables; the hypothetical variables are unobservable latent variables, called factors. Given p variables $X_1, X_2, \ldots, X_p$, the mathematical model of factor analysis can be expressed as:
$$X_i = a_{i1}F_1 + a_{i2}F_2 + \cdots + a_{im}F_m + \varepsilon_i, \quad i = 1, 2, \ldots, p$$
Here $F_1, F_2, \ldots, F_m$ are called the common factors and are unobservable variables, and their coefficients $a_{ij}$ are called the factor loadings. $\varepsilon_i$ is the specific factor, the part of $X_i$ that cannot be explained by the first m common factors. The steps of factor analysis are as follows:
(1) Standardize the raw data; the result is still denoted X.
(2) Compute the correlation coefficient matrix R.
(3) Solve the characteristic equation of R to obtain the eigenvalues and eigenvectors. Once the cumulative contribution rate is at least 85%, keep the first k principal components in place of the original p indicators and compute the factor loading matrix A.
(4) Apply a varimax (variance-maximizing) orthogonal rotation to A.
(5) Name and interpret the main factors. If a ranking is needed, compute the score of each main factor and then, using the contribution rates as weights, compute the weighted comprehensive factor score.
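Steps (1) to (3) can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the original analysis: the data are synthetic stand-ins for the 38 x 8 indicator table, and the names `X`, `Z`, `cum_contrib`, and `A` are hypothetical.

```r
# Synthetic stand-in for the real data: 38 districts x 8 indicators (assumption).
set.seed(1)
n <- 38; p <- 8
latent <- matrix(rnorm(n * 3), n, 3)            # three hypothetical latent factors
mixing <- matrix(runif(3 * p, -1, 1), 3, p)     # hypothetical mixing weights
X <- latent %*% mixing + matrix(rnorm(n * p, sd = 0.3), n, p)
colnames(X) <- paste0("X", 1:p)

Z <- scale(X)                                   # (1) standardize each indicator
R <- cor(Z)                                     # (2) correlation coefficient matrix
e <- eigen(R)                                   # (3) eigenvalues and eigenvectors of R

contrib     <- e$values / sum(e$values)         # contribution rate of each component
cum_contrib <- cumsum(contrib)                  # cumulative contribution rate
k <- which(cum_contrib >= 0.85)[1]              # smallest k reaching the 85% threshold

# Factor loading matrix: eigenvectors scaled by the square roots of the eigenvalues.
A <- e$vectors[, 1:k, drop = FALSE] %*% diag(sqrt(e$values[1:k]), k)
rownames(A) <- colnames(X)

print(round(cum_contrib, 3))
print(round(A, 3))
```

Because R is a correlation matrix, its eigenvalues sum to the number of indicators, so each contribution rate is simply the eigenvalue divided by p.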
Sample selection and data source
This paper takes Chongqing's 38 districts and counties as the sample. The aim is to explore how factor analysis and cluster analysis in the R statistical software can be used to study regional economic development. The specific data are as follows:
Data analysis process
The raw data are loaded into R with the eight indicators: gross regional product (10,000 yuan) (X1), total retail sales of consumer goods (10,000 yuan) (X2), total industrial output value (10,000 yuan) (X3), total output value of the construction industry (10,000 yuan) (X4), high-tech GDP (10,000 yuan) (X5), total fixed-asset investment (10,000 yuan) (X6), per capita disposable income (yuan) (X7), and GDP per capita (yuan) (X8).
Before the factor analysis, we inspect the correlation coefficient matrix and use the KMO measure and Bartlett's test to check whether the data are suitable for factor analysis. The descriptive step (Analyze, Factor, Descriptives) then gives the initial communalities, the factors, the eigenvalues, and the percentage and cumulative percentage of variance explained by each factor. The analysis results are as follows:
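library(psych)  # provides cortest.bartlett()
cortest.bartlett(COR, n = 38)  # Bartlett's test of sphericity; n = 38 is assumed from the 38 districts and counties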
The p value (significance, Sig.) of Bartlett's test of sphericity is below 0.05, so the test is passed: the hypothesis that the correlation matrix is an identity matrix is rejected, the indicators are sufficiently correlated, and factor analysis can therefore be carried out.
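For readers without the psych package, both suitability checks can be computed directly from the correlation matrix. The sketch below uses the standard formulas for Bartlett's statistic and the overall KMO measure; the function names `bartlett_sphericity` and `kmo_overall` and the synthetic data are assumptions made for illustration.

```r
# Bartlett's test of sphericity: H0 is that the correlation matrix is the identity.
bartlett_sphericity <- function(R, n) {
  p <- ncol(R)
  log_det <- as.numeric(determinant(R, logarithm = TRUE)$modulus)  # log|R|
  chisq <- -(n - 1 - (2 * p + 5) / 6) * log_det
  df <- p * (p - 1) / 2
  list(chisq = chisq, df = df,
       p.value = pchisq(chisq, df, lower.tail = FALSE))
}

# Overall KMO: squared correlations relative to squared correlations
# plus squared partial correlations (off-diagonal entries only).
kmo_overall <- function(R) {
  Rinv <- solve(R)
  Q <- -Rinv / sqrt(outer(diag(Rinv), diag(Rinv)))   # partial correlations
  off <- row(R) != col(R)
  sum(R[off]^2) / (sum(R[off]^2) + sum(Q[off]^2))
}

# Synthetic, strongly correlated stand-in data (38 x 8), not the Chongqing data.
set.seed(7)
n <- 38; p <- 8
f <- rnorm(n)
X <- sapply(1:p, function(j) f + rnorm(n, sd = 0.5))
COR <- cor(X)

res <- bartlett_sphericity(COR, n)
kmo <- kmo_overall(COR)
print(res)
print(kmo)
```

A small p value rejects the identity-matrix hypothesis, and a KMO value closer to 1 indicates that the variables share enough common variance for factor analysis.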
screeplot(PCA, type = "lines")  # scree plot of the eigenvalues
The table shows that the cumulative variance contribution rate of the 3 extracted factors reaches 89.854%, above the 85% threshold, so only 10.146% of the information is lost. From the 4th factor onward, each factor's variance contribution rate is below 5%, so a factor analysis with 3 common factors works well. The scree plot also shows that from the 4th factor onward the eigenvalues change very little. In summary, with the condition that the eigenvalue be greater than 0.5, the three extracted factors pass the test and describe the 8 indicators well, so the first 3 eigenvalues are used to build the factor loading matrix.
The table shows the initial factor loading matrix, where F1, F2, and F3 are the first, second, and third common factors. The purpose of factor analysis is not only to find the common factors and group the variables; it is more important to understand what each common factor means so that further analysis is possible. If the meaning of the common factors is unclear, they are hard to interpret against the real-world background. Because the factor loading matrix is not unique, it can be rotated. The goal is to simplify the structure of the loading matrix so that the squared loadings in each column or row are polarized toward 0 and 1. There are three main orthogonal rotation methods: the quartimax method, the varimax (variance-maximizing) method, and the equamax method.
Factor rotation is therefore applied so that the contributions of the factors to the variables become polarized. The varimax orthogonal rotation is used here, so that each variable loads highly on one factor and only weakly on the others. The rotated factor loading matrix is shown in the following table:
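The rotation step can be sketched with `varimax()` from R's built-in stats package. The loading matrix below comes from synthetic data with three factors kept by assumption; it is not the table from the article.

```r
# Synthetic 38 x 8 stand-in data and an unrotated 3-factor loading matrix (assumptions).
set.seed(2)
n <- 38; p <- 8
latent <- matrix(rnorm(n * 3), n, 3)
X <- latent %*% matrix(runif(3 * p, -1, 1), 3, p) + matrix(rnorm(n * p, sd = 0.3), n, p)
colnames(X) <- paste0("X", 1:p)

e <- eigen(cor(X))
A <- e$vectors[, 1:3] %*% diag(sqrt(e$values[1:3]))   # initial loading matrix
rownames(A) <- colnames(X)

rot <- varimax(A)              # orthogonal variance-maximizing rotation
L <- unclass(rot$loadings)     # rotated loadings as a plain matrix
colnames(L) <- c("F1", "F2", "F3")

print(round(A, 3))             # before rotation
print(round(L, 3))             # after rotation: loadings pushed toward 0 or +/-1

# An orthogonal rotation redistributes variance among the factors but leaves
# each variable's communality (row sum of squared loadings) unchanged.
print(round(cbind(before = rowSums(A^2), after = rowSums(L^2)), 3))
```

The last comparison is a quick sanity check: if the communalities differ before and after rotation, the rotation was not orthogonal.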
The table and the rotated factor plot show that the rotated common factors explain the original data better. Common factor F1 has very large loadings on X1 (gross regional product), X6 (total fixed-asset investment), and X8 (GDP per capita). The first common factor can therefore be named the comprehensive economic strength factor. At the macro level it reflects the overall scale of regional economic development: the higher a district scores on this factor, the better its overall economic development.
The comprehensive score is computed using the variance contribution rate of each common factor as the weight: comprehensive score = (variance contribution rate of factor 1 × score on factor 1) + (variance contribution rate of factor 2 × score on factor 2) + (variance contribution rate of factor 3 × score on factor 3). The data are then sorted in descending order of comprehensive score. Part of the factor scores and comprehensive scores obtained are shown in the figure below:
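The weighted score can be sketched as follows. The article does not say how the factor scores were estimated, so the regression method used here is an assumption, as are the synthetic data and the names `scores`, `weights`, and `composite`.

```r
# Synthetic 38 x 8 stand-in data (assumption); rows play the role of districts.
set.seed(42)
n <- 38; p <- 8
latent <- matrix(rnorm(n * 3), n, 3)
X <- latent %*% matrix(runif(3 * p, -1, 1), 3, p) + matrix(rnorm(n * p, sd = 0.3), n, p)
rownames(X) <- paste0("District", 1:n)

Z <- scale(X)
R <- cor(Z)
e <- eigen(R)
A <- e$vectors[, 1:3] %*% diag(sqrt(e$values[1:3]))   # unrotated loadings
L <- unclass(varimax(A)$loadings)                     # rotated loadings

# Regression-method factor scores (an assumed estimator).
scores <- Z %*% solve(R) %*% L
colnames(scores) <- c("F1", "F2", "F3")

# Variance contribution rate of each rotated factor, used as the weight.
weights <- colSums(L^2) / p

# Comprehensive score = sum over factors of (contribution rate * factor score).
composite <- drop(scores %*% weights)
ranking <- sort(composite, decreasing = TRUE)

print(round(weights, 3))
print(round(head(ranking, 3), 3))   # three highest comprehensive scores
print(round(tail(ranking, 3), 3))   # three lowest comprehensive scores
```

Some authors divide the weights by their sum so that they add up to 1; this rescales the comprehensive scores but does not change the ranking.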
Result discussion
Based on the factor scores above, the economic development of Chongqing's 38 districts and counties in 2012 can be summarized as follows:
1. On the economic strength factor F1, the districts with scores greater than 1 are Yuzhong District, Yubei District, Jiulongpo District, Jiangbei District, and Wanzhou District, with scores of 4.4211, 1.8967, 1.7808, 1.201, and 1.2804, respectively. This indicates that, in overall economic scale and in the construction industry, these five districts are the best among Chongqing's 38 districts and counties: they are large in scale, have the strongest economies and good prospects, and are regions with strong economic development.
2. On the economic development potential factor F2, the districts with scores greater than 1 are Shapingba District and Yubei District, with scores of 3.7052 and 3.4396. This indicates that they are relatively developed in high technology and industry and have the largest fixed-asset investment. Both districts are in the main urban area, are highly open, and perform well in scientific and technological innovation. Each has developed its own industries and has largely formed its own industrial structure, making full use of its geographical, resource, and environmental advantages, and each has great development potential.
Cluster analysis based on principal factor scores
Cluster analysis
Cluster analysis, also called group analysis, partitions data into multiple classes. Objects in the same class are highly similar, while objects in different classes differ considerably. Classification problems are common in the social and economic fields. For example, a survey of prices in several large cities involves many price indexes: the agricultural production price index, the service price index, the food consumer price index, the retail price index of building materials, and so on. Because there are so many indexes to examine, they are usually classified first. In short, many problems call for classification, so cluster analysis has attracted growing attention and is widely used in many fields.
Cluster analysis covers a wide range of methods, including hierarchical (systematic) clustering, ordered-sample clustering, dynamic clustering, fuzzy clustering, graph-theoretic clustering, and cluster-based prediction. The most commonly used and most successful is hierarchical clustering. Its basic idea is to start by treating each of the n samples as its own class and to define a distance between samples and a distance between classes. The two nearest classes are merged into a new class, the distances between the new class and all other current classes are computed, and the two nearest classes are merged again. Each merge reduces the number of classes by one, until all samples belong to a single class.
The basic steps of hierarchical clustering:
1. Compute the pairwise distances between the n samples.
2. Construct n classes, each containing exactly one sample.
3. Merge the two closest classes into a new class.
4. Compute the distance between the new class and each current class.
5. Repeat steps 3 and 4, merging the two closest classes each time, until all classes are merged into one.
6. Draw the dendrogram (cluster tree).
7. Decide the number of classes and the members of each class.
Hierarchical clustering methods: 1. the shortest distance (single linkage) method; 2. the longest distance (complete linkage) method; 3. the median distance method; 4. the centroid method; 5. the class average (average linkage) method; 6. the sum of squared deviations method (Ward's method).
The economic development of Chongqing's 38 districts and counties is then analyzed by clustering on the main factor scores. The between-groups (average) linkage method is chosen as the clustering method, squared Euclidean distance is used as the distance measure, and the data are standardized to Z-scores before clustering. The results are as follows:
rect.hclust(hc, k = 6, border = "red")  # outline the 6 clusters on the dendrogram
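The clustering settings described above (Z-score standardization, squared Euclidean distance, average linkage, six clusters) can be sketched in base R as follows. The factor scores here are random stand-ins and the district labels are hypothetical, so the resulting groups carry no meaning for Chongqing.

```r
# Random stand-in for the 38 x 3 main factor score matrix (assumption).
set.seed(3)
scores <- matrix(rnorm(38 * 3), 38, 3,
                 dimnames = list(paste0("D", 1:38), c("F1", "F2", "F3")))

Z  <- scale(scores)                      # Z-score standardization
d  <- dist(Z, method = "euclidean")^2    # squared Euclidean distance
hc <- hclust(d, method = "average")      # between-groups (average) linkage

groups <- cutree(hc, k = 6)              # cut the tree into 6 clusters
print(table(groups))                     # cluster sizes

# Draw the dendrogram only in an interactive session (a choice made for this sketch).
if (interactive()) {
  plot(hc, hang = -1, main = "Cluster dendrogram")
  rect.hclust(hc, k = 6, border = "red")
}
```

Other linkage rules from the list above are available through the `method` argument of `hclust()`, for example "single", "complete", "median", "centroid", or "ward.D2".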
From the dendrogram, Chongqing's districts and counties can be divided into six categories according to their actual economic situation and strength:
The first category contains only Yuzhong District. Yuzhong District is the central urban district of Chongqing and its political, economic, and cultural center. It is a leading area for basic education, with special location advantages and a prominent strategic position. Its industrial structure is dominated by the tertiary industry, with finance, commerce, and intermediary services as the leading industries. It is an area with strong economic development.
The second category contains only Yubei District. Yubei District has successively launched projects such as the Chongqing Science and Technology Industrial Park, with a master plan covering 65 square kilometers, the Chongqing Modern Agricultural Park, and the Eastern Chongqing development area, and has been named "Chongqing Agricultural Science and Technology Park" by the municipal government. The district therefore contributes substantially to high-tech GDP and has a superior investment environment. Most of its areas have their own central business zone, the degree of opening up is high, the location advantage is obvious, and the industrial structure is reasonable. It is an area with strong economic development.