[Regression analysis] Understanding ridge regression through a worked case
2022-06-25 12:06:00 【Halosec_ Wei】
1、 Purpose
Ridge regression is a biased-estimation regression method designed for collinear data. In essence it is a modified least-squares estimator: by giving up the unbiasedness of least squares and accepting some loss of information and fitting accuracy, it yields regression coefficients that are more realistic and more reliable. On ill-conditioned data it fits better than ordinary least squares.
2、 Input / output description
Input: one or more independent variables X, which may be quantitative or categorical; the dependent variable Y must be quantitative (if Y is categorical, use logistic regression instead).
Output: the goodness-of-fit test results of the model, the linear relationship between the independent variables and the dependent variable, and so on.
3、 Learning resources
SPSSPRO: a free professional online data analysis platform
4、 Case example
Case study: the independent variables (room area, floor, unit price of the house, whether there is an elevator, number of schools nearby, distance from the subway station) are used to fit and predict the dependent variable (housing price). There turns out to be strong collinearity between the unit price and the floor, with VIF values above 20, so ordinary least squares (OLS) regression is not suitable and a ridge regression model is required.
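The collinearity check described above can be reproduced with a short VIF calculation. This is a minimal sketch, assuming NumPy and a synthetic stand-in for the case data (the real data set is not reproduced here); the function name `vif` and the generated columns are illustrative only.

```python
import numpy as np

def vif(X):
    """Variance inflation factor of each column of X (n_samples x n_features)."""
    # The VIFs equal the diagonal of the inverse of the predictors' correlation matrix.
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))

# Synthetic stand-in data (hypothetical values, not the article's data set).
rng = np.random.default_rng(0)
n = 200
area = rng.uniform(50, 150, n)
floor = rng.uniform(1, 30, n)
unit_price = 2.0 * floor + rng.normal(0, 1.0, n)  # almost a linear function of floor

X = np.column_stack([area, floor, unit_price])
vifs = vif(X)
for name, value in zip(["area", "floor", "unit_price"], vifs):
    print(f"{name}: VIF = {value:.1f}")
```

In this sketch the two deliberately collinear columns get very large VIFs while `area` stays close to 1, which is the pattern that motivates switching from OLS to ridge regression.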
5、 Case data

Ridge regression case data
6、 Case operation

Step1: Create a new analysis;
Step2: Upload the data;
Step3: Select the corresponding data set, open it and preview it, then click Start analysis once it looks correct;

Step4: Select 【Ridge regression (Ridge)】;
Step5: Check the required data format: 【Ridge regression (Ridge)】 needs one or more independent variables X (quantitative or categorical), and the dependent variable Y must be quantitative;
Step6: Click 【Start analysis】 to complete the operation.
7、 Output result analysis
Output result 1: Ridge trace plot

Chart description: The ridge trace plot is used to determine the K value. The selection rule is to take the smallest K at which the standardized regression coefficients of all independent variables have become stable. Because choosing the ridge parameter K from the ridge trace is somewhat subjective, SPSSPRO uses the variance inflation factor (VIF) method to determine it automatically, giving K=0.162.
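A ridge trace like the one described above can be computed by solving the standardized ridge system over a grid of K values. The following is a minimal sketch, assuming NumPy and synthetic stand-in data; the function name `ridge_trace` and the grid are illustrative, and the scaling SPSSPRO uses internally is not stated in the article.

```python
import numpy as np

def ridge_trace(X, y, ks):
    """Standardized ridge coefficients for each k in ks (one row per k)."""
    n, p = X.shape
    Z = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize predictors
    yz = (y - y.mean()) / y.std()              # standardize response
    R = Z.T @ Z / n                            # correlation matrix of predictors
    r = Z.T @ yz / n                           # correlations between predictors and y
    return np.array([np.linalg.solve(R + k * np.eye(p), r) for k in ks])

# Synthetic stand-in data (hypothetical values).
rng = np.random.default_rng(0)
n = 200
area = rng.uniform(50, 150, n)
floor = rng.uniform(1, 30, n)
unit_price = 2.0 * floor + rng.normal(0, 1.0, n)
X = np.column_stack([area, floor, unit_price])
y = 1.0 * area + 0.5 * unit_price + rng.normal(0, 5.0, n)

ks = np.linspace(0.0, 1.0, 101)
trace = ridge_trace(X, y, ks)   # trace[i, j]: coefficient of predictor j at ks[i]
```

Plotting each column of `trace` against `ks` gives the ridge trace; the K to pick is the smallest value beyond which the curves flatten out.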
Output result 2: Ridge regression analysis results

*p<0.05,**p<0.01,***p<0.001
Chart description: The ridge regression results show that the regression model based on area, floor, unit price, number of schools nearby (1km), distance from subway station (km) and elevator availability has a significance p-value of 0.000***, which is significant, so the null hypothesis is rejected and a regression relationship exists between the independent variables and the dependent variable. In addition, the model's goodness of fit R² is 0.956, which is good, so the model basically meets the requirements.
The formula of the model:
Total price = -64.72 + 0.987 × area - 0.043 × floor + 0.008 × unit price - 0.447 × number of schools nearby (1km) - 4.198 × distance from subway station (km) - 3.674 × elevator
Output result 3: Model path diagram

Chart description: The figure above shows the results of this model as a path diagram. It mainly contains the model coefficients and is used to read off the model formula.
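The fitted formula reported above can be evaluated directly for a new house. This is a minimal sketch: the coefficients are copied from the article's model formula, while the function name, argument names, the 0/1 coding of the elevator variable and the input values are assumptions made for illustration (the article does not state the units of each variable).

```python
def predict_total_price(area, floor, unit_price, schools_1km, subway_km, has_elevator):
    """Evaluate the ridge regression formula reported in the article."""
    return (
        -64.72
        + 0.987 * area
        - 0.043 * floor
        + 0.008 * unit_price
        - 0.447 * schools_1km
        - 4.198 * subway_km
        - 3.674 * has_elevator   # assumed coding: 1 = elevator, 0 = no elevator
    )

# Hypothetical input values, chosen only to show the call.
example = predict_total_price(
    area=100, floor=10, unit_price=20000,
    schools_1km=3, subway_km=1.0, has_elevator=1,
)
print(round(example, 3))
```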
Output result 4: Model result plot

Chart description: The figure above visualizes the original data of this model together with the model's fitted values.
8、 Notes
- Before running ridge regression, first fit an ordinary linear (least-squares) regression; if the VIF (a measure of collinearity) of an independent variable is too large, i.e. above 10, use ridge regression;
- SPSSPRO uses the variance inflation factor (VIF) method to find the K value automatically;
- The general principles for choosing the K value are:
  - The ridge estimates of all regression coefficients are basically stable
  - Regression coefficients whose least-squares estimates had an unreasonable sign get a reasonable sign under ridge estimation
  - No regression coefficient has an absolute value that is implausible in economic terms
  - The residual sum of squares does not increase much
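One common way to automate the choice of K with variance inflation factors is to take the smallest K on a grid for which every ridge VIF falls below a threshold. The sketch below assumes NumPy, synthetic stand-in data, a threshold of 10 and the textbook ridge VIF formula, the diagonal of (R + kI)⁻¹ R (R + kI)⁻¹ with R the correlation matrix of the predictors. The article does not describe SPSSPRO's exact rule, so this is an illustration of the idea, not its implementation, and it will not reproduce K=0.162.

```python
import numpy as np

def ridge_vif(R, k):
    """Ridge variance inflation factors for correlation matrix R and ridge parameter k."""
    inv = np.linalg.inv(R + k * np.eye(R.shape[0]))
    return np.diag(inv @ R @ inv)

def choose_k_by_vif(X, ks, threshold=10.0):
    """Smallest k in ks for which all ridge VIFs are below the threshold, else None."""
    R = np.corrcoef(X, rowvar=False)
    for k in ks:
        if ridge_vif(R, k).max() < threshold:
            return float(k)
    return None

# Synthetic stand-in data (hypothetical values).
rng = np.random.default_rng(0)
n = 200
area = rng.uniform(50, 150, n)
floor = rng.uniform(1, 30, n)
unit_price = 2.0 * floor + rng.normal(0, 1.0, n)
X = np.column_stack([area, floor, unit_price])

ks = np.linspace(0.0, 1.0, 1001)
k_chosen = choose_k_by_vif(X, ks)
print("chosen k:", k_chosen)
```

At k = 0 the ridge VIFs reduce to the ordinary VIFs, so the search starts from the OLS situation and increases K only as far as needed.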
9、 Model theory
Ridge regression is a statistical regression method. In machine learning it is also known as weight decay, and it is sometimes called Tikhonov regularization. Ridge regression mainly addresses two problems: first, the case where the number of predictor variables (features) exceeds the number of observations (samples); second, multicollinearity in the data set, that is, correlation among the predictor variables.
In general, the regression model in matrix form is:

$$ y = X\beta + \varepsilon $$

Ordinary least squares solves this regression problem by minimizing:

$$ \lVert y - X\beta \rVert_2^2 $$

Ridge regression adds a penalty term to this minimization objective:

$$ \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_2^2 $$

Here λ ≥ 0 is a parameter that must also be determined; it plays the same role as the K value discussed above. In other words, ridge regression is least-squares regression with an L2-norm penalty.
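Setting the gradient of the penalized objective to zero gives the closed-form ridge estimate (XᵀX + λI)⁻¹Xᵀy. The following minimal sketch implements it, assuming NumPy and synthetic stand-in data. The function name `ridge_fit` is illustrative, and centering the data so that the intercept is not penalized is a common convention assumed here, not something the article specifies.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge estimate; the intercept is handled by centering and is not penalized."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    p = X.shape[1]
    # Solve (Xc'Xc + lam * I) beta = Xc'yc instead of forming an explicit inverse.
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)
    intercept = y_mean - x_mean @ beta
    return intercept, beta

# Synthetic stand-in data (hypothetical values).
rng = np.random.default_rng(0)
n = 200
area = rng.uniform(50, 150, n)
floor = rng.uniform(1, 30, n)
unit_price = 2.0 * floor + rng.normal(0, 1.0, n)
X = np.column_stack([area, floor, unit_price])
y = 1.0 * area + 0.5 * unit_price + rng.normal(0, 5.0, n)

b0_ols, beta_ols = ridge_fit(X, y, lam=0.0)        # lam = 0 reduces to least squares
b0_ridge, beta_ridge = ridge_fit(X, y, lam=10.0)   # lam > 0 shrinks the coefficients
print("OLS coefficients:  ", beta_ols)
print("ridge coefficients:", beta_ridge)
```

With λ = 0 the result coincides with ordinary least squares, and any λ > 0 shrinks the coefficient vector, which is what stabilizes the estimates for collinear predictors.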
10、 Reference
[1] Liu Chao. Regression Analysis: Methods, Data and R Applications. Higher Education Press, 2019.