Ridge regression and lasso regression
2022-07-01 03:20:00 【weixin_ nine hundred and sixty-one million eight hundred and se】
The content of this article is based on teacher Qingfeng's lectures.
- There are many ways to choose the independent variables in a regression. Too many variables can cause multicollinearity, which makes the regression coefficients insignificant and can even make the OLS estimate fail.
- Ridge regression and lasso regression add different penalty terms to the loss function of the OLS model (the residual sum of squares, SSE). The penalty term is a function of the regression coefficients. On one hand, the penalty can identify the unimportant variables in the model and simplify it, so these methods can be seen as upgraded versions of stepwise regression; on the other hand, the penalty keeps the model estimable even when the design matrix is not of full column rank.
Principle of ridge regression
- Multiple linear regression: $\hat\beta=\arg\min\sum_{i=1}^n(y_i-x_i'\hat\beta)^2$, where $\hat\beta=(\hat\beta_1,\hat\beta_2,\dots,\hat\beta_k)'$.
- Ridge regression: $\hat\beta=\arg\min\big[\sum_{i=1}^n(y_i-x_i'\hat\beta)^2+\lambda\sum_{j=1}^k\hat\beta_j^2\big]=\arg\min\big[(y-x\hat\beta)'(y-x\hat\beta)+\lambda\hat\beta'\hat\beta\big]$, where $\lambda$ is a positive constant.
Let $L=(y-x\hat\beta)'(y-x\hat\beta)+\lambda\hat\beta'\hat\beta$. As $\lambda\to0$, ridge regression coincides with multiple linear regression; as $\lambda\to+\infty$, $\hat\beta\to\boldsymbol 0_{k\times1}$.
In addition, $\frac{\partial L}{\partial\hat\beta}=-2x'y+2x'x\hat\beta+2\lambda\hat\beta=0\Rightarrow(x'x+\lambda I)\hat\beta=x'y$.
Because $x'x$ is positive semidefinite, its eigenvalues are non-negative; after adding $\lambda I$, all eigenvalues of $x'x+\lambda I$ are positive, so $x'x+\lambda I$ is invertible. Therefore $\hat\beta=(x'x+\lambda I)^{-1}x'y\quad(\lambda>0)$.
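The closed-form solution above can be sketched in a few lines of numpy. This is a minimal illustration, not code from the article; the toy data and the function name `ridge_estimate` are my own. Note how the near-collinear columns, which would make plain OLS unstable, cause no trouble once $\lambda I$ is added.

```python
import numpy as np

def ridge_estimate(X, y, lam):
    """Closed-form ridge estimate: beta = (X'X + lam*I)^(-1) X'y."""
    k = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(k), X.T @ y)

# Toy data with two nearly collinear predictors.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
X[:, 1] = X[:, 0] + 0.01 * rng.normal(size=100)   # column 1 ~ column 0
y = X @ np.array([1.0, 2.0]) + rng.normal(size=100)

beta_ridge = ridge_estimate(X, y, lam=1.0)
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)  # lam -> 0 recovers OLS
```

As the derivation predicts, the estimate approaches the OLS solution as $\lambda\to0$ and shrinks toward the zero vector as $\lambda\to+\infty$.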
How to choose $\lambda$
Ridge trace analysis (rarely used)
- The ridge trace: let $\lambda$ vary from $0$ to $+\infty$ and plot the resulting curve of each component of $\hat\beta=(\hat\beta_1,\hat\beta_2,\dots,\hat\beta_k)'$.
- The general principles for choosing $\lambda$ by the ridge trace method are:
(1) the ridge estimates of the regression coefficients are basically stable;
(2) regression coefficients whose signs were unreasonable under least squares estimation now have reasonable signs;
(3) no regression coefficient has an absolute value that conflicts with its economic meaning;
(4) the residual sum of squares does not increase by much.
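The ridge trace itself is easy to compute: evaluate the closed-form estimate on a grid of $\lambda$ values and inspect (in practice, plot) each coefficient's path. The sketch below is my own illustration; the helper name `ridge_trace` and the toy data are assumptions, not from the article.

```python
import numpy as np

def ridge_trace(X, y, lambdas):
    """Ridge estimates over a grid of lambda values, one row per lambda."""
    k = X.shape[1]
    return np.array([
        np.linalg.solve(X.T @ X + lam * np.eye(k), X.T @ y)
        for lam in lambdas
    ])

rng = np.random.default_rng(1)
X = rng.normal(size=(80, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(size=80)

lambdas = np.logspace(-2, 4, 30)
path = ridge_trace(X, y, lambdas)
# Plotting path[:, j] against lambdas (log scale) gives the ridge trace of
# coefficient j; look for the smallest lambda after which all traces stabilize.
```

All coefficients shrink toward zero as $\lambda$ grows, which is why the trace eventually flattens out.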
VIF method (variance inflation factor) (almost never used)
- Increase $\lambda$ until every $\hat\beta_j$ has $VIF<10$.
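One common way to generalize the VIF to ridge regression (an assumption on my part; the article does not specify a formula) works on standardized predictors with correlation matrix $R$ and takes the diagonal of $(R+\lambda I)^{-1}R(R+\lambda I)^{-1}$, which reduces to the classical VIF $\mathrm{diag}(R^{-1})$ at $\lambda=0$. A sketch of the "increase $\lambda$ until all VIFs drop below 10" rule:

```python
import numpy as np

def ridge_vif(X, lam):
    """VIFs of the ridge estimator on standardized predictors.

    Uses diag((R + lam*I)^-1 R (R + lam*I)^-1), which reduces to the
    classical VIF diag(R^-1) when lam = 0.
    """
    R = np.corrcoef(X, rowvar=False)
    M = np.linalg.inv(R + lam * np.eye(R.shape[1]))
    return np.diag(M @ R @ M)

def smallest_lambda_with_vif_below(X, threshold=10.0, grid=None):
    """Scan a lambda grid, return the first value with all VIFs < threshold."""
    if grid is None:
        grid = np.concatenate(([0.0], np.logspace(-4, 2, 60)))
    for lam in grid:
        if np.all(ridge_vif(X, lam) < threshold):
            return lam
    return None

# Toy data where two columns are strongly collinear, so VIF > 10 at lam = 0.
rng = np.random.default_rng(2)
x1 = rng.normal(size=200)
X = np.column_stack([x1, x1 + 0.1 * rng.normal(size=200), rng.normal(size=200)])
vif_ols = ridge_vif(X, 0.0)                 # classical VIFs
lam_star = smallest_lambda_with_vif_below(X)
```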
Minimizing the mean squared prediction error (most used)
- We use K-fold cross-validation to select the best tuning parameter. In K-fold cross-validation, the sample is randomly split into K equal parts. First, the 1st subsample is set aside as the "validation set" while the remaining K−1 subsamples are used as the "training set" to estimate the model; the fitted model is then used to predict the 1st subsample, and its "mean squared prediction error" (MSPE) is computed. Next, the 2nd subsample serves as the validation set while the remaining K−1 subsamples are used for training, giving the MSPE of the 2nd subsample, and so on. Summing the MSPEs of all subsamples gives the MSPE of the whole sample. Finally, choose the tuning parameter that minimizes the whole-sample MSPE; this value has the best predictive ability.
- Note: we need to make sure the variables in $X$ are on the same scale; if not, consider standardizing them as $\frac{x_i-\bar x}{\sigma_x}$.
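The K-fold procedure described above can be sketched directly in numpy for ridge regression. This is a minimal illustration under my own naming (`kfold_mspe`) and toy data, averaging the per-fold MSPEs rather than summing them (an equivalent choice for picking the minimizer), with standardization applied first as the note recommends.

```python
import numpy as np

def kfold_mspe(X, y, lam, K=5, seed=0):
    """Cross-validated MSPE of ridge regression for one lambda value."""
    n, k = X.shape
    idx = np.random.default_rng(seed).permutation(n)
    folds = np.array_split(idx, K)
    total = 0.0
    for val in folds:                               # each fold is the validation set once
        train = np.setdiff1d(idx, val)
        beta = np.linalg.solve(X[train].T @ X[train] + lam * np.eye(k),
                               X[train].T @ y[train])
        resid = y[val] - X[val] @ beta
        total += np.mean(resid ** 2)
    return total / K

rng = np.random.default_rng(3)
X = rng.normal(size=(120, 4))
X = (X - X.mean(axis=0)) / X.std(axis=0)            # standardize: (x - mean) / std
y = X @ np.array([1.5, 0.0, -2.0, 0.5]) + rng.normal(size=120)

lambdas = np.logspace(-3, 3, 25)
mspes = [kfold_mspe(X, y, lam) for lam in lambdas]
best_lambda = lambdas[int(np.argmin(mspes))]
```

Very large $\lambda$ shrinks all coefficients toward zero and underfits, so the MSPE curve rises again at the right end of the grid; the minimizer sits in between.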
The principle of lasso regression
- Multiple linear regression: $\hat\beta=\arg\min\sum_{i=1}^n(y_i-x_i'\hat\beta)^2$, where $\hat\beta=(\hat\beta_1,\hat\beta_2,\dots,\hat\beta_k)'$.
- Ridge regression: $\hat\beta=\arg\min\big[\sum_{i=1}^n(y_i-x_i'\hat\beta)^2+\lambda\sum_{j=1}^k\hat\beta_j^2\big]=\arg\min\big[(y-x\hat\beta)'(y-x\hat\beta)+\lambda\hat\beta'\hat\beta\big]$, where $\lambda$ is a positive constant.
- Lasso regression (widely used): $\hat\beta=\arg\min\big[\sum_{i=1}^n(y_i-x_i'\hat\beta)^2+\lambda\sum_{j=1}^k|\hat\beta_j|\big]$
Compared with ridge regression, the biggest advantage of lasso regression is that it can compress the coefficients of unimportant variables exactly to 0. Ridge regression also shrinks the coefficients to some extent, but none of them becomes exactly 0, so all variables remain in the final model. The "minimizing the mean squared prediction error" approach above can likewise be used to choose $\lambda$.
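To see how lasso produces exact zeros, here is a minimal coordinate-descent sketch for the objective $\tfrac12\lVert y-X\beta\rVert^2+\lambda\lVert\beta\rVert_1$ (one standard formulation; the article does not commit to a specific algorithm, so this implementation and its names are my own). Each coordinate update applies the soft-thresholding operator, which maps small inputs to exactly 0.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator: sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_sweeps=200):
    """Coordinate descent for (1/2)||y - X beta||^2 + lam * ||beta||_1."""
    n, k = X.shape
    beta = np.zeros(k)
    for _ in range(n_sweeps):
        for j in range(k):
            r_j = y - X @ beta + X[:, j] * beta[j]   # residual excluding x_j
            beta[j] = soft_threshold(X[:, j] @ r_j, lam) / (X[:, j] @ X[:, j])
    return beta

# Toy data: only variables 0 and 3 actually matter.
rng = np.random.default_rng(4)
X = rng.normal(size=(100, 5))
y = X @ np.array([3.0, 0.0, 0.0, -2.0, 0.0]) + 0.1 * rng.normal(size=100)

beta_lasso = lasso_cd(X, y, lam=50.0)   # noise coefficients come out exactly 0
```

With `lam=0` the same routine converges to the OLS solution, mirroring how the lasso objective reduces to least squares without the penalty.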
When to use lasso regression
- First run an ordinary OLS regression on the data and compute the variance inflation factors (VIF). If VIF > 10, multicollinearity is present and the variables need to be screened.
- Use lasso regression to filter out the unimportant variables (lasso can be viewed as an upgraded version of stepwise regression).
- Check whether the independent variables are on the same scale; if not, standardize them first.
- Run lasso regression on the variables and record those whose coefficients in the lasso output are not 0; these are the important variables we want to keep.
- Use these important variables as the independent variables, run an ordinary regression, and analyze the results. (At this step the variables can be used on their original, pre-standardization scale; the lasso regression only serves the purpose of variable selection.)
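The steps above can be sketched end-to-end. Everything here is illustrative and assumed rather than taken from the article: the VIF is computed from the inverse correlation matrix, the lasso step uses a simple proximal-gradient (ISTA) solver, and the helper names (`vif`, `lasso_ista`) are mine.

```python
import numpy as np

def vif(X):
    """Classical VIFs: the diagonal of the inverse correlation matrix."""
    return np.diag(np.linalg.inv(np.corrcoef(X, rowvar=False)))

def lasso_ista(X, y, lam, n_iter=2000):
    """Proximal gradient (ISTA) for (1/2)||y - X beta||^2 + lam*||beta||_1."""
    step = 1.0 / np.linalg.norm(X, 2) ** 2     # 1 / largest eigenvalue of X'X
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        g = beta + step * (X.T @ (y - X @ beta))
        beta = np.sign(g) * np.maximum(np.abs(g) - step * lam, 0.0)
    return beta

# Toy data: column 1 is collinear with column 0, so OLS VIFs are large.
rng = np.random.default_rng(5)
x1 = rng.normal(size=150)
X = np.column_stack([x1,
                     x1 + 0.1 * rng.normal(size=150),
                     rng.normal(size=150)])
y = 2.0 * x1 + 1.0 * X[:, 2] + rng.normal(size=150)

# Step 1: OLS-style VIF check.
needs_selection = vif(X).max() > 10

# Steps 2-4: standardize, run lasso, keep the nonzero coefficients.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
beta_lasso = lasso_ista(Z, y - y.mean(), lam=30.0)
selected = np.flatnonzero(beta_lasso != 0.0)

# Step 5: refit plain OLS (with intercept) on the selected original variables.
X_sel = np.column_stack([np.ones(150), X[:, selected]])
beta_final, *_ = np.linalg.lstsq(X_sel, y, rcond=None)
```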