Ridge regression and lasso regression
2022-07-01 03:20:00
The content of this article is based on teacher Qingfeng's (清风) lectures.
- There are many ways to choose independent variables in regression. Too many variables can cause multicollinearity, making the regression coefficients insignificant or even causing the OLS estimation to fail.
- Ridge regression and lasso regression add different penalty terms to the loss function of the OLS model (the residual sum of squares, SSE). The penalty term is a function of the regression coefficients. On the one hand, the penalty identifies the unimportant variables in the model and simplifies it, so these methods can be seen as an upgraded version of stepwise regression; on the other hand, the penalty makes the model estimable even when the data matrix is not of full column rank.
Principle of ridge regression
- Multiple linear regression: $\hat\beta=\arg\min\sum_{i=1}^n(y_i-x_i'\hat\beta)^2$, where $\hat\beta=(\hat\beta_1,\hat\beta_2,\dots,\hat\beta_k)'$.
- Ridge regression: $\hat\beta=\arg\min\left\{\sum_{i=1}^n(y_i-x_i'\hat\beta)^2+\lambda\sum_{j=1}^k\hat\beta_j^2\right\}=\arg\min\left[(y-X\hat\beta)'(y-X\hat\beta)+\lambda\hat\beta'\hat\beta\right]$, where $\lambda$ is a positive constant and $X$ is the design matrix.
Let $L=(y-X\hat\beta)'(y-X\hat\beta)+\lambda\hat\beta'\hat\beta$. As $\lambda\to 0$, ridge regression coincides with multiple linear regression; as $\lambda\to+\infty$, $\hat\beta\to\mathbf{0}_{k\times 1}$.
Moreover, $\frac{\partial L}{\partial\hat\beta}=-2X'y+2X'X\hat\beta+2\lambda\hat\beta=0\;\Rightarrow\;(X'X+\lambda I)\hat\beta=X'y$.
Because $X'X$ is positive semidefinite, its eigenvalues are nonnegative; after adding $\lambda I$, the eigenvalues of $X'X+\lambda I$ are strictly positive, so $X'X+\lambda I$ is invertible. Therefore $\hat\beta=(X'X+\lambda I)^{-1}X'y\quad(\lambda>0)$.
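A minimal numerical sketch of this closed-form estimator (the data below are simulated, and all names are hypothetical):

```python
# A minimal sketch of the ridge closed-form solution above.
# X, y, and beta_true are simulated, hypothetical data.
import numpy as np

rng = np.random.default_rng(0)
n, k = 100, 5
X = rng.normal(size=(n, k))
beta_true = np.array([1.5, -2.0, 0.0, 0.5, 0.0])
y = X @ beta_true + rng.normal(scale=0.5, size=n)

lam = 1.0  # the penalty strength lambda
# Solve (X'X + lambda*I) beta = X'y rather than forming the inverse explicitly
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(k), X.T @ y)
print(beta_ridge)
```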
How to choose $\lambda$
Ridge trace analysis (rarely used)
- The ridge trace: let $\lambda$ vary from $0$ to $+\infty$ and plot the resulting curve of each component of $\hat\beta=(\hat\beta_1,\hat\beta_2,\dots,\hat\beta_k)'$ against $\lambda$.
- The general principles for choosing $\lambda$ by the ridge trace method are as follows (a plotting sketch comes after the list):
(1) The ridge estimates of the regression coefficients are basically stable;
(2) regression coefficients whose signs were unreasonable under the least squares estimate have reasonable signs under the ridge estimate;
(3) no regression coefficient has an absolute value that contradicts its economic meaning;
(4) the residual sum of squares does not increase by much.
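A minimal sketch of how such a ridge trace could be drawn, reusing the simulated data from the previous sketch; the region where the curves flatten out is where $\lambda$ would be chosen:

```python
# A sketch of a ridge trace plot: each coefficient's ridge estimate as a
# function of lambda, on simulated, hypothetical data.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.5, -2.0, 0.0, 0.5, 0.0]) + rng.normal(scale=0.5, size=100)

lambdas = np.logspace(-2, 4, 100)
paths = np.array([
    np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
    for lam in lambdas
])  # shape (100, 5): one row of ridge estimates per lambda

for j in range(paths.shape[1]):
    plt.plot(lambdas, paths[:, j], label=f"beta_{j + 1}")
plt.xscale("log")
plt.xlabel("lambda")
plt.ylabel("ridge estimate")
plt.legend()
plt.show()
```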

VIF method (variance inflation factor) (almost never used)
- Increase $\lambda$ until the VIF of every $\hat\beta_j$ satisfies $VIF<10$ (a sketch of this check follows).
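The text does not specify which VIF formula to use under ridge, so the sketch below assumes Marquardt's generalized VIF: with the columns of $X$ standardized so that $X'X$ is the correlation matrix $R$, the ridge VIFs are the diagonal of $(R+\lambda I)^{-1}R(R+\lambda I)^{-1}$, which reduces to the usual $\operatorname{diag}(R^{-1})$ at $\lambda=0$. The data are simulated and hypothetical:

```python
# A sketch of the VIF rule, assuming Marquardt's generalized VIF for ridge.
import numpy as np

rng = np.random.default_rng(1)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.1, size=200)  # nearly collinear with x1
x3 = rng.normal(size=200)
X = np.column_stack([x1, x2, x3])

Z = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize columns
R = np.corrcoef(Z, rowvar=False)           # X'X in correlation form

for lam in [0.0, 0.1, 1.0]:
    M = np.linalg.inv(R + lam * np.eye(R.shape[0]))
    vif = np.diag(M @ R @ M)               # ridge VIFs; OLS VIFs at lam = 0
    print(f"lambda={lam}: VIF={np.round(vif, 2)}")
```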
Minimizing the mean squared prediction error (most used)
- We use K-fold cross-validation to choose the best tuning parameter. In K-fold cross-validation, the sample is randomly divided into K equal parts. First, the 1st part is set aside as the validation set, and the remaining K−1 parts are used as the training set to estimate the model; the estimated model is then used to predict the 1st part, and the mean squared prediction error (MSPE) of the 1st part is computed. Next, the 2nd part serves as the validation set while the remaining K−1 parts serve as the training set to predict the 2nd part, and its MSPE is computed. Proceeding in this way and adding up the MSPE of all the parts gives the MSPE of the whole sample. Finally, we choose the tuning parameter that minimizes the whole-sample MSPE, so that the model has the best predictive ability.
- Note: we must make sure the components of $X$ are measured on comparable scales; if not, consider standardizing: $\frac{x_i-\bar x}{\sigma_x}$ (the sketch below standardizes before cross-validation).
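A sketch of this procedure with scikit-learn's `RidgeCV` on simulated, hypothetical data; note that scikit-learn names the penalty strength `alpha` rather than $\lambda$:

```python
# A sketch of choosing lambda by K-fold cross-validation with scikit-learn.
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.5, -2.0, 0.0, 0.5, 0.0]) + rng.normal(scale=0.5, size=100)

# Standardize first, as the note above recommends
Xs = StandardScaler().fit_transform(X)

# 5-fold CV over a grid of penalties, scored by (negative) MSPE
alphas = np.logspace(-3, 3, 50)
ridge = RidgeCV(alphas=alphas, cv=5, scoring="neg_mean_squared_error").fit(Xs, y)
print("best lambda:", ridge.alpha_)
print("coefficients:", ridge.coef_)
```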
The principle of lasso regression
- Multiple linear regression: $\hat\beta=\arg\min\sum_{i=1}^n(y_i-x_i'\hat\beta)^2$, where $\hat\beta=(\hat\beta_1,\hat\beta_2,\dots,\hat\beta_k)'$.
- Ridge regression: $\hat\beta=\arg\min\left\{\sum_{i=1}^n(y_i-x_i'\hat\beta)^2+\lambda\sum_{j=1}^k\hat\beta_j^2\right\}=\arg\min\left[(y-X\hat\beta)'(y-X\hat\beta)+\lambda\hat\beta'\hat\beta\right]$, where $\lambda$ is a positive constant.
- Lasso regression (widely used): $\hat\beta=\arg\min\left\{\sum_{i=1}^n(y_i-x_i'\hat\beta)^2+\lambda\sum_{j=1}^k|\hat\beta_j|\right\}$
Compared with ridge regression, the biggest advantage of lasso regression is that it can shrink the regression coefficients of unimportant variables exactly to 0. Ridge regression also shrinks the coefficients to some extent, but no coefficient becomes exactly 0, so the final model keeps all the variables. The "minimizing the mean squared prediction error" method above can also be used to determine $\lambda$ for the lasso.
When to use lasso regression
- First run an ordinary OLS regression on the data and compute the variance inflation factors (VIF). If VIF > 10, multicollinearity is present and the variables need to be screened.
- Use lasso regression to filter out the unimportant variables (lasso can be seen as an upgraded version of stepwise regression).
- Check whether the independent variables are measured on the same scale; if not, standardize them first.
- Run a lasso regression on the variables and record those whose coefficients in the lasso output are nonzero; these are the important variables we want to keep in the end.
- Use these important variables as the independent variables, run an OLS regression, and analyze the results. (At this step the variables can be on their original, pre-standardization scale; the lasso regression only serves for variable screening. A sketch of this whole workflow follows.)
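A sketch of this whole workflow on simulated, hypothetical data, using `statsmodels` for the VIF check and the OLS refit and scikit-learn's `LassoCV` for the lasso step:

```python
# A sketch of the workflow above: (1) OLS + VIF check, (2) standardize,
# (3) lasso with CV-chosen lambda, (4) keep variables with nonzero
# coefficients, (5) refit OLS on the kept variables.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)      # nearly collinear with x1
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = 2.0 * x1 - 1.0 * x3 + rng.normal(scale=0.5, size=n)

# (1) VIF check on the OLS design matrix (skip the constant column)
X_const = sm.add_constant(X)
print([variance_inflation_factor(X_const, i) for i in range(1, X_const.shape[1])])

# (2)-(3) standardize, then lasso with 5-fold cross-validated lambda
Xs = StandardScaler().fit_transform(X)
lasso = LassoCV(cv=5).fit(Xs, y)
keep = np.flatnonzero(lasso.coef_ != 0)      # indices of nonzero coefficients
print("kept variables:", keep, "lambda:", lasso.alpha_)

# (4)-(5) refit OLS on the original (pre-standardization) kept variables
ols = sm.OLS(y, sm.add_constant(X[:, keep])).fit()
print(ols.summary())
```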