Ridge regression and lasso regression
This article is based on teacher Qingfeng's lectures.
- In regression there are many candidate independent variables, and including too many of them can cause multicollinearity, making the regression coefficients insignificant or even making the OLS estimate fail.
- Ridge regression and lasso regression add different penalty terms to the loss function of the OLS regression model (the residual sum of squares, SSE). The penalty term is a function of the regression coefficients. On the one hand, the added penalty can identify unimportant variables in the model and simplify it, so these methods can be seen as an upgraded version of stepwise regression; on the other hand, the penalty makes the model estimable even when the design matrix is not of full column rank.
Principle of ridge regression
- Multiple linear regression: $\hat\beta=\arg\min\sum_{i=1}^n(y_i-x_i'\hat\beta)^2$, where $\hat\beta=(\hat\beta_1,\hat\beta_2,\ldots,\hat\beta_k)'$
- Ridge regression: $\hat\beta=\arg\min\left[\sum_{i=1}^n(y_i-x_i'\hat\beta)^2+\lambda\sum_{j=1}^k\hat\beta_j^2\right]=\arg\min\left[(y-x\hat\beta)'(y-x\hat\beta)+\lambda\hat\beta'\hat\beta\right]$, where $\lambda$ is a positive constant.
Let $L=(y-x\hat\beta)'(y-x\hat\beta)+\lambda\hat\beta'\hat\beta$. As $\lambda\to 0$, ridge regression coincides exactly with multiple linear regression; as $\lambda\to+\infty$, $\hat\beta\to\boldsymbol 0_{k\times 1}$.
Moreover: $\frac{\partial L}{\partial\hat\beta}=-2x'y+2x'x\hat\beta+2\lambda\hat\beta=0\Rightarrow(x'x+\lambda I)\hat\beta=x'y$
Since $x'x$ is positive semi-definite, its eigenvalues are non-negative; after adding $\lambda I$, the eigenvalues of $x'x+\lambda I$ are all positive, so $x'x+\lambda I$ is invertible. Therefore $\hat\beta=(x'x+\lambda I)^{-1}x'y\quad(\lambda>0)$.
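As an illustration of this closed-form solution, here is a minimal numpy sketch; the data `X`, `y` and the penalty value `lam` are made up for demonstration:

```python
# Minimal sketch of the closed-form ridge estimate on toy data.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                  # n = 100 observations, k = 5 regressors
y = X @ np.array([1.5, -2.0, 0.0, 0.5, 0.0]) + rng.normal(size=100)

lam = 1.0                                      # the ridge penalty lambda > 0
k = X.shape[1]

# beta_hat = (X'X + lam*I)^{-1} X'y, solved without forming the inverse explicitly
beta_hat = np.linalg.solve(X.T @ X + lam * np.eye(k), X.T @ y)
print(beta_hat)
```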
How to choose $\lambda$
Ridge trace analysis (rarely used)
- The concept of the ridge trace: let $\lambda$ vary from $0$ to $+\infty$ and plot how each component of $\hat\beta=(\hat\beta_1,\hat\beta_2,\cdots,\hat\beta_k)'$ changes; the resulting curves are the ridge traces.
- The general principles for choosing $\lambda$ by the ridge trace method are (a plotting sketch follows this list):
(1) the ridge estimates of all regression coefficients are basically stable;
(2) regression coefficients whose signs were unreasonable under least squares estimation acquire reasonable signs under ridge estimation;
(3) no regression coefficient has an absolute value inconsistent with its economic meaning;
(4) the residual sum of squares does not increase too much.
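To visualize the ridge traces, here is a minimal sketch using sklearn's `Ridge` and matplotlib; the data is made up for illustration (note that sklearn calls the penalty `alpha`):

```python
# Minimal sketch of a ridge trace plot on toy data.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.5, -2.0, 0.0, 0.5, 0.0]) + rng.normal(size=100)

# Fit ridge regression for lambda over several orders of magnitude
# and record the coefficient vector at each value.
alphas = np.logspace(-3, 4, 60)
coefs = [Ridge(alpha=a).fit(X, y).coef_ for a in alphas]

plt.plot(alphas, coefs)          # one curve per coefficient
plt.xscale("log")
plt.xlabel("lambda")
plt.ylabel("ridge estimate of each coefficient")
plt.title("Ridge trace")
plt.show()
```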
The VIF method (variance inflation factor) (almost never used)
- Increase $\lambda$ until every coefficient's $VIF$ falls below 10, as in the sketch below.
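A sketch of this idea, assuming Marquardt's ridge VIF formula $VIF_j(\lambda)=[(R+\lambda I)^{-1}R(R+\lambda I)^{-1}]_{jj}$, where $R$ is the correlation matrix of the standardized regressors; the helper `ridge_vif`, the toy nearly collinear data, and the step size are all our own choices for illustration:

```python
# Hedged sketch: ridge VIFs via Marquardt's formula
# VIF_j(lambda) = [(R + lambda*I)^{-1} R (R + lambda*I)^{-1}]_{jj},
# with R the correlation matrix of the standardized regressors.
import numpy as np

def ridge_vif(X, lam):
    R = np.corrcoef(X, rowvar=False)               # correlation matrix of the columns
    M = np.linalg.inv(R + lam * np.eye(R.shape[0]))
    return np.diag(M @ R @ M)                      # one VIF per regressor

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.05 * rng.normal(size=200)              # nearly collinear with x1
X = np.column_stack([x1, x2, rng.normal(size=200)])

lam = 0.0
while ridge_vif(X, lam).max() >= 10:               # increase lambda until all VIF < 10
    lam += 0.01
print("smallest lambda with all VIF < 10:", lam)
```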
Minimizing the mean squared prediction error (most used)
- We use K-fold cross-validation to select the best tuning parameter. In K-fold cross-validation, the sample is randomly divided into K equal parts. First, the 1st subsample is set aside as the "validation set" while the remaining K-1 subsamples are used as the "training set" to estimate the model; the fitted model is then used to predict the 1st subsample, and its "mean squared prediction error" (MSPE) is computed. Next, the 2nd subsample serves as the validation set, and the remaining K-1 subsamples are used as the training set to predict the 2nd subsample and compute its MSPE. And so on; adding up the MSPEs of all the subsamples gives the MSPE of the whole sample. Finally, the tuning parameter is chosen to minimize the whole-sample MSPE, which gives the best predictive ability.
- Note: the components of $X$ must have consistent units (dimensions); if they differ, consider standardizing: $\frac{x_i-\bar x}{\sigma_x}$. A cross-validation sketch follows.
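A minimal sketch of choosing $\lambda$ by K-fold cross-validation with sklearn (here K = 5; the data is made up, and X is standardized first since the regressors may be on different scales):

```python
# Minimal sketch: choose lambda by 5-fold CV, minimizing the MSPE.
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.5, -2.0, 0.0, 0.5, 0.0]) + rng.normal(size=100)

alphas = np.logspace(-3, 3, 100)   # candidate lambda values
model = make_pipeline(
    StandardScaler(),              # standardize X before penalizing
    RidgeCV(alphas=alphas, scoring="neg_mean_squared_error", cv=5),
)
model.fit(X, y)
print("best lambda:", model.named_steps["ridgecv"].alpha_)
```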
The principle of lasso regression
- Multiple linear regression: $\hat\beta=\arg\min\sum_{i=1}^n(y_i-x_i'\hat\beta)^2$, where $\hat\beta=(\hat\beta_1,\hat\beta_2,\ldots,\hat\beta_k)'$
- Ridge regression: $\hat\beta=\arg\min\left[\sum_{i=1}^n(y_i-x_i'\hat\beta)^2+\lambda\sum_{j=1}^k\hat\beta_j^2\right]=\arg\min\left[(y-x\hat\beta)'(y-x\hat\beta)+\lambda\hat\beta'\hat\beta\right]$, where $\lambda$ is a positive constant.
- Lasso regression (widely used): $\hat\beta=\arg\min\left[\sum_{i=1}^n(y_i-x_i'\hat\beta)^2+\lambda\sum_{j=1}^k|\hat\beta_j|\right]$
Compared with ridge regression, the biggest advantage of lasso regression is that it can compress the regression coefficients of unimportant variables all the way to 0. Ridge regression also shrinks the original coefficients to some extent, but no coefficient becomes exactly 0, so the final model retains all the variables. The "minimizing the mean squared prediction error" method above can likewise be used to determine $\lambda$. A small comparison sketch follows.
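Here is a small sketch contrasting the two penalties on the same toy data: the lasso coefficients of the unimportant variables come out exactly 0, while the ridge coefficients are merely small (the data and the penalty value 0.1 are made up for illustration):

```python
# Minimal sketch: lasso zeroes out unimportant coefficients, ridge only shrinks them.
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
# Only the 1st, 2nd, and 4th variables actually matter.
y = X @ np.array([1.5, -2.0, 0.0, 0.5, 0.0]) + rng.normal(size=100)

print("lasso:", Lasso(alpha=0.1).fit(X, y).coef_)  # exact zeros appear
print("ridge:", Ridge(alpha=0.1).fit(X, y).coef_)  # small but nonzero
```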
When to use lasso regression
- First run an ordinary OLS regression on the data, then compute the variance inflation factors (VIF). If VIF > 10, there is a multicollinearity problem, so the variables need to be screened.
- Use lasso regression to screen out the unimportant variables (lasso can be seen as an upgraded version of stepwise regression).
- Check whether the independent variables are measured in the same units (dimensions); if not, standardize them first.
- Run lasso regression on the variables, and record which variables have nonzero regression coefficients in the lasso results table; these are the important variables we want to keep in the end.
- Use these important variables as the independent variables, run an ordinary regression, and analyze the results. (At this step the variables can be on their original, pre-standardization scale; the lasso regression only serves the purpose of variable screening.) An end-to-end sketch of this workflow follows.
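An end-to-end sketch of this workflow, assuming toy, nearly collinear data; the variable names and thresholds are our own choices for illustration:

```python
# Hedged sketch of the workflow: OLS + VIF to detect multicollinearity,
# LassoCV on standardized X to screen variables, then an OLS refit.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.05 * rng.normal(size=200)       # nearly collinear with x1
x3 = rng.normal(size=200)
X = np.column_stack([x1, x2, x3])
y = 2.0 * x1 + 1.0 * x3 + rng.normal(size=200)

# Step 1: OLS and VIF. VIF > 10 signals multicollinearity.
X_const = sm.add_constant(X)
vifs = [variance_inflation_factor(X_const, i + 1) for i in range(X.shape[1])]
print("VIF:", vifs)

# Step 2: lasso on standardized X, with lambda chosen by 5-fold CV.
Z = StandardScaler().fit_transform(X)
lasso = LassoCV(cv=5).fit(Z, y)
keep = np.flatnonzero(lasso.coef_ != 0)     # variables with nonzero coefficients
print("kept variables:", keep)

# Step 3: refit OLS on the original (unstandardized) selected variables.
ols = sm.OLS(y, sm.add_constant(X[:, keep])).fit()
print(ols.summary())
```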