Ridge regression and lasso regression
This article is based on teacher Qingfeng's lectures.
- In regression there are many candidate independent variables. Including too many of them can cause multicollinearity, which makes the regression coefficients insignificant and can even make the OLS estimator fail.
- Ridge regression and lasso regression add different penalty terms to the loss function of the OLS model (the residual sum of squares, SSE). The penalty term is a function of the regression coefficients. On one hand, the penalty identifies the unimportant variables in the model and simplifies it, so these methods can be seen as upgraded versions of stepwise regression; on the other hand, the penalty makes the model estimable even when the design matrix is not of full column rank.
Principle of ridge regression
- Multiple linear regression: $\hat\beta=\arg\min\sum_{i=1}^n(y_i-x_i'\hat\beta)^2$, where $\hat\beta=(\hat\beta_1,\hat\beta_2,\dots,\hat\beta_k)'$.
- Ridge regression: $\hat\beta=\arg\min\left[\sum_{i=1}^n(y_i-x_i'\hat\beta)^2+\lambda\sum_{i=1}^k\hat\beta_i^2\right]=\arg\min\left[(y-X\hat\beta)'(y-X\hat\beta)+\lambda\hat\beta'\hat\beta\right]$, where $\lambda$ is a positive constant.
Let $L=(y-X\hat\beta)'(y-X\hat\beta)+\lambda\hat\beta'\hat\beta$. As $\lambda\to0$, ridge regression reduces to multiple linear regression; as $\lambda\to+\infty$, $\hat\beta\to\mathbf{0}_{k\times1}$.
In addition: $\frac{\partial L}{\partial\hat\beta}=-2X'y+2X'X\hat\beta+2\lambda\hat\beta=0\ \Rightarrow\ (X'X+\lambda I)\hat\beta=X'y$.
Since $X'X$ is positive semi-definite, its eigenvalues are non-negative; after adding $\lambda I$, all eigenvalues of $X'X+\lambda I$ are strictly positive, so $X'X+\lambda I$ is invertible. Therefore $\hat\beta=(X'X+\lambda I)^{-1}X'y\quad(\lambda>0)$.
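As a quick sanity check of the closed-form estimator derived above, here is a minimal NumPy sketch on simulated data (the data, the seed, and the value of $\lambda$ are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 100, 5
X = rng.normal(size=(n, k))
y = X @ np.array([1.5, -2.0, 0.0, 0.5, 0.0]) + rng.normal(size=n)

lam = 1.0  # the penalty strength lambda (> 0); illustrative value
# beta_hat = (X'X + lambda*I)^{-1} X'y
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(k), X.T @ y)
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)  # lambda -> 0 recovers OLS
print("ridge:", np.round(beta_ridge, 3))
print("OLS:  ", np.round(beta_ols, 3))
```

The ridge estimates are visibly shrunk toward zero relative to OLS, and the shrinkage grows with $\lambda$.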
How to choose $\lambda$
Ridge trace analysis (rarely used)
- Concept of the ridge trace: let $\lambda$ vary from $0$ to $+\infty$ and plot, for each component of $\hat\beta=(\hat\beta_1,\hat\beta_2,\dots,\hat\beta_k)'$, the curve of the ridge estimate against $\lambda$ (a plotting sketch follows the list below).
- The general principles for choosing $\lambda$ by the ridge trace method are:
(1) the ridge estimates of the regression coefficients are basically stable;
(2) coefficients whose signs were unreasonable under least squares now have reasonable signs;
(3) no regression coefficient has an absolute value that contradicts its economic meaning;
(4) the residual sum of squares does not increase by much.
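To make the ridge trace concrete, here is one possible way to draw it, assuming scikit-learn and matplotlib are available; the simulated data and the $\lambda$ grid are arbitrary illustrative choices:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = X @ np.array([2.0, -1.0, 0.5, 0.0]) + rng.normal(size=100)

lambdas = np.logspace(-2, 4, 50)
# fit a ridge model for each lambda and collect the coefficient vectors
coefs = [Ridge(alpha=lam).fit(X, y).coef_ for lam in lambdas]

plt.plot(lambdas, coefs)  # one curve per coefficient
plt.xscale("log")
plt.xlabel("lambda")
plt.ylabel("ridge estimate of each beta")
plt.title("Ridge trace")
plt.show()
```

On such a plot one looks for the smallest $\lambda$ after which all curves flatten out, per principle (1) above.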

VIF method (variance inflation factor) (almost never used)
- Increase $\lambda$ until every coefficient in $\hat\beta$ has $VIF<10$.
Minimizing the mean squared prediction error (most used)
- We use K-fold cross-validation to choose the best tuning parameter. In K-fold cross-validation, the sample is randomly split into K equal folds. First, the 1st fold is set aside as the "validation set" and the remaining K−1 folds are used as the "training set" to estimate the model; the fitted model then predicts the 1st fold, and the mean squared prediction error (MSPE) on that fold is computed. Next, the 2nd fold is used as the validation set while the remaining K−1 folds serve as the training set to predict the 2nd fold, and its MSPE is computed; and so on. Summing the MSPE over all folds gives the whole-sample MSPE. Finally, we choose the tuning parameter that minimizes the whole-sample MSPE, i.e., the one with the best predictive power (a code sketch follows the list below).
- Note: we need to make sure the $X$ variables are on comparable scales; if they are not, consider standardizing them: $\frac{x_i-\bar x}{\sigma_x}$.
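Here is a sketch of this K-fold procedure for ridge regression, assuming scikit-learn is available; the $\lambda$ grid, the number of folds, and the simulated data are illustrative choices, not part of the original article:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.5, -2.0, 0.0, 0.5, 0.0]) + rng.normal(size=100)

X_std = StandardScaler().fit_transform(X)  # put all x's on the same scale
lambdas = np.logspace(-3, 3, 13)
kf = KFold(n_splits=5, shuffle=True, random_state=0)

best_lam, best_mspe = None, np.inf
for lam in lambdas:
    mspe = 0.0
    for train_idx, val_idx in kf.split(X_std):
        model = Ridge(alpha=lam).fit(X_std[train_idx], y[train_idx])
        resid = y[val_idx] - model.predict(X_std[val_idx])
        mspe += np.mean(resid ** 2)  # MSPE on the held-out fold
    if mspe < best_mspe:  # keep the lambda with the smallest whole-sample MSPE
        best_lam, best_mspe = lam, mspe
print("lambda with smallest whole-sample MSPE:", best_lam)
```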
Principle of lasso regression
- Multiple linear regression: $\hat\beta=\arg\min\sum_{i=1}^n(y_i-x_i'\hat\beta)^2$, where $\hat\beta=(\hat\beta_1,\hat\beta_2,\dots,\hat\beta_k)'$.
- Ridge regression: $\hat\beta=\arg\min\left[\sum_{i=1}^n(y_i-x_i'\hat\beta)^2+\lambda\sum_{i=1}^k\hat\beta_i^2\right]=\arg\min\left[(y-X\hat\beta)'(y-X\hat\beta)+\lambda\hat\beta'\hat\beta\right]$, where $\lambda$ is a positive constant.
- Lasso regression (widely used): $\hat\beta=\arg\min\left[\sum_{i=1}^n(y_i-x_i'\hat\beta)^2+\lambda\sum_{i=1}^k|\hat\beta_i|\right]$.
Compared with ridge regression, the biggest advantage of lasso regression is that it can compress the regression coefficients of unimportant variables exactly to 0. Ridge regression also shrinks the coefficients to some extent, but no coefficient ever becomes exactly 0, so the final model retains all variables. The "minimize mean squared prediction error" method above can be used to choose $\lambda$; the sketch below illustrates the contrast.
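A small illustration of this contrast, assuming scikit-learn; the data-generating process and the penalty values are invented for the demo:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 6))
# only the first two variables actually matter
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(size=200)

ridge = Ridge(alpha=10.0).fit(X, y)
lasso = Lasso(alpha=0.5).fit(X, y)
print("ridge:", np.round(ridge.coef_, 3))  # small but nonzero everywhere
print("lasso:", np.round(lasso.coef_, 3))  # unimportant coefficients are exactly 0
```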
When to use lasso regression
- First run an ordinary OLS regression on the data, then compute the variance inflation factors (VIF). If VIF > 10, there is a multicollinearity problem and variable selection is needed.
- Use lasso regression to screen out the unimportant variables (lasso can be seen as an upgraded version of stepwise regression).
- Check whether the independent variables are on the same scale; if not, standardize them first.
- Run lasso regression on the variables and record those whose coefficients in the lasso output are nonzero; these are the important variables we want to keep.
- Use these important variables as the independent variables, run an ordinary regression, and analyze the results. (At this step the variables can be on their original, pre-standardization scale; lasso only serves the purpose of variable screening.) An end-to-end sketch of this workflow follows the list.
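A hedged end-to-end sketch of the workflow above: check VIF, screen variables with cross-validated lasso, then refit plain OLS on the survivors. It assumes statsmodels and scikit-learn are available; the simulated data, the VIF threshold, and LassoCV as the $\lambda$-selection step are illustrative choices:

```python
import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(2)
x1 = rng.normal(size=150)
x2 = x1 + 0.05 * rng.normal(size=150)  # nearly collinear with x1
x3 = rng.normal(size=150)
X = np.column_stack([x1, x2, x3])
y = 2.0 * x1 + 1.0 * x3 + rng.normal(size=150)

# Step 1: OLS + VIF; VIF > 10 signals multicollinearity
# (column 0 of the exog matrix is the constant, hence i + 1)
exog = sm.add_constant(X)
vifs = [variance_inflation_factor(exog, i + 1) for i in range(X.shape[1])]
print("VIF:", np.round(vifs, 1))

# Steps 2-4: standardize, let cross-validated lasso pick lambda,
# and keep only the variables with nonzero coefficients
X_std = StandardScaler().fit_transform(X)
lasso = LassoCV(cv=5, random_state=0).fit(X_std, y)
keep = np.flatnonzero(lasso.coef_ != 0)
print("variables kept:", keep)

# Step 5: refit OLS on the selected variables (original scale is fine here)
ols = sm.OLS(y, sm.add_constant(X[:, keep])).fit()
print(ols.summary())
```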