Ridge regression and lasso regression
This article is based on teacher Qingfeng's lectures.
- In regression there are many candidate independent variables. Including too many of them can cause multicollinearity, which makes the regression coefficients insignificant and can even make the OLS estimator fail.
- Ridge regression and lasso regression add different penalty terms to the loss function of the OLS model (the residual sum of squares, SSE). The penalty term is a function of the regression coefficients. On one hand, the penalty identifies the unimportant variables in the model and simplifies it, so these methods can be seen as upgraded versions of stepwise regression; on the other hand, the penalty makes the model estimable even when the design matrix is not of full column rank.
Principle of ridge regression
- Multiple linear regression: $\hat\beta=\arg\min\sum_{i=1}^n(y_i-x_i'\hat\beta)^2$, where $\hat\beta=(\hat\beta_1,\hat\beta_2,\dots,\hat\beta_k)'$.
- Ridge regression: $\hat\beta=\arg\min\left[\sum_{i=1}^n(y_i-x_i'\hat\beta)^2+\lambda\sum_{i=1}^k\hat\beta_i^2\right]=\arg\min\left[(y-X\hat\beta)'(y-X\hat\beta)+\lambda\hat\beta'\hat\beta\right]$, where $\lambda$ is a positive constant.
Let $L=(y-X\hat\beta)'(y-X\hat\beta)+\lambda\hat\beta'\hat\beta$. As $\lambda\to0$, ridge regression reduces to multiple linear regression; as $\lambda\to+\infty$, $\hat\beta\to\mathbf{0}_{k\times1}$.
In addition: $\frac{\partial L}{\partial\hat\beta}=-2X'y+2X'X\hat\beta+2\lambda\hat\beta=0\ \Rightarrow\ (X'X+\lambda I)\hat\beta=X'y$.
Since $X'X$ is positive semi-definite, its eigenvalues are non-negative; after adding $\lambda I$, all eigenvalues of $X'X+\lambda I$ are strictly positive, so $X'X+\lambda I$ is invertible. Therefore $\hat\beta=(X'X+\lambda I)^{-1}X'y\quad(\lambda>0)$.
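As a quick sanity check of the closed-form estimator derived above, here is a minimal NumPy sketch on simulated data (the data, the seed, and the value of $\lambda$ are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 100, 5
X = rng.normal(size=(n, k))
y = X @ np.array([1.5, -2.0, 0.0, 0.5, 0.0]) + rng.normal(size=n)

lam = 1.0  # the penalty strength lambda (> 0); illustrative value
# beta_hat = (X'X + lambda*I)^{-1} X'y
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(k), X.T @ y)
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)  # lambda -> 0 recovers OLS
print("ridge:", np.round(beta_ridge, 3))
print("OLS:  ", np.round(beta_ols, 3))
```

The ridge estimates are visibly shrunk toward zero relative to OLS, and the shrinkage grows with $\lambda$.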
How to choose $\lambda$
Ridge trace analysis (rarely used)
- Concept of the ridge trace: let $\lambda$ vary from $0$ to $+\infty$ and plot, for each component of $\hat\beta=(\hat\beta_1,\hat\beta_2,\dots,\hat\beta_k)'$, the curve of the ridge estimate against $\lambda$ (a plotting sketch follows the list below).
- The general principles for choosing $\lambda$ by the ridge trace method are:
(1) the ridge estimates of the regression coefficients are basically stable;
(2) coefficients whose signs were unreasonable under least squares now have reasonable signs;
(3) no regression coefficient has an absolute value that contradicts its economic meaning;
(4) the residual sum of squares does not increase by much.
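To make the ridge trace concrete, here is one possible way to draw it, assuming scikit-learn and matplotlib are available; the simulated data and the $\lambda$ grid are arbitrary illustrative choices:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = X @ np.array([2.0, -1.0, 0.5, 0.0]) + rng.normal(size=100)

lambdas = np.logspace(-2, 4, 50)
# fit a ridge model for each lambda and collect the coefficient vectors
coefs = [Ridge(alpha=lam).fit(X, y).coef_ for lam in lambdas]

plt.plot(lambdas, coefs)  # one curve per coefficient
plt.xscale("log")
plt.xlabel("lambda")
plt.ylabel("ridge estimate of each beta")
plt.title("Ridge trace")
plt.show()
```

On such a plot one looks for the smallest $\lambda$ after which all curves flatten out, per principle (1) above.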

VIF method (variance inflation factor) (almost never used)
- Increase $\lambda$ until every coefficient in $\hat\beta$ has $VIF<10$.
Minimizing the mean squared prediction error (most used)
- We use K-fold cross-validation to choose the best tuning parameter. In K-fold cross-validation, the sample is randomly split into K equal folds. First, the 1st fold is set aside as the "validation set" and the remaining K−1 folds are used as the "training set" to estimate the model; the fitted model then predicts the 1st fold, and the mean squared prediction error (MSPE) on that fold is computed. Next, the 2nd fold is used as the validation set while the remaining K−1 folds serve as the training set to predict the 2nd fold, and its MSPE is computed; and so on. Summing the MSPE over all folds gives the whole-sample MSPE. Finally, we choose the tuning parameter that minimizes the whole-sample MSPE, i.e., the one with the best predictive power (a code sketch follows the list below).
- Note: we need to make sure the $X$ variables are on comparable scales; if they are not, consider standardizing them: $\frac{x_i-\bar x}{\sigma_x}$.
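Here is a sketch of this K-fold procedure for ridge regression, assuming scikit-learn is available; the $\lambda$ grid, the number of folds, and the simulated data are illustrative choices, not part of the original article:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.5, -2.0, 0.0, 0.5, 0.0]) + rng.normal(size=100)

X_std = StandardScaler().fit_transform(X)  # put all x's on the same scale
lambdas = np.logspace(-3, 3, 13)
kf = KFold(n_splits=5, shuffle=True, random_state=0)

best_lam, best_mspe = None, np.inf
for lam in lambdas:
    mspe = 0.0
    for train_idx, val_idx in kf.split(X_std):
        model = Ridge(alpha=lam).fit(X_std[train_idx], y[train_idx])
        resid = y[val_idx] - model.predict(X_std[val_idx])
        mspe += np.mean(resid ** 2)  # MSPE on the held-out fold
    if mspe < best_mspe:  # keep the lambda with the smallest whole-sample MSPE
        best_lam, best_mspe = lam, mspe
print("lambda with smallest whole-sample MSPE:", best_lam)
```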
Principle of lasso regression
- Multiple linear regression: $\hat\beta=\arg\min\sum_{i=1}^n(y_i-x_i'\hat\beta)^2$, where $\hat\beta=(\hat\beta_1,\hat\beta_2,\dots,\hat\beta_k)'$.
- Ridge regression: $\hat\beta=\arg\min\left[\sum_{i=1}^n(y_i-x_i'\hat\beta)^2+\lambda\sum_{i=1}^k\hat\beta_i^2\right]=\arg\min\left[(y-X\hat\beta)'(y-X\hat\beta)+\lambda\hat\beta'\hat\beta\right]$, where $\lambda$ is a positive constant.
- Lasso regression (widely used): $\hat\beta=\arg\min\left[\sum_{i=1}^n(y_i-x_i'\hat\beta)^2+\lambda\sum_{i=1}^k|\hat\beta_i|\right]$.
Compared with ridge regression, the biggest advantage of lasso regression is that it can compress the regression coefficients of unimportant variables exactly to 0. Ridge regression also shrinks the coefficients to some extent, but no coefficient ever becomes exactly 0, so the final model retains all variables. The "minimize mean squared prediction error" method above can be used to choose $\lambda$; the sketch below illustrates the contrast.
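A small illustration of this contrast, assuming scikit-learn; the data-generating process and the penalty values are invented for the demo:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 6))
# only the first two variables actually matter
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(size=200)

ridge = Ridge(alpha=10.0).fit(X, y)
lasso = Lasso(alpha=0.5).fit(X, y)
print("ridge:", np.round(ridge.coef_, 3))  # small but nonzero everywhere
print("lasso:", np.round(lasso.coef_, 3))  # unimportant coefficients are exactly 0
```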
When to use lasso regression
- First run an ordinary OLS regression on the data, then compute the variance inflation factors (VIF). If VIF > 10, there is a multicollinearity problem and variable selection is needed.
- Use lasso regression to screen out the unimportant variables (lasso can be seen as an upgraded version of stepwise regression).
- Check whether the independent variables are on the same scale; if not, standardize them first.
- Run lasso regression on the variables and record those whose coefficients in the lasso output are nonzero; these are the important variables we want to keep.
- Use these important variables as the independent variables, run an ordinary regression, and analyze the results. (At this step the variables can be on their original, pre-standardization scale; lasso only serves the purpose of variable screening.) An end-to-end sketch of this workflow follows the list.
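A hedged end-to-end sketch of the workflow above: check VIF, screen variables with cross-validated lasso, then refit plain OLS on the survivors. It assumes statsmodels and scikit-learn are available; the simulated data, the VIF threshold, and LassoCV as the $\lambda$-selection step are illustrative choices:

```python
import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(2)
x1 = rng.normal(size=150)
x2 = x1 + 0.05 * rng.normal(size=150)  # nearly collinear with x1
x3 = rng.normal(size=150)
X = np.column_stack([x1, x2, x3])
y = 2.0 * x1 + 1.0 * x3 + rng.normal(size=150)

# Step 1: OLS + VIF; VIF > 10 signals multicollinearity
# (column 0 of the exog matrix is the constant, hence i + 1)
exog = sm.add_constant(X)
vifs = [variance_inflation_factor(exog, i + 1) for i in range(X.shape[1])]
print("VIF:", np.round(vifs, 1))

# Steps 2-4: standardize, let cross-validated lasso pick lambda,
# and keep only the variables with nonzero coefficients
X_std = StandardScaler().fit_transform(X)
lasso = LassoCV(cv=5, random_state=0).fit(X_std, y)
keep = np.flatnonzero(lasso.coef_ != 0)
print("variables kept:", keep)

# Step 5: refit OLS on the selected variables (original scale is fine here)
ols = sm.OLS(y, sm.add_constant(X[:, keep])).fit()
print(ols.summary())
```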