Machine learning 3-ridge regression, Lasso, variable selection technique
2022-06-23 06:11:00 【Just a】
One. Ridge regression
1.1 What is ridge regression
Ridge regression is a biased-estimation regression method designed for analysing collinear data. It is essentially a modified least squares method: it gives up the unbiasedness of least squares and accepts some loss of information and accuracy in exchange for regression coefficients that are more reliable and more in line with reality.
The ridge estimate of the regression coefficients is $B(k) = (X'X + kI)^{-1} X'Y$, which is more stable than the least squares estimate. B(k) is called the ridge estimate of the regression coefficients, and k is the ridge parameter. Clearly, when k = 0, B(k) reduces to the least squares estimate, and as k → ∞, B(k) tends to 0. Because the bias grows with k, k should not be too large; we want a small k that already stabilizes the estimates.
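The formula for B(k) can be checked numerically. The following is a minimal sketch in base R, not the author's code: it assumes the predictors are standardized and the response is centered (so no intercept is needed), uses Employed from the built-in longley data as the response, and introduces a hypothetical helper named ridge_coef.

```r
# Closed-form ridge estimate B(k) = (X'X + kI)^{-1} X'Y on the built-in longley data.
# Assumption: predictors are standardized and the response is centered, so no intercept is needed.
dat <- datasets::longley
X <- scale(as.matrix(dat[, names(dat) != "Employed"]))
y <- dat$Employed - mean(dat$Employed)

# Hypothetical helper: solves (X'X + kI) b = X'y for b
ridge_coef <- function(X, y, k) {
  p <- ncol(X)
  drop(solve(crossprod(X) + k * diag(p), crossprod(X, y)))
}

b_ols   <- ridge_coef(X, y, 0)     # k = 0: the least squares estimate
b_small <- ridge_coef(X, y, 0.1)   # small k: a mildly shrunken estimate
b_large <- ridge_coef(X, y, 1e6)   # very large k: coefficients shrink toward 0

round(cbind(b_ols, b_small, b_large), 4)
```

Comparing the three columns shows the behaviour described above: the k = 0 column agrees with an ordinary least squares fit on the same standardized data, and the coefficients approach zero as k grows.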

1.2 Ridge trace plot
When there is no singularity, the ridge trace should be stable and gradually tend to 0.
By observing the ridge estimates in the ridge trace plot, you can decide which variables should be eliminated.
1.3 Properties of ridge regression estimation




1.4 Ridge trace analysis

1.5 General principles for selecting the ridge parameter
Choose the value of k (or lambda) so that:
(1) The ridge estimate of each regression coefficient is basically stable;
(2) Regression coefficients whose signs are unreasonable under least squares estimation take reasonable signs in the ridge estimate;
(3) No regression coefficient has an absolute value that contradicts its practical meaning;
(4) The residual sum of squares does not increase much.
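These criteria can be inspected side by side. The sketch below is only an illustration under assumptions of my own: it uses lm.ridge() from MASS with Employed as the response, works on a fresh copy of longley, and picks an arbitrary lambda grid. It tabulates the residual sum of squares and the coefficients for each lambda so that stability, sign changes and the growth in residuals can be read off.

```r
library(MASS)

dat <- datasets::longley                  # fresh copy of the built-in data
lambdas <- seq(0, 0.1, by = 0.01)         # illustrative grid, not a recommendation
fit <- lm.ridge(Employed ~ ., data = dat, lambda = lambdas)

# coef() returns one row per lambda: intercept first, then slopes on the original scale
B <- coef(fit)
X <- cbind(1, as.matrix(dat[, names(dat) != "Employed"]))

# Criterion (4): residual sum of squares for each lambda
rss <- colSums((dat$Employed - X %*% t(B))^2)

# Criteria (1)-(3): inspect how the slopes move and whether any sign flips
round(cbind(lambda = lambdas, rss = rss, B[, -1]), 4)
```

The first row (lambda = 0) reproduces the least squares fit, so the later rows show how much residual error is traded for stability.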
1.6 Variance inflation factor method
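The original text for this section is missing, so the following is only a hedged sketch of how the method is commonly described, not the author's presentation. It assumes a standardized design, so that X'X is proportional to the correlation matrix R of the predictors, takes the ridge variance inflation factors to be the diagonal of (R + kI)^{-1} R (R + kI)^{-1}, and applies the common rule of thumb of looking for a k at which every factor is at most 10. The helper ridge_vif and the grid of k values are hypothetical.

```r
# Ridge variance inflation factors: diagonal of (R + kI)^{-1} R (R + kI)^{-1},
# where R is the correlation matrix of the predictors (assumes a standardized design).
dat <- datasets::longley
R <- cor(dat[, names(dat) != "Employed"])

# Hypothetical helper returning one VIF per predictor for a given k
ridge_vif <- function(R, k) {
  A <- solve(R + k * diag(nrow(R)))
  diag(A %*% R %*% A)
}

ks <- c(0, 0.01, 0.05, 0.1)               # illustrative grid
vif <- sapply(ks, function(k) ridge_vif(R, k))
colnames(vif) <- paste0("k=", ks)
round(vif, 2)

# Smallest k on this grid for which every VIF is at most 10 (NA if none qualifies)
ks[apply(vif, 2, max) <= 10][1]
```

At k = 0 the factors are the ordinary variance inflation factors, which are very large for the longley data; they fall as k increases.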

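1.7 Ridge regression in R
Code:
library(MASS)   # provides lm.ridge() and select()
longley         # built-in macroeconomic data set with highly collinear predictors
# Ordinary least squares fit of Employed on all other variables, for comparison
summary(fm1 <- lm(Employed ~ ., data = longley))
# As in the lm.ridge() help example, rename the first column (GNP.deflator) to y and use it as the response
names(longley)[1] <- "y"
lm.ridge(y ~ ., longley)   # default lambda = 0, i.e. the least squares fit
# Ridge fits over a grid of lambda values
fit <- lm.ridge(y ~ ., longley, lambda = seq(0, 0.1, 0.001))
plot(fit)     # ridge trace plot
select(fit)   # lambda suggested by the HKB, L-W and GCV criteria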

Two. Lasso
1.1 Lasso overview
Problems with ridge regression:
- There are too many ways to compute the ridge parameter, and their results differ widely
- Screening variables from the ridge trace plot is too arbitrary
- Unless variables are screened out separately, the ridge regression model includes all variables
LASSO
Tibshirani (1996) proposed the Lasso (Least Absolute Shrinkage and Selection Operator) algorithm.
By constructing a first-order (L1) penalty function it obtains a more parsimonious model. It shrinks the coefficients of some variables exactly to zero, which makes the model easy to interpret. By contrast, a ridge regression coefficient is very unlikely to be exactly 0, which makes variable screening difficult.
It handles data with multicollinearity well and, like ridge regression, is a biased estimator.
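The contrast between exact zeros under the Lasso and mere shrinkage under ridge regression can be seen in a simplified setting. This sketch assumes an orthonormal design (X'X = I), where both estimators have closed forms in terms of the least squares coefficients; the coefficient values, the value of lambda and the helper names are made up for illustration.

```r
# Assumption: orthonormal design (X'X = I), so each coefficient can be treated separately.
# Lasso, minimizing 0.5 * ||y - Xb||^2 + lambda * sum(|b|): soft thresholding of the OLS estimate.
soft_threshold <- function(b_ols, lambda) sign(b_ols) * pmax(abs(b_ols) - lambda, 0)
# Ridge, minimizing 0.5 * ||y - Xb||^2 + 0.5 * lambda * sum(b^2): proportional shrinkage.
ridge_shrink <- function(b_ols, lambda) b_ols / (1 + lambda)

b_ols  <- c(3, -1.5, 0.4, -0.2)   # made-up OLS coefficients for illustration
lambda <- 0.5
lasso  <- soft_threshold(b_ols, lambda)
ridge  <- ridge_shrink(b_ols, lambda)
cbind(b_ols, lasso, ridge)
```

Coefficients whose least squares value is smaller in magnitude than lambda become exactly zero under soft thresholding, while the ridge column rescales every coefficient and leaves none at zero.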
1.2 Why LASSO can screen variables directly

1.3 LASSO vs ridge regression


1.4 A more general model


1.5 Elastic net
Zou and Hastie (2005) proposed the elastic net.
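The original formula for this section is missing. For orientation, the elastic net criterion is usually written in the following form (the "naive" elastic net). The symbols $\lambda_1$ and $\lambda_2$ are notation assumed here and do not come from this article; $\lambda_2$ plays the role of the ridge parameter k used earlier.

$$
\hat{\beta} = \arg\min_{\beta} \; \lVert Y - X\beta \rVert_2^2 + \lambda_2 \lVert \beta \rVert_2^2 + \lambda_1 \lVert \beta \rVert_1
$$

Setting $\lambda_2 = 0$ gives the Lasso penalty alone, and setting $\lambda_1 = 0$ gives the ridge penalty alone, so the elastic net combines the two.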

References:
- http://www.dataguru.cn/article-4063-1.html
- https://zhuanlan.zhihu.com/p/426162272