Machine Learning 3: Ridge Regression, Lasso, and Variable Selection Techniques
2022-06-23 04:31:00 【只是甲】
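1. Ridge Regression
1.1 What Is Ridge Regression
Ridge regression is a biased-estimation regression method designed for analyzing collinear data. It is essentially a modified least squares method: it gives up the unbiasedness of least squares, accepting some loss of information and precision, in exchange for a regression equation that fits slightly worse but is more realistic.
The ridge regression coefficients are estimated by $B(k) = (X'X + kI)^{-1}X'Y$, which is more stable than the least squares estimate. $B(k)$ is called the ridge estimate of the regression coefficients. Clearly, when $k = 0$, $B(k)$ reduces to the least squares estimate, and as $k \to \infty$, $B(k)$ tends to 0. Because a larger k shrinks the coefficients more and so adds more bias, k should not be too large; we want to keep it fairly small.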
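The formula above can be checked numerically. The following is a minimal sketch in R, assuming standardized predictors and a centered response from the built-in `longley` data; the helper name `ridge_coef` and the particular values of k are illustrative assumptions, not part of the original article.

```r
# Closed-form ridge estimate B(k) = (X'X + kI)^{-1} X'Y.
# `ridge_coef` is a hypothetical helper, not a function from any package.
ridge_coef <- function(X, y, k) {
  p <- ncol(X)
  drop(solve(crossprod(X) + k * diag(p), crossprod(X, y)))
}

# Built-in Longley data: predict Employed from the other six columns.
X <- scale(as.matrix(longley[, names(longley) != "Employed"]))  # center and scale
y <- longley$Employed - mean(longley$Employed)                  # center the response

b_ols   <- ridge_coef(X, y, k = 0)     # k = 0 reproduces least squares
b_small <- ridge_coef(X, y, k = 0.1)   # mild shrinkage
b_large <- ridge_coef(X, y, k = 1e6)   # very large k pushes every coefficient toward 0

print(round(cbind(b_ols, b_small, b_large), 4))
```

Comparing the three columns shows the two limiting cases described above: the k = 0 column matches an ordinary least squares fit on the same standardized data, and the large-k column is close to zero.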

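1.2 Ridge Trace Plot
A ridge trace plots each coefficient's ridge estimate against k. When there is no singularity (no severe collinearity), the ridge trace should move steadily and gradually toward 0.
By examining the ridge estimates in the ridge trace plot, we can judge which variables should be dropped.
1.3 Properties of the Ridge Estimate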




1.4 Ridge Trace Analysis

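1.5 General Principles for Choosing the Ridge Parameter
Choose the value of k (or lambda) so that:
(1) the ridge estimates of all regression coefficients are essentially stable;
(2) regression coefficients whose signs are unreasonable under least squares have ridge estimates with reasonable signs;
(3) no regression coefficient has an absolute value that contradicts its practical meaning;
(4) the residual sum of squares does not increase too much.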
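Criteria (1), (2) and (4) can be tabulated over a grid of k values. The sketch below is one possible way to do this, again assuming standardized `longley` data; the helper `ridge_coef`, the grid `ks`, and the summary columns are assumptions made for illustration, and criterion (3) still needs subject-matter judgment.

```r
# Hypothetical helper: closed-form ridge estimate on standardized predictors.
ridge_coef <- function(X, y, k) {
  drop(solve(crossprod(X) + k * diag(ncol(X)), crossprod(X, y)))
}

X <- scale(as.matrix(longley[, names(longley) != "Employed"]))
y <- longley$Employed - mean(longley$Employed)

ks    <- seq(0, 0.5, by = 0.05)                               # candidate values of k (assumed grid)
coefs <- sapply(ks, function(k) ridge_coef(X, y, k))          # one column of coefficients per k
rss   <- apply(coefs, 2, function(b) sum((y - X %*% b)^2))    # residual sum of squares per k

# Criterion (1): largest coefficient movement between neighbouring k values.
step_change <- apply(abs(diff(t(coefs))), 1, max)
# Criterion (2): variables whose sign differs between k = 0 and the largest k.
sign_flips <- rownames(coefs)[sign(coefs[, 1]) != sign(coefs[, ncol(coefs)])]
# Criterion (4): RSS relative to the least squares fit (k = 0).
rss_ratio <- rss / rss[1]

print(data.frame(k = ks[-1], max_coef_change = step_change, rss_ratio = rss_ratio[-1]))
print(sign_flips)
```

A reasonable k is one where `max_coef_change` has become small while `rss_ratio` is still close to 1.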
1.6 Variance Inflation Factor Method
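A minimal sketch of how the variance inflation factor (VIF) method is commonly applied, assuming the usual formulation: the ridge VIFs are the diagonal of (R + kI)^{-1} R (R + kI)^{-1}, where R is the correlation matrix of the predictors, and k is taken as the smallest value for which every VIF is at most 10. The helper name `ridge_vif`, the threshold of 10, and the grid of k values are assumptions for illustration.

```r
# Hypothetical helper: ridge variance inflation factors, the diagonal of
# (R + kI)^{-1} R (R + kI)^{-1}, where R is the predictor correlation matrix.
ridge_vif <- function(R, k) {
  A <- solve(R + k * diag(nrow(R)))
  diag(A %*% R %*% A)
}

R <- cor(longley[, names(longley) != "Employed"])

vif_ols <- ridge_vif(R, 0)   # at k = 0 these are the ordinary VIFs, diag(R^{-1})
print(round(vif_ols, 1))

# Smallest k on an assumed grid for which every VIF is at most 10.
ks      <- seq(0, 1, by = 0.001)
max_vif <- sapply(ks, function(k) max(ridge_vif(R, k)))
k_star  <- ks[which(max_vif <= 10)[1]]
print(k_star)
```

Each ridge VIF decreases as k grows, so the first grid point that satisfies the threshold is the smallest such k on the grid.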

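1.7 Ridge Regression in R
Code:
library(MASS)  # provides lm.ridge() and its select() method
longley        # print the built-in Longley macroeconomic data set
# Ordinary least squares fit on all predictors, for comparison
summary(fm1 <- lm(Employed ~ ., data = longley))
# As in the MASS help example, rename the first column (GNP.deflator) and use it as the response
names(longley)[1] <- "y"
lm.ridge(y ~ ., longley)  # lambda defaults to 0
# Ridge trace for lambda from 0 to 0.1
plot(lm.ridge(y ~ ., longley, lambda = seq(0, 0.1, 0.001)))
# Print the HKB, L-W and GCV choices of lambda
select(lm.ridge(y ~ ., longley, lambda = seq(0, 0.1, 0.001)))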

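2. Lasso
2.1 Lasso Overview
Problems with ridge regression:
- There are too many ways to compute the ridge parameter, and their results differ widely
- Selecting variables from the ridge trace plot is too subjective
- The model returned by ridge regression (if no variable screening is done) contains all of the variables
LASSO
Tibshirani (1996) proposed the Lasso (Least Absolute Shrinkage and Selection Operator) algorithm.
It obtains a parsimonious model by adding a first-order (L1) penalty function. It ends up setting the coefficients of some variables exactly to zero, whereas ridge estimates are almost never exactly 0, which makes variable screening difficult. The resulting model is therefore highly interpretable.
It handles data with multicollinearity well and, like ridge regression, is a biased estimator.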
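For comparison, the two penalized least squares problems can be written side by side. This is the standard textbook formulation, stated here as an assumption about notation: $B$ is the coefficient vector with components $B_j$, $k$ is the ridge parameter, and $\lambda$ is the Lasso penalty weight.

```latex
% Ridge: squared (L2) penalty; its minimizer is B(k) = (X'X + kI)^{-1} X'Y
\hat{B}^{\text{ridge}}(k) = \arg\min_{B} \; \|Y - XB\|_2^2 + k \sum_{j=1}^{p} B_j^2

% Lasso: first-order (L1) penalty; no closed form in general
\hat{B}^{\text{lasso}}(\lambda) = \arg\min_{B} \; \|Y - XB\|_2^2 + \lambda \sum_{j=1}^{p} |B_j|
```

The only difference is the form of the penalty: squared coefficients for ridge regression, absolute values for the Lasso.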
2.2 Why the Lasso Can Select Variables Directly
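One way to see the effect is the special case of an orthonormal design (X'X = I), where both estimators have closed forms. The sketch below assumes the objectives (1/2)||Y - XB||^2 + lambda * sum|B_j| for the Lasso and (1/2)||Y - XB||^2 + (lambda/2) * sum B_j^2 for ridge regression; the helper names and the coefficient values are made up for illustration.

```r
# Assumes an orthonormal design (X'X = I), where both estimators have closed forms.
# `soft_threshold` and `ridge_shrink` are hypothetical helper names.
soft_threshold <- function(b, lambda) sign(b) * pmax(abs(b) - lambda, 0)  # Lasso solution
ridge_shrink   <- function(b, lambda) b / (1 + lambda)                   # ridge solution

b_ols  <- c(3.0, -1.5, 0.4, -0.2)   # made-up least squares coefficients
lambda <- 0.5

lasso <- soft_threshold(b_ols, lambda)
ridge <- ridge_shrink(b_ols, lambda)
print(rbind(b_ols, lasso, ridge))
```

The Lasso subtracts a fixed amount from each coefficient's absolute value, so any coefficient smaller than lambda in absolute value becomes exactly zero. Ridge regression rescales every coefficient by the same factor, so small coefficients get smaller but never reach zero.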

2.3 LASSO vs. Ridge Regression


2.4 A More General Model


2.5 Elastic Net
Zou and Hastie (2005) proposed the elastic net.
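As a sketch of the idea, the elastic net combines the ridge and Lasso penalties in a single criterion. The form below is the commonly cited "naive" elastic net objective; the symbols $\lambda_1$ and $\lambda_2$ for the two penalty weights are notation assumed here.

```latex
\hat{B}^{\text{enet}} = \arg\min_{B} \; \|Y - XB\|_2^2
    + \lambda_1 \sum_{j=1}^{p} |B_j|
    + \lambda_2 \sum_{j=1}^{p} B_j^2
```

Setting $\lambda_2 = 0$ recovers the Lasso, and setting $\lambda_1 = 0$ recovers ridge regression.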

References:
- http://www.dataguru.cn/article-4063-1.html
- https://zhuanlan.zhihu.com/p/426162272