Deep Learning Basics: Overfitting, Underfitting, and Regularization
2022-08-02 05:29:00 【hello689】
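From Li Hang, Statistical Learning Methods: when the hypothesis space contains models of different complexity (for example, with different numbers of parameters), we face the problem of model selection. We want to select, or learn, an appropriate model. If a "true" model exists in the hypothesis space, the selected model should be close to it. Specifically, the selected model should have the same number of parameters as the true model, and its parameter vector should be close to the true model's parameter vector.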
1. Overfitting
Symptom: the model predicts known data very well but predicts unknown data poorly. It performs well on the training set and poorly on the validation and test sets.
Underlying principle: if we keep pushing for better predictive ability on the training data, the selected model tends to be more complex than the true model. (Li Hang, Statistical Learning Methods)
From the model-complexity perspective: the model is too complex and also learns the noise in the data, which degrades its generalization performance.
From the dataset perspective: the dataset is too small relative to the model's complexity, so the model over-mines the features of the training set.
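The following minimal sketch makes the model-complexity view concrete. It is not from the original post. It fits polynomials of increasing degree to a few noisy samples. The target function sin(2πx), the noise level, and the sample sizes are arbitrary assumptions chosen for illustration.

A degree-9 polynomial through 10 points can drive the training error to essentially zero. Even so, it typically does worse than a lower-degree fit on fresh test points, because it has also fitted the noise.

```python
import numpy as np


def poly_fit_errors(degree, n_train=10, n_test=200, noise=0.1, seed=0):
    """Fit a polynomial of the given degree to noisy samples of sin(2*pi*x)
    and return (train_mse, test_mse). The target and noise are illustrative."""
    rng = np.random.default_rng(seed)
    x_train = np.sort(rng.uniform(0.0, 1.0, n_train))
    y_train = np.sin(2 * np.pi * x_train) + rng.normal(0.0, noise, n_train)
    x_test = np.linspace(0.0, 1.0, n_test)
    y_test = np.sin(2 * np.pi * x_test) + rng.normal(0.0, noise, n_test)

    # Polynomial.fit maps x to [-1, 1] internally, which keeps high-degree fits stable.
    p = np.polynomial.Polynomial.fit(x_train, y_train, degree)
    train_mse = float(np.mean((p(x_train) - y_train) ** 2))
    test_mse = float(np.mean((p(x_test) - y_test) ** 2))
    return train_mse, test_mse


if __name__ == "__main__":
    for degree in (1, 3, 9):
        train_mse, test_mse = poly_fit_errors(degree)
        print(f"degree={degree}: train MSE={train_mse:.4f}, test MSE={test_mse:.4f}")
```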
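Common ways to address overfitting:
- Enlarge the dataset: use data augmentation, collect more data, or synthesize new data with a generative adversarial network (GAN).
- Regularization methods such as BN and dropout:
  - Add BN (batch normalization) layers; to some extent BN improves the model's generalization.
  - Dropout: randomly deactivate some hidden neurons during training, so not every neuron is updated at every step.
- Reduce model complexity: use fewer network layers, or switch to a model with fewer parameters.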
- Reduce the number of training epochs (also called early stopping): stop iterating before the model has converged on the training set, which prevents overfitting. In practice, record the best result after every epoch. If accuracy on the held-out validation set has not improved for ten consecutive epochs, training can be stopped there.
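- Ensemble learning: combine several models to reduce the overfitting risk of any single model.
- Cross-validation: this one is a bit more involved; I have hardly used it and have not studied it closely.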
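Two of the items above are concrete enough to sketch in code. The following minimal NumPy sketch is an illustration, not the author's implementation. It covers two things:

- Inverted dropout: the common variant that rescales kept activations during training, so nothing changes at inference time.
- An early-stopping helper with a patience of ten epochs, as described in the list. It monitors accuracy on a held-out validation set rather than the test set, because using the test set to decide when to stop would leak test information into training.

The function and class names and the accuracy curve in the usage block are hypothetical.

```python
import numpy as np


def dropout(h, p_drop, rng, training=True):
    """Inverted dropout: during training, zero each activation with probability
    p_drop and scale the survivors by 1 / (1 - p_drop), so the expected value of
    each activation is unchanged and inference needs no rescaling."""
    if not 0.0 <= p_drop < 1.0:
        raise ValueError("p_drop must be in [0, 1)")
    if not training or p_drop == 0.0:
        return h
    keep = 1.0 - p_drop
    mask = rng.random(h.shape) < keep  # True means the neuron is kept this step
    return h * mask / keep


class EarlyStopping:
    """Signal a stop once validation accuracy has not improved for `patience`
    consecutive epochs (patience=10 matches the rule described in the text)."""

    def __init__(self, patience=10):
        self.patience = patience
        self.best = float("-inf")
        self.best_epoch = -1
        self.bad_epochs = 0

    def step(self, epoch, val_acc):
        """Record one epoch's validation accuracy; return True to stop training."""
        if val_acc > self.best:
            self.best, self.best_epoch, self.bad_epochs = val_acc, epoch, 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    print(dropout(np.ones((2, 5)), 0.5, rng))  # every entry is 0.0 or 2.0

    stopper = EarlyStopping(patience=10)
    # Hypothetical validation-accuracy curve that peaks at epoch 3, then plateaus.
    history = [0.60, 0.70, 0.75, 0.80] + [0.79] * 20
    for epoch, acc in enumerate(history):
        if stopper.step(epoch, acc):
            print(f"stop at epoch {epoch}; best epoch {stopper.best_epoch}, acc {stopper.best}")
            break
```

In a real training loop, you would typically save the weights from `best_epoch` and restore them after stopping. PyTorch's `torch.nn.Dropout` applies the same 1 / (1 - p) scaling during training.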
2. Underfitting
Symptom: the model performs poorly on both the training set and the test set.
Causes:
- The model is too simple, so its learning capacity is weak.
- Poor feature extraction: when the training data has too few features, or the existing features correlate only weakly with the labels, the model easily underfits.
Solutions:
- Increase model complexity, for example by replacing a linear model with a nonlinear one, or by adding layers or neurons to a neural network.
- Add new features: consider feature engineering such as combining existing features.
- If a regularization term has been added to the loss function, consider reducing the regularization coefficient $\lambda$.
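The following minimal sketch illustrates underfitting caused by a missing feature combination. The data is hypothetical and not from the original post. The label is the product x1*x2, which a linear model on x1 and x2 alone cannot represent, so its training error stays high. Adding the combined feature x1*x2 lets the same least-squares model fit the training data essentially exactly.

```python
import numpy as np


def train_mse(features, y):
    """Least-squares fit with a bias column; return the training MSE."""
    X = np.column_stack([np.ones(len(y)), features])
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.mean((X @ w - y) ** 2))


# Hypothetical data whose label depends only on the product of two inputs.
rng = np.random.default_rng(0)
x1 = rng.uniform(-1.0, 1.0, 200)
x2 = rng.uniform(-1.0, 1.0, 200)
y = x1 * x2

linear = train_mse(np.column_stack([x1, x2]), y)            # underfits
crossed = train_mse(np.column_stack([x1, x2, x1 * x2]), y)  # adds a combined feature
print(f"linear features: {linear:.4f}, with x1*x2: {crossed:.2e}")
```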
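3. Regularization
A note up front: regularization is not easy to grasp at first. Supervised learning has two basic strategies: empirical risk minimization and structural risk minimization. When the sample size is large enough, the model that minimizes empirical risk is considered optimal. When the sample size is small, empirical risk minimization does not learn well and tends to overfit. Structural risk minimization (equivalent to regularization) was proposed to prevent this overfitting.
Regularization implements the structural risk minimization strategy: it adds a regularization term, or penalty term, to the empirical risk. The regularization term is generally a monotonically increasing function of model complexity, so the more complex the model, the larger its value.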
The regularized objective generally has the following form:
$$\min_{f \in \mathcal{F}} \frac{1}{N} \sum_{i=1}^{N} L\left(y_{i}, f\left(x_{i}\right)\right) + \lambda J(f)$$
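Here the first term is the empirical risk and the second term is the regularization term; $\lambda \ge 0$ is a coefficient that balances the two.
A model with a smaller first term (lower empirical risk) may be more complex, for example by having many nonzero parameters, in which case the second term, which measures model complexity, becomes larger. The role of regularization is to select a model whose empirical risk and model complexity are both small.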
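Here is a concrete instance of the objective above, written as a minimal sketch. It assumes:

- a linear model f(x) = x·w,
- squared loss for L,
- the squared L2 norm J(f) = ‖w‖², which is one common choice of penalty.

The data is synthetic and hypothetical. Setting the gradient of (1/N)‖y − Xw‖² + λ‖w‖² to zero gives the closed form (XᵀX + NλI)w = Xᵀy.

As λ grows, the fitted weights shrink, so J decreases, while the empirical risk on the training data rises. This is the trade-off that λ controls.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Minimize (1/N) * sum_i (y_i - x_i @ w)**2 + lam * ||w||**2.
    Setting the gradient to zero gives (X^T X + N*lam*I) w = X^T y."""
    n, d = X.shape
    return np.linalg.solve(X.T @ X + n * lam * np.eye(d), X.T @ y)


def empirical_risk(X, y, w):
    """Mean squared error on the training data (the first term of the objective)."""
    return float(np.mean((y - X @ w) ** 2))


# Hypothetical small dataset: only the first three weights are truly nonzero.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 10))
w_true = np.zeros(10)
w_true[:3] = [2.0, -1.0, 0.5]
y = X @ w_true + rng.normal(scale=0.5, size=30)

for lam in (0.0, 0.1, 1.0):
    w = ridge_fit(X, y, lam)
    print(f"lambda={lam}: empirical risk={empirical_risk(X, y, w):.3f}, "
          f"J(w)=||w||^2={float(w @ w):.3f}")
```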
Reference: Li Hang, Statistical Learning Methods, p. 18.