Summary: Cross Validation
2022-08-11 05:33:00 【weiAweiww】
WHAT
Cross-validation (CV for short), also called rotation estimation, is a statistical method that cuts a data sample into smaller subsets: the model is trained on some of the subsets and evaluated on the others.
Three terms:
Training set: the sample data used for learning, i.e. for fitting the model's parameters.
Validation set: used to tune the trained model, for example to choose the network structure or the parameters that control the model's complexity.
Test set: used to evaluate the final model's performance on data it has not seen.
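A minimal sketch of how one data sample might be cut into these three sets, assuming samples are addressed by index and that a 60/20/20 split (our choice, not the author's) is acceptable. The function name `train_val_test_split` is hypothetical; it only returns index lists.

```python
import random

def train_val_test_split(n_samples, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle the sample indices and cut them into training, validation and test sets."""
    if val_frac < 0 or test_frac < 0 or val_frac + test_frac >= 1:
        raise ValueError("need val_frac, test_frac >= 0 and val_frac + test_frac < 1")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)   # random order so each subset resembles the whole sample
    n_test = round(n_samples * test_frac)
    n_val = round(n_samples * val_frac)
    test = indices[:n_test]
    val = indices[n_test:n_test + n_val]
    train = indices[n_test + n_val:]       # everything left over is used to fit parameters
    return train, val, test

train, val, test = train_val_test_split(100)
print(len(train), len(val), len(test))  # 60 20 20
```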
Three important quantities:
Bias (accuracy): the gap between the learned model's expected (average) prediction and the true value. It describes how well the learning algorithm itself can fit the data.
Variance (stability): the expected squared difference between a prediction and the average prediction, i.e. how much the model's performance changes when it is trained on different training sets of the same size. It characterizes the effect of perturbations in the data.
Error: the overall prediction error of the model.
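Under squared-error loss the three quantities are linked by Error = Bias^2 + Variance + Noise. A short derivation, assuming y = f(x) + ε where the noise ε has mean 0, variance σ², and is independent of the training set; f and f̂ denote the true and learned functions at a fixed input x, and expectations are taken over training sets and noise (this notation is ours, not the author's):

```latex
\begin{aligned}
\mathbb{E}\big[(y - \hat f)^2\big]
  &= \mathbb{E}\big[(\varepsilon + f - \hat f)^2\big] \\
  &= \sigma^2 + \mathbb{E}\big[(f - \hat f)^2\big]
     && \text{since } \mathbb{E}[\varepsilon] = 0 \text{ and } \varepsilon \text{ is independent of } \hat f \\
  &= \underbrace{\big(f - \mathbb{E}[\hat f]\big)^2}_{\text{Bias}^2}
   + \underbrace{\mathbb{E}\big[(\hat f - \mathbb{E}[\hat f])^2\big]}_{\text{Variance}}
   + \underbrace{\sigma^2}_{\text{Noise}}
     && \text{add and subtract } \mathbb{E}[\hat f]
\end{aligned}
```

The last step works because the cross term 2(f − E[f̂])·E[E[f̂] − f̂] is zero.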
Note:
1. Error = Bias^2 + Variance + Noise
2. Bias and variance usually pull against each other. Having both low is the ideal state (see the figure below), but reducing bias tends to increase variance, and vice versa.
The root cause: we use a limited sample to estimate and predict behavior on the effectively unlimited real-world data. As we keep improving how well the model fits the sample (bias decreases), it starts to overfit: its generalization ability drops, it performs worse on real data, and its predictions become less certain (variance increases). Conversely, adding more constraints while learning the model lowers its variance and makes it more stable, but raises its bias.
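The trade-off can be seen in a small simulation. This is a sketch under our own assumptions, not the author's experiment: the hypothetical predictor is ŷ = lam · mean(training targets), where the shrink factor `lam` acts as a constraint on the model, and every training set holds `n_train` noisy observations of a constant true value `f0`. For this toy model the exact values are Bias^2 = ((lam − 1)·f0)^2, Variance = lam^2·σ^2/n_train and Noise = σ^2, so with the defaults lam = 1 gives 0, 0.2 and 1, while lam = 0.5 gives 1, 0.05 and 1; the Monte Carlo estimates should land close to these.

```python
import random
import statistics

def bias_variance_demo(lam, f0=2.0, sigma=1.0, n_train=5, trials=20000, seed=0):
    """Monte Carlo estimate of Bias^2, Variance, Noise and total Error for the
    hypothetical predictor y_hat = lam * mean(training targets)."""
    rng = random.Random(seed)
    preds, sq_errors, noise_sq = [], [], []
    for _ in range(trials):
        # a fresh training set of the same size: n_train noisy observations of f0
        train = [f0 + rng.gauss(0.0, sigma) for _ in range(n_train)]
        y_hat = lam * statistics.fmean(train)
        # a fresh noisy test target to compare the prediction against
        eps = rng.gauss(0.0, sigma)
        preds.append(y_hat)
        sq_errors.append((f0 + eps - y_hat) ** 2)
        noise_sq.append(eps ** 2)
    mean_pred = statistics.fmean(preds)
    bias_sq = (mean_pred - f0) ** 2
    variance = statistics.fmean([(p - mean_pred) ** 2 for p in preds])
    noise = statistics.fmean(noise_sq)
    error = statistics.fmean(sq_errors)
    return bias_sq, variance, noise, error

for lam in (1.0, 0.5):
    b2, var, noise, err = bias_variance_demo(lam)
    print(f"lam={lam}: Bias^2={b2:.3f} Variance={var:.3f} "
          f"Noise={noise:.3f} Error={err:.3f} Sum={b2 + var + noise:.3f}")
```

Shrinking toward zero (lam = 0.5) is the "more restrictions" case: variance falls while bias rises.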
Summary: overfitting means high variance, and underfitting means high bias.
So how can we avoid these two extremes?
(1) Avoid underfitting: find better (more representative) features, or use more features (increase the dimension of the input vector).
(2) Avoid overfitting: enlarge the data set (which reduces the relative influence of noise), use fewer features (reduce the data dimension), apply regularization (add a regularization term to the objective or cost function), or use cross-validation (the focus of this post).
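To make "add a regularization term to the cost function" concrete, here is a minimal sketch for a hypothetical one-parameter model y ≈ w·x with an L2 (ridge) penalty α·w². Setting the derivative of Σ(y − w·x)² + α·w² to zero gives w = Σxy / (Σx² + α), so a larger α pulls w toward 0, accepting a little more bias for lower variance. The function name and the data are illustrative only.

```python
def fit_ridge_1d(xs, ys, alpha):
    """Fit y ~ w * x by minimizing sum((y - w*x)^2) + alpha * w^2.
    Setting the derivative to zero gives w = sum(x*y) / (sum(x^2) + alpha)."""
    if alpha < 0:
        raise ValueError("alpha must be non-negative")
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    if sxx + alpha == 0:
        raise ValueError("all x are zero and alpha is 0: w is undetermined")
    return sxy / (sxx + alpha)

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
for alpha in (0.0, 1.0, 10.0):
    # the fitted weight shrinks toward 0 as the penalty grows
    print(alpha, fit_ridge_1d(xs, ys, alpha))
```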
Three CV methods: the hold-out method, K-fold cross-validation, and leave-one-out cross-validation.
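A minimal sketch of how each of the three methods could produce (training, validation) index pairs, assuming samples are addressed by index; the function names are hypothetical. Leave-one-out is simply K-fold with k equal to the number of samples, which is why it needs n model fits while hold-out needs only one.

```python
import random

def holdout_split(n, val_frac=0.3, seed=0):
    """Hold-out: one random split into a training part and a validation part."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_val = round(n * val_frac)
    return [(idx[n_val:], idx[:n_val])]          # a single (train, val) pair

def kfold_splits(n, k):
    """K-fold: cut indices 0..n-1 into k nearly equal contiguous folds; each fold is
    the validation set once while the other k-1 folds form the training set.
    Assumes the samples are already in random order (shuffle them first otherwise)."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    splits, start = [], 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        splits.append((train, val))
        start += size
    return splits

def leave_one_out_splits(n):
    """Leave-one-out: K-fold with k = n, so every validation set is a single sample."""
    return kfold_splits(n, n)

print(len(holdout_split(10)), len(kfold_splits(10, 3)), len(leave_one_out_splits(10)))
```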
K-fold cross-validation in detail:

1. Divide the original data into k groups (usually of equal size). Use each group once as the validation set, with the remaining k-1 groups as the training set. This yields k models.
2. Use the average classification accuracy of the k models on their respective validation sets as the performance measure of the classifier under this k-fold CV.
3. Repeat steps 1-2 for each candidate hyperparameter setting and pick the setting with the best score (hyperparameters are values set before the learning process starts, not parameters obtained through training).
4. With the best hyperparameters, retrain the model on all k groups combined to obtain the final model.
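The four steps can be sketched end to end as follows. This is an illustration under our own assumptions, not the author's code: the classifier is a hypothetical 1-D k-nearest-neighbour model whose number of neighbours `n_neighbors` plays the role of the hyperparameter, the data are synthetic with about 10% label noise, and all function names are made up for the example. For k-NN, "retraining on all the data" in step 4 just means storing the full data set.

```python
import random
from collections import Counter

def kfold_splits(n, k, seed=0):
    """Step 1: shuffle indices 0..n-1 and cut them into k folds of nearly equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]      # round-robin: sizes differ by at most 1
    for i, val in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, val

def knn_predict(train_x, train_y, x, n_neighbors):
    """Hypothetical 1-D k-NN classifier: majority label among the closest points."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:n_neighbors]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

def cv_accuracy(xs, ys, n_neighbors, k=5):
    """Steps 1-2: average validation accuracy over the k folds."""
    scores = []
    for train, val in kfold_splits(len(xs), k):
        tx = [xs[i] for i in train]
        ty = [ys[i] for i in train]
        correct = sum(knn_predict(tx, ty, xs[i], n_neighbors) == ys[i] for i in val)
        scores.append(correct / len(val))
    return sum(scores) / len(scores)

def select_and_refit(xs, ys, candidates, k=5):
    """Step 3: pick the hyperparameter with the best CV score.
    Step 4: refit on all data (for k-NN this just keeps the full data set)."""
    best = max(candidates, key=lambda c: cv_accuracy(xs, ys, c, k))
    final_model = lambda x: knn_predict(xs, ys, x, best)
    return best, final_model

rng = random.Random(1)
xs = [rng.uniform(0, 10) for _ in range(60)]
# true label is 1 when x > 5; flip about 10% of labels to simulate noise
ys = [(1 - int(x > 5)) if rng.random() < 0.1 else int(x > 5) for x in xs]
best, model = select_and_refit(xs, ys, candidates=[1, 3, 5, 9])
print("best n_neighbors:", best, "prediction at x=7:", model(7.0))
```

Because `max` returns the first of several equally good candidates, ties go to the earlier (here smaller) `n_neighbors`; another tie-breaking rule would be an equally valid design choice.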
WHY
1. Cross-validation is used to evaluate a model's predictive performance, especially how the trained model performs on new data, and it can reduce overfitting to some extent.
2. It extracts as much useful information as possible from limited data.
3. It is a convenient way to measure model performance using only the training data, so the test set does not have to be touched until modeling is finished.