How to prevent overfitting in cross validation
2022-07-07 01:21:00 【ZEERO~】
1. Definition of overfitting and underfitting
Overfitting means that the model performs well on the training set but poorly on the validation set and the test set.
Underfitting means that the model performs poorly on all three: the training set, the validation set, and the test set.
2. Analysis of the causes of overfitting and underfitting
2.1 Number of samples
For a machine learning algorithm that is suited to large datasets, more samples are generally better. When there are too few samples, the model cannot learn the underlying pattern reliably. A simple model may then underfit and perform poorly on all three datasets, while a flexible model tends instead to memorize the few samples it has and overfit.
2.2 Model complexity
Generally speaking, for a given model such as logistic regression or linear regression, the more features are used, the higher the complexity of the model. We can use a feature selection method, for example mRMR or the chi-square test, to rank the features by importance, then add the features one at a time in that order and compute the accuracy and loss on the training set and the test set after each addition. We usually find that as the number of features increases, the training accuracy approaches 100% while the test accuracy eventually starts to decline. Likewise, the training loss falls toward 0 while the test loss gradually rises. For example, when the training loss is 0 but the test loss is clearly not, we know that the model has overfitted. In this way we can roughly judge whether the current model is overfitting.
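The experiment described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the author's setup: the data is synthetic (feature 0 is informative, every other feature is pure noise, so the "ranking" is known in advance), the classifier is a simple nearest-centroid rule rather than logistic regression, and all function names (`make_data`, `fit_centroids`, `feature_sweep`) are hypothetical. It uses only the Python standard library.

```python
import random

def make_data(n_per_class, n_noise, rng):
    """Two classes; feature 0 is informative, the remaining features are noise."""
    rows, labels = [], []
    for label, centre in ((0, -1.0), (1, 1.0)):
        for _ in range(n_per_class):
            informative = rng.gauss(centre, 1.0)
            noise = [rng.gauss(0.0, 1.0) for _ in range(n_noise)]
            rows.append([informative] + noise)
            labels.append(label)
    return rows, labels

def fit_centroids(rows, labels, k):
    """Per-class mean of the first k features (the 'model')."""
    centroids = {}
    for c in set(labels):
        members = [r[:k] for r, y in zip(rows, labels) if y == c]
        centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def predict(centroids, row, k):
    """Assign the class whose centroid is closest in the first k features."""
    def sq_dist(centre):
        return sum((a - b) ** 2 for a, b in zip(row[:k], centre))
    return min(centroids, key=lambda c: sq_dist(centroids[c]))

def accuracy(centroids, rows, labels, k):
    hits = sum(predict(centroids, r, k) == y for r, y in zip(rows, labels))
    return hits / len(rows)

def feature_sweep(feature_counts, seed=0):
    """Train with the first k ranked features for each k; return (k, train, test)."""
    rng = random.Random(seed)
    n_noise = max(feature_counts) - 1
    train_x, train_y = make_data(10, n_noise, rng)   # deliberately tiny training set
    test_x, test_y = make_data(100, n_noise, rng)
    results = []
    for k in feature_counts:
        model = fit_centroids(train_x, train_y, k)
        results.append((k,
                        accuracy(model, train_x, train_y, k),
                        accuracy(model, test_x, test_y, k)))
    return results

if __name__ == "__main__":
    for k, train_acc, test_acc in feature_sweep([1, 5, 20, 100, 200]):
        print(f"{k:>3} features  train={train_acc:.2f}  test={test_acc:.2f}")
```

With a small training set and many noise features, the gap between the two columns should widen as `k` grows, which is the signal to watch for. The exact numbers depend on the seed.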
3. Why cross-validation can prevent overfitting
The first thing to note is that cross-validation does not itself reduce the complexity of the model or stop it from overfitting. What it does is let us evaluate, during training, whether the model is overfitting.
In 5-fold cross-validation, the data is randomly split into five parts, and in each round 80% of the data is used for training and the remaining 20% for validation. In this case, if the model has overfitted, it will score well on the 80% it was trained on but poorly on the 20% held out for validation, which is exactly the pattern described in section 1.
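The 5-fold procedure can be sketched as follows. This is a minimal illustration, not the author's code: the data is synthetic with 30% of the labels flipped on purpose, and the model is a 1-nearest-neighbour classifier, chosen here only because it memorizes its training data and so overfits visibly. The helper names (`k_fold_indices`, `cross_validate`, `make_noisy_data`) are hypothetical, and only the standard library is used.

```python
import random

def make_noisy_data(n, flip_prob, rng):
    """One feature; label is (x > 0), flipped with probability flip_prob."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [int((x > 0) ^ (rng.random() < flip_prob)) for x in xs]
    return xs, ys

def k_fold_indices(n_samples, k, rng):
    """Shuffle the indices and deal them into k disjoint folds."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    return [indices[i::k] for i in range(k)]

def predict_1nn(train_x, train_y, x):
    """Return the label of the closest training point."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]

def accuracy(train_x, train_y, eval_x, eval_y):
    hits = sum(predict_1nn(train_x, train_y, x) == y
               for x, y in zip(eval_x, eval_y))
    return hits / len(eval_x)

def cross_validate(xs, ys, k=5, seed=0):
    """Return one (train_accuracy, validation_accuracy) pair per fold."""
    folds = k_fold_indices(len(xs), k, random.Random(seed))
    scores = []
    for held_out in folds:
        held = set(held_out)
        train_idx = [i for i in range(len(xs)) if i not in held]
        tr_x = [xs[i] for i in train_idx]
        tr_y = [ys[i] for i in train_idx]
        va_x = [xs[i] for i in held_out]
        va_y = [ys[i] for i in held_out]
        scores.append((accuracy(tr_x, tr_y, tr_x, tr_y),   # score on the 80%
                       accuracy(tr_x, tr_y, va_x, va_y)))  # score on the 20%
    return scores

if __name__ == "__main__":
    xs, ys = make_noisy_data(200, 0.3, random.Random(1))
    for fold, (train_acc, val_acc) in enumerate(cross_validate(xs, ys), start=1):
        print(f"fold {fold}: train={train_acc:.2f}  validation={val_acc:.2f}")
```

Because a 1-nearest-neighbour model finds each training point as its own nearest neighbour, the training score is perfect on every fold, while the validation score is held down by the label noise. A persistent gap of that kind across all folds is the overfitting signal that cross-validation exposes.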