How to prevent overfitting in cross validation
2022-07-07 01:21:00 【ZEERO~】
1、 Definition of overfitting and underfitting
Overfitting means that the model performs well on the training set but poorly on the validation set and the test set.
Underfitting means that the model performs poorly on all three: the training set, the validation set, and the test set.
2、 Analysis of the causes of overfitting and underfitting
2.1 Number of samples
For a machine learning algorithm, assuming the model is suited to large datasets, the more samples the better. When the number of samples is insufficient, the model cannot learn the underlying pattern well; underfitting can occur, in which case the model performs poorly on all three datasets.
2.2 Model complexity
Generally speaking, once a model has been chosen (for example logistic regression or linear regression), the more features it uses, the higher its complexity. We can use a feature selection algorithm, such as mRMR or the chi-square test, to rank the features by importance, then add features one at a time and compute the accuracy and the loss on the training set and the test set. We usually find that, as the number of features increases, the training-set accuracy gradually approaches 100% while the test-set accuracy, past some point, gradually declines. Likewise, the training-set loss gradually decreases toward 0 while the test-set loss gradually increases. For example, when the training-set loss is 0 but the test-set loss is not, we know the model has overfitted. In this way, we can roughly judge whether the current model is overfitting.
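The procedure above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the setup used in this article: the data is synthetic, the features are assumed to be already ranked (only feature 0 carries signal, the rest are noise), the model is ordinary least squares without an intercept, and the helper names `solve`, `fit`, `mse` and `make_data` are made up for this example.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit(X, y):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * t for row, t in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)

def mse(X, y, w):
    """Mean squared error of the linear model w on (X, y)."""
    preds = [sum(a * b for a, b in zip(row, w)) for row in X]
    return sum((p - t) ** 2 for p, t in zip(preds, y)) / len(y)

def make_data(n, p, rng):
    """Synthetic data (assumption): y depends on feature 0 only, plus noise."""
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [row[0] + rng.gauss(0, 1) for row in X]
    return X, y

rng = random.Random(0)
P = 25                                   # total number of candidate features
X_train, y_train = make_data(30, P, rng)
X_test, y_test = make_data(200, P, rng)

train_losses, test_losses = [], []
for k in range(1, P + 1):                # add features one at a time
    X_tr = [row[:k] for row in X_train]
    X_te = [row[:k] for row in X_test]
    w = fit(X_tr, y_train)
    train_losses.append(mse(X_tr, y_train, w))
    test_losses.append(mse(X_te, y_test, w))
    print(f"features={k:2d}  train_loss={train_losses[-1]:.3f}  test_loss={test_losses[-1]:.3f}")
```

With least squares the training loss can never go up when a feature is added, so a falling training loss alone says nothing about generalization. The comparison with the test loss is what reveals overfitting.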
3、 Why cross validation can prevent overfitting
The first thing to note is that cross validation does not reduce the complexity of the model or otherwise stop the model from overfitting. Rather, it lets us evaluate, during training, whether the model is overfitting.
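In each round of 5-fold cross validation, 80% of the data is used for training and the remaining 20% for validation. In this case, if the model has overfitted, its performance on the 20% validation data will be clearly worse than on the 80% training data.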
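A minimal sketch of this check, using only the Python standard library. Everything here is an assumption chosen for illustration: the data is synthetic with 30% of the labels flipped, the model is a 1-nearest-neighbour classifier (picked because it memorizes its training data and therefore overfits noisy labels), and the helper names `k_fold_indices`, `predict_1nn` and `accuracy` are hypothetical.

```python
import random

def k_fold_indices(n, k, rng):
    """Shuffle the indices 0..n-1 and split them into k disjoint folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]

def predict_1nn(train, x):
    """Return the label of the training sample closest to x."""
    nearest = min(train, key=lambda s: (s[0][0] - x[0]) ** 2 + (s[0][1] - x[1]) ** 2)
    return nearest[1]

def accuracy(train, samples):
    """Fraction of samples whose 1-NN prediction matches the label."""
    return sum(predict_1nn(train, x) == y for x, y in samples) / len(samples)

rng = random.Random(0)

# Synthetic data (assumption): the true label is 1 when x[0] > 0,
# and 30% of the labels are flipped to simulate noise.
data = []
for _ in range(100):
    x = (rng.uniform(-1, 1), rng.uniform(-1, 1))
    label = int(x[0] > 0)
    if rng.random() < 0.3:
        label = 1 - label
    data.append((x, label))

folds = k_fold_indices(len(data), 5, rng)
train_scores, val_scores = [], []
for i, val_idx in enumerate(folds):
    held_out = set(val_idx)
    train = [data[j] for j in range(len(data)) if j not in held_out]  # 80%
    val = [data[j] for j in val_idx]                                  # 20%
    train_scores.append(accuracy(train, train))
    val_scores.append(accuracy(train, val))
    print(f"fold {i}: train_acc={train_scores[-1]:.2f}  val_acc={val_scores[-1]:.2f}")

gap = sum(train_scores) / 5 - sum(val_scores) / 5
print(f"mean train/validation gap: {gap:.2f}")
```

A large gap between the training score and the validation score across the folds is the signal that the model is overfitting. Cross validation only measures this gap; reducing it still requires changing the model, the features, or the data.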