当前位置:网站首页>How to prevent overfitting in cross validation

How to prevent overfitting in cross validation

2022-07-07 01:21:00 ZEERO~

1、 Definition of over fitting and under fitting

Over fitting It means that the model performs well in the training set , Poor performance in validation set and test set ;
Under fitting It refers to the model in the training set 、 Test set 、 The performance on the verification set is very poor .

2、 Analysis of the causes of over fitting and under fitting

2.1 Number of samples

We know , The number of samples for machine learning algorithm , Suppose the model is suitable for big data sets , The more samples, the better . When the number of samples is insufficient , Under fitting will occur , The performance of the model on the three data sets is very poor .

2.2 Model complexity

Generally speaking , When we select the model , For example, logical regression , Linear regression , The more features are used , The higher the complexity of the model . We can use feature selection algorithm , for example MRMR、 Chi square test , Rank the importance of features . Then add features in turn , Calculate the accuracy and loss function of training set and test set . We usually find that , As the number of features increases , The accuracy of the training set will gradually tend to 100%, The accuracy of the test set will gradually decline . The loss of training set will gradually decrease to 0, The loss of test sets will gradually increase . For example, , When the training set loss is 0, The test set loss is not 0 when , We know that the model must have been fitted . such , We can roughly judge whether the current model has been fitted .

3、 Why cross validation can prevent over fitting

The first thing to note is , It's not that cross validation will reduce the complexity of the model or how to prevent the model from over fitting , Instead, the behavior of cross validation allows us to evaluate whether the model is over fitted during training .
We know ,5 Fold cross validation is random 80% Data for training ,20% To verify the data . In this case , If the model has been fitted ,

原网站

版权声明
本文为[ZEERO~]所创,转载请带上原文链接,感谢
https://yzsam.com/2022/188/202207061734478786.html