[Machine Learning Q&A] Data Sampling and Model Validation Methods, Hyperparameter Optimization, Overfitting and Underfitting
2022-06-30 01:25:00 【Sickle leek】
Data Sampling and Model Validation Methods, Hyperparameter Optimization, Overfitting and Underfitting
- Data sampling and model validation methods
  - Question 1: What are the main validation methods used in model evaluation, and what are their advantages and disadvantages?
  - Question 2: In bootstrap sampling, if we draw n samples with replacement n times, how much of the data is never selected as n approaches infinity?
- Hyperparameter tuning
- Overfitting and underfitting
- References
Data sampling and model validation methods
In machine learning, the samples are usually divided into a training set and a test set: the training set is used to train the model, and the test set is used to evaluate it. Different sampling and validation methods can be used when splitting the samples and validating the model.
Question 1: What are the main validation methods used in model evaluation, and what are their advantages and disadvantages?
(1) Holdout validation
Holdout validation is the simplest and most direct validation method: it randomly divides the original sample set into two parts, a training set and a validation set.
For example, for a click-through-rate prediction model, we can split the samples in a 70%/30% ratio: 70% of the samples are used for model training and 30% for model validation, including plotting the ROC curve and computing accuracy and recall to evaluate model performance.
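The 70/30 split above can be sketched as follows. This is a minimal illustration using scikit-learn on synthetic data (the data set, model, and split ratio are assumptions for demonstration, not from the original example):

```python
# Minimal holdout-validation sketch: random 70%/30% split, then evaluate
# the model on the held-out 30% with ROC AUC. Data are synthetic.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)

# Randomly hold out 30% of the samples for validation.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=42)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
auc = roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])
```

Note that a different `random_state` in the split would give a different `auc`, which is exactly the randomness the next paragraph criticizes.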
The drawback of holdout validation is obvious: the final evaluation metric computed on the validation set depends heavily on how the original samples happened to be split. To eliminate this randomness, researchers introduced "cross-validation".
(2) Cross-validation
k-fold cross-validation: first divide all the samples into $k$ equal-sized subsets. Traverse the $k$ subsets one by one, each time using the current subset as the validation set and all the other subsets as the training set, then train and evaluate the model. Finally, take the average of the $k$ evaluation metrics as the final evaluation metric. In practical experiments, $k$ is frequently set to 10.
Leave-one-out validation: each round, leave out 1 sample as the validation set and use all the other samples as the training set. With $n$ samples in total, traverse all $n$ samples, performing $n$ rounds of validation, and then average the evaluation metrics to obtain the final metric.
When the number of samples is large, leave-one-out validation is very time-consuming. In fact, leave-one-out is a special case of leave-$p$-out validation, which leaves out $p$ samples as the validation set each time. Choosing $p$ elements out of $n$ gives $C_n^p$ possibilities, so its time cost is far higher than leave-one-out, and it is therefore rarely used.
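The k-fold procedure described above (with the common value $k = 10$) can be sketched with scikit-learn; the data set and model here are illustrative assumptions:

```python
# 10-fold cross-validation sketch: split into 10 equal subsets, validate on
# each subset in turn, and average the 10 scores as the final metric.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = make_classification(n_samples=500, random_state=0)

kfold = KFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=kfold)

# The final evaluation metric is the mean over the 10 folds.
mean_score = scores.mean()
```

Leave-one-out is the same loop with `n_splits` equal to the sample count (scikit-learn also provides `LeaveOneOut` for this), which is why it becomes so expensive for large $n$.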
(3) Bootstrap
Both holdout validation and cross-validation evaluate the model by splitting the samples into a training set and a test set. However, when the sample size is small, splitting the sample set further shrinks the training set, which may hurt model training. Is there a validation method that keeps the training set at the full sample size? The bootstrap method addresses this problem.
The bootstrap is a validation method based on bootstrap sampling. For a sample set of size $n$, perform $n$ random draws with replacement to obtain a training set of size $n$. During the $n$ draws, some samples are drawn repeatedly while others are never drawn; the samples that were never drawn are used as the validation set for model validation. This is the bootstrap validation process.
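One round of the bootstrap procedure can be sketched directly with NumPy; the sample size here is an arbitrary illustration:

```python
# One round of bootstrap sampling: draw n samples with replacement as the
# training set, and use the never-drawn ("out-of-bag") samples as validation.
import numpy as np

rng = np.random.default_rng(0)
n = 1000
indices = np.arange(n)

boot_idx = rng.choice(indices, size=n, replace=True)  # training-set indices
oob_mask = ~np.isin(indices, boot_idx)                # samples never drawn
oob_idx = indices[oob_mask]                           # validation-set indices

oob_fraction = oob_idx.size / n
```

Note the training set has size $n$ (with duplicates), so no data is "lost" to the split; the validation set is whatever was never drawn.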
Question 2: In bootstrap sampling, if we draw $n$ samples with replacement $n$ times, how much of the data is never selected as $n$ approaches infinity?
The probability that a given sample is not selected in one draw is $1-\frac{1}{n}$, so the probability that it is never selected in any of the $n$ draws is $(1-\frac{1}{n})^n$. As $n$ approaches infinity, this probability tends to $\lim_{n \to \infty} (1-\frac{1}{n})^n$. By the well-known limit $\lim_{n\to \infty}(1+\frac{1}{n})^n=e$, we have

$$\lim_{n \to \infty} \left(1-\frac{1}{n}\right)^n=\lim_{n \to \infty} \left(\frac{n-1}{n}\right)^n=\lim_{n\to \infty}\left(\frac{1}{\frac{n}{n-1}}\right)^n=\lim_{n \to \infty}\frac{1}{\left(1+\frac{1}{n-1}\right)^n}=\frac{1}{\lim_{n\to \infty} \left(1+\frac{1}{n-1}\right)^{n-1}}\cdot \frac{1}{\lim_{n\to \infty}\left(1+\frac{1}{n-1}\right)}=\frac{1}{e}\approx 0.368$$
Therefore, when the number of samples is large, about 36.8% of the samples are never selected and can be used as the validation set.
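The limit above can be checked numerically: repeating the bootstrap experiment and averaging the never-selected fraction should give a value close to $1/e \approx 0.368$ (the sample size and trial count below are arbitrary):

```python
# Numerical check of the 1/e limit: average the never-selected fraction
# over many bootstrap experiments.
import math
import numpy as np

rng = np.random.default_rng(1)
n, trials = 2000, 200

fractions = []
for _ in range(trials):
    drawn = rng.integers(0, n, size=n)        # n draws with replacement
    never = n - np.unique(drawn).size         # samples never selected
    fractions.append(never / n)

avg_fraction = float(np.mean(fractions))      # approaches 1/e for large n
```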
Hyperparameter tuning
Question 1: What methods are there for tuning hyperparameters?
For hyperparameter tuning, we usually use algorithms such as grid search, random search, and Bayesian optimization. Before introducing these algorithms, it is worth clarifying the elements a hyperparameter search algorithm generally involves:
- first, the objective function, i.e., the goal the algorithm maximizes or minimizes;
- second, the search range, usually determined by lower and upper bounds;
- third, other parameters of the algorithm, such as the search step size.
(1) Grid search
Grid search is probably the simplest and most widely used hyperparameter search algorithm. It determines the optimal value by evaluating every point in the search range. With a sufficiently wide search range and a small enough step size, grid search has a high probability of finding the global optimum. However, this search scheme consumes a lot of computing resources and time, especially when there are many hyperparameters to tune. Therefore, in practice, grid search is generally run first with a wide search range and a large step size to locate the likely position of the global optimum; the search range and step size are then gradually narrowed to find a more accurate optimum. This procedure reduces the required time and computation, but since the objective function is generally non-convex, it can easily miss the global optimum.
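A single (non-refined) grid search over two SVM hyperparameters can be sketched with scikit-learn's `GridSearchCV`; the model, data, and grid values are illustrative assumptions:

```python
# Grid-search sketch: exhaustively evaluate every point in a fixed grid of
# hyperparameter values, scoring each by cross-validation.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, random_state=0)

param_grid = {
    "C": [0.1, 1, 10],        # search range sampled at a fixed step
    "gamma": [0.01, 0.1, 1],
}
search = GridSearchCV(SVC(), param_grid, cv=5)  # tries all 3 x 3 = 9 points
search.fit(X, y)

best_params = search.best_params_
```

The coarse-then-fine strategy from the text would simply rerun this with a narrower `param_grid` centered on `best_params`.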
(2) Random search
The idea of random search is similar to grid search, except that instead of testing all values between the lower and upper bounds, it randomly samples points from the search range. Its theoretical basis is that if the set of sampled points is large enough, random sampling will with high probability find the global optimum or a good approximation of it. Random search is generally faster than grid search, but like the fast, coarse version of grid search, its result is not guaranteed.
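Random search over the same kind of search space can be sketched with `RandomizedSearchCV`, drawing candidate values from distributions rather than a grid (scipy is assumed available; distributions and budget are illustrative):

```python
# Random-search sketch: sample a fixed number of points from the search
# range instead of enumerating a grid.
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, random_state=0)

param_dist = {
    "C": loguniform(1e-2, 1e2),      # sample C log-uniformly in [0.01, 100]
    "gamma": loguniform(1e-3, 1e1),
}
search = RandomizedSearchCV(SVC(), param_dist, n_iter=20, cv=5,
                            random_state=0)
search.fit(X, y)

best_params = search.best_params_
```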
(3) Bayesian optimization
When searching for the optimal hyperparameters, Bayesian optimization works in a completely different way from grid search and random search. When testing a new point, grid search and random search ignore the information from previously tested points, whereas Bayesian optimization makes full use of it: it learns the shape of the objective function and finds the hyperparameters that push the objective toward the global optimum.
Specifically, it learns the shape of the objective function as follows:
- first, based on a prior distribution, assume a surrogate function;
- then, each time a new sample point is used to test the objective function, use this information to update the prior distribution of the objective function;
- finally, the algorithm tests the point where, according to the posterior distribution, the global optimum is most likely to be.
Bayesian optimization has one caveat: once it finds a local optimum, it keeps sampling in that region, so it can easily get stuck in a local optimum. To compensate for this, Bayesian optimization strikes a balance between exploration and exploitation: "exploration" means sampling in regions that have not yet been sampled, while "exploitation" means sampling, according to the posterior distribution, in the region where the global optimum is most likely to occur.
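The loop described above can be sketched minimally with a Gaussian process as the surrogate and an upper-confidence-bound acquisition as one simple way to trade off exploration and exploitation. The 1-D toy objective, the UCB rule, and all constants below are assumptions for illustration, not the canonical algorithm:

```python
# Bayesian-optimization sketch: fit a GP surrogate to observed points, then
# pick the next point by maximizing mean + 2*std (explore where std is high,
# exploit where the predicted mean is high).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def objective(x):
    return -(x - 2.0) ** 2  # toy objective with its maximum at x = 2

rng = np.random.default_rng(0)
grid = np.linspace(0, 5, 200).reshape(-1, 1)

# Start from a few random evaluations of the objective.
X_obs = rng.uniform(0, 5, size=(3, 1))
y_obs = objective(X_obs).ravel()

for _ in range(10):
    gp = GaussianProcessRegressor().fit(X_obs, y_obs)  # update surrogate
    mu, sigma = gp.predict(grid, return_std=True)
    ucb = mu + 2.0 * sigma          # exploitation term + exploration term
    x_next = grid[np.argmax(ucb)].reshape(1, 1)
    X_obs = np.vstack([X_obs, x_next])
    y_obs = np.append(y_obs, objective(x_next).ravel())

best_x = float(X_obs[np.argmax(y_obs)])
```

Production libraries use more principled acquisition functions (e.g., expected improvement), but the structure of the loop is the same.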
Overfitting and underfitting
During model evaluation and tuning, we often encounter "overfitting" and "underfitting".
Question 1: In model evaluation, what phenomena do overfitting and underfitting refer to?
Overfitting means the model fits the training data too closely. Reflected in the evaluation metrics, the model performs well on the training set but poorly on the test set and on new data.
Underfitting means the model performs poorly in both training and prediction: its metrics are poor on the training set and the test set alike.
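Both phenomena can be shown on noisy 1-D data by varying polynomial degree; the data, degrees, and metric below are illustrative assumptions:

```python
# A degree-1 polynomial underfits (both R^2 scores low), while a degree-15
# polynomial overfits (train R^2 high, test R^2 noticeably lower).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 40)).reshape(-1, 1)
y = np.sin(2 * np.pi * x).ravel() + rng.normal(0, 0.2, 40)
x_train, x_test = x[::2], x[1::2]   # interleaved train/test split
y_train, y_test = y[::2], y[1::2]

def fit_scores(degree):
    feats = PolynomialFeatures(degree)
    model = LinearRegression().fit(feats.fit_transform(x_train), y_train)
    return (r2_score(y_train, model.predict(feats.transform(x_train))),
            r2_score(y_test, model.predict(feats.transform(x_test))))

underfit_train, underfit_test = fit_scores(1)   # both scores poor
overfit_train, overfit_test = fit_scores(15)    # train high, test worse
```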
Question 2: What can be done to reduce the risk of overfitting and underfitting?
(1) Methods to reduce the risk of overfitting
- Start with the data and obtain more of it. Using more training data is the most effective way to combat overfitting, because more samples let the model learn more, and more effective, features and reduce the influence of noise. Of course, directly collecting more experimental data is usually difficult, but the training data can be expanded by certain rules. For example, in image classification, data can be augmented by translating, rotating, and scaling images; going further, generative adversarial networks can be used to synthesize large amounts of new training data.
- Reduce model complexity. When data are scarce, an overly complex model is the main cause of overfitting. Reducing model complexity prevents the model from fitting sampling noise; for example, reduce the number of layers or neurons in a neural network, or limit tree depth and apply pruning in a decision tree.
- Regularization. Add regularization constraints on the model parameters, for example by adding a weight penalty to the loss function. Taking L2 regularization as an example:

$$C=C_0+\frac{\lambda}{2n}\cdot \sum_{i}w_i^2$$

This way, while optimizing the original objective $C_0$, the model also avoids the overfitting risk caused by excessively large weights.
- Ensemble learning. Ensemble methods combine multiple models to reduce the overfitting risk of any single model, e.g., bagging.
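The effect of the squared-weight penalty can be illustrated by comparing an unregularized high-degree polynomial fit with scikit-learn's `Ridge`, whose `alpha` plays the role of the regularization coefficient $\lambda$; the data and degree are illustrative assumptions:

```python
# L2 regularization sketch: Ridge shrinks the weights of a high-degree
# polynomial fit, reducing the wild coefficients an unregularized fit picks up.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 30)).reshape(-1, 1)
y = np.sin(2 * np.pi * x).ravel() + rng.normal(0, 0.3, 30)

X = PolynomialFeatures(degree=12).fit_transform(x)

plain = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

# The L2 penalty shrinks the squared weight norm toward zero.
plain_norm = float(np.sum(plain.coef_ ** 2))
ridge_norm = float(np.sum(ridge.coef_ ** 2))
```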
(2) Methods to reduce the risk of underfitting
- Add new features. When the features are insufficient or only weakly correlated with the sample labels, the model is prone to underfitting. Mining new features such as "contextual features", "ID-class features", and "combination features" often brings noticeable improvement. In the deep learning era, many models can help with feature engineering: factorization machines, gradient-boosted decision trees, Deep-crossing, and so on can all be used to enrich features.
- Increase model complexity. A simple model has limited learning capacity; increasing model complexity gives it stronger fitting ability. For example, add higher-order terms to a linear model, or increase the number of layers or neurons in a neural network.
- Reduce the regularization coefficient. Regularization is used to prevent overfitting, so when the model underfits, the regularization coefficient should be reduced.
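The "add new features / increase complexity" remedies can be demonstrated in a few lines: a plain linear model underfits cubic data, and adding polynomial features (one way to increase complexity) fixes it. The data-generating function and degree are illustrative assumptions:

```python
# Underfitting remedy sketch: a linear model underfits cubic data; adding
# polynomial features up to degree 3 gives the model enough capacity.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, (100, 1))
y = x.ravel() ** 3 - 2 * x.ravel() + rng.normal(0, 1, 100)

linear_score = LinearRegression().fit(x, y).score(x, y)          # underfits
X_poly = PolynomialFeatures(degree=3).fit_transform(x)
poly_score = LinearRegression().fit(X_poly, y).score(X_poly, y)  # fits well
```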
References
[1] "Baimian Machine Learning" (《百面机器学习》), Chapter 2, Model Evaluation