[Machine Learning Q&A] Data sampling and model validation methods, hyperparameter optimization, overfitting and underfitting
2022-06-30 01:25:00 【Sickle leek】
Data sampling and model validation methods, hyperparameter optimization, and overfitting/underfitting problems
- Data sampling and model validation methods
- Question 1: In model evaluation, what are the main validation methods, and what are their respective advantages and disadvantages?
- Question 2: In bootstrap sampling, if we draw n times with replacement from a set of n samples, what fraction of the data is never selected as n goes to infinity?
- Hyperparameter tuning
- Overfitting and underfitting problems
- References
Data sampling and model validation methods
In machine learning, samples are usually divided into a training set and a test set: the training set is used to train the model, and the test set is used to evaluate it. Different sampling and validation schemes can be used when splitting samples and validating models.
Question 1: In model evaluation, what are the main validation methods, and what are their respective advantages and disadvantages?
(1) Holdout validation: holdout validation is the simplest and most direct method. It randomly splits the original sample set into two parts, a training set and a validation set.
For example, for a click-through-rate prediction model, we might split the samples 70%/30%: 70% of the samples are used to train the model, and 30% are used to validate it, including plotting the ROC curve and computing metrics such as precision and recall to evaluate model performance.
The drawback of holdout validation is obvious: the evaluation metric computed on the validation set depends heavily on how the original data happened to be split. To eliminate this randomness, researchers introduced "cross-validation".
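As a minimal sketch of holdout validation (assuming scikit-learn is available, and with hypothetical `X`, `y` standing in for the feature matrix and binary click labels), the 70%/30% split and the evaluation metrics might look like this:

```python
# Holdout-validation sketch (assumes scikit-learn; X, y are hypothetical NumPy arrays).
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, precision_score, recall_score

# 70% of the samples for training, 30% held out for validation.
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=42)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Evaluate on the held-out 30%: ROC-AUC, precision, and recall.
val_scores = model.predict_proba(X_val)[:, 1]
val_preds = model.predict(X_val)
print("ROC-AUC:  ", roc_auc_score(y_val, val_scores))
print("precision:", precision_score(y_val, val_preds))
print("recall:   ", recall_score(y_val, val_preds))
```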
(2) Cross-validation
- k-fold cross-validation: first divide all samples into $k$ subsets of equal size. Traverse these $k$ subsets one by one, each time using the current subset as the validation set and all the remaining subsets as the training set, then train and evaluate the model. Finally, take the average of the $k$ evaluation scores as the final evaluation metric. In practice, $k$ is often set to 10.
- Leave-one-out validation: each time, hold out a single sample as the validation set and use all the remaining samples as the training set. With $n$ samples in total, traverse all $n$ samples, perform $n$ rounds of validation, and average the evaluation scores to obtain the final metric.
When the number of samples is large, leave-one-out validation is very time-consuming. In fact, leave-one-out validation is a special case of leave-p-out validation, which holds out $p$ samples as the validation set each time; since there are $C_n^p$ ways to choose $p$ elements from $n$, its time cost is far higher than that of leave-one-out validation, so it is rarely used in practice.
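A minimal k-fold cross-validation sketch along the same lines (again assuming scikit-learn and hypothetical `X`, `y`):

```python
# 10-fold cross-validation sketch (assumes scikit-learn; X, y are hypothetical).
from sklearn.model_selection import KFold, cross_val_score
from sklearn.linear_model import LogisticRegression

model = LogisticRegression(max_iter=1000)

# Split the samples into k = 10 equal-sized folds; each fold serves once as the
# validation set while the remaining folds form the training set.
kf = KFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_val_score(model, X, y, cv=kf, scoring="roc_auc")

# The final metric is the average of the 10 per-fold scores.
print("mean ROC-AUC over 10 folds:", scores.mean())
```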
(3) Bootstrap method
Both holdout validation and cross-validation evaluate the model by partitioning the data into a training set and a test set. However, when the sample size is small, splitting the sample set shrinks the training set further, which may hurt model training. Is there a validation method that preserves the size of the training set? The bootstrap method addresses this problem well.
The bootstrap method is a validation approach based on bootstrap sampling. For a dataset of $n$ samples, draw $n$ samples with replacement to obtain a training set of size $n$. During the $n$ draws, some samples are drawn repeatedly while others are never drawn; the samples that are never drawn are used as the validation set. This is the bootstrap validation procedure.
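A minimal sketch of this bootstrap validation idea with NumPy (the `X`, `y` arrays are again hypothetical):

```python
# Bootstrap-validation sketch (assumes NumPy and scikit-learn; X, y are hypothetical arrays).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

n = len(X)
rng = np.random.default_rng(42)

# Draw n samples with replacement: this is the bootstrap training set.
boot_idx = rng.integers(0, n, size=n)

# Samples never drawn form the validation set.
never_drawn = np.ones(n, dtype=bool)
never_drawn[boot_idx] = False

model = LogisticRegression(max_iter=1000).fit(X[boot_idx], y[boot_idx])
print("validation accuracy: ", accuracy_score(y[never_drawn], model.predict(X[never_drawn])))
print("fraction never drawn:", never_drawn.mean())  # roughly 0.368 for large n
```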
Question 2: In bootstrap sampling, if we draw n times with replacement from a set of n samples, what fraction of the data is never selected as n goes to infinity?
The probability that a given sample is not selected in a single draw is $1-\frac{1}{n}$, so the probability that it is never selected in all $n$ draws is $(1-\frac{1}{n})^n$. As $n$ goes to infinity, this probability tends to $\lim_{n \to \infty} (1-\frac{1}{n})^n$. Using the well-known limit $\lim_{n\to \infty}(1+\frac{1}{n})^n=e$, we have

$$\lim_{n \to \infty} \left(1-\frac{1}{n}\right)^n=\lim_{n \to \infty} \left(\frac{n-1}{n}\right)^n=\lim_{n \to \infty}\frac{1}{\left(1+\frac{1}{n-1}\right)^n}=\frac{1}{\lim_{n\to \infty} \left(1+\frac{1}{n-1}\right)^{n-1}}\cdot \frac{1}{\lim_{n\to \infty}\left(1+\frac{1}{n-1}\right)}=\frac{1}{e}\approx 0.368$$

Therefore, when the number of samples is large, about 36.8% of the samples are never selected and can be used as the validation set.
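A quick numerical check of this limit (plain Python, no ML libraries needed):

```python
# Numerical check: (1 - 1/n)^n approaches 1/e ≈ 0.368 as n grows.
import math

for n in (10, 100, 1000, 100000):
    print(n, (1 - 1 / n) ** n)
print("1/e =", 1 / math.e)  # ≈ 0.3679
```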
Hyperparameter tuning
Question 1: What methods are there for tuning hyperparameters?
For hyperparameter tuning, we usually use algorithms such as grid search, random search, and Bayesian optimization. Before introducing these algorithms, it is worth clarifying which elements a hyperparameter search algorithm generally involves:
- First, an objective function, i.e., the quantity the algorithm needs to maximize or minimize;
- Second, a search range, usually specified by upper and lower bounds;
- Third, the other parameters of the algorithm, such as the search step size.
(1) Grid search
Grid search is probably the simplest and most widely used hyperparameter search algorithm. It finds the optimum by enumerating all points within the search range. With a sufficiently large search range and a sufficiently small step size, grid search has a high probability of finding the global optimum. However, this scheme consumes a great deal of computing resources and time, especially when many hyperparameters need to be tuned. Therefore, in practice, grid search usually starts with a wide search range and a large step size to locate the region where the global optimum is likely to lie, and then gradually narrows the search range and step size to find a more precise optimum. This reduces the required time and computation, but because the objective function is generally non-convex, it may well miss the global optimum.
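A minimal grid-search sketch with scikit-learn's `GridSearchCV`; the SVM and its parameter grid here are illustrative choices, not prescribed by the text:

```python
# Grid-search sketch (assumes scikit-learn; X, y are hypothetical).
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Enumerate every combination of the candidate values (the "grid").
param_grid = {
    "C":     [0.1, 1, 10, 100],
    "gamma": [1e-3, 1e-2, 1e-1],
}
search = GridSearchCV(SVC(), param_grid, cv=5, scoring="accuracy")
search.fit(X, y)
print("best params:  ", search.best_params_)
print("best CV score:", search.best_score_)
```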
(2) Random search
The idea of random search is similar to that of grid search, except that it no longer tests every value between the upper and lower bounds; instead, it samples points at random within the search range. Its theoretical basis is that, if the sample of points is large enough, random sampling will, with high probability, find the global optimum or a good approximation of it. Random search is generally faster than grid search, but, like a fast version of grid search, its result is not guaranteed.
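A corresponding random-search sketch with `RandomizedSearchCV`, sampling the same two illustrative SVM hyperparameters from continuous distributions instead of enumerating a grid:

```python
# Random-search sketch (assumes scikit-learn and SciPy; X, y are hypothetical).
from scipy.stats import loguniform
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

# Sample n_iter points at random from the search ranges.
param_distributions = {
    "C":     loguniform(1e-1, 1e2),
    "gamma": loguniform(1e-3, 1e-1),
}
search = RandomizedSearchCV(SVC(), param_distributions,
                            n_iter=20, cv=5, random_state=42)
search.fit(X, y)
print("best params:", search.best_params_)
```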
(3) Bayesian optimization
When searching for the optimal hyperparameters, Bayesian optimization works in a completely different way from grid search and random search. When grid search and random search evaluate a new point, they ignore the information from previously evaluated points, whereas Bayesian optimization makes full use of that information. Bayesian optimization learns the shape of the objective function and finds the hyperparameters that push the objective function toward its global optimum.
Specifically, it learns the shape of the objective function as follows:
- first, assume a form for the objective function according to a prior distribution;
- then, every time the objective function is evaluated at a new sample point, use that information to update the prior distribution of the objective function;
- finally, the algorithm evaluates the point where the posterior distribution says the global optimum is most likely to appear.
One caveat for Bayesian optimization: once a local optimum is found, it tends to keep sampling in that neighborhood, so it can easily get stuck in a local optimum. To make up for this shortcoming, Bayesian optimization strikes a balance between exploration and exploitation: "exploration" means sampling points in regions that have not yet been sampled, while "exploitation" means sampling, according to the posterior distribution, in the region where the global optimum is most likely to appear.
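A minimal Bayesian-optimization sketch, assuming the third-party `scikit-optimize` package (`skopt`) is installed; the objective is a hypothetical cross-validated SVM score over the same two hyperparameters as above:

```python
# Bayesian-optimization sketch (assumes scikit-optimize and scikit-learn; X, y are hypothetical).
from skopt import gp_minimize
from skopt.space import Real
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def objective(params):
    C, gamma = params
    score = cross_val_score(SVC(C=C, gamma=gamma), X, y, cv=5).mean()
    return -score  # gp_minimize minimizes, so negate the CV score

# A Gaussian-process surrogate models the objective; every new evaluation updates
# the posterior, which in turn decides where to sample next.
result = gp_minimize(
    objective,
    dimensions=[Real(1e-1, 1e2, prior="log-uniform"),   # C
                Real(1e-3, 1e-1, prior="log-uniform")],  # gamma
    n_calls=30, random_state=42)
print("best (C, gamma):", result.x, " best CV score:", -result.fun)
```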
Overfitting and underfitting problems
In the process of model evaluation and tuning, "overfitting" and "underfitting" are frequently encountered.
Question 1: In model evaluation, what phenomena do overfitting and underfitting refer to?
Overfitting refers to the model fitting the training data too closely. Reflected in the evaluation metrics, the model performs well on the training set but poorly on the test set and on new data.
Underfitting refers to the model performing poorly in both training and prediction: its performance is poor on the training set and the test set alike.
Question 2: What can be done to reduce the risk of overfitting and underfitting?
(1) Ways to reduce the risk of "overfitting"
- Start from the data and obtain more of it. Using more training data is the most effective way to combat overfitting, because more samples let the model learn more, and more reliable, features and reduce the influence of noise. Of course, it is usually hard to collect additional data directly, but the training data can be expanded according to certain rules. For example, in image classification, data can be augmented by translating, rotating, and scaling images; going further, generative adversarial networks can be used to synthesize large amounts of new training data.
- Reduce model complexity. When data are scarce, an overly complex model is the main cause of overfitting. Reducing model complexity keeps the model from fitting the sampling noise. For example: reduce the number of layers or neurons in a neural network; reduce the tree depth or apply pruning in a decision tree model.
- Regularization. Add regularization constraints on the model parameters, for example by adding the magnitude of the weights to the loss function. Taking L2 regularization as an example:
$$C=C_0+\frac{\lambda}{2n}\sum_{i}w_i^2$$
In this way, while the original objective function $C_0$ is being optimized, the risk of overfitting caused by overly large weights is also avoided (a small sketch follows this list).
- Ensemble learning. Ensemble learning combines multiple models to reduce the risk of overfitting of any single model, e.g., the Bagging method.
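The sketch referenced in the regularization item above: a small NumPy illustration of adding the L2 penalty term to a (hypothetical) mean-squared-error base loss:

```python
# L2-regularized loss sketch (NumPy; X, y, w and the base loss are hypothetical).
import numpy as np

def l2_regularized_loss(w, X, y, lam):
    n = len(y)
    base_loss = np.mean((X @ w - y) ** 2)     # C0: plain mean-squared error
    penalty = lam / (2 * n) * np.sum(w ** 2)  # (lambda / 2n) * sum_i w_i^2
    return base_loss + penalty                # C = C0 + penalty

# A larger lam pushes the optimizer toward smaller weights, reducing overfitting risk.
rng = np.random.default_rng(0)
X, y, w = rng.normal(size=(50, 3)), rng.normal(size=50), rng.normal(size=3)
print(l2_regularized_loss(w, X, y, lam=0.1))
```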
(2) Ways to reduce the risk of "underfitting"
- Add new features. When the features are insufficient, or the existing features are only weakly correlated with the sample labels, the model is prone to underfitting. Mining new features such as "context features", "ID-class features", and "combination features" often yields better results. In the deep learning era, many models can help with feature engineering; for example, factorization machines, gradient boosting decision trees, and Deep-crossing can all be used to enrich the features.
- Increase model complexity. A simple model has limited learning capacity; increasing model complexity gives the model stronger fitting ability. For example, add higher-order terms to a linear model, or increase the number of layers or neurons in a neural network (see the sketch after this list).
- Reduce the regularization coefficient. Regularization is used to prevent overfitting; when the model underfits, the regularization coefficient should be reduced accordingly.
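The sketch referenced in the "increase model complexity" item above: adding higher-order terms to a linear model with scikit-learn (data are hypothetical; the degree is an illustrative choice):

```python
# Underfitting-remedy sketch: add polynomial (higher-order) features to a linear model.
# Assumes scikit-learn; X, y are hypothetical regression data.
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

plain = LinearRegression()
richer = make_pipeline(PolynomialFeatures(degree=3), LinearRegression())

# If the plain linear model underfits, the higher-order terms usually lift
# both training and validation scores.
print("linear:  ", cross_val_score(plain, X, y, cv=5).mean())
print("degree-3:", cross_val_score(richer, X, y, cv=5).mean())
```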
References
[1] Baimian Machine Learning (《百面机器学习》), Chapter 2: Model Evaluation