(Perfect solution) Why does the model perform well on the train/val/test datasets in train mode, but poorly on all of them in eval mode?
2022-07-29 01:10:00 【Interval】
Background
This is, frankly, a bit of black magic. It is the second time I have run into this problem.
The first time it gave me a headache, and then it somehow fixed itself without me ever understanding why. This time it refused to go away on its own, so I had no choice but to work out how to actually solve it.
The setup this time: the model body is a CNN. The problem: in train mode, accuracy on the train dataset is very high; but switching to eval mode and testing on the val dataset, the results are terrible.
First reaction? Overfitting, right? So I ran eval mode on the train dataset itself, and the results were just as terrible!!!
Conclusion: it is not overfitting; it is a train-mode vs. eval-mode problem. To verify further: in train mode, the results on the val/test datasets are all good too.
Solutions
My situation is this: I did not design the network myself; it is an official architecture from which I only cut down the number of layers, keeping the overall structure. So the first common suggestion online, 1. do not use dropout and batch normalization at the same time, does not apply to me; it is the official architecture, so there should be nothing wrong with that combination. In my network, the convolutions are followed by average pooling, then dropout, then a linear classification layer, with dropout used nowhere else. You can compare your own network structure against this; if it follows the same idea, it is a useful reference.
Because it is the official network, 2. calling the same batchnormalization module from different places also does not apply, since the official network would not contain that mistake. That said, would reusing one module really have such a big impact? I honestly have not tried it.
Others say: 3. the input data is not normalized. But if that were the problem, why would the results in train mode on train/val/test all be fine?
The ultimate solution
To be clear: train and eval differ in exactly one thing, namely that dropout and batch normalization behave differently in train mode and eval mode. The two suggestions above do indeed revolve around this.
Since the model works well in train mode and poorly in eval mode, we can reason as follows: in train mode, the pile of vectors produced by the last layer is good; in eval mode it is bad. So we can work backward toward the earlier layers, step by step, to find where they start to diverge.
One benefit of batch normalization is that after it, the data follows a distribution with mean 0 and variance 1, which is more stable and makes the model easier to train.
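As a quick illustration of this difference, here is a minimal sketch (assuming PyTorch; the toy layer and batch sizes are made up). `model.train()` / `model.eval()` toggle only the `training` flag, which decides whether BatchNorm normalizes with the current batch's statistics or with its running statistics:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm1d(4)                 # toy layer with 4 channels
x = torch.randn(32, 4) * 5 + 3         # batch far from mean 0 / var 1

bn.train()                             # train mode: normalize with batch stats
y_train = bn(x)                        # (this also updates the running stats)

bn.eval()                              # eval mode: normalize with running stats
y_eval = bn(x)                         # running stats are barely warmed up here,
                                       # so the same input comes out very different

print(y_train.mean().item(), y_train.var().item())   # roughly 0 and 1
print(y_eval.mean().item(), y_eval.var().item())     # noticeably off
```

With freshly initialized running statistics the eval-mode output is far from mean 0 / variance 1; after many training batches the running statistics converge toward the batch statistics and the gap shrinks.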
So here is a simple comparison. Train your model for a few epochs until it performs well, then stop. Pick a few training samples, say 5, and in both train mode and eval mode print the statistics of the input, of the intermediate layers, and of the final output. You can do this yourself.
As for what to print: not the raw tensors, which no one can read, but their mean and variance.
x.mean()
x.var()

# Format: mean/var. For the intermediate layers, the left column is
# eval mode and the right column is train mode.

# data 1
1.4886/29791.4102                  # input
0.0249/0.6548    0.0215/0.3354     # intermediate layer 1
0.4258/13.0988   0.1238/12.9480    # intermediate layer 2
# data 2
0.6043/751.1927
0.0119/0.0104    0.0333/0.3308
0.4165/7.9723    0.2913/14.2297
# data 3
1.7033/34862.4297
0.0267/0.7132    0.0223/0.3357
0.4337/11.9446   0.1300/12.6917
# data 4
0.3590/40.7517
0.0099/0.0007    0.0708/0.3518
0.4294/8.2139    0.4683/15.1485
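Statistics like the ones above can be collected with forward hooks; here is a sketch (assuming PyTorch; the tiny network is a placeholder for your own model):

```python
import torch
import torch.nn as nn

records = []                            # (mode, layer name, mean, var)

def stats_hook(name):
    def hook(module, inputs, output):
        mode = "train" if module.training else "eval"
        records.append((mode, name, output.mean().item(), output.var().item()))
    return hook

model = nn.Sequential(                  # placeholder for the real network
    nn.Conv2d(3, 8, 3, padding=1),
    nn.BatchNorm2d(8),
    nn.ReLU(),
)
for name, module in model.named_modules():
    if name:                            # skip the outer Sequential container
        module.register_forward_hook(stats_hook(name))

x = torch.randn(5, 3, 16, 16)           # 5 samples, as in the experiment above
print("input:", x.mean().item(), x.var().item())

model.eval()
with torch.no_grad():
    model(x)                            # eval-mode statistics first
model.train()
with torch.no_grad():
    model(x)                            # then train-mode statistics

for mode, name, mean, var in records:
    print(f"{mode:5s} {name}: mean={mean:.4f} var={var:.4f}")
```

Running the eval pass first avoids the train pass updating the BatchNorm running statistics before they are inspected.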
Anyway, the first thing that jumps out: across my 4 samples, the mean and variance of the input differ enormously, especially the variance; sample 3 has a variance of over 30,000. Scary. Before printing this, to be honest, I had no idea.
Next, intermediate layer 1. In train mode it is very stable: the means sit around 0.02-0.03, with only sample 4 a bit unusual at 0.07, and the variances are all stable around 0.3.
In eval mode, by contrast, intermediate layer 1 jumps all over the place.
At intermediate layer 2, in eval mode the means are very stable but the variances are not; in train mode the means jitter a little but the variances are stable.
Overall: the variance of our input data is huge, and the per-layer statistics of the model differ markedly between train mode and eval mode.
Conclusion
Of the two, the input data is the part we can act on; the difference between train mode and eval mode inside the model we can only watch, with no way to act on it directly, haha.
Conclusion:
Preprocess the data and normalize it. Didn't we dismiss normalization earlier? And now we give in? Right, haha. But this time we saw it with our own eyes: the input variance is far too large.
I had not normalized before because I was afraid normalization would destroy details of my input data. My input is not a real-world picture; some of the numbers in it carry meaning, and I worried normalization would wipe that out.
How to normalize? Apply z-standardization to each channel of the image, i.e. the same thing batch normalization does.
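For example, a sketch of that per-channel z-standardization (assuming the input is an (N, C, H, W) tensor; the shapes and the `eps` value here are my own choices):

```python
import torch

def per_channel_normalize(x: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Z-standardize each channel over batch and spatial dims, like BatchNorm."""
    mean = x.mean(dim=(0, 2, 3), keepdim=True)   # one mean per channel
    std = x.std(dim=(0, 2, 3), keepdim=True)     # one std per channel
    return (x - mean) / (std + eps)

data = torch.randn(16, 3, 32, 32) * 180 + 1.5    # huge-variance input, like the logs above
normed = per_channel_normalize(data)
print(normed.mean().item(), normed.var().item())  # roughly 0 and 1
```

For real use you would compute the mean/std once over the training set and apply the same values to val/test, rather than per batch.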
It worked. And a bonus discovery: training is much faster after normalization!!! It used to take 3 epochs to converge; now it nearly converges in 1 epoch. Awesome.