(Perfect solution) Why does the model perform well on the train/val/test datasets in train mode, but poorly on all of them in eval mode?
2022-07-29 01:10:00 【Interval】
Background
This is, frankly, a bit of black magic. It is the second time I have run into this problem.
The first time it gave me a headache, and then it somehow fixed itself without me ever understanding why. This time it refused to go away on its own, so I had no choice but to work out how to actually solve it.
The setup this time: the model body is a CNN. The problem: in train mode, accuracy on the train dataset is very high; but switching to eval mode and testing on the val dataset, the results are terrible.
First reaction? Overfitting, right? So I ran eval mode on the train dataset itself, and the results were just as terrible!!!
Conclusion: it is not overfitting; it is a train-mode vs. eval-mode problem. To verify further: in train mode, the results on the val/test datasets are all good too.
Solutions
My situation is this: I did not design the network myself; it is an official architecture from which I only cut down the number of layers, keeping the overall structure. So the first common suggestion online, 1. do not use dropout and batch normalization at the same time, does not apply to me; it is the official architecture, so there should be nothing wrong with that combination. In my network, the convolutions are followed by average pooling, then dropout, then a linear classification layer, with dropout used nowhere else. You can compare your own network structure against this; if it follows the same idea, it is a useful reference.
Because it is the official network, 2. calling the same batchnormalization module from different places also does not apply, since the official network would not contain that mistake. That said, would reusing one module really have such a big impact? I honestly have not tried it.
Others say: 3. the input data is not normalized. But if that were the problem, why would the results in train mode on train/val/test all be fine?
The ultimate solution
To be clear: train and eval differ in exactly one thing, namely that dropout and batch normalization behave differently in train mode and eval mode. The two suggestions above do indeed revolve around this.
Since the model works well in train mode and poorly in eval mode, we can reason as follows: in train mode, the pile of vectors produced by the last layer is good; in eval mode it is bad. So we can work backward toward the earlier layers, step by step, to find where they start to diverge.
One benefit of batch normalization is that after it, the data follows a distribution with mean 0 and variance 1, which is more stable and makes the model easier to train.
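As a quick illustration of this difference, here is a minimal sketch (assuming PyTorch; the toy layer and batch sizes are made up). `model.train()` / `model.eval()` toggle only the `training` flag, which decides whether BatchNorm normalizes with the current batch's statistics or with its running statistics:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm1d(4)                 # toy layer with 4 channels
x = torch.randn(32, 4) * 5 + 3         # batch far from mean 0 / var 1

bn.train()                             # train mode: normalize with batch stats
y_train = bn(x)                        # (this also updates the running stats)

bn.eval()                              # eval mode: normalize with running stats
y_eval = bn(x)                         # running stats are barely warmed up here,
                                       # so the same input comes out very different

print(y_train.mean().item(), y_train.var().item())   # roughly 0 and 1
print(y_eval.mean().item(), y_eval.var().item())     # noticeably off
```

With freshly initialized running statistics the eval-mode output is far from mean 0 / variance 1; after many training batches the running statistics converge toward the batch statistics and the gap shrinks.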
So here is a simple comparison. Train your model for a few epochs until it performs well, then stop. Pick a few training samples, say 5, and in both train mode and eval mode print the statistics of the input, of the intermediate layers, and of the final output. You can do this yourself.
As for what to print: not the raw tensors, which no one can read, but their mean and variance.
x.mean()
x.var()

# Format: mean/var. For the intermediate layers, the left column is
# eval mode and the right column is train mode.

# data 1
1.4886/29791.4102                  # input
0.0249/0.6548    0.0215/0.3354     # intermediate layer 1
0.4258/13.0988   0.1238/12.9480    # intermediate layer 2
# data 2
0.6043/751.1927
0.0119/0.0104    0.0333/0.3308
0.4165/7.9723    0.2913/14.2297
# data 3
1.7033/34862.4297
0.0267/0.7132    0.0223/0.3357
0.4337/11.9446   0.1300/12.6917
# data 4
0.3590/40.7517
0.0099/0.0007    0.0708/0.3518
0.4294/8.2139    0.4683/15.1485
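Statistics like the ones above can be collected with forward hooks; here is a sketch (assuming PyTorch; the tiny network is a placeholder for your own model):

```python
import torch
import torch.nn as nn

records = []                            # (mode, layer name, mean, var)

def stats_hook(name):
    def hook(module, inputs, output):
        mode = "train" if module.training else "eval"
        records.append((mode, name, output.mean().item(), output.var().item()))
    return hook

model = nn.Sequential(                  # placeholder for the real network
    nn.Conv2d(3, 8, 3, padding=1),
    nn.BatchNorm2d(8),
    nn.ReLU(),
)
for name, module in model.named_modules():
    if name:                            # skip the outer Sequential container
        module.register_forward_hook(stats_hook(name))

x = torch.randn(5, 3, 16, 16)           # 5 samples, as in the experiment above
print("input:", x.mean().item(), x.var().item())

model.eval()
with torch.no_grad():
    model(x)                            # eval-mode statistics first
model.train()
with torch.no_grad():
    model(x)                            # then train-mode statistics

for mode, name, mean, var in records:
    print(f"{mode:5s} {name}: mean={mean:.4f} var={var:.4f}")
```

Running the eval pass first avoids the train pass updating the BatchNorm running statistics before they are inspected.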
Anyway, the first thing that jumps out: across my 4 samples, the mean and variance of the input differ enormously, especially the variance; sample 3 has a variance of over 30,000. Scary. Before printing this, to be honest, I had no idea.
Next, intermediate layer 1. In train mode it is very stable: the means sit around 0.02-0.03, with only sample 4 a bit unusual at 0.07, and the variances are all stable around 0.3.
In eval mode, by contrast, intermediate layer 1 jumps all over the place.
At intermediate layer 2, in eval mode the means are very stable but the variances are not; in train mode the means jitter a little but the variances are stable.
Overall: the variance of our input data is huge, and the per-layer statistics of the model differ markedly between train mode and eval mode.
Conclusion
Of the two, the input data is the part we can act on; the difference between train mode and eval mode inside the model we can only watch, with no way to act on it directly, haha.
Conclusion:
Preprocess the data and normalize it. Didn't we dismiss normalization earlier? And now we give in? Right, haha. But this time we saw it with our own eyes: the input variance is far too large.
I had not normalized before because I was afraid normalization would destroy details of my input data. My input is not a real-world picture; some of the numbers in it carry meaning, and I worried normalization would wipe that out.
How to normalize? Apply z-standardization to each channel of the image, i.e. the same thing batch normalization does.
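For example, a sketch of that per-channel z-standardization (assuming the input is an (N, C, H, W) tensor; the shapes and the `eps` value here are my own choices):

```python
import torch

def per_channel_normalize(x: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Z-standardize each channel over batch and spatial dims, like BatchNorm."""
    mean = x.mean(dim=(0, 2, 3), keepdim=True)   # one mean per channel
    std = x.std(dim=(0, 2, 3), keepdim=True)     # one std per channel
    return (x - mean) / (std + eps)

data = torch.randn(16, 3, 32, 32) * 180 + 1.5    # huge-variance input, like the logs above
normed = per_channel_normalize(data)
print(normed.mean().item(), normed.var().item())  # roughly 0 and 1
```

For real use you would compute the mean/std once over the training set and apply the same values to val/test, rather than per batch.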
It worked. And a bonus discovery: training is much faster after normalization!!! It used to take 3 epochs to converge; now it nearly converges in 1 epoch. Awesome.