The training-set loss converges, but the test-set loss oscillates violently?
2022-08-05 11:38:00 【GIS and Climate】
Problem scenario
Today, while debugging a model, I found that the loss had already converged on the training set, but the loss on the validation set oscillated quite severely, as shown in the figure below:

Cause analysis
After reading a number of blog posts online, I found that oscillation of the validation loss can have the following causes:
- Data problems, for example the training and validation sets differ too much, or there is too little data;
- The batch size is too small, so the patterns the model learns are not "general" enough;
- The loss function is not suitable;
- The learning rate is too large, and the model gets stuck in a local optimum;
- There is a problem with the model's network structure;
- ......
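As a rough illustration of the batch-size item in the list above, the sketch below simulates how noisy the mean of a mini-batch is for batch sizes 32 and 48. It assumes the per-sample values are i.i.d. with unit variance, which real gradients are not, and the helper name `batch_mean_std` is made up for this example. Under that assumption the noise scales as 1/sqrt(batch size), so going from 32 to 48 should reduce it by a factor of about sqrt(32/48).

```python
import random
import statistics


def batch_mean_std(batch_size, n_batches=4000, seed=0):
    """Empirical std of the mean of `batch_size` i.i.d. unit-variance samples.

    Hypothetical helper: a stand-in for the noise in a mini-batch gradient
    estimate, assuming per-sample gradients are independent.
    """
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(0.0, 1.0) for _ in range(batch_size))
        for _ in range(n_batches)
    ]
    return statistics.pstdev(means)


std_32 = batch_mean_std(32)
std_48 = batch_mean_std(48)
ratio = std_48 / std_32

# Compare the simulated noise with the 1/sqrt(B) prediction.
print(f"bs=32: {std_32:.4f} (theory {1 / 32 ** 0.5:.4f})")
print(f"bs=48: {std_48:.4f} (theory {1 / 48 ** 0.5:.4f})")
print(f"ratio: {ratio:.3f} (theory {(32 / 48) ** 0.5:.3f})")
```

The predicted reduction is modest, which fits the observation that a larger batch helps but does not remove the oscillation entirely.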
With the candidate causes known, they can be checked one by one.
- Data: I checked how my dataset is split into train and valid, and the distributions are basically the same; with 10,000+ images, the amount of data should also be adequate;
- Loss function: I switched to other loss functions and the result was still the same;
- Learning rate: it uses a dynamic adjustment strategy, so it should not be the problem (in later tests, even adjusting the initial learning rate gave a similar end result);
- Model: it is a fairly classic super-resolution model, so it should not be a major problem;
- Batch size: I changed bs from 32 to 48 and found that the oscillation became smaller, as shown in the figure below:


So the final conclusion is that the batch size was too small. If it could be increased a bit further, the result would probably be even better, but unfortunately there is not enough GPU memory.
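One common workaround when GPU memory caps the batch size is gradient accumulation. It is not something tried in this post, so treat the following as a hedged sketch rather than a tested fix. It uses a toy scalar linear model with a hand-written MSE gradient (`mse_grad` is a made-up helper) to show that averaging the gradients of 3 micro-batches of 16 samples gives the same gradient as one batch of 48.

```python
import random


def mse_grad(w, xs, ys):
    """Gradient of mean((w * x - y) ** 2) with respect to the scalar weight w."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


# Synthetic data for illustration only: y = 3x plus a little noise.
rng = random.Random(0)
xs = [rng.uniform(-1.0, 1.0) for _ in range(48)]
ys = [3.0 * x + rng.gauss(0.0, 0.1) for x in xs]
w = 0.5

# Gradient of one "real" batch of 48 samples.
full_grad = mse_grad(w, xs, ys)

# The same 48 samples processed as 3 micro-batches of 16.
micro_bs, accum_steps = 16, 3
accum_grad = 0.0
for step in range(accum_steps):
    lo, hi = step * micro_bs, (step + 1) * micro_bs
    # Scale each micro-batch gradient by 1/accum_steps so that the
    # accumulated value equals the mean over all 48 samples.
    accum_grad += mse_grad(w, xs[lo:hi], ys[lo:hi]) / accum_steps

print(full_grad, accum_grad)
```

The equivalence holds for the gradient only. Layers that compute per-batch statistics, such as batch normalization, still see the small micro-batches, so the effect on a real network may differ from truly training with a larger batch.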
Summary
If the loss has converged on the training set but oscillates severely on the validation set, analyze the possible causes one by one and try them out. When trying things, also do some theoretical analysis before running the model; otherwise you may just waste computing power.
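When comparing runs, it can help to put a number on "oscillates more" instead of judging the curves by eye. The sketch below is one possible metric, not the one used in this post: the standard deviation of epoch-to-epoch changes in validation loss over the last few epochs. The function name `oscillation` and both loss lists are made up purely for illustration and are not measurements from the experiments above.

```python
import statistics


def oscillation(losses, window=10):
    """Std of epoch-to-epoch loss changes over the last `window` epochs."""
    tail = losses[-window:]
    diffs = [b - a for a, b in zip(tail, tail[1:])]
    return statistics.pstdev(diffs)


# Made-up validation-loss curves (NOT real results): run A swings widely,
# run B swings less around a similar level.
val_a = [0.30, 0.42, 0.28, 0.45, 0.27, 0.44, 0.29, 0.41, 0.28, 0.43]
val_b = [0.30, 0.34, 0.29, 0.33, 0.30, 0.33, 0.29, 0.32, 0.30, 0.32]

score_a = oscillation(val_a)
score_b = oscillation(val_b)
print(f"run A: {score_a:.4f}, run B: {score_b:.4f}")
```

A lower score means a smoother validation curve, so the same function can be applied to the logged losses of each configuration (for example bs=32 versus bs=48) to compare them on equal terms.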

References
【1】https://blog.csdn.net/qq_40689236/article/details/106794155
【2】https://zhuanlan.zhihu.com/p/483488388