Multi-card training in PyTorch
2022-07-29 04:14:00 【ytusdc】
- What is the workflow of multi-card training in PyTorch?
- With one model replica per card, are the BatchNorm parameters the same on every card?
- Does PyTorch's DistributedDataParallel keep the model parameters exactly the same on every GPU?
The parameters are the same, but at some moments the local gradients differ (until they are synchronized).
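On the BatchNorm question above: with ordinary BatchNorm each card computes batch statistics only over its local mini-batch, so the per-step statistics differ between cards (DDP's default buffer broadcasting keeps the running statistics consistent). If you want the statistics computed over the combined batch of all cards, the model can be converted to SyncBatchNorm before wrapping it in DDP. A minimal sketch (the layer sizes here are arbitrary, not from the post):

```python
import torch.nn as nn

# Toy model containing a BatchNorm layer; the sizes are arbitrary.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3),
    nn.BatchNorm2d(8),
    nn.ReLU(),
)

# Replace every BatchNorm*d layer with SyncBatchNorm, which aggregates
# batch statistics across all processes during distributed training.
model = nn.SyncBatchNorm.convert_sync_batchnorm(model)

print(type(model[1]).__name__)  # the BN layer has been converted
```

In a real DDP script the conversion is done after model construction and before `DistributedDataParallel(model, ...)`.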
In DDP mode, the per-process workflow can be pictured as:
- compute its own loss in parallel (forward pass)
- run backward in parallel
- synchronize (all-reduce and average) the gradients across cards
- update the parameters
Because the initialization is identical across cards (DDP broadcasts the rank-0 state at construction) and every process applies the same averaged gradients, DDP guarantees that the model parameters stay identical across processes.
The NOTICE and WARNING sections in the class docstring of the source code spell out what you must comply with to keep the parameters consistent between processes. If in doubt, you can also run evaluation once in every process; each process should output the same result.
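The consistency argument above can be illustrated without any GPUs. The sketch below (plain Python with made-up numbers, not DDP itself) simulates two processes: both start from the same parameter, each computes a different local gradient on its own data shard, the gradients are averaged as DDP's all-reduce would do, and both apply the same SGD step, so the parameters remain identical:

```python
# Simulate DDP gradient synchronization for one scalar parameter.
# Both "cards" start from the same initialization.
w = [0.5, 0.5]        # parameter copy on card 0 and card 1
data = [2.0, 4.0]     # each card's local mini-batch (one sample each)
lr = 0.1

# Local loss on each card: (w * x - 1)^2  ->  gradient 2 * x * (w * x - 1).
# The local gradients differ because the data shards differ.
grads = [2 * x * (wi * x - 1) for wi, x in zip(w, data)]

# All-reduce: every card ends up with the same averaged gradient.
avg_grad = sum(grads) / len(grads)

# Each process applies the identical SGD update independently.
w = [wi - lr * avg_grad for wi in w]

print(w[0] == w[1])  # True: parameters stay in sync after the step
```

Same initialization plus identical (averaged) gradients at every step is exactly why the replicas never drift apart.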
- When multi-card training enlarges the batch size, why does accuracy sometimes drop instead of improve? Have you thought about how to solve it?
Multi-card training with a large batch size:
Theoretical advantages:
The influence of noise in the data is reduced, which may make it easier to approach an optimum.
Drawbacks and problems:
Reduced gradient variance. (In theory, for convex optimization problems a lower gradient variance gives a better optimization result; in practice, however, Keskar et al. verified that increasing the batch size leads to worse generalization.)
For non-convex optimization problems, the loss function contains many local optima. The noise of a small batch size makes it easy to jump out of a local optimum, while a large batch size may get stuck in a local optimum and never escape.
Solutions:
Increase the learning_rate. This can cause problems of its own: using a large learning_rate from the very start of training may prevent the model from converging.
Use warmup to mitigate the non-convergence caused by a large learning_rate.
Warmup
Link: deep learning training strategies -- learning rate warmup (Warmup)
Using a large learning_rate at the very beginning of training may keep it from converging. The idea of warmup is to start training with a small learning rate and gradually increase it as training proceeds until it reaches the base learning_rate, then continue training with an ordinary decay schedule (e.g. CosineAnnealingLR).
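As a concrete sketch of this schedule (the function name and step counts below are my own, not from the post): linear warmup up to the base learning rate, followed by cosine annealing in the style of CosineAnnealingLR:

```python
import math

def lr_at_step(step, warmup_steps, total_steps, base_lr, min_lr=0.0):
    """Linear warmup to base_lr, then cosine annealing down to min_lr."""
    if step < warmup_steps:
        # Warmup phase: grow linearly from base_lr / warmup_steps to base_lr.
        return base_lr * (step + 1) / warmup_steps
    # Cosine annealing over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))

schedule = [lr_at_step(s, warmup_steps=5, total_steps=20, base_lr=0.1)
            for s in range(20)]
print(schedule[0], schedule[4], schedule[-1])  # small start, peak, decayed end
```

In PyTorch one common way to get this behavior is to combine a warmup `LambdaLR` with `CosineAnnealingLR` via `torch.optim.lr_scheduler.SequentialLR`.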