Multi-GPU training in PyTorch
2022-07-29 04:14:00 【ytusdc】
- What is the workflow of multi-GPU training in PyTorch?
- Are the model's BN parameters the same on every card?
- With PyTorch's DistributedDataParallel, are the model parameters exactly the same on every GPU?
The parameters are the same, but at some moments the gradients differ: before synchronization, each card holds gradients computed from its own share of the data.
In DDP mode, one training step can be pictured as:
- Each card computes its own loss in parallel (forward pass)
- Each card runs backward in parallel
- Gradients are synchronized (averaged) across the cards
- Each card updates its parameters with the synchronized gradients
Because every process starts from the same initial parameters (DDP broadcasts the model state from rank 0 when the wrapper is constructed) and then applies the same synchronized gradients, DDP can guarantee that the model parameters stay identical across processes.
When reading the source code, also look at the NOTICE and WARNING entries in the class docstring; following them ensures that the parameters stay consistent between processes. If you are still unsure, run evaluation once in each process: every process should output the same result.
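The steps above can be sketched as a small pure-Python simulation. This is not the PyTorch API: the one-parameter model `y = w * x`, the helper names `local_gradient`, `all_reduce_mean` and `train`, and the toy data are all hypothetical, chosen only to show why identical starting weights plus averaged gradients keep every rank's parameters identical while the local gradients differ.

```python
# Pure-Python simulation of data-parallel training (illustrative, not the DDP API).
# Model: y = w * x, loss = mean squared error on each rank's local data shard.

def local_gradient(w, shard):
    """Gradient of the mean squared error w.r.t. w on one rank's shard."""
    return sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)

def all_reduce_mean(values):
    """Stand-in for the gradient all-reduce: every rank receives the average."""
    mean = sum(values) / len(values)
    return [mean for _ in values]

def train(shards, w_init=0.0, lr=0.05, steps=20):
    # Every rank starts from the same parameter value.
    weights = [w_init for _ in shards]
    local_grads = []
    for _ in range(steps):
        # Steps 1-2: each rank runs forward and backward on its own shard.
        local_grads = [local_gradient(w, s) for w, s in zip(weights, shards)]
        # Step 3: gradients are averaged across ranks.
        synced = all_reduce_mean(local_grads)
        # Step 4: each rank applies the same update with the same gradient.
        weights = [w - lr * g for w, g in zip(weights, synced)]
    return weights, local_grads

shards = [
    [(1.0, 2.0), (2.0, 4.1)],  # data seen by rank 0
    [(3.0, 5.9), (4.0, 8.2)],  # data seen by rank 1
]
weights, last_local_grads = train(shards)
print(weights)           # the same value on both ranks
print(last_local_grads)  # local gradients before synchronization differ
```

Comparing `weights[0]` with `weights[1]` plays the same role as running evaluation in each process and checking that the outputs match.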
- When multi-GPU training increases the batch size, accuracy drops instead of improving. Why is that? Have you thought about how to solve it?
Multi-GPU training with a large batch size:
Theoretical advantages:
The influence of noise in the data may be reduced, so training may approach the optimum more easily;
Drawbacks and problems:
The gradient variance is reduced (in theory, for convex optimization problems, a low gradient variance gives a better optimization result; in practice, however, Keskar et al. showed that increasing the batch size leads to poorer generalization);
For non-convex optimization problems, the loss function has multiple local optima. The noise of a small batch size may make it easier to jump out of a local optimum, whereas a large batch size may settle in a local optimum and fail to jump out.
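The drop in gradient variance can be checked empirically. The sketch below is a toy experiment under assumed conditions: synthetic noisy data around `y = 2x`, a one-parameter model, and hypothetical helper names. It only illustrates that mini-batch gradients fluctuate less as the batch grows; it says nothing about generalization.

```python
import random
import statistics

def minibatch_gradient(w, batch):
    """Gradient of the mean squared error for y = w * x on one mini-batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)

def gradient_variance(data, w, batch_size, trials, rng):
    """Empirical variance of the mini-batch gradient over random batches."""
    grads = [
        minibatch_gradient(w, rng.sample(data, batch_size))
        for _ in range(trials)
    ]
    return statistics.pvariance(grads)

rng = random.Random(0)
# Synthetic data (hypothetical, for illustration only): y = 2x plus Gaussian noise.
xs = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
data = [(x, 2.0 * x + rng.gauss(0.0, 1.0)) for x in xs]

var_small = gradient_variance(data, w=0.0, batch_size=8, trials=2000, rng=rng)
var_large = gradient_variance(data, w=0.0, batch_size=128, trials=2000, rng=rng)
print(var_small, var_large)  # the larger batch gives the smaller variance
```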
Solutions:
Increase learning_rate. This can cause problems, though: a large learning_rate at the start of training may keep the model from converging
Use warmup to mitigate the non-convergence caused by a large learning_rate
warmup
Link: Deep learning training strategies -- learning rate warmup (Warmup)
Using a large learning_rate at the start of training may keep training from converging. The idea of warmup is to use a small learning rate at the start of training and gradually increase it as training proceeds until it reaches the base learning_rate, then continue training with another decay schedule (such as CosineAnnealingLR).
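A minimal sketch of such a schedule, assuming linear warmup followed by cosine decay. The function name `lr_at_step`, the linear ramp shape, and all numeric values are assumptions made for illustration; the source does not specify them.

```python
import math

def lr_at_step(step, base_lr, warmup_steps, total_steps, min_lr=0.0):
    """Learning rate with linear warmup followed by cosine annealing."""
    if step < warmup_steps:
        # Ramp linearly from base_lr / warmup_steps up to base_lr.
        return base_lr * (step + 1) / warmup_steps
    # Cosine decay from base_lr toward min_lr over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

# Hypothetical settings: 5 warmup steps out of 50 total, base learning rate 0.4.
schedule = [
    lr_at_step(s, base_lr=0.4, warmup_steps=5, total_steps=50)
    for s in range(50)
]
print(schedule[:6])  # rises during warmup, then starts decaying from base_lr
```

In PyTorch, one way to apply a schedule like this is to divide the value by `base_lr` and return it as the multiplier function of `torch.optim.lr_scheduler.LambdaLR`, calling `scheduler.step()` once per step.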