Multi-card training in PyTorch
2022-07-29 04:14:00 【ytusdc】
- What is the process of multi-card training in PyTorch?
- If every card holds a copy of the model, are the BN parameters the same on each card?
- With PyTorch's DistributedDataParallel, are the model parameters exactly the same on every GPU?
The parameters are the same, but the gradients differ at some moments: during backward, before synchronization completes, each card holds only the gradient computed from its own data.
In DDP's working mode, the process can be pictured as:
- Each card computes its own loss in parallel
- Each card runs backward in parallel
- Gradients are synchronized between the cards
- Each card updates its parameters with the synchronized gradients
Because all cards start from the same parameters (DDP broadcasts the model state from rank 0 to the other processes when it is constructed) and then apply the same synchronized gradients, DDP can guarantee that the model parameters stay identical across processes.
When reading the source code, also read the NOTICE and WARNING entries in the class docstring; following them keeps the parameters consistent between processes. If you are still unsure, run evaluation once in each process: every process should output the same result.
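The four steps above can be sketched as a toy simulation in plain Python. This is not the PyTorch API: the "ranks" are just list entries, `all_reduce_mean` is a hypothetical stand-in for the gradient synchronization, and the one-parameter model and the data (`y = 3x`) are made up for illustration. It only shows why identical starting parameters plus identical averaged gradients keep every copy in lockstep, even though the local gradients differ.

```python
import random

def local_gradient(w, shard):
    """Gradient of the mean of 0.5 * (w*x - y)^2 over one rank's data shard."""
    return sum((w * x - y) * x for x, y in shard) / len(shard)

def all_reduce_mean(values):
    """Hypothetical stand-in for gradient sync: every rank receives the average."""
    mean = sum(values) / len(values)
    return [mean] * len(values)

def make_shard(rng, size):
    """Made-up data for one rank, drawn from the line y = 3x."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(size)]
    return [(x, 3.0 * x) for x in xs]

rng = random.Random(0)
world_size = 4
weights = [0.5] * world_size          # every rank starts from the same parameter
shards = [make_shard(rng, 8) for _ in range(world_size)]  # each rank sees different data

lr = 0.1
first_local_grads = None
for step in range(50):
    # Steps 1-2: each rank computes loss and gradient on its own shard.
    local_grads = [local_gradient(w, s) for w, s in zip(weights, shards)]
    if first_local_grads is None:
        first_local_grads = list(local_grads)
    # Step 3: gradients are averaged across ranks.
    synced_grads = all_reduce_mean(local_grads)
    # Step 4: every rank applies the same update.
    weights = [w - lr * g for w, g in zip(weights, synced_grads)]

print("local gradients at step 0:", first_local_grads)
print("weights after training:   ", weights)
```

The local gradients printed for step 0 differ from rank to rank, while the final weights are the same value on every rank.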
- When the batchsize is increased in multi-card training, accuracy drops instead of improving. Why is that? Have you thought about how to solve it?
Multi-card training with a large batchsize:
Theoretical advantages:
The influence of noise in the data may be reduced, so it may be easier to approach the optimum;
Shortcomings and problems:
The gradient variance is reduced. (In theory, for convex optimization problems, a lower gradient variance gives better optimization; in practice, however, Keskar et al. showed that increasing the batchsize leads to poorer generalization.)
For non-convex optimization problems, the loss function has multiple local optima. With a small batchsize, the noise in the gradient may make it easier to jump out of a local optimum, whereas with a large batchsize training may settle in a local optimum and fail to escape.
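The claim that a larger batchsize lowers the gradient variance can be checked with a small experiment. The sketch below is an illustration under made-up assumptions: the "per-sample gradients" are just numbers drawn as a true gradient of 1.0 plus unit-variance Gaussian noise, and `gradient_variance` is a hypothetical helper, not part of any library.

```python
import random
import statistics

rng = random.Random(0)

# Made-up per-sample gradients: true gradient 1.0 plus unit-variance noise.
per_sample_grads = [1.0 + rng.gauss(0.0, 1.0) for _ in range(10000)]

def minibatch_gradient(batch_size):
    """Average the gradients of one randomly drawn mini-batch."""
    batch = rng.sample(per_sample_grads, batch_size)
    return sum(batch) / batch_size

def gradient_variance(batch_size, trials=2000):
    """Empirical variance of the mini-batch gradient over many draws."""
    estimates = [minibatch_gradient(batch_size) for _ in range(trials)]
    return statistics.variance(estimates)

var_small = gradient_variance(8)
var_large = gradient_variance(256)
print("variance with batchsize 8:  ", var_small)
print("variance with batchsize 256:", var_large)
```

The variance shrinks roughly in proportion to 1/batchsize, so the large-batch gradient carries far less of the noise that the text credits with helping small batches escape local optima.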
Solutions:
Increase the learning_rate. This can cause its own problem: a large learning_rate at the beginning of training may keep the model from converging.
Use warmup to mitigate the non-convergence caused by a large learning_rate.
warmup
Link: Deep learning training strategies: learning rate warmup (Warmup)
Using a large learning_rate at the very beginning of training may keep training from converging. The idea of warmup is to start training with a small learning rate and increase it gradually as training proceeds until it reaches the base learning_rate, then continue training with another decay schedule (e.g. CosineAnnealingLR).
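A minimal sketch of such a schedule, assuming a linear ramp during warmup followed by cosine annealing. The function name `warmup_cosine_lr` and the values `base_lr=0.4`, `warmup_steps=5` and `total_steps=20` are illustrative choices, not taken from the original post.

```python
import math

def warmup_cosine_lr(step, base_lr, warmup_steps, total_steps, min_lr=0.0):
    """Learning rate at a given step: linear warmup, then cosine annealing."""
    if step < warmup_steps:
        # Ramp linearly from base_lr / warmup_steps up to base_lr.
        return base_lr * (step + 1) / warmup_steps
    # Fraction of the post-warmup phase already completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    # Cosine decay from base_lr down towards min_lr.
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

schedule = [
    warmup_cosine_lr(step, base_lr=0.4, warmup_steps=5, total_steps=20)
    for step in range(20)
]
for step, lr in enumerate(schedule):
    print(step, round(lr, 4))
```

The learning rate rises over the first five steps, reaches the base learning_rate, and then decays along a cosine curve. In PyTorch, the same shape could be obtained by returning the factor `warmup_cosine_lr(step, 1.0, ...)` from the `lr_lambda` passed to `torch.optim.lr_scheduler.LambdaLR`; that wiring is not shown here.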