[torch]: Parallel training with a dynamically settable batch size for the first GPU
2022-06-11 10:00:00 【Jack_ Kuo】
Problem
When training a model with torch, I want to use multiple GPUs on a single machine.
Solution
Reference: PyTorch multi-GPU parallel training
You need to download the customized file data_parallel_my_v2.py, which defines BalancedDataParallel.
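The custom wrapper is needed because plain `nn.DataParallel` splits each batch into roughly equal chunks, while GPU 0 also gathers the outputs and therefore usually uses more memory. The sketch below shows one way an uneven split could be computed: GPU 0 gets a fixed `gpu0_bsz`, and the rest of the batch is shared as evenly as possible by the other GPUs. `balanced_chunk_sizes` is a hypothetical helper and the remainder rule is an assumption; it is not necessarily the exact logic inside data_parallel_my_v2.py.

```python
from typing import List


def balanced_chunk_sizes(total_bsz: int, gpu0_bsz: int, num_gpus: int) -> List[int]:
    """Hypothetical sketch: per-GPU batch sizes when GPU 0 gets a fixed share.

    GPU 0 receives `gpu0_bsz` samples; the remaining samples are spread as
    evenly as possible over the other GPUs, and any remainder is handed out
    one extra sample at a time starting from GPU 1 (an assumed rule).
    """
    if num_gpus < 2:
        # A single GPU simply takes the whole batch
        return [total_bsz]
    if not 0 <= gpu0_bsz <= total_bsz:
        raise ValueError("gpu0_bsz must be between 0 and total_bsz")
    rest = total_bsz - gpu0_bsz
    others = num_gpus - 1
    base, extra = divmod(rest, others)
    # The first `extra` GPUs after GPU 0 take one more sample each
    return [gpu0_bsz] + [base + 1 if i < extra else base for i in range(others)]


print(balanced_chunk_sizes(64, 8, 4))  # [8, 19, 19, 18]
```

Keeping GPU 0's share small leaves it memory headroom for gathering outputs and computing the loss, while the other GPUs do most of the forward/backward work.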
1. Model setup
import numpy as np
import torch

# SimcseModel and args are defined elsewhere in the training script
model = SimcseModel(pretrained_model="./model_save", pooling=args.pooler, dropout=args.dropout)
print("torch.cuda.device_count()", torch.cuda.device_count())
if torch.cuda.device_count() > 1:
    gpu0_bsz = int(np.sqrt(args.batch_size))
    # Set gpu0's batch size here: gpu0 often has to compute other things as well, so its memory use is higher than the other GPUs'
    print("gpu0_bsz", gpu0_bsz)
    acc_grad = 1  # no gradient accumulation is used here, so set to 1
    print("acc_grad", acc_grad)
    from data_parallel_my_v2 import BalancedDataParallel
    # dim=0 scatters each batch along the batch dimension
    model = BalancedDataParallel(gpu0_bsz // acc_grad, model, dim=0)  # .cuda() not needed; the model is moved with .to() below
model = model.to(args.device)
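To make the `gpu0_bsz` heuristic in the snippet concrete: GPU 0 gets the square root of the global batch size, rounded down, then divided by `acc_grad`. This sketch uses `math.sqrt` instead of `np.sqrt` so it runs without NumPy; `gpu0_batch_size` is a hypothetical helper name, not part of data_parallel_my_v2.py.

```python
import math


def gpu0_batch_size(batch_size: int, acc_grad: int = 1) -> int:
    """Same arithmetic as the snippet: int(sqrt(batch_size)) // acc_grad."""
    return int(math.sqrt(batch_size)) // acc_grad


for bs in (16, 64, 128):
    # prints 16 4, then 64 8, then 128 11
    print(bs, gpu0_batch_size(bs))
```

With `batch_size=64` on 4 GPUs, GPU 0 would process 8 samples and the other three GPUs would share the remaining 56 (about 18 or 19 each).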
2. Saving the model
# With multi-GPU training the model is wrapped by BalancedDataParallel, so the original model sits under model.module and the saving code must change
if torch.cuda.device_count() > 1:
    model.module.bert.save_pretrained(save_path)
else:
    model.bert.save_pretrained(save_path)
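Checking `torch.cuda.device_count()` at save time only works if it matches whether the model was actually wrapped. One common alternative is to unwrap based on the `.module` attribute that DataParallel-style wrappers use to hold the original network. `unwrap_model` is a hypothetical helper, and the sketch assumes `BalancedDataParallel` keeps the model in `.module` like `nn.DataParallel`, which the saving code above already relies on.

```python
def unwrap_model(model):
    """Return the original network from a DataParallel-style wrapper.

    Wrappers such as torch.nn.DataParallel (and, by assumption,
    BalancedDataParallel) store the wrapped network in `.module`;
    a model without that attribute is returned unchanged.
    """
    return getattr(model, "module", model)


# Usage in the training script (model and save_path come from that script):
# unwrap_model(model).bert.save_pretrained(save_path)
```

This keeps a single save path for both the single-GPU and multi-GPU cases.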