[torch]: Multi-GPU parallel training with a configurable batch size for the first GPU
2022-06-11 10:00:00 【Jack_ Kuo】
Problem
When training a model with torch, I want to use multiple GPUs on a single machine.
Solution
Reference: PyTorch multi-GPU parallel training
You need to download the custom module: data_parallel_my_v2.py
1. Network settings
import numpy as np
import torch

# SimcseModel and args come from the author's project
model = SimcseModel(pretrained_model="./model_save", pooling=args.pooler, dropout=args.dropout)
print("torch.cuda.device_count()", torch.cuda.device_count())
if torch.cuda.device_count() > 1:
    # Set the batch size for gpu0 here. gpu0 often has to handle extra computation,
    # so its memory use is higher than that of the other GPUs.
    gpu0_bsz = int(np.sqrt(args.batch_size))
    print("gpu0_bsz", gpu0_bsz)
    acc_grad = 1  # gradient accumulation is not used here, so set this to 1
    print("acc_grad", acc_grad)
    from data_parallel_my_v2 import BalancedDataParallel
    model = BalancedDataParallel(gpu0_bsz // acc_grad, model, dim=0)  # .cuda()
model = model.to(args.device)
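To make the effect of gpu0_bsz concrete, here is a minimal sketch of how a batch could be divided when the first GPU receives a smaller share. The function name balanced_chunk_sizes and its fallback rule are assumptions made for illustration; the actual logic in data_parallel_my_v2.py may differ.

```python
def balanced_chunk_sizes(batch_size, gpu0_bsz, num_gpus):
    """Return per-GPU chunk sizes, giving the first GPU gpu0_bsz samples.

    Hypothetical helper: it only illustrates the idea of an uneven split
    and is not taken from data_parallel_my_v2.py.
    """
    if num_gpus < 2:
        # A single device takes the whole batch.
        return [batch_size]

    # Share the samples not assigned to gpu0 among the remaining GPUs.
    base, extra = divmod(batch_size - gpu0_bsz, num_gpus - 1)

    if gpu0_bsz >= base:
        # gpu0 would not be lighter than the others, so split evenly instead.
        base, extra = divmod(batch_size, num_gpus)
        return [base + (1 if i < extra else 0) for i in range(num_gpus)]

    # gpu0 gets its reduced share; leftovers go to the first few other GPUs.
    others = [base + (1 if i < extra else 0) for i in range(num_gpus - 1)]
    return [gpu0_bsz] + others
```

With batch_size=64, gpu0_bsz=int(sqrt(64))=8 and 4 GPUs, this sketch gives [8, 19, 19, 18]: the first GPU handles 8 samples and the other three share the remaining 56.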
2. Saving the model
# With multi-GPU training, the model is wrapped, so save through model.module
if torch.cuda.device_count() > 1:
    model.module.bert.save_pretrained(save_path)
else:
    model.bert.save_pretrained(save_path)
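The same branching can be avoided with a small helper that returns the wrapped model when one exists. This is a sketch, not part of the original post: unwrap_model is a hypothetical name, and it assumes the wrapper exposes the original network as .module, as the code above does.

```python
def unwrap_model(model):
    """Return the underlying model if it is wrapped by a parallel wrapper.

    Hypothetical helper: relies only on the wrapper exposing `.module`.
    Caveat: a plain model that defines its own attribute named `module`
    would be unwrapped by mistake.
    """
    return getattr(model, "module", model)
```

Saving then becomes a single call, unwrap_model(model).bert.save_pretrained(save_path), whether or not several GPUs are in use.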