当前位置:网站首页>【torch】: 并行训练并且可以动态设置第一个gpu的batch size
【torch】: 并行训练并且可以动态设置第一个gpu的batch size
2022-06-11 09:49:00 【Jack_Kuo】
问题
再使用torch训练模型的时候,想要使用单机多卡
解决
参照:pytorch多gpu并行训练
需要下载自定义的:data_parallel_my_v2.py
- 网络设置
model = SimcseModel(pretrained_model="./model_save", pooling=args.pooler, dropout=args.dropout)
print("torch.cuda.device_count()", torch.cuda.device_count())
if torch.cuda.device_count() > 1:
gpu0_bsz = int(np.sqrt(args.batch_size))
# 这里设置的gpu0的batch,因为很多时候需要gpu0计算其他的东西,这个消耗会比其他gpu大
print("gpu0_bsz", gpu0_bsz)
acc_grad = 1 # 这里没有设置累计梯度,所以设置为1
print("acc_grad", acc_grad)
from data_parallel_my_v2 import BalancedDataParallel
model = BalancedDataParallel(gpu0_bsz // acc_grad, model, dim=0) # .cuda()
model = model.to(args.device)
2.保存模型时设置
# 多卡gpu训练,保存方式需要做修改
if torch.cuda.device_count() > 1:
model.module.bert.save_pretrained(save_path)
else:
model.bert.save_pretrained(save_path)
边栏推荐
- Analysis of high frequency interview questions in massive data processing
- JS foundation - array object
- Integer lifting example
- [image denoising] image denoising based on mean + median + Gauss low pass + various improved wavelet transform, including Matlab source code
- Redis transaction details
- An error can't locate env pm in @INC
- UGUI鼠标点击扩散UI效果
- 【Objective-C】‘NSAutoreleasePool‘ is unavailable: not available in automatic reference counting mode
- 2022 must have Chrome extension - browser plug-in to double your Internet efficiency
- ESP8266_ Get request weather forecast and JSON parsing
猜你喜欢

Interview questions: REM layout, flex layout, etc

一万字彻底学会堆和二叉树

ES6新增特性--箭头函数

mysql基础学习笔记03

全局池化–Pytorch

UGUI

Vk2c22a/b anti-interference series electric meter, water meter segment code LCD driver chip data (customized dice/cog)

Chemical composition of q355hr steel plate

基于SSM+Vue+OSS的“依伴汉服”商城设计与开发(含源码+论文+ppt+数据库)

ESP8266_SNTP(Simple Network Time Protocol)
随机推荐
完结C语言
Jmeter的使用(模拟高并发)
鼠标点击坐标转换生成
How do we connect to WiFi?
Flask (VII) - static file
Oracle XDB組件的重建
jedisLock—redis分布式锁实现
New feature in ES6 -- arrow function
JS foundation -- Date object
Cisp-pte XSS Foundation
CISP-PTE XSS基础
Finish C language
帝国CMS仿《手艺活》DIY手工制作网源码/92kaifa仿手艺活自适应手机版模板
ESP8266_SmartConfig
FPGA基础架构【参考ug998】
Troubleshooting the error ora-12545 reported by scanip in Oracle RAC
go连接数据库报错 wsarecv: An existing connection was forcibly closed by the remote host.
ESP8266_MQTT协议
ESP8266_ Mqtt protocol
GDB debugging common commands