[torch]: Parallel training with a dynamically set batch size for the first GPU

2022-06-11 10:00:00 Jack_ Kuo

Problem

When training a model with torch, I want to use multiple GPUs on a single machine.

Solution

Reference: PyTorch multi-GPU parallel training.
You need to download the customized data_parallel_my_v2.py.
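For intuition, here is a minimal sketch of how such a balanced split can assign a smaller chunk to gpu0 and spread the rest evenly over the remaining GPUs. This is only an illustration of the idea; the actual implementation in data_parallel_my_v2.py may differ.

def balanced_chunk_sizes(batch_size, gpu0_bsz, num_gpus):
    # gpu0 gets gpu0_bsz samples; the rest are shared by the other GPUs
    rest = batch_size - gpu0_bsz
    per_gpu = rest // (num_gpus - 1)
    sizes = [gpu0_bsz] + [per_gpu] * (num_gpus - 1)
    # the last GPU absorbs whatever does not divide evenly
    sizes[-1] += rest - per_gpu * (num_gpus - 1)
    return sizes

# e.g. balanced_chunk_sizes(64, 8, 4) -> [8, 18, 18, 20]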

1. Network setup

import numpy as np
import torch

model = SimcseModel(pretrained_model="./model_save", pooling=args.pooler, dropout=args.dropout)

print("torch.cuda.device_count()", torch.cuda.device_count())

if torch.cuda.device_count() > 1:
    # Set gpu0's batch size here. gpu0 often has to compute extra things,
    # so its memory consumption is larger than that of the other GPUs.
    gpu0_bsz = int(np.sqrt(args.batch_size))
    print("gpu0_bsz", gpu0_bsz)
    acc_grad = 1  # gradient accumulation is not used here, so it is set to 1
    print("acc_grad", acc_grad)
    from data_parallel_my_v2 import BalancedDataParallel
    model = BalancedDataParallel(gpu0_bsz // acc_grad, model, dim=0)  # .cuda()

model = model.to(args.device)
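After wrapping, the model is used like a regular DataParallel model. The sketch below shows an assumed training step; the loader, loss function, optimizer, and the model's input names are placeholders, not taken from the original post.

for batch in train_loader:
    # move the full batch to the primary device; BalancedDataParallel then
    # scatters gpu0_bsz samples to gpu0 and the rest evenly to the other GPUs
    input_ids = batch["input_ids"].to(args.device)
    attention_mask = batch["attention_mask"].to(args.device)
    out = model(input_ids, attention_mask)
    loss = loss_fn(out)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()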

2. Saving the model

# For multi-GPU training, the way the model is saved needs to change
if torch.cuda.device_count() > 1:
    model.module.bert.save_pretrained(save_path)
else:
    model.bert.save_pretrained(save_path)
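The weights saved this way can later be reloaded through the same pretrained_model argument used in step 1. A minimal sketch, assuming the SimcseModel constructor accepts the saved path:

model = SimcseModel(pretrained_model=save_path, pooling=args.pooler, dropout=args.dropout)
model = model.to(args.device)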

Copyright notice
This article was written by [Jack_ Kuo]; please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/162/202206110949039286.html