PyTorch: Distributed Model Training
2022-08-01 13:52:00 【CyrusMay】
1. Data parallelism
1.1 Single machine, single GPU
import os

import torch
from torch import nn
import torch.nn.functional as F

# Restrict this process to a single GPU. Set this before the first CUDA call;
# alternatively, set CUDA_VISIBLE_DEVICES="0" in the terminal.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
print(torch.cuda.is_available())

# Select the device, falling back to the CPU when no GPU is visible
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

model = nn.Sequential(nn.Linear(in_features=10, out_features=20),
                      nn.ReLU(),
                      nn.Linear(in_features=20, out_features=2),
                      nn.Sigmoid())
data = torch.rand([100, 10])

# Copy the model to the device
model.to(device)
# Copy the data to the device
data = data.to(device)

# Build the optimizer once the model is on its device
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# Save the model and optimizer state
torch.save({
    "model_state_dict": model.state_dict(),
    "optimizer_state_dict": optimizer.state_dict()}, "./model")

# Load the model and optimizer state
checkpoint = torch.load("./model", map_location=device)
model.load_state_dict(checkpoint["model_state_dict"])
optimizer.load_state_dict(checkpoint["optimizer_state_dict"])
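The checkpoint above stores both the model state and the optimizer state. A minimal sketch of one training step followed by a save/load round trip might look like the following. The random targets, the MSE loss, and the temporary checkpoint path are assumptions added for illustration and do not come from the original post; the script falls back to the CPU when no GPU is visible.

```python
import os
import tempfile

import torch
from torch import nn
import torch.nn.functional as F

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")


def build_model():
    # Same layer sizes as the model in this section.
    return nn.Sequential(
        nn.Linear(in_features=10, out_features=20),
        nn.ReLU(),
        nn.Linear(in_features=20, out_features=2),
        nn.Sigmoid(),
    )


model = build_model().to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# Assumed toy data: random inputs and random targets in [0, 1).
data = torch.rand([100, 10], device=device)
target = torch.rand([100, 2], device=device)

# One optimization step.
optimizer.zero_grad()
loss = F.mse_loss(model(data), target)
loss.backward()
optimizer.step()

# Save to a temporary file (hypothetical path, chosen to avoid clutter).
ckpt_path = os.path.join(tempfile.mkdtemp(), "model.pt")
torch.save({"model_state_dict": model.state_dict(),
            "optimizer_state_dict": optimizer.state_dict()}, ckpt_path)

# Restore into a freshly constructed model and optimizer.
restored = build_model().to(device)
restored_optimizer = torch.optim.Adam(restored.parameters(), lr=0.001)
checkpoint = torch.load(ckpt_path, map_location=device)
restored.load_state_dict(checkpoint["model_state_dict"])
restored_optimizer.load_state_dict(checkpoint["optimizer_state_dict"])

# The restored model should reproduce the trained model's outputs.
with torch.no_grad():
    outputs_match = torch.allclose(model(data), restored(data))
print(outputs_match)
```

Restoring the optimizer state alongside the weights matters for Adam, because its running moment estimates would otherwise restart from zero when training resumes.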
1.2 Single machine, multiple GPUs
Code
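import os

import torch
import torch.nn.functional as F
from torch import nn

# Index of the GPU assigned to this process (set by torchrun)
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)
device = torch.device("cuda", local_rank)

# Initialize the process group; backend selects the communication library
torch.distributed.init_process_group(backend="nccl", init_method="env://")

# Number of GPUs visible on this machine
n_gpus = torch.cuda.device_count()

dataset = torch.rand([1000, 10])
# Layer sizes follow the model in section 1.1
model = nn.Sequential(
    nn.Linear(in_features=10, out_features=20),
    nn.ReLU(),
    nn.Linear(in_features=20, out_features=2),
    nn.Sigmoid()
).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# Wrap the model, already on its GPU, in DistributedDataParallel
model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[local_rank], output_device=local_rank)

# Build the distributed sampler
sampler = torch.utils.data.distributed.DistributedSampler(dataset)
# Build the dataloader
BATCH_SIZE = 128
dataloader = torch.utils.data.DataLoader(dataset=dataset,
                                         batch_size=BATCH_SIZE,
                                         num_workers=8,
                                         sampler=sampler)

for epoch in range(1000):
    # Call once per epoch, before iterating, so that each epoch shuffles differently
    sampler.set_epoch(epoch)
    for x in dataloader:
        x = x.to(device)
        # Forward pass, loss, backward pass and optimizer.step() go here
        # (omitted in the original post)

# Save the model from one process only
if local_rank == 0:
    torch.save({
        "model_state_dict": model.module.state_dict()
    }, "./model")

# Wait until the checkpoint has been written before any process reads it
torch.distributed.barrier()

# Load the model. The keys were saved from model.module, so load into model.module
checkpoint = torch.load("./model", map_location=device)
model.module.load_state_dict(checkpoint["model_state_dict"])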
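To see why `set_epoch` matters, the pure-Python sketch below mimics how a distributed sampler can split indices across processes: shuffle with a seed derived from the epoch, pad so the total divides evenly, then give each rank every `world_size`-th index. It is a simplified model for illustration and not PyTorch's implementation; `shard_indices` and its parameters are hypothetical names.

```python
import math
import random


def shard_indices(n_samples, world_size, rank, epoch, seed=0):
    """Return the sample indices that one rank would see in a given epoch."""
    if n_samples <= 0 or world_size <= 0:
        raise ValueError("n_samples and world_size must be positive")
    if not 0 <= rank < world_size:
        raise ValueError("rank must be in [0, world_size)")

    indices = list(range(n_samples))
    # Every rank uses the same seed, so all ranks agree on the permutation.
    # Adding the epoch changes the permutation from one epoch to the next.
    random.Random(seed + epoch).shuffle(indices)

    # Pad by repeating indices so that every rank receives the same count.
    per_rank = math.ceil(n_samples / world_size)
    total = per_rank * world_size
    repeats = math.ceil(total / n_samples)
    indices = (indices * repeats)[:total]

    # Strided slice: rank r takes positions r, r + world_size, ...
    return indices[rank:total:world_size]


if __name__ == "__main__":
    # Four processes sharing ten samples: each one gets three indices.
    for r in range(4):
        print(r, shard_indices(10, world_size=4, rank=r, epoch=0))
```

If the epoch passed to the sampler never changes, the seed never changes either, so every epoch would iterate over the data in the same order.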
Launch the job from a terminal (replace n_gpus with the number of GPUs on the machine)
torchrun --nproc_per_node=n_gpus train.py
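`torchrun` starts one process per GPU and tells each process who it is through environment variables such as `LOCAL_RANK`, `RANK`, and `WORLD_SIZE`. Below is a small sketch of reading them with single-process defaults, so that the same script can also be started with plain `python` for debugging. The helper name `read_dist_env` is hypothetical.

```python
import os


def read_dist_env(env=None):
    """Read the process identity set by the launcher, with 1-process defaults."""
    env = os.environ if env is None else env
    return {
        # Index of this process on its own machine (selects the GPU).
        "local_rank": int(env.get("LOCAL_RANK", 0)),
        # Index of this process across all machines.
        "rank": int(env.get("RANK", 0)),
        # Total number of processes in the job.
        "world_size": int(env.get("WORLD_SIZE", 1)),
    }


if __name__ == "__main__":
    print(read_dist_env())
```

Passing a plain dictionary instead of `os.environ` makes the helper easy to check without launching a distributed job.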
1.3 Multiple machines, multiple GPUs
Code
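import os

import torch
import torch.nn.functional as F
from torch import nn

# Index of the GPU assigned to this process on its own node (set by torchrun)
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)
device = torch.device("cuda", local_rank)

# Initialize the process group; backend selects the communication library
torch.distributed.init_process_group(backend="nccl", init_method="env://")

# Number of GPUs visible on this node
n_gpus = torch.cuda.device_count()

dataset = torch.rand([1000, 10])
# Layer sizes follow the model in section 1.1
model = nn.Sequential(
    nn.Linear(in_features=10, out_features=20),
    nn.ReLU(),
    nn.Linear(in_features=20, out_features=2),
    nn.Sigmoid()
).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# Wrap the model, already on its GPU, in DistributedDataParallel
model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[local_rank], output_device=local_rank)

# Build the distributed sampler
sampler = torch.utils.data.distributed.DistributedSampler(dataset)
# Build the dataloader
BATCH_SIZE = 128
dataloader = torch.utils.data.DataLoader(dataset=dataset,
                                         batch_size=BATCH_SIZE,
                                         num_workers=8,
                                         sampler=sampler)

for epoch in range(1000):
    # Call once per epoch, before iterating, so that each epoch shuffles differently
    sampler.set_epoch(epoch)
    for x in dataloader:
        x = x.to(device)
        # Forward pass, loss, backward pass and optimizer.step() go here
        # (omitted in the original post)

# Save the model from one process per node
if local_rank == 0:
    torch.save({
        "model_state_dict": model.module.state_dict()
    }, "./model")

# Wait until the checkpoint has been written before any process reads it
torch.distributed.barrier()

# Load the model. The keys were saved from model.module, so load into model.module
checkpoint = torch.load("./model", map_location=device)
model.module.load_state_dict(checkpoint["model_state_dict"])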
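Launch the job from a terminal
Run the command once on every node, setting --node_rank to that node's index (0 on the master node, 1 on the second node) and replacing n_gpus with the number of GPUs per node.
torchrun --nproc_per_node=n_gpus --nnodes=2 --node_rank=0 --master_addr="MASTER_NODE_IP" --master_port="MASTER_NODE_PORT" train.py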
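When every node runs the same number of processes, the global rank of a process usually follows from its node index and its local rank. The sketch below shows that arithmetic for a launch like the one above. It assumes homogeneous nodes and a fixed `--node_rank` per node; `global_rank` and `world_size` are hypothetical helpers, not PyTorch APIs. Inside the training script, the authoritative values remain the `RANK` and `WORLD_SIZE` environment variables.

```python
def world_size(nnodes, nproc_per_node):
    """Total number of processes in the job."""
    return nnodes * nproc_per_node


def global_rank(node_rank, nproc_per_node, local_rank):
    """Global rank under the assumption of equally sized nodes."""
    if not 0 <= local_rank < nproc_per_node:
        raise ValueError("local_rank must be in [0, nproc_per_node)")
    return node_rank * nproc_per_node + local_rank


if __name__ == "__main__":
    # Two nodes with four GPUs each (assumed sizes for illustration).
    for node in range(2):
        for local in range(4):
            print(node, local, global_rank(node, 4, local))
```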
2. Model parallelism
Omitted.
by CyrusMay 2022 07 29