Pytorch - Distributed Model Training
2022-08-01 14:16:00 【CyrusMay】
1. Data parallelism
1.1 Single machine, single GPU
import torch
from torch import nn
import torch.nn.functional as F
import os
model = nn.Sequential(nn.Linear(in_features=10, out_features=20),
                      nn.ReLU(),
                      nn.Linear(in_features=20, out_features=2),
                      nn.Sigmoid())
data = torch.rand([100,10])
optimizer = torch.optim.Adam(model.parameters(),lr = 0.001)
# Restrict this process to a single GPU. Set this before the first CUDA call;
# it can also be set in the terminal with CUDA_VISIBLE_DEVICES="0"
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
print(torch.cuda.is_available())
# Select the GPU, falling back to the CPU when CUDA is unavailable
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
# Copy the model to the device
model.to(device)
# Copy the data to the device
data = data.to(device)
# Save the model and optimizer state
torch.save({
    "model_state_dict": model.state_dict(),
    "optimizer_state_dict": optimizer.state_dict()}, "./model")
# Load the model and optimizer state
checkpoint = torch.load("./model", map_location=device)
model.load_state_dict(checkpoint["model_state_dict"])
optimizer.load_state_dict(checkpoint["optimizer_state_dict"])
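The snippet above sets up the model, data, and optimizer but stops short of a training step. A minimal sketch of one follows. The random binary targets, the `BCELoss` criterion, and the `train_step` helper are assumptions made for illustration and do not appear in the original; the code falls back to the CPU when no GPU is present.

```python
import torch
from torch import nn

torch.manual_seed(0)
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

model = nn.Sequential(nn.Linear(in_features=10, out_features=20),
                      nn.ReLU(),
                      nn.Linear(in_features=20, out_features=2),
                      nn.Sigmoid()).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
loss_fn = nn.BCELoss()  # assumption: chosen to match the Sigmoid output layer

data = torch.rand([100, 10], device=device)
# Hypothetical targets, used only for illustration
targets = torch.randint(0, 2, [100, 2], device=device).float()


def train_step(x, y):
    """Run one forward/backward pass and return the loss as a float."""
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
    return loss.item()


losses = [train_step(data, targets) for _ in range(5)]
print(losses)
```

Both the model and the batch must live on the same device before the forward pass, which is why the sketch moves the model with `.to(device)` and creates the tensors directly on `device`.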
1.2 Single machine, multiple GPUs
Code
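import torch
import torch.nn.functional as F
from torch import nn
import os
# Rank of this process on the local machine, set by torchrun; also used as the GPU index
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)
device = torch.device("cuda", local_rank)
dataset = torch.rand([1000, 10])
# Layer sizes follow the single-GPU example in section 1.1
model = nn.Sequential(
    nn.Linear(in_features=10, out_features=20),
    nn.ReLU(),
    nn.Linear(in_features=20, out_features=2),
    nn.Sigmoid()
).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
# Number of GPUs visible on this machine
n_gpus = torch.cuda.device_count()
# Initialize the process group; NCCL is the backend used for GPU communication
torch.distributed.init_process_group(backend="nccl", init_method="env://")
# Wrap the model (already on this process's GPU) in DistributedDataParallel
model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[local_rank], output_device=local_rank)
# Build the distributed sampler
sampler = torch.utils.data.distributed.DistributedSampler(dataset)
# Build the dataloader
BATCH_SIZE = 128
dataloader = torch.utils.data.DataLoader(dataset=dataset,
                                         batch_size=BATCH_SIZE,
                                         num_workers=8,
                                         sampler=sampler)
for epoch in range(1000):
    # Reseed the sampler once per epoch so that each epoch shuffles differently
    sampler.set_epoch(epoch)
    for x in dataloader:
        x = x.to(device)  # the forward pass, loss, and optimizer step go here
# Save the model from a single process only
if local_rank == 0:
    torch.save({
        "model_state_dict": model.module.state_dict()
    }, "./model")
# Wait until the checkpoint has been written before any process reads it
torch.distributed.barrier()
# Load the model; the checkpoint holds the keys of the unwrapped module
checkpoint = torch.load("./model", map_location=device)
model.module.load_state_dict(checkpoint["model_state_dict"])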
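To see what the distributed sampler and `set_epoch` do, the following sketch builds one `DistributedSampler` per rank by hand. Passing `num_replicas` and `rank` explicitly means no process group is needed, so it runs on a single CPU process. The world size of 2 and the `shards` helper are assumptions made for illustration.

```python
import torch
from torch.utils.data.distributed import DistributedSampler

dataset = torch.rand([1000, 10])
WORLD_SIZE = 2  # assumed number of processes, for illustration only

# One sampler per rank; explicit num_replicas/rank avoids init_process_group
samplers = [
    DistributedSampler(dataset, num_replicas=WORLD_SIZE, rank=r, shuffle=True, seed=0)
    for r in range(WORLD_SIZE)
]


def shards(epoch):
    """Return the list of dataset indices each rank draws in the given epoch."""
    for s in samplers:
        s.set_epoch(epoch)
    return [list(s) for s in samplers]


epoch0 = shards(0)
epoch1 = shards(1)

# Within an epoch the ranks receive disjoint slices of the dataset
print(len(epoch0[0]), len(epoch0[1]), set(epoch0[0]).isdisjoint(epoch0[1]))
# Across epochs the same rank sees a different ordering
print(epoch0[0][:5], epoch1[0][:5])
```

Without the `set_epoch` call, every epoch would reuse the same permutation, because the sampler seeds its shuffle from the seed plus the current epoch number.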
Launch the job from the terminal, replacing n_gpus with the number of GPUs on the machine:
torchrun --nproc_per_node=n_gpus train.py
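torchrun passes each worker its identity through environment variables, which is where `LOCAL_RANK` in the script comes from. The sketch below reads the three most commonly used ones. The `read_dist_env` helper and its single-process defaults are assumptions made for illustration, so that the same script can also be started with plain `python` for debugging.

```python
import os


def read_dist_env(environ=os.environ):
    """Read the rank variables exported by torchrun, defaulting to one process."""
    return {
        "local_rank": int(environ.get("LOCAL_RANK", 0)),   # index of the process on this machine
        "rank": int(environ.get("RANK", 0)),               # index of the process across all machines
        "world_size": int(environ.get("WORLD_SIZE", 1)),   # total number of processes
    }


# Values seen by the current process
print(read_dist_env())
# Values a worker might see when launched on a larger job (made-up numbers)
print(read_dist_env({"LOCAL_RANK": "1", "RANK": "5", "WORLD_SIZE": "8"}))
```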
1.3 Multiple machines, multiple GPUs
Code
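import torch
import torch.nn.functional as F
from torch import nn
import os
# Rank of this process on its own machine, set by torchrun; also used as the GPU index
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)
device = torch.device("cuda", local_rank)
dataset = torch.rand([1000, 10])
# Layer sizes follow the single-GPU example in section 1.1
model = nn.Sequential(
    nn.Linear(in_features=10, out_features=20),
    nn.ReLU(),
    nn.Linear(in_features=20, out_features=2),
    nn.Sigmoid()
).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
# Number of GPUs visible on this machine
n_gpus = torch.cuda.device_count()
# Initialize the process group; NCCL is the backend used for GPU communication
torch.distributed.init_process_group(backend="nccl", init_method="env://")
# Wrap the model (already on this process's GPU) in DistributedDataParallel
model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[local_rank], output_device=local_rank)
# Build the distributed sampler
sampler = torch.utils.data.distributed.DistributedSampler(dataset)
# Build the dataloader
BATCH_SIZE = 128
dataloader = torch.utils.data.DataLoader(dataset=dataset,
                                         batch_size=BATCH_SIZE,
                                         num_workers=8,
                                         sampler=sampler)
for epoch in range(1000):
    # Reseed the sampler once per epoch so that each epoch shuffles differently
    sampler.set_epoch(epoch)
    for x in dataloader:
        x = x.to(device)  # the forward pass, loss, and optimizer step go here
# Local rank 0 on each node writes a copy, so every node can read its own file;
# on a shared filesystem, check torch.distributed.get_rank() == 0 instead
if local_rank == 0:
    torch.save({
        "model_state_dict": model.module.state_dict()
    }, "./model")
# Wait until the checkpoint has been written before any process reads it
torch.distributed.barrier()
# Load the model; the checkpoint holds the keys of the unwrapped module
checkpoint = torch.load("./model", map_location=device)
model.module.load_state_dict(checkpoint["model_state_dict"])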
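A checkpoint saved from `model.module.state_dict()` has plain parameter names, whereas one saved from the DistributedDataParallel wrapper itself has a `module.` prefix on every key, and loading one kind into the other fails with a key mismatch. The sketch below shows one way to normalize the keys before loading. The `strip_module_prefix` helper is hypothetical, and the string values stand in for tensors.

```python
def strip_module_prefix(state_dict, prefix="module."):
    """Return a copy of state_dict with a leading prefix removed from each key."""
    return {
        (key[len(prefix):] if key.startswith(prefix) else key): value
        for key, value in state_dict.items()
    }


# Keys as they would look when saved from the DDP wrapper (placeholder values)
wrapped = {"module.0.weight": "w0", "module.0.bias": "b0"}
print(strip_module_prefix(wrapped))

# Keys that are already plain are left unchanged
plain = {"0.weight": "w0", "0.bias": "b0"}
print(strip_module_prefix(plain))
```

The normalized dictionary can then be loaded into the unwrapped network, for example on a single GPU for inference.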
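Launch the job from the terminal. Run the command once on each node, with --node_rank=0 on the master node and --node_rank=1 on the second node:
torchrun --nproc_per_node=n_gpus --nnodes=2 --node_rank=0 --master_addr="<master node IP>" --master_port="<master node port>" train.py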
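With several nodes, each process has both a local rank (its index on its own machine) and a global rank (its index in the whole job). The sketch below illustrates the relationship, assuming the conventional contiguous layout in which node 0 holds the first block of ranks. The `global_rank` helper and the cluster of 2 nodes with 4 GPUs each are made up for illustration, and an elastic rendezvous may assign ranks differently.

```python
def global_rank(node_rank, local_rank, nproc_per_node):
    """Global rank under a contiguous per-node layout (an assumption)."""
    return node_rank * nproc_per_node + local_rank


NNODES = 2          # hypothetical: matches --nnodes=2
NPROC_PER_NODE = 4  # hypothetical: 4 GPUs per node

world_size = NNODES * NPROC_PER_NODE
layout = {
    (node, local): global_rank(node, local, NPROC_PER_NODE)
    for node in range(NNODES)
    for local in range(NPROC_PER_NODE)
}

for (node, local), rank in layout.items():
    print(f"node {node} local_rank {local} -> rank {rank} of {world_size}")
```

Under this layout, local rank 0 occurs once per node, while global rank 0 occurs only once in the whole job.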
2. Model parallelism
Omitted.
by CyrusMay 2022 07 29