[framework] multi learner
2022-07-05 07:09:00 【hanjialeOK】
Run the container
Start the container on each machine:
docker run -it -v /data:/root/data/ --network host --name multi_learner hanjl/cuda:framework
Modify the host address
Enter the container on each machine and edit the container's IP address:
vim /etc/hosts
The file shows:
127.0.0.1
127.0.1.1
For example, on the server whose address ends in 4.7, change the second line to the host address:
***.**.4.7
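The same edit can be scripted. The following is a minimal sketch, not the author's method: `HOST_IP` is a placeholder from the documentation address range, and the hostnames in the scratch file are made up for illustration. It works on a scratch copy because in-place `sed -i` on `/etc/hosts` can fail inside a Docker container, where that file is usually bind-mounted.

```shell
# Placeholder address (assumption); substitute the real host address of this machine.
HOST_IP="192.0.2.7"

# Work on a scratch copy that mimics the two lines shown above.
# The hostnames "localhost" and "container" are hypothetical.
HOSTS_FILE="$(mktemp)"
printf '127.0.0.1\tlocalhost\n127.0.1.1\tcontainer\n' > "$HOSTS_FILE"

# Replace only the address at the start of line 2, keeping whatever follows it.
sed -i "2s/^[0-9.]*/$HOST_IP/" "$HOSTS_FILE"
cat "$HOSTS_FILE"

# Once the result looks right, write it back without renaming the file:
#   cat "$HOSTS_FILE" > /etc/hosts
```

Writing the content back with `cat` keeps the original file in place, which avoids the rename that `sed -i` performs.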
Change the SSH port
Enter the container on each machine and change the SSH port to 2233:
sed -i 's/\(^Port\)/#\1/' /etc/ssh/sshd_config
echo Port 2233 >> /etc/ssh/sshd_config
service ssh restart
SSH from each machine to every other machine, and make sure all of them can connect directly with no password or confirmation prompt.
Worker ID configuration
On the master, edit the SSH configuration file:
vim ~/.ssh/config
Change it to:
Host by08
HostName ***.***.4.8
Port 2233
Host by07
HostName ***.***.4.7
Port 2233
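One way to confirm that the aliases resolve and that login needs no password or confirmation is sketched below. It assumes a key pair already exists on the master and has been copied to each machine (for example with `ssh-copy-id`). `HOSTS`, `DRY_RUN`, and `check_host` are names introduced for this sketch; by default it only prints the commands it would run.

```shell
HOSTS="by07 by08"   # aliases defined in ~/.ssh/config
DRY_RUN=1           # set to 0 to actually connect

check_host() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "ssh -o BatchMode=yes -o ConnectTimeout=5 $1 true"
  # BatchMode=yes makes ssh fail instead of prompting,
  # so a password or host-key prompt shows up as FAILED.
  elif ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true; then
    echo "$1: ok"
  else
    echo "$1: FAILED"
  fi
}

for h in $HOSTS; do
  check_host "$h"
done
```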
Download the code
Download the framework code on each machine, and make sure the framework path is the same on all of them.
Run
Run this on the master only:
horovodrun -np 4 -H by07:2,by08:2 python learner.py --config examples/ppo/walker2d_learner_multi.yaml
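In the command above, `-np` is the total number of learner processes and `-H` lists each host with its slot count. A small sketch of how the two values relate, assuming every machine runs the same number of processes (`HOSTS` and `SLOTS` are names introduced here):

```shell
HOSTS="by07 by08"   # aliases from ~/.ssh/config
SLOTS=2             # learner processes per machine (assumed equal on all hosts)

NP=0
HLIST=""
for h in $HOSTS; do
  # Append "host:slots", adding a comma only between entries.
  HLIST="${HLIST:+$HLIST,}$h:$SLOTS"
  NP=$((NP + SLOTS))
done

# Print the resulting command instead of running it.
echo "horovodrun -np $NP -H $HLIST python learner.py --config examples/ppo/walker2d_learner_multi.yaml"
```

Adding a third machine then only means extending `HOSTS`; the process count follows from it.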
Then run the actor on each worker:
python actor.py --config examples/ppo/walker2d_actor.yaml
Free GPU memory
Currently the learner does not exit automatically, so horovodrun keeps holding GPU memory. The memory has to be released manually on every worker.
First, check which processes are holding GPU memory:
fuser -v /dev/nvidia0
Then kill them. Note that one of the listed processes is clearly a system process and does not need to be killed.
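The cleanup can be sketched as a loop over the PIDs that `fuser` reports. This is an illustration under an assumption: the learner processes appear as `python...` in `ps`, and anything else is left alone. `DRY_RUN` and `should_kill` are names introduced for this sketch, and by default nothing is killed.

```shell
DRY_RUN=1   # set to 0 to actually send the signal

# Assumption: only processes whose command name starts with "python" are learners.
should_kill() {
  case "$1" in
    python*) return 0 ;;
    *)       return 1 ;;
  esac
}

# fuser writes only the PIDs to stdout; everything else goes to stderr.
for pid in $(fuser /dev/nvidia0 2>/dev/null); do
  cmd=$(ps -o comm= -p "$pid")
  if should_kill "$cmd"; then
    if [ "$DRY_RUN" = 1 ]; then
      echo "would kill $pid ($cmd)"
    else
      kill "$pid"
    fi
  else
    echo "leaving $pid ($cmd) alone"
  fi
done
```

Review the dry-run output before setting `DRY_RUN=0`, since the name-based filter is only a guess at which processes belong to the learner.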