[framework] multi learner
2022-07-05 07:09:00 【hanjialeOK】
Run container
Run the container on each machine
docker run -it -v /data:/root/data/ --network host --name multi_learner hanjl/cuda:framework
Modify the hosts file
On each machine, enter the container and change the container's IP address entry in /etc/hosts:
vim /etc/hosts
The file contains:
127.0.0.1
127.0.1.1
On the 4.7 server, for example, change the second line to the host's address:
***.**.4.7
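The edit can also be done non-interactively. This is a minimal sketch, assuming the line to change is the 127.0.1.1 line shown in the listing above; the function name set_host_line and the example address 192.0.2.7 are hypothetical placeholders for your real host address.

```shell
#!/usr/bin/env bash
# Sketch: put the host's address on the 127.0.1.1 line of a hosts file.
# set_host_line and the example address are hypothetical placeholders.
set_host_line() {
  local ip="$1" file="${2:-/etc/hosts}" tmp
  tmp="$(mktemp)"
  # Replace only the leading 127.0.1.1; keep any hostname that follows it.
  sed -E "s/^127\.0\.1\.1([[:space:]]|\$)/${ip}\1/" "$file" > "$tmp"
  # Write back with cat rather than sed -i: /etc/hosts inside a container is
  # usually bind-mounted, and sed -i's rename step can fail on it.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Example (placeholder address): set_host_line 192.0.2.7
```

The first line (127.0.0.1) is left untouched because the pattern is anchored to 127.0.1.1.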
Change the SSH port
On each machine, enter the container and change the SSH port to 2233 (the sed command comments out any existing Port line, and echo appends the new setting):
sed -i 's/\(^Port\)/#\1/' /etc/ssh/sshd_config
echo Port 2233 >> /etc/ssh/sshd_config
service ssh restart
From each machine, ssh into every other machine to make sure all machines can connect to one another directly, without a password or a host-key confirmation prompt.
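A common way to get password-free, prompt-free logins is key-based authentication. The sketch below is an assumption, not the author's procedure: it presumes you log in as root (which the /root/data mount path suggests), and the helper names ensure_key, push_key and check_hosts are hypothetical. Port 2233 is the one configured above.

```shell
#!/usr/bin/env bash
# Sketch of key-based SSH between containers. Assumes login user "root";
# ensure_key, push_key and check_hosts are hypothetical helper names.
PORT=2233  # the sshd port configured above

# Create a passphrase-less key once, without prompting; skip if it exists.
ensure_key() {
  local key="${1:-$HOME/.ssh/id_ed25519}"
  mkdir -p "$(dirname "$key")"
  [ -f "$key" ] || ssh-keygen -q -t ed25519 -N "" -f "$key"
}

# Install the public key on each host (asks for the password once per host).
# accept-new stores a host key seen for the first time without asking,
# so later logins need no confirmation.
push_key() {
  local host
  for host in "$@"; do
    ssh-copy-id -o StrictHostKeyChecking=accept-new -p "$PORT" "root@$host"
  done
}

# BatchMode=yes makes ssh fail instead of prompting, so "ok" means the
# login really needs neither a password nor a confirmation.
check_hosts() {
  local host
  for host in "$@"; do
    if ssh -o BatchMode=yes -p "$PORT" "root@$host" true; then
      echo "ok   $host"
    else
      echo "FAIL $host"
    fi
  done
}
```

Run ensure_key, then push_key with the other machines' addresses, then check_hosts with the same list; every line should read ok. If sshd in the image refuses root password logins, ssh-copy-id cannot work, and the public key has to be appended to the remote ~/.ssh/authorized_keys some other way.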
Configure worker IDs
On the master, edit the SSH client configuration file:
vim ~/.ssh/config
Change it to:
Host by08
HostName ***.***.4.8
Port 2233
Host by07
HostName ***.***.4.7
Port 2233
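With more workers, these entries can be generated instead of typed by hand. A sketch with a hypothetical gen_ssh_config helper; the alias=address pairs and the 192.0.2.x addresses in the example are placeholders.

```shell
#!/usr/bin/env bash
# Sketch: print ~/.ssh/config entries for alias=address pairs.
# gen_ssh_config and the addresses in the example are hypothetical.
gen_ssh_config() {
  local port="$1" entry
  shift
  for entry in "$@"; do
    # ${entry%%=*} is the alias, ${entry#*=} the address.
    printf 'Host %s\n    HostName %s\n    Port %s\n' \
      "${entry%%=*}" "${entry#*=}" "$port"
  done
}
```

For example, gen_ssh_config 2233 by07=192.0.2.7 by08=192.0.2.8 >> ~/.ssh/config appends blocks equivalent to the ones above (ssh_config ignores the indentation).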
Download the file
Download the framework code on each machine, and make sure the framework sits at the same path on every machine.
Run
On the master only, run:
horovodrun -np 4 -H by07:2,by08:2 python learner.py --config examples/ppo/walker2d_learner_multi.yaml
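In this command, -H lists each host with its slot count, and -np 4 equals the total (2 on by07 plus 2 on by08). A small hypothetical helper can derive -np from the host list so the two numbers stay in sync; it assumes every host carries an explicit :slots suffix.

```shell
#!/usr/bin/env bash
# Sketch: total the slots in a horovodrun -H list such as "by07:2,by08:2".
# total_slots is a hypothetical helper; every entry must end in :N.
total_slots() {
  local IFS=,   # split the list on commas only
  local entry total=0
  for entry in $1; do
    case "$entry" in
      *:*) total=$(( total + ${entry##*:} )) ;;
      *) echo "missing slot count in '$entry'" >&2; return 1 ;;
    esac
  done
  echo "$total"
}
```

Usage: H=by07:2,by08:2; horovodrun -np "$(total_slots "$H")" -H "$H" python learner.py --config examples/ppo/walker2d_learner_multi.yaml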
Then, on each worker, run:
python actor.py --config examples/ppo/walker2d_actor.yaml
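Instead of logging in to each worker, the actors could be started from the master over SSH with the aliases from ~/.ssh/config. This is a sketch under assumptions: FRAMEWORK_DIR (hypothetical, defaulting to $HOME/framework) is the common framework path, python resolves to the right interpreter in a non-interactive SSH shell, and start_actors is a made-up name. Setting DRY_RUN=1 only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch: start actor.py on several hosts from the master via ~/.ssh/config
# aliases. start_actors, FRAMEWORK_DIR and DRY_RUN are hypothetical names.
start_actors() {
  # The default path is expanded locally, so it must match on every host.
  local dir="${FRAMEWORK_DIR:-$HOME/framework}" host cmd
  # nohup plus redirected stdin/stdout lets ssh return while the actor runs on.
  cmd="cd '$dir' && nohup python actor.py --config examples/ppo/walker2d_actor.yaml > actor.log 2>&1 < /dev/null &"
  for host in "$@"; do
    if [ -n "${DRY_RUN:-}" ]; then
      echo "ssh $host \"$cmd\""
    else
      ssh "$host" "$cmd"
    fi
  done
}
```

Preview with DRY_RUN=1 start_actors by07 by08, then run start_actors by07 by08. Each actor's output goes to actor.log in the framework directory on its host.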
Free GPU memory
Currently the learner does not exit on its own, so horovodrun keeps holding GPU memory. You need to free the GPU memory manually on every worker.
First, list the processes that are using the GPU:
fuser -v /dev/nvidia0
Then kill them with kill. Some of the listed processes are obviously system processes; leave those alone.
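To avoid killing a system process by accident, you can target only the leftover training processes by their command line instead of killing everything fuser lists. A minimal sketch, assuming the leftovers can be recognized by the pattern learner.py (this also matches the horovodrun launcher, whose command line contains it); kill_leftovers is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: kill only processes whose command line matches a pattern, instead of
# everything fuser lists. kill_leftovers and its default pattern are assumptions.
kill_leftovers() {
  local pattern="${1:-learner.py}" pid
  # pgrep -f matches the full command line and never reports itself.
  for pid in $(pgrep -f "$pattern"); do
    # Never signal the shell that is running this function.
    if [ "$pid" -eq "$$" ]; then
      continue
    fi
    echo "killing $pid: $(ps -o args= -p "$pid")"
    kill "$pid" 2>/dev/null || true   # SIGTERM; the process may already be gone
  done
}
```

Afterwards, run fuser -v /dev/nvidia0 again to confirm the memory was released; a process that ignores SIGTERM can be forced with kill -9 PID.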