[framework] multi learner
2022-07-05 07:09:00 【hanjialeOK】
Run the container
Start the container on each machine:
docker run -it -v /data:/root/data/ --network host --name multi_learner hanjl/cuda:framework
Modify the host address
Inside the container on each machine, edit the hosts file so that it uses the host machine's IP address:
vim /etc/hosts
The file shows:
127.0.0.1
127.0.1.1
On the 4.7 server, for example, change the second line to the host's address:
***.**.4.7
Change the SSH port
Inside the container on each machine, change the SSH port to 2233:
sed -i 's/\(^Port\)/#\1/' /etc/ssh/sshd_config
echo Port 2233 >> /etc/ssh/sshd_config
service ssh restart
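The edit can be rehearsed on a scratch file before touching the real configuration. The sketch below assumes GNU sed (as in the commands above); the sample file contents are made up for illustration and are not taken from a real server.

```shell
# Try the port change on a scratch copy instead of the real sshd_config.
# The two sample lines below are illustrative only.
tmp_conf=$(mktemp)
printf 'Port 22\nPermitRootLogin yes\n' > "$tmp_conf"

# Same two steps as above, pointed at the scratch file:
# comment out every existing Port line, then append the new one.
sed -i 's/\(^Port\)/#\1/' "$tmp_conf"
echo Port 2233 >> "$tmp_conf"

# Only uncommented Port lines are active; exactly one should remain.
active_ports=$(grep '^Port' "$tmp_conf")
echo "$active_ports"
rm -f "$tmp_conf"
```

This prints a single line, Port 2233. On the real file, running sshd -t before the restart checks the configuration for syntax errors.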
On each machine, SSH to every other machine and make sure all machines can reach one another directly, without a password or a host-key confirmation prompt.
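One common way to get there is key-based login. The helper below is a sketch, not part of the framework: print_ssh_setup is a hypothetical name, and it only prints the commands so they can be reviewed and run by hand. It assumes OpenSSH's ssh-keygen and ssh-copy-id are installed in the container, and that the host names passed in are resolvable (for example through ~/.ssh/config or by using IP addresses).

```shell
# Print (do not run) the commands for key-based login to each host.
# Usage: print_ssh_setup PORT HOST...
print_ssh_setup() {
    port=$1
    shift
    # One key pair per machine; -N "" means no passphrase.
    # Skip this step if ~/.ssh/id_ed25519 already exists.
    echo "ssh-keygen -t ed25519 -N \"\" -f ~/.ssh/id_ed25519"
    for host in "$@"; do
        # Installs the public key; asks for the password one last time
        # and for host-key confirmation on first contact.
        echo "ssh-copy-id -p $port $host"
        # BatchMode makes ssh fail instead of prompting, so this only
        # succeeds once no password or confirmation is needed.
        echo "ssh -o BatchMode=yes -p $port $host true"
    done
}

print_ssh_setup 2233 by07 by08
```

The commands need to be repeated on every machine, since each one has to reach all the others.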
Configure worker IDs
On the master, edit the SSH configuration file:
vim ~/.ssh/config
Change it to:
Host by08
    HostName ***.***.4.8
    Port 2233
Host by07
    HostName ***.***.4.7
    Port 2233
Download the code
Download the framework code on each machine, and make sure the framework path is identical on all of them.
Run
Run this on the master only:
horovodrun -np 4 -H by07:2,by08:2 python learner.py --config examples/ppo/walker2d_learner_multi.yaml
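In the command above, -np is the total number of processes and -H lists each host with its slot count, so 4 is the sum of 2 and 2. When machines are added, the two values must stay in sync. A minimal sketch for deriving both from one list (horovod_args is a hypothetical helper name, not part of Horovod):

```shell
# Build the -np and -H arguments for horovodrun from "host:slots" pairs.
# Usage: horovod_args HOST:SLOTS...
horovod_args() {
    total=0
    spec=""
    for entry in "$@"; do
        slots=${entry##*:}            # text after the last colon
        total=$((total + slots))      # -np is the sum of all slots
        spec="${spec:+$spec,}$entry"  # comma-join without a leading comma
    done
    echo "-np $total -H $spec"
}

horovod_args by07:2 by08:2
```

For the two hosts in this post it prints -np 4 -H by07:2,by08:2, which matches the command above.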
Then run the following on each worker:
python actor.py --config examples/ppo/walker2d_actor.yaml
Free GPU memory
Currently the learner does not exit automatically, so horovodrun keeps holding GPU memory. The memory has to be released manually on every worker.
First, check which processes are holding GPU memory:
fuser -v /dev/nvidia0
Then kill those processes. Note that one of the listed processes is obviously a system process and does not need to be killed.
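The check-then-kill step can be sketched as a small script. Both function names are hypothetical, and the filter rests on an assumption that should be verified against the fuser listing first: that the learner processes appear with a command name starting with python, while anything else is left alone.

```shell
# Decide whether a command name looks like a learner process.
# Assumption: learners show up as "python", "python3", and so on.
is_learner_process() {
    case "$1" in
        python*) return 0 ;;
        *)       return 1 ;;
    esac
}

# Kill only matching processes that hold the given GPU device file.
# Usage: release_gpu /dev/nvidia0
release_gpu() {
    device=$1
    # Without -v, fuser prints bare PIDs on stdout; the rest goes to stderr.
    for pid in $(fuser "$device" 2>/dev/null); do
        name=$(ps -o comm= -p "$pid")
        if is_learner_process "$name"; then
            echo "killing $pid ($name)"
            kill "$pid"
        else
            echo "skipping $pid ($name)"
        fi
    done
}
```

Calling release_gpu /dev/nvidia0 on each worker then releases the memory held by the leftover learners. On machines with several GPUs, the call would be repeated for the other device files.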