[framework] multi learner
2022-07-05 07:09:00 【hanjialeOK】
Run the container
Run the container on each machine
docker run -it -v /data:/root/data/ --network host --name multi_learner hanjl/cuda:framework
Modify the host address
Enter the container on each machine and modify the container's IP address:
vim /etc/hosts
The file shows:
127.0.0.1
127.0.1.1
For example, on the 4.7 server, change the second line to that server's address:
***.**.4.7
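If you prefer not to open vim on every machine, the same edit can be scripted. This is a minimal sketch, assuming the line to change starts with `127.0.1.1` as shown above; `set_hosts_ip` is a hypothetical helper and `10.0.4.7` only stands in for the masked address. It writes the result back with `cat >` rather than `sed -i`, because Docker bind-mounts `/etc/hosts` into the container and `sed -i` (which renames a temporary copy over the original) may fail there with "Device or resource busy".

```shell
# Hypothetical helper (not part of the framework): put this machine's IP in place of
# the 127.0.1.1 address in a hosts file, keeping any hostnames on that line.
# Usage: set_hosts_ip NEW_IP [HOSTS_FILE]    (HOSTS_FILE defaults to /etc/hosts)
set_hosts_ip() {
    local ip="$1" file="${2:-/etc/hosts}" tmp status
    tmp=$(mktemp) || return 1
    # First expression: a bare "127.0.1.1" line; second: "127.0.1.1<whitespace>names".
    sed -e 's/^127\.0\.1\.1$/'"$ip"'/' \
        -e 's/^127\.0\.1\.1\([[:space:]]\)/'"$ip"'\1/' "$file" > "$tmp" &&
        cat "$tmp" > "$file"    # overwrite in place so the bind-mounted file is kept
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

For example, run `set_hosts_ip 10.0.4.7` on the 4.7 server and check the result with `cat /etc/hosts`. Docker may regenerate `/etc/hosts` when the container is restarted or recreated, so the edit might need repeating.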
Modify the SSH port
Enter the container on each machine and change the SSH port to 2233:
sed -i 's/\(^Port\)/#\1/' /etc/ssh/sshd_config
echo Port 2233 >> /etc/ssh/sshd_config
service ssh restart
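The `sed` + `echo` pair above works for a one-off run, but running it again comments out the existing `Port 2233` line and appends a fresh copy each time. A minimal sketch of an idempotent variant, assuming the usual `sshd_config` layout; `set_sshd_port` is a hypothetical helper, and the optional file argument lets you rehearse it on a copy first.

```shell
# Hypothetical helper: make PORT the only active Port directive in an sshd_config.
# Usage: set_sshd_port PORT [CONFIG]    (CONFIG defaults to /etc/ssh/sshd_config)
set_sshd_port() {
    local port="$1" file="${2:-/etc/ssh/sshd_config}"
    # Exactly one active "Port" line that already has the right value: nothing to do.
    if [ "$(grep -c '^Port[[:space:]]' "$file")" -eq 1 ] &&
        grep -q "^Port[[:space:]][[:space:]]*$port\$" "$file"; then
        return 0
    fi
    # Otherwise comment out every active Port line (as the sed above does) ...
    sed -i 's/^Port[[:space:]]/#&/' "$file"
    # ... and append the desired one.
    echo "Port $port" >> "$file"
}
```

Typical use is `set_sshd_port 2233` followed by `service ssh restart`; running `/usr/sbin/sshd -t` before the restart checks the edited configuration for errors.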
On each machine, SSH into every other machine to make sure all machines can connect to each other directly, without a password or a host-key confirmation prompt.
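A minimal sketch of that setup, assuming the containers run as root (the `docker run` above passes no `--user`) and that password login for root is allowed at least once so `ssh-copy-id` can install the key. OpenSSH's default `PermitRootLogin prohibit-password` forbids that; in that case, append the `.pub` file's contents to `/root/.ssh/authorized_keys` on each machine by hand instead. The three function names are hypothetical, and `StrictHostKeyChecking=accept-new` needs OpenSSH 7.6 or newer.

```shell
# Hypothetical helpers for the passwordless-SSH step (not part of the framework).

# Create a passphrase-less ed25519 key pair unless KEY already exists.
ensure_ssh_key() {
    local key="${1:-$HOME/.ssh/id_ed25519}"
    if [ ! -f "$key" ]; then
        mkdir -p -m 700 "$(dirname "$key")"
        ssh-keygen -q -t ed25519 -N "" -f "$key"
    fi
}

# Install the public key on TARGET (user@host); asks for TARGET's password once.
push_ssh_key() {
    local target="$1" port="${2:-2233}" key="${3:-$HOME/.ssh/id_ed25519}"
    ssh-copy-id -i "$key.pub" -p "$port" "$target"
}

# Succeed only if TARGET accepts the key with no password and no host-key prompt:
# BatchMode=yes fails instead of prompting, accept-new records an unseen host key.
check_ssh() {
    local target="$1" port="${2:-2233}"
    ssh -p "$port" -o BatchMode=yes -o StrictHostKeyChecking=accept-new "$target" true
}

# Example, run on every machine (addresses are placeholders for the masked ones):
#   ensure_ssh_key
#   for h in 10.0.4.7 10.0.4.8; do push_ssh_key "root@$h"; check_ssh "root@$h" && echo "$h ok"; done
```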
Worker ID configuration
On the master, edit the SSH client configuration file:
vim ~/.ssh/config
Change it to:
Host by08
HostName ***.***.4.8
Port 2233
Host by07
HostName ***.***.4.7
Port 2233
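With more machines, writing these entries by hand gets repetitive. A minimal sketch that prints the same `Host`/`HostName`/`Port` entries from `alias address` pairs; `gen_ssh_config` is a hypothetical helper and the `10.0.4.x` addresses stand in for the masked ones. Indenting the `HostName` and `Port` lines is only for readability; ssh reads the file the same way either way.

```shell
# Hypothetical helper: print ~/.ssh/config entries from "alias address" lines on stdin.
# Usage: gen_ssh_config [PORT] < pairs.txt    (PORT defaults to 2233)
gen_ssh_config() {
    local port="${1:-2233}" name addr
    while read -r name addr; do
        [ -z "$name" ] && continue    # skip blank lines
        printf 'Host %s\n    HostName %s\n    Port %s\n' "$name" "$addr" "$port"
    done
}
```

For example, `printf 'by07 10.0.4.7\nby08 10.0.4.8\n' | gen_ssh_config >> ~/.ssh/config`. Afterwards, `ssh -G by07` prints the options ssh would actually use for that alias (including `hostname` and `port`) without connecting.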
Download the code
Download the framework code on each machine, making sure the framework is at the same path on every machine.
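Besides the path, it helps to confirm that every machine has the same version of the code. A minimal sketch, assuming GNU `find`, `sort`, `xargs` and `sha256sum` are available; `code_fingerprint` is a hypothetical helper that hashes every file's relative path and contents (skipping `.git`) into a single value that can be compared across machines.

```shell
# Hypothetical helper: one SHA-256 over every file under DIR (relative path + contents),
# ignoring .git, so equal values on two machines mean the same code tree.
code_fingerprint() {
    (
        cd "$1" || exit 1
        find . -type f -not -path './.git/*' -print0 |
            LC_ALL=C sort -z |          # fixed file order, independent of locale
            xargs -0 -r sha256sum |     # "hash  ./relative/path" for each file
            sha256sum |                 # hash of the whole listing
            cut -d' ' -f1
    )
}
```

From the master, something like `for h in by07 by08; do echo "$h $(ssh "$h" "$(declare -f code_fingerprint); code_fingerprint /root/framework")"; done` compares the machines, where `/root/framework` is a placeholder path and the remote login shell is assumed to be bash.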
Run
Run this only on the master:
horovodrun -np 4 -H by07:2,by08:2 python learner.py --config examples/ppo/walker2d_learner_multi.yaml
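In this command, `-H by07:2,by08:2` lists each host alias with its number of slots (processes to start on it, normally one per GPU), and `-np 4` is the total number of processes, 2 + 2. A small sketch that derives both values from `host:slots` pairs so they cannot drift apart when machines are added; `horovod_args` is a hypothetical helper, not part of Horovod.

```shell
# Hypothetical helper: turn "host:slots" pairs into horovodrun's -np and -H arguments.
# Usage: horovod_args HOST:SLOTS...
horovod_args() {
    local total=0 hosts="" spec
    for spec in "$@"; do
        total=$(( total + ${spec##*:} ))    # slots = text after the last ':'
        hosts="${hosts:+$hosts,}$spec"      # comma-separated host list
    done
    printf '%s\n' "-np $total -H $hosts"
}
```

Usage: `horovodrun $(horovod_args by07:2 by08:2) python learner.py --config examples/ppo/walker2d_learner_multi.yaml`, leaving the `$(...)` unquoted so it splits into separate arguments.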
Then, on each worker, run:
python actor.py --config examples/ppo/walker2d_actor.yaml
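Instead of logging into each worker, the actors can also be started from the master over the passwordless SSH set up earlier. A minimal sketch, assuming the framework sits at the same directory on every machine and that plain `python` in a non-interactive SSH session resolves to the same interpreter you use interactively (a conda or virtualenv activation may otherwise be needed). `start_actors` and `DRY_RUN` are hypothetical; with `DRY_RUN` set, the commands are printed instead of executed.

```shell
# Hypothetical helper: start one actor per worker from the master over SSH.
# Usage: start_actors FRAMEWORK_DIR CONFIG HOST...
start_actors() {
    local dir="$1" config="$2" host cmd
    shift 2
    for host in "$@"; do
        # nohup keeps the actor alive after SSH disconnects; redirecting stdin,
        # stdout and stderr lets ssh return instead of waiting on the process.
        cmd="cd $dir && nohup python actor.py --config $config > actor.log 2>&1 < /dev/null &"
        if [ -n "${DRY_RUN:-}" ]; then
            echo "ssh $host '$cmd'"    # print only
        else
            ssh "$host" "$cmd"
        fi
    done
}
```

For example, `start_actors /root/framework examples/ppo/walker2d_actor.yaml by07 by08` from a second terminal on the master once the learner is up, where `/root/framework` is a placeholder path.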
Free GPU memory
At present the learner does not exit on its own, so horovodrun keeps holding GPU memory. The memory has to be released manually on every worker.
First, check which processes are using the GPU:
fuser -v /dev/nvidia0
Then kill those processes with kill. Some of the listed processes are obviously system processes; leave those alone.
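Rather than picking PIDs out of the `fuser -v` listing by eye, they can be filtered by command line. A minimal sketch, assuming the processes to kill are the ones whose command line contains `learner.py` (the script horovodrun launches above); `gpu_pids_matching` is a hypothetical helper that needs `fuser` (from psmisc) and `/proc`. With two processes per machine, the second GPU is probably `/dev/nvidia1`, which can be passed as the second argument.

```shell
# Hypothetical helper: print the PIDs that have DEVICE open *and* whose command line
# contains PATTERN, so unrelated system processes are never selected.
# Usage: gpu_pids_matching [PATTERN] [DEVICE]    (defaults: learner.py, /dev/nvidia0)
gpu_pids_matching() {
    local pattern="${1:-learner.py}" device="${2:-/dev/nvidia0}" pid
    # fuser writes only PIDs to stdout (names and access flags go to stderr)
    for pid in $(fuser "$device" 2>/dev/null); do
        [ -r "/proc/$pid/cmdline" ] || continue
        # /proc/PID/cmdline is NUL-separated; turn NULs into spaces before matching
        if tr '\0' ' ' < "/proc/$pid/cmdline" | grep -qF -- "$pattern"; then
            echo "$pid"
        fi
    done
}
```

For example, `gpu_pids_matching learner.py | xargs -r kill` sends SIGTERM to the matching learner processes; rerun `fuser -v /dev/nvidia0` afterwards to confirm the memory was released.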