Using a cloud GPU + PyCharm to train models: running in the background, saving training results, and shutting down the server automatically
2022-08-02 14:28:00 【night cub】
Introduction
This article does not explain from scratch how to train a model with a cloud GPU and PyCharm. If you want to learn that, you can refer to my other two blog posts.
Cloud GPU (Hengyuan Cloud): the specific operation process of training
This article mainly covers what to do once the environment and configuration are in place: TensorBoard, saving the training results after training, shutting down the server automatically, and training in the background (so that network fluctuations do not interrupt training).
Saving results + automatic shutdown
Prerequisite: log in to your Hengyuan Cloud account in the terminal with oss login.
Step 1: create a file named upload.sh in the server's /root directory.
It performs three operations: packing the training results, uploading them to your personal data, and shutting down the server. Note that the PyCharm terminal starts in /root by default, while the JupyterLab terminal starts in /. If you do the above in the JupyterLab terminal, cd to /root first.
I'll do it in JupyterLab: first cd to /root, then run vim upload.sh.
After the command runs, the following interface appears.
Step 2: press i to enter insert mode and write the following content into upload.sh. (If you want to learn more about vim, you can search Baidu.)
The path on the zip line needs to be adjusted to your actual situation.
#!/bin/bash
set -e
cd /hy-tmp
# Name of the archive
file="result-$(date "+%Y%m%d-%H%M%S").zip"
# Pack the result directory into a zip archive; adjust the directory to your actual situation
zip -q -r "${file}" yolov7-main/runs/train/exp9
# Upload via oss to the backup folder in personal data
oss cp "${file}" oss://backup/
rm -f "${file}"
# Shut down after the transfer succeeds
shutdown
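The training results are saved in a numbered exp directory. Mine went to exp4, so I set the path to exp4; replace the exp directory on the zip line with the one your run actually writes to.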
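Because the exp number changes from run to run, it can help to look the directory up instead of editing the script by hand. The following is a minimal Python sketch, not part of the original setup: `latest_exp_dir` is a hypothetical helper, and it assumes the run directories are named `exp`, `exp2`, `exp3`, and so on, as the exp4 and exp9 names in this article suggest.

```python
import re
from pathlib import Path


def latest_exp_dir(train_root):
    """Return the highest-numbered exp* directory under train_root, or None."""
    best, best_num = None, -1
    for path in Path(train_root).iterdir():
        match = re.fullmatch(r"exp(\d*)", path.name)
        if path.is_dir() and match:
            # A bare "exp" (no numeric suffix) is treated as run number 0.
            num = int(match.group(1) or 0)
            if num > best_num:
                best, best_num = path, num
    return best
```

For example, `latest_exp_dir("/hy-tmp/yolov7-main/runs/train")` would return the directory to put on the zip line, assuming that is where your runs are written. Note that it picks the highest number, which is the most recent run only if older directories have not been renamed.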
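Step 3: when you have finished editing, press Esc to leave insert mode, then type :wq! to save the file and quit. The file can now be found in /root.
Step 4: add execute permission to the file.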
chmod u+x upload.sh
Step 5: add the following code to train.py; pay attention to where it goes.
import os
os.system('/root/upload.sh')  # place this line in train.py after training finishes
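If you would like to see whether the upload script succeeded, one option is to call it through `subprocess` and check the exit code. This is only a sketch under stated assumptions: `train` and `run_then_upload` are hypothetical names, and `train` stands in for whatever training entry point your train.py really has.

```python
import subprocess


def train():
    """Placeholder for the real training loop in train.py (hypothetical)."""
    print("training...")


def run_then_upload(train_fn, upload_cmd=("/root/upload.sh",)):
    """Run train_fn, then the upload command; return the command's exit code."""
    # If train_fn raises, the upload/shutdown command below is never reached.
    train_fn()
    result = subprocess.run(list(upload_cmd), check=False)
    if result.returncode != 0:
        print(f"upload command failed with exit code {result.returncode}")
    return result.returncode
```

At the end of train.py you would call `run_then_upload(train)`. In this sketch the script runs only when training finishes without an exception, so a crashed run leaves the server on for debugging; whether that is what you want depends on how much you care about cost versus inspecting the failure.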
Background training (using tmux)
Background training is more stable: the local computer does not need to stay on, and once everything is set up it can be shut down.
Step 1: create a tmux session (I cd'd to the root directory first; other directories should work too).
tmux new -s yolo creates a session named yolo, tmux detach leaves the session, and tmux ls lists the existing sessions.
Step 2: run tmux a -t yolo to re-enter the session, cd to the directory containing train.py, and run that file.
[Screenshot: training in progress]
Step 3: to leave the session while the training from step 2 is running, the exit command has to be typed in a new terminal. JupyterLab has no way to open a new terminal, so detach from the tmux session through the PyCharm terminal instead.
Step 4: when training ends, the training results are packaged and uploaded to your personal data (so that you can still view them if the machine is occupied later; files in personal data can be downloaded and viewed at any time), and the server shuts down automatically (to save costs).
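The steps above can also be collapsed into a single command that starts training in an already detached session, so no second terminal is needed to leave it. The sketch below only builds and prints that command; `tmux_train_command` is a hypothetical helper, and the working directory `/hy-tmp/yolov7-main` is an assumption based on the paths in upload.sh.

```python
import shlex


def tmux_train_command(session="yolo", workdir="/hy-tmp/yolov7-main", script="train.py"):
    """Build a tmux command that starts training in a detached session."""
    inner = f"python {shlex.quote(script)}"
    # -d: do not attach, -s: session name, -c: start directory for the session
    return ["tmux", "new-session", "-d", "-s", session, "-c", workdir, inner]


# Print the command so it can be pasted into the server terminal.
print(shlex.join(tmux_train_command()))
```

You can still attach later with tmux a -t yolo to watch progress. Keep in mind that a session started with a command closes when that command exits, so the terminal output is gone once training ends. If you prefer the interactive session from the steps above, tmux's default detach key binding (Ctrl-b, then d) is another way to leave it from the same terminal.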
tensorboardX
Step 1: stop the official TensorBoard service (Hengyuan Cloud's TensorBoard requires downloading get_started.ipynb, which kept failing for me, so I switched to tensorboardX).
supervisord ctl stop tensorboard
Step 2: use tensorboardX. (Note: do not close the terminal, otherwise tensorboardX will disconnect.)
tensorboard --logdir /hy-tmp/runs/train/exp4 --host 0.0.0.0
Step 3: open the official TensorBoard page, and you can watch the training process in real time.
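If the page opens but shows no data, a quick check is whether the directory passed to --logdir actually contains event files. This is a small sketch, assuming the usual TensorBoard file naming that starts with `events.out.tfevents.`; `find_event_files` is a hypothetical helper.

```python
from pathlib import Path


def find_event_files(logdir):
    """Return TensorBoard event files found anywhere under logdir, sorted."""
    root = Path(logdir)
    if not root.is_dir():
        return []
    return sorted(root.rglob("events.out.tfevents.*"))
```

For example, `find_event_files("/hy-tmp/runs/train/exp4")` returning an empty list would suggest that the --logdir path does not match where training writes its logs.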
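Two server setup issues in PyCharm
1. After restarting PyCharm, the server option may not appear at the bottom; that is, there may be no default server as shown in the figure below.
This can be solved with the following steps.
2. When a server is no longer in use, delete it from PyCharm, for example the server with port number 59341.
Delete the server at the position marked 1.
Then delete both the interpreter and the server in the settings.
At this point, the server has been completely removed from PyCharm.
If there are any mistakes, please do not hesitate to point them out. If you have questions, leave a comment or send me a private message.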