[ModelArts Series] Training a YOLOv3 Model in a Huawei ModelArts Notebook (Development Environment)
2022-07-30 02:21:00 【花花少年】
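1. References
2. Overview
This article runs a ModelZoo model in a ModelArts notebook, using YOLOv3 as the example and COCO2014 as the training set.
Runtime environment: ModelArts notebook
Model: ModelZoo, YOLOv3
Dataset: COCO2014
Image: tensorflow1.15-mindspore1.5.1-cann5.0.2-euler2.8-aarch64
Flavor: Ascend: 1*Ascend-910(32GB) | ARM: 24 cores, 96GB
If you plan to delete the notebook, back up its files to OBS first to avoid unnecessary trouble.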
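3. Key Steps
3.1 Prepare the dataset
See: Huawei OBS, uploading files from OBS to a notebook and from a notebook to OBS
Download the COCO2014 dataset from:
Link: https://pan.baidu.com/s/16sxIpFs-hd-6FzN2rSHgqA
Extraction code: 1234
Upload the dataset to OBS
Use the obs-browser client to upload the COCO2014 dataset to OBS.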

Copy the dataset from OBS to the notebook
Run the following in the notebook:
import moxing as mox
# COCO_COCO_2014_Train_Val_annotations.zip
mox.file.copy_parallel('obs://liulingjun-demo/datasets/COCO2014/COCO_COCO_2014_Train_Val_annotations.zip','/home/ma-user/work/COCO_COCO_2014_Train_Val_annotations.zip')
print('Copy procedure is completed !')
# COCO_COCO_2014_Val_Images.zip
mox.file.copy_parallel('obs://liulingjun-demo/datasets/COCO2014/COCO_COCO_2014_Val_Images.zip','/home/ma-user/work/COCO_COCO_2014_Val_Images.zip')
print('Copy procedure is completed !')
# COCO_COCO_2014_Train_Images.zip
mox.file.copy_parallel('obs://liulingjun-demo/datasets/COCO2014/COCO_COCO_2014_Train_Images.zip','/home/ma-user/work/COCO_COCO_2014_Train_Images.zip')
print('Copy procedure is completed !')
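The three copy calls above differ only in the archive name, so they can be written as one loop. The sketch below is a minimal illustration, assuming the bucket path obs://liulingjun-demo/datasets/COCO2014 used above; the helper name `copy_archives` and its `copy_fn` parameter are hypothetical, added so the copy function can be swapped out when moxing is not installed (moxing is only available inside ModelArts). After each copy it checks the local file with `zipfile.is_zipfile` to catch a truncated or failed transfer before unzipping.

```python
import os
import zipfile

OBS_DIR = "obs://liulingjun-demo/datasets/COCO2014"
LOCAL_DIR = "/home/ma-user/work"
ARCHIVES = [
    "COCO_COCO_2014_Train_Val_annotations.zip",
    "COCO_COCO_2014_Val_Images.zip",
    "COCO_COCO_2014_Train_Images.zip",
]


def copy_archives(obs_dir, local_dir, names, copy_fn=None):
    """Copy each archive from obs_dir to local_dir and return the local paths.

    copy_fn(src, dst) defaults to moxing's copy_parallel, as in the cells
    above; pass another function to run this outside ModelArts.
    """
    if copy_fn is None:
        import moxing as mox  # imported lazily: only present on ModelArts
        copy_fn = mox.file.copy_parallel
    local_paths = []
    for name in names:
        src = obs_dir.rstrip("/") + "/" + name
        dst = os.path.join(local_dir, name)
        copy_fn(src, dst)
        # A failed or truncated copy leaves a missing or invalid zip file.
        if not zipfile.is_zipfile(dst):
            raise RuntimeError("copied file is not a valid zip: " + dst)
        print("Copied", src, "->", dst)
        local_paths.append(dst)
    return local_paths
```

In the notebook, a single call `copy_archives(OBS_DIR, LOCAL_DIR, ARCHIVES)` replaces the three cells above.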
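Unzip the dataset
Run the following in the notebook's terminal:
cd work
unzip COCO_COCO_2014_Train_Val_annotations.zip -d ./COCO2014
Unzip COCO_COCO_2014_Train_Images.zip and COCO_COCO_2014_Val_Images.zip into ./COCO2014 the same way. The result looks like this:
[ma-user COCO2014]$ ll
total 6780
drwx------ 2 ma-user ma-group 4096 Jun 18 08:43 annotations
drwxrwxr-x 2 ma-user ma-group 4620288 Aug 16 2014 train2014
drwxrwxr-x 2 ma-user ma-group 2310144 Aug 16 2014 val2014
(Optional) Copy the unzipped dataset back to OBS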
import moxing as mox
mox.file.copy_parallel('/home/ma-user/work/COCO2014', 'obs://liulingjun-demo/yolov3/dataset')
print('Copy procedure is completed !')
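The unzip step can also be run from a notebook cell with Python's standard `zipfile` module instead of the terminal. A minimal sketch, assuming the three archives were copied to /home/ma-user/work and that each one already contains its top-level folder (annotations, train2014, val2014), as the `ll` listing in this step suggests; `extract_archives` is a hypothetical helper name.

```python
import os
import zipfile


def extract_archives(archive_paths, target_dir):
    """Extract every archive into target_dir and return its sorted top-level entries."""
    os.makedirs(target_dir, exist_ok=True)
    for path in archive_paths:
        with zipfile.ZipFile(path) as zf:
            # extractall recreates the folder structure stored in the archive.
            zf.extractall(target_dir)
    return sorted(os.listdir(target_dir))
```

For example, `extract_archives(["/home/ma-user/work/COCO_COCO_2014_Train_Val_annotations.zip"], "/home/ma-user/work/COCO2014")`; the two image archives can be passed in the same list.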
3.2 Prepare the pretrained model

Copy the model files under YOLOv3_TensorFlow_1.6_model/single/ckpt to YoloV3_for_TensorFlow_1.6_code/data/darknet_weights and rename them to darknet53.ckpt.
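A TensorFlow 1.x checkpoint is not a single file but a group of files sharing one prefix (`.index`, `.data-00000-of-00001`, and usually `.meta`). `restore_path` in args_single.py points at that prefix, which is why the training log reports `Restoring parameters from .../darknet53.ckpt`, so the rename has to be applied to every file in the group. The sketch below assumes the ckpt directory holds exactly one checkpoint; because the original file names are not given here, it detects the old prefix from the `.index` file. `copy_checkpoint` is a hypothetical helper name.

```python
import glob
import os
import shutil


def copy_checkpoint(src_dir, dst_dir, new_prefix="darknet53.ckpt"):
    """Copy all files of one TF checkpoint from src_dir to dst_dir under new_prefix.

    Returns the sorted list of new file names.
    """
    index_files = glob.glob(os.path.join(src_dir, "*.index"))
    if len(index_files) != 1:
        raise ValueError("expected exactly one .index file in " + src_dir)
    old_prefix = os.path.basename(index_files[0])[: -len(".index")]
    os.makedirs(dst_dir, exist_ok=True)
    copied = []
    # Every file of the checkpoint starts with "<old_prefix>."
    for path in sorted(glob.glob(os.path.join(src_dir, old_prefix + ".*"))):
        suffix = os.path.basename(path)[len(old_prefix):]  # e.g. ".index"
        dst = os.path.join(dst_dir, new_prefix + suffix)
        shutil.copyfile(path, dst)
        copied.append(os.path.basename(dst))
    return copied
```

Usage: `copy_checkpoint("YOLOv3_TensorFlow_1.6_model/single/ckpt", "YoloV3_for_TensorFlow_1.6_code/data/darknet_weights")`. The separate `checkpoint` state file, if present, is not copied because `restore_path` names the prefix directly.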
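3.3 Prepare the source code
Download the source code to your local machine (laptop).

Unzip and modify the source code
Prepare the txt annotation files
Based on the actual path of the COCO2014 dataset, use coco_trainval_anns.py and coco_minival_anns.py to generate the training and validation annotation files coco2014_trainval.txt and coco2014_minival.txt, respectively, and place them in the YoloV3_for_TensorFlow_1.6_code/data directory.
# 1. Modify the dataset paths in the scripts
# 2. Run coco_trainval_anns.py
python coco_trainval_anns.py
# 3. Run coco_minival_anns.py
python coco_minival_anns.py
Modify the paths in the txt annotation files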
Replace /opt/npu/dataset/coco/coco2014/ with /home/ma-user/work/COCO2014/ in both files.
Modify train.py
According to the train.py source, the default training mode is single, which loads its configuration from args_single.py, so only args_single.py needs to be edited.
train.py:
parser.add_argument("--mode", type=str, default='single', help="setting train mode of training.")
if args_input.mode == 'single':
    import args_single as args
Most parameters in args_single.py can be left at their defaults:
### Some paths
train_file = os.path.join(work_path, './data/coco2014_trainval.txt')  # The path of the training txt file.
val_file = os.path.join(work_path, './data/coco2014_minival.txt')  # The path of the validation txt file.
restore_path = os.path.join(work_path, './data/darknet_weights/darknet53.ckpt')  # The path of the weights to restore.
anchor_path = os.path.join(work_path, './data/yolo_anchors.txt')  # The path of the anchor txt file.
class_name_path = os.path.join(work_path, './data/coco.names')  # The path of the class names.
...
...
...
### other training strategies
multi_scale_train = False  # Whether to apply multi-scale training strategy. Image size varies from [320, 320] to [640, 640] by default.
use_label_smooth = False  # Whether to use class label smoothing strategy.
use_focal_loss = False  # Whether to apply focal loss on the conf loss.
use_mix_up = False  # Whether to use mix up data augmentation strategy.
use_warm_up = True  # Whether to use warm up strategy to prevent from gradient exploding.
warm_up_epoch = min(total_epoches*0.1, 3)  # Warm up training epochs. Set to a larger value if gradient explodes.
Compress the source code and upload it to OBS
Compress the modified source code into a zip file and upload it to OBS.
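Any archiver works for this step; with Python's standard library it is a single `shutil.make_archive` call. A minimal sketch (the function name `zip_source_tree` is hypothetical): it keeps the top-level YoloV3_for_TensorFlow_1.6_code folder inside the archive, so that running `unzip YoloV3_for_TensorFlow_1.6_code.zip` in /home/ma-user/work recreates /home/ma-user/work/YoloV3_for_TensorFlow_1.6_code, the directory used for training in 3.4. The upload to OBS itself is not covered.

```python
import os
import shutil


def zip_source_tree(src_dir, out_dir):
    """Zip src_dir into out_dir/<folder name>.zip and return the archive path."""
    src_dir = os.path.abspath(src_dir)
    base_name = os.path.join(os.path.abspath(out_dir), os.path.basename(src_dir))
    # root_dir is the parent and base_dir the folder itself, so the folder
    # name is stored as the top-level entry inside the zip.
    return shutil.make_archive(
        base_name,
        "zip",
        root_dir=os.path.dirname(src_dir),
        base_dir=os.path.basename(src_dir),
    )
```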
Copy and unzip the source code
Copy the source code from OBS to the notebook
import moxing as mox
# YoloV3_for_TensorFlow_1.6_code.zip
mox.file.copy_parallel('obs://liulingjun-demo/cache/YoloV3_for_TensorFlow_1.6_code.zip','/home/ma-user/work/YoloV3_for_TensorFlow_1.6_code.zip')
print('Copy procedure is completed !')
Unzip the source code
cd /home/ma-user/work
unzip YoloV3_for_TensorFlow_1.6_code.zip
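If the txt annotation files still carry the /opt/npu/dataset/coco/coco2014/ prefix when they reach the notebook, the path replacement from 3.3 can also be done in place after unzipping. A minimal sketch, assuming the image paths in coco2014_trainval.txt and coco2014_minival.txt are absolute paths starting with that prefix; `rewrite_annotation_paths` is a hypothetical helper that does a plain string replacement on each line.

```python
OLD_ROOT = "/opt/npu/dataset/coco/coco2014/"
NEW_ROOT = "/home/ma-user/work/COCO2014/"


def rewrite_annotation_paths(txt_path, old_root=OLD_ROOT, new_root=NEW_ROOT):
    """Replace old_root with new_root in every line of txt_path.

    Rewrites the file in place and returns the number of lines changed.
    """
    with open(txt_path, "r", encoding="utf-8") as f:
        lines = f.readlines()
    changed = 0
    new_lines = []
    for line in lines:
        new_line = line.replace(old_root, new_root)
        if new_line != line:
            changed += 1
        new_lines.append(new_line)
    with open(txt_path, "w", encoding="utf-8") as f:
        f.writelines(new_lines)
    return changed
```

Call it on /home/ma-user/work/YoloV3_for_TensorFlow_1.6_code/data/coco2014_trainval.txt and on the matching coco2014_minival.txt; a return value of 0 means the file already uses the new root.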
3.4 Train the model
Create a notebook in /home/ma-user/work/YoloV3_for_TensorFlow_1.6_code and run the following command to start training:
!python train.py
The training output is written to /YoloV3_for_TensorFlow_1.6_code/training/.
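To locate the most recent checkpoint in the training output directory (for evaluation or to resume), a small helper can read it. This is a sketch under assumptions: it assumes the directory contains checkpoints written by `tf.train.Saver` (the log shows train.py uses it), which records the latest prefix in a text file named `checkpoint`; if that file is missing it falls back to the newest `*.index` file. Inside the TensorFlow 1.15 environment, `tf.train.latest_checkpoint(dir)` reads the same `checkpoint` file; this version only avoids importing TensorFlow. `latest_checkpoint_prefix` is a hypothetical name.

```python
import glob
import os
import re


def latest_checkpoint_prefix(train_dir):
    """Return the prefix of the newest checkpoint in train_dir, or None."""
    state_file = os.path.join(train_dir, "checkpoint")
    if os.path.isfile(state_file):
        with open(state_file, encoding="utf-8") as f:
            for line in f:
                # The first line looks like: model_checkpoint_path: "name"
                m = re.match(r'\s*model_checkpoint_path:\s*"(.*)"', line)
                if m:
                    path = m.group(1)
                    return path if os.path.isabs(path) else os.path.join(train_dir, path)
    # Fallback: the most recently written *.index file.
    index_files = glob.glob(os.path.join(train_dir, "*.index"))
    if not index_files:
        return None
    newest = max(index_files, key=os.path.getmtime)
    return newest[: -len(".index")]
```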
3.5 Successful run
Thu, 28 Jul 2022 09:52:13 INFO shuffle seed_0 args.
Thu, 28 Jul 2022 09:52:13 WARNING From /home/ma-user/anaconda3/envs/TensorFlow-1.15.0/lib/python3.7/site-packages/tensorflow_core/python/data/util/random_seed.py:58: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
Thu, 28 Jul 2022 09:52:13 WARNING Entity <function <lambda> at 0xffff6711b560> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: module 'gast' has no attribute 'Str'
Thu, 28 Jul 2022 09:52:13 WARNING From /home/ma-user/work/yolov3-tensorflow/code/train.py:139: py_func (from tensorflow.python.ops.script_ops) is deprecated and will be removed in a future version.
Instructions for updating:
tf.py_func is deprecated in TF V2. Instead, there are two
options available in V2.
- tf.py_function takes a python function which manipulates tf eager
tensors instead of numpy arrays. It's easy to convert a tf eager tensor to an ndarray (just call tensor.numpy()) but having access to eager tensors means `tf.py_function`s can use accelerators such as GPUs as well as being differentiable using a gradient tape.
- tf.numpy_function maintains the semantics of the deprecated tf.py_func (it is not differentiable, and manipulates numpy arrays). It drops the stateful argument making all functions stateful.
Thu, 28 Jul 2022 09:52:13 WARNING Entity <function valid_shape at 0xffff6711b950> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: module 'gast' has no attribute 'Index'
Thu, 28 Jul 2022 09:52:13 WARNING From /home/ma-user/work/yolov3-tensorflow/code/train.py:173: The name tf.data.Iterator is deprecated. Please use tf.compat.v1.data.Iterator instead.
Thu, 28 Jul 2022 09:52:13 WARNING From /home/ma-user/work/yolov3-tensorflow/code/train.py:173: DatasetV1.output_types (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.data.get_output_types(dataset)`.
Thu, 28 Jul 2022 09:52:13 WARNING From /home/ma-user/work/yolov3-tensorflow/code/train.py:173: DatasetV1.output_shapes (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.data.get_output_shapes(dataset)`.
Thu, 28 Jul 2022 09:52:13 WARNING From /home/ma-user/anaconda3/envs/TensorFlow-1.15.0/lib/python3.7/site-packages/tensorflow_core/python/data/ops/iterator_ops.py:347: Iterator.output_types (from tensorflow.python.data.ops.iterator_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.data.get_output_types(iterator)`.
Thu, 28 Jul 2022 09:52:13 WARNING From /home/ma-user/anaconda3/envs/TensorFlow-1.15.0/lib/python3.7/site-packages/tensorflow_core/python/data/ops/iterator_ops.py:348: Iterator.output_shapes (from tensorflow.python.data.ops.iterator_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.data.get_output_shapes(iterator)`.
Thu, 28 Jul 2022 09:52:13 WARNING From /home/ma-user/anaconda3/envs/TensorFlow-1.15.0/lib/python3.7/site-packages/tensorflow_core/python/data/ops/iterator_ops.py:350: Iterator.output_classes (from tensorflow.python.data.ops.iterator_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.data.get_output_classes(iterator)`.
Thu, 28 Jul 2022 09:52:13 WARNING From /home/ma-user/work/yolov3-tensorflow/code/train.py:187: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.
Thu, 28 Jul 2022 09:52:13 WARNING From /home/ma-user/anaconda3/envs/TensorFlow-1.15.0/lib/python3.7/site-packages/tensorflow_core/contrib/layers/python/layers/layers.py:1057: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.
Instructions for updating:
Please use `layer.__call__` method instead.
Thu, 28 Jul 2022 09:52:18 WARNING From /home/ma-user/work/yolov3-tensorflow/code/utils/layer_utils.py:114: The name tf.image.resize_nearest_neighbor is deprecated. Please use tf.compat.v1.image.resize_nearest_neighbor instead.
Thu, 28 Jul 2022 09:52:19 WARNING From /home/ma-user/work/yolov3-tensorflow/code/model.py:336: The name tf.log is deprecated. Please use tf.math.log instead.
Thu, 28 Jul 2022 09:52:20 WARNING From /home/ma-user/work/yolov3-tensorflow/code/train.py:190: The name tf.losses.get_regularization_loss is deprecated. Please use tf.compat.v1.losses.get_regularization_loss instead.
Thu, 28 Jul 2022 09:52:20 WARNING From /home/ma-user/work/yolov3-tensorflow/code/train.py:193: The name tf.train.Saver is deprecated. Please use tf.compat.v1.train.Saver instead.
Thu, 28 Jul 2022 09:52:21 WARNING From /home/ma-user/work/yolov3-tensorflow/code/train.py:197: The name tf.summary.scalar is deprecated. Please use tf.compat.v1.summary.scalar instead.
Thu, 28 Jul 2022 09:52:21 WARNING From /home/ma-user/work/yolov3-tensorflow/code/train.py:230: The name tf.train.get_or_create_global_step is deprecated. Please use tf.compat.v1.train.get_or_create_global_step instead.
Thu, 28 Jul 2022 09:52:21 INFO total_steps: 200000
Thu, 28 Jul 2022 09:52:21 INFO warmup_steps: 3000
Thu, 28 Jul 2022 09:52:21 WARNING From /home/ma-user/work/yolov3-tensorflow/code/utils/misc_utils.py:184: The name tf.train.MomentumOptimizer is deprecated. Please use tf.compat.v1.train.MomentumOptimizer instead.
Thu, 28 Jul 2022 09:52:21 WARNING From /home/ma-user/work/yolov3-tensorflow/code/train.py:247: The name tf.get_collection is deprecated. Please use tf.compat.v1.get_collection instead.
Thu, 28 Jul 2022 09:52:21 WARNING From /home/ma-user/work/yolov3-tensorflow/code/train.py:247: The name tf.GraphKeys is deprecated. Please use tf.compat.v1.GraphKeys instead.
Thu, 28 Jul 2022 09:52:21 DEBUG compute_gradients...
Thu, 28 Jul 2022 09:52:31 WARNING From /home/ma-user/work/yolov3-tensorflow/code/train.py:253: The name tf.train.get_global_step is deprecated. Please use tf.compat.v1.train.get_global_step instead.
Thu, 28 Jul 2022 09:52:31 WARNING From /usr/local/Ascend/tfplugin/latest/tfplugin/python/site-packages/npu_bridge/estimator/npu/npu_loss_scale_optimizer.py:159: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.
Thu, 28 Jul 2022 09:52:37 WARNING From /home/ma-user/work/yolov3-tensorflow/code/train.py:262: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.
Thu, 28 Jul 2022 09:52:37 WARNING From /home/ma-user/work/yolov3-tensorflow/code/train.py:295: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.
Thu, 28 Jul 2022 09:52:37 WARNING From /home/ma-user/work/yolov3-tensorflow/code/train.py:297: The name tf.global_variables_initializer is deprecated. Please use tf.compat.v1.global_variables_initializer instead.
Thu, 28 Jul 2022 09:52:37 WARNING From /home/ma-user/work/yolov3-tensorflow/code/train.py:297: The name tf.local_variables_initializer is deprecated. Please use tf.compat.v1.local_variables_initializer instead.
Thu, 28 Jul 2022 09:53:11 INFO Restoring parameters from /home/ma-user/work/yolov3-tensorflow/code/./data/darknet_weights/darknet53.ckpt
Thu, 28 Jul 2022 09:53:18 WARNING From /home/ma-user/work/yolov3-tensorflow/code/train.py:306: The name tf.summary.merge_all is deprecated. Please use tf.compat.v1.summary.merge_all instead.
Thu, 28 Jul 2022 09:53:18 WARNING From /home/ma-user/work/yolov3-tensorflow/code/train.py:307: The name tf.summary.FileWriter is deprecated. Please use tf.compat.v1.summary.FileWriter instead.
Thu, 28 Jul 2022 09:54:14 WARNING From /home/ma-user/anaconda3/envs/TensorFlow-1.15.0/lib/python3.7/site-packages/tensorflow_core/python/ops/resource_variable_ops.py:1630: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
Thu, 28 Jul 2022 09:54:14 WARNING From /usr/local/Ascend/tfplugin/latest/tfplugin/python/site-packages/npu_bridge/estimator/npu/util.py:206: Variable.load (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Prefer Variable.assign which has equivalent behavior in 2.X.
Thu, 28 Jul 2022 09:58:08 INFO Epoch: 0, global_step: 9 fps: 0.75 lr: 0.000007 | loss: total: 6.61, xy: 0.31, wh: 0.97, conf: 2.96, class: 2.38 |
Thu, 28 Jul 2022 09:58:09 INFO Epoch: 0, global_step: 19 fps: 168.07 lr: 0.000015 | loss: total: 14.65, xy: 0.72, wh: 2.12, conf: 8.35, class: 3.46 |
...
...
...
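Only the `INFO Epoch: ..., global_step: ...` lines track training progress; the WARNING lines are TensorFlow 1.15 deprecation notices that did not stop the run. To follow loss and learning rate (which rises at the start, consistent with the warm-up reported as `warmup_steps: 3000`) without scrolling past the warnings, the log can be filtered with a regular expression. A minimal sketch, assuming the exact line format shown above and that the console output has been saved to a text file; `parse_train_log` is a hypothetical helper name.

```python
import re

# Matches e.g. "Epoch: 0, global_step: 9 fps: 0.75 lr: 0.000007 | loss: total: 6.61, ..."
LINE_RE = re.compile(
    r"Epoch: (?P<epoch>\d+), global_step: (?P<step>\d+) fps: (?P<fps>[\d.]+) "
    r"lr: (?P<lr>[\d.eE+-]+) \| loss: total: (?P<total>[\d.]+), xy: (?P<xy>[\d.]+), "
    r"wh: (?P<wh>[\d.]+), conf: (?P<conf>[\d.]+), class: (?P<cls>[\d.]+)"
)
INT_FIELDS = ("epoch", "step")


def parse_train_log(lines):
    """Return one dict per progress line; other lines (and nan losses) are skipped."""
    records = []
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue
        records.append(
            {k: int(v) if k in INT_FIELDS else float(v) for k, v in m.groupdict().items()}
        )
    return records
```

For a saved log, `parse_train_log(open("train.log", encoding="utf-8"))` yields records such as `step`, `lr` and `total` that can be printed or plotted; train.log is a hypothetical file name.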
3.6 Resource usage

CPU usage
1*Ascend 910, 24-core CPU, 96 GiB memory (CUE score 1943) (modelarts.kat1.xlarge)
8*Ascend 910, 192-core CPU, 720 GiB memory (CUE score 15544) (modelarts.kat1.8xlarge)
Memory usage
NPU usage
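4. FAQ
Q: Changing the notebook flavor fails with an "invalid" error
ModelArts.6405: RUNNING status not allowed update flavor, "modelarts.kat1.xlarge" is invalid
Downgrading from 8*Ascend910 to 1*Ascend910 failed. As the error message states, the flavor cannot be updated while the notebook is in the RUNNING state.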


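Q: Uploading a file larger than 100 MB to the notebook fails

Solution:
Upload the file to OBS first, then copy it from OBS into the notebook.
Q: Insufficient disk space


Solution:
Expand the storage capacity.
If the storage cannot be expanded, create a new notebook with a larger storage capacity.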
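Because the COCO2014 archives are copied and unzipped in the same work directory, the zip files and the extracted images occupy disk space at the same time. Before unzipping, it can help to compare an archive's uncompressed size with the free space. A minimal sketch using only the standard library; the function name `unzip_fits` and the default 1 GiB safety margin are assumptions.

```python
import shutil
import zipfile


def unzip_fits(archive_path, target_dir, margin_bytes=1 << 30):
    """Return (fits, needed, free) for extracting archive_path into target_dir.

    needed is the total uncompressed size of the archive; fits is True when
    at least margin_bytes would remain free afterwards.
    """
    with zipfile.ZipFile(archive_path) as zf:
        needed = sum(info.file_size for info in zf.infolist())
    free = shutil.disk_usage(target_dir).free
    return needed + margin_bytes <= free, needed, free
```

If it reports that an archive does not fit, deleting archives that have already been extracted (after backing them up to OBS) frees space before resorting to a larger notebook.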
