Training an AlexNet model on Huawei ModelArts
2022-07-01 07:15:00 【花花少年】
1. References
2. Key steps
2.1 Upload the source code to OBS
.
|-- dataset          # dataset
|   |-- train
|   `-- val
`-- train            # code directory
    |-- data.py
    |-- model.py
    `-- train_npu.py # boot file
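You can check the layout above locally before uploading. The following is a minimal sketch that assumes the project root is a local folder laid out like the tree. The helper check_layout and the EXPECTED list are hypothetical and are not part of ModelArts or OBS.

```python
from pathlib import Path

# Paths expected under the project root, mirroring the tree above (hypothetical check list).
EXPECTED = [
    "dataset/train",
    "dataset/val",
    "train/data.py",
    "train/model.py",
    "train/train_npu.py",
]


def check_layout(root):
    """Return the expected paths missing under root, in EXPECTED order (empty if none)."""
    base = Path(root)
    return [p for p in EXPECTED if not (base / p).exists()]
```

For example, check_layout(".") returns an empty list when you run it from a complete project root. Otherwise it lists what still has to be added before the upload.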
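2.2 Create directories in OBS
OBS directory structure (next to dataset and train):
log: where the job logs are stored
output: where the training output is written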
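All OBS paths entered in 2.3 and 2.4 sit under one project prefix. The sketch below maps that layout to the form fields. The helper job_paths and the prefix /my-bucket/alexnet are hypothetical placeholders; the article's own prefix is elided as /xxx/alexnet.

```python
def job_paths(base):
    """Map the OBS project layout to the ModelArts form fields used in 2.3 and 2.4.

    base is the OBS prefix of the project, e.g. "/my-bucket/alexnet" (hypothetical).
    """
    base = base.rstrip("/")
    return {
        "code_dir": f"{base}/train/",               # 2.3 code directory
        "boot_file": f"{base}/train/train_npu.py",  # 2.3 boot file
        "data_url": f"{base}/dataset/",             # 2.4 training input
        "train_url": f"{base}/output/",             # 2.4 training output
        "log_path": f"{base}/log/",                 # 2.4 job log path
    }
```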

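2.3 Create an algorithm
Algorithm Management
---> Create
Name: your choice
Description: your choice
Creation method: Custom
AI engine: tensorflow_1.15-cann_5.0.3-py_3.7-euler_2.8.3-aarch64
Code directory: /xxx/alexnet/train/
Boot file: /xxx/alexnet/train/train_npu.py
Input data configuration: dataset path
Mapping name: your choice (the default is fine)
Code path parameter: data_url
Output data configuration: training output path
Mapping name: your choice (the default is fine)
Code path parameter: train_url
Other settings: keep the defaults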
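The two code path parameters are the argument names that the boot file receives. The following is a minimal sketch of the parsing side of train_npu.py. It assumes ModelArts passes the values as --data_url=<path> and --train_url=<path>. It uses parse_known_args so that any extra arguments the platform may append do not abort the script. The function name parse_args is illustrative.

```python
import argparse


def parse_args(argv=None):
    """Parse the ModelArts code path parameters configured for the algorithm."""
    parser = argparse.ArgumentParser(description="AlexNet training on Ascend (sketch)")
    # Names must match the "code path parameter" fields: data_url and train_url.
    parser.add_argument("--data_url", type=str, required=True,
                        help="dataset location (training input)")
    parser.add_argument("--train_url", type=str, required=True,
                        help="training output location")
    # Ignore any additional arguments instead of exiting with an error.
    args, _unknown = parser.parse_known_args(argv)
    return args
```

Inside the script, call parse_args() with no arguments to read sys.argv. Then use args.data_url to locate the train and val folders and args.train_url as the place to write results.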

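2.4 Create a training job
Training Management
---> Training Jobs
---> Create
Name: your choice
Description: your choice
Algorithm:
---> My algorithms
---> select the algorithm you just created
Training input:
---> data_url: /xxx/alexnet/dataset/
Training output:
---> train_url: /xxx/alexnet/output/
Resource pool: as required
Resource type: Ascend
Flavor: as required
Number of compute nodes: as required
Job log path: /xxx/alexnet/log/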

2.5 Training complete

3. FAQ
Q: The input shape of GeOp5_0 is dynamic
[Nanjing University][MEMNET][ID1085] Error when migrating the model to NPU: The input shape of GeOp5_0 is dynamic
  File "/home/ma-user/modelarts/user-job-dir/train/train_npu.py", line 114, in main
    test_loss, test_acc, summary = sess.run([cost, accuracy, summary_op], feed_dict=val_feed)
  File "/home/ma-user/anaconda/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 956, in run
    run_metadata_ptr)
  File "/home/ma-user/anaconda/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1180, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/ma-user/anaconda/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1359, in _do_run
    run_metadata)
  File "/home/ma-user/anaconda/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: The input shape of GeOp5_0 is dynamic, please ensure that npu option[dynamic_input] is set correctly, for more details please refer to the migration guide.
	 [[{{node GeOp5_0}}]]
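Cause:
By default the NPU expects static input shapes, but the shape of the input fed to the model changes during training. In the traceback above, the error is raised by the sess.run call that feeds the validation data (feed_dict=val_feed).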
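One common source of such shape changes is batching with a fixed batch size. This is an assumption, not something confirmed for this script. When the number of samples is not a multiple of the batch size, the last batch is smaller, so the feed takes on a second shape. The sketch below lists the batch dimension of every batch, with and without dropping that final partial batch. The helper batch_sizes is hypothetical.

```python
def batch_sizes(num_samples, batch_size, drop_remainder=False):
    """Return the batch dimension of each batch produced by fixed-size batching."""
    full, rest = divmod(num_samples, batch_size)
    sizes = [batch_size] * full
    if rest and not drop_remainder:
        sizes.append(rest)  # trailing partial batch -> a different input shape
    return sizes
```

For instance, 1000 samples with a batch size of 128 give seven batches of 128 and one of 104. Dropping the remainder, as tf.data's Dataset.batch(batch_size, drop_remainder=True) does, keeps every batch the same shape but skips a few samples per pass. The dynamic_input option in the solution instead lets the NPU accept the changing shape.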
Solution:
Add the following two lines to train_npu.py, where the NpuOptimizer custom_op of the session config is set up:
custom_op.parameter_map["dynamic_input"].b = True
custom_op.parameter_map["dynamic_graph_execute_mode"].s = tf.compat.as_bytes("lazy_recompile")
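In the two lines above, custom_op is the NpuOptimizer entry of the TensorFlow session config. Below is a sketch of the surrounding setup for sess.run-style training, modelled on the CANN TensorFlow 1.15 migration guide as we recall it. The use_off_line option and the remapping switch come from that guide, not from this article, so verify them against the guide for your CANN version. The sketch only runs in an Ascend environment where the npu_bridge package is installed.

```python
import tensorflow as tf
from tensorflow.core.protobuf.rewriter_config_pb2 import RewriterConfig
from npu_bridge.estimator import npu_ops  # noqa: F401  loads the NPU adapter (Ascend only, assumption)

config = tf.ConfigProto()
custom_op = config.graph_options.rewrite_options.custom_optimizers.add()
custom_op.name = "NpuOptimizer"
custom_op.parameter_map["use_off_line"].b = True  # execute the graph on the Ascend device
# Accept input shapes that change between sess.run calls (e.g. training vs. validation feeds).
custom_op.parameter_map["dynamic_input"].b = True
# Recompile the graph when a new input shape is seen.
custom_op.parameter_map["dynamic_graph_execute_mode"].s = tf.compat.as_bytes("lazy_recompile")
# The migration guide asks for TensorFlow's remapping pass to be switched off on NPU.
config.graph_options.rewrite_options.remapping = RewriterConfig.OFF

with tf.Session(config=config) as sess:
    ...  # build or restore the AlexNet graph, then run training and validation here
```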