MindSpore: [resnet_thor model] "Could not convert to" error when trying to run resnet_thor
2022-07-30 19:04:00 【小乐快乐】
Problem description:
[Functional module]
Running resnet_thor (repository: https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/cv/resnet_thor) with mindspore-ascend-1.1.1 fails with an error.
[Steps and symptoms]
1. Extract the imagenet2012 dataset.
2. Comment out lines 160-162 of src/dataset_helper.py (otherwise an exception is thrown there).
3. cd resnet_thor && python train.py --dataset_path=/home/ImageNet2012_origin
Error message:
WARNING: 'ControlDepend' is deprecated from version 1.1 and will be removed in a future version, use 'Depend' instead.
[ERROR] CORE(167346,python):2021-03-31-17:06:03.564.646 [mindspore/core/utils/status.cc:43] Status] Thread ID 281470327271920 Unexpected error. Could not convert to CV Tensor
Line of code : 142
File : /home/jenkins/agent-working-dir/workspace/Compile_Ascend_ARM_Ubuntu/mindspore/mindspore/ccsrc/minddata/dataset/kernels/image/image_utils.cc
Traceback (most recent call last):
File "train.py", line 143, in <module>
model.train(config.epoch_size, dataset, callbacks=cb)
File "/usr/local/python3.7.5/lib/python3.7/site-packages/mindspore/train/model.py", line 592, in train
sink_size=sink_size)
File "/usr/local/python3.7.5/lib/python3.7/site-packages/mindspore/train/model.py", line 391, in _train
self._train_dataset_sink_process(epoch, train_dataset, list_callback, cb_params, sink_size)
File "/home/resnet_thor/src/model_thor.py", line 183, in _train_dataset_sink_process
iter_first_order=iter_first_order)
File "/home/resnet_thor/src/model_thor.py", line 122, in _exec_preprocess
dataset_helper = DatasetHelper(dataset, dataset_sink_mode, sink_size, epoch_num, iter_first_order)
File "/home/resnet_thor/src/dataset_helper.py", line 72, in __init__
self.iter = iterclass(dataset, sink_size, epoch_num, iter_first_order)
File "/home/resnet_thor/src/dataset_helper.py", line 156, in __init__
super().__init__(dataset, sink_size, epoch_num)
File "/home/resnet_thor/src/dataset_helper.py", line 106, in __init__
dataset.transfer_dataset = _exec_datagraph(dataset, self.sink_size)
File "/usr/local/python3.7.5/lib/python3.7/site-packages/mindspore/train/_utils.py", line 62, in _exec_datagraph
dataset_types, dataset_shapes = _get_types_and_shapes(exec_dataset)
File "/usr/local/python3.7.5/lib/python3.7/site-packages/mindspore/train/_utils.py", line 51, in _get_types_and_shapes
dataset_types = _convert_type(dataset.output_types())
File "/usr/local/python3.7.5/lib/python3.7/site-packages/mindspore/dataset/engine/datasets.py", line 1443, in output_types
self.saved_output_shapes = runtime_getter[0].GetOutputShapes()
RuntimeError: Thread ID 281470327271920 Unexpected error. Could not convert to CV Tensor
Line of code : 142
File : /home/jenkins/agent-working-dir/workspace/Compile_Ascend_ARM_Ubuntu/mindspore/mindspore/ccsrc/minddata/dataset/kernels/image/image_utils.cc

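Solution:
Judging by the error, the dataset is probably being used incorrectly: the dataset path most likely does not point down to the training-level directory. Check the dataset layout and try: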
python train.py --dataset_path=/home/ImageNet2012_origin/train
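The path check suggested above can be sketched in Python. This is a minimal sketch, assuming the training script reads a folder-per-class layout (dataset root, then one directory per class, then image files); the helper name looks_like_image_folder and the extension list are hypothetical and not part of MindSpore or the resnet_thor scripts.

```python
import os

# Assumed set of image extensions; adjust to match the actual dataset.
IMAGE_EXTENSIONS = (".jpeg", ".jpg", ".png")


def looks_like_image_folder(root):
    """Return True if root contains class directories that each hold an image.

    Hypothetical helper: it only inspects the directory layout and does not
    try to decode any image.
    """
    if not os.path.isdir(root):
        return False
    entries = [os.path.join(root, name) for name in sorted(os.listdir(root))]
    class_dirs = [path for path in entries if os.path.isdir(path)]
    if not class_dirs:
        return False
    for class_dir in class_dirs:
        files = os.listdir(class_dir)
        # A class directory with no image files directly inside it suggests
        # that root is one level too high (or too low) in the tree.
        if not any(name.lower().endswith(IMAGE_EXTENSIONS) for name in files):
            return False
    return True
```

Under that assumption, a root such as /home/ImageNet2012_origin that only contains split directories would be rejected, while the split directory holding the class folders would pass.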
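After making the change suggested by @zhaoting_731, the original problem was resolved, but a new error appeared.
It looks related to HCCL multi-device training, but the command I ran was: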
python train.py --dataset_path=/home/ImageNet2012_origin/ilsvrc
so run_distribute keeps its default value of False, and this should be single-device training.
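The reasoning about the default can be illustrated with a small argparse sketch. It assumes the flag is declared with default=False and parsed with ast.literal_eval; the actual declaration in train.py may differ, and build_parser and is_distributed are hypothetical names used only for illustration.

```python
import argparse
import ast


def build_parser():
    """Build a parser that mimics an assumed train.py flag layout."""
    parser = argparse.ArgumentParser(description="hypothetical training entry point")
    # Assumed declaration: a boolean flag that stays False unless passed explicitly.
    parser.add_argument("--run_distribute", type=ast.literal_eval, default=False)
    parser.add_argument("--dataset_path", type=str, default=None)
    return parser


def is_distributed(argv):
    """Return True only when --run_distribute was explicitly set to True."""
    args = build_parser().parse_args(argv)
    return bool(args.run_distribute)
```

With this declaration, a command line that passes only --dataset_path leaves run_distribute at False, which matches the expectation of a single-device run.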
Error message:
WARNING: 'ControlDepend' is deprecated from version 1.1 and will be removed in a future version, use 'Depend' instead.
WARNING: 'ControlDepend' is deprecated from version 1.1 and will be removed in a future version, use 'Depend' instead.
[ERROR] HCCL_ADPT(78728,python):2021-04-06-20:10:05.673.721 [mindspore/ccsrc/runtime/hccl_adapter/hccl_adapter.cc:124] GenTask] : The pointer[ops_kernel_builder] is null.
Traceback (most recent call last):
File "train.py", line 143, in <module>
model.train(config.epoch_size, dataset, callbacks=cb)
File "/usr/local/python3.7.5/lib/python3.7/site-packages/mindspore/train/model.py", line 592, in train
sink_size=sink_size)
File "/usr/local/python3.7.5/lib/python3.7/site-packages/mindspore/train/model.py", line 391, in _train
self._train_dataset_sink_process(epoch, train_dataset, list_callback, cb_params, sink_size)
File "/home/thor/mindspore/model_zoo/official/cv/resnet_thor/src/model_thor.py", line 254, in _train_dataset_sink_process
outputs = self._train_network(*inputs)
File "/usr/local/python3.7.5/lib/python3.7/site-packages/mindspore/nn/cell.py", line 322, in __call__
out = self.compile_and_run(*inputs)
File "/usr/local/python3.7.5/lib/python3.7/site-packages/mindspore/nn/cell.py", line 578, in compile_and_run
self.compile(*inputs)
File "/usr/local/python3.7.5/lib/python3.7/site-packages/mindspore/nn/cell.py", line 565, in compile
_executor.compile(self, *inputs, phase=self.phase, auto_parallel_mode=self._auto_parallel_mode)
File "/usr/local/python3.7.5/lib/python3.7/site-packages/mindspore/common/api.py", line 505, in compile
result = self._executor.compile(obj, args_list, phase, use_vm)
RuntimeError: mindspore/ccsrc/runtime/hccl_adapter/hccl_adapter.cc:124 GenTask] : The pointer[ops_kernel_builder] is null.
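This model zoo example mainly targets the multi-device scenario. The resnet and resnet_thor scripts have since been merged into resnet. To run single-device training, we recommend using the code under the resnet directory: change the optimizer in src/config.py to Thor, then run training as described in the README. For example: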
python train.py --net=resnet50 --dataset=imagenet2012 --device_target=Ascend --dataset_path=[DATASET_PATH]