MindSpore: [resnet_thor model] "Could not convert to" error when trying to run resnet_thor
2022-07-30 19:04:00 【小乐快乐】
Problem description:
[Functional module]
Running resnet_thor (repository: https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/cv/resnet_thor) with mindspore-ascend-1.1.1 fails with an error.
[Steps & symptoms]
1. Extract the imagenet2012 dataset.
2. Comment out lines 160-162 of src/dataset_helper.py (otherwise an exception is thrown there).
3. cd resnet_thor && python train.py --dataset_path=/home/ImageNet2012_origin
Error message:
WARNING: 'ControlDepend' is deprecated from version 1.1 and will be removed in a future version, use 'Depend' instead.
[ERROR] CORE(167346,python):2021-03-31-17:06:03.564.646 [mindspore/core/utils/status.cc:43] Status] Thread ID 281470327271920 Unexpected error. Could not convert to CV Tensor
Line of code : 142
File : /home/jenkins/agent-working-dir/workspace/Compile_Ascend_ARM_Ubuntu/mindspore/mindspore/ccsrc/minddata/dataset/kernels/image/image_utils.cc
Traceback (most recent call last):
File "train.py", line 143, in <module>
model.train(config.epoch_size, dataset, callbacks=cb)
File "/usr/local/python3.7.5/lib/python3.7/site-packages/mindspore/train/model.py", line 592, in train
sink_size=sink_size)
File "/usr/local/python3.7.5/lib/python3.7/site-packages/mindspore/train/model.py", line 391, in _train
self._train_dataset_sink_process(epoch, train_dataset, list_callback, cb_params, sink_size)
File "/home/resnet_thor/src/model_thor.py", line 183, in _train_dataset_sink_process
iter_first_order=iter_first_order)
File "/home/resnet_thor/src/model_thor.py", line 122, in _exec_preprocess
dataset_helper = DatasetHelper(dataset, dataset_sink_mode, sink_size, epoch_num, iter_first_order)
File "/home/resnet_thor/src/dataset_helper.py", line 72, in __init__
self.iter = iterclass(dataset, sink_size, epoch_num, iter_first_order)
File "/home/resnet_thor/src/dataset_helper.py", line 156, in __init__
super().__init__(dataset, sink_size, epoch_num)
File "/home/resnet_thor/src/dataset_helper.py", line 106, in __init__
dataset.transfer_dataset = _exec_datagraph(dataset, self.sink_size)
File "/usr/local/python3.7.5/lib/python3.7/site-packages/mindspore/train/_utils.py", line 62, in _exec_datagraph
dataset_types, dataset_shapes = _get_types_and_shapes(exec_dataset)
File "/usr/local/python3.7.5/lib/python3.7/site-packages/mindspore/train/_utils.py", line 51, in _get_types_and_shapes
dataset_types = _convert_type(dataset.output_types())
File "/usr/local/python3.7.5/lib/python3.7/site-packages/mindspore/dataset/engine/datasets.py", line 1443, in output_types
self.saved_output_shapes = runtime_getter[0].GetOutputShapes()
RuntimeError: Thread ID 281470327271920 Unexpected error. Could not convert to CV Tensor
Line of code : 142
File : /home/jenkins/agent-working-dir/workspace/Compile_Ascend_ARM_Ubuntu/mindspore/mindspore/ccsrc/minddata/dataset/kernels/image/image_utils.cc
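Solution:
Judging from the error, the dataset is probably being used incorrectly: the dataset path most likely does not point all the way down to the training-level directory. Check the dataset layout; you could try: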
python train.py --dataset_path=/home/ImageNet2012_origin/train
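Before launching training, the directory layout can be checked with a few lines of standard-library Python. This is only a minimal sketch: it assumes the training script reads a folder-per-class layout (one subdirectory per class, with the image files directly inside), and the helper name `inspect_image_folder` and the extension list are hypothetical, not part of the resnet_thor scripts.

```python
import os

# Assumed set of image extensions; adjust to match the actual dataset.
IMAGE_EXTENSIONS = (".jpeg", ".jpg", ".png", ".bmp")


def inspect_image_folder(dataset_path):
    """Split the top-level entries of dataset_path into two sorted lists.

    class_dirs: subdirectories that directly contain at least one image file.
    suspicious: everything else (plain files, or directories with no images
                directly inside, such as a parent 'train' or 'val' folder).
    """
    class_dirs = []
    suspicious = []
    for entry in sorted(os.listdir(dataset_path)):
        full_path = os.path.join(dataset_path, entry)
        if not os.path.isdir(full_path):
            suspicious.append(entry)
            continue
        has_image = any(
            name.lower().endswith(IMAGE_EXTENSIONS)
            for name in os.listdir(full_path)
        )
        if has_image:
            class_dirs.append(entry)
        else:
            suspicious.append(entry)
    return class_dirs, suspicious
```

If calling `inspect_image_folder("/home/ImageNet2012_origin")` returns no class directories and lists entries such as `train` under `suspicious`, the path is probably one level too high and the training-level subdirectory should be passed instead.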
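After making the change suggested by @zhaoting_731, the original problem was resolved, but I ran into a new error.

It looks related to HCCL multi-device training, but the command I ran was:
python train.py --dataset_path=/home/ImageNet2012_origin/ilsvrc
so run_distribute keeps its default value of False, and this should be single-device training.
Error message: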
WARNING: 'ControlDepend' is deprecated from version 1.1 and will be removed in a future version, use 'Depend' instead.
WARNING: 'ControlDepend' is deprecated from version 1.1 and will be removed in a future version, use 'Depend' instead.
[ERROR] HCCL_ADPT(78728,python):2021-04-06-20:10:05.673.721 [mindspore/ccsrc/runtime/hccl_adapter/hccl_adapter.cc:124] GenTask] : The pointer[ops_kernel_builder] is null.
Traceback (most recent call last):
File "train.py", line 143, in <module>
model.train(config.epoch_size, dataset, callbacks=cb)
File "/usr/local/python3.7.5/lib/python3.7/site-packages/mindspore/train/model.py", line 592, in train
sink_size=sink_size)
File "/usr/local/python3.7.5/lib/python3.7/site-packages/mindspore/train/model.py", line 391, in _train
self._train_dataset_sink_process(epoch, train_dataset, list_callback, cb_params, sink_size)
File "/home/thor/mindspore/model_zoo/official/cv/resnet_thor/src/model_thor.py", line 254, in _train_dataset_sink_process
outputs = self._train_network(*inputs)
File "/usr/local/python3.7.5/lib/python3.7/site-packages/mindspore/nn/cell.py", line 322, in __call__
out = self.compile_and_run(*inputs)
File "/usr/local/python3.7.5/lib/python3.7/site-packages/mindspore/nn/cell.py", line 578, in compile_and_run
self.compile(*inputs)
File "/usr/local/python3.7.5/lib/python3.7/site-packages/mindspore/nn/cell.py", line 565, in compile
_executor.compile(self, *inputs, phase=self.phase, auto_parallel_mode=self._auto_parallel_mode)
File "/usr/local/python3.7.5/lib/python3.7/site-packages/mindspore/common/api.py", line 505, in compile
result = self._executor.compile(obj, args_list, phase, use_vm)
RuntimeError: mindspore/ccsrc/runtime/hccl_adapter/hccl_adapter.cc:124 GenTask] : The pointer[ops_kernel_builder] is null.
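This model zoo example mainly targets the multi-device scenario. We have since merged the resnet and resnet_thor scripts into resnet. If you want to run single-device training, we recommend using the code in the resnet directory: change the optimizer in src/config.py to Thor, then run training as described in the README. For example: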
python train.py --net=resnet50 --dataset=imagenet2012 --device_target=Ascend --dataset_path=[DATASET_PATH]