Project scenario: [with ERRTYPE = cudaError] CUDA failure 999: unknown error
2022-08-02 02:20:00 【mtl1994】
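Project scenario:
An older program needed to be upgraded. It previously ran on CUDA 10.2, and after the upgrade it failed at startup with: [with ERRTYPE = cudaError; bool THRW = true] CUDA failure 999: unknown error ; GPU=24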
Problem description:
Environment
cuda 11.2 (previously 10.2)
onnxruntime-gpu 1.10
python 3.9.7
When starting the program:
Traceback (most recent call last):
  File "/home/aiuser/cover/liheng-foggun/app.py", line 15, in <module>
    model = DetectMultiBackend(weights=config.paddle.model_file)
  File "/home/aiuser/miniconda3/envs/cover/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "/home/aiuser/cover/liheng-foggun/models/yolo.py", line 37, in __init__
    self.session = onnxruntime.InferenceSession(weights, providers=['CUDAExecutionProvider'])
  File "/home/aiuser/miniconda3/envs/cover/lib/python3.9/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 335, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "/home/aiuser/miniconda3/envs/cover/lib/python3.9/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 379, in _create_inference_session
    sess.initialize_session(providers, provider_options, disabled_optimizers)
RuntimeError: /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:122 bool onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*) [with ERRTYPE = cudaError; bool THRW = true] /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:116 bool onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*) [with ERRTYPE = cudaError; bool THRW = true] CUDA failure 999: unknown error ; GPU=24 ; hostname=aiserver-sl-01 ; expr=cudaSetDevice(info_.device_id);
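The traceback shows the failure happening while the session is created with only CUDAExecutionProvider. As a diagnostic aid (not a fix), a minimal sketch like the following tries each provider in turn and reports which one actually initializes. The helper name create_with_fallback and the make_session callback are hypothetical and introduced only for illustration. The sketch catches Exception broadly, on the assumption that the exact exception class may vary between onnxruntime versions.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")

DEFAULT_ORDER = ("CUDAExecutionProvider", "CPUExecutionProvider")


def create_with_fallback(
    make_session: Callable[[List[str]], T],
    providers: Sequence[str] = DEFAULT_ORDER,
) -> Tuple[T, str]:
    """Try each provider in order; return (session, provider) for the first one
    whose session initializes. make_session is a hypothetical callback that
    receives a one-element provider list and returns a session object."""
    errors = []
    for provider in providers:
        try:
            session = make_session([provider])
        except Exception as exc:  # broad on purpose: exception type is an assumption
            errors.append(f"{provider}: {exc}")
            continue
        if errors:
            # Make the fallback visible so a broken GPU setup is not hidden.
            print("fell back to", provider, "after:", "; ".join(errors))
        return session, provider
    raise RuntimeError("no provider could be initialized: " + "; ".join(errors))
```

With onnxruntime installed, this could be called as create_with_fallback(lambda p: onnxruntime.InferenceSession("model.onnx", providers=p)), where model.onnx is a placeholder path. Falling back to CPU only confirms that the model itself loads; the CUDA error still has to be resolved at the driver level.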
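Cause analysis:
1. At first I suspected the onnxruntime-gpu version, but upgrading to 1.12 produced the same error.
2. Posts online also attributed the error to an incompatibility.
3. I tried reinstalling the driver. After uninstalling 11.2, nvidia-smi showed that the old 10.2 driver was still present.
4. The cause was that the previous driver had not been removed cleanly.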
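The check in step 3 can be scripted. The header line of nvidia-smi reports the installed driver version and the highest CUDA version that driver supports, so comparing the latter against the toolkit version gives a quick compatibility hint. The following is a minimal sketch under that assumption; the function names are hypothetical, and the comparison ignores CUDA's forward-compatibility packages.

```python
import re
import subprocess
from typing import Optional, Tuple

# Matches the header line printed by nvidia-smi, e.g.
# "Driver Version: <driver>   CUDA Version: <max supported CUDA>"
HEADER_RE = re.compile(r"Driver Version:\s*([\d.]+)\s+CUDA Version:\s*([\d.]+)")


def parse_smi_header(text: str) -> Optional[Tuple[str, str]]:
    """Return (driver_version, max_cuda_version), or None if the header is absent."""
    match = HEADER_RE.search(text)
    return (match.group(1), match.group(2)) if match else None


def version_tuple(version: str) -> Tuple[int, ...]:
    """Turn '11.2' into (11, 2) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def driver_supports(smi_output: str, toolkit_version: str) -> bool:
    """True if the driver's reported CUDA version is at least the toolkit version."""
    parsed = parse_smi_header(smi_output)
    if parsed is None:
        raise ValueError("could not find driver/CUDA versions in nvidia-smi output")
    _, max_cuda = parsed
    return version_tuple(max_cuda) >= version_tuple(toolkit_version)


def read_nvidia_smi() -> str:
    """Run nvidia-smi and return its stdout (requires the NVIDIA driver tools)."""
    result = subprocess.run(["nvidia-smi"], capture_output=True, text=True, check=True)
    return result.stdout
```

On the affected machine, driver_supports(read_nvidia_smi(), "11.2") returning False would point to a stale driver of the kind described above.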
Solution:
1. Uninstall 10.2
sudo /usr/local/cuda-10.2/bin/cuda-uninstaller
2. Install the new driver
# Offline install of 515.57
sudo ./NVIDIA-Linux-x86_64-515.57.run -no-x-check -no-nouveau-check
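After these steps it can be worth confirming that no old toolkit directory is left behind. The sketch below lists versioned CUDA directories under /usr/local, which is the default install prefix used in the uninstall command above. The function name find_cuda_installs is hypothetical, and the check covers toolkit directories only, not the kernel driver itself.

```python
from pathlib import Path
from typing import List


def find_cuda_installs(root: str = "/usr/local") -> List[str]:
    """Return the names of versioned CUDA toolkit directories (cuda-X.Y) under root.

    Symlinks are skipped, and the pattern 'cuda-*' does not match the plain
    'cuda' symlink that usually points at the active toolkit.
    """
    base = Path(root)
    if not base.is_dir():
        return []
    return sorted(
        entry.name
        for entry in base.glob("cuda-*")
        if entry.is_dir() and not entry.is_symlink()
    )
```

If find_cuda_installs() still lists cuda-10.2 after running the uninstaller, the old toolkit was not fully removed.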