Deploying a grasp detection network with TensorRT (I)
2022-07-03 05:17:00 【Qianyu QY】
1. A brief introduction to TensorRT
TensorRT is a C++ inference framework that runs on NVIDIA GPUs. Models trained with PyTorch or TensorFlow can be converted to the TensorRT format and then run by the TensorRT inference engine, which speeds up the model on NVIDIA GPUs, typically by a factor of several to several dozen.
Mainstream PyTorch deployment paths:
- PyTorch → ONNX → TensorRT
- torch2trt
- torch2trt_dynamic
- TRTorch
2. Grasp detection deployment process
The grasp detection network used here comes from my earlier paper, High-performance Pixel-level Grasp Detection based on Adaptive Grasping and Grasp-aware Network. The method achieves 99.09% grasp detection accuracy and a 95.71% grasp success rate in real multi-object stacking scenes. The experimental demo video is on YouTube: https://www.youtube.com/watch?v=KUa3XlVwDsU. The paper can be downloaded from: https://www.techrxiv.org/articles/preprint/High-performance_Pixel-level_Grasp_Detection_based_on_Adaptive_Grasping_and_Grasp-aware_Network/14680455
Deploying with TensorRT requires installing dependencies such as PyTorch, TensorRT and ONNX. Detailed installation instructions are easy to find online, so only the versions I used are listed here:
Ubuntu: 16.04
TensorRT: 7.0.0
ONNX IR version: 0.0.4
Opset version: 10
Producer name: pytorch 1.2.0
GPU: TITAN Xp
CUDA: 10.0
Driver Version: 430.14
Deployment can be done in either Python or C++; here I use C++.
2.1 Generating an ONNX file from the PyTorch network
PyTorch provides a way to export a model to ONNX. The code is as follows:
import torch

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
# Load the PyTorch model; map_location keeps it on the same device as the input
model = torch.load("model_path", map_location=device)
model.eval()
x = torch.randn((1, 3, 320, 320))  # dummy input with a fixed batch size of 1
x = x.to(device)
torch.onnx.export(model,
                  x,
                  "ckpt/sgdn.onnx",
                  verbose=True,
                  opset_version=10,
                  do_constant_folding=True,  # apply constant-folding optimization
                  input_names=["input"],  # input name
                  output_names=["output_able", "output_angle", "output_width"])  # output names
One caveat when exporting: do not use interpolation-based upsampling (such as bilinear) in the network, otherwise TensorRT will report an error when parsing the model. Use torch.nn.UpsamplingNearest2d() instead. Discussion of this issue: https://github.com/NVIDIA/TensorRT/issues/284.
The ONNX file can be downloaded from my Google Drive:
https://drive.google.com/file/d/1AGyjRTWIw85ctwP6VsBDCmR0mE8NdRLu/view?usp=sharing
Use Python's onnx package to check whether sgdn.onnx is valid:
import onnx
model_path = 'sgdn.onnx'
# Verify the validity of the model
onnx_model = onnx.load(model_path)
onnx.checker.check_model(onnx_model)
Use the Netron tool to view the network architecture and the network's input and output shapes. The online version of Netron is at:
https://netron.app/
2.2 Generating image data in txt format
Normally the image would be read with OpenCV or received by subscribing to a ROS topic. For testing purposes, the image is converted to txt format here. Since the input size of the grasp detection network is (batch, 3, 320, 320), first crop the central (320, 320) region of the image, then save the pixel values to a txt file. Each row stores 320 values, for a total of 320*3 rows: the first 320 rows are the B channel, followed by the G and R channels in that order.
The program can be downloaded from GitHub:
https://github.com/dexin-wang/tensorRT_SGDN/tree/main/create_txt
Run python3 create_txt.py to generate the txt file.
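The layout described in this section can be sketched in pure Python. This is only an illustration of the crop and the channel-major row order, not the actual create_txt.py: the function names, the space separator and the nested-list image representation are assumptions made for the example.

```python
def center_crop(image, size=320):
    """Return the central size x size region of an H x W x 3 nested list."""
    height, width = len(image), len(image[0])
    if height < size or width < size:
        raise ValueError("image is smaller than the crop size")
    top = (height - size) // 2
    left = (width - size) // 2
    return [row[left:left + size] for row in image[top:top + size]]


def to_txt_lines(crop):
    """Lay out a BGR crop as text rows: all B rows, then all G rows, then all R rows."""
    lines = []
    for channel in range(3):  # 0 = B, 1 = G, 2 = R (OpenCV channel order)
        for row in crop:
            # Space-separated values are an assumption; the real file may differ.
            lines.append(" ".join(str(pixel[channel]) for pixel in row))
    return lines


def write_txt(path, lines):
    """Write one text row per line."""
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")


if __name__ == "__main__":
    # Synthetic 480 x 640 BGR image standing in for an image read with OpenCV.
    image = [[(x % 256, y % 256, (x + y) % 256) for x in range(640)]
             for y in range(480)]
    lines = to_txt_lines(center_crop(image))
    print(len(lines), "rows of", len(lines[0].split()), "values")
```

With a real image, the nested list would come from the OpenCV array (for example via its tolist() method), and write_txt would save the rows to disk.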
2.3 TensorRT inference
Because this is still at the testing stage, my C++ program is adapted from the official TensorRT sample code. It can be downloaded from my GitHub:
https://github.com/dexin-wang/tensorRT_SGDN/tree/main/sampleOnnxSGDN
After installing TensorRT by following an online tutorial, put the sampleOnnxSGDN folder from the link above into /home/.../TensorRT-7.0.0.11/samples/, and add sampleOnnxSGDN to line 39 of /home/.../TensorRT-7.0.0.11/samples/Makefile:
samples=... sampleOnnxSGDN ...
Then put the downloaded sgdn.onnx in the /home/.../TensorRT-7.0.0.11/samples/sampleOnnxSGDN/data/ folder. You may also need to modify the file paths used in the program.
You can now compile and run the sample.
Compile:
cd /home/.../TensorRT-7.0.0.11/samples/sampleOnnxSGDN/
make
After compilation, two files are generated under /home/.../TensorRT-7.0.0.11/bin:
sample_onnx_sgdn
sample_onnx_sgdn_debug
Run:
cd /home/.../TensorRT-7.0.0.11/bin
./sample_onnx_sgdn
If everything goes well, you will see output like the following:
[07/01/2021-10:14:08] [I] Building and running a GPU inference engine for Onnx MNIST
----------------------------------------------------------------
Input filename: /home/wangdx/tensorRT/TensorRT-7.0.0.11/samples/sampleOnnxSGDN/data/sgdn.onnx
ONNX IR version: 0.0.4
Opset version: 10
Producer name: pytorch
Producer version: 1.2
Domain:
Model version: 0
Doc string:
----------------------------------------------------------------
[07/01/2021-10:14:13] [I] [TRT] Some tactics do not have sufficient workspace memory to run. Increa
[07/01/2021-10:14:51] [I] [TRT] Detected 1 inputs and 3 output network tensors.
[07/01/2021-10:14:51] [W] [TRT] Current optimization profile is: 0. Please ensure there are no enqu
[07/01/2021-10:14:51] [I] Output:
[07/01/2021-10:14:51] [I] (row, col) = 233, 187
confidence = 0.996655
&&&& PASSED TensorRT.sample_onnx_sgdn # ./sample_onnx_sgdn
The result shows that the grasp point at position (233, 187) has the highest confidence, 0.996655. Because the central (320, 320) region of the image was cropped at the start, the predicted grasp point in the original image is at (233+80, 187+160) = (313, 347). The code does not yet parse the grasp angle and grasp width; I will update and release it later.
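The two steps behind that result, picking the highest-confidence pixel and mapping it back to the original image, can be sketched as follows. The 480 x 640 original size is an assumption inferred from the offsets 80 and 160; the function names are hypothetical and the released C++ sample may do this differently.

```python
def argmax_2d(conf_map):
    """Return (row, col, value) of the largest entry in a 2-D nested list."""
    best = (0, 0, conf_map[0][0])
    for r, row in enumerate(conf_map):
        for c, value in enumerate(row):
            if value > best[2]:
                best = (r, c, value)
    return best


def crop_to_original(row, col, orig_hw=(480, 640), crop=320):
    """Map a (row, col) in the central crop back to original-image coordinates."""
    top = (orig_hw[0] - crop) // 2   # rows removed above the crop
    left = (orig_hw[1] - crop) // 2  # columns removed left of the crop
    return row + top, col + left


if __name__ == "__main__":
    # Tiny stand-in for the 320 x 320 confidence map produced by the network.
    conf_map = [[0.10, 0.20], [0.90, 0.30]]
    print(argmax_2d(conf_map))
    print(crop_to_original(233, 187))
```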
3. Summary of errors
Error 1:
When converting ONNX -> TensorRT:
While parsing node number 360 [Resize]:
ERROR: ModelImporter.cpp:124 In function parseGraph:
[5] Assertion failed: ctx->tensors().count(inputName)
Solution: do not use bilinear interpolation; use nn.UpsamplingNearest2d((h,w)) instead.
Error 2:
[06/30/2021-17:22:19] [E] [TRT] Network has dynamic or shape inputs, but no optimization profile has been defined.
[06/30/2021-17:22:19] [E] [TRT] Network validation failed.
&&&& FAILED TensorRT.sample_onnx_sgdn # ./sample_onnx_sgdn
Solution: when exporting the PyTorch model to ONNX, do not set dynamic_axes. To verify the export, check in Netron that the network input shape is (1, 3, 320, 320) rather than (batch_size, 3, 320, 320).
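The same check can be scripted instead of done by eye in Netron. The sketch below walks the graph inputs and reports any dimension that is symbolic or unset, relying on the dim_value and dim_param fields of ONNX shape dimensions. The helper names and the make_input stand-in are hypothetical; the onnx package is imported only inside check_onnx_file so the rest runs without it.

```python
from types import SimpleNamespace


def find_dynamic_dims(graph_inputs):
    """Return (input_name, axis, label) for every input dimension that is not fixed."""
    dynamic = []
    for graph_input in graph_inputs:
        dims = graph_input.type.tensor_type.shape.dim
        for axis, dim in enumerate(dims):
            # A fixed dimension has a positive dim_value and an empty dim_param.
            if dim.dim_param or dim.dim_value <= 0:
                dynamic.append((graph_input.name, axis, dim.dim_param or "?"))
    return dynamic


def check_onnx_file(path):
    """Load an ONNX file and list its dynamic input dimensions."""
    import onnx  # imported lazily; requires the onnx package
    model = onnx.load(path)
    return find_dynamic_dims(model.graph.input)


def make_input(name, dims):
    """Hypothetical stand-in mimicking the fields read above (ints fixed, strings symbolic)."""
    dim_objs = [
        SimpleNamespace(dim_value=d if isinstance(d, int) else 0,
                        dim_param=d if isinstance(d, str) else "")
        for d in dims
    ]
    shape = SimpleNamespace(dim=dim_objs)
    return SimpleNamespace(name=name,
                           type=SimpleNamespace(tensor_type=SimpleNamespace(shape=shape)))


if __name__ == "__main__":
    print(find_dynamic_dims([make_input("input", [1, 3, 320, 320])]))
    print(find_dynamic_dims([make_input("input", ["batch_size", 3, 320, 320])]))
```

An empty list from check_onnx_file("sgdn.onnx") would indicate that every input dimension is fixed, which is what the sample in this post expects.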
Error 3:
Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[06/30/2021-17:52:05] [I] [TRT] Detected 1 inputs and 3 output network tensors.
[06/30/2021-17:52:06] [W] [TRT] Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
Segmentation fault (core dumped)
Solution: the crash came from reading the input as a binary file; read the txt file instead.
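For reference, reading the txt format from section 2.2 into a flat input buffer can be sketched in Python as below. This is not the C++ code from the sample: the function name is hypothetical, values are assumed to be whitespace-separated, and no normalization is applied because the post does not say whether the sample performs any.

```python
def parse_txt_lines(lines, size=320, channels=3):
    """Parse channel-major text rows into a flat list of floats in CHW order."""
    rows = [line.split() for line in lines if line.strip()]
    if len(rows) != channels * size:
        raise ValueError(f"expected {channels * size} rows, got {len(rows)}")
    flat = []
    for index, row in enumerate(rows):
        if len(row) != size:
            raise ValueError(f"row {index} has {len(row)} values, expected {size}")
        flat.extend(float(value) for value in row)
    return flat


if __name__ == "__main__":
    # 2 x 2 example: rows 0-1 are the B plane, 2-3 the G plane, 4-5 the R plane.
    demo = ["1 2", "3 4", "5 6", "7 8", "9 10", "11 12"]
    buffer = parse_txt_lines(demo, size=2)
    # Element (c, r, col) lives at index c * size * size + r * size + col.
    print(len(buffer), buffer[1 * 2 * 2 + 0 * 2 + 0])
```

Checking the row and column counts up front turns a malformed input file into a clear error rather than an out-of-bounds read.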
4. References
https://zhuanlan.zhihu.com/p/371239130
https://zhuanlan.zhihu.com/p/348301573