Deploying a grasp detection network with TensorRT (I)
2022-07-03 05:17:00 【Qianyu QY】
1. A brief introduction to TensorRT
TensorRT is a C++ inference framework that runs on NVIDIA GPUs. Models trained with PyTorch or TensorFlow can be converted to the TensorRT format and then run with the TensorRT inference engine, which speeds up the model on NVIDIA GPUs, typically by a factor of several to several dozen.
Mainstream deployment paths for PyTorch models:
- PyTorch → ONNX → TensorRT
- torch2trt
- torch2trt_dynamic
- TRTorch
2. Grasp detection deployment process
The grasp detection network used here comes from my earlier paper: High-performance Pixel-level Grasp Detection based on Adaptive Grasping and Grasp-aware Network. The method achieves 99.09% grasp detection accuracy and a 95.71% grasp success rate in real multi-object stacking scenes. The experimental demonstration video is on YouTube: https://www.youtube.com/watch?v=KUa3XlVwDsU. Paper download: https://www.techrxiv.org/articles/preprint/High-performance_Pixel-level_Grasp_Detection_based_on_Adaptive_Grasping_and_Grasp-aware_Network/14680455
Deploying with TensorRT requires installing dependencies such as PyTorch, TensorRT, and ONNX. Installation instructions are covered in detail elsewhere online, so only the versions I used are listed here:
ubuntu: 16.06
TensorRT: 7.0.0
ONNX IR version: 0.0.4
Opset version: 10
Producer name: pytorch 1.2.0
GPU: TITAN Xp
CUDA: 10.0
Driver Version: 430.14
Deployment can be done in either Python or C++; here I use C++.
2.1 Exporting the PyTorch network to an ONNX file
PyTorch provides a way to export a model to ONNX. The code is as follows:
import torch

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
# Load the PyTorch model onto the same device as the dummy input
model = torch.load("model_path", map_location=device)
model.eval()
# Dummy input with the network's fixed input shape
x = torch.randn((1, 3, 320, 320), device=device)
torch.onnx.export(model,
                  x,
                  "ckpt/sgdn.onnx",
                  verbose=True,
                  opset_version=10,
                  do_constant_folding=True,  # apply constant-folding optimization
                  input_names=["input"],  # input tensor name
                  output_names=["output_able", "output_angle", "output_width"])  # output tensor names
One caveat when exporting: do not use interpolation-based upsampling (for example bilinear interpolation) in the network, otherwise TensorRT will report an error when parsing the model. Use torch.nn.UpsamplingNearest2d() instead. This issue is discussed at: https://github.com/NVIDIA/TensorRT/issues/284.
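As a rough illustration of the caveat above, the sketch below shows an upsampling block built on torch.nn.UpsamplingNearest2d with a fixed output size. The class name UpBlock, the channel counts, and the tensor sizes are hypothetical and are not taken from the grasp detection network itself.

```python
import torch
import torch.nn as nn


class UpBlock(nn.Module):
    """Hypothetical decoder block: nearest-neighbour upsampling followed by a convolution."""

    def __init__(self, in_channels, out_channels, out_size):
        super().__init__()
        # Fixed (h, w) output size, nearest-neighbour mode instead of bilinear interpolation
        self.up = nn.UpsamplingNearest2d(size=out_size)
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1)

    def forward(self, x):
        return self.conv(self.up(x))


block = UpBlock(8, 4, (320, 320)).eval()
with torch.no_grad():
    y = block(torch.randn(1, 8, 160, 160))  # upsampled from 160x160 to 320x320
```

Passing an explicit (h, w) keeps the output shape static, which fits the fixed (1, 3, 320, 320) input used for the export.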
The ONNX file can be downloaded from my Google Drive:
https://drive.google.com/file/d/1AGyjRTWIw85ctwP6VsBDCmR0mE8NdRLu/view?usp=sharing
Use the Python onnx package to check that sgdn.onnx is valid:
import onnx
model_path = 'sgdn.onnx'
# Verify the validity of the model
onnx_model = onnx.load(model_path)
onnx.checker.check_model(onnx_model)
Use the Netron tool to view the network architecture and the input and output shapes of the network. The online version of Netron is at:
https://netron.app/
[Figure: Netron screenshot of the network]
2.2 Generating image data in txt format
Normally the image would be read with OpenCV or received by subscribing to a ROS topic. For testing purposes, the image is converted to txt format here. Since the input size of the grasp detection network is (batch, 3, 320, 320), first crop the central (320, 320) region of the image, then save the pixel values to a txt file. Each row stores 320 values, for 320*3 rows in total: the first 320 rows are the B channel, followed by the G and R channels in order.
The program can be downloaded from GitHub:
https://github.com/dexin-wang/tensorRT_SGDN/tree/main/create_txt
Run python3 create_txt.py to generate the txt file.
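The txt layout described in this section can be sketched in plain Python as follows. This is not the author's create_txt.py. The function names, the space separator, and the synthetic 480 x 640 image are assumptions made for illustration, and the image is represented as rows of (B, G, R) tuples rather than an OpenCV array.

```python
def crop_center(image, size=320):
    """Return the central size x size region of an image stored as rows of (B, G, R) tuples."""
    height, width = len(image), len(image[0])
    top, left = (height - size) // 2, (width - size) // 2
    return [row[left:left + size] for row in image[top:top + size]]


def to_txt_lines(image, size=320):
    """Build the text lines: `size` rows of B values, then G rows, then R rows."""
    crop = crop_center(image, size)
    lines = []
    for channel in range(3):  # 0 = B, 1 = G, 2 = R
        for row in crop:
            # One image row per text line, values separated by spaces (assumed separator)
            lines.append(" ".join(str(pixel[channel]) for pixel in row))
    return lines


# Synthetic 480 x 640 image whose pixels are all (B, G, R) = (10, 20, 30)
image = [[(10, 20, 30)] * 640 for _ in range(480)]
lines = to_txt_lines(image)
```

Writing the result to disk is then a matter of joining the lines with newlines and saving them to a file of your choice.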
2.3 TensorRT inference
Because this is still in the testing phase, my C++ program is a modified version of the official TensorRT sample code. It can be downloaded from my GitHub:
https://github.com/dexin-wang/tensorRT_SGDN/tree/main/sampleOnnxSGDN
After installing TensorRT by following an online tutorial, put the sampleOnnxSGDN folder in /home/.../TensorRT-7.0.0.11/samples/, and add sampleOnnxSGDN on line 39 of /home/.../TensorRT-7.0.0.11/samples/Makefile:
samples=... sampleOnnxSGDN ...
Then put the downloaded sgdn.onnx in the /home/.../TensorRT-7.0.0.11/samples/sampleOnnxSGDN/data/ folder. You may also need to modify the file paths used in the program.
Now you can compile and run.
Compile:
cd /home/.../TensorRT-7.0.0.11/samples/sampleOnnxSGDN/
make
After compilation, two files are generated under /home/.../TensorRT-7.0.0.11/bin:
sample_onnx_sgdn
sample_onnx_sgdn_debug
Run:
cd /home/.../TensorRT-7.0.0.11/bin
./sample_onnx_sgdn
If everything goes well, you will see the following output:
[07/01/2021-10:14:08] [I] Building and running a GPU inference engine for Onnx MNIST
----------------------------------------------------------------
Input filename: /home/wangdx/tensorRT/TensorRT-7.0.0.11/samples/sampleOnnxSGDN/data/sgdn.onnx
ONNX IR version: 0.0.4
Opset version: 10
Producer name: pytorch
Producer version: 1.2
Domain:
Model version: 0
Doc string:
----------------------------------------------------------------
[07/01/2021-10:14:13] [I] [TRT] Some tactics do not have sufficient workspace memory to run. Increa
[07/01/2021-10:14:51] [I] [TRT] Detected 1 inputs and 3 output network tensors.
[07/01/2021-10:14:51] [W] [TRT] Current optimization profile is: 0. Please ensure there are no enqu
[07/01/2021-10:14:51] [I] Output:
[07/01/2021-10:14:51] [I] (row, col) = 233, 187
confidence = 0.996655
&&&& PASSED TensorRT.sample_onnx_sgdn # ./sample_onnx_sgdn
The result shows that the grasp point at position (233, 187) has the highest confidence, 0.996655. Because the central (320, 320) region of the image was cropped at the start, the predicted grasp point in the original image is at (233 + 80, 187 + 160) = (313, 347). The code does not yet parse the grasp angle and grasp width; I will update and release it later.
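The coordinate arithmetic above can be sketched as follows. The helper names are hypothetical, and the 480 x 640 original image size is an assumption inferred from the offsets 80 and 160; neither comes from the author's code.

```python
def crop_offset(image_shape, crop_size=320):
    """Return the (row, col) offset of a centered crop inside an image of shape (height, width)."""
    height, width = image_shape
    return (height - crop_size) // 2, (width - crop_size) // 2


def to_original(point, image_shape, crop_size=320):
    """Map a (row, col) point in the cropped image back to the original image."""
    row_offset, col_offset = crop_offset(image_shape, crop_size)
    return point[0] + row_offset, point[1] + col_offset


def best_grasp_point(confidence):
    """Return ((row, col), value) of the highest entry in a 2-D confidence map."""
    value, row, col = max(
        (value, row, col)
        for row, line in enumerate(confidence)
        for col, value in enumerate(line)
    )
    return (row, col), value


# Synthetic confidence map with a single peak at the position reported in the log
confidence = [[0.0] * 320 for _ in range(320)]
confidence[233][187] = 0.996655
point, value = best_grasp_point(confidence)
original = to_original(point, (480, 640))  # assumed 480 x 640 original image
```

With these inputs, the crop offset is (80, 160), so the point (233, 187) maps back to (313, 347).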
3. Summary of errors
Error 1:
When converting ONNX to TensorRT:
While parsing node number 360 [Resize]:
ERROR: ModelImporter.cpp:124 In function parseGraph:
[5] Assertion failed: ctx->tensors().count(inputName)
Solution: do not use bilinear interpolation; use nn.UpsamplingNearest2d((h,w)) instead.
Error 2:
[06/30/2021-17:22:19] [E] [TRT] Network has dynamic or shape inputs, but no optimization profile has been defined.
[06/30/2021-17:22:19] [E] [TRT] Network validation failed.
&&&& FAILED TensorRT.sample_onnx_sgdn # ./sample_onnx_sgdn
Solution: do not set dynamic_axes when exporting the PyTorch model to ONNX. To check that the export is correct, open the model in Netron: the network input shape should be (1, 3, 320, 320), not (batch_size, 3, 320, 320).
Error 3:
Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[06/30/2021-17:52:05] [I] [TRT] Detected 1 inputs and 3 output network tensors.
[06/30/2021-17:52:06] [W] [TRT] Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
Segmentation fault (core dumped)
Solution: the error came from reading the input as a binary file; read the txt file instead.
4. References
https://zhuanlan.zhihu.com/p/371239130
https://zhuanlan.zhihu.com/p/348301573