[TensorRT] Video Swin Transformer Deployment
2022-06-22 01:31:00 【MaxeeoveCR】
1. TensorRT (.engine) Python Interface Inference
Code:
```python
import copy

import numpy as np
import pycuda.driver as cuda
import tensorrt as trt


def allocate_buffers(engine):
    inputs = []
    outputs = []
    bindings = []
    stream = cuda.Stream()
    for binding in engine:
        size = trt.volume(engine.get_binding_shape(binding)) * engine.max_batch_size
        dtype = trt.nptype(engine.get_binding_dtype(binding))
        # Allocate host and device buffers
        host_mem = cuda.pagelocked_empty(size, dtype)
        device_mem = cuda.mem_alloc(host_mem.nbytes)
        # Append the device buffer to device bindings.
        bindings.append(int(device_mem))
        # Append to the appropriate list.
        if engine.binding_is_input(binding):
            inputs.append(HostDeviceMem(host_mem, device_mem))
        else:
            outputs.append(HostDeviceMem(host_mem, device_mem))
    return inputs, outputs, bindings, stream
```
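The `HostDeviceMem` container used above is never defined in the post; a minimal sketch of it, following the pattern from NVIDIA's TensorRT Python samples (a plain pair of the page-locked host array and its device allocation):

```python
class HostDeviceMem:
    """Pairs a page-locked host buffer with its device allocation."""

    def __init__(self, host_mem, device_mem):
        self.host = host_mem      # numpy array in pinned (page-locked) memory
        self.device = device_mem  # int device pointer from cuda.mem_alloc

    def __repr__(self):
        return "Host:\n" + str(self.host) + "\nDevice:\n" + str(self.device)
```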
````python
def do_inference(context, bindings, inputs, outputs, stream, ctx, batch_size=1):
    '''
    Push the CUDA context around the inference calls to avoid the error:
        [TensorRT] ERROR: ../rtSafe/cuda/reformat.cu (925) - Cuda Error in NCHWToNCHHW2: 400
        (invalid resource handle)
    Solution ref.: https://blog.csdn.net/yiyayi1/article/details/111314520
    ```
    ctx.push()
    {your inference code}
    ctx.pop()
    ```
    '''
    ctx.push()
    # Transfer input data to the GPU.
    [cuda.memcpy_htod_async(inp.device, inp.host, stream) for inp in inputs]
    # Run inference.
    context.execute_async(batch_size=batch_size, bindings=bindings, stream_handle=stream.handle)
    # Transfer predictions back from the GPU.
    [cuda.memcpy_dtoh_async(out.host, out.device, stream) for out in outputs]
    # Synchronize the stream.
    stream.synchronize()
    ctx.pop()
    # Return only the host outputs.
    return [out.host for out in outputs]
````
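`do_inference` assumes the caller has already flattened the batch into `inputs[0].host`. A GPU-free sketch of that copy step, using plain NumPy arrays in place of the page-locked buffer (the shape here is only illustrative of a small video clip, not the model's real input):

```python
import numpy as np

# Stand-in for the page-locked host buffer from allocate_buffers;
# a real Video Swin input might be e.g. (1, 3, 32, 224, 224) float32.
batch = np.random.rand(1, 3, 8, 64, 64).astype(np.float32)
host_buffer = np.empty(batch.size, dtype=np.float32)

# The host buffer is 1-D, so the batch must be flattened into it.
np.copyto(host_buffer, batch.ravel())
```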
```python
def inf_trt(engine_path, data_loader):
    # Initialize CUDA
    cuda.init()
    ctx = cuda.Device(0).make_context()
    # Load engine
    TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
    trt.init_libnvinfer_plugins(TRT_LOGGER, '')
    runtime = trt.Runtime(TRT_LOGGER)
    engine = None
    with open(engine_path, mode='rb') as f:
        engine_bytes = f.read()
        engine = runtime.deserialize_cuda_engine(engine_bytes)
    # Allocate input/output/stream buffers
    inputs, outputs, bindings, stream = allocate_buffers(engine)
    # Create execution context
    context = engine.create_execution_context()
    dataset = data_loader.dataset
    # TensorRT inference
    results = []
    for data_idx, data in enumerate(data_loader):
        # Copy the preprocessed batch into the page-locked input buffer
        # (data is assumed here to be a NumPy array of the engine's input shape).
        np.copyto(inputs[0].host, data.ravel())
        # Inference
        result_buffer = do_inference(context, bindings, inputs, outputs, stream, ctx)
        # Deep-copy: the host buffers are reused on the next iteration.
        result = copy.deepcopy(result_buffer)
        results.append(result)
    del engine
    ctx.pop()
    del context
    del stream
    del inputs
    del outputs
    return results
```
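Each entry in `results` is a list of flat 1-D host arrays, so outputs must be reshaped before use. A GPU-free sketch of the post-processing, assuming a classification head (the batch size and class count are illustrative; 400 matches Kinetics-400, a common Video Swin target):

```python
import numpy as np

num_classes = 400
flat_output = np.random.rand(1 * num_classes).astype(np.float32)  # stand-in for out.host

# Reshape the flat buffer to (batch, num_classes), then softmax over classes.
logits = flat_output.reshape(1, num_classes)
probs = np.exp(logits - logits.max(axis=1, keepdims=True))
probs /= probs.sum(axis=1, keepdims=True)
pred = int(probs.argmax(axis=1)[0])
```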
2. Common Problems (Many Pitfalls)
[TensorRT] ERROR: …/rtSafe/cuda/reformat.cu (925) - Cuda Error in NCHWToNCHHW2: 400 (invalid resource handle)
Solution
Reference: https://github.com/NVIDIA/TensorRT/issues/1107

```python
# Initialize
cuda.init()
ctx = cuda.Device(0).make_context()
...
# Inference
ctx.push()
{your inference code}
ctx.pop()
```
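To guarantee the pop happens even when inference raises, the push/pop pair can be wrapped in a context manager (a sketch; `ctx` is any object with `push()`/`pop()` methods, such as the PyCUDA context above):

```python
from contextlib import contextmanager


@contextmanager
def active_cuda_context(ctx):
    """Push ctx on entry and always pop it on exit, even on error."""
    ctx.push()
    try:
        yield ctx
    finally:
        ctx.pop()
```

Usage: `with active_cuda_context(ctx): ...your inference code...` replaces the manual push/pop pair.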
```
PyCUDA ERROR: The context stack was not empty upon module cleanup.
-------------------------------------------------------------------
A context was still active when the context stack was being
cleaned up. At this point in our execution, CUDA may already
have been deinitialized, so there is no way we can finish
cleanly. The program will be aborted now.
Use Context.pop() to avoid this problem.
```
Solution
Reference: https://github.com/NVIDIA/TensorRT/issues/1107

Add `ctx.pop()` after `del engine`; otherwise the error above is reported:

```python
del engine
ctx.pop()
del context
del stream
del inputs
del outputs
```
3. Follow-up
Kudos to TensorRT 8.x: the acceleration of Transformer modules really takes off.
More to be added …