Tencent releases a full-platform version of its TNN inference framework, supporting mobile, desktop, and server at the same time

2022-06-24 16:20:00 【Youtu Laboratory】

TNN is Tencent's new-generation open-source, cross-platform deep learning inference framework. It is also an open-source collaboration result of Yunfan, Tencent's deep learning and acceleration Oteam, led by Tencent Youtu Lab and developed jointly with Tencent Light and Shadow Research Lab, the Tencent Cloud Architecture Platform Department, the Tencent Data Platform Department, and other teams. After more than four months of iteration, the new TNN version, v0.3, has been officially released. It is the first open-source full-platform version to support mobile, desktop, and server at the same time. The new version further improves generality, ease of use, and performance.

TNN repository:

https://github.com/Tencent/TNN

Generality

While keeping the model format and the interface unified, TNN provides a range of acceleration options for mobile, desktop, and server. It does so by relying on the basic operator support of the acceleration frameworks provided by hardware vendors and on hand-written kernel optimizations. This enables optimization and adaptation of common CV and NLP models.

Hardware platform support

By integrating OpenVINO and TensorRT, TNN adds support for server-side X86 and NVIDIA hardware. This lets it pick up the latest optimizations from hardware vendors quickly, while still allowing custom implementations based on the structure of a business model to reach peak performance. Because desktop applications limit the size of the installation package, TNN also implements a lightweight X86 backend through JIT and manual optimization; the overall library size is only about 5 MB.

Model operator support

Building on its support for CV models, the new version of TNN extends to 3D-CNN, LSTM, BERT, and other model types. The total number of operators increases from 88 to 107. The new operators include LSTM, GridSample, Histogram, OneHot, BitShift, Gather, ScatterND, LayerNorm, GroupNorm, GELU, SoftSign, Erf, and others.

Ease of use

Dynamic dimension and preprocessing support

Earlier versions of TNN mainly supported CV models, whose network inputs are essentially 4-dimensional NCHW and whose dimension values rarely change. In NLP scenarios, by contrast, the same network may take inputs with 0 to 6 dimensions, and the value of each dimension changes with the input. TNN therefore adds an input-dimension configuration interface, along with extensive additions and improvements in model operators, hardware, and system support.

The Mat-related API has been extended, including a border-padding function (CopyMakeBorder), which makes it easier for SDK developers to accelerate network preprocessing and postprocessing. TNN currently supports cropping (Crop), scaling (Resize), color space conversion (CvtColor), affine transformation (WarpAffine), and border padding (CopyMakeBorder).
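To make the border-padding operation concrete, here is a minimal pure-Python sketch of constant-value padding on a 2D image. The function name `copy_make_border` and its parameters are illustrative assumptions and do not reflect the signature of TNN's CopyMakeBorder; the sketch only shows what the operation computes.

```python
def copy_make_border(image, top, bottom, left, right, value=0):
    """Pad a 2D image (a list of rows) with a constant-valued border.

    Hypothetical helper for illustration; not TNN's actual API.
    """
    width = len(image[0]) + left + right
    # Rows made entirely of the border value go above the image.
    padded = [[value] * width for _ in range(top)]
    # Each original row gets border values on its left and right.
    for row in image:
        padded.append([value] * left + list(row) + [value] * right)
    # Rows made entirely of the border value go below the image.
    padded.extend([value] * width for _ in range(bottom))
    return padded


image = [[1, 2],
         [3, 4]]
padded = copy_make_border(image, top=1, bottom=1, left=2, right=0)
for row in padded:
    print(row)
```

Padding of this kind is typically used to bring inputs of different sizes to the fixed shape a network expects without distorting the aspect ratio.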

Runtime constant folding

When a model is exported to ONNX, many glue operators are generated to compute constants and data shape information. TNN implements ConstFolder, a constant folding feature that simplifies the model structure and improves the model's runtime performance. Compared with the open-source community tool onnx-simplifier, ConstFolder adds support for operators exported in ATen form, and it also supports folding runtime constants to meet the needs of models with variable dimensions. At run time, TNN extracts the operators that perform variable-dimension computation and executes them separately on NAIVE (pure C++), which reduces the operator implementation burden on the hardware devices (ARM, Metal, OpenCL).
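The idea of folding shape-related glue operators can be sketched on a toy graph. The sketch below is not TNN's ConstFolder: the tuple-based graph format, the `fold_constants` function, and the `Pack` operator are assumptions made up for illustration. It shows that nothing folds while the input shape is unknown, and that the whole shape-computation chain folds once the shape is known at run time.

```python
# Toy operator set; "Pack" is a made-up op that builds a list from scalars.
FOLDABLE = {
    "Gather": lambda data, index: data[index],
    "Pack": lambda *values: list(values),
}


def fold_constants(nodes, constants, input_shapes=None):
    """Fold nodes whose inputs are all known.

    nodes must be in topological order, each a tuple
    (op, input_names, output_name). Returns (remaining_nodes, known_values).
    """
    known = dict(constants)
    shapes = input_shapes or {}
    remaining = []
    for op, inputs, output in nodes:
        if op == "Shape" and inputs[0] in shapes:
            # The input shape is known, so Shape becomes a constant.
            known[output] = list(shapes[inputs[0]])
        elif op in FOLDABLE and all(name in known for name in inputs):
            known[output] = FOLDABLE[op](*(known[name] for name in inputs))
        else:
            remaining.append((op, inputs, output))
    return remaining, known


graph = [
    ("Shape", ["x"], "x_shape"),
    ("Gather", ["x_shape", "axis0"], "batch"),
    ("Pack", ["batch", "minus_one"], "target_shape"),
    ("Reshape", ["x", "target_shape"], "y"),
]
constants = {"axis0": 0, "minus_one": -1}

# Without an input shape, every node depends on "x" and nothing folds.
static_nodes, _ = fold_constants(graph, constants)

# With the shape known at run time, only Reshape is left to execute.
runtime_nodes, runtime_values = fold_constants(
    graph, constants, input_shapes={"x": (8, 3, 224, 224)}
)
print(len(static_nodes), len(runtime_nodes), runtime_values["target_shape"])
```

Re-running the fold whenever the input shape changes keeps the remaining graph small, which is the motivation the article gives for handling variable dimensions at run time.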

Examples

With support from the Tencent Light and Shadow and Tencent Optical Flow teams, TNN examples for hair coloring and human pose were already released in an intermediate minor version and show good algorithm results. With this release, we add a Chinese OCR example for mobile and a BERT reading comprehension example for desktop and backend servers.

The Chinese OCR example uses the chineseocr lite model. It shows how to chain three models (text box detection, text box angle detection, and character recognition) to recognize Chinese text.

The BERT reading comprehension example uses the BERT-Squad10 model. It shows how to build a simple question answering system from a context and vocabulary supplied in advance. Shown below, in order, are the results for hair coloring, human pose, Chinese OCR, and BERT reading comprehension.

Performance optimization

Mobile performance optimization

Arm performance optimization:

armv8.2 optimization: fp16 vector instructions are optimized, doubling theoretical performance relative to fp32. Besides supporting arm64 as most open-source frameworks do, TNN also implements fp16 instruction optimization for the arm32 architecture, so that both 64-bit and 32-bit apps can use hardware fp16 vector acceleration;

int8 optimization: a more aggressive fusion strategy is applied to common operator blocks, such as conv+add+activation. This effectively reduces the overhead of quantization, dequantization, and memory reads and writes. Internal business validation shows that it improves performance without reducing accuracy.
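The benefit of fusing conv+add+activation in int8 can be illustrated with a scalar toy example. This is a sketch under stated assumptions, not TNN's kernel: the "convolution" is a plain integer dot product, the activation is assumed to be ReLU, and all scales (`sx`, `sw`, `sr`, `s_mid`, `s_out`) are made-up values. The unfused path requantizes to int8 after every operator, while the fused path stays in the wide accumulator domain and requantizes once.

```python
def quantize(value, scale):
    """Round to the nearest int8 step and clamp to [-128, 127]."""
    return max(-128, min(127, round(value / scale)))


def dequantize(q, scale):
    return q * scale


def conv_acc(x_q, w_q):
    """Integer dot product standing in for the convolution's int32 accumulator."""
    return sum(a * b for a, b in zip(x_q, w_q))


def unfused(x_q, w_q, r_q, sx, sw, sr, s_mid, s_out):
    # conv -> int8, add -> int8, ReLU -> int8: three requantizations.
    conv_q = quantize(conv_acc(x_q, w_q) * sx * sw, s_mid)
    add_q = quantize(dequantize(conv_q, s_mid) + dequantize(r_q, sr), s_mid)
    return quantize(max(0.0, dequantize(add_q, s_mid)), s_out)


def fused(x_q, w_q, r_q, sx, sw, sr, s_out):
    # Add and ReLU are applied to the accumulator: one requantization.
    acc = conv_acc(x_q, w_q) * sx * sw + dequantize(r_q, sr)
    return quantize(max(0.0, acc), s_out)


# Made-up int8 inputs, weights, residual, and scales.
x_q, w_q, r_q = [10, -20, 30], [5, 4, 3], 7
sx, sw, sr, s_mid, s_out = 0.1, 0.05, 0.1, 0.08, 0.02

reference = max(0.0, conv_acc(x_q, w_q) * sx * sw + dequantize(r_q, sr))
fused_q = fused(x_q, w_q, r_q, sx, sw, sr, s_out)
unfused_q = unfused(x_q, w_q, r_q, sx, sw, sr, s_mid, s_out)
print(reference, dequantize(fused_q, s_out), dequantize(unfused_q, s_out))
```

Each intermediate requantization adds a rounding step and an int8 tensor that must be written to and read back from memory, so removing them saves both work and rounding error in this toy setting.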

OpenCL performance optimization:

Core convolution optimization:

a. Memory access optimization: channel blocking and local memory optimizations improve memory access performance and enable data sharing within a work group;

b. Compute optimization: the Winograd algorithm accelerates 3x3 convolution, and address computation is optimized so that the offsets of adjacent compute grid points share vector registers, reducing pressure on the fp32 compute units;
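As background on the Winograd item, the sketch below shows the standard 1D building block F(2,3), which produces two outputs of a 3-tap filter with 4 multiplications instead of 6. The 2D variant used for 3x3 convolution nests this transform along both axes. This is the textbook formulation, not TNN's OpenCL kernel, and the function names are illustrative.

```python
def direct_f23(d, g):
    """Two outputs of a 3-tap filter computed directly (6 multiplications)."""
    return [
        d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
        d[1] * g[0] + d[2] * g[1] + d[3] * g[2],
    ]


def winograd_f23(d, g):
    """The same two outputs with 4 multiplications (Winograd F(2,3)).

    The filter-side terms depend only on g, so in a real kernel they are
    computed once per filter and reused for every input tile.
    """
    g_sum = (g[0] + g[1] + g[2]) / 2
    g_diff = (g[0] - g[1] + g[2]) / 2
    m1 = (d[0] - d[2]) * g[0]
    m2 = (d[1] + d[2]) * g_sum
    m3 = (d[2] - d[1]) * g_diff
    m4 = (d[1] - d[3]) * g[2]
    return [m1 + m2 + m3, m2 - m3 - m4]


d = [1.0, 2.0, 3.0, 4.0]   # four consecutive input values
g = [1.0, 2.0, 3.0]        # three filter taps
print(direct_f23(d, g), winograd_f23(d, g))
```

The saving comes at the cost of extra additions and some loss of numerical precision, which is why Winograd is usually applied only to small filters such as 3x3.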

Work-group size optimization: the computation strategy is optimized, and Auto-Tuning selects the best work-group size;

Preprocessing / postprocessing optimization: buffers are used to cache parameters, reducing GPU copy overhead.
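The auto-tuning of work-group size mentioned above can be sketched as a search over candidates that keeps the one with the lowest measured cost. This is a generic illustration, not TNN's tuner: `auto_tune`, the candidate list, and the `fake_cost` model are all hypothetical. In a real setting, `measure` would launch the OpenCL kernel with the given work-group size and return the elapsed time.

```python
import statistics


def auto_tune(candidates, measure, repeats=3):
    """Return the candidate whose median measured cost is lowest.

    measure(candidate) should return one cost sample, for example the
    elapsed time of one kernel launch. The median damps outliers.
    """
    def median_cost(candidate):
        return statistics.median(measure(candidate) for _ in range(repeats))

    return min(candidates, key=median_cost)


def fake_cost(work_group):
    """Made-up cost model that stands in for timing a real kernel."""
    width, height = work_group
    return abs(width * height - 64) + 0.1 * abs(width - height)


candidates = [(4, 4), (8, 8), (16, 16), (32, 8)]
best = auto_tune(candidates, fake_cost)
print(best)
```

Because the best size depends on the device and on the kernel, the result of such a search is usually cached per device and per layer shape so that the tuning cost is paid only once.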

Desktop / server performance optimization

TNN server side: by integrating OpenVINO and TensorRT, TNN adds support for server-side X86 and NVIDIA hardware. This lets it pick up the latest optimizations from hardware vendors quickly, while still allowing custom implementations based on the structure of a business model to reach peak performance. Compared with the best-performing version of onnxruntime, the industry's unified server-side framework, TNN currently has some advantage on CV models, while onnxruntime has some advantage on NLP models. TNN has only just begun to support NLP models, and we will continue to optimize them.

TNN desktop side: to balance high performance with hardware compatibility, and to respect application limits on package size, TNN implements a lightweight X86 backend through JIT and manual optimization, supporting instruction sets such as SSE41, SSE42, AVX, AVX2, and FMA. Compared with the 80 MB onnxruntime server library, the TNN desktop library is only about 5 MB overall, and the performance gap is within 20%.

Conclusion

TNN aims to be an AI inference framework with full-platform support. Working with our partners, we will continue to deliver adaptation and optimization for each hardware platform (ARM, X86, NVIDIA, and others). Stay tuned!

