Tencent releases the all-platform version of the TNN inference framework, supporting mobile, desktop, and server at the same time
2022-06-24 16:20:00 【Youtu Laboratory】
TNN is Tencent's new-generation open-source, cross-platform deep learning inference framework. It is also an open-source collaboration result of Yunfan, Tencent's deep learning and acceleration Oteam, led by Tencent Youtu Lab and developed jointly with Tencent Light & Shadow Research Lab, the Tencent Cloud Architecture Platform Department, the Tencent Data Platform Department, and other teams. After more than four months of iteration, the new TNN version v0.3 has been officially released. It is the first all-platform open-source version to support mobile, desktop, and server at the same time. The new version further improves generality, ease of use, and performance.
TNN repository:
https://github.com/Tencent/TNN
01
Generality
While keeping the model format and the interface unified, TNN combines the basic operator support of the acceleration frameworks provided by hardware vendors with hand-written kernel optimization. This gives mobile, desktop, and server several different acceleration options, and covers the optimization and adaptation of common CV and NLP models.
Hardware platform support
By integrating OpenVINO and TensorRT, TNN adds server-side support for X86 and NVIDIA hardware. This lets it pick up hardware vendors' latest optimizations quickly, while still allowing custom implementations tailored to the structure of a business model for maximum performance. Because desktop applications are constrained in installation package size, TNN also implements a lightweight X86 backend through JIT and manual optimization; the overall library size is only about 5MB.
Model operator support
In the new version, TNN's model support extends beyond CV models to 3D-CNN, LSTM, BERT, and others. The total number of operators grows from 88 to 107. The new operators include LSTM, GridSample, Histogram, OneHot, BitShift, Gather, ScatterND, LayerNorm, GroupNorm, GELU, SoftSign, Erf, and more.
02
Ease of use
Dynamic dimension and preprocessing support
Earlier TNN versions mainly supported CV models, whose network input is basically 4-dimensional NCHW with dimension values that stay largely fixed. In NLP scenarios, the same network can take inputs of 0 to 6 dimensions, and the value of each dimension changes with the input. TNN therefore adds an interface for configuring input dimensions, along with many additions and improvements in model operators, hardware, and system support.
On the API side, the Mat-related interfaces have been extended, including a copy-fill function (CopyMakeBorder), to help SDK developers accelerate network preprocessing and post-processing. TNN currently supports cropping (Crop), scaling (Resize), color space conversion (CvtColor), affine transformation (WarpAffine), copy fill (CopyMakeBorder), and other operations.
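To make the copy-fill operation concrete, here is a minimal pure-Python sketch of constant-value border padding. It is not TNN's Mat API: the function name `copy_make_border` and its parameters are hypothetical and chosen only for illustration, and a real implementation would work on image buffers rather than nested lists.

```python
def copy_make_border(image, top, bottom, left, right, value=0):
    """Return a new 2-D image padded with a constant border.

    Hypothetical helper for illustration only; not the TNN Mat API.
    `image` is a list of equal-length rows.
    """
    width = len(image[0]) + left + right
    # Rows added above the image.
    padded = [[value] * width for _ in range(top)]
    # Original rows, padded on the left and right.
    for row in image:
        padded.append([value] * left + list(row) + [value] * right)
    # Rows added below the image.
    padded.extend([value] * width for _ in range(bottom))
    return padded


image = [[1, 2],
         [3, 4]]
padded = copy_make_border(image, top=1, bottom=1, left=2, right=0)
for row in padded:
    print(row)
```

Padding of this kind is typically used to bring an input to the size a network expects without distorting its aspect ratio.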
Runtime constant folding
Exporting a model to onnx generates many glue operators that compute constants and data shape information. TNN implements ConstFolder, a constant-folding feature that simplifies the model structure and improves runtime performance. Compared with the open-source community tool onnx-simplifier, ConstFolder adds support for operators exported in ATen form, and it also folds constants at run time to meet the needs of models with variable dimensions. At run time, TNN extracts the operators in the variable-dimension computation part and executes them separately on the NAIVE (pure C++) device, which reduces the operator implementation burden on the hardware devices (ARM, Metal, OpenCL).
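The idea of folding constants ahead of time versus at run time can be sketched on a toy graph. This is not ConstFolder's implementation: the graph format, the `fold_constants` function, and the two-operator table below are all assumptions made for illustration.

```python
import operator

# Hypothetical toy operator set; a real framework has many more.
OPS = {"Add": operator.add, "Mul": operator.mul}


def fold_constants(nodes, constants):
    """Evaluate every node whose inputs are all known constants.

    `nodes` is a topologically ordered list of (output, op, inputs).
    Returns the nodes that must still run plus the enlarged constant table.
    """
    known = dict(constants)
    remaining = []
    for output, op, inputs in nodes:
        if op in OPS and all(name in known for name in inputs):
            known[output] = OPS[op](*(known[name] for name in inputs))
        else:
            remaining.append((output, op, inputs))
    return remaining, known


graph = [
    ("area", "Mul", ["height", "width"]),   # shape arithmetic ("glue")
    ("size", "Mul", ["area", "channels"]),  # depends only on shape values
    ("y", "Add", ["x", "size"]),            # needs the real input x
]

# Ahead of time only `channels` is fixed, so nothing can be folded.
remaining_static, _ = fold_constants(graph, {"channels": 3})
print(len(remaining_static))

# At run time the input shape is known, so the glue nodes fold away.
remaining, known = fold_constants(
    graph, {"channels": 3, "height": 4, "width": 5}
)
print(remaining, known["size"])
```

In this sketch the shape-dependent nodes can only be folded once the input dimensions are known, which is why variable-dimension models need folding at run time rather than only at export time.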
Examples
With the support of the Tencent Light & Shadow and Tencent Optical Flow teams, the hair-dyeing and human-pose TNN examples were already released in an intermediate minor version, where they showed good algorithm results. With this release we add a mobile Chinese OCR example and a desktop/backend BERT reading comprehension example.
The Chinese OCR example uses the chineseocr lite model and shows how to chain three models (text box detection, text box angle detection, and character recognition) to recognize Chinese text;
The BERT reading comprehension example uses the BERT-Squad10 model and shows how to build a simple question answering system by supplying the context and vocabulary in advance. The results of hair dyeing, human pose, Chinese OCR, and BERT reading comprehension are shown below, in that order.
03
Performance optimization
Mobile performance optimization
Arm performance optimization :
01
armv8.2 optimization: fp16 vector instruction optimization, with an expected doubling of performance over fp32. Besides arm64, which most open-source frameworks also support, fp16 instruction optimization is implemented for the arm32 architecture too, so that both 64-bit and 32-bit apps can use hardware fp16 vector acceleration;
02
int8 optimization: a more aggressive fusion strategy is applied to common operator blocks, such as conv+add+activation. This effectively reduces the overhead of quantization, dequantization, and memory reads and writes. Validation on internal workloads shows that performance improves without a loss of accuracy.
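Why fusing conv+add+activation saves quantization work can be illustrated with a toy model that simply counts conversions between int8 and floating point. This is a sketch under stated assumptions, not TNN's kernel: the symmetric per-tensor quantization scheme, the dot product standing in for convolution, ReLU as the activation, and every function name below are hypothetical.

```python
conversions = {"count": 0}


def quantize(x, scale):
    """Symmetric int8 quantization: round to the nearest step, then clamp."""
    conversions["count"] += 1
    return max(-128, min(127, round(x / scale)))


def dequantize(q, scale):
    """Map an integer value back to floating point."""
    conversions["count"] += 1
    return q * scale


def conv_acc(q_in, q_w):
    """Stand-in for convolution: an integer dot product."""
    return sum(a * b for a, b in zip(q_in, q_w))


def unfused(q_in, q_w, q_res, s_in, s_w, s_res, s_mid, s_out):
    # conv: integer accumulator -> float -> int8
    q_conv = quantize(dequantize(conv_acc(q_in, q_w), s_in * s_w), s_mid)
    # add: both int8 operands -> float -> int8
    q_sum = quantize(
        dequantize(q_conv, s_mid) + dequantize(q_res, s_res), s_mid
    )
    # activation (ReLU): int8 -> float -> int8
    return quantize(max(0.0, dequantize(q_sum, s_mid)), s_out)


def fused(q_in, q_w, q_res, s_in, s_w, s_res, s_out):
    # One pass: stay in float between the three steps, quantize once.
    acc = dequantize(conv_acc(q_in, q_w), s_in * s_w)
    return quantize(max(0.0, acc + dequantize(q_res, s_res)), s_out)


args = ([10, -20, 30], [3, 4, 5], 40, 0.1, 0.02, 0.05)

conversions["count"] = 0
y_unfused = unfused(*args, s_mid=0.05, s_out=0.05)
unfused_count = conversions["count"]

conversions["count"] = 0
y_fused = fused(*args, s_out=0.05)
fused_count = conversions["count"]

print(unfused_count, fused_count)
```

In this toy model the unfused path converts seven times and the fused path three times; the intermediate int8 tensors that the unfused path would write to and read from memory also disappear.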
OpenCL performance optimization :
01
Core convolution optimization :
a. Memory access optimization: channel blocking and local memory optimizations improve memory access performance and enable data sharing within a work group;
b. Compute optimization: the Winograd algorithm accelerates 3x3 convolution; address computation is optimized so that the offsets of adjacent compute grid points share vector registers, reducing pressure on the fp32 compute units;
02
Work-group size optimization: the computation strategy is optimized, and Auto-Tuning selects the best work-group size;
03
Preprocessing / post-processing optimization: a buffer is used to cache parameters, reducing GPU copy overhead.
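The Winograd optimization mentioned for 3x3 convolution can be illustrated in one dimension. The sketch below implements the standard F(2,3) form, which produces two outputs of a 3-tap filter with 4 multiplications instead of 6; the 3x3 case applies the same transform along both axes (16 multiplications instead of 36 per 2x2 output tile). The tile size and function names are assumptions for illustration, and this is plain Python rather than TNN's OpenCL kernel.

```python
def winograd_f23(d, g):
    """Two outputs of a 3-tap filter over four inputs using 4 multiplies."""
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    # Filter transform: computed once per filter and reused for every tile.
    u1 = (g0 + g1 + g2) / 2
    u2 = (g0 - g1 + g2) / 2
    # The four element-wise multiplications.
    m1 = (d0 - d2) * g0
    m2 = (d1 + d2) * u1
    m3 = (d2 - d1) * u2
    m4 = (d1 - d3) * g2
    # Output transform: additions only.
    return [m1 + m2 + m3, m2 - m3 - m4]


def direct(d, g):
    """Reference: the same two outputs with 6 multiplies."""
    return [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]


d = [1.0, 2.0, 3.0, 4.0]
g = [0.5, -1.0, 2.0]
fast = winograd_f23(d, g)
reference = direct(d, g)
print(fast, reference)
```

The saving comes from trading multiplications for additions, which is why the technique pays off on hardware where multiply throughput is the bottleneck.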
Desktop / server performance optimization
On the server side, TNN integrates OpenVINO and TensorRT to add support for X86 and NVIDIA hardware. This lets it pick up hardware vendors' latest optimizations quickly, while still allowing custom implementations tailored to the structure of a business model for maximum performance. Compared with the best-performing version of onnxruntime, the industry's unified server-side framework, TNN currently has some advantage on CV models, while onnxruntime has some advantage on NLP models. TNN has only just started to support NLP models, and we will keep optimizing them.
On the desktop side, TNN has to balance high performance with hardware compatibility while also respecting the package-size limits of applications. It implements a lightweight X86 backend through JIT and manual optimization, supporting the SSE4.1, SSE4.2, AVX, AVX2, and FMA instruction sets, among others. Compared with the 80MB onnxruntime server library, the TNN desktop library is only about 5MB overall, and the performance gap is within 20%.
04
Conclusion
TNN's goal is to be an AI inference framework that supports all platforms. Together with our partners, we will keep delivering adaptation and optimization for the various hardware platforms (ARM, X86, NVIDIA, etc.). Stay tuned!