Tencent releases a full-platform version of its inference framework TNN, supporting mobile, desktop and server at the same time
2022-06-24 16:20:00 【Youtu Laboratory】
TNN is Tencent's new-generation open-source, cross-platform deep learning inference framework. It is also an open-source collaboration of Yunfan, Tencent's deep learning and acceleration Oteam. Tencent Youtu Lab leads the project, which is co-developed with Tencent Light & Shadow Research Lab, the Tencent Cloud Architecture Platform Department, the Tencent Data Platform Department and other teams. After more than four months of iteration, TNN v0.3 has been officially released. It is the first full-platform open-source version that supports mobile, desktop and server at the same time. The new version further improves TNN's generality, ease of use and performance.
TNN repository:
https://github.com/Tencent/TNN
01
Generality
TNN keeps a single model format and a single interface. Within those constraints, it offers several acceleration options for mobile, desktop and server. These options build on the basic operators in the acceleration frameworks supplied by hardware vendors and on hand-written, optimized kernels. With them, TNN has optimized and adapted common CV and NLP models.
Hardware platform support
TNN integrates OpenVINO and TensorRT to add support for server X86 and NVIDIA hardware. This lets it pick up the latest optimizations from hardware vendors quickly. It can also add custom implementations tailored to the structure of a business model to reach peak performance. Desktop applications are sensitive to installation package size, so TNN also provides a lightweight X86 backend built with JIT and manual optimization. The whole library is only about 5 MB.
Model operator support
The new version extends TNN's model support beyond CV models to 3D-CNN, LSTM, BERT and other model types. The total number of operators grows from 88 to 107. New operators include LSTM, GridSample, Histogram, OneHot, BitShift, Gather, ScatterND, LayerNorm, GroupNorm, GELU, SoftSign and Erf.
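Several of the newly listed operators may be unfamiliar. The sketch below gives plain-Python reference semantics for three of them: GELU, SoftSign and LayerNorm. It follows the common textbook definitions (exact erf-based GELU, LayerNorm over a single feature vector). It only illustrates what these operators compute. It is not TNN's kernel code or API, and the function names are our own.

```python
import math
from typing import List


def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), where the Gaussian CDF Phi is expressed through erf."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def softsign(x: float) -> float:
    """SoftSign: x / (1 + |x|), a smooth activation bounded in (-1, 1)."""
    return x / (1.0 + abs(x))


def layer_norm(xs: List[float], gamma: List[float], beta: List[float],
               eps: float = 1e-5) -> List[float]:
    """LayerNorm over one feature vector: normalize to zero mean and unit
    variance, then apply the learned per-feature scale (gamma) and shift (beta)."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((v - mean) ** 2 for v in xs) / n  # population (biased) variance
    inv_std = 1.0 / math.sqrt(var + eps)
    return [g * (v - mean) * inv_std + b for v, g, b in zip(xs, gamma, beta)]


if __name__ == "__main__":
    print(gelu(1.0), softsign(1.0))
    print(layer_norm([1.0, 2.0, 3.0], [1.0, 1.0, 1.0], [0.0, 0.0, 0.0]))
```

GELU and Erf are closely related: many exported BERT graphs spell out GELU as the Erf-based expression above. That relationship suggests why both operators appear in the list.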
02
Ease of use
Dynamic dimension and preprocessing support
Earlier TNN versions mainly supported CV models. For those models, the network input is almost always four-dimensional NCHW, and the dimension values rarely change. In NLP scenarios, the same network may take inputs with anywhere from 0 to 6 dimensions, and each dimension's value changes with the input. TNN therefore adds a new interface for configuring input dimensions. It also makes many additions and improvements to operator, hardware and system support.
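The article does not show the new dimension-configuration interface itself. A common approach is to declare a minimum and a maximum size for every dimension of each input, then check each runtime input against that range. The sketch below assumes this design. `ShapeMap` and `check_input_shapes` are hypothetical names used only for illustration and do not come from TNN's API.

```python
from typing import Dict, List

# Hypothetical shape-range configuration (not TNN's API): each input name
# maps to a list of per-dimension sizes. A rank-0 (scalar) input is [].
ShapeMap = Dict[str, List[int]]


def check_input_shapes(actual: ShapeMap, min_shapes: ShapeMap,
                       max_shapes: ShapeMap) -> List[str]:
    """Return a list of problems; an empty list means every input fits its declared range."""
    problems = []
    for name, dims in actual.items():
        lo, hi = min_shapes.get(name), max_shapes.get(name)
        if lo is None or hi is None:
            problems.append(f"{name}: no shape range declared")
            continue
        # Rank must match exactly; only the sizes of individual dimensions may vary.
        if not (len(lo) == len(hi) == len(dims)):
            problems.append(f"{name}: rank {len(dims)} does not match declared rank {len(lo)}")
            continue
        for axis, (d, a, b) in enumerate(zip(dims, lo, hi)):
            if not a <= d <= b:
                problems.append(f"{name}: dim {axis} = {d} outside [{a}, {b}]")
    return problems


if __name__ == "__main__":
    # A BERT-style token input whose sequence length may vary from 1 to 256.
    mins, maxs = {"input_ids": [1, 1]}, {"input_ids": [1, 256]}
    print(check_input_shapes({"input_ids": [1, 128]}, mins, maxs))  # fits
    print(check_input_shapes({"input_ids": [1, 300]}, mins, maxs))  # too long
```

Declaring ranges up front also helps a backend: it could, for example, allocate memory for the largest allowed shape once instead of reallocating whenever the input changes.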
The Mat-related API interfaces have been expanded with a copy-fill function (CopyMakeBorder). This makes it easier for SDK developers to accelerate network preprocessing and post-processing. TNN currently supports Crop, Resize, color-space conversion (CvtColor), affine transformation (WarpAffine) and copy-fill (CopyMakeBorder).
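Of the operations listed, CopyMakeBorder is the new one. It works like OpenCV's operation of the same name: it copies an image into a larger canvas and fills the added border. The sketch below implements only the constant-value border on a single-channel image stored as nested lists. The function name and parameter order (top, bottom, left, right, value) are assumptions modeled on OpenCV, not TNN's Mat API.

```python
from typing import List

Image = List[List[int]]  # single-channel image, stored row-major


def copy_make_border_constant(src: Image, top: int, bottom: int,
                              left: int, right: int, value: int = 0) -> Image:
    """Copy src into a larger canvas, filling the new border with a constant value.

    Hypothetical helper modeled on OpenCV's constant-border mode; not TNN's API.
    """
    width = len(src[0]) if src else 0
    out_w = left + width + right
    out = [[value] * out_w for _ in range(top)]            # top border rows
    for row in src:
        out.append([value] * left + list(row) + [value] * right)
    out.extend([value] * out_w for _ in range(bottom))     # bottom border rows
    return out


if __name__ == "__main__":
    for row in copy_make_border_constant([[1, 2], [3, 4]], 1, 1, 1, 1):
        print(row)
```

A typical preprocessing use is letterboxing: resize an image while keeping its aspect ratio, then pad it to the fixed input size the network expects.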
Runtime constant folding
Exporting an ONNX model generates many glue operators that compute constants and shape information. TNN implements ConstFolder, a constant-folding feature that simplifies the model structure and improves runtime performance. Compared with the open-source community tool onnx-simplifier, ConstFolder adds support for operators exported in ATen form. It also folds constants at runtime, which models with variable dimensions require. At runtime, TNN extracts the operators that compute variable dimensions and runs them separately on the NAIVE (pure C++) device. This reduces the number of operators that each hardware device backend (ARM, Metal, OpenCL) has to implement.
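The idea behind runtime constant folding can be shown on a toy graph. Once the input's shape is bound at runtime, a chain such as Shape → Gather → Mul (typical ONNX glue for computing a Reshape target) depends only on known values and can be evaluated ahead of time. Only the nodes that need real tensor data are left. The sketch below assumes a simple list-of-nodes graph in topological order. It is an illustration of the technique, not TNN's ConstFolder or its IR.

```python
from typing import Any, Callable, Dict, List, Tuple

Node = Tuple[str, str, List[str]]  # (output name, op type, input names); toy IR

# Ops the toy folder can evaluate when all of their inputs are known.
FOLDABLE: Dict[str, Callable[..., Any]] = {
    "Add": lambda a, b: a + b,
    "Mul": lambda a, b: a * b,
    "Gather": lambda data, idx: data[idx],
}


def fold_constants(nodes: List[Node], consts: Dict[str, Any],
                   shapes: Dict[str, List[int]]) -> Tuple[List[Node], Dict[str, Any]]:
    """Evaluate every node whose inputs are all known; return (remaining nodes, known values).

    `shapes` holds tensor shapes known at this point (static, or bound at runtime),
    so Shape nodes can be folded even though the tensor data itself is unknown.
    """
    known = dict(consts)
    remaining = []
    for out, op, inputs in nodes:
        if op == "Shape" and inputs[0] in shapes:
            known[out] = list(shapes[inputs[0]])
        elif op in FOLDABLE and all(i in known for i in inputs):
            known[out] = FOLDABLE[op](*(known[i] for i in inputs))
        else:
            remaining.append((out, op, inputs))  # needs real data; keep for execution
    return remaining, known


if __name__ == "__main__":
    graph = [
        ("s", "Shape", ["x"]),
        ("seq", "Gather", ["s", "axis1"]),
        ("total", "Mul", ["seq", "hidden"]),
        ("y", "Reshape", ["x", "total"]),
    ]
    consts = {"axis1": 1, "hidden": 768}
    print(fold_constants(graph, consts, shapes={}))               # nothing folds
    print(fold_constants(graph, consts, shapes={"x": [1, 128]}))  # glue chain folds
```

Without a bound shape nothing folds. With `x` bound to `[1, 128]`, only the Reshape remains and its target size (128 × 768 = 98304) is already known. Re-running the fold whenever the input shape changes is what makes it a runtime, rather than export-time, optimization.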
Example showcase
With support from Tencent Light & Shadow Lab and the Tencent optical flow team, the hair-dyeing and body-pose TNN samples were released in an intermediate minor version and showed good algorithm results. This release adds a mobile Chinese OCR example and a desktop/server BERT reading-comprehension example.
The Chinese OCR example uses the chineseocr_lite models. It shows how to chain three models to recognize Chinese text: text-box position detection, text-box angle detection and character recognition.
The BERT reading-comprehension example uses the BERT-Squad10 model. It shows how to build a simple question-answering system by supplying the context and vocabulary in advance. The original post then demonstrated the results for hair dyeing, human pose, Chinese OCR and BERT reading comprehension, in that order.
03
Performance optimization
Mobile performance optimization
ARM performance optimization:
01
armv8.2 optimization: fp16 vector instructions are expected to deliver about twice the throughput of fp32. Most open-source frameworks support this only on arm64. TNN also implements fp16 instruction optimization for the arm32 architecture, so both 64-bit and 32-bit apps can use hardware fp16 vector acceleration;
02
int8 optimization: TNN applies a more aggressive fusion strategy to common operator blocks such as conv+add+activation. This effectively reduces quantization, dequantization and memory read/write overhead. Internal business validation showed that it improves performance without reducing accuracy.
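The int8 item above says that fusing conv+add+activation cuts quantization and dequantization overhead. A toy 1-D example shows where that overhead comes from. When each operator writes an int8 tensor, every operator boundary adds a quantize/dequantize pair, and each one rounds again. A fused block rescales the integer accumulator once, adds the residual, applies ReLU and requantizes a single time. The sketch assumes symmetric per-tensor scales and a float residual for simplicity. The scales and function names are illustrative only and are not TNN's quantization scheme.

```python
from typing import List


def quantize(xs: List[float], scale: float) -> List[int]:
    """Symmetric per-tensor int8 quantization, saturating to [-127, 127]."""
    return [max(-127, min(127, round(x / scale))) for x in xs]


def dequantize(qs: List[int], scale: float) -> List[float]:
    return [q * scale for q in qs]


def conv1d_acc(x_q: List[int], w_q: List[int]) -> List[int]:
    """Valid 1-D correlation of int8 values; real kernels keep this sum in an int32 accumulator."""
    k = len(w_q)
    return [sum(x_q[i + j] * w_q[j] for j in range(k)) for i in range(len(x_q) - k + 1)]


def fused_conv_add_relu(x_q, w_q, sx, sw, residual, s_out) -> List[int]:
    """Fused block: one rescale of the accumulator and a single requantization at the end."""
    acc = conv1d_acc(x_q, w_q)
    y = [a * sx * sw + r for a, r in zip(acc, residual)]  # rescale, then add residual
    return quantize([max(0.0, v) for v in y], s_out)       # ReLU, then quantize once


def unfused_conv_add_relu(x_q, w_q, sx, sw, residual, s_conv, s_add, s_out) -> List[int]:
    """Unfused: conv, add and ReLU each write int8, so the data is requantized three times."""
    conv_q = quantize([a * sx * sw for a in conv1d_acc(x_q, w_q)], s_conv)
    add_q = quantize([c + r for c, r in zip(dequantize(conv_q, s_conv), residual)], s_add)
    return quantize([max(0.0, v) for v in dequantize(add_q, s_add)], s_out)


if __name__ == "__main__":
    x_q, w_q, residual = [10, 20, 30, 40], [1, 2, 1], [0.5, -0.3]
    print(fused_conv_add_relu(x_q, w_q, 0.1, 0.01, residual, s_out=0.01))
    print(unfused_conv_add_relu(x_q, w_q, 0.1, 0.01, residual,
                                s_conv=0.05, s_add=0.01, s_out=0.01))
```

Here the float result of the first output is 0.58. The fused path quantizes it to 58. The unfused path rounds the conv output to a coarse intermediate scale first and ends at 60. In a real kernel the fused version also avoids writing two intermediate tensors to memory and reading them back.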
OpenCL performance optimization:
01
Core convolution optimization:
a. Memory access optimization: channel blocking and local memory optimizations improve memory-access performance and let work items in a work group share data;
b. Compute optimization: the Winograd algorithm accelerates 3x3 convolutions. Addressing calculations are also optimized so that neighboring compute cells share offsets held in vector registers, reducing pressure on the fp32 compute units;
02
Work-group size optimization: the compute strategy is optimized, and Auto-Tuning selects the best work-group size;
03
Preprocessing/post-processing optimization: buffers cache parameters to reduce GPU copy overhead.
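The Winograd item in the OpenCL list above rests on the minimal-filtering identity F(2,3). It computes two outputs of a 3-tap filter from a 4-element input tile using 4 multiplications instead of 6. The sketch below shows the 1-D form in plain Python and checks it against direct correlation. The tiling helper and names are our own. TNN's OpenCL kernels apply the 2-D variant, whose exact implementation is not shown in the article.

```python
from typing import List


def winograd_f23(d: List[float], g: List[float]) -> List[float]:
    """Two outputs of a 3-tap correlation from a 4-element tile with 4 multiplies."""
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    # Filter transform: depends only on g, so it can be precomputed once per filter.
    gp = (g0 + g1 + g2) / 2
    gm = (g0 - g1 + g2) / 2
    m1 = (d0 - d2) * g0
    m2 = (d1 + d2) * gp
    m3 = (d2 - d1) * gm
    m4 = (d1 - d3) * g2
    return [m1 + m2 + m3, m2 - m3 - m4]


def conv1d_winograd(x: List[float], g: List[float]) -> List[float]:
    """Valid 3-tap correlation over x, using overlapping 4-element tiles with stride 2."""
    n_out = len(x) - 2
    out: List[float] = []
    for i in range(0, n_out, 2):
        tile = x[i:i + 4]
        if len(tile) < 4:  # odd output count: zero-pad the last tile, drop its extra output
            tile = tile + [0.0] * (4 - len(tile))
        out.extend(winograd_f23(tile, g))
    return out[:n_out]


def conv1d_direct(x: List[float], g: List[float]) -> List[float]:
    return [sum(x[i + k] * g[k] for k in range(3)) for i in range(len(x) - 2)]


if __name__ == "__main__":
    x, g = [3.0, 1.0, 4.0, 1.0, 5.0], [2.0, 7.0, 1.0]
    print(conv1d_winograd(x, g), conv1d_direct(x, g))
```

For 3x3 convolutions, the 2-D form F(2x2, 3x3) nests this transform along both axes. It then needs 16 multiplications per 2x2 output tile instead of 36. The saving comes with extra additions and slightly different floating-point rounding.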
Desktop/server performance optimization
On the server side, TNN integrates OpenVINO and TensorRT to add support for server X86 and NVIDIA hardware. This lets it pick up the latest optimizations from hardware vendors quickly. It can also add custom implementations tailored to the structure of a business model to reach peak performance. Compared with the best-performing build of onnxruntime, the industry's unified server-side framework, TNN currently has some advantages on CV models, while onnxruntime has some advantages on NLP models. TNN has only just started supporting NLP models, and we will continue to optimize them.
On the desktop, TNN needs to balance high performance against hardware compatibility while respecting app package-size limits. To do this, it implements a lightweight X86 backend through JIT and manual optimization, supporting instruction sets such as SSE4.1, SSE4.2, AVX, AVX2 and FMA. The onnxruntime server library is about 80 MB, while the whole TNN desktop library is only about 5 MB, with a performance gap within 20%.
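One common way for a single small binary to support several X86 instruction sets is to detect the CPU's features at startup and dispatch to the most capable code path available. The article does not say how TNN makes this choice, so the sketch below is an assumption that shows only the selection step. The kernel names are hypothetical, and the feature set is passed in directly rather than read via CPUID. JIT code generation would come after this step and is not shown.

```python
from typing import List, Set, Tuple

# Hypothetical kernel table, ordered from most to least capable.
# Each entry names a code path and the ISA extensions it requires.
KERNELS: List[Tuple[str, Set[str]]] = [
    ("avx2_fma", {"AVX2", "FMA"}),
    ("avx", {"AVX"}),
    ("sse42", {"SSE4.2"}),
    ("sse41", {"SSE4.1"}),
    ("naive", set()),  # portable fallback; requires nothing
]


def select_kernel(cpu_features: Set[str]) -> str:
    """Return the first (most capable) kernel whose required features are all present."""
    for name, required in KERNELS:
        if required <= cpu_features:  # subset test
            return name
    raise RuntimeError("kernel table must end with a fallback that requires no features")


if __name__ == "__main__":
    print(select_kernel({"SSE4.1", "SSE4.2", "AVX", "AVX2", "FMA"}))
    print(select_kernel({"SSE4.1", "SSE4.2", "AVX", "AVX2"}))  # AVX2 without FMA
    print(select_kernel(set()))
```

Ordering the table from most to least capable keeps the selection rule simple: the first match wins, and the empty-requirement fallback guarantees that some path always runs.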
04
Conclusion
TNN aims to be an inference framework that supports every platform. Together with our partners, we will keep delivering adaptation and optimization for each hardware platform (ARM, X86, NVIDIA, etc.). Stay tuned!