Best Practices for the AUTO Plugin and Automatic Batching in OpenVINO™ 2022.1
2022-06-23 13:36:00 [Intel Edge Computing Community]
OpenVINO™ 2022.1 is one of the biggest updates since OpenVINO™ was first released in 2018; see "OpenVINO ushers in its most significant update to date: a first look at the new features in 2022.1". Among the many new features, the AUTO plugin and Automatic Batching are two of the most important: they help developers improve inference performance and efficiency without complex programming.
What is the AUTO plugin?
The AUTO plugin [1], whose full name is Automatic Device Selection, is a virtual plugin built on top of the CPU/GPU plugins, as shown in Figure 1-1. In the OpenVINO™ documentation, a "device" is an Intel processor used for inference; it can be a CPU, a GPU, a VPU (Vision Processing Unit), a GNA (Gaussian Neural Accelerator coprocessor), or a combination of these devices [3].

Figure 1-1 Device plugins supported by OpenVINO™ Runtime [3]
The benefits of the AUTO plugin are:
- It first checks all the compute devices available on the runtime platform, then selects the best device for inference and, based on the deep learning model and the characteristics of the selected device, uses it with the best configuration.
- It gives the GPU a faster first-inference latency: the GPU plugin has to compile the model online at runtime before inference can start, which may take around 10 seconds depending on platform performance and model complexity. When a discrete or integrated GPU is selected, the AUTO plugin first runs inference on the CPU to hide this GPU model-compilation time.
- It is easy to use: developers only need to set the device_name parameter of the compile_model() method to "AUTO", as shown in Figure 1-2 and in the sketch below.

Figure 1-2 Specifying the AUTO plugin
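As a minimal sketch (assuming an IR model saved as model.xml and a dummy input shape, neither of which comes from the original article), specifying the AUTO plugin with the OpenVINO™ 2022.1 Python API could look like this:

```python
import numpy as np
from openvino.runtime import Core

core = Core()
model = core.read_model("model.xml")  # hypothetical model path

# Let AUTO pick the best available device (dGPU > iGPU > VPU > CPU)
compiled_model = core.compile_model(model=model, device_name="AUTO")

input_data = np.zeros((1, 3, 224, 224), dtype=np.float32)  # example shape, adjust to your model
result = compiled_model([input_data])  # synchronous inference on the device AUTO selected
```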
Automatic Batching [2], also called Automatic Batching Execution, is one of the devices supported by OpenVINO™ Runtime, as shown in Figure 1-1.
Generally speaking, the larger the batch size, the better the inference efficiency and throughput. Automatic Batching combines multiple asynchronous inference requests from the user program, executes them as one multi-batch inference request, then splits the batched results and returns them to each individual request.
Automatic Batching does not have to be enabled manually by the developer. When the config parameter of the compile_model() method is set to {"PERFORMANCE_HINT": "THROUGHPUT"}, OpenVINO™ Runtime starts Automatic Batching automatically, as shown in Figure 1-3, so developers get better device utilization and throughput with minimal coding effort.

Figure 1-3 Automatic Batching started automatically
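A minimal sketch (reusing the hypothetical model.xml from above): passing the THROUGHPUT performance hint is enough for OpenVINO™ Runtime to enable Automatic Batching behind the scenes when a GPU is selected:

```python
from openvino.runtime import Core

core = Core()
model = core.read_model("model.xml")  # hypothetical model path

# THROUGHPUT hint: if AUTO picks the GPU, Automatic Batching is started implicitly
compiled_model = core.compile_model(
    model=model,
    device_name="AUTO",
    config={"PERFORMANCE_HINT": "THROUGHPUT"},
)

# The runtime reports how many in-flight requests are needed to saturate the device
print(compiled_model.get_property("OPTIMAL_NUMBER_OF_INFER_REQUESTS"))
```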
Hands-on with the AUTO plugin
Reading is learning, but hands-on practice is learning too, and often more effective. This article provides complete experiment code that readers can run while they learn.
GitHub address: https://github.com/yas-sim/openvino-auto-feature-visualization
Step 1: clone the repository locally.
git clone https://github.com/yas-sim/openvino-auto-feature-visualization.git
Step 2: in the openvino-auto-feature-visualization directory, run:
python -m pip install --upgrade pip
pip install -r requirements.txt
Step 3: download and convert the models:
omz_downloader --list models.txt
omz_converter --list models.txt
At this point the experimental environment is ready. All configuration parameters of the test programs are hard-coded in the source, so you have to edit the source code manually to change the test configuration, as shown in Figure 1-4.

Figure 1-4 Manually modifying the configuration in the source code
The AUTO plugin switches compute devices automatically
The GPU plugin has to compile the IR model into an OpenCL model before inference can start on the GPU. This model compilation can take a long time, for example 10 seconds, which delays the moment the application can start inferencing and makes the startup experience poor.
To hide this GPU model-compilation latency, the AUTO plugin runs the inference task on the CPU while the GPU model is being compiled; once GPU compilation finishes, the AUTO plugin automatically switches the inference device from the CPU to the GPU, as shown in Figure 1-5.
Figure 1-5 The AUTO plugin switching compute devices automatically
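A minimal sketch of observing the switch (this is not the repository's auto-test-latency-graph.py; it reuses the hypothetical model.xml and input shape from above): timing successive inferences shows the latency change once AUTO silently moves from the CPU to the compiled GPU model.

```python
import time
import numpy as np
from openvino.runtime import Core

core = Core()
model = core.read_model("model.xml")  # hypothetical model path
compiled_model = core.compile_model(model=model, device_name="AUTO")

input_data = np.zeros((1, 3, 224, 224), dtype=np.float32)  # example shape

# Early requests run on the CPU; after the GPU model is compiled, AUTO switches
# to the GPU and the per-request latency changes.
for i in range(100):
    start = time.perf_counter()
    compiled_model([input_data])
    print(f"request {i}: {(time.perf_counter() - start) * 1000:.1f} ms")
```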
Observing the automatic device switch
The AUTO plugin chooses the best compute device according to the device priority [1]: dGPU > iGPU > VPU > CPU. When the AUTO plugin selects the GPU as the best device, an inference-device switch takes place in order to hide the first-inference latency.
Note that the inference latency before and after the device switch is different; in addition, a latency spike may occur at the moment of the switch, as shown in Figure 1-6.
To reproduce the behavior shown in Figure 1-6, set the configuration parameter in auto-test-latency-graph.py to:
cfg['PERFORMANCE_HINT'] = ['THROUGHPUT', 'LATENCY'][0]
and run:
python auto-test-latency-graph.py
At the same time, open the Windows Task Manager and watch the CPU and iGPU utilization.

Figure 1-6 Execution with config={"PERFORMANCE_HINT": "THROUGHPUT"}
The PERFORMANCE_HINT setting
As described in Section 1.1.2, the execution behavior of the AUTO plugin depends on the PERFORMANCE_HINT value in the config parameter of the compile_model() method, as shown in Table 1-1:
Table 1-1 PERFORMANCE_HINT settings
| PERFORMANCE_HINT | Application scenario | Starts Auto Batching? |
| 'THROUGHPUT' | Non-real-time, large-scale inference workloads | Yes |
| 'LATENCY' | Real-time or near-real-time applications | No |
Now set the configuration parameter in auto-test-latency-graph.py to:
cfg['PERFORMANCE_HINT'] = ['THROUGHPUT', 'LATENCY'][1]
and run:
python auto-test-latency-graph.py
At the same time, open the Windows Task Manager and watch the CPU and iGPU utilization; the result is shown in Figure 1-7.

Figure 1-7 Execution with config={"PERFORMANCE_HINT": "LATENCY"}
From the experiments we can see that, depending on the config setting, the AUTO plugin works in different modes:
- In LATENCY mode, Auto Batching is not started; after the device switch, the inference latency on the GPU is small and free of jitter.
- In THROUGHPUT mode, Auto Batching starts automatically; after the device switch, the inference latency on the GPU is larger and jitters.
Next, this article looks at how Auto Batching influences inference behavior.
Hands-on with Auto Batching
As described in Section 1.1.2, Automatic Batching combines multiple asynchronous inference requests from the user program, executes them as one multi-batch inference request, then splits the batched results and returns them to each request, as shown in Figure 1-8.

Figure 1-8 How Auto Batching works
Auto Batching starts a batch inference once it has collected the specified number of asynchronous inference requests or the timeout timer expires (default timeout = 1,000 ms), as shown in Figure 1-9.

Figure 1-9 Starting a batch inference
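A minimal sketch of feeding Auto Batching with asynchronous requests (reusing the hypothetical model.xml and input shape; AsyncInferQueue is the async helper in the 2022.1 Python API, and this is not the repository's auto-test.py):

```python
import numpy as np
from openvino.runtime import Core, AsyncInferQueue

core = Core()
model = core.read_model("model.xml")  # hypothetical model path

# THROUGHPUT hint on the GPU implicitly enables Auto Batching
compiled_model = core.compile_model(
    model=model,
    device_name="GPU",
    config={"PERFORMANCE_HINT": "THROUGHPUT"},
)

nireq = compiled_model.get_property("OPTIMAL_NUMBER_OF_INFER_REQUESTS")
queue = AsyncInferQueue(compiled_model, nireq)
# Completion callbacks may fire out of submission order once requests are batched
queue.set_callback(lambda request, userdata: print(f"request {userdata} done"))

input_data = np.zeros((1, 3, 224, 224), dtype=np.float32)  # example shape
for i in range(64):
    queue.start_async({0: input_data}, userdata=i)  # requests are collected into batches
queue.wait_all()
```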
When Auto Batching is disabled
When Auto Batching is disabled, every inference request is processed individually.
Configure and run auto-test.py as follows:
Device: AUTO
Config: { 'PERFORMANCE_HINT': 'LATENCY'}
niter: 20, interval: 30 ms
OPTIMAL_NUMBER_OF_INFER_REQUESTS: 1
Number of infer requests: 1
The result is shown in Figure 1-10; each inference request is processed individually.

Figure 1-10 Result with Auto Batching disabled
When Auto Batching is enabled
When Auto Batching is enabled, asynchronous inference requests are bundled together and processed as a multi-batch inference request. When the inference finishes, the results are distributed back to each asynchronous request. Note that batch inference does not guarantee the completion order of the asynchronous inference requests.
Configure and run auto-test.py as follows:
Device: GPU
Config: { 'CACHE_DIR': './cache', 'PERFORMANCE_HINT': 'THROUGHPUT', 'ALLOW_AUTO_BATCHING': 'YES'}
niter: 200, interval: 30 ms
OPTIMAL_NUMBER_OF_INFER_REQUESTS: 64
Number of infer requests: 16
The result is shown in Figure 1-11: every 16 inference requests are combined into one batch for batched inference, and the completion order is not guaranteed.

Figure 1-11 Result with Auto Batching enabled
Auto Batching can introduce longer inference latency
Because the default timeout is long (default timeout = 1,000 ms), long inference latencies may be introduced when inference requests arrive infrequently.
Auto Batching waits until either the specified number of inference requests has arrived or the timeout timer expires. When the request rate is low, it cannot collect enough requests to start a batch inference within the timeout, so submitted requests are held back until the timer expires, which introduces latencies larger than the timeout setting.
To mitigate this, the user can set the timeout through the AUTO_BATCH_TIMEOUT configuration parameter and minimize the impact.
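A minimal sketch of shortening the batching timeout (reusing the hypothetical model.xml; the keys and values mirror the configurations used in the experiments below, and the value is a string in milliseconds):

```python
from openvino.runtime import Core

core = Core()
model = core.read_model("model.xml")  # hypothetical model path

# Shorten the Auto Batching timeout from the default 1,000 ms to 100 ms
compiled_model = core.compile_model(
    model=model,
    device_name="GPU",
    config={
        "CACHE_DIR": "./cache",
        "PERFORMANCE_HINT": "THROUGHPUT",
        "AUTO_BATCH_TIMEOUT": "100",  # milliseconds
    },
)
```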
Run auto-test.py with the default Auto Batching timeout:
Device: GPU
Config: { 'CACHE_DIR': './cache', 'PERFORMANCE_HINT': 'THROUGHPUT'}
niter: 20, interval: 300 ms
OPTIMAL_NUMBER_OF_INFER_REQUESTS: 64
Number of infer requests: 64
The result is shown in Figure 1-12: because the specified number of inference requests could not be collected within the timeout, the latency of the inference requests is high.
Figure 1-12 Result with timeout=1000ms
Now set the Auto Batching timeout to 100 ms and run auto-test.py again:
Device: GPU
Config: { 'CACHE_DIR': './cache', 'PERFORMANCE_HINT': 'THROUGHPUT', 'AUTO_BATCH_TIMEOUT': '100'}
niter: 20, interval: 300 ms
OPTIMAL_NUMBER_OF_INFER_REQUESTS: 64
Number of infer requests: 16

Figure 1-13 Result with timeout=100ms
The result is shown in Figure 1-13: within the 100 ms timeout, only one inference request could be collected each time.
Auto Batching best practices
In summary, the best programming practices for Auto Batching are:
- Remember that Auto Batching is not started by default.
- Auto Batching is started only when {'PERFORMANCE_HINT': 'THROUGHPUT', 'ALLOW_AUTO_BATCHING': 'YES'} is set.
- If your application can submit inference requests continuously at a high rate, use Automatic Batching.
- Warning: if your application submits inference requests intermittently, the last inference request may suffer an unexpectedly long latency.
- If the inference cadence is low, that is, the request rate is much lower than AUTO_BATCH_TIMEOUT (default 1,000 ms), do not enable Automatic Batching.
- You can use the AUTO_BATCH_TIMEOUT parameter to change the Automatic Batching timeout and minimize unwanted long latencies; the parameter value is in ms.
- If you know the optimal batch size of your workload, specify it with PERFORMANCE_HINT_NUM_REQUESTS, e.g. {'PERFORMANCE_HINT_NUM_REQUESTS': '4'}. At the same time, taking the GPU as an example, the AUTO plugin computes the optimal batch size in the background from the available memory, model precision, and other factors; a sketch follows this list.
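A minimal sketch of pinning the batch size with the hint (hypothetical model.xml again; PERFORMANCE_HINT_NUM_REQUESTS is the key quoted in the last practice above):

```python
from openvino.runtime import Core

core = Core()
model = core.read_model("model.xml")  # hypothetical model path

# High, steady request rate: THROUGHPUT hint plus an explicit optimal request count
compiled_model = core.compile_model(
    model=model,
    device_name="AUTO",
    config={
        "PERFORMANCE_HINT": "THROUGHPUT",
        "PERFORMANCE_HINT_NUM_REQUESTS": "4",  # known optimal batch size for the workload
    },
)
```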
Summary
This section gives a quick summary of the AUTO plugin and Auto Batching, as shown in Table 1-2.
Table 1-2 Quick summary of the AUTO plugin and Automatic Batching
| | Automatic Device Selection | Automatic Batching |
| Description | Enumerates the devices available on the system, selects the best one and uses it for inference; hides GPU model-compilation time by starting inference on the CPU and switching to the GPU after compilation finishes | Combines multiple asynchronous inference requests from the user program into one multi-batch inference request, then splits the batched results and returns them to each request |
| Advantages | Developers do not need detailed hardware configuration; the application exploits the best performance of the system; shorter first-inference latency because the AUTO plugin hides GPU model-compilation time | Device utilization and efficiency improve; developers get multi-batch throughput with minimal programming effort |
| Disadvantages | Not suitable when consistent, predictable performance is required; inference performance differs before and after the device switch (e.g. "CPU" -> "GPU"); performance may dip at the moment of the switch (on the order of a few seconds) | Available only on the GPU; the default timeout of 1,000 ms can cause unexpected long-latency problems |
| Enabled by default? | Not enabled by default; you must specify "AUTO" as the device name | Enabled by default, but only on the GPU |
| How to enable | Specify "AUTO" as the device name | ALLOW_AUTO_BATCHING=YES is the default. 1. ALLOW_AUTO_BATCHING=YES, device=GPU, PERFORMANCE_HINT=THROUGHPUT; 2. Specify "BATCH:GPU" as the device name |
| Additional notes | Default device-selection priority: dGPU > iGPU > VPU > CPU. Important: if the AUTO plugin can select "GPU" as the final compute device and PERFORMANCE_HINT=THROUGHPUT is set, Automatic Batching is enabled | How to disable Auto Batching with compile_model() or set_property(): 1. set ALLOW_AUTO_BATCHING = NO, or 2. specify PERFORMANCE_HINT = LATENCY |
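A minimal sketch of the enable and disable paths listed in the last rows of the table (hypothetical model.xml; "BATCH:GPU" and ALLOW_AUTO_BATCHING are the options named in the table):

```python
from openvino.runtime import Core

core = Core()
model = core.read_model("model.xml")  # hypothetical model path

# Enable path 2 from the table: address the batching device explicitly
batched_model = core.compile_model(model=model, device_name="BATCH:GPU")

# Disable path 1 from the table: keep the THROUGHPUT hint but opt out of batching
unbatched_model = core.compile_model(
    model=model,
    device_name="GPU",
    config={"PERFORMANCE_HINT": "THROUGHPUT", "ALLOW_AUTO_BATCHING": "NO"},
)
```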