
Best Practices for the AUTO Plugin and Automatic Batching in OpenVINO™ 2022.1

2022-06-23 13:36:00 Intel edge computing community


Summary

OpenVINO™ 2022.1 is one of the biggest updates since OpenVINO™ was first released in 2018 (see "OpenVINO ushers in its most significant update to date: a first look at the new features of 2022.1"). Among the many new features, the AUTO plugin and Automatic Batching are two of the most important: they help developers improve inference performance and efficiency without complex programming.

What is the AUTO plugin?

The AUTO plugin [1], whose full name is Automatic Device Selection, is a virtual plugin built on top of the CPU/GPU plugins, as shown in Figure 1-1. In the OpenVINO™ documentation, a "device" is an Intel processor used for inference; it can be a CPU, GPU, VPU (Vision Processing Unit), GNA (Gaussian Neural Accelerator coprocessor), or a combination of these devices [3].

 

Figure 1-1: Device plugins supported by OpenVINO™ Runtime [3]

The benefits of the AUTO plugin are:

  1. It first checks all available compute devices on the runtime platform, then selects the best compute device for inference and configures it optimally according to the deep learning model and the characteristics of the selected device.
  2. It gives the GPU a faster first-inference latency: the GPU plugin must compile the model online at run time before inference can start, which can take about 10 seconds depending on platform performance and model complexity. When a discrete or integrated GPU is selected, the AUTO plugin first runs inference on the CPU to hide this GPU model compilation time.
  3. It is easy to use: developers only need to set the device_name parameter of the compile_model() method to "AUTO", as shown in Figure 1-2 (a minimal code sketch follows the figure).

 

Figure 1-2: Specifying the AUTO plugin
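For reference, the following is a minimal sketch of that call using the OpenVINO™ 2022.1 Python API; the model path is a placeholder and not taken from the original article.

```python
# Minimal sketch: compile a model on the AUTO virtual device and let the
# runtime pick the actual compute device (the IR path below is a placeholder).
from openvino.runtime import Core

core = Core()
model = core.read_model("model/resnet50.xml")
compiled_model = core.compile_model(model=model, device_name="AUTO")
```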

What is Automatic Batching?

Automatic Batching [2], also called Automatic Batching Execution, is one of the devices supported by OpenVINO™ Runtime, as shown in Figure 1-1.

Generally speaking, the larger the batch size, the better the inference efficiency and throughput. Automatic Batching Execution combines multiple asynchronous inference requests from the user program into a multi-batch inference request, then splits the batched inference results and returns them to each individual inference request.

Automatic Batching does not need to be enabled manually by the developer. When the config parameter of the compile_model() method is set to {"PERFORMANCE_HINT": "THROUGHPUT"}, OpenVINO™ Runtime automatically starts Automatic Batching Execution, as shown in Figure 1-3, so developers enjoy improved device utilization and throughput with minimal coding effort.

 

Figure 1-3: Automatically starting Automatic Batching Execution
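For reference, a minimal sketch of compiling with the THROUGHPUT hint using the OpenVINO™ 2022.1 Python API (the model path is a placeholder):

```python
# Sketch: the THROUGHPUT performance hint is enough for the runtime to turn on
# Automatic Batching on the GPU; no further batching code is needed.
from openvino.runtime import Core

core = Core()
model = core.read_model("model/resnet50.xml")
compiled_model = core.compile_model(model, "AUTO",
                                    {"PERFORMANCE_HINT": "THROUGHPUT"})
```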

Hands-on with the AUTO plugin

Reading is one way to learn; hands-on practice is another, and often a more effective one. This article provides complete experiment code so readers can practice while they learn and summarize.

Github Address : https://github.com/yas-sim/openvino-auto-feature-visualization

1. Set up the experimental environment

Step 1: clone the code repository locally.

git clone https://github.com/yas-sim/openvino-auto-feature-visualization.git

Step 2: in the openvino-auto-feature-visualization directory, run:

python -m pip install --upgrade pip

pip install -r requirements.txt

Step 3: download the models and convert them:

omz_downloader --list models.txt

omz_converter --list models.txt

At this point the experimental environment is ready. All configuration and setting parameters of the experiment programs are hard-coded in the source code, so you need to modify the source code manually to change the test configuration, as shown in Figure 1-4.

 

Figure 1-4: Manually modifying the configuration in the source code

The AUTO plugin automatically switches compute devices

The GPU plugin must compile the IR model into an OpenCL model before inference can start on the GPU. This model compilation can take a long time, for example around 10 seconds, which delays the start of inference and makes the application startup experience poor.

To hide this GPU model compilation delay, the AUTO plugin runs the inference task on the CPU while the GPU model is being compiled; once GPU model compilation completes, the AUTO plugin automatically switches the inference compute device from the CPU to the GPU, as shown in Figure 1-5.

Figure 1-5: The AUTO plugin automatically switching compute devices

 

Observing the automatic device-switching behavior

The AUTO plugin selects the best compute device according to the device priority [1]: dGPU > iGPU > VPU > CPU. When the AUTO plugin selects a GPU as the best device, an inference device switch occurs in order to hide the first-inference latency.

Note that the inference latency before and after the device switch is different; in addition, a latency spike may occur at the moment of switching, as shown in Figure 1-6.

To reproduce Figure 1-6, set the configuration parameter in auto-test-latency-graph.py to:

cfg['PERFORMANCE_HINT'] = ['THROUGHPUT', 'LATENCY'][0]

and run:

python auto-test-latency-graph.py

At the same time, open Windows Task Manager and observe the CPU and iGPU utilization.

 

Figure 1-6: Execution with config={"PERFORMANCE_HINT": "THROUGHPUT"}
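To reproduce a similar latency-over-time measurement outside the provided script, the sketch below records the per-request latency of asynchronous inference on the AUTO device; the model path, input shape, and request count are assumptions, not values taken from the repository.

```python
# Sketch: record per-request latency on "AUTO"; the CPU->GPU switch shows up
# as a change in the recorded latencies partway through the run.
import time
import numpy as np
from openvino.runtime import Core, AsyncInferQueue

core = Core()
model = core.read_model("model/resnet50.xml")                  # placeholder IR
compiled = core.compile_model(model, "AUTO",
                              {"PERFORMANCE_HINT": "THROUGHPUT"})

latencies = []

def on_done(request, start_time):
    # Callback for each finished asynchronous request: store its latency in ms.
    latencies.append((time.perf_counter() - start_time) * 1000)

queue = AsyncInferQueue(compiled)          # queue size chosen by the runtime
queue.set_callback(on_done)

dummy_input = np.zeros((1, 3, 224, 224), dtype=np.float32)     # placeholder shape
for _ in range(500):
    queue.start_async({0: dummy_input}, time.perf_counter())
queue.wait_all()

print(f"first request: {latencies[0]:.1f} ms, last request: {latencies[-1]:.1f} ms")
```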

The PERFORMANCE_HINT setting

As described in Section 1.1.2, the execution behavior of the AUTO plugin depends on the PERFORMANCE_HINT value in the config parameter of compile_model(), as shown in Table 1-1:

Table 1-1: PERFORMANCE_HINT settings

| PERFORMANCE_HINT | Application scenario | Enables Auto Batching? |
|------------------|----------------------|------------------------|
| 'THROUGHPUT' | Non-real-time, large-scale inference tasks | Yes |
| 'LATENCY' | Real-time or near-real-time application tasks | No |

Now set the configuration parameter in auto-test-latency-graph.py to:

cfg['PERFORMANCE_HINT'] = ['THROUGHPUT', 'LATENCY'][1]

and run:

python auto-test-latency-graph.py

At the same time, open Windows Task Manager and observe the CPU and iGPU utilization. The result is shown in Figure 1-7.

 

Figure 1-7: Execution with config={"PERFORMANCE_HINT": "LATENCY"}

From these experiments we can see that, depending on the config parameter setting, the AUTO plugin works in different modes:

  1. In LATENCY mode, Auto Batching is not started automatically; after the device switch, the inference latency on the GPU is very small and does not fluctuate.
  2. In THROUGHPUT mode, Auto Batching is started automatically; after the device switch, the inference latency on the GPU is larger and fluctuates.

Next, this article discusses the influence of Auto Batching on inference behavior.

Hands-on with Auto Batching

As described in Section 1.1.2, Automatic Batching Execution combines multiple asynchronous inference requests from the user program into a multi-batch inference request, then splits the batched inference results and returns them to each inference request, as shown in Figure 1-8.

 

Figure 1-8: How Auto Batching works

Auto Batching starts a batch-inference computation once it has collected the specified number of asynchronous inference requests or its timer expires (default timeout = 1,000 ms), as shown in Figure 1-9.

 

Figure 1-9: Starting a batch-inference computation
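The auto-test scripts below print the OPTIMAL_NUMBER_OF_INFER_REQUESTS property of the compiled model, i.e. how many requests the application should keep in flight. A minimal sketch of querying it, assuming the compiled_model object from the earlier snippets:

```python
# Sketch: ask the compiled model how many in-flight inference requests the
# runtime considers optimal; the auto-test scripts print this same property.
n_req = compiled_model.get_property("OPTIMAL_NUMBER_OF_INFER_REQUESTS")
print("OPTIMAL_NUMBER_OF_INFER_REQUESTS:", n_req)
```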

When Auto Batching is disabled

When Auto Batching is disabled, every inference request is processed individually.

Configure and run auto-test.py with the following settings:

Device: AUTO

Config: { 'PERFORMANCE_HINT': 'LATENCY'}

niter: 20 , interval: 30 ms

OPTIMAL_NUMBER_OF_INFER_REQUESTS: 1

Number of infer requests: 1

The result is shown in Figure 1-10; each inference request is processed individually.

 

Figure 1-10: Result when Auto Batching is disabled

When Auto Batching is enabled

When Auto Batching is enabled, asynchronous inference requests are bundled and processed as a multi-batch inference request. When inference completes, the results are distributed to each asynchronous inference request and returned. Note that batch inference does not guarantee the execution order of the asynchronous inference requests.

Configure and run auto-test.py with the following settings:

Device: GPU

Config: { 'CACHE_DIR': './cache', 'PERFORMANCE_HINT': 'THROUGHPUT', 'ALLOW_AUTO_BATCHING': 'YES'}

niter: 200 , interval: 30 ms

OPTIMAL_NUMBER_OF_INFER_REQUESTS: 64

Number of infer requests: 16

The result is shown in Figure 1-11: every 16 inference requests are combined into one batch for batch inference, and the inference order is not guaranteed.

 

Figure 1-11: Result when Auto Batching is enabled

Auto Batching can introduce longer inference latency

Because the default timeout is long (1,000 ms), a long inference latency may be introduced when the inference request frequency is low.

Auto Batching waits until either the specified number of inference requests has arrived or the timeout timer expires. When the inference frequency is low, it cannot collect enough inference requests within the timeout to start the batch inference, so the submitted inference requests are held back until the timer expires, which introduces an inference latency larger than the timeout setting.

To mitigate this, the user can specify the timeout via the AUTO_BATCH_TIMEOUT configuration parameter.
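For reference, a sketch of passing this timeout when compiling the model; the value is a string in milliseconds, and the device choice and the 100 ms value are examples only:

```python
# Sketch: shorten the Auto Batching collection window to 100 ms so sparse
# request streams are not held for the full default 1,000 ms.
compiled = core.compile_model(model, "GPU",
                              {"PERFORMANCE_HINT": "THROUGHPUT",
                               "AUTO_BATCH_TIMEOUT": "100"})
```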

First, run auto-test.py with the Auto Batching default timeout:

Device: GPU

Config: { 'CACHE_DIR': './cache', 'PERFORMANCE_HINT': 'THROUGHPUT'}

niter: 20, interval: 300 ms

OPTIMAL_NUMBER_OF_INFER_REQUESTS: 64

Number of infer requests: 64

The result is shown in Figure 1-12. Because the specified number of inference requests cannot be collected within the timeout, the inference requests suffer high latency.

 

Figure 1-12: Result with timeout = 1,000 ms

Now configure Auto Batching with timeout = 100 ms and run auto-test.py:

Device: GPU

Config: { 'CACHE_DIR': './cache', 'PERFORMANCE_HINT': 'THROUGHPUT', 'AUTO_BATCH_TIMEOUT': '100'}

niter: 20 , interval: 300 ms

OPTIMAL_NUMBER_OF_INFER_REQUESTS: 64

Number of infer requests: 16

 

Figure 1-13: Result with timeout = 100 ms

The result is shown in Figure 1-13: within the 100 ms timeout, only one inference request can be collected each time.

Auto Batching Best practices

In summary, the best programming practices for Auto Batching are:

  1. Remember that Auto Batching is not enabled by default.
  2. Auto Batching is enabled only when {'PERFORMANCE_HINT': 'THROUGHPUT', 'ALLOW_AUTO_BATCHING': 'YES'} is set.
  3. If your application can continuously submit inference requests at a high frequency, use Auto Batching.
  4. Warning: if your application submits inference requests intermittently, the last inference request may suffer an unexpectedly long latency.
  5. If the inference cadence or frequency is low, that is, much lower than AUTO_BATCH_TIMEOUT (default 1,000 ms), do not enable Auto Batching.
  6. You can change the Auto Batching timeout with the AUTO_BATCH_TIMEOUT parameter (value in ms) to minimize unwanted long latencies.
  7. If you know the optimal batch size of your workload, specify it with PERFORMANCE_HINT_NUM_REQUESTS, for example {'PERFORMANCE_HINT_NUM_REQUESTS':'4'}. At the same time, taking the GPU as an example, the AUTO plugin calculates the optimal batch size in the background from available memory, model precision, and other factors (see the configuration sketch after this list).
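Putting these practices together, the sketch below shows one possible THROUGHPUT configuration with an explicit request hint and a shorter batching timeout; the model path and the specific values are assumptions to be adjusted for your workload.

```python
# Sketch: THROUGHPUT mode with an explicit request-count hint and a shorter
# Auto Batching timeout (all values are examples only).
from openvino.runtime import Core

core = Core()
model = core.read_model("model/resnet50.xml")          # placeholder IR path
compiled = core.compile_model(
    model, "AUTO",
    {
        "PERFORMANCE_HINT": "THROUGHPUT",
        "PERFORMANCE_HINT_NUM_REQUESTS": "4",          # known-good batch size
        "AUTO_BATCH_TIMEOUT": "100",                   # in milliseconds
    },
)
```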

Summary

This section gives a quick summary of the AUTO plugin and Auto Batching, as shown in Table 1-2.

Table 1-2: Quick summary of the AUTO plugin and Automatic Batching

| | Automatic Device Selection (AUTO) | Automatic Batching |
|---|---|---|
| Description | Enumerates the available devices on the system, selects the best one, and uses it for inference. Hides GPU model compilation time by starting inference on the CPU and switching to the GPU once compilation finishes. | Combines multiple asynchronous inference requests from the user program into a multi-batch inference request, then splits the batched results and returns them to each inference request. |
| Advantages | Developers do not need detailed hardware configuration; applications can exploit the best performance of the system; shorter first-inference latency because the AUTO plugin hides GPU model compilation time. | Device utilization and efficiency are improved; developers enjoy multi-batch throughput with minimal programming effort. |
| Disadvantages | Not suitable for applications that need consistent, predictable performance; inference performance differs before and after the device switch (e.g. "CPU" -> "GPU"); inference performance may drop at the moment of the switch (on the order of a few seconds). | Available only on GPU; the default timeout of 1,000 ms can cause unexpected long-latency problems. |
| Enabled by default? | Not enabled by default; you need to specify "AUTO" as the device name. | Enabled by default (limited to GPU). |
| How to enable | Specify "AUTO" as the device name. | ALLOW_AUTO_BATCHING=YES is the default. 1. Set ALLOW_AUTO_BATCHING=YES, device=GPU, PERFORMANCE_HINT=THROUGHPUT; or 2. specify "BATCH:GPU" as the device name. |
| Notes | Default device selection priority: dGPU > iGPU > VPU > CPU. | Important: if the AUTO plugin selects "GPU" as the final compute device and PERFORMANCE_HINT=THROUGHPUT is set, Automatic Batching is enabled. To disable Auto Batching, use compile_model() or set_property() and either set ALLOW_AUTO_BATCHING=NO or set PERFORMANCE_HINT=LATENCY. |


Copyright notice: this article was created by the Intel edge computing community. Please include a link to the original article when reprinting: https://yzsam.com/2022/174/202206231256479698.html
