4. Model Optimizer and Inference Engine
1 Introduction to DLDT
DLDT: Deep Learning Deployment Toolkit
With DLDT, a model can be converted into an IR file and deployed on Intel-supported hardware devices. In this process, the original model passes through two levels of optimization: the model optimizer and the inference engine. The overall workflow is shown in the figure below.

2 Model Optimizer
The model optimizer is a cross-platform command-line tool that converts models from various deep learning frameworks into IR files, so that the inference engine can read, load, and run inference on them.
Characteristics of the model optimizer
The model optimizer is independent of the hardware environment: all processing of the model is completed without knowledge of the final deployment device, so the resulting IR files can run on any supported device.
The generated IR files can be reused across the inference stages of AI applications. After conversion to IR, the model's accuracy may drop slightly, but its performance improves.
Under the ./OpenVINO/deployment_tools/model_optimizer/extensions/front/ path you can find the actual code for each layer of the model and customize it on this basis. Taking the LayerNorm layer as an example, part of the code is shown below:

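As a rough illustration only — the class name, matched pattern, and module paths below are simplified assumptions and may not match the actual LayerNorm extension shipped with a given OpenVINO release — a front extension typically subclasses one of the replacement base classes and declares the subgraph pattern it rewrites:

# Simplified sketch of a model optimizer front extension (illustrative only)
from mo.front.common.replacement import FrontReplacementSubgraph
from mo.graph.graph import Graph

class LayerNormFusion(FrontReplacementSubgraph):
    enabled = True

    def pattern(self):
        # Describe the subgraph to match: node ops and the edges between them
        return dict(
            nodes=[
                ('mvn', dict(op='MVN')),
                ('mul', dict(op='Mul')),
                ('add', dict(op='Add')),
            ],
            edges=[
                ('mvn', 'mul'),
                ('mul', 'add'),
            ],
        )

    def replace_sub_graph(self, graph: Graph, match: dict):
        # Rewire or replace the matched nodes with a supported structure here
        pass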
If a layer in the model is not supported, you can create a custom layer. If OpenVINO does not support your topology, you can cut and paste the model network, replacing some of its parts or subgraphs with supported structures.
Functions of the model optimizer
Converts models from various deep learning frameworks into IR files
Maps network operations to supported libraries, kernels, or layers
Performs preprocessing operations, e.g. --reverse_input_channels swaps the input channel order from RGB to BGR
Optimizes the neural network, e.g. adjusting its input batch size and input shape (an example mo.py command is given after this list)
Adjusts the data or weight format of the model, such as FP32, FP16, and INT8; different devices support different data formats, as shown below:

Edits the network model
Supports building custom layers
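For example (the model path is a placeholder), the channel order, input shape, and data type can all be set on the mo.py command line; --batch can be used instead of --input_shape when only the batch dimension needs to change:

python mo.py --input_model model.pb --reverse_input_channels --input_shape=[1,300,300,3] --data_type FP16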
Using the model optimizer to optimize the SSD-MobileNet model
Install the necessary components
Before using the model optimizer, make sure the necessary components are installed. Go to the ./model_optimizer/install_prerequisites/ directory and run a bat file: either the script for a single framework, or the script that installs the prerequisites for all supported suites.
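For example, on Windows (script names follow the standard OpenVINO layout and may vary slightly between releases), install_prerequisites.bat covers every supported framework, while install_prerequisites_tf.bat installs only the TensorFlow prerequisites:

cd deployment_tools\model_optimizer\install_prerequisites
install_prerequisites.bat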

Download the model through the model downloader
python downloader.py --name ssd_mobilenet_v2_coco -o output_dir
The contents of the downloaded model are as follows:

The pb file is the model frozen at the end of training; all variables in a frozen model have fixed values. If the model is not frozen, it needs to be frozen first (a sketch follows).
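Freezing is done on the training side with the framework itself; a minimal sketch using the TensorFlow 1.x API (the checkpoint path and output node names below are placeholders) looks roughly like this:

# Restore a TF 1.x checkpoint and bake the variable values into constants ("freeze")
import tensorflow as tf

with tf.Session() as sess:
    saver = tf.train.import_meta_graph('model.ckpt.meta')
    saver.restore(sess, 'model.ckpt')
    frozen_graph = tf.graph_util.convert_variables_to_constants(
        sess, sess.graph_def,
        ['num_detections', 'detection_boxes', 'detection_scores', 'detection_classes'])
    with tf.gfile.GFile('frozen_inference_graph.pb', 'wb') as f:
        f.write(frozen_graph.SerializeToString())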
The pipeline.config file describes the network topology and is required by the model optimizer. To find the parameters the model optimizer needs, go to the folder of the corresponding model under ./deployment_tools/open_model_zoo/models/public/.

The parameters required by the model optimizer can be found in the yml file

According to the yml file, run the model optimizer with the following command to convert the frozen pb file into IR files:
python $mo_dir$\mo.py --input_model $model_path$\frozen_inference_graph.pb --reverse_input_channels --input_shape=[1,300,300,3] --input=image_tensor --transformations_config=$model_optimizer_path$\extensions\front\tf\ssd_v2_support.json --tensorflow_object_detection_api_pipeline_config=$pipeline_path$\pipeline.config --output=detection_classes,detection_scores,detection_boxes,num_detections --model_name ssd-mobilenet
Run inference with the IR files to perform the detection task
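The full detection script from the tutorial is shown in the screenshots below. As a minimal sketch (model paths, the test image, and the post-processing are placeholders, and it avoids the deprecated IENetwork.layers attribute discussed next), inference with the Python inference engine API looks roughly like this:

import cv2
from openvino.inference_engine import IECore

ie = IECore()
net = ie.read_network(model='ssd-mobilenet.xml', weights='ssd-mobilenet.bin')
input_blob = next(iter(net.input_info))   # net.inputs in older releases
exec_net = ie.load_network(network=net, device_name='CPU')

# Preprocess: resize to the network input size and reorder HWC -> NCHW
image = cv2.imread('test.jpg')
n, c, h, w = net.input_info[input_blob].input_data.shape
blob = cv2.resize(image, (w, h)).transpose(2, 0, 1).reshape(n, c, h, w)

result = exec_net.infer(inputs={input_blob: blob})
# Post-process the entries in result according to the output layout of the converted model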
Note that the layers attribute is no longer supported in newer versions of OpenVINO and needs to be commented out; otherwise you will get the error 'openvino.inference_engine.ie_api.IENetwork' object has no attribute 'layers'.

After commenting out this section, the script runs normally.

The output image is shown in the figure below:

Cutting and pasting the network model
Look up the layer name corresponding to each layer id:
grep "layer id" mobilenetv2-7.xml | head -10
Run mo.py and change the input to the specified layer name:
python mo.py --input_model mobilenetv2-7.onnx --reverse_input_channels --output_dir $output_path$ --input mobilenetv20_features_conv0_fwd --model_name mobilenetv2-7-no-head
Note: the official tutorial specifies values for mean_values and scale_values, but in my own experiments the cut-and-pasted model reported a scale_values mismatch, so these values are not specified here.
3 Inference Engine
Inference engine optimization
An IR model is not optimized for a specific target device when it is generated; once the IR files are fed into the inference engine, the inference engine optimizes them for the specific hardware environment.
Because different hardware devices have different instruction sets and memory architectures, the inference engine uses a flexible plugin architecture for environment configuration. This plugin architecture makes it possible to run tasks on completely different devices with almost the same code.
Each plugin has its own specific library. Take MKL-DNN on the CPU as an example: MKL-DNN provides the kernels, layers, and functions that implement neural network optimization on all Intel CPUs. If the library does not support your layer, you can build a custom layer and register it with the inference engine (see the sketch below).
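A compiled custom-layer library can be registered with a single call, for example (the library file name is a placeholder):

from openvino.inference_engine import IECore

ie = IECore()
# Register a compiled custom-layer (extension) library with the CPU plugin
ie.add_extension('libcustom_cpu_extension.so', 'CPU')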

Before inference, the inference engine maps the network to the correct library units and sends it to the hardware plugin, which performs several levels of hardware-specific optimization.
- Network-level optimization: not every operation is simply mapped to a kernel; the relationships between operations, such as data reorganization, are also mapped. This improves network performance and minimizes data-conversion time during inference.
- Memory-level optimization: data is reorganized in memory according to the requirements of the specific device.
- Kernel-level optimization: the inference engine chooses the implementation that best suits the architecture's instruction set; for example, if the CPU supports AVX-512, that instruction set will be used.
Inference engine API
Intel provides a simple, unified API for all Intel-architecture hardware devices. Its plugin architecture supports optimizing inference performance and memory usage, and it is mainly implemented in C++.
API interfaces:
IECore class: defines an inference engine object; there is no need to specify a particular device
- read_network(): reads in the IR files
- load_network(): loads the network onto the specified device. The HETERO plugin hands layers that the primary device does not support back to other devices, e.g. HETERO:FPGA,CPU; the MULTI plugin lets each inference call run on a different device, making full use of all devices in the system and executing inference in parallel, e.g. device_name = MULTI:MYRIAD,CPU (see the sketch after this list)
InferRequest class: used for inference tasks
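A brief sketch of these calls (the model paths are placeholders); only the device string changes between single-device, HETERO, and MULTI execution:

from openvino.inference_engine import IECore

ie = IECore()
net = ie.read_network(model='model.xml', weights='model.bin')

exec_net = ie.load_network(network=net, device_name='CPU')                 # single device
exec_net = ie.load_network(network=net, device_name='HETERO:FPGA,CPU')     # unsupported layers fall back to the CPU
exec_net = ie.load_network(network=net, device_name='MULTI:MYRIAD,CPU')    # requests are spread across both devices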
4 Performance evaluation
Inference engine workflow
- Declare the inference engine object, read the neural network, and load the network into the plugin. Run inference according to the actual sizes of the network's input and output blobs.
- Note that model accuracy is not the same as performance; accuracy is only a quality metric for deep learning. In fact, a model with higher accuracy may have more parameters, which makes its performance more vulnerable.
Measures of model performance
- Throughput: the number of frames the neural network can process in one second, measured in inferences per second and expressed in FPS
- Latency: the time from submitting the data to reading back the result, measured in milliseconds (ms)
- Efficiency: measured in frames per second per watt, or frames per second per unit price, depending on whether power consumption or cost matters for the system (a benchmark example follows this list)
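OpenVINO's benchmark_app reports both throughput and latency; for example (the script location, model path, and device are placeholders):

python benchmark_app.py -m ssd-mobilenet.xml -d CPU -api async -niter 100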
Factors affecting neural network performance
The topology of the neural network and the number of model parameters
Heterogeneous devices: CPU, GPU, FPGA, AI accelerators (VPU, vision processing unit). When building an application, first decide where the neural network runs, e.g. run inference on the vision computing device, run video processing on the GPU, and run other tasks and logic on the CPU.
Model precision (data format): Intel instruction set architectures offer many packed data types; many data elements can be packed into one packed data type and a single operation then applied to all of them at once, i.e. single instruction, multiple data.
- SSE4.2: can pack 16 bytes of INT8 data and perform the same operation on all of them in one clock cycle
- AVX2: can pack 32 bytes of INT8 data
- AVX-512: can pack 64 bytes of INT8 data
Calibration is performed after the IR files are generated. The calibration process converts the data of as many layers as possible to integers without compromising accuracy. The advantage is that smaller data types occupy less memory, reduce the amount of computation, and speed up execution. If the model data format is integer, VNNI (Vector Neural Network Instructions) in Intel DL Boost can bring roughly a 3x performance improvement on convolution layers.

Batching: increasing the batch size can improve computational efficiency, but large batches also increase latency
Asynchronous execution: processing frames asynchronously can bring a large increase in throughput
Throughput mode: by monitoring the degree of parallelism, CPU resources are allocated intelligently and multiple inference requests are dispatched; the more CPU cores, the more effective this feature is (see the sketch below)
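A minimal sketch of asynchronous execution combined with CPU throughput streams (the request count, stream setting, and random input data are illustrative):

import numpy as np
from openvino.inference_engine import IECore

ie = IECore()
# Let the plugin pick a stream count that matches the available CPU cores
ie.set_config({'CPU_THROUGHPUT_STREAMS': 'CPU_THROUGHPUT_AUTO'}, 'CPU')
net = ie.read_network(model='model.xml', weights='model.bin')
exec_net = ie.load_network(network=net, device_name='CPU', num_requests=4)

input_blob = next(iter(net.input_info))
n, c, h, w = net.input_info[input_blob].input_data.shape
frames = [np.random.rand(n, c, h, w).astype(np.float32) for _ in range(4)]

# Issue several inference requests without waiting for each one to finish
for i, frame in enumerate(frames):
    exec_net.start_async(request_id=i, inputs={input_blob: frame})
for request in exec_net.requests:
    request.wait(-1)                      # block until this request completes
    outputs = request.output_blobs        # per-request output blobs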