OpenVINO CPU acceleration survey
2022-06-23 01:13:00 【Johns】
Theoretical part
1. Introduction
OpenVINO is an open-source toolkit for optimizing and deploying AI inference. It is designed to:
- Boost deep learning performance in computer vision, automatic speech recognition, natural language processing, and other common tasks
- Use models trained with popular frameworks such as TensorFlow and PyTorch
- Reduce resource requirements and deploy efficiently across a range of Intel platforms, from the edge to the cloud
Train, optimize, deploy
2. Optimization principles
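- Linear operations fusing (operator fusion): chains of linear operations, such as a scale followed by a shift, are merged into adjacent layers so fewer operations run at inference time.
- Precision calibration: in practice this means INT8 quantization of the model. Intel's NNCF (Neural Network Compression Framework) can also be used for other model compression operations.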
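Both principles can be illustrated in plain Python. The sketch below is a simplified illustration of the ideas, not OpenVINO's implementation, and every function name in it is hypothetical. It folds a per-output scale and shift (for example, a batch normalization that follows a fully connected layer) into the layer's weights and bias, checks that the fused layer produces the same outputs, and then applies one common INT8 scheme (symmetric, per-tensor) to the fused weights.

```python
def dense(x, weights, bias):
    """y[j] = sum_i x[i] * weights[j][i] + bias[j] (one weight row per output)."""
    return [sum(xi * wi for xi, wi in zip(x, row)) + b for row, b in zip(weights, bias)]


def fuse_scale_shift(weights, bias, scale, shift):
    """Fold y' = scale * y + shift into the dense layer itself.

    scale * (W x + b) + shift == (scale * W) x + (scale * b + shift),
    so the separate multiply and add operations disappear from the graph.
    """
    fused_w = [[s * w for w in row] for row, s in zip(weights, scale)]
    fused_b = [s * b + t for b, s, t in zip(bias, scale, shift)]
    return fused_w, fused_b


def quantize_int8(values):
    """Symmetric per-tensor INT8: q = round(v / step) with step = max|v| / 127."""
    max_abs = max(abs(v) for v in values)
    step = max_abs / 127 if max_abs > 0 else 1.0
    # round() rounds half to even; clamp keeps every code in [-127, 127]
    q = [max(-127, min(127, round(v / step))) for v in values]
    return q, step


def dequantize(q, step):
    """Map INT8 codes back to approximate float values."""
    return [qi * step for qi in q]


if __name__ == "__main__":
    W = [[0.5, -1.0], [2.0, 0.25]]
    b = [0.1, -0.2]
    scale, shift = [2.0, 0.5], [1.0, 0.0]
    x = [1.0, 3.0]

    # Unfused: dense layer followed by a separate scale/shift step
    unfused = [s * y + t for y, s, t in zip(dense(x, W, b), scale, shift)]
    fused_w, fused_b = fuse_scale_shift(W, b, scale, shift)
    print(unfused, dense(x, fused_w, fused_b))  # equal up to float rounding

    flat = [w for row in fused_w for w in row]
    q, step = quantize_int8(flat)
    print(q, dequantize(q, step))
```

The round-trip error of each weight is at most half a quantization step, which is why calibration (choosing the range, and hence the step) matters for accuracy.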
3. Common OpenVINO tools
- Deep Learning Model Optimizer: a cross-platform command-line tool that imports models and prepares them for optimal execution with the Inference Engine. It imports, converts, and optimizes models trained in popular frameworks such as Caffe, TensorFlow, MXNet, Kaldi, and ONNX.
- Deep Learning Inference Engine: a unified API for high-performance inference on many hardware types, including Intel CPUs, Intel integrated graphics, the Intel Neural Compute Stick 2, and Intel Vision Accelerator Design with Intel Movidius vision processing units (VPUs).
- Inference Engine Samples: a set of simple console applications that demonstrate how to use the Inference Engine in your own application.
- Deep Learning Workbench: a web-based graphical environment that makes it easy to use various sophisticated OpenVINO toolkit components.
- Post-Training Optimization Tool: a tool that calibrates a model so it can then be executed in INT8 precision.
- Additional tools: a set of tools for working with models, including the Benchmark Tool (https://docs.openvinotoolkit.org/latest/openvino_inference_engine_tools_benchmark_tool_README.html), the Cross Check Tool, and the Compile Tool.
- Open Model Zoo
- Demos: console applications that provide robust application templates to help you implement specific deep learning scenarios.
- Other tools: a set of tools for working with models, including the Accuracy Checker Utility and the Model Downloader.
- Documentation for Pretrained Models: documentation for the pretrained models available in the Open Model Zoo repository.
Practical part
1. Environment preparation
# Pull and start the container
docker pull openvino/ubuntu18_dev:latest
docker run -itd -p 8501:8501 -p 8500:8500 -p 8889:8889 -v "/root/openvino_notebooks:/openvino_notebooks" openvino/ubuntu18_dev:latest
# Enter the container (replace bc89fe5f98e6 with your own container ID)
docker exec -it -u root bc89fe5f98e6 /bin/bash
# Clone the OpenVINO notebooks repository
git clone --depth=1 https://github.com/openvinotoolkit/openvino_notebooks.git
# Install Jupyter and the notebook dependencies
cd openvino_notebooks
sudo apt-get update
sudo apt-get upgrade -y
sudo apt-get install -y python3-venv build-essential python3-dev git-all
python -m pip install --upgrade pip
pip install -r requirements.txt
python -m ipykernel install --user --name openvino_env
apt-get install vim
# Start Jupyter Lab, listening on all interfaces on the port published by docker run (8889)
jupyter lab notebooks --allow-root --ip=0.0.0.0 --port=8889
2. Model conversion (using Jupyter Notebook)
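The conversion cell hard-codes the input names in `--input` and their shapes in `--input_shape`, and the two lists must stay in the same order. As a hedged convenience, a small helper (hypothetical, not part of OpenVINO) can derive both flags from a single name-to-shape mapping. It emits only the `mo` flags already used in this post and reproduces the same command string.

```python
def build_mo_command(saved_model_dir, output_dir, input_shapes, output, data_type="FP32"):
    """Assemble a Model Optimizer command from one {input_name: shape} mapping.

    Because names and shapes come from the same dict (insertion-ordered in
    Python 3.7+), --input and --input_shape always list inputs in the same order.
    """
    names = ",".join(input_shapes)
    shapes = ",".join(
        "[" + ",".join(str(dim) for dim in shape) + "]" for shape in input_shapes.values()
    )
    parts = [
        "mo",
        f'--saved_model_dir "{saved_model_dir}"',
        f"--data_type {data_type}",
        f"--input {names}",
        f"--input_shape {shapes}",
        f'--output_dir "{output_dir}"',
        f'--output "{output}"',
    ]
    return " ".join(parts)


# Input names and shapes as used in this post (batch size fixed at 100)
inputs = {
    "dense_input": [100, 587],
    "sparse_ids_input": [100, 53],
    "sparse_wgt_input": [100, 53],
    "seq_50_input": [100, 6, 50],
}
mo_command = build_mo_command(
    "/openvino_notebooks/open_model_zoo_models/custom/origin_model",
    "/openvino_notebooks/open_model_zoo_models/custom/fp32",
    inputs,
    "Identity",
)
print(mo_command)
```

In the notebook, the resulting string can be executed the same way as the hand-written one, with `! $mo_command`.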
import time
from pathlib import Path
import matplotlib.pyplot as plt
import numpy as np
from IPython.display import Markdown, display
# Construct the command for Model Optimizer
mo_command = f"""mo
--saved_model_dir "/openvino_notebooks/open_model_zoo_models/custom/origin_model"
--data_type FP32
--input dense_input,sparse_ids_input,sparse_wgt_input,seq_50_input
--input_shape [100,587],[100,53],[100,53],[100,6,50]
--output_dir "/openvino_notebooks/open_model_zoo_models/custom/fp32"
--output "Identity"
"""
mo_command = " ".join(mo_command.split())
print("Model Optimizer command to convert TensorFlow to OpenVINO:")
display(Markdown(f"`{mo_command}`"))
! $mo_command
3. Model quantization (using Jupyter Notebook)
import os
from pathlib import Path
from openvino.tools.pot import DataLoader
import tensorflow as tf
import math
from yaspin import yaspin
# Read data from a TFRecord file
def input_fn_tfrecord(filenames, batch_size=256):
    """Build a shuffled, batched tf.data pipeline from a TFRecord file."""
    reader = tf.data.TFRecordDataset(
        filenames,
        num_parallel_reads=10,
    ).shuffle(100000, reshuffle_each_iteration=True)
    features = {
        'dense_input': tf.io.FixedLenSequenceFeature([], tf.float32, allow_missing=True),
        'sparse_ids_input': tf.io.FixedLenSequenceFeature([], tf.int64, allow_missing=True),
        'sparse_wgt_input': tf.io.FixedLenSequenceFeature([], tf.float32, allow_missing=True),
        'seq_50_input': tf.io.FixedLenSequenceFeature([], tf.int64, allow_missing=True),
        'is_click': tf.io.FixedLenSequenceFeature([], tf.int64, allow_missing=True),
    }

    def _parse_example(example):
        """Parse one serialized example into the four model inputs plus the label."""
        parse_data = tf.io.parse_single_example(example, features)
        return [
            tf.reshape(parse_data['dense_input'][:587], shape=[587]),
            tf.reshape(tf.cast(parse_data["sparse_ids_input"], tf.int32), shape=[53]),
            tf.reshape(parse_data["sparse_wgt_input"], shape=[53]),
            tf.reshape(tf.reshape(tf.cast(parse_data['seq_50_input'], tf.int32), [-1, 50])[:6, :], shape=[6, 50]),
            tf.reshape(parse_data['is_click'], shape=[1])]

    dataset = reader.map(_parse_example, num_parallel_calls=11)  # parse records in parallel
    dataset = dataset.prefetch(buffer_size=batch_size)
    batch = dataset.batch(batch_size=batch_size)
    return batch
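To make the per-feature shaping in `_parse_example` concrete, here is a plain-Python sketch of what its two reshaping expressions do to a single record. The function names are hypothetical and TensorFlow is not needed to run it. It also surfaces a constraint implied by the TensorFlow code: each record needs at least 587 dense values, and its sequence ids must be a multiple of 50 long and cover at least 6 rows, otherwise the reshape fails.

```python
def truncate_dense(values, size=587):
    """Plain-Python analogue of tf.reshape(x[:size], [size])."""
    if len(values) < size:
        # tf.reshape would fail: the slice holds fewer than `size` elements
        raise ValueError(f"need at least {size} values, got {len(values)}")
    return values[:size]


def reshape_seq(flat_ids, width=50, rows=6):
    """Plain-Python analogue of
    tf.reshape(tf.reshape(x, [-1, width])[:rows, :], [rows, width]).

    Splits the flat id list into rows of `width` ids and keeps the first `rows`.
    """
    if len(flat_ids) % width != 0:
        # The inner reshape to [-1, width] needs a length divisible by width
        raise ValueError(f"length {len(flat_ids)} is not a multiple of {width}")
    matrix = [flat_ids[i:i + width] for i in range(0, len(flat_ids), width)]
    kept = matrix[:rows]
    if len(kept) != rows:
        # The outer reshape to [rows, width] needs at least `rows` full rows
        raise ValueError(f"need at least {rows} rows of {width}, got {len(kept)}")
    return kept


seq = reshape_seq(list(range(400)))  # 8 rows of 50 ids, only the first 6 are kept
print(len(seq), [row[0] for row in seq])
```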
# Data preprocessing: load the evaluation set as a list of per-batch dicts
data_file = "/openvino_notebooks/open_model_zoo_models/custom/eval_processed_data.tfrecords"
batch_size = 100  # must match the batch dimension fixed by --input_shape during conversion
inputs_list = ['dense_input', 'sparse_ids_input', 'sparse_wgt_input', 'seq_50_input']
total_samples = sum(1 for _ in tf.compat.v1.python_io.tf_record_iterator(data_file))
# Keep only full batches: a smaller final batch would not fit the static input shape
n = total_samples // batch_size
data = []
dataset = input_fn_tfrecord(data_file, batch_size)
# Iterate eagerly (TF2) so that every step fetches a new batch
for dense, sparse_ids, sparse_wgt, seq_50, label in dataset.take(n):
    data.append({
        'dense_input': dense.numpy(),
        'sparse_ids_input': sparse_ids.numpy(),
        'sparse_wgt_input': sparse_wgt.numpy(),
        'seq_50_input': seq_50.numpy(),
        'label': label.numpy(),
    })
class OriginModelDataLoader(DataLoader):
    def __init__(self, data_list):
        """POT data loader over pre-batched records.

        Args:
            data_list (list[dict]): one dict per batch holding the four model inputs and 'label'
        """
        self.data_list = data_list

    def __getitem__(self, index):
        if index >= len(self.data_list):
            raise IndexError("Index out of dataset size")
        current_item = self.data_list[index]
        label = current_item['label']
        feat_names = {'dense_input', 'sparse_ids_input', 'sparse_wgt_input', 'seq_50_input'}
        # Model inputs only; the label is returned separately as the annotation
        features = {key: value for key, value in current_item.items() if key in feat_names}
        return (index, label), features

    def __len__(self):
        return len(self.data_list)
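The quantization cell instantiates `Accuracy()`, which the post never defines. The following is one possible definition, offered as a sketch under several assumptions: the POT `Metric` interface from OpenVINO 2022.x (`value`, `avg_value`, `update`, `reset`, `get_attributes`); that the model's single output holds one click probability per sample; that `target` carries the 0/1 `is_click` labels produced by `OriginModelDataLoader`; and that a 0.5 threshold is acceptable. The import falls back to `object` so the sketch can be tried without OpenVINO installed; check the interface against your POT version.

```python
try:
    from openvino.tools.pot import Metric  # POT base class (assumed OpenVINO 2022.x)
except ImportError:  # lets the sketch run without OpenVINO
    Metric = object


def _flatten(x):
    """Yield plain scalars from nested lists/tuples or NumPy arrays."""
    if hasattr(x, "tolist"):  # NumPy arrays and NumPy scalars
        x = x.tolist()
    if isinstance(x, (list, tuple)):
        for item in x:
            yield from _flatten(item)
    else:
        yield x


class Accuracy(Metric):
    """Hypothetical binary accuracy: threshold the click probability, compare with the label."""

    def __init__(self, threshold=0.5):
        super().__init__()
        self._name = "accuracy"
        self._threshold = threshold
        self._matches = []  # 1/0 per sample, accumulated since the last reset()
        self._last = []     # 1/0 per sample from the most recent update() only

    @property
    def value(self):
        # Per-sample results of the last update
        return {self._name: self._last}

    @property
    def avg_value(self):
        # Mean accuracy over all samples seen since the last reset()
        return {self._name: sum(self._matches) / max(len(self._matches), 1)}

    def update(self, output, target):
        # output[0]: first (only) model output, assumed to be click probabilities
        probs = list(_flatten(output[0]))
        labels = list(_flatten(target))
        self._last = [int((p >= self._threshold) == bool(y)) for p, y in zip(probs, labels)]
        self._matches.extend(self._last)

    def reset(self):
        self._matches = []
        self._last = []

    def get_attributes(self):
        return {self._name: {"direction": "higher-better", "type": "accuracy"}}
```

Flattening both sides keeps the comparison independent of whether a batch arrives as a `[100, 1]` array or a nested list.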
# Run model quantization
import time
import addict
from openvino.tools.pot import IEEngine
from openvino.tools.pot import load_model, save_model
from openvino.tools.pot import compress_model_weights
from openvino.tools.pot import create_pipeline
path_to_xml = "/openvino_notebooks/open_model_zoo_models/custom/fp32/saved_model.xml"
path_to_bin = "/openvino_notebooks/open_model_zoo_models/custom/fp32/saved_model.bin"
# Model config specifies the model name and paths to model .xml and .bin file
model_config = addict.Dict(
{
"model_name": "origin_model",
"model": path_to_xml,
"weights": path_to_bin,
}
)
# Engine config
engine_config = addict.Dict({"device": "CPU"})
algorithms = [
{
"name": "AccuracyAwareQuantization",
"params": {
"target_device": "CPU",
"stat_subset_size": 300,
"maximal_drop": 0.001,  # maximum accuracy drop allowed after quantization
},
}
]
# Step 1: implement and create user's data loader
data_loader = OriginModelDataLoader(data)
# Step 2: load model
ir_model = load_model(model_config=model_config)
metric = Accuracy()  # user-defined POT Metric subclass; its definition is not included in this post
# Step 3: Initialize the engine for metric calculation and statistics collection.
engine = IEEngine(config=engine_config, data_loader=data_loader, metric=metric)
# Step 4: Create a pipeline of compression algorithms and run it.
pipeline = create_pipeline(algorithms, engine)
algorithm_name = pipeline.algo_seq[0].name
with yaspin(
    text=f"Executing POT pipeline on {model_config['model']} with {algorithm_name}"
) as sp:
    start_time = time.perf_counter()
    compressed_model = pipeline.run(ir_model)
    end_time = time.perf_counter()
    sp.ok("")
print(f"Quantization finished in {end_time - start_time:.2f} seconds")
# Step 5 (Optional): Compress model weights to quantized precision
# in order to reduce the size of the final .bin file.
compress_model_weights(compressed_model)
# Step 6: Save the compressed model to the desired path.
# Set save_path to the directory where the model should be saved
compressed_model_paths = save_model(
model=compressed_model,
save_path="optimized_model",
model_name="optimized_model",
)
# Step 7 (Optional): Evaluate the original and compressed models and print the results.
original_metric_results = pipeline.evaluate(ir_model)
if original_metric_results:
    print(f"Accuracy of the original model: {next(iter(original_metric_results.values())):.5f}")
quantized_metric_results = pipeline.evaluate(compressed_model)
if quantized_metric_results:
    print(f"Accuracy of the quantized model: {next(iter(quantized_metric_results.values())):.5f}")
Comparison before and after optimization
# Compare the model size before and after optimization
ir_path = "/openvino_notebooks/open_model_zoo_models/custom/fp32/saved_model.xml"
quantized_model_path = "/openvino_notebooks/notebooks/002-openvino-api/optimized_model/optimized_model.xml"
original_model_size = Path(ir_path).with_suffix(".bin").stat().st_size / 1024
quantized_model_size = Path(quantized_model_path).with_suffix(".bin").stat().st_size / 1024
# Size reduction in percent, computed from the measured file sizes
compression_ratio = (original_model_size - quantized_model_size) / original_model_size * 100
print(f"FP32 model size: {original_model_size:.2f} KB")
print(f"INT8 model size: {quantized_model_size:.2f} KB")
print(f"Size reduction: {compression_ratio:.2f}%")
# Performance comparison: benchmark_app is OpenVINO's official benchmarking tool
#!benchmark_app --help
model_name = "quantized_model"
benchmark_command = f"benchmark_app -m {quantized_model_path} -t 15 -d CPU -api async -hint latency"
display(Markdown(f"Benchmark command: `{benchmark_command}`"))
display(Markdown(f"Benchmarking {model_name} on CPU with async inference for 15 seconds..."))
! $benchmark_command
#!benchmark_app --help
model_path = "/openvino_notebooks/open_model_zoo_models/custom/fp32/saved_model.xml"
model_name = "origin_model"
benchmark_command = f"benchmark_app -m {model_path} -t 15 -d CPU -api async -hint latency"
display(Markdown(f"Benchmark command: `{benchmark_command}`"))
display(Markdown(f"Benchmarking {model_name} on CPU with async inference for 15 seconds..."))
! $benchmark_command
4. Experimental conclusion
| Model | Size | QPS |
|---|---|---|
| origin_model (FP32) | 34231.60 KB | 88.93 |
| quantized model (INT8) | 12384.25 KB | 105.58 |
| Change | 63.82% smaller | +18.72% |
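The percentages in the comparison follow from simple relative-change arithmetic. This sketch recomputes them from the file sizes hard-coded in the original size-comparison cell (34231.60 KB and 12384.25 KB) and the QPS values reported by benchmark_app; it is a check on the reported numbers, not a new measurement. `percent_change` is a hypothetical helper name.

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100


fp32_kb, int8_kb = 34231.60, 12384.25  # .bin sizes of the FP32 and INT8 models
fp32_qps, int8_qps = 88.93, 105.58     # benchmark_app throughput of each model

print(f"Size change: {percent_change(fp32_kb, int8_kb):.2f}%")         # -63.82% (smaller)
print(f"Throughput change: {percent_change(fp32_qps, int8_qps):.2f}%")  # 18.72%
```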
Judging from the logs produced during conversion, the model structure is simple and compact and its features are highly sparse, so only a few nodes could be fused or quantized. As a result, the performance gain is modest (about 18.7% higher QPS).