
OpenVINO CPU Acceleration Survey

2022-06-23 01:13:00 Johns

Theoretical part

1. Introduction

OpenVINO is an open-source toolkit for optimizing and deploying AI inference.

  • Boosts deep learning performance in computer vision, automatic speech recognition, natural language processing, and other common tasks
  • Works with models trained in popular frameworks such as TensorFlow and PyTorch
  • Reduces resource requirements and deploys efficiently across Intel platforms, from the edge to the cloud

Train, Optimize, Deploy

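Of the three stages, deployment is the one OpenVINO itself handles. As a minimal sketch of that step (my addition, with a placeholder model path and a single-input model assumed), loading a converted IR model and running it on CPU with the OpenVINO Python runtime looks like this:

from openvino.runtime import Core
import numpy as np

# Minimal inference sketch: read an IR model, compile it for CPU, run one input.
core = Core()
model = core.read_model("model.xml")          # placeholder path to a converted IR
compiled = core.compile_model(model, "CPU")

dummy = np.zeros(list(compiled.inputs[0].shape), dtype=np.float32)
result = compiled([dummy])[compiled.outputs[0]]
print(result.shape)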

2. Optimization principles

  • Linear Operations Fusing (operator fusion): adjacent linear operations, such as a convolution followed by BatchNorm or scale/shift, are folded into a single operation at conversion time (see the sketch after this list)
  • Precision Calibration: in practice this means INT8 quantization of the model; Intel's NNCF can also be used for other model-compression operations
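To illustrate the first point, here is a numpy sketch of the idea (an illustration, not OpenVINO's actual implementation): a BatchNorm that follows a convolution can be folded into the convolution's weights and bias, so one operation disappears at inference time.

import numpy as np

# Fold BatchNorm parameters (gamma, beta, mean, var) into the preceding
# convolution. conv_w has shape [out_c, in_c, kh, kw]; all BN tensors are [out_c].
def fold_batchnorm(conv_w, conv_b, gamma, beta, mean, var, eps=1e-5):
    scale = gamma / np.sqrt(var + eps)              # per-output-channel scale
    folded_w = conv_w * scale.reshape(-1, 1, 1, 1)  # scale each output channel
    folded_b = (conv_b - mean) * scale + beta
    return folded_w, folded_b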

3. Common OpenVINO tools

The practical part below uses three of them: the Model Optimizer (mo) for model conversion, the Post-training Optimization Tool (POT) for quantization, and benchmark_app for performance testing.

Practical part

1. Environment setup

#  Pull and start the container 
docker pull openvino/ubuntu18_dev:latest
docker run -itd  -p 8501:8501 -p 8500:8500 -p 8889:8889 -v "/root/openvino_notebooks:/openvino_notebooks" openvino/ubuntu18_dev:latest

# Enter the container (replace bc89fe5f98e6 with your container ID from `docker ps`)
docker exec -it -u root bc89fe5f98e6 /bin/bash

# Clone the examples repository
git clone --depth=1 https://github.com/openvinotoolkit/openvino_notebooks.git

# Install dependencies and Jupyter
cd openvino_notebooks
sudo apt-get update
sudo apt-get upgrade
sudo apt-get install python3-venv build-essential python3-dev git-all
python -m pip install --upgrade pip
pip install -r requirements.txt
python -m ipykernel install --user --name openvino_env

apt-get install vim
# Start Jupyter
jupyter lab notebooks --allow-root
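Before moving on, a quick sanity check (my addition, not in the original write-up) that the OpenVINO runtime is importable inside the container and sees the CPU device; run it in a notebook cell or a python3 shell:

from openvino.runtime import Core

print(Core().available_devices)  # expect a list containing 'CPU'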

2. Model conversion (in a Jupyter notebook)

import time
from pathlib import Path

import matplotlib.pyplot as plt
import numpy as np
from IPython.display import Markdown

# Construct the command for Model Optimizer
mo_command = f"""mo
                 --saved_model_dir "/openvino_notebooks/open_model_zoo_models/custom/origin_model"
                 --data_type FP32
                 --input dense_input,sparse_ids_input,sparse_wgt_input,seq_50_input
                 --input_shape [100,587],[100,53],[100,53],[100,6,50]
                 --output_dir "/openvino_notebooks/open_model_zoo_models/custom/fp32"
                 --output "Identity"
                 """
mo_command = " ".join(mo_command.split())
print("Model Optimizer command to convert TensorFlow to OpenVINO:")
display(Markdown(f"`{mo_command}`"))

! $mo_command
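Assuming the conversion succeeded, a small check (my addition) confirms that the produced IR loads and exposes the inputs declared in the mo command above:

from openvino.runtime import Core

core = Core()
ir = core.read_model("/openvino_notebooks/open_model_zoo_models/custom/fp32/saved_model.xml")
for inp in ir.inputs:
    print(inp.any_name, inp.shape)  # should match --input / --input_shape above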

3. Model quantization (in a Jupyter notebook)

import os
from pathlib import Path
from openvino.tools.pot import DataLoader
import tensorflow as tf
import math
from yaspin import yaspin


# Read data from TFRecord
def input_fn_tfrecord(filenames, batch_size=256):
    """make input fn for tfrecord file
    """
    reader = tf.data.TFRecordDataset(
        filenames,
        num_parallel_reads=10,
    ).shuffle(100000, reshuffle_each_iteration=True)

    features = {
        'dense_input': tf.io.FixedLenSequenceFeature([], tf.float32, allow_missing=True),
        'sparse_ids_input': tf.io.FixedLenSequenceFeature([], tf.int64, allow_missing=True),
        'sparse_wgt_input': tf.io.FixedLenSequenceFeature([], tf.float32, allow_missing=True),
        'seq_50_input': tf.io.FixedLenSequenceFeature([], tf.int64, allow_missing=True),
        'is_click': tf.io.FixedLenSequenceFeature([], tf.int64, allow_missing=True),
    }

    def _parse_example(example):
        """
            parse data
        """
        parse_data = tf.io.parse_single_example(example, features)
        return [
            tf.reshape(parse_data['dense_input'][:587], shape=[587]),
            tf.reshape(tf.cast(parse_data["sparse_ids_input"], tf.int32), shape=[53]),
            tf.reshape(parse_data["sparse_wgt_input"], shape=[53]),
            tf.reshape(tf.reshape(tf.cast(parse_data['seq_50_input'], tf.int32), [-1, 50])[:6, :], shape=[6, 50]),
            tf.reshape(parse_data['is_click'], shape=[1])]

    dataset = reader.map(_parse_example, num_parallel_calls=11)  #  Parsing data 
    dataset = dataset.prefetch(buffer_size=batch_size)
    batch = dataset.batch(batch_size=batch_size)
    return batch

#  Data preprocessing 
data_file = "/openvino_notebooks/open_model_zoo_models/custom/eval_processed_data.tfrecords"
batch_size = 100
inputs_list = ['dense_input', 'sparse_ids_input', 'sparse_wgt_input', 'seq_50_input']
total_samples = sum(1 for _ in tf.compat.v1.python_io.tf_record_iterator(data_file))
n = math.ceil(float(total_samples) / batch_size)
data = []
with tf.compat.v1.Session() as sess:
    dataset = input_fn_tfrecord(data_file, batch_size)
    dataset_iterator = tf.compat.v1.data.make_one_shot_iterator(dataset)
    next_element = dataset_iterator.get_next()
    for i in range(n):
        batch = sess.run(next_element)  # fetch a fresh batch on every iteration
        records = {
            'dense_input': batch[0],
            'sparse_ids_input': batch[1],
            'sparse_wgt_input': batch[2],
            'seq_50_input': batch[3],
            'label': batch[4],
        }
        data.append(records)
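A quick sanity check on the collected calibration data (my addition): the number of batches and the array shapes of the first batch.

print(len(data), {k: v.shape for k, v in data[0].items()})
# expect n batches; e.g. dense_input -> (100, 587), seq_50_input -> (100, 6, 50)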


class OriginModelDataLoader(DataLoader):
    def __init__(self, data_list):
        """POT DataLoader over pre-batched records.

        Args:
            data_list (list): list of dicts holding the four model inputs and 'label'
        """
        self.data_list = data_list

    def __getitem__(self, index):
        if index >= len(self.data_list):
            raise IndexError("Index out of dataset size")
        current_item = self.data_list[index]
        label = current_item['label']
        feat_names = {'dense_input', 'sparse_ids_input', 'sparse_wgt_input', 'seq_50_input'}
        p2 = {key: value for key, value in current_item.items() if key in feat_names}
        return ((index, label), p2)

    def __len__(self):
        return len(self.data_list)
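Each sample comes back as ((index, label), input_dict), the annotation format IEEngine expects; inspecting one sample (my addition) confirms it:

loader = OriginModelDataLoader(data)
annotation, feed = loader[0]          # annotation is (index, label)
print(annotation[0], sorted(feed.keys()))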

#  Perform model quantization
import time
import addict
from openvino.tools.pot import IEEngine
from openvino.tools.pot import load_model, save_model
from openvino.tools.pot import compress_model_weights
from openvino.tools.pot import create_pipeline
from openvino.tools.pot import Metric  # base class for the custom metric below

path_to_xml = "/openvino_notebooks/open_model_zoo_models/custom/fp32/saved_model.xml"
path_to_bin = "/openvino_notebooks/open_model_zoo_models/custom/fp32/saved_model.bin"
data_file = "/openvino_notebooks/open_model_zoo_models/custom/eval_processed_data.tfrecords"

# Model config specifies the model name and paths to model .xml and .bin file
model_config = addict.Dict(
    {
        "model_name": "origin_model",
        "model": path_to_xml,
        "weights": path_to_bin,
    }
)

# Engine config
engine_config = addict.Dict({"device": "CPU"})

algorithms = [
    {
        "name": "AccuracyAwareQuantization",
        "params": {
            "target_device": "CPU",
            "stat_subset_size": 300,
            "maximal_drop": 0.001, #  The loss of accuracy shall not exceed 0.001
        },
    }
]
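AccuracyAwareQuantization needs a metric to measure the accuracy drop, but the original post omits the Accuracy class instantiated below. The following is a minimal sketch of the openvino.tools.pot Metric interface, assuming the model's single output is a click probability thresholded at 0.5 against the is_click label:

import numpy as np

class Accuracy(Metric):
    def __init__(self):
        super().__init__()
        self._name = "accuracy"
        self._matches = []

    @property
    def value(self):
        """Accuracy of the most recent batch."""
        return {self._name: self._matches[-1]}

    @property
    def avg_value(self):
        """Average accuracy over all processed batches."""
        return {self._name: float(np.mean(self._matches))}

    def update(self, output, target):
        # Assumption: output[0] holds click probabilities, target the is_click labels.
        predictions = (np.asarray(output[0]).reshape(-1) > 0.5).astype(np.int64)
        labels = np.asarray(target).reshape(-1)
        self._matches.append(float(np.mean(predictions == labels)))

    def reset(self):
        self._matches = []

    def get_attributes(self):
        """Metadata POT uses to decide whether higher values are better."""
        return {self._name: {"direction": "higher-better", "type": "accuracy"}}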

# Step 1: implement and create user's data loader
data_loader = OriginModelDataLoader(data)

# Step 2: load the model and create the metric
ir_model = load_model(model_config=model_config)
metric = Accuracy()

# Step 3: Initialize the engine for metric calculation and statistics collection.
engine = IEEngine(config=engine_config, data_loader=data_loader, metric=metric)

# Step 4: Create a pipeline of compression algorithms and run it.
pipeline = create_pipeline(algorithms, engine)
algorithm_name = pipeline.algo_seq[0].name
with yaspin(
    text=f"Executing POT pipeline on {model_config['model']} with {algorithm_name}"
) as sp:
    start_time = time.perf_counter()
    compressed_model = pipeline.run(ir_model)
    end_time = time.perf_counter()
    sp.ok("")
print(f"Quantization finished in {end_time - start_time:.2f} seconds")

# Step 5 (Optional): Compress model weights to quantized precision
#                    in order to reduce the size of the final .bin file.
compress_model_weights(compressed_model)

# Step 6: Save the compressed model to the desired path.
# Set save_path to the directory where the model should be saved
compressed_model_paths = save_model(
    model=compressed_model,
    save_path="optimized_model",
    model_name="optimized_model",
)
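save_model returns the paths of the files it wrote; printing them (my addition) shows where the INT8 IR used in the comparison below landed:

print(compressed_model_paths[0]["model"])  # path to the INT8 .xml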

# Step 7 (Optional): Evaluate the original and the quantized model and compare.
original_metric_results = pipeline.evaluate(ir_model)
if original_metric_results:
    print(f"Accuracy of the original model:  {next(iter(original_metric_results.values())):.5f}")

quantized_metric_results = pipeline.evaluate(compressed_model)
if quantized_metric_results:
    print(f"Accuracy of the quantized model: {next(iter(quantized_metric_results.values())):.5f}")

Comparison before and after optimization

#  Compare model sizes before and after optimization
ir_path = "/openvino_notebooks/open_model_zoo_models/custom/fp32/saved_model.xml"
quantized_model_path = "/openvino_notebooks/notebooks/002-openvino-api/optimized_model/optimized_model.xml"
original_model_size = Path(ir_path).with_suffix(".bin").stat().st_size / 1024
quantized_model_size = Path(quantized_model_path).with_suffix(".bin").stat().st_size / 1024
compression_ratio = (original_model_size - quantized_model_size) / original_model_size * 100
print(f"FP32 model size: {original_model_size:.2f} KB")
print(f"INT8 model size: {quantized_model_size:.2f} KB")
print(f"Compression ratio: {compression_ratio:.4f}%")

#  Performance comparison: benchmark_app is OpenVINO's official benchmarking tool
#!benchmark_app --help
model_name = "quantized_model"
benchmark_command = f"benchmark_app -m {quantized_model_path} -t 15 -d CPU -api async -hint latency"
display(Markdown(f"Benchmark command: `{benchmark_command}`"))
display(Markdown(f"Benchmarking {model_name} on CPU with async inference for 15 seconds..."))
! $benchmark_command

#!benchmark_app --help
model_path = "/openvino_notebooks/open_model_zoo_models/custom/fp32/saved_model.xml"
model_name = "origin_model"
benchmark_command = f"benchmark_app -m {model_path} -t 15 -hint latency "
display(Markdown(f"Benchmark command: `{benchmark_command}`"))
display(Markdown(f"Benchmarking {model_name} on CPU with async inference for 15 seconds..."))
! $benchmark_command

4. Experimental conclusions

Model name        Size             QPS
origin_model      34231.60 KB      88.93
quantized_model   12384.25 KB      105.58
Optimization      63.82% smaller   18.72% higher

Inspecting the logs produced during conversion shows that the model structure is simple and compact and its features are very sparse, so there are few nodes where operator fusion or quantization can take effect; the performance improvement is therefore modest rather than dramatic.


Copyright notice: this article was created by [Johns]. Please include a link to the original when reposting:
https://yzsam.com/2022/173/202206221318269422.html