TFLite Model Conversion and Quantization
2022-07-07 03:56:00 【Luchang-Li】
Reference: https://www.tensorflow.org/lite/convert?hl=zh-cn
If you are using TF2, the following performs post-training (dynamic-range) quantization of a frozen .pb file:
```python
import tensorflow as tf

graph_def_path = "resnet50_v1.pb"
tflite_model_path = "resnet50_v1.tflite"

input_arrays = ["input_tensor"]
input_shapes = {"input_tensor": [1, 224, 224, 3]}
output_arrays = ["softmax_tensor", "ArgMax"]

converter = tf.compat.v1.lite.TFLiteConverter.from_frozen_graph(
    graph_def_path, input_arrays, output_arrays, input_shapes)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

with open(tflite_model_path, "wb") as f:
    f.write(tflite_model)
```
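To sanity-check the dynamic-range quantization path end to end without the ResNet `.pb` file, here is a minimal sketch using a tiny in-memory Keras model (the model architecture and shapes here are illustrative, not from the article):

```python
import numpy as np
import tensorflow as tf

# Tiny stand-in model (hypothetical; the article converts resnet50_v1.pb).
model = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 3)),
    tf.keras.layers.Conv2D(8, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10, activation="softmax"),
])

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # dynamic-range quantization
tflite_model = converter.convert()

# Load the converted flatbuffer and run one inference to confirm it works.
interpreter = tf.lite.Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
interpreter.set_tensor(inp["index"], np.random.rand(1, 32, 32, 3).astype(np.float32))
interpreter.invoke()
out = interpreter.get_tensor(interpreter.get_output_details()[0]["index"])
```

Running the quantized model once through `tf.lite.Interpreter` is a cheap way to catch conversion problems before deploying.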
If you want full int8 quantization:
```python
import tensorflow as tf

graph_def_path = "resnet50_v1.pb"
tflite_model_path = "resnet50_v1_quant.tflite"

input_arrays = ["input_tensor"]
input_shapes = {"input_tensor": [1, 224, 224, 3]}
output_arrays = ["softmax_tensor", "ArgMax"]

converter = tf.compat.v1.lite.TFLiteConverter.from_frozen_graph(
    graph_def_path, input_arrays, output_arrays, input_shapes)
converter.optimizations = [tf.lite.Optimize.DEFAULT]

image_shape = [1, 224, 224, 3]

def representative_dataset_gen():
    for i in range(10):
        # creating fake images; for meaningful calibration use real,
        # preprocessed samples (see the commented example below)
        image = tf.random.normal(image_shape)
        yield [image]

# Calibrating with real images instead (requires numpy, cv2 and glob):
# IMAGE_MEAN = 127.5
# IMAGE_STD = 127.5
#
# def norm_image(img):
#     return (np.array(img, dtype="float32") - IMAGE_MEAN) / IMAGE_STD
#
# def read_image(img_name):
#     img = cv2.imread(img_name)
#     tgt_shape = [299, 299]
#     return cv2.resize(img, tgt_shape).reshape([1, 299, 299, 3])
#
# def representative_dataset_gen():
#     img_folder = "resnet_v2_101/test_imgs/"
#     img_names = glob.glob(img_folder + "*.jpg")
#     for img_name in img_names:
#         img = read_image(img_name)
#         img = norm_image(img)
#         yield [img]

converter.representative_dataset = tf.lite.RepresentativeDataset(representative_dataset_gen)
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8  # or tf.uint8
converter.inference_output_type = tf.int8  # or tf.uint8
tflite_model = converter.convert()

with open(tflite_model_path, "wb") as f:
    f.write(tflite_model)
```
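The full-int8 flow above can be verified self-contained with a tiny Keras model and a random representative dataset (all names and shapes here are illustrative). After conversion, the interpreter's input and output tensors should report an int8 dtype:

```python
import numpy as np
import tensorflow as tf

# Small illustrative model so conversion runs without the ResNet .pb file.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(8, 8, 1)),
    tf.keras.layers.Conv2D(4, 3, activation="relu"),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(2),
])

def representative_dataset_gen():
    # Random calibration data; real inputs give better quantization ranges.
    for _ in range(10):
        yield [np.random.rand(1, 8, 8, 1).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset_gen
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
tflite_model = converter.convert()

# With inference_input_type/inference_output_type set, I/O tensors are int8.
interpreter = tf.lite.Interpreter(model_content=tflite_model)
input_dtype = interpreter.get_input_details()[0]["dtype"]
output_dtype = interpreter.get_output_details()[0]["dtype"]
```

Checking the I/O dtypes like this catches the common mistake of setting the quantization options after `convert()` has already been called.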
FP16 quantization (the converter must also have `Optimize.DEFAULT` set):

```python
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]
tflite_quant_model = converter.convert()
```
The advantages of float16 quantization are:
- It halves the model size (all weights become half their original size).
- It causes minimal accuracy loss.
- It supports delegates that can operate directly on float16 data (for example the GPU delegate), so execution can be faster than float32 computation.

The disadvantages of float16 quantization are:
- It does not reduce latency as much as fixed-point quantization.
- By default, a float16-quantized model "dequantizes" the weight values to float32 when run on the CPU. (Note that the GPU delegate does not perform this dequantization, since it can operate on float16 data directly.)
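The size halving can be demonstrated directly: convert the same small model with and without float16 quantization and compare the flatbuffer sizes (the model below is illustrative, chosen so the weights dominate the file size):

```python
import tensorflow as tf

# A dense-heavy toy model so weight storage dominates the .tflite size.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(32,)),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(10),
])

def convert(fp16: bool) -> bytes:
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    if fp16:
        converter.optimizations = [tf.lite.Optimize.DEFAULT]
        converter.target_spec.supported_types = [tf.float16]
    return converter.convert()

fp32_size = len(convert(fp16=False))
fp16_size = len(convert(fp16=True))
# fp16_size should be roughly half of fp32_size
```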
16-bit activations with 8-bit weights (the "A16W8" scheme)

This is an experimental quantization scheme. It is similar to the "integer only" scheme, but activations are quantized to 16-bit integers, weights to 8-bit integers, and biases to 64-bit integers. It is also referred to as 16x8 quantization.
The main advantage of this quantization is that it can significantly improve accuracy while only slightly increasing model size.
```python
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.representative_dataset = representative_dataset
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.EXPERIMENTAL_TFLITE_BUILTINS_ACTIVATIONS_INT16_WEIGHTS_INT8
]
tflite_quant_model = converter.convert()
```
If some operators in the model do not support 16x8 quantization, the model can still be quantized, but unsupported operators are kept in floating point. To allow this, add the following option to target_spec:
```python
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.representative_dataset = representative_dataset
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.EXPERIMENTAL_TFLITE_BUILTINS_ACTIVATIONS_INT16_WEIGHTS_INT8,
    tf.lite.OpsSet.TFLITE_BUILTINS,
]
tflite_quant_model = converter.convert()
```
Examples of use cases whose accuracy improves with this quantization scheme include: super-resolution, audio signal processing (such as noise cancellation and beamforming), image denoising, and single-image HDR reconstruction.
The disadvantages of this quantization are:
- Due to the lack of optimized kernel implementations, inference is currently significantly slower than with 8-bit full-integer quantization.
- It is currently incompatible with existing hardware-accelerated TFLite delegates.

Note: this is an experimental feature.
A tutorial for this quantization mode can be found in the TensorFlow Lite documentation.
Model accuracy

Because weights are quantized after training, accuracy may drop, especially for smaller networks. Pre-trained, fully quantized models for specific networks are provided in the TensorFlow Lite model repository. Be sure to check the accuracy of the quantized model to verify that any degradation is within an acceptable range; there are tools to evaluate TensorFlow Lite model accuracy.

If the accuracy drop is too large, consider quantization-aware training. However, this requires modifying the model during training to add fake-quantization nodes, whereas the post-training quantization techniques on this page work with an existing pre-trained model.
Representation of quantized tensors

8-bit quantization approximates floating-point values with the formula

real_value = (int8_value - zero_point) * scale

The representation has two main parts:
- Per-axis (i.e. per-channel) or per-tensor weights, represented by int8 two's-complement values in the range [-127, 127], with a zero point equal to 0.
- Per-tensor activations/inputs, represented by int8 two's-complement values in the range [-128, 127], with a zero point in the range [-128, 127].
For details of the quantization scheme, see our quantization specification. Hardware vendors who want to plug into TensorFlow Lite's delegate interface are encouraged to implement the quantization scheme described there.
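The affine scheme real_value = (int8_value - zero_point) * scale can be illustrated numerically; the scale and zero point below are arbitrary example parameters, not from any real model:

```python
import numpy as np

scale, zero_point = 0.05, -10  # example quantization parameters

def quantize(x):
    # Map floats to int8 codes, clamping to the representable range.
    q = np.round(x / scale) + zero_point
    return np.clip(q, -128, 127).astype(np.int8)

def dequantize(q):
    # Recover approximate floats from int8 codes.
    return (q.astype(np.float32) - zero_point) * scale

x = np.array([0.0, 1.0, -0.5, 3.2], dtype=np.float32)
q = quantize(x)
x_hat = dequantize(q)
# x_hat matches x to within half a quantization step (scale / 2)
```

Note that the float 0.0 maps exactly to the zero point (-10 here), which is why the spec pins the weights' zero point to 0: it guarantees that zero is represented exactly.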