How to Quantize and Compress a Model with the OpenVINO POT Tool
2022-08-05 01:58:00 【Intel Edge Computing Community】
Notebook Biweekly Series

Hello again, everyone! Each installment of this course introduces a different topic for everyone to study and discuss. We hope you find the joy of deep learning application development along the way, and use OpenVINO to complete more fulfilling projects. The topic of this installment is how to quantize and compress a model with the OpenVINO POT tool. We hope that, amid the collision of ideas in this course, you will find common ground with Nono.
1. Course Preparation
Seeing the course title, you may have had a question pop into your head, just as Nono did: what exactly is int8 quantization?
There are generally two ways to improve a model's inference performance. The first works at runtime: use multi-threading and similar techniques for parallel acceleration. The other is to compress the model file itself, shrinking it through quantization, pruning, distillation, and related techniques so that it consumes fewer computing resources. OpenVINO's Post-training Optimization Tool (POT) is such a quantization tool: it remaps the model's parameters from fp32 high-precision floating point to int8 low-precision fixed point, compressing the model and yielding better performance.
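As a rough illustration of that fp32-to-int8 mapping, here is a minimal sketch of symmetric linear quantization (a toy example, not POT's actual algorithm): each weight is divided by a scale factor and rounded into the signed 8-bit range.

```python
def quantize_int8(values):
    """Map fp32 values into the signed 8-bit range [-127, 127]."""
    # The scale is chosen so the largest-magnitude weight lands at the range edge.
    scale = max(abs(v) for v in values) / 127.0
    return [max(-127, min(127, round(v / scale))) for v in values], scale

def dequantize(q, scale):
    """Map int8 values back to approximate fp32."""
    return [x * scale for x in q]

weights = [0.82, -1.27, 0.003, 0.5]
q, scale = quantize_int8(weights)
print(q)                     # integers that each fit in a single byte
print(dequantize(q, scale))  # close to the original weights
```

Dequantizing shows the cost of the mapping: every restored value is within half a quantization step (scale / 2) of the original, which is why a well-calibrated int8 model loses only a little accuracy.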
The topic of today's lesson is how to use the simplified mode of the POT tool to quantize a model.
2. Preliminary Steps
First, import some standard libraries and set the relevant environment variables. The variables here are mainly the file paths of the dataset and the model we will use, named accordingly.
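A minimal sketch of that setup; the directory names below are assumptions for illustration, not the notebook's actual values:

```python
import os

# Hypothetical directory layout -- the notebook's actual names may differ.
DATASET_DIR = "cifar"       # raw CIFAR download
IMAGE_DIR = "cifar_images"  # individual PNG images restored from the batches
MODEL_DIR = "model"         # ONNX and IR model files

for path in (DATASET_DIR, IMAGE_DIR, MODEL_DIR):
    os.makedirs(path, exist_ok=True)
```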
Next, prepare the dataset used for validation. Quantization inevitably reduces a model's accuracy, so we introduce a validation dataset to pull the accuracy back toward its original level, achieving compression with little or no loss. In this step we use PyTorch's built-in interface to download CIFAR, a very well-known dataset used mostly for classification tasks.
Once downloaded, it is saved under the cifar path. However, the downloaded dataset is stored in batched form, so to use the samples individually we need to restore them to ordinary images. Here we use another interface that ships with PyTorch, torchvision's ToPILImage, to restore each sample to an RGB image. Let's run it.
As you can see, the image data has been generated under another path, as PNG-format images.
Next, we need to prepare the original model file to be compressed. Here we use another common PyTorch workflow: download a pretrained model and export it in ONNX format. We use resnet20, a classification network; you can see that the model has been downloaded.
Right after that, we call the mo tool, that is, Model Optimizer, to convert the ONNX model into the IR intermediate representation supported by OpenVINO. When it finishes, you can see that .xml and .bin files, OpenVINO's model format, have been generated under the model path.
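From a notebook this is typically run as a shell command; sketched here as a subprocess call with assumed file names (`--input_model` and `--output_dir` are Model Optimizer's standard flags):

```python
import subprocess

def build_mo_cmd(onnx_path, output_dir):
    """Model Optimizer invocation: ONNX in, OpenVINO IR (.xml + .bin) out."""
    return ["mo", "--input_model", onnx_path, "--output_dir", output_dir]

# subprocess.run(build_mo_cmd("model/resnet20.onnx", "model"), check=True)
```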
3. Compressing and Quantizing the Model
The next step is the most critical one: compressing and quantizing the model.
The POT tool can be driven in two ways: through its API, or through the cmd command line. For ease of demonstration, we show the command-line approach here.
Step one: specify the path of the input model. As for the quantization method adopted this time, we chose simplified mode: compared with the other quantization modes it is simple to configure, which makes it convenient for beginners getting started with POT, so that is the mode we demonstrate.
Step two: define the path of the dataset used for validation, and the path where the final model will be generated.
Step three: run the POT tool. Because it has to traverse 300-odd samples, it takes a little while. While we wait, let's look at the size of the original fp32 model: its .xml file, which describes the model's topology, is 70 KB, and its .bin weight file is about 1 MB.
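The three steps above collapse into a single pot invocation. The flags follow POT's documented command line for simplified mode, but treat the exact names and paths as assumptions to check against your installed version:

```python
import subprocess

def build_pot_cmd(model_xml, weights_bin, data_dir, output_dir="compressed"):
    """POT in simplified mode: quantize an IR using a plain folder of images."""
    return ["pot",
            "-q", "default",            # DefaultQuantization algorithm
            "-m", model_xml,            # IR topology (.xml)
            "-w", weights_bin,          # IR weights (.bin)
            "--engine", "simplified",   # no accuracy-checker config needed
            "--data-source", data_dir,  # folder of calibration images
            "--output-dir", output_dir]

# subprocess.run(build_pot_cmd("model/resnet20.xml", "model/resnet20.bin",
#                              "cifar_images"), check=True)
```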
Once the model has been quantized successfully, you will find that a compressed path has been generated under the working directory. Under compressed there is a new folder named optimized, which is where the quantized model is stored.
Finally, let's look at the size of the .bin file. The int8 model's weight file has shrunk to 274 KB, while the .xml file has not changed noticeably. The overall model size has been compressed by nearly 3/4, which is a demonstration of what the POT tool can do.
4. Performance Validation
Now that the model file has been compressed, does its performance change? Let's use benchmark_app, the performance testing tool, to compare the two models.
We specify the loop time, setting it to 15 seconds, and use benchmark_app to measure 15 seconds of runtime performance. The first command measures the original fp32 model; the second measures the compressed int8 model.
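A sketch of the two measurements (`-m` and `-t` are benchmark_app's standard flags; the model paths are the ones assumed earlier):

```python
import subprocess

def build_benchmark_cmd(model_xml, seconds=15):
    """benchmark_app invocation: measure throughput for `seconds` seconds."""
    return ["benchmark_app", "-m", model_xml, "-t", str(seconds)]

# fp32 baseline first, then the quantized model:
# subprocess.run(build_benchmark_cmd("model/resnet20.xml"), check=True)
# subprocess.run(build_benchmark_cmd("compressed/optimized/resnet20.xml"), check=True)
```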
As you can see, the fp32 model's final throughput is around 1000 FPS, while the int8 model reaches 1630 FPS, an improvement of roughly 60%. The performance gain is quite tangible.
With performance improved, is accuracy affected? Next we run a simple experiment: give the int8 model an inference task and see whether its final outputs are correct.
The first step is routine: define a Core object, prepare some environment variables and label data, and prepare some image visualization helpers. We draw the original label onto each image so it can be compared against the prediction.
Next, define the inference task for each image. We iterate over the 300 images one by one, run inference on each, and append the classification label of each result to a list, which we will then use for the comparison.
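A sketch of that loop with OpenVINO's Python API (2022-era `openvino.runtime`). Input preprocessing is reduced to a bare layout change, and the helper names are our own; the label list is the standard CIFAR-10 class order:

```python
import numpy as np

# Standard CIFAR-10 class names, index-aligned with the model's logits.
CIFAR10_LABELS = ["airplane", "automobile", "bird", "cat", "deer",
                  "dog", "frog", "horse", "ship", "truck"]

def top1_label(logits):
    """Map a logit vector to its CIFAR-10 class name."""
    return CIFAR10_LABELS[int(np.argmax(logits))]

def classify_images(model_xml, images):
    """Run each HWC image through the IR and collect predicted labels."""
    from openvino.runtime import Core  # lazy import; needs openvino installed
    compiled = Core().compile_model(model_xml, "CPU")
    output = compiled.output(0)
    predictions = []
    for image in images:
        # HWC -> NCHW float blob; real code would also normalize the pixels.
        blob = np.expand_dims(image.transpose(2, 0, 1), 0).astype(np.float32)
        predictions.append(top1_label(compiled([blob])[output]))
    return predictions
```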
Now that we have the final inference results, let's first compare the predictions for the first three images against their original labels.
As you can see, the predictions for the first three images are cat, ship, and ship.
Although the images were drawn in random order, the original labels are also cat, ship, ship: the final results agree with the labels of the original images. This further shows that quantization has cost little model accuracy; in other words, the accuracy stays close to the pre-quantization level.