A summary of quantization work on deep network models
2022-06-30 01:31:00 【Martin の Blog】
Abstract
Because of work, I have spent the past half year on model quantization, and today I want to write a short summary of that work.
I will not explain the principles or calculation methods of quantization here; plenty of material can be found online. (I mainly recommend SenseTime's quantization work.)
Below is my summary, shared for reference.
Why models need to be quantized
When we run the forward and backward passes of a model, it is usually on hardware that supports 32-bit computation.
When we save the model, what we get are its parameters, which are also stored as 32-bit floating-point values.
The model's accuracy is high, but the model is often very large, and computing with its parameters is somewhat slow.
In the chip industry, as far as I know, very few chips support 32-bit computation.
At present, chips that support 8-bit and 16-bit computation are more common.
To make chips more AI-capable, the model must be able to run inference normally with parameters in the 8-bit or 16-bit range while maintaining a certain level of accuracy.
That is why quantization is necessary.
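To put the size argument in numbers, here is a rough sketch of how weight storage scales with bit width. The parameter count is a hypothetical figure chosen for illustration, not one from this post, and only the weights are counted.

```python
# Rough storage estimate for model weights at different bit widths.
# NUM_PARAMS is a hypothetical example value, not a figure from this post.

def weight_size_mib(num_params: int, bits_per_param: int) -> float:
    """Return the storage needed for the weights alone, in MiB."""
    return num_params * bits_per_param / 8 / (1024 ** 2)

NUM_PARAMS = 25_000_000  # hypothetical model size

sizes = {bits: weight_size_mib(NUM_PARAMS, bits) for bits in (32, 16, 8)}
for bits, size in sizes.items():
    print(f"{bits:>2}-bit weights: {size:7.2f} MiB")
```

Going from 32-bit to 8-bit parameters cuts the weight storage to a quarter, independent of the parameter count.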
The essence of quantization
The essence of quantization is:
Keeping the loss of accuracy small (this can be understood as the precision lost when converting floating-point numbers to fixed-point numbers).
Converting the model's continuous-valued floating-point weights, or the tensor data flowing through the model, to fixed point, that is, approximating them with a finite set of discrete values.
Quantization mainly uses data types with fewer bits, such as 8-bit or 16-bit, to represent the 32-bit data range.
Note that the model's inputs and outputs are still floating point (there is a dequantization step after the parameters are quantized).
This reduces the model size, lowers memory consumption, and speeds up inference.
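The mapping described above can be sketched as follows. This is a minimal illustration of one common scheme, asymmetric (affine) 8-bit quantization, and is not necessarily the method used in the author's work; the function names and sample weights are assumptions made for the example.

```python
# Minimal sketch of asymmetric (affine) 8-bit quantization, standard library only.
# All names and the sample weights are illustrative assumptions, not from the post.

def quant_params(values, num_bits=8):
    """Compute a scale and zero point that map [min, max] onto unsigned integers."""
    qmin, qmax = 0, 2 ** num_bits - 1
    lo = min(min(values), 0.0)  # extend the range so 0.0 is representable
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # fall back to 1.0 for all-zero input
    zero_point = int(round(qmin - lo / scale))
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point, qmin, qmax

def quantize(values, scale, zero_point, qmin, qmax):
    """Map floats to integers in [qmin, qmax], clamping anything out of range."""
    return [max(qmin, min(qmax, int(round(v / scale)) + zero_point)) for v in values]

def dequantize(q_values, scale, zero_point):
    """Map integers back to approximate floating-point values."""
    return [(q - zero_point) * scale for q in q_values]

weights = [-1.2, -0.4, 0.0, 0.35, 0.9, 2.1]  # sample float32-style weights
scale, zp, qmin, qmax = quant_params(weights)
q = quantize(weights, scale, zp, qmin, qmax)
restored = dequantize(q, scale, zp)
max_err = max(abs(a - b) for a, b in zip(weights, restored))

print("quantized:", q)
print("restored: ", restored)
print("max abs error:", max_err)
```

The restored values differ from the originals by a small rounding error, which is the precision loss mentioned above; the integers in `q` are what would be stored or computed with on 8-bit hardware.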