A Summary of Deep Network Model Quantization
2022-06-30 01:31:00 【Martin の Blog】
Abstract
For work reasons, I have been doing model quantization for about half a year, and today I want to write a small summary of that work.
I will not explain the principles and calculation methods of quantization here; there is plenty of material about them online. (Much of it promotes SenseTime's quantization tooling ~~)
What follows is my own summary of the quantization work, shared with everyone.
Why models should be quantized
Normally, when we run a model's forward and backward passes, we use devices that support 32-bit computation.
When we save a model, the parameters we get are also stored as 32-bit floating-point values.
A 32-bit model can be very accurate, but it is often very large, and computing over its parameters is comparatively slow.
In the chip industry, however, very few chips support 32-bit computation (as far as I know); at present, chips supporting 8-bit and 16-bit arithmetic are far more common.
To make chips more AI-ready, a model must be able to run inference normally with its parameters in an 8-bit or 16-bit range, while still maintaining acceptable accuracy.
So quantization work is necessary.
The essence of quantization
In essence, quantization is:
a controlled reduction of precision (it can be understood as the precision loss of converting floating point to fixed point);
the process of approximating the continuous-valued floating-point weights of a model, and the tensors flowing through it, by a finite set of discrete values;
a mapping of the 32-bit data range onto data types with fewer bits, such as 8-bit or 16-bit.
Note that the inputs and outputs of the model are still floating-point types (the parameters are quantized and then dequantized).
This reduces the model size, lowers its memory consumption, and accelerates inference.
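To make the mapping described above concrete, here is a minimal NumPy sketch of asymmetric (affine) int8 quantization and dequantization. The function names and the simple min/max calibration are my own illustrative choices, not something taken from this post or any particular toolkit:

```python
import numpy as np

def quantize(x, num_bits=8):
    """Map a float tensor onto the signed integer range of `num_bits`.

    Assumes x is not constant, so the scale is strictly positive.
    """
    qmin = -(2 ** (num_bits - 1))       # e.g. -128 for int8
    qmax = 2 ** (num_bits - 1) - 1      # e.g. +127 for int8
    scale = (x.max() - x.min()) / (qmax - qmin)   # float step per integer step
    zero_point = int(round(qmin - x.min() / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax)
    return q.astype(np.int8), scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate float values from the quantized tensor."""
    return scale * (q.astype(np.float32) - zero_point)

x = np.array([-1.0, 0.0, 1.0, 2.0], dtype=np.float32)
q, scale, zp = quantize(x)
x_hat = dequantize(q, scale, zp)
# reconstruction error is bounded by half a quantization step (scale / 2)
```

This is exactly the "quantize then dequantize" round trip mentioned above: the stored tensor is int8, but what the surrounding floating-point layers see is `x_hat`, a float approximation whose error per element is at most half the quantization step.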