A Summary of Deep Network Model Quantization
2022-06-30 01:31:00 [Martin の Blog]
Abstract
As part of my job, I have been working on model quantization for about half a year, and today I want to write a short summary of that work.
I will not explain the principles or calculation methods of quantization here; there is plenty of material about that online. (This post mainly draws on SenseTime's quantization work.)
Below is my summary of the quantization work, shared for your reference.
Why models need to be quantized
Normally, when we run a model's forward and backward passes, most of the devices involved support 32-bit computation.
When we save a model, what we get are its parameters, stored as 32-bit floating-point values.
Such models are accurate, but they are often very large, and computing with their parameters is comparatively slow.
In the chip industry, as far as I know, very few chips are built for 32-bit computation;
at present, chips that support 8-bit and 16-bit arithmetic are far more common.
To make chips more AI-friendly, the model must be able to run inference normally with its parameters represented in 8 or 16 bits, while maintaining acceptable accuracy.
Therefore, quantization work is necessary.
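The storage argument above is easy to verify numerically. The sketch below (using NumPy, with an arbitrary 1-million-parameter tensor as a stand-in for a real model) compares the memory footprint of the same parameters stored as 32-bit floats versus 8-bit integers:

```python
import numpy as np

# Hypothetical weight tensor: 1 million parameters stored as float32.
weights_fp32 = np.random.randn(1_000_000).astype(np.float32)

# The same number of parameters stored as int8 (values are placeholders;
# a real quantizer would fill them in via a float-to-int mapping).
weights_int8 = np.zeros(weights_fp32.shape, dtype=np.int8)

print(weights_fp32.nbytes)  # 4000000 bytes
print(weights_int8.nbytes)  # 1000000 bytes, i.e. 4x smaller
```

The 4x ratio follows directly from the element widths (4 bytes vs. 1 byte); the same reasoning gives 2x for 16-bit storage.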
The essence of quantization
In essence, quantization is:
a way to reduce precision with limited accuracy loss (it can be understood as the precision loss of converting floating-point numbers to fixed-point numbers);
the process of approximating the continuous-valued floating-point weights of a model, or the tensors flowing through it, with a finite set of discrete values;
mainly a mapping of the 32-bit data range onto data types with fewer bits, such as 8-bit or 16-bit.
Note that the inputs and outputs of the model are still floating-point (the quantized parameters are dequantized back to floating point during inference).
This reduces the model size and its memory consumption, and accelerates inference.
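The quantize-then-dequantize round trip described above can be sketched with a standard asymmetric (affine) scheme. This is a minimal illustration, not the exact method any particular toolchain uses; the function names and the 5-element example tensor are my own:

```python
import numpy as np

def quantize(x, num_bits=8):
    """Affine quantization: map floats onto [0, 2^num_bits - 1] integers.
    Assumes x contains at least two distinct values (so scale != 0)."""
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = (x.max() - x.min()) / (qmax - qmin)      # float step per integer step
    zero_point = int(round(qmin - x.min() / scale))  # integer that represents 0.0
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Map the integers back to approximate floating-point values."""
    return scale * (q.astype(np.float32) - zero_point)

x = np.array([-1.8, -0.5, 0.0, 0.7, 2.1], dtype=np.float32)
q, scale, zp = quantize(x)
x_hat = dequantize(q, scale, zp)
# x_hat is close to x; the difference is the quantization error,
# bounded by about half of `scale` per element.
```

This also shows why the model's inputs and outputs stay floating-point: integer arithmetic happens inside, and `dequantize` brings results back to the float domain.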