I-BERT
2022-07-06 08:57:00 【cyz0202】
Background
This post covers the ICML 2021 paper "I-BERT: Integer-only BERT Quantization".
The goal of the paper is to quantize BERT more thoroughly, so that inference uses integer arithmetic only;
The authors argue that earlier quantization schemes do not quantize nonlinear operations such as GELU and softmax (see Figure 1) and keep them in floating point instead. This not only hurts computational efficiency, it also prevents deployment on chips that support integer arithmetic only;
The quantization scheme adopted by the authors is 8-bit symmetric quantization;
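As a reminder of what symmetric quantization means, here is a minimal sketch in plain Python. The function names and the choice of deriving the scale from the largest absolute value are assumptions for illustration, not something taken from the paper.

```python
def quantize_symmetric(xs, bits=8):
    """Map floats to signed integers q so that x ≈ scale * q (zero point fixed at 0)."""
    qmax = 2 ** (bits - 1) - 1                      # 127 for 8 bits
    max_abs = max(abs(x) for x in xs)
    scale = max_abs / qmax if max_abs > 0 else 1.0  # assumption: scale from max |x|
    qs = [max(-qmax, min(qmax, round(x / scale))) for x in xs]
    return qs, scale


def dequantize(qs, scale):
    """Recover approximate floats from the integers and the shared scale."""
    return [q * scale for q in qs]


values = [-1.0, -0.3, 0.0, 0.42, 1.0]
q_values, s = quantize_symmetric(values)
restored = dequantize(q_values, s)
```

Because the range is symmetric around zero, the real value 0 always maps to the integer 0, and the rounding error of each element is at most half a scale step.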
Existing approaches and their shortcomings
The authors mainly address the quantization of two kinds of nonlinear layers, GELU and softmax;
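First, look at the expression for GELU, where erf is the error function:
$$\mathrm{GELU}(x) = x \cdot \frac{1}{2}\left[1 + \mathrm{erf}\left(\frac{x}{\sqrt{2}}\right)\right]$$
where
$$\mathrm{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2}\,dt$$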
GELU is hard to quantize directly; quantizing it naively causes a large loss of accuracy;
This is unlike linear layers such as matrix multiplication, or piecewise-linear ones such as ReLU, where linearity lets the integer result be dequantized back to the floating-point result (the authors give the example MatMul(Sq) = S · MatMul(q), where x = Sq, S is the scale, and q is the quantized value of x);
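The linearity argument can be checked numerically. The sketch below uses made-up integer weights and a made-up scale; it shows that the scale can be pulled out of a matrix-vector product, while the same trick fails for GELU.

```python
import math


def matvec(W, v):
    """Plain matrix-vector product."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def gelu(x):
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


W = [[2, -1, 0], [1, 3, -2]]   # hypothetical integer weights
q = [5, -7, 12]                # hypothetical quantized input
S = 0.05                       # hypothetical scale, x = S * q

x = [S * qi for qi in q]
float_result = matvec(W, x)                  # MatMul(S q)
rescaled = [S * r for r in matvec(W, q)]     # S * MatMul(q), integer work first

# For a nonlinear function the scale cannot simply be pulled out:
gelu_of_scaled = gelu(0.5 * 4)       # GELU(S q) with S = 0.5, q = 4
scaled_gelu = 0.5 * gelu(4)          # S * GELU(q)
```

The first pair of results agrees up to floating-point rounding, while `gelu_of_scaled` and `scaled_gelu` differ noticeably, which is why GELU needs a dedicated integer scheme.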
Some existing approximations of GELU include:
- The sigmoid approximation, GELU(x) ≈ x·σ(1.702x). It introduces another nonlinearity, the sigmoid, so it is still unsuitable for integer-only computation
- The ReLU6 approximation, h-GELU(x) := x·ReLU6(1.702x + 3)/6. It can be computed with integers, but the approximation is not accurate enough; this scheme is also known as h-GELU
The left panel of Figure 2 below shows the shortcomings of h-GELU
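To get a feel for how far these two approximations are from the exact GELU, the following sketch measures the largest absolute error on a grid over [-4, 4]. The grid and the helper names are choices made for this example only.

```python
import math


def gelu(x):
    """Exact GELU via the error function."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def sigmoid_gelu(x):
    """Sigmoid approximation: x * sigmoid(1.702 x)."""
    return x / (1.0 + math.exp(-1.702 * x))


def relu6(x):
    return min(max(x, 0.0), 6.0)


def h_gelu(x):
    """ReLU6 approximation: x * ReLU6(1.702 x + 3) / 6."""
    return x * relu6(1.702 * x + 3.0) / 6.0


def max_abs_error(approx, lo=-4.0, hi=4.0, steps=800):
    """Largest |approx(x) - GELU(x)| on an evenly spaced grid."""
    xs = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return max(abs(approx(x) - gelu(x)) for x in xs)


err_sigmoid = max_abs_error(sigmoid_gelu)
err_h_gelu = max_abs_error(h_gelu)
```

On this grid the h-GELU error comes out clearly larger than the sigmoid error, which matches the observation that h-GELU is integer-friendly but too coarse.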
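Solution for GELU
After analysis, the authors propose approximating erf with a second-order polynomial and thereby approximating GELU. The optimization problem is:
$$\min_{a,b,c} \frac{1}{2}\left\| \mathrm{GELU}(x) - x \cdot \frac{1}{2}\left[1 + L\left(\frac{x}{\sqrt{2}}\right)\right] \right\|_2^2 \quad \text{s.t.} \quad L(x) = a + bx + cx^2$$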
The idea rests on the fact that a smooth function can be approximated well by a polynomial over a bounded interval; polynomials of this kind are called interpolating polynomials. See the original paper for details;
Optimizing the formula above directly gives poor results, because the domain of erf is the whole real line;
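Considering that erf is bounded (its values lie in (-1, 1) and approach ±1 for large |x|) and that erf is an odd function, i.e.
$$\mathrm{erf}(-x) = -\mathrm{erf}(x)$$
the authors fit only the positive half of the real line and extend the result to the negative half, which gives the following L(x):
$$L(x) = \mathrm{sgn}(x)\left[a\left(\mathrm{clip}(|x|, \max = -b) + b\right)^2 + 1\right]$$
where sgn is the sign function and clip is the clipping function;
The max in clip means that |x| is capped at -b;
Therefore L(x) stays within [-1, 1], and it is an odd function;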
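a and b are obtained by solving the fitting problem against GELU; the paper reports a = -0.2888 and b = -1.769;
From the above, GELU can be approximated as
$$\text{i-GELU}(x) := x \cdot \frac{1}{2}\left[1 + L\left(\frac{x}{\sqrt{2}}\right)\right]$$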
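The polynomial approximation can be tried out in floating point before any quantization is involved. The sketch below uses the coefficients a = -0.2888 and b = -1.769 reported in the I-BERT paper; the function names and the evaluation grid are assumptions of this example.

```python
import math

A, B = -0.2888, -1.769   # coefficients reported in the I-BERT paper


def poly_erf(x):
    """L(x) = sgn(x) * [a * (clip(|x|, max=-b) + b)^2 + 1], a stand-in for erf(x)."""
    sign = (x > 0) - (x < 0)
    clipped = min(abs(x), -B)
    return sign * (A * (clipped + B) ** 2 + 1.0)


def i_gelu_float(x):
    """i-GELU(x) = x * 1/2 * [1 + L(x / sqrt(2))], still in floating point."""
    return 0.5 * x * (1.0 + poly_erf(x / math.sqrt(2.0)))


def gelu(x):
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


grid = [-4.0 + 8.0 * i / 800 for i in range(801)]
max_err = max(abs(i_gelu_float(x) - gelu(x)) for x in grid)
```

Beyond |x|/√2 = -b the clip makes L saturate at ±1, so i-GELU behaves like x for large positive inputs and like 0 for large negative ones, just as GELU does.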
Quantization scheme for i-GELU
With a polynomial expression for GELU in hand, we can start designing the quantization scheme;
L(x) is a polynomial, so we first need to know how to quantize a polynomial;
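The authors give a polynomial quantization algorithm, I-POLY. For an input x = Sq and a polynomial a(x + b)^2 + c, it computes
$$q_b = \lfloor b/S \rfloor,\quad q_c = \lfloor c/(aS^2) \rfloor,\quad S_{out} = aS^2,\quad q_{out} = (q + q_b)^2 + q_c$$
One can verify that
$$q_{out} S_{out} \approx a(x + b)^2 + c$$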
So any second-order polynomial can be quantized and dequantized with the algorithm above;
(Note: to me this is quantization designed for the sake of the computation. The arithmetic checks out, but it feels deliberately constructed: q_out and S_out are not necessarily the quantized value and scale one would get by quantizing the polynomial's result directly.)
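Here is a minimal Python sketch of I-POLY as I understand it from the paper, evaluating a(x + b)^2 + c for x = S·q. The integer constants q_b and q_c depend only on the scale and the coefficients, so they can be computed ahead of time; keeping S_out as a real-valued scale is an assumption of this sketch.

```python
import math


def i_poly(q, S, a, b, c):
    """Integer-only evaluation of a * (x + b)^2 + c for x = S * q.

    Returns (q_out, S_out) with q_out * S_out ≈ a * (x + b)^2 + c.
    """
    q_b = math.floor(b / S)               # integer version of b
    q_c = math.floor(c / (a * S * S))     # integer version of c / a
    S_out = a * S * S                     # output scale (assumption: kept as a float)
    q_out = (q + q_b) ** 2 + q_c          # only integer add and multiply at run time
    return q_out, S_out


# Example: the polynomial 0.3585 * (x + 1.353)^2 + 0.344 at x = -0.5
a, b, c = 0.3585, 1.353, 0.344
S = 0.01
q = -50                                   # x = S * q = -0.5
q_out, S_out = i_poly(q, S, a, b, c)
approx = q_out * S_out
exact = a * (S * q + b) ** 2 + c
```

The small gap between `approx` and `exact` comes from flooring b/S and c/(aS^2); it shrinks as the scale S gets finer.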
------
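With the polynomial quantization method in hand, we can go on to implement the integer-only GELU, I-GELU. The computation is as follows: I-ERF clips |q| at -b/S, calls I-POLY with c = 1 and restores the sign; I-GELU calls I-ERF with the scale S/√2, computes q_1 = ⌊1/S_erf⌋ and returns q_out = q(q_erf + q_1) with S_out = S·S_erf/2
The call chain is I-GELU -> I-ERF -> I-POLY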
Note the implementation details in the algorithm shown in Figure 4;
Note that the clip bound max = -b/S in the algorithm above may need to be changed to max = round(-b/S); otherwise q' is not guaranteed to be an integer...
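The call chain I-GELU -> I-ERF -> I-POLY can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: the clip bound is rounded to an integer as suggested in the note above, the coefficients are the ones reported in the paper, and the input scale in the usage lines is made up.

```python
import math

A, B = -0.2888, -1.769   # erf polynomial coefficients reported in the paper


def i_poly(q, S, a, b, c):
    """q_out * S_out ≈ a * (S*q + b)^2 + c, using integer arithmetic on q."""
    q_b = math.floor(b / S)
    q_c = math.floor(c / (a * S * S))
    return (q + q_b) ** 2 + q_c, a * S * S


def i_erf(q, S):
    """Integer-only stand-in for erf(S * q)."""
    q_sgn = (q > 0) - (q < 0)
    q_max = round(-B / S)                 # assumption: rounded so the clip stays integer
    q_clipped = min(abs(q), q_max)
    q_L, S_L = i_poly(q_clipped, S, A, B, 1.0)
    return q_sgn * q_L, S_L


def i_gelu(q, S):
    """Integer-only GELU: q_out * S_out ≈ GELU(S * q)."""
    q_erf, S_erf = i_erf(q, S / math.sqrt(2.0))
    q_one = math.floor(1.0 / S_erf)       # the constant 1 expressed in units of S_erf
    q_out = q * (q_erf + q_one)
    S_out = S * S_erf / 2.0
    return q_out, S_out


def gelu(x):
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


S_in = 4.0 / 127                          # hypothetical input scale covering [-4, 4]
errors = []
for q_in in range(-127, 128):
    q_o, S_o = i_gelu(q_in, S_in)
    errors.append(abs(q_o * S_o - gelu(q_in * S_in)))
max_err = max(errors)
```

Since a is negative, S_erf and q_one are negative as well; the signs cancel in the product q_out · S_out, so the result still has the sign of GELU(x).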
------
That completes the I-GELU implementation; the results are as follows
Solution for Softmax
- Approximating with a higher-order polynomial is usable only in limited scenarios;
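Quantization scheme for Softmax
For numerical stability, the authors first rewrite softmax as follows:
$$\mathrm{Softmax}(x)_i = \frac{\exp(x_i)}{\sum_j \exp(x_j)} = \frac{\exp(x_i - x_{\max})}{\sum_j \exp(x_j - x_{\max})}, \quad x_{\max} = \max_i(x_i)$$
It is worth noting that every $\tilde{x}_i = x_i - x_{\max}$ is non-positive;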
A non-positive real number $\tilde{x}$ can be decomposed as
$$\tilde{x} = (-\ln 2)\,z + p$$
where z (the quotient) is a non-negative integer and p (the remainder) lies in (-ln 2, 0];
Then we have
$$\exp(\tilde{x}) = 2^{-z}\exp(p) = \exp(p) \gg z$$
In the formula above, >> denotes the right-shift operation;
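The decomposition is easy to check numerically. In the sketch below, the helper name `decompose` and the 16-bit fixed-point example are assumptions made for illustration.

```python
import math

LN2 = math.log(2.0)


def decompose(x_tilde):
    """Split a non-positive x into quotient z and remainder p with x = -ln2 * z + p."""
    z = math.floor(-x_tilde / LN2)        # non-negative integer
    p = x_tilde + z * LN2                 # lies in (-ln 2, 0]
    return z, p


x_tilde = -5.5
z, p = decompose(x_tilde)
via_shift_identity = math.exp(p) * 2.0 ** (-z)   # exp(p) scaled by 2^-z

# With a fixed-point integer for exp(p), dividing by 2^z is a right shift:
fixed_exp_p = round(math.exp(p) * 2 ** 16)       # hypothetical 16 fractional bits
shifted = fixed_exp_p >> z
```

`via_shift_identity` matches `math.exp(x_tilde)` up to rounding, and `shifted` equals `fixed_exp_p // 2**z`, which is the integer counterpart of multiplying by 2^-z.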
Furthermore, if exp(p) can be expressed with integer arithmetic, then exp of every $\tilde{x}$, and therefore Softmax, can be computed with integers;
The range of p, (-ln 2, 0], is much smaller than that of x or $\tilde{x}$, so exp(p) can be approximated more accurately;
Recall that for GELU the authors approximate the nonlinear function with a second-order polynomial; the same can be done here;
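The authors find the second-order polynomial for exp(p) by minimizing the approximation error over the range p ∈ (-ln 2, 0];
The result is
$$L(p) = 0.3585\,(p + 1.353)^2 + 0.344 \approx \exp(p)$$
so that
$$\text{i-exp}(\tilde{x}) := L(p) \gg z$$
where
$$z = \lfloor -\tilde{x}/\ln 2 \rfloor, \quad p = \tilde{x} + z \ln 2$$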
The right panel of Figure 2 shows that this approximation works well;
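The quality of the approximation can also be checked in floating point. The sketch below uses the polynomial 0.3585(p + 1.353)^2 + 0.344 from the paper; the function name and the grid over [-10, 0] are assumptions of this example, and the division by 2^z stands in for the right shift.

```python
import math

LN2 = math.log(2.0)


def i_exp_float(x_tilde):
    """Approximate exp(x) for x <= 0 as L(p) / 2^z with x = -ln2 * z + p."""
    z = math.floor(-x_tilde / LN2)
    p = x_tilde + z * LN2
    poly = 0.3585 * (p + 1.353) ** 2 + 0.344   # L(p) ≈ exp(p) on (-ln 2, 0]
    return poly / 2 ** z                        # stands in for the right shift by z


grid = [-10.0 * i / 1000 for i in range(1001)]
max_err = max(abs(i_exp_float(x) - math.exp(x)) for x in grid)
```

Because the polynomial only has to cover the short interval (-ln 2, 0], a second-order fit is already tight, and the error shrinks further as z grows.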
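The polynomial quantization method I-POLY was introduced above, so the full integer-only Softmax computation is as follows: I-EXP computes q_ln2 = ⌊ln 2 / S⌋, z = ⌊-q / q_ln2⌋ and q_p = q + z·q_ln2, calls I-POLY on q_p and returns q_L >> z; I-SOFTMAX subtracts max(q) from q, calls I-EXP and normalizes q_exp by sum(q_exp)
The basic idea is much the same as for I-GELU
#TODO#: the last step seems to have a problem ...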
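Putting the pieces together, an integer-only softmax can be sketched as below. The I-EXP part follows the decomposition and polynomial described above. The final normalization is written as a fixed-point integer division with a hypothetical `frac_bits` parameter, which is an assumption of this sketch and not necessarily what the paper does in its last step.

```python
import math


def i_poly(q, S, a, b, c):
    """q_out * S_out ≈ a * (S*q + b)^2 + c, using integer arithmetic on q."""
    q_b = math.floor(b / S)
    q_c = math.floor(c / (a * S * S))
    return (q + q_b) ** 2 + q_c, a * S * S


def i_exp(q, S):
    """Integer-only exp for non-positive inputs: q_out * S_out ≈ exp(S * q)."""
    a, b, c = 0.3585, 1.353, 0.344
    q_ln2 = math.floor(math.log(2.0) / S)   # ln 2 in units of S (needs S < ln 2)
    z = (-q) // q_ln2                        # quotient, non-negative
    q_p = q + z * q_ln2                      # remainder in (-q_ln2, 0]
    q_L, S_L = i_poly(q_p, S, a, b, c)
    return q_L >> z, S_L


def i_softmax(qs, S, frac_bits=16):
    """Integer-only softmax; frac_bits is a hypothetical output precision."""
    q_max = max(qs)
    q_exp = [i_exp(q - q_max, S)[0] for q in qs]   # the common scale S_exp cancels
    total = sum(q_exp)
    # Assumption: fixed-point division so the output stays integer-valued.
    q_out = [(e << frac_bits) // total for e in q_exp]
    return q_out, 2.0 ** -frac_bits


S_in = 0.001                                 # hypothetical input scale
q_in = [1000, 2000, 3000, 500]               # x = [1.0, 2.0, 3.0, 0.5]
q_soft, S_soft = i_softmax(q_in, S_in)
probs = [q * S_soft for q in q_soft]
```

A plain integer division q_exp // sum(q_exp) would yield 0 for almost every element, which is why this sketch shifts the numerator left by `frac_bits` first and reports 2^-frac_bits as the output scale.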
Quantization scheme for LayerNorm
- To be continued
Analysis of the I-BERT implementation
- To be discussed in another article
Summary
- This post introduces the improvements made by I-BERT and how GELU and Softmax are computed with integers only;
- The main idea is to approximate the nonlinear function with a second-order polynomial, and then to evaluate that polynomial with quantized integer arithmetic;