I-BERT
2022-07-06 08:57:00 【cyz0202】
Background
This post covers the ICML 2021 paper I-BERT: Integer-only BERT Quantization.
Its goal is to quantize BERT more thoroughly, so that inference uses integer arithmetic only.
The authors argue that previous quantization schemes do not quantize nonlinear operations such as GELU and Softmax (Figure 1 below) and instead keep them in floating point. This not only hurts computational efficiency but also prevents deployment on chips that support only integer arithmetic.

The authors use 8-bit symmetric quantization.
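For reference, here is a minimal sketch of 8-bit symmetric quantization in plain Python. It assumes per-tensor scaling with the clipping range set to the maximum absolute input value; how the paper chooses the range is not covered in this post, and the function names are hypothetical.

```python
def quantize_symmetric(xs, bits=8):
    """Symmetric per-tensor quantization: x ≈ S * q with the zero point fixed at 0."""
    q_max = 2 ** (bits - 1) - 1              # 127 for 8 bits
    alpha = max(abs(v) for v in xs)          # clipping range = max |x| (assumption)
    scale = alpha / q_max if alpha > 0 else 1.0
    # round to the nearest code and clamp to [-q_max, q_max]
    q = [max(-q_max, min(q_max, round(v / scale))) for v in xs]
    return q, scale

def dequantize(q, scale):
    return [scale * qi for qi in q]

xs = [-1.0, -0.3, 0.0, 0.42, 0.9]
q, S = quantize_symmetric(xs)
x_hat = dequantize(q, S)
print(q, S)
print(max(abs(a - b) for a, b in zip(xs, x_hat)))
```

Because the zero point is 0, a real value is recovered with a single multiplication by S, which is what makes the scale-factoring arguments below work.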
Existing approaches and their shortcomings
The paper mainly addresses the quantization of two kinds of nonlinear layers: GELU and Softmax.
First, recall the definition of GELU, where erf is the error function:
![\\GELU(x) = x*\frac{1}{2}[1+erf(\frac{x}{\sqrt2})]](http://img.inotgo.com/imagesLocal/202207/06/202207060850360827_31.gif)
where ![\\ erf=\frac{2}{\sqrt\pi}\int^x_0e^{-t^2}dt \in [-1, 1] \\](http://img.inotgo.com/imagesLocal/202207/06/202207060850360827_19.gif)
GELU is hard to quantize directly; forcing it into a quantized form causes a large loss of accuracy.
This differs from linear (or piecewise-linear) operations such as matrix multiplication and ReLU, whose linearity lets the integer result be dequantized back to the floating-point result. The paper's example is $\mathrm{MatMul}(Sq) = S \cdot \mathrm{MatMul}(q)$, where $x = Sq$, S is the scale and q is the quantized value of x.
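The linearity argument can be checked numerically. The sketch below uses a made-up scale, made-up integer codes and a made-up weight matrix: ReLU and a matrix-vector product commute with the scale S, so they can run on the integer codes q and be rescaled once at the end, while GELU does not.

```python
import math

def relu(v):
    return max(v, 0.0)

def gelu(v):
    # Exact GELU: x * Phi(x) = x/2 * (1 + erf(x / sqrt(2)))
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))

def matvec(w, v):
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]

S = 0.05                      # example scale (made up)
q = [-12, 3, 40]              # example integer codes (made up)
x = [S * qi for qi in q]      # real values x = S * q
W = [[0.5, -1.0, 0.25],
     [2.0, 0.0, -0.75]]

# ReLU(S q) == S * ReLU(q)
relu_ok = all(math.isclose(relu(S * qi), S * relu(qi)) for qi in q)
# MatMul(W, S q) == S * MatMul(W, q)
mm_ok = all(math.isclose(a, S * b, rel_tol=1e-9, abs_tol=1e-12)
            for a, b in zip(matvec(W, x), matvec(W, q)))
# GELU(S q) vs S * GELU(q): the scale cannot be factored out
gelu_ok = all(math.isclose(gelu(S * qi), S * gelu(qi), rel_tol=1e-3) for qi in q)
print(relu_ok, mm_ok, gelu_ok)
```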
Existing approximations of GELU include:
- Sigmoid approximation: $\mathrm{GELU}(x) \approx x\,\sigma(1.702x)$. This introduces another nonlinearity, sigmoid, which is still unfriendly to integer arithmetic.
- ReLU6 approximation, also known as h-GELU: $\text{h-GELU}(x) := x\,\frac{\mathrm{ReLU6}(1.702x + 3)}{6} \approx \mathrm{GELU}(x)$. It can be computed with integers, but its accuracy is poor.
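The two prior approximations can be compared numerically. This is a minimal sketch that assumes a uniform grid on [-6, 6] (an illustrative choice). It reports the maximum absolute error of each approximation against the exact GELU, which is computed with Python's `math.erf`.

```python
import math

def gelu(x):
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_sigmoid(x):
    # x * sigmoid(1.702 x)
    return x / (1.0 + math.exp(-1.702 * x))

def relu6(x):
    return min(max(x, 0.0), 6.0)

def h_gelu(x):
    # x * ReLU6(1.702 x + 3) / 6
    return x * relu6(1.702 * x + 3.0) / 6.0

grid = [i / 100 for i in range(-600, 601)]

def max_abs_err(f):
    return max(abs(f(x) - gelu(x)) for x in grid)

err_sigmoid = max_abs_err(gelu_sigmoid)
err_h = max_abs_err(h_gelu)
print(f"sigmoid approx: {err_sigmoid:.4f}, h-GELU: {err_h:.4f}")
```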
The left panel of Figure 2 below shows the shortcomings of h-GELU.

Solution for GELU
After analysis, the authors propose approximating erf with a second-order polynomial, and through it GELU, by solving:
![\\\underset{a,b,c}{min}\frac{1}{2}||GELU(x) - x*\frac{1}{2}[1+L(\frac{x}{\sqrt2})]||^2_2\\ s.t. \space\ \space\ \space\ L(x) = a(x+b)^2 + c](http://img.inotgo.com/imagesLocal/202207/06/202207060850360827_7.gif)
This idea rests on the fact that a continuous function on a bounded interval can be approximated arbitrarily well by polynomials; such polynomials are referred to as interpolating polynomials. See the original paper for details.
Optimizing this objective directly does not work well, because erf is defined on the whole real line: a single quadratic grows without bound, while erf saturates at ±1.
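Since the range of erf is [-1, 1] and erf is an odd function, i.e. $\mathrm{erf}(-x) = -\mathrm{erf}(x)$,
the authors fit only the positive half-line and extend the result to negative inputs by symmetry, obtaining the following L(x):
$L(x) = \mathrm{sgn}(x)\left[a\left(\mathrm{clip}(|x|, \max = -b) + b\right)^2 + c\right]$, where sgn is the sign function.
The max in clip means that |x| is capped at -b.
Therefore L(x) stays within [-1, 1] and is an odd function, just like erf.
a and b are obtained by solving the fitting problem above on sampled GELU points, which gives a = -0.2888 and b = -1.769; c = 1, so that L(x) saturates at ±1 once |x| ≥ -b.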
Combining the above gives the integer-friendly GELU approximation:
![i-GELU(x) := x*\frac{1}{2}[1 + L(\frac{x}{\sqrt2})]](http://img.inotgo.com/imagesLocal/202207/06/202207060850360827_18.gif)
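Before any quantization, the approximation can be checked in floating point. The sketch below implements L(x) and i-GELU with a = -0.2888, b = -1.769, c = 1 and compares i-GELU with the exact GELU on a grid over [-6, 6]. The grid is an illustrative choice.

```python
import math

A, B, C = -0.2888, -1.769, 1.0   # coefficients of L(x), c fixed to 1

def poly_erf(x):
    """L(x) = sgn(x) * [a * (clip(|x|, max=-b) + b)**2 + c] ≈ erf(x)."""
    sgn = (x > 0) - (x < 0)
    t = min(abs(x), -B)          # clip |x| at -b
    return sgn * (A * (t + B) ** 2 + C)

def i_gelu_float(x):
    """i-GELU(x) = x/2 * (1 + L(x / sqrt(2)))."""
    return 0.5 * x * (1.0 + poly_erf(x / math.sqrt(2.0)))

def gelu(x):
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

grid = [i / 100 for i in range(-600, 601)]
max_err = max(abs(i_gelu_float(x) - gelu(x)) for x in grid)
print(f"max |i-GELU - GELU| on [-6, 6]: {max_err:.4f}")
```

The largest deviation of L from erf is near 0, but there it is multiplied by x/2, so the GELU error stays small.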
Quantization scheme for i-GELU
With the polynomial form of GELU in hand, we can design the quantization scheme.
Since L(x) is a polynomial, we first need a way to quantize polynomials.
The authors give a polynomial quantization algorithm, I-POLY, as follows:

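One can verify that
$S_{out}\, q_{out} = aS^2 (q + q_b)^2 + aS^2 q_c \approx a(x + b)^2 + c$, where $q_b = \lfloor b/S \rfloor$, $q_c = \lfloor c/(aS^2) \rfloor$ and $S_{out} = aS^2$,
so any second-order polynomial can be quantized (and dequantized) with this algorithm.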
(Note: in my view, this "quantization" is really a construction for computing with quantized values. The computation is correct, but it feels deliberately constructed: q_out and S_out are not necessarily the true quantized value and scale of the polynomial's result.)
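Here is a minimal Python sketch of I-POLY as described above, assuming x = S·q with a real-valued scale S. Here the output scale S_out = aS² is kept as a real number. Only q_out is computed with integer operations, and q_b and q_c are constants that would be precomputed once per scale.

```python
import math

def i_poly(q, s, a, b, c):
    """Evaluate a*(x + b)**2 + c for x = s*q using integer ops on q.

    Returns (q_out, s_out) with s_out * q_out ≈ a*(x + b)**2 + c.
    """
    q_b = math.floor(b / s)              # integer constant for b
    q_c = math.floor(c / (a * s * s))    # integer constant for c
    s_out = a * s * s                    # real-valued output scale
    q_out = (q + q_b) ** 2 + q_c         # integer-only on the data path
    return q_out, s_out

a, b, c = -0.2888, -1.769, 1.0
s = 0.01
for q in (-150, 0, 120):
    q_out, s_out = i_poly(q, s, a, b, c)
    x = s * q
    print(q, q_out, s_out * q_out, a * (x + b) ** 2 + c)
```

The error comes from flooring b/S and c/(aS²), so it shrinks as S gets smaller.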
------
With polynomial quantization available, we can now implement the quantized I-GELU. The computation is as follows:

The call stack is I-GELU -> I-ERF -> I-POLY
Note some implementation details in the algorithm of Figure 4, for example the clipping step of I-ERF:
$q_{sgn},\ q' \leftarrow \mathrm{sgn}(q),\ \mathrm{clip}(|q|, \max = -b/S)$
Note that max = -b/S in this formula probably needs to become max = round(-b/S); otherwise q' is not guaranteed to be an integer...
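Putting the call chain together, here is an integer-path sketch of I-ERF and I-GELU in Python. It includes a compact I-POLY so that it runs on its own. Following the suggestion above, the clip bound is rounded with round(-b/S) so that q' stays an integer. That choice, the helper names, and keeping the scales as real numbers are my assumptions. With a < 0 the scale S_erf comes out negative. This is harmless, because only the products q·S carry meaning.

```python
import math

A, B, C = -0.2888, -1.769, 1.0   # L(x) coefficients for erf

def i_poly(q, s, a, b, c):
    q_b = math.floor(b / s)
    q_c = math.floor(c / (a * s * s))
    return (q + q_b) ** 2 + q_c, a * s * s

def i_erf(q, s):
    """Integer approximation of erf(s*q); returns (q_out, s_out)."""
    q_sgn = (q > 0) - (q < 0)             # sign as -1, 0 or 1
    q_clip = min(abs(q), round(-B / s))   # clip(|q|, max=round(-b/S))
    q_l, s_l = i_poly(q_clip, s, A, B, C)
    return q_sgn * q_l, s_l

def i_gelu(q, s):
    """Integer approximation of GELU(s*q); returns (q_out, s_out)."""
    q_erf, s_erf = i_erf(q, s / math.sqrt(2.0))
    q_one = math.floor(1.0 / s_erf)       # integer code for the constant 1
    return q * (q_erf + q_one), s * s_erf / 2.0

def gelu(x):
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

S = 0.01
for q in (-300, -50, 0, 80, 250):
    q_out, s_out = i_gelu(q, S)
    print(q * S, q_out * s_out, gelu(q * S))
```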
------
This completes the I-GELU procedure. Its results are shown below:

Solutions for Softmax
- Approximation with higher-order polynomials: applicable only in limited scenarios.
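Quantization scheme for Softmax
For numerical stability, the authors first subtract the maximum input before applying Softmax:
$\mathrm{Softmax}(\tilde{x})_i = \frac{\exp(\tilde{x}_i)}{\sum_j \exp(\tilde{x}_j)}, \quad \tilde{x}_i = x_i - x_{\max}$
so every $\tilde{x}_i$ is non-positive.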
It is worth noting that any non-positive real number $\tilde{x}$ can be decomposed as
$\tilde{x} = (-\ln 2)\, z + p$,
where the quotient z is a non-negative integer and the remainder p lies in $(-\ln 2, 0]$.
Then
$\exp(\tilde{x}) = 2^{-z}\exp(p) = \exp(p) \gg z$,
where >> denotes a right shift.
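The decomposition is exact, which the following sketch checks. The helper name `decompose` is hypothetical. `math.ldexp(m, -z)` computes m·2^(-z), the floating-point counterpart of a right shift by z. On non-negative integers, `>> z` is exactly floor division by 2^z.

```python
import math

LN2 = math.log(2.0)

def decompose(x):
    """Split x <= 0 into x = -ln2 * z + p with integer z >= 0 and p in (-ln2, 0]."""
    z = math.floor(-x / LN2)
    p = x + z * LN2
    return z, p

for x in (-5.0, -1.2, -0.3, 0.0):
    z, p = decompose(x)
    # exp(x) = 2**(-z) * exp(p)
    print(x, z, round(p, 4), math.exp(x), math.ldexp(math.exp(p), -z))

# On non-negative integers, dividing by 2**z (with truncation) is a right shift
v = 27_000
print([v >> z for z in range(4)])
```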
Further, if $\exp(p)$ can be expressed as an integer computation, then $\exp(\tilde{x})$ for every $\tilde{x} \le 0$, and therefore Softmax, can be computed with integers.
Moreover, the range of p is much narrower than that of x or $\tilde{x}$, so $\exp(p)$ can be approximated more accurately.
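Recall that for GELU the authors approximate the nonlinear function with a second-order polynomial; the same can be done here.
To find a second-order polynomial approximation of $\exp(p)$, the authors solve the following problem over $p \in [-\ln 2, 0]$:
$\min_{a,b,c} \frac{1}{2}\left\|\exp(p) - L(p)\right\|_2^2 \quad \text{s.t.} \quad L(p) = a(p + b)^2 + c$
The result is
$L(p) = 0.3585\,(p + 1.353)^2 + 0.344 \approx \exp(p)$,
so
$\text{i-exp}(\tilde{x}) := L(p) \gg z$,
where $z = \lfloor -\tilde{x}/\ln 2 \rfloor$ and $p = \tilde{x} + z\ln 2$.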
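A floating-point check of i-exp isolates the polynomial error. Because the shift by z is exact (done here with `math.ldexp`), the relative error of i-exp equals that of L(p) on (-ln 2, 0]. The sketch assumes inputs in [-10, 0], which is an illustrative range.

```python
import math

LN2 = math.log(2.0)

def i_exp_float(x):
    """i-exp(x) = L(p) >> z for x <= 0, evaluated in floating point."""
    z = math.floor(-x / LN2)
    p = x + z * LN2                              # p in (-ln2, 0]
    l_p = 0.3585 * (p + 1.353) ** 2 + 0.344      # L(p) ≈ exp(p)
    return math.ldexp(l_p, -z)                   # L(p) * 2**(-z), i.e. ">> z"

xs = [-i / 100 for i in range(0, 1001)]          # x in [-10, 0]
max_rel = max(abs(i_exp_float(x) - math.exp(x)) / math.exp(x) for x in xs)
print(f"max relative error on [-10, 0]: {max_rel:.4%}")
```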
The right panel of Figure 2 shows that this approximation works well.
The polynomial quantization method I-POLY was introduced above, so the full quantized Softmax computation is as follows:

The basic idea is much the same as for I-GELU.
#TODO#: The last step seems to have a problem...
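Here is an integer-path sketch of I-EXP and a Softmax built on it, with a compact I-POLY included so that the block is self-contained. One observation relevant to the #TODO# above: in the ratio q_exp / Σ q_exp the scale S_exp cancels, so the quotient already approximates the probabilities. The fixed-point output format (scale 1/2^out_bits, computed by integer division) is my assumption, not taken from the post or the paper. The helper names are hypothetical.

```python
import math

LN2 = math.log(2.0)
A, B, C = 0.3585, 1.353, 0.344     # L(p) = a(p + b)^2 + c ≈ exp(p)

def i_poly(q, s, a, b, c):
    q_b = math.floor(b / s)
    q_c = math.floor(c / (a * s * s))
    return (q + q_b) ** 2 + q_c, a * s * s

def i_exp(q, s):
    """Integer approximation of exp(s*q) for q <= 0; returns (q_out, s_out)."""
    q_ln2 = math.floor(LN2 / s)        # integer code for ln 2
    z = (-q) // q_ln2                  # quotient, non-negative
    q_p = q + z * q_ln2                # remainder: s*q_p ≈ p in (-ln2, 0]
    q_l, s_l = i_poly(q_p, s, A, B, C)
    return q_l >> z, s_l               # divide by 2**z with a right shift

def i_softmax(qs, s, out_bits=8):
    """Integer Softmax; the output scale 1 / 2**out_bits is an assumption."""
    q_max = max(qs)
    q_exp = [i_exp(q - q_max, s)[0] for q in qs]   # all shifted inputs <= 0
    total = sum(q_exp)
    # S_exp cancels in q_exp / total, so only integer codes are needed here
    q_out = [(e << out_bits) // total for e in q_exp]
    return q_out, 1.0 / (1 << out_bits)

S = 0.01
qs = [-200, -50, 0, 30, 100]
q_out, s_out = i_softmax(qs, S)
exps = [math.exp(S * q - S * max(qs)) for q in qs]
ref = [e / sum(exps) for e in exps]
print(q_out, [round(v * s_out, 4) for v in q_out], [round(r, 4) for r in ref])
```

Flooring ln2/S to an integer makes the shift slightly inconsistent with the true ln 2, so accuracy depends on S being small relative to ln 2.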
Quantization scheme for LayerNorm
- To be continued
Analysis of the I-BERT implementation
- Will be discussed in another article
Summary
- This post introduces the improvements in I-BERT and how GELU and Softmax are computed with integers.
- The main idea is to approximate each nonlinearity with a second-order polynomial and then evaluate that polynomial with quantized (integer) arithmetic.