I-BERT
2022-07-06 08:57:00 【cyz0202】
Background
This post covers the ICML 2021 paper I-BERT: Integer-only BERT Quantization.
The goal of the paper is to quantize BERT more thoroughly and to run it with integer-only arithmetic.
The authors argue that earlier quantization schemes leave nonlinear operations such as GELU and softmax unquantized (see Figure 1 below), so those operations still run in floating point. This not only hurts computational efficiency but also prevents deployment on chips that support only integer arithmetic.

The quantization scheme the authors adopt is 8-bit symmetric quantization.
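As a minimal sketch of what 8-bit symmetric quantization means (the helper names `symmetric_quantize` and `dequantize` and the max-abs choice of scale are assumptions for illustration, not taken from the paper's code): a single scale S maps real values to signed integers in [-127, 127], with zero mapped to zero.

```python
def symmetric_quantize(values, num_bits=8):
    """Quantize a list of floats to signed integers sharing one scale."""
    q_max = 2 ** (num_bits - 1) - 1              # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    # Assumption: the scale is chosen from the largest magnitude in the input.
    scale = max_abs / q_max if max_abs > 0 else 1.0
    q = [max(-q_max, min(q_max, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Map integers back to approximate real values: x is roughly S * q."""
    return [qi * scale for qi in q]


x = [-1.5, -0.2, 0.0, 0.7, 3.0]
q, scale = symmetric_quantize(x)
x_hat = dequantize(q, scale)
```

The round-trip error of each element is at most about half a quantization step, S/2.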
Existing approaches and their shortcomings
The authors mainly address the quantization of two kinds of nonlinear layers, GELU and softmax.
First consider the expression for GELU, shown below, where erf is the error function:
![\\GELU(x) = x*\frac{1}{2}[1+erf(\frac{x}{\sqrt2})]](http://img.inotgo.com/imagesLocal/202207/06/202207060850360827_31.gif)
where ![\\ erf=\frac{2}{\sqrt\pi}\int^x_0e^{-t^2}dt \in [-1, 1] \\](http://img.inotgo.com/imagesLocal/202207/06/202207060850360827_19.gif)
And 
GELU is hard to quantize directly; forcing quantization causes a large loss of accuracy.
Linear layers (such as matrix multiplication or the piecewise-linear ReLU) are different: their linearity lets the integer result be dequantized back to the floating-point result. The authors' example is MatMul(Sq) = S*MatMul(q), where x = Sq, S is the scale, and q is the quantized value of x.
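The identity MatMul(Sq) = S*MatMul(q) can be checked with a small sketch. The weight matrix, the integer input, and the scale below are made-up values for illustration only.

```python
def matmul(a, b):
    """Plain-Python matrix product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


weights = [[1, -2], [3, 4]]   # hypothetical integer weight matrix
q = [[5], [-6]]               # quantized input (column vector)
scale = 0.1                   # S, so that x = S * q

# Floating-point path: multiply by the dequantized input x = S * q.
x = [[scale * v for v in row] for row in q]
y_float = matmul(weights, x)

# Integer path: multiply integers first, apply the scale once at the end.
y_int = matmul(weights, q)
y_dequant = [[scale * v for v in row] for row in y_int]
```

Both paths give the same result up to floating-point rounding, which is why linear layers can run on integers. GELU has no such property: GELU(Sq) is not S*GELU(q).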
Existing approximations of GELU include:
- Sigmoid approximation, shown below. It introduces the nonlinear sigmoid, which is still unfriendly to integer-only computation.
$$\mathrm{GELU}(x) \approx x\,\sigma(1.702x)$$
- ReLU6 approximation, shown below. ReLU6 can be computed with integers, but the approximation is not accurate enough. This scheme is also known as h-GELU.
$$\text{h-GELU}(x) := x\,\frac{\mathrm{ReLU6}(1.702x + 3)}{6}$$
The left panel of Figure 2 below shows the shortcomings of h-GELU.

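To get a feel for how far the two existing approximations are from the exact GELU, here is a rough floating-point comparison. It assumes the usual forms x*sigmoid(1.702x) and x*ReLU6(1.702x + 3)/6; the evaluation grid [-4, 4] is an arbitrary choice for illustration.

```python
import math


def gelu(x):
    """Exact GELU using the error function."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def sigmoid_gelu(x):
    """Sigmoid approximation: x * sigmoid(1.702 * x)."""
    return x / (1.0 + math.exp(-1.702 * x))


def relu6(x):
    return min(max(x, 0.0), 6.0)


def h_gelu(x):
    """ReLU6 approximation (h-GELU): x * ReLU6(1.702 * x + 3) / 6."""
    return x * relu6(1.702 * x + 3.0) / 6.0


# Maximum absolute error of each approximation on a grid over [-4, 4].
grid = [i / 100.0 for i in range(-400, 401)]
err_sigmoid = max(abs(gelu(x) - sigmoid_gelu(x)) for x in grid)
err_h_gelu = max(abs(gelu(x) - h_gelu(x)) for x in grid)
```

On this grid the h-GELU error is noticeably larger than the sigmoid error, which matches the trade-off described above: h-GELU is integer-friendly but less accurate.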
Solution for GELU
After analysis, the authors propose approximating erf with a second-order polynomial, which in turn gives an approximation of GELU. The fit is obtained as follows:
![\\\underset{a,b,c}{min}\frac{1}{2}||GELU(x) - x*\frac{1}{2}[1+L(\frac{x}{\sqrt2})]||^2_2\\ s.t. \space\ \space\ \space\ L(x) = a(x+b)^2 + c](http://img.inotgo.com/imagesLocal/202207/06/202207060850360827_7.gif)
This idea comes from the fact that a function can be fitted by a polynomial; a polynomial of this kind is called an interpolating polynomial. See the original paper for details.
Directly optimizing the objective above does not give good results, because the domain of erf is the entire real line.
Note, however, that erf takes values in [-1, 1] and is an odd function, i.e.,
$$\mathrm{erf}(-x) = -\mathrm{erf}(x)$$
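The authors therefore fit only the positive half of the real line and extend the result to the negative half by odd symmetry, which gives the following L(x):
$$L(x) = \mathrm{sgn}(x)\left[a\left(\mathrm{clip}(|x|, \max = -b) + b\right)^2 + 1\right]$$
where sgn is the sign function, and the max in clip means that |x| is capped at -b.
Therefore L(x) stays within [-1, 1], and it is an odd function.
a and b are obtained by solving the fitting problem against GELU; the paper reports a = -0.2888 and b = -1.769.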
Putting this together, i-GELU is defined as
![i-GELU(x) := x*\frac{1}{2}[1 + L(\frac{x}{\sqrt2})]](http://img.inotgo.com/imagesLocal/202207/06/202207060850360827_18.gif)
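The polynomial approximation can be tried in floating point before any quantization. The sketch below assumes the form L(x) = sgn(x)[a(clip(|x|, max = -b) + b)^2 + 1] with the coefficients a = -0.2888 and b = -1.769 reported in the paper; the function names and the grid are made up for illustration.

```python
import math

A, B = -0.2888, -1.769   # coefficients reported in the I-BERT paper


def poly_erf(x):
    """Second-order polynomial approximation L(x) of erf(x)."""
    sign = (x > 0) - (x < 0)
    clipped = min(abs(x), -B)          # clip(|x|, max = -b)
    return sign * (A * (clipped + B) ** 2 + 1.0)


def i_gelu_float(x):
    """i-GELU(x) = x * 1/2 * [1 + L(x / sqrt(2))], still in floating point."""
    return 0.5 * x * (1.0 + poly_erf(x / math.sqrt(2.0)))


def gelu(x):
    """Exact GELU, used only as the reference."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


grid = [i / 100.0 for i in range(-400, 401)]
max_err = max(abs(i_gelu_float(x) - gelu(x)) for x in grid)
```

The error on this grid stays around a couple of hundredths, even though only a square, an addition, and a clip are needed.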
Quantization scheme for i-GELU
With a polynomial expression for GELU in hand, we can start designing the quantization scheme.
L(x) is a polynomial, so we first need to know how to quantize a polynomial.
The authors give a polynomial quantization algorithm, I-POLY, as follows:

One can verify that
$$q_{out} S_{out} \approx a(x+b)^2 + c$$
so any second-order polynomial can be quantized and dequantized with the algorithm above.
(Note: I feel that the quantization here is a quantization constructed for the sake of the computation. The calculation works out, but it feels deliberately constructed: q_out and S_out are not necessarily the true quantized value and scale of the polynomial's result.)
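A minimal sketch of the I-POLY idea, assuming the steps q_b = floor(b/S), q_c = floor(c/(a S^2)), S_out = a S^2 and q_out = (q + q_b)^2 + q_c. The function name `i_poly` and the test values are assumptions for illustration.

```python
import math


def i_poly(q, scale, a, b, c):
    """Integer evaluation of a(x + b)^2 + c for x = scale * q.

    Returns (q_out, scale_out) with q_out * scale_out close to the
    real-valued polynomial. Only q_out is computed from the input q;
    q_b, q_c and scale_out depend on constants and can be precomputed.
    """
    q_b = math.floor(b / scale)
    q_c = math.floor(c / (a * scale ** 2))
    scale_out = a * scale ** 2
    q_out = (q + q_b) ** 2 + q_c       # integer-only arithmetic
    return q_out, scale_out


# Example: a = -0.2888, b = -1.769, c = 1 evaluated at x = 1.0.
a, b, c = -0.2888, -1.769, 1.0
scale = 0.01
q = 100                                # x = scale * q = 1.0
q_out, scale_out = i_poly(q, scale, a, b, c)
approx = q_out * scale_out
exact = a * (scale * q + b) ** 2 + c
```

Because a is negative in this example, scale_out is negative as well. That is consistent with the remark above that S_out is a constructed scale rather than the scale one would obtain by quantizing the result directly.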
------
With the polynomial quantization method in place, we can go on to build the quantization scheme for I-GELU. The computation proceeds as follows:

The call stack is I-GELU -> I-ERF -> I-POLY
Note some implementation details of the algorithm in Figure 4, such as the clipping step, which is carried out on integers with max = -b/S.
In that step, max = -b/S may have to be changed to max = round(-b/S); otherwise q' is not guaranteed to be an integer...
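The call chain I-GELU -> I-ERF -> I-POLY can be sketched as below. This is an illustration under assumptions, not the authors' code: it uses round(-b/S) for the clip bound as suggested above, the coefficients a = -0.2888 and b = -1.769 reported in the paper, and a made-up input scale of 4/127.

```python
import math

A, B, C = -0.2888, -1.769, 1.0   # coefficients reported in the I-BERT paper


def i_poly(q, scale, a, b, c):
    """Integer evaluation of a(x + b)^2 + c for x = scale * q."""
    q_b = math.floor(b / scale)
    q_c = math.floor(c / (a * scale ** 2))
    return (q + q_b) ** 2 + q_c, a * scale ** 2


def i_erf(q, scale):
    """Integer approximation of erf(scale * q)."""
    q_sign = (q > 0) - (q < 0)
    q_max = round(-B / scale)            # rounded so the bound is an integer
    q_clipped = min(abs(q), q_max)
    q_l, scale_l = i_poly(q_clipped, scale, A, B, C)
    return q_sign * q_l, scale_l


def i_gelu(q, scale):
    """Integer approximation of GELU(scale * q)."""
    q_erf, scale_erf = i_erf(q, scale / math.sqrt(2.0))
    q_one = math.floor(1.0 / scale_erf)  # integer representing the constant 1
    q_out = q * (q_erf + q_one)
    scale_out = scale * scale_erf / 2.0
    return q_out, scale_out


def gelu(x):
    """Exact GELU, used only as the reference."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


scale = 4.0 / 127.0                      # hypothetical input scale
max_err = 0.0
for q in range(-127, 128):
    q_out, scale_out = i_gelu(q, scale)
    max_err = max(max_err, abs(q_out * scale_out - gelu(q * scale)))
```

Only the lines that compute q_sign, q_clipped, q_l and q_out touch the input q; everything involving the scales is a constant that could be folded in ahead of time.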
------
That is the I-GELU implementation. Its results are shown below:

Solutions for Softmax
- Approximating with higher-order polynomials; the scenarios where this can be used are limited.
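Quantization scheme for Softmax
For numerical stability, the authors first apply a simple transformation to softmax, subtracting the maximum input:
$$\mathrm{Softmax}(x)_i = \frac{\exp(x_i - x_{\max})}{\sum_{j} \exp(x_j - x_{\max})}$$
where $x_{\max} = \max_i(x_i)$.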
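The stabilizing transformation, subtracting the maximum before exponentiating, can be sketched in floating point as follows. The function name and the inputs are illustrative.

```python
import math


def stable_softmax(xs):
    """Softmax computed on x_i - max(x), so every exponent is <= 0."""
    x_max = max(xs)
    exps = [math.exp(x - x_max) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


# math.exp(1000.0) would overflow, but the shifted version is safe.
probs_large = stable_softmax([1000.0, 1001.0, 1002.0])
probs_small = stable_softmax([0.0, 1.0, 2.0])
```

Shifting all inputs by the same constant does not change the result, so the two calls return the same probabilities.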
It is worth noting that every $\tilde{x}_i = x_i - x_{\max}$ is non-positive.
A non-positive real number $\tilde{x}$ can be decomposed as
$$\tilde{x} = (-\ln 2)\,z + p$$
where z (the quotient) is a non-negative integer and p (the remainder) lies in $(-\ln 2, 0]$.
Then
$$\exp(\tilde{x}) = 2^{-z}\exp(p) = \exp(p) \gg z$$
where >> denotes a right shift.
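The quotient and remainder decomposition can be checked numerically. This sketch assumes z = floor(-x / ln 2) and p = x + z ln 2 for a non-positive input x; in floating point the right shift is written as a division by 2^z.

```python
import math

LN2 = math.log(2.0)


def decompose(x_tilde):
    """Split a non-positive x into (z, p) with x = (-ln 2) * z + p."""
    z = math.floor(-x_tilde / LN2)     # non-negative integer quotient
    p = x_tilde + z * LN2              # remainder in (-ln 2, 0]
    return z, p


def exp_via_shift(x_tilde):
    """exp(x) = 2^(-z) * exp(p); on integers the division is a right shift."""
    z, p = decompose(x_tilde)
    return math.exp(p) / (2 ** z)


samples = [0.0, -0.1, -0.7, -3.3, -10.0]
parts = [decompose(x) for x in samples]
errors = [abs(exp_via_shift(x) - math.exp(x)) for x in samples]
```

After the decomposition, exp only has to be approximated on the short interval (-ln 2, 0], however negative the original input is.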
Furthermore, if $\exp(p)$ can be computed with integer arithmetic, then $\exp(\tilde{x})$ for every non-positive input $\tilde{x} = x - x_{\max}$, and hence Softmax, can be computed with integer arithmetic.
In addition, the range of p in $\exp(p)$ is much smaller than that of x or $\tilde{x}$, so $\exp(p)$ can be approximated more accurately.
Recall that for GELU the authors approximate the nonlinear function with a second-order polynomial; the same can be done here.
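The authors find the second-order polynomial that approximates $\exp(p)$ by minimizing the L2 distance to $\exp(p)$ over the range $p \in (-\ln 2, 0]$.
The result is
$$L(p) = 0.3585(p + 1.353)^2 + 0.344 \approx \exp(p)$$
Then
$$\text{i-exp}(\tilde{x}) := L(p) \gg z$$
where $z = \lfloor -\tilde{x} / \ln 2 \rfloor$ and $p = \tilde{x} + z \ln 2$.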
The right panel of Figure 2 shows that this approximation works well.
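A floating-point sketch of this approximation, assuming the polynomial L(p) = 0.3585(p + 1.353)^2 + 0.344 reported in the paper, with z = floor(-x / ln 2) and p = x + z ln 2. The function names and the grid [-10, 0] are chosen only for illustration.

```python
import math

LN2 = math.log(2.0)


def poly_exp(p):
    """Second-order polynomial approximation of exp(p) on (-ln 2, 0]."""
    return 0.3585 * (p + 1.353) ** 2 + 0.344


def i_exp_float(x_tilde):
    """Approximate exp(x) for x <= 0 as L(p) * 2^(-z), in floating point."""
    z = math.floor(-x_tilde / LN2)
    p = x_tilde + z * LN2
    return poly_exp(p) / (2 ** z)


grid = [-i / 100.0 for i in range(0, 1001)]      # x in [-10, 0]
max_err = max(abs(i_exp_float(x) - math.exp(x)) for x in grid)
```

The largest gap on this grid is a few thousandths and occurs for inputs close to zero, where z = 0 and the polynomial error is not scaled down by the shift.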
The I-POLY procedure for quantized polynomial evaluation was introduced above, so the complete quantized computation of Softmax is as follows:

The basic idea is much the same as for I-GELU.
#TODO#: There seems to be a problem with the last step ...
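An end-to-end integer softmax can be sketched as below. Two parts are assumptions made for this illustration rather than the paper's exact algorithm: ln 2 is represented by the integer q_ln2 = floor(ln 2 / S), and the final normalization is done as a fixed-point division that produces outputs with scale 2^(-out_bits). The input scale 0.001 is also made up.

```python
import math


def i_poly(q, scale, a, b, c):
    """Integer evaluation of a(x + b)^2 + c for x = scale * q."""
    q_b = math.floor(b / scale)
    q_c = math.floor(c / (a * scale ** 2))
    return (q + q_b) ** 2 + q_c, a * scale ** 2


def i_exp(q, scale):
    """Integer approximation of exp(scale * q) for q <= 0."""
    q_ln2 = math.floor(math.log(2.0) / scale)   # integer stand-in for ln 2
    z = (-q) // q_ln2                            # quotient
    q_p = q + z * q_ln2                          # remainder in (-q_ln2, 0]
    q_l, scale_l = i_poly(q_p, scale, 0.3585, 1.353, 0.344)
    return q_l >> z, scale_l                     # right shift divides by 2^z


def i_softmax(qs, scale, out_bits=8):
    """Integer softmax; the fixed-point normalization is an assumption."""
    q_max = max(qs)
    q_exps = [i_exp(q - q_max, scale)[0] for q in qs]
    total = sum(q_exps)
    q_out = [(q_e << out_bits) // total for q_e in q_exps]
    return q_out, 1.0 / (1 << out_bits)


scale = 0.001                                    # hypothetical input scale
xs = [1.0, 2.0, 3.0, 0.5]
qs = [round(x / scale) for x in xs]
q_out, scale_out = i_softmax(qs, scale)
probs = [q * scale_out for q in q_out]
```

The scale of the exponentials cancels in the ratio, so the normalization needs only the integers q_exps. A small input scale is used here because floor(ln 2 / S) represents ln 2 poorly when S is coarse.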
Quantization scheme for LayerNorm
- To be continued
Analysis of the I-BERT implementation
- To be discussed in another article
Summary
- This post introduced the improvements made by I-BERT and how GELU and Softmax are implemented with integer-only arithmetic.
- The main idea is to approximate each nonlinear function with a second-order polynomial and then evaluate that polynomial with quantized (integer) arithmetic.