Intel Distiller Toolkit - Quantization Implementation 3
2022-07-06 08:57:00 【cyz0202】
This series of articles
Intel Distiller Toolkit - Quantization Implementation 1
Intel Distiller Toolkit - Quantization Implementation 2
Intel Distiller Toolkit - Quantization Implementation 3
Review
- The previous articles introduced Distiller, the Quantizer base class, and the post-training quantizer. The base class defines important member variables, such as replacement_factory (a dict that records the wrapper to use for each module type to be quantized), and it also defines the overall quantization flow, whose main steps are preprocessing (BN folding, activation optimization, etc.), quantized-module replacement, and post-processing. The post-training quantizer implements post-training quantization on top of this base class;
- This article continues with the quantizers that inherit from Quantizer, including:
  - PostTrainLinearQuantizer (previous article)
  - QuantAwareTrainRangeLinearQuantizer (this article)
  - PACTQuantizer (follow-up)
  - NCFQuantAwareTrainQuantizer (follow-up)
- This article involves a lot of code that cannot all be included here, so some places may not be entirely clear; please also refer to the source code;
QuantAwareTrainRangeLinearQuantizer
- The quantization-aware training quantizer inserts the quantization procedure into the model's computation and then trains the model; this lets the model parameters fit the quantization process, so the final model is generally better than one produced by post-training quantization;
- The class definition of QuantAwareTrainRangeLinearQuantizer is walked through below; it is noticeably simpler than that of the post-training quantizer;
- Constructor: the checks and default settings come first; the core part sets up quantization-aware handling for the parameters and the activation values, as in the sketch below;
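A rough sketch of that core part, with an abbreviated argument list (Quantizer, LinearQuantMode, replacement_factory and the two nested functions come from the Distiller code discussed in this series; the exact super().__init__ arguments, including train_with_fp_copy, and the choice of nn.ReLU as the replaced module are an approximation, not a verbatim copy):

```python
import torch.nn as nn
# Quantizer and LinearQuantMode come from the Distiller quantization package.

class QuantAwareTrainRangeLinearQuantizer(Quantizer):
    def __init__(self, model, optimizer=None, bits_activations=8, bits_weights=8,
                 mode=LinearQuantMode.SYMMETRIC, ema_decay=0.999, **kwargs):
        # keep an FP32 copy of the weights so the optimizer keeps updating full-precision values
        super().__init__(model, optimizer=optimizer,
                         bits_activations=bits_activations, bits_weights=bits_weights,
                         train_with_fp_copy=True, **kwargs)

        def linear_quantize_param(param_fp, param_meta):
            ...  # STE-based fake quantization of a parameter tensor (sketched further below)

        def activation_replace_fn(module, name, qbits_map):
            ...  # wraps the module in a FakeQuantizationWrapper (sketched in the next bullet)

        # parameters: fake-quantized during the forward pass
        self.param_quantization_fn = linear_quantize_param
        # activation values: replace modules such as nn.ReLU with the wrapper returned above
        self.replacement_factory[nn.ReLU] = activation_replace_fn
```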
- activation_replace_fn: this implements quantization-aware handling of activation values; as with module replacement in the post-training quantizer described earlier, it returns a wrapper, which here is FakeQuantizationWrapper (sketched below);
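A sketch of what activation_replace_fn looks like, based on the description above; the qbits_map argument and its .acts field follow the Quantizer base-class conventions from the earlier articles and should be treated as an approximation:

```python
def activation_replace_fn(module, name, qbits_map):
    # qbits_map maps a module name to its bit-width settings; ".acts" is assumed
    # to hold the number of bits configured for the module's activations
    bits_acts = qbits_map[name].acts
    if bits_acts is None:
        # no activation quantization requested for this module: keep it unchanged
        return module
    # mode and ema_decay are captured from the enclosing constructor (closure)
    return FakeQuantizationWrapper(module, bits_acts, mode, ema_decay)
```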
- FakeQuantizationWrapper: in its forward, the input first goes through the original module (producing the original activation output), and that output (the input of the next module) is then fake-quantized (fake_q); a sketch follows;
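A minimal sketch of FakeQuantizationWrapper consistent with that description (the FakeLinearQuantization arguments mirror its __init__ shown below; the inplace handling via getattr is an assumption):

```python
import torch.nn as nn

class FakeQuantizationWrapper(nn.Module):
    def __init__(self, wrapped_module, num_bits, quant_mode, ema_decay):
        super().__init__()
        self.wrapped_module = wrapped_module
        # fake_q tracks the activation range and performs quantize + dequantize
        self.fake_q = FakeLinearQuantization(num_bits, quant_mode, ema_decay,
                                             dequantize=True,
                                             inplace=getattr(wrapped_module, 'inplace', False))

    def forward(self, *input):
        res = self.wrapped_module(*input)  # original activation output
        res = self.fake_q(res)             # fake-quantize it (this becomes the next module's input)
        return res
```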
- FakeLinearQuantization: defined below; this module fake-quantizes its input; the details include determining the range of activation values during training and updating scale and zero_point accordingly (at inference time the scale and zero_point obtained at the end of training are used directly), and using LinearQuantizeSTE (a straight-through estimator) to perform the fake quantization:
```python
class FakeLinearQuantization(nn.Module):
    def __init__(self, num_bits=8, mode=LinearQuantMode.SYMMETRIC, ema_decay=0.999,
                 dequantize=True, inplace=False):
        """
        :param num_bits:
        :param mode:
        :param ema_decay: the activation value range is tracked with an EMA
        :param dequantize:
        :param inplace:
        """
        super(FakeLinearQuantization, self).__init__()
        self.num_bits = num_bits
        self.mode = mode
        self.dequantize = dequantize
        self.inplace = inplace

        # We track activations ranges with exponential moving average, as proposed by Jacob et al., 2017
        # https://arxiv.org/abs/1712.05877 (the activation value range is tracked with an EMA)
        # We perform bias correction on the EMA, so we keep both unbiased and biased values and the iterations count
        # For a simple discussion of this see here:
        # https://www.coursera.org/lecture/deep-neural-network/bias-correction-in-exponentially-weighted-averages-XjuhD
        self.register_buffer('ema_decay', torch.tensor(ema_decay))  # buffers hold non-parameter state and are saved in the model's state_dict
        self.register_buffer('tracked_min_biased', torch.zeros(1))
        self.register_buffer('tracked_min', torch.zeros(1))         # holds the unbiased value
        self.register_buffer('tracked_max_biased', torch.zeros(1))  # holds the biased value
        self.register_buffer('tracked_max', torch.zeros(1))
        self.register_buffer('iter_count', torch.zeros(1))          # iteration count
        self.register_buffer('scale', torch.ones(1))
        self.register_buffer('zero_point', torch.zeros(1))

    def forward(self, input):
        # We update the tracked stats only in training
        #
        # Due to the way DataParallel works, we perform all updates in-place so the "main" device retains
        # its updates. (see https://pytorch.org/docs/stable/nn.html#dataparallel)
        # However, as it is now, the in-place update of iter_count causes an error when doing
        # back-prop with multiple GPUs, claiming a variable required for gradient calculation has been modified
        # in-place. Not clear why, since it's not used in any calculations that keep a gradient.
        # It works fine with a single GPU. TODO: Debug...
        if self.training:  # statistics are only collected during training
            with torch.no_grad():
                current_min, current_max = get_tensor_min_max(input)  # input is the output of the activation function
            self.iter_count += 1
            # the biased value is the plain weighted average; the unbiased value is biased value / (1 - decay ** step)
            self.tracked_min_biased.data, self.tracked_min.data = update_ema(self.tracked_min_biased.data,
                                                                             current_min, self.ema_decay,
                                                                             self.iter_count)
            self.tracked_max_biased.data, self.tracked_max.data = update_ema(self.tracked_max_biased.data,
                                                                             current_max, self.ema_decay,
                                                                             self.iter_count)

        if self.mode == LinearQuantMode.SYMMETRIC:
            max_abs = max(abs(self.tracked_min), abs(self.tracked_max))
            actual_min, actual_max = -max_abs, max_abs
            if self.training:  # recompute scale and zero_point after the EMA range update
                self.scale.data, self.zero_point.data = symmetric_linear_quantization_params(self.num_bits, max_abs)
        else:
            actual_min, actual_max = self.tracked_min, self.tracked_max
            signed = self.mode == LinearQuantMode.ASYMMETRIC_SIGNED
            if self.training:  # recompute scale and zero_point after the EMA range update
                self.scale.data, self.zero_point.data = asymmetric_linear_quantization_params(self.num_bits,
                                                                                              self.tracked_min,
                                                                                              self.tracked_max,
                                                                                              signed=signed)

        input = clamp(input, actual_min.item(), actual_max.item(), False)
        # quantize and dequantize; the STE handles back-propagation, so no extra gradient bookkeeping is needed
        input = LinearQuantizeSTE.apply(input, self.scale, self.zero_point, self.dequantize, False)

        return input
```
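For reference, update_ema used above amounts to a bias-corrected exponential moving average; a minimal sketch following the comment in the code (unbiased = biased / (1 - decay ** step)):

```python
def update_ema(biased_ema, value, decay, step):
    # plain exponential moving average (biased towards the zero initialization)
    biased_ema = biased_ema * decay + (1 - decay) * value
    # bias correction: unbiased = biased / (1 - decay ** step)
    unbiased_ema = biased_ema / (1 - decay ** step)
    return biased_ema, unbiased_ema
```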
- LinearQuantizeSTE: this is the core of the fake quantization; it is defined as a torch.autograd.Function that specifies how back-propagation is done (the straight-through-estimator, STE, approach); a sketch follows;
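A sketch of LinearQuantizeSTE consistent with that description; linear_quantize / linear_dequantize stand in for Distiller's range-linear helper functions (treat the names as an assumption), and the five-argument apply(...) signature matches the call in FakeLinearQuantization above:

```python
import torch

class LinearQuantizeSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input, scale, zero_point, dequantize, inplace):
        if inplace:
            ctx.mark_dirty(input)
        # round to the integer grid, then (optionally) map back to the float domain
        output = linear_quantize(input, scale, zero_point, inplace)
        if dequantize:
            output = linear_dequantize(output, scale, zero_point, inplace)
        return output

    @staticmethod
    def backward(ctx, grad_output):
        # straight-through estimator: treat the quantizer as the identity, so the
        # gradient w.r.t. the input passes through unchanged; scale, zero_point
        # and the two flags receive no gradient
        return grad_output, None, None, None, None
```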
- Next comes the quantization-aware handling of parameters (linear_quantize_param), which uses LinearQuantizeSTE directly (sketched below);
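A sketch of linear_quantize_param based on that description (in the Distiller source it is defined inside the constructor, so mode is captured by closure; _get_quant_params_from_tensor and the param_meta fields are assumptions about the helper names):

```python
def linear_quantize_param(param_fp, param_meta):
    # param_meta is assumed to carry the bit width configured for this parameter
    with torch.no_grad():
        # derive scale / zero_point from the current FP32 parameter values
        scale, zero_point = _get_quant_params_from_tensor(param_fp, param_meta.num_bits, mode)
    # quantize + dequantize with the STE so gradients flow back to the FP32 copy
    return LinearQuantizeSTE.apply(param_fp, scale, zero_point, True, False)
```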
- Note: although Distiller's quantization-aware training quantizer defines how to do quantization-aware training for parameters, it does not actually use it, which is a bit strange;
Summary
- This article introduced another subclass of Distiller's quantizer base class Quantizer: QuantAwareTrainRangeLinearQuantizer;
- The core is the implementation of quantization-aware training for activation values and parameter values; quantization-aware handling of activations again uses the wrapper approach, while parameters use STE directly; the specific pieces involved are FakeQuantizationWrapper, FakeLinearQuantization, and LinearQuantizeSTE;