Intel Distiller Toolkit - Quantization Implementation 3
2022-07-06 08:57:00 【cyz0202】
This series of articles
Intel Distiller Toolkit - Quantization Implementation 1
Intel Distiller Toolkit - Quantization Implementation 2
Intel Distiller Toolkit - Quantization Implementation 3
Review
- The previous articles introduced Distiller and the Quantizer base class, as well as the post-training quantizer. The base class defines important variables, such as replacement_factory (a dict recording which wrapper each to-be-quantized module maps to). It also defines the overall quantization flow, whose main steps are preprocessing (BN folding, activation optimization, etc.), quantized-module replacement, and post-processing. The post-training quantizer implements post-training quantization on top of this base class.
- This article continues with the subclass quantizers that inherit from Quantizer, including:
    - PostTrainLinearQuantizer (covered previously)
    - QuantAwareTrainRangeLinearQuantizer (this article)
    - PACTQuantizer (upcoming)
    - NCFQuantAwareTrainQuantizer (upcoming)
- There is a lot of code involved and not all of it can be posted here, so some places may be unclear; please also refer to the source code.
QuantAwareTrainRangeLinearQuantizer
- Quantization-aware training (QAT) quantizer: the quantization process is inserted into the model's computation, and the model is then trained. This lets the model parameters adapt to quantization, so the final model generally performs better than a post-training quantized model.
- The class definition of QuantAwareTrainRangeLinearQuantizer is shown below; you can see it is much simpler than that of the post-training quantizer.
- Constructor: checks and default settings come first; the core part (the red box in the original screenshots) sets up quantization awareness for the parameters and the activation values.
- activation_replace_fn: this implements quantization awareness for activation values. As with module replacement in post-training quantization, it returns a wrapper, in this case FakeQuantizationWrapper.
- FakeQuantizationWrapper: defined as follows. In forward, the input first passes through the original module (producing the original activation output), then that output (the next module's input) is fake-quantized (fake_q).
- FakeLinearQuantization: defined as follows. This module fake-quantizes its input. During training it determines the activation value range and updates scale and zp accordingly (at inference time, the last scale and zp from training are used directly). The fake quantization itself is implemented with LinearQuantizeSTE (a straight-through estimator).
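The real wrapper takes Distiller-specific arguments; a minimal sketch of the idea (names simplified, and fake_q standing in for a FakeLinearQuantization instance) could look like:

```python
import torch
import torch.nn as nn

class FakeQuantizationWrapper(nn.Module):
    """Sketch: run the wrapped module, then fake-quantize its output activation."""
    def __init__(self, wrapped_module, fake_q):
        super().__init__()
        self.wrapped_module = wrapped_module
        self.fake_q = fake_q  # e.g. a FakeLinearQuantization instance

    def forward(self, *inputs):
        out = self.wrapped_module(*inputs)  # original computation
        return self.fake_q(out)             # fake-quantize the next module's input
```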
```python
class FakeLinearQuantization(nn.Module):
    def __init__(self, num_bits=8, mode=LinearQuantMode.SYMMETRIC, ema_decay=0.999,
                 dequantize=True, inplace=False):
        """
        :param num_bits:
        :param mode:
        :param ema_decay: activation value ranges are tracked with an EMA
        :param dequantize:
        :param inplace:
        """
        super(FakeLinearQuantization, self).__init__()
        self.num_bits = num_bits
        self.mode = mode
        self.dequantize = dequantize
        self.inplace = inplace
        # We track activations ranges with exponential moving average, as proposed by Jacob et al., 2017
        # https://arxiv.org/abs/1712.05877
        # We perform bias correction on the EMA, so we keep both unbiased and biased values and the iterations count
        # For a simple discussion of this see here:
        # https://www.coursera.org/lecture/deep-neural-network/bias-correction-in-exponentially-weighted-averages-XjuhD
        # Buffers hold non-parameter state and are saved in the model's state_dict
        self.register_buffer('ema_decay', torch.tensor(ema_decay))
        self.register_buffer('tracked_min_biased', torch.zeros(1))  # biased EMA values
        self.register_buffer('tracked_min', torch.zeros(1))         # unbiased (corrected) values
        self.register_buffer('tracked_max_biased', torch.zeros(1))
        self.register_buffer('tracked_max', torch.zeros(1))
        self.register_buffer('iter_count', torch.zeros(1))          # iteration counter
        self.register_buffer('scale', torch.ones(1))
        self.register_buffer('zero_point', torch.zeros(1))

    def forward(self, input):
        # We update the tracked stats only in training
        #
        # Due to the way DataParallel works, we perform all updates in-place so the "main" device retains
        # its updates. (see https://pytorch.org/docs/stable/nn.html#dataparallel)
        # However, as it is now, the in-place update of iter_count causes an error when doing
        # back-prop with multiple GPUs, claiming a variable required for gradient calculation has been modified
        # in-place. Not clear why, since it's not used in any calculations that keep a gradient.
        # It works fine with a single GPU. TODO: Debug...
        if self.training:  # range statistics are only collected during training
            with torch.no_grad():
                current_min, current_max = get_tensor_min_max(input)  # input is the activation output
            self.iter_count += 1
            # The biased value is the plain EMA; the unbiased value is biased / (1 - decay**step)
            self.tracked_min_biased.data, self.tracked_min.data = update_ema(
                self.tracked_min_biased.data, current_min, self.ema_decay, self.iter_count)
            self.tracked_max_biased.data, self.tracked_max.data = update_ema(
                self.tracked_max_biased.data, current_max, self.ema_decay, self.iter_count)

        if self.mode == LinearQuantMode.SYMMETRIC:
            max_abs = max(abs(self.tracked_min), abs(self.tracked_max))
            actual_min, actual_max = -max_abs, max_abs
            if self.training:  # recompute scale and zp after the EMA update of the range
                self.scale.data, self.zero_point.data = symmetric_linear_quantization_params(
                    self.num_bits, max_abs)
        else:
            actual_min, actual_max = self.tracked_min, self.tracked_max
            signed = self.mode == LinearQuantMode.ASYMMETRIC_SIGNED
            if self.training:  # recompute scale and zp after the EMA update of the range
                self.scale.data, self.zero_point.data = asymmetric_linear_quantization_params(
                    self.num_bits, self.tracked_min, self.tracked_max, signed=signed)

        input = clamp(input, actual_min.item(), actual_max.item(), False)
        # Quantize then dequantize; this step needs no extra gradient of its own
        input = LinearQuantizeSTE.apply(input, self.scale, self.zero_point, self.dequantize, False)
        return input
```
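The update_ema helper used above implements the bias-corrected EMA described in the comments: the biased value is the plain weighted average, and the unbiased value is the biased value divided by (1 - decay**step). A plain-Python sketch consistent with that description:

```python
def update_ema(biased_ema, value, decay, step):
    """Sketch of the EMA update with Adam-style bias correction."""
    # Standard exponential moving average (this running value is biased toward 0 early on)
    biased_ema = biased_ema * decay + (1 - decay) * value
    # Bias correction: divide by (1 - decay**step) so early steps are not underestimated
    unbiased_ema = biased_ema / (1 - decay ** step)
    return biased_ema, unbiased_ema
```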
- LinearQuantizeSTE: this is the core of fake quantization. It is defined as a torch.autograd.Function, which lets it specify how back-propagation is done (the straight-through estimator, STE).
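The idea behind LinearQuantizeSTE: the forward pass quantizes (and optionally dequantizes), while the backward pass lets the gradient through untouched. A simplified sketch, omitting the in-place handling of the real implementation and assuming distiller's linear quantization formula round(scale * x - zero_point):

```python
import torch

class LinearQuantizeSTE(torch.autograd.Function):
    """Sketch: fake-quantize forward, straight-through (identity) gradient backward."""
    @staticmethod
    def forward(ctx, input, scale, zero_point, dequantize, inplace):
        out = torch.round(scale * input - zero_point)  # linear quantize
        if dequantize:
            out = (out + zero_point) / scale           # map back to float
        return out

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through estimator: pass the gradient through unchanged;
        # scale/zero_point/flags get no gradient
        return grad_output, None, None, None, None
```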
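As an illustration of what fake-quantizing a parameter amounts to, here is a hypothetical helper (not distiller's linear_quantize_param): symmetric mode only, with the STE gradient handling omitted:

```python
import torch

def fake_quantize_weight(w, num_bits=8):
    """Hypothetical helper: symmetric fake quantization of a weight tensor."""
    n = 2 ** (num_bits - 1) - 1            # e.g. 127 for 8 bits (one bit for sign)
    scale = n / w.abs().max()              # saturate at the tensor's max magnitude
    return torch.round(w * scale) / scale  # quantize, then dequantize
```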
- Next, look at the quantization-aware handling of parameters (linear_quantize_param), which uses LinearQuantizeSTE directly.
- Note: although distiller's QAT quantizer defines how to do quantization-aware training for parameters, it does not actually use it, which is a bit odd.
Summary
- This article introduced a subclass of distiller's Quantizer base class: QuantAwareTrainRangeLinearQuantizer.
- The core part is the implementation of quantization-aware training for activation values and parameter values. Activation QAT is again implemented via a wrapper, while parameters use STE directly; the specific pieces are FakeQuantizationWrapper, FakeLinearQuantization, and LinearQuantizeSTE.