Intel Distiller Toolkit - Quantization Implementation 3
2022-07-06 08:57:00 【cyz0202】
This series of articles:
Intel Distiller Toolkit - Quantization Implementation 1
Intel Distiller Toolkit - Quantization Implementation 2
Intel Distiller Toolkit - Quantization Implementation 3
Review
- The previous articles introduced Distiller and the Quantizer base class, as well as the post-training quantizer. The base class defines important variables such as replacement_factory (a dict that records the wrapper corresponding to each module to be quantized), and it also defines the quantization flow, whose main steps include preprocessing (BN folding, activation optimization, etc.), quantized-module replacement, and post-processing. The post-training quantizer implements post-training quantization on top of this base class.
- This article continues with the quantizer subclasses that inherit from Quantizer, namely
- PostTrainLinearQuantizer (previous article)
- QuantAwareTrainRangeLinearQuantizer (this article)
- PACTQuantizer (follow-up)
- NCFQuantAwareTrainQuantizer (follow-up)
- There is a lot of code in this article and not all of it can be pasted, so some places may be unclear; please also refer to the source code.
QuantAwareTrainRangeLinearQuantizer
- The quantization-aware training quantizer inserts the quantization operations into the model's forward computation and then trains the model. This lets the model parameters adapt to the quantization process, so the final model is generally better than a post-training quantized one.
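To make the inserted operation concrete, here is a minimal, self-contained sketch of "fake quantization" (quantize then immediately dequantize), which is the kind of operation placed into the forward pass. The function name fake_quantize and the symmetric 8-bit setup are illustrative assumptions, not Distiller's exact API:

import torch

def fake_quantize(x: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
    # Symmetric linear quantization: map [-max_abs, max_abs] onto signed integers,
    # then immediately map back to float ("fake" quantization).
    max_abs = x.abs().max().clamp(min=1e-8)
    n = 2 ** (num_bits - 1) - 1                   # e.g. 127 for 8 bits
    scale = n / max_abs
    q = torch.round(x * scale).clamp(-n - 1, n)   # quantize
    return q / scale                              # dequantize

x = torch.randn(4)
print(x)
print(fake_quantize(x))                           # same shape, values snapped to the quantization grid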
- The class definition of QuantAwareTrainRangeLinearQuantizer is as follows; you can see it is much simpler than the definition of the post-training quantizer.
- Constructor: the checks and default settings come first; the core part (the red box in the original screenshot) is where the quantization-aware handling of parameters and activations is set up.
- activation_replace_fn: this implements quantization awareness for activations. Like the module replacement used by the post-training quantizer earlier, it returns a wrapper, in this case a FakeQuantizationWrapper (a simplified sketch follows).
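A simplified sketch of such a replace function, paraphrased from the Distiller source (the qbits bookkeeping is abbreviated; mode and ema_decay are captured from the quantizer's constructor):

def activation_replace_fn(module, name, qbits_map):
    # How many bits should this module's activations use?
    bits_acts = qbits_map[name].acts
    if bits_acts is None:
        # No activation quantization requested: keep the module unchanged.
        return module
    # Wrap the module so its output passes through fake quantization.
    return FakeQuantizationWrapper(module, bits_acts, mode, ema_decay)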
- FakeQuantizationWrapper: defined as follows. In forward, the input first goes through the original module (producing the original activation output), and then that output (which is the next module's input) is fake-quantized (fake_q); a simplified sketch is given below.
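A minimal sketch of FakeQuantizationWrapper, simplified from the Distiller source (FakeLinearQuantization is the module shown further below):

import torch.nn as nn

class FakeQuantizationWrapper(nn.Module):
    def __init__(self, wrapped_module, num_bits, quant_mode, ema_decay):
        super(FakeQuantizationWrapper, self).__init__()
        self.wrapped_module = wrapped_module
        # fake_q fake-quantizes the wrapped module's output
        self.fake_q = FakeLinearQuantization(num_bits, quant_mode, ema_decay, dequantize=True,
                                             inplace=getattr(wrapped_module, 'inplace', False))

    def forward(self, *input):
        res = self.wrapped_module(*input)   # original activation output
        res = self.fake_q(res)              # fake-quantize it before it reaches the next module
        return res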
- FakeLinearQuantization: defined as follows. This module fake-quantizes its input. The details include tracking the range of the activation values during training and updating scale and zero_point accordingly (at inference time the last scale and zero_point from training are used directly), and using LinearQuantizeSTE (straight-through estimator) to perform the fake quantization.
class FakeLinearQuantization(nn.Module):
    def __init__(self, num_bits=8, mode=LinearQuantMode.SYMMETRIC, ema_decay=0.999, dequantize=True, inplace=False):
        """
        :param num_bits:
        :param mode:
        :param ema_decay: the activation value range is tracked with an EMA
        :param dequantize:
        :param inplace:
        """
        super(FakeLinearQuantization, self).__init__()
        self.num_bits = num_bits
        self.mode = mode
        self.dequantize = dequantize
        self.inplace = inplace

        # We track activations ranges with exponential moving average, as proposed by Jacob et al., 2017
        # https://arxiv.org/abs/1712.05877
        # We perform bias correction on the EMA, so we keep both unbiased and biased values and the iterations count
        # For a simple discussion of this see here:
        # https://www.coursera.org/lecture/deep-neural-network/bias-correction-in-exponentially-weighted-averages-XjuhD
        # Buffers hold non-parameter state and are saved in the model's state_dict
        self.register_buffer('ema_decay', torch.tensor(ema_decay))
        self.register_buffer('tracked_min_biased', torch.zeros(1))  # biased (plain EMA) value
        self.register_buffer('tracked_min', torch.zeros(1))         # unbiased (corrected) value
        self.register_buffer('tracked_max_biased', torch.zeros(1))
        self.register_buffer('tracked_max', torch.zeros(1))
        self.register_buffer('iter_count', torch.zeros(1))          # iteration count
        self.register_buffer('scale', torch.ones(1))
        self.register_buffer('zero_point', torch.zeros(1))

    def forward(self, input):
        # We update the tracked stats only in training
        #
        # Due to the way DataParallel works, we perform all updates in-place so the "main" device retains
        # its updates. (see https://pytorch.org/docs/stable/nn.html#dataparallel)
        # However, as it is now, the in-place update of iter_count causes an error when doing
        # back-prop with multiple GPUs, claiming a variable required for gradient calculation has been modified
        # in-place. Not clear why, since it's not used in any calculations that keep a gradient.
        # It works fine with a single GPU. TODO: Debug...
        if self.training:  # statistics are only collected during training
            with torch.no_grad():
                current_min, current_max = get_tensor_min_max(input)  # input is the output of the activation function
            self.iter_count += 1
            # The biased value is the plain EMA; the unbiased value is biased_value / (1 - decay**step)
            self.tracked_min_biased.data, self.tracked_min.data = update_ema(self.tracked_min_biased.data,
                                                                             current_min, self.ema_decay,
                                                                             self.iter_count)
            self.tracked_max_biased.data, self.tracked_max.data = update_ema(self.tracked_max_biased.data,
                                                                             current_max, self.ema_decay,
                                                                             self.iter_count)

        if self.mode == LinearQuantMode.SYMMETRIC:
            max_abs = max(abs(self.tracked_min), abs(self.tracked_max))
            actual_min, actual_max = -max_abs, max_abs
            if self.training:  # after the EMA update of the activation range, recompute scale and zero_point
                self.scale.data, self.zero_point.data = symmetric_linear_quantization_params(self.num_bits, max_abs)
        else:
            actual_min, actual_max = self.tracked_min, self.tracked_max
            signed = self.mode == LinearQuantMode.ASYMMETRIC_SIGNED
            if self.training:  # after the EMA update of the activation range, recompute scale and zero_point
                self.scale.data, self.zero_point.data = asymmetric_linear_quantization_params(self.num_bits,
                                                                                              self.tracked_min,
                                                                                              self.tracked_max,
                                                                                              signed=signed)

        input = clamp(input, actual_min.item(), actual_max.item(), False)
        # Quantize and dequantize; this step must not create extra gradient paths
        input = LinearQuantizeSTE.apply(input, self.scale, self.zero_point, self.dequantize, False)

        return input
- LinearQuantizeSTE: this is the core of fake quantization. It is defined as a torch.autograd.Function and specifies how to back-propagate (in the STE way); a sketch follows.
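A sketch of LinearQuantizeSTE, simplified from the Distiller source (linear_quantize / linear_dequantize are distiller's linear round-and-rescale helpers): forward quantizes and optionally dequantizes the input, while backward passes the gradient straight through, treating the rounding as the identity.

import torch

class LinearQuantizeSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input, scale, zero_point, dequantize, inplace):
        if inplace:
            ctx.mark_dirty(input)
        output = linear_quantize(input, scale, zero_point, inplace)          # round to the integer grid
        if dequantize:
            output = linear_dequantize(output, scale, zero_point, inplace)   # back to float
        return output

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through estimator: the gradient w.r.t. the input is passed through
        # unchanged; scale, zero_point and the two flags receive no gradient.
        return grad_output, None, None, None, None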
- Next, look at the quantization-aware handling of parameters (linear_quantize_param), which uses LinearQuantizeSTE directly; a simplified sketch follows.
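A simplified sketch of the parameter path, paraphrased from the Distiller source (the helper _get_quant_params_from_tensor and the per-channel handling are abbreviated): the quantization parameters are recomputed from the current full-precision weights, and LinearQuantizeSTE does the fake quantization so gradients still flow back to those weights.

def linear_quantize_param(param_fp, param_meta):
    # Compute scale/zero_point from the current full-precision weights;
    # this statistics step should not be part of the autograd graph.
    with torch.no_grad():
        scale, zero_point = _get_quant_params_from_tensor(param_fp, param_meta.num_bits, mode)
    # Fake-quantize the weights; the STE lets gradients flow back to param_fp.
    return LinearQuantizeSTE.apply(param_fp, scale, zero_point, True, False)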
- Note: although distiller's quantization-aware training quantizer defines how to do quantization-aware training for parameters, it does not actually use it, which is a little strange.
Summary
- This article introduced another subclass of distiller's quantizer base class Quantizer: QuantAwareTrainRangeLinearQuantizer.
- The core part is the implementation of quantization-aware training for activations and for parameters. Quantization awareness for activations is again implemented with a wrapper, while parameters use the STE directly; the specific pieces involved are FakeQuantizationWrapper, FakeLinearQuantization, and LinearQuantizeSTE.