PyTorch Quantization in Practice (1)
2022-06-30 21:43:00 【Breeze_】
Translated from: https://pytorch.org/blog/quantization-in-practice/
Quantization is a cheap and easy way to make deep neural network models run faster and with lower memory requirements. PyTorch offers a few different approaches to quantize a model. In this blog post we will (quickly) lay the groundwork for quantization in deep learning, then look at how each technique works in practice. Finally, we will end with recommendations from the literature for using quantization in your workflows.
Principles of quantization
If someone asks you what time it is, you don't answer "10:14:34:430705"; you say "a quarter past ten".
Quantization is, at its core, information compression; in deep networks it refers to reducing the numerical precision of the weights and/or activations.
Over-parameterized deep neural networks (DNNs) have more degrees of freedom, which makes them good candidates for information compression [1]. When you quantize a model, two things generally happen: the model gets smaller and runs more efficiently. Hardware vendors explicitly allow for faster processing of 8-bit data (compared to 32-bit data), resulting in higher throughput. A smaller model has lower memory footprint and power consumption [2], which is crucial for deployment at the edge.
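As a rough, self-contained illustration of the memory claim (a minimal sketch, not part of the original post): an 8-bit tensor needs a quarter of the storage of an fp32 tensor of the same shape.

import torch

x_fp32 = torch.randn(1024, 1024)                                    # typical fp32 weight tensor
x_int8 = torch.randint(-128, 127, (1024, 1024), dtype=torch.int8)   # same shape, 8-bit storage

print(x_fp32.element_size() * x_fp32.nelement())  # 4194304 bytes
print(x_int8.element_size() * x_int8.nelement())  # 1048576 bytes -- 4x smaller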
Mapping function
The mapping function is what maps values from floating-point space to integer space. A commonly used mapping function is the linear transformation $Q(r) = \operatorname{round}(r/S + Z)$, where $r$ is the input and $S$, $Z$ are quantization parameters.
To reconvert to floating-point space, the corresponding inverse function is $\tilde{r} = (Q(r) - Z) \cdot S$. Note that $\tilde{r} \neq r$; their difference constitutes the quantization error.
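A minimal sketch of this round trip (the scale and zero-point values below are illustrative assumptions, not taken from the post). The same affine mapping can be written with plain tensor ops, or applied with torch.quantize_per_tensor:

import torch

r = torch.tensor([-1.03, 0.0, 0.515, 1.3])  # input values in floating-point space
S, Z = 0.01, 128                             # assumed scale and zero-point

# Q(r) = round(r / S + Z), clamped to the unsigned 8-bit range
q = torch.clamp(torch.round(r / S + Z), 0, 255)
# r~ = (Q(r) - Z) * S
r_tilde = (q - Z) * S
print(q, r_tilde, (r - r_tilde).abs())       # the last tensor is the quantization error

# PyTorch's built-in op applies the same affine mapping
qt = torch.quantize_per_tensor(r, scale=S, zero_point=Z, dtype=torch.quint8)
print(qt.int_repr(), qt.dequantize())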
Quantization parameters
The mapping function is parameterized by the scaling factor $S$ and the zero-point $Z$. $S$ is the ratio of the input range to the output range: $S = \frac{\beta - \alpha}{\beta_q - \alpha_q}$, where $[\alpha, \beta]$ is the clipping range of the input, i.e. the boundaries of permissible inputs, and $[\alpha_q, \beta_q]$ is the range of the quantized output space it is mapped to. For 8-bit quantization, the output range satisfies $\beta_q - \alpha_q \leq 2^8 - 1$.
$Z$ acts as a bias to ensure that a 0 in the input space maps perfectly to a 0 in the quantized space: $Z = -\left(\frac{\alpha}{S} - \alpha_q\right)$.
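A small worked example (the clipping range here is an assumption for illustration, not a value from the post): for an input clipping range $[\alpha, \beta] = [-1.0, 3.0]$ mapped to the unsigned 8-bit range $[\alpha_q, \beta_q] = [0, 255]$:

# assumed clipping range, for illustration only
alpha, beta = -1.0, 3.0        # input (floating-point) clipping range
alpha_q, beta_q = 0, 255       # 8-bit unsigned output range

S = (beta - alpha) / (beta_q - alpha_q)   # (3.0 - (-1.0)) / 255 ≈ 0.0157
Z = round(-(alpha / S - alpha_q))         # -(-1.0 / 0.0157 - 0) ≈ 64

print(S, Z)                    # 0.01568..., 64
# a real 0 in the input lands exactly on the integer zero-point Z
print(round(0.0 / S + Z) == Z) # True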
Calibration
The process of choosing the input clipping range is known as calibration. The simplest technique (and the PyTorch default) is to record the running minimum and maximum values and assign them to $\alpha$ and $\beta$. TensorRT also uses entropy minimization (KL divergence), mean-square-error minimization, or percentiles of the input range.
In PyTorch, Observer modules (docs, code) collect statistics on the input values and compute the quantization parameters $S$ and $Z$. Different calibration schemes result in different quantized outputs, and it's best to empirically verify which scheme works best for your application and architecture (more on that later).
import torch
from torch.quantization.observer import MinMaxObserver, MovingAverageMinMaxObserver, HistogramObserver
C, L = 3, 4
normal = torch.distributions.normal.Normal(0,1)
inputs = [normal.sample((C, L)), normal.sample((C, L))]
print(inputs)
# >>>>>
# [tensor([[-0.0590, 1.1674, 0.7119, -1.1270],
# [-1.3974, 0.5077, -0.5601, 0.0683],
# [-0.0929, 0.9473, 0.7159, -0.4574]]),
# tensor([[-0.0236, -0.7599, 1.0290, 0.8914],
# [-1.1727, -1.2556, -0.2271, 0.9568],
# [-0.2500, 1.4579, 1.4707, 0.4043]])]
observers = [MinMaxObserver(), MovingAverageMinMaxObserver(), HistogramObserver()]
for obs in observers:
    for x in inputs:
        obs(x)
    print(obs.__class__.__name__, obs.calculate_qparams())
# >>>>>
# MinMaxObserver (tensor([0.0112]), tensor([124], dtype=torch.int32))
# MovingAverageMinMaxObserver (tensor([0.0101]), tensor([139], dtype=torch.int32))
# HistogramObserver (tensor([0.0100]), tensor([106], dtype=torch.int32))
Affine and symmetric quantization schemes
Affine or asymmetric quantization schemes assign the input range to the minimum and maximum observed values. Affine schemes generally offer tighter clipping ranges and are useful for quantizing non-negative activations (if your input tensors are never negative, the input range does not need to contain negative values). The range is calculated as $\alpha = \min(r),\ \beta = \max(r)$. Affine quantization leads to more computationally expensive inference when used for weight tensors [3].
Symmetric quantization schemes center the input range around 0, eliminating the need to compute a zero-point offset. The range is calculated as $-\alpha = \beta = \max(|\min(r)|, |\max(r)|)$. For skewed signals (like non-negative activations) this can result in poor quantization resolution, because the clipping range includes values that never show up in the input.
In summary: asymmetric (affine) quantization is useful for non-negative activations but computationally expensive for weight tensors, while symmetric quantization schemes can be a poor fit for non-negative activations.
import torch
import numpy as np
import matplotlib.pyplot as plt

# a skewed, non-negative "activation"-like signal and a roughly normal weight tensor;
# both are flattened to 1-D so hist() treats each as a single dataset
act = torch.distributions.pareto.Pareto(1, 10).sample((1, 1024)).flatten()
weights = torch.distributions.normal.Normal(0, 0.12).sample((3, 64, 7, 7)).flatten()

def get_symmetric_range(x):
    beta = torch.max(x.max(), x.min().abs())
    return -beta.item(), beta.item()

def get_affine_range(x):
    return x.min().item(), x.max().item()

def plot(ax, data, scheme):
    # draw the histogram and mark the clipping range chosen by the scheme
    boundaries = get_affine_range(data) if scheme == 'affine' else get_symmetric_range(data)
    a, _, _ = ax.hist(data, density=True, bins=100)
    ymin, ymax = np.quantile(a[a > 0], [0.25, 0.95])
    ax.vlines(x=boundaries, ls='--', colors='purple', ymin=ymin, ymax=ymax)

fig, axs = plt.subplots(2, 2)
plot(axs[0, 0], act, 'affine')
axs[0, 0].set_title("Activation, Affine-Quantized")
plot(axs[0, 1], act, 'symmetric')
axs[0, 1].set_title("Activation, Symmetric-Quantized")
plot(axs[1, 0], weights, 'affine')
axs[1, 0].set_title("Weights, Affine-Quantized")
plot(axs[1, 1], weights, 'symmetric')
axs[1, 1].set_title("Weights, Symmetric-Quantized")
plt.show()
In PyTorch, you can specify an affine or symmetric scheme when initializing the Observer. Note that not all observers support both schemes.
for qscheme in [torch.per_tensor_affine, torch.per_tensor_symmetric]:
    obs = MovingAverageMinMaxObserver(qscheme=qscheme)
    for x in inputs:
        obs(x)
    print(f"Qscheme: {qscheme} | {obs.calculate_qparams()}")
# >>>>>
# Qscheme: torch.per_tensor_affine | (tensor([0.0101]), tensor([139], dtype=torch.int32))
# Qscheme: torch.per_tensor_symmetric | (tensor([0.0109]), tensor([128]))
Per-tensor and per-channel quantization schemes
Quantization parameters can be calculated for the layer's entire weight tensor as a whole, or separately for each channel. In per-tensor quantization, the same clipping range is applied to all the channels in a layer.
[Figure 3: per-channel vs. per-tensor quantization — https://pytorch.org/assets/images/quantization-practice/per-channel-tensor.svg]
As shown in Figure 3, per-channel quantization uses one set of quantization parameters per channel, while per-tensor quantization uses the same quantization parameters for the entire tensor. For weight quantization, symmetric per-channel quantization provides better accuracy; per-tensor quantization performs poorly, possibly because of high variance in conv weights across channels introduced by batchnorm folding [3].
from torch.quantization.observer import MovingAveragePerChannelMinMaxObserver
obs = MovingAveragePerChannelMinMaxObserver(ch_axis=0) # calculate qparams for all `C` channels separately
for x in inputs: obs(x)
print(obs.calculate_qparams())
# >>>>>
# (tensor([0.0090, 0.0075, 0.0055]), tensor([125, 187, 82], dtype=torch.int32))
Backend engine
Currently, quantized operators run on x86 machines via the FBGEMM backend, or use QNNPACK primitives on ARM machines. Backend support for server GPUs (via TensorRT and cuDNN) is coming soon. Learn more about extending quantization to custom backends: RFC-0019.
# 'x86' stands in for a boolean flag meaning "running on an x86 machine"
backend = 'fbgemm' if x86 else 'qnnpack'
qconfig = torch.quantization.get_default_qconfig(backend)
torch.backends.quantized.engine = backend
QConfig
The QConfig stores the Observers and the quantization schemes used to quantize activations and weights.
Be sure to pass the Observer class (not the instance), or a callable that returns an Observer instance. Use with_args() to override the default arguments.
my_qconfig = torch.quantization.QConfig(
    activation=MovingAverageMinMaxObserver.with_args(qscheme=torch.per_tensor_affine),
    weight=MovingAveragePerChannelMinMaxObserver.with_args(qscheme=torch.qint8)
)
# >>>>>
# QConfig(activation=functools.partial(<class 'torch.ao.quantization.observer.MovingAverageMinMaxObserver'>, qscheme=torch.per_tensor_affine){}, weight=functools.partial(<class 'torch.ao.quantization.observer.MovingAveragePerChannelMinMaxObserver'>, qscheme=torch.qint8){})