PyTorch Quantization in Practice (1)
2022-06-30 21:43:00 【Breeze_】
Translated from: https://pytorch.org/blog/quantization-in-practice/
Quantization is a cheap and easy way to make a deep neural network run faster and with lower memory requirements. PyTorch offers a few different approaches to quantize your model. In this blog post, we will (quickly) lay the foundations of quantization in deep learning, then look at how each technique works in practice. Finally, we will conclude with recommendations from the literature for using quantization in your workflow.
Fundamentals of quantization
If someone asks you what time it is, you don't respond with "10:14:34:430705"; you say "it's a quarter past 10".
The essence of quantization is information compression; in deep networks it means reducing the numerical precision of the weights and/or activations.
Over-parameterized deep neural networks (DNNs) have more degrees of freedom, which makes them good candidates for information compression [1]. When you quantize a model, two things usually happen: the model gets smaller and runs more efficiently. Hardware vendors explicitly allow for faster processing of 8-bit data (compared to 32-bit data), resulting in higher throughput. A smaller model also has a lower memory footprint and power consumption [2], which is crucial for deployment at the edge.
Mapping function
The mapping function maps values from floating-point space to integer space. A commonly used mapping function is the linear transformation $Q(r) = \operatorname{round}(r/S + Z)$, where $r$ is the input and $S$ and $Z$ are quantization parameters.
To convert back to floating-point space, the corresponding inverse function is $\tilde{r} = (Q(r) - Z) \cdot S$. Note that $\tilde{r} \neq r$; the difference between them constitutes the quantization error.
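To make the round trip concrete, here is a minimal sketch of the mapping and its inverse; the values of S and Z below are assumptions, as if they had come from a calibration step:

```python
import torch

S, Z = 0.04, 120                                  # assumed scale and zero-point
r = torch.tensor([-1.53, -0.29, 0.0, 0.71, 1.18])

q = torch.clamp(torch.round(r / S + Z), 0, 255)   # Q(r), clamped to the uint8 range
r_tilde = (q - Z) * S                             # de-quantized values
print(r - r_tilde)                                # quantization error, at most S/2 per element
```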
Quantization parameters
The mapping function is parameterized by the scale factor $S$ and the zero-point $Z$. $S$ is the ratio of the input range to the output range, $S = \frac{\beta - \alpha}{\beta_q - \alpha_q}$, where $[\alpha, \beta]$ is the clipping range of the input, i.e. the boundaries the input values are allowed to take, and $[\alpha_q, \beta_q]$ is the range of the quantized output space it is mapped to. For 8-bit quantization, the output range satisfies $\beta_q - \alpha_q \le 2^8 - 1$.
$Z$ acts as a bias that ensures a 0 in the input space maps perfectly to a 0 in the quantized space: $Z = -\left(\frac{\alpha}{S} - \alpha_q\right)$.
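As a worked example (the clipping range $[\alpha, \beta] = [-2.0, 6.0]$ below is an assumption, not a value from the post), the formulas above give the following $S$ and $Z$ for unsigned 8-bit quantization; the result can be cross-checked against torch.quantize_per_tensor:

```python
import torch

alpha, beta = -2.0, 6.0        # assumed input clipping range from calibration
alpha_q, beta_q = 0, 255       # uint8 output range

S = (beta - alpha) / (beta_q - alpha_q)   # scale: ratio of input range to output range
Z = int(round(-(alpha / S - alpha_q)))    # zero-point: the integer that 0.0 maps to
print(S, Z)                               # ~0.0314, 64

x = torch.tensor([-2.0, 0.0, 3.0, 6.0])
xq = torch.quantize_per_tensor(x, scale=S, zero_point=Z, dtype=torch.quint8)
print(xq.int_repr())     # integer representation of the quantized tensor
print(xq.dequantize())   # back to float; 0.0 is recovered exactly thanks to Z
```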
Calibration
The process of choosing the input clipping range is known as calibration. The simplest technique (also the default in PyTorch) is to record the running minimum and maximum values and assign them to $\alpha$ and $\beta$. TensorRT also uses entropy minimization (KL divergence), mean-square-error minimization, or percentiles of the input range.
In PyTorch, Observer modules (docs, code) collect statistics on the input values and calculate the quantization parameters $S$ and $Z$. Different calibration schemes produce different quantized outputs, and it is best to empirically verify which scheme works best for your application and architecture (more on that later).
import torch
from torch.quantization.observer import MinMaxObserver, MovingAverageMinMaxObserver, HistogramObserver
C, L = 3, 4
normal = torch.distributions.normal.Normal(0,1)
inputs = [normal.sample((C, L)), normal.sample((C, L))]
print(inputs)
# >>>>>
# [tensor([[-0.0590, 1.1674, 0.7119, -1.1270],
# [-1.3974, 0.5077, -0.5601, 0.0683],
# [-0.0929, 0.9473, 0.7159, -0.4574]]),
# tensor([[-0.0236, -0.7599, 1.0290, 0.8914],
# [-1.1727, -1.2556, -0.2271, 0.9568],
# [-0.2500, 1.4579, 1.4707, 0.4043]])]
observers = [MinMaxObserver(), MovingAverageMinMaxObserver(), HistogramObserver()]
for obs in observers:
    for x in inputs: obs(x)
    print(obs.__class__.__name__, obs.calculate_qparams())
# >>>>>
# MinMaxObserver (tensor([0.0112]), tensor([124], dtype=torch.int32))
# MovingAverageMinMaxObserver (tensor([0.0101]), tensor([139], dtype=torch.int32))
# HistogramObserver (tensor([0.0100]), tensor([106], dtype=torch.int32))
Affine and symmetric quantization schemes
Affine or asymmetric quantization schemes assign the input range to the minimum and maximum observed values. Affine schemes generally offer tighter clipping ranges and are useful for quantizing non-negative activations (you don't need the input range to contain negative values if the input tensors are never negative). The range is calculated as $\alpha = \min(r)$, $\beta = \max(r)$. Affine quantization leads to more computationally expensive inference when used for weight tensors [3].
Symmetric quantization schemes center the input range around 0, eliminating the need to calculate a zero-point offset. The range is calculated as $-\alpha = \beta = \max(|\min(r)|, |\max(r)|)$. For skewed signals (like non-negative activations), this can result in poor quantization resolution, because the clipping range includes values that never show up in the input.
In summary: affine (asymmetric) quantization is useful for non-negative activations, but makes inference with quantized weight tensors computationally more expensive; symmetric quantization avoids the zero-point, but can waste resolution on non-negative activations.
import numpy as np
import matplotlib.pyplot as plt

act = torch.distributions.pareto.Pareto(1, 10).sample((1,1024))
weights = torch.distributions.normal.Normal(0, 0.12).sample((3, 64, 7, 7)).flatten()
def get_symmetric_range(x):
    beta = torch.max(x.max(), x.min().abs())
    return -beta.item(), beta.item()

def get_affine_range(x):
    return x.min().item(), x.max().item()

def plot(ax, data, scheme):
    boundaries = get_affine_range(data) if scheme == 'affine' else get_symmetric_range(data)
    a, _, _ = ax.hist(data, density=True, bins=100)
    ymin, ymax = np.quantile(a[a > 0], [0.25, 0.95])
    ax.vlines(x=boundaries, ls='--', colors='purple', ymin=ymin, ymax=ymax)
fig, axs = plt.subplots(2,2)
plot(axs[0, 0], act, 'affine')
axs[0, 0].set_title("Activation, Affine-Quantized")
plot(axs[0, 1], act, 'symmetric')
axs[0, 1].set_title("Activation, Symmetric-Quantized")
plot(axs[1, 0], weights, 'affine')
axs[1, 0].set_title("Weights, Affine-Quantized")
plot(axs[1, 1], weights, 'symmetric')
axs[1, 1].set_title("Weights, Symmetric-Quantized")
plt.show()

In PyTorch, you can specify an affine or symmetric scheme when initializing the Observer. Note that not all observers support both schemes.
for qscheme in [torch.per_tensor_affine, torch.per_tensor_symmetric]:
    obs = MovingAverageMinMaxObserver(qscheme=qscheme)
    for x in inputs: obs(x)
    print(f"Qscheme: {qscheme} | {obs.calculate_qparams()}")
# >>>>>
# Qscheme: torch.per_tensor_affine | (tensor([0.0101]), tensor([139], dtype=torch.int32))
# Qscheme: torch.per_tensor_symmetric | (tensor([0.0109]), tensor([128]))
Per-tensor and per-channel quantization schemes
Quantization parameters can be computed for the layer's entire weight tensor as a whole, or separately for each channel. In per-tensor quantization, the same clipping range is applied to all the channels in a layer (Figure 3).
![Figure 3. Per-channel vs. per-tensor quantization](https://pytorch.org/assets/images/quantization-practice/per-channel-tensor.svg)
Per-channel quantization uses one set of quantization parameters per channel, whereas per-tensor quantization uses the same quantization parameters for the entire tensor. For weight quantization, symmetric per-channel quantization provides better accuracy; per-tensor quantization performs poorly, possibly because of high variance in conv weights across channels from batchnorm folding [3].
from torch.quantization.observer import MovingAveragePerChannelMinMaxObserver
obs = MovingAveragePerChannelMinMaxObserver(ch_axis=0) # calculate qparams for all `C` channels separately
for x in inputs: obs(x)
print(obs.calculate_qparams())
# >>>>>
# (tensor([0.0090, 0.0075, 0.0055]), tensor([125, 187, 82], dtype=torch.int32))
Backend engine
Currently, quantized operators run on x86 machines via the FBGEMM backend, or use QNNPACK primitives on ARM machines. Backend support for server GPUs (via TensorRT and cuDNN) is coming soon. Learn more about extending quantization to custom backends: RFC-0019.
backend = 'fbgemm' if x86 else 'qnnpack'  # 'fbgemm' for x86 CPUs, 'qnnpack' for ARM; `x86` is a boolean you define for your target platform
qconfig = torch.quantization.get_default_qconfig(backend)
torch.backends.quantized.engine = backend
QConfig
A QConfig stores the Observers and the quantization schemes used to quantize activations and weights.
Be sure to pass the Observer class (not an instance), or a callable that returns an Observer instance. Use with_args() to override the default arguments.
my_qconfig = torch.quantization.QConfig(
activation=MovingAverageMinMaxObserver.with_args(qscheme=torch.per_tensor_affine),
weight=MovingAveragePerChannelMinMaxObserver.with_args(qscheme=torch.qint8)
)
# >>>>>
# QConfig(activation=functools.partial(<class 'torch.ao.quantization.observer.MovingAverageMinMaxObserver'>, qscheme=torch.per_tensor_affine){}, weight=functools.partial(<class 'torch.ao.quantization.observer.MovingAveragePerChannelMinMaxObserver'>, qscheme=torch.qint8){})
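As a quick usage sketch of how a qconfig is typically attached to a model in eager-mode post-training static quantization (the tiny model below is hypothetical, the default fbgemm qconfig is used for simplicity, and the full workflow is covered later in this series):

```python
import torch
import torch.nn as nn

# Hypothetical float model, wrapped with Quant/DeQuant stubs so that the
# converted model can still accept and return regular float tensors.
model = nn.Sequential(torch.quantization.QuantStub(),
                      nn.Conv2d(3, 8, 3),
                      nn.ReLU(),
                      torch.quantization.DeQuantStub())
model.eval()

model.qconfig = torch.quantization.get_default_qconfig('fbgemm')  # or a custom QConfig
prepared = torch.quantization.prepare(model)        # inserts the observers
prepared(torch.randn(1, 3, 32, 32))                 # calibration pass collects statistics
quantized = torch.quantization.convert(prepared)    # swaps modules for quantized versions
print(quantized(torch.randn(1, 3, 32, 32)).shape)   # the conv now runs in int8
```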