Model Compression: Pruning / Quantization / Distillation / AutoML
2022-06-10 03:40:00 【Network sky (LUOC)】
Motivation
Deep learning models have high computational complexity and redundant parameters.
Solutions
(1) Linear or nonlinear quantization.
(2) Structured or unstructured pruning.
(3) Network architecture search.
(4) Low-rank decomposition of weight matrices; knowledge distillation.
Goal
Optimize accuracy, performance, storage, etc., so that the corresponding models can be deployed in constrained scenarios and on constrained devices.
1. Pruning
(1) Pruning positions are generally determined by weight magnitude: the smaller a weight, the less the corresponding neuron contributes.
(2) Pruning granularity: remove single weight elements, whole weight vectors, or entire neurons from a layer. Because matrix operations are parallelized, removing a single element or a single vector does not actually reduce computation; moreover, some hardware does not accelerate sparse-matrix operations, so the usual pruning operation removes whole neurons directly.
(3) To identify which neurons to cut, look at their activations: the closer a neuron's activations are to 0, the less useful it is.
(4) Pruning pipeline: train, prune, keep the surviving weights, then retrain (fine-tune).
(5) Training tip: for pruning, the optimizer used during training should be neither too aggressive nor too gentle, otherwise it destroys what the network has already learned. SGD is commonly used (it is mild); the Adam optimizer is comparatively aggressive.
(6) Pruning schemes:
① Prune individual weight elements by position, according to some rule.
② Prune whole weight vectors, according to some rule.
③ Prune entire convolution kernels (the overall magnitude of a kernel determines the pruning position).
④ Directly prune whole channels.
⑤ Overall, pruning divides into structured pruning and unstructured pruning.
(7) Implementation principle: drive weights down and keep only those with significant magnitude; useless weights become smaller and smaller and fade away. This is typically done with L1 regularization on the weights: penalizing the norm pushes part of the weights toward zero while the relative importance of the remaining weights grows.
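The magnitude-based structured pruning described above can be sketched in a few lines. This is a minimal, pure-Python illustration (the function names and toy weights are my own; real pipelines use framework utilities such as `torch.nn.utils.prune`):

```python
# Minimal sketch of magnitude-based structured pruning: rank neurons by the
# L1 norm of their weight vector and keep only the strongest ones.

def l1_norm(vec):
    """L1 norm of one neuron's weight vector."""
    return sum(abs(w) for w in vec)

def prune_neurons(weight_rows, keep_ratio):
    """Return the sorted indices of the neurons that survive pruning.

    weight_rows: list of weight vectors, one per neuron.
    keep_ratio:  fraction of neurons to keep (0 < keep_ratio <= 1).
    """
    n_keep = max(1, int(len(weight_rows) * keep_ratio))
    ranked = sorted(range(len(weight_rows)),
                    key=lambda i: l1_norm(weight_rows[i]),
                    reverse=True)
    return sorted(ranked[:n_keep])

weights = [
    [0.9, -1.2, 0.8],     # strong neuron
    [0.01, 0.02, -0.01],  # weak neuron, activations near 0
    [-0.7, 0.6, 1.1],     # strong neuron
    [0.05, -0.03, 0.02],  # weak neuron
]
survivors = prune_neurons(weights, keep_ratio=0.5)
print(survivors)  # -> [0, 2]
```

After selecting the survivors, the remaining rows form a smaller dense layer, which is then retrained, matching step (4) above.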
2. Quantization
(1) Basic theory:
Precision: full precision is generally FP32, used to store model weights; low precision is generally FP16, INT8, etc., which computes faster.
Mixed precision: use both FP32 and FP16 in the model. FP16 halves memory, but some parameters and operators still have to stay in FP32.
Quantization principle: values are generally quantized to INT8, i.e., the weights are mapped into the INT8 range for fast computation. (The quantization mapping is generally not an equal split of the whole range; because weights are generally small, the region around the origin can be treated as an approximately equal partition.)
(2) By weight storage, quantized networks include binary neural networks, ternary weight networks, and XNOR networks.
(3) In industry, FP32 is generally used to train the model (pursuing accuracy), and INT8 is used for the inference part (improving performance).
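The FP32-to-INT8 mapping mentioned above can be made concrete with the standard affine (scale/zero-point) scheme. This is a minimal sketch in pure Python; the function names are my own, and real frameworks handle per-channel scales and calibration on top of this:

```python
# Minimal sketch of affine INT8 quantization: map a float range
# [min_v, max_v] onto the integer range [-128, 127].

def quant_params(min_v, max_v, qmin=-128, qmax=127):
    """Compute the scale and zero-point of the affine mapping."""
    scale = (max_v - min_v) / (qmax - qmin)
    zero_point = round(qmin - min_v / scale)
    return scale, zero_point

def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))  # clamp to the INT8 range

def dequantize(q, scale, zero_point):
    return (q - zero_point) * scale

scale, zp = quant_params(-1.0, 1.0)   # weights are generally small
q = quantize(0.5, scale, zp)
x = dequantize(q, scale, zp)
print(q, x)  # the round trip loses at most about scale/2 of precision
```

Values outside the calibrated range simply saturate at -128 or 127, which is why choosing the float range (calibration) matters so much in practice.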
(4) Basic code steps (details can be learned from the official documentation):
① Wrap the network into blocks (after wrapping a sub-block, quantize it as a whole; for example, fuse conv + BN + ReLU into one module). Note that the blocks must use operators that quantization supports.
② Prepare evaluation tools (e.g., top-1/top-5 accuracy, latency, storage size, etc.).
③ Train the original network (train first, quantize afterwards).
④ Start quantizing (compare the evaluation metrics before and after quantization).
⑤ Do QAT (quantization-aware training): accuracy may drop after quantization, so depending on performance requirements, do further training.
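The core trick behind step ⑤ is "fake quantization": during QAT the forward pass quantizes and immediately dequantizes, so the network trains against the rounding error it will see after real INT8 conversion. A minimal sketch, with an assumed weight range of roughly [-1, 1]:

```python
# Minimal sketch of fake quantization as used in QAT: quantize-then-
# dequantize so the value stays a float but carries INT8 precision.

def fake_quant(x, scale, zero_point=0, qmin=-128, qmax=127):
    """Same dtype in and out, but rounded to the INT8 grid."""
    q = round(x / scale) + zero_point
    q = max(qmin, min(qmax, q))       # clamp to the INT8 range
    return (q - zero_point) * scale   # back to float

scale = 2.0 / 255                     # assumed range: weights in [-1, 1]
weights = [0.731, -0.408, 0.052, -0.999]
fq = [fake_quant(w, scale) for w in weights]
errors = [abs(a - b) for a, b in zip(weights, fq)]
print(fq)
print(max(errors) <= scale / 2 + 1e-9)  # rounding error bounded by scale/2
```

In a real framework this function is inserted automatically into the wrapped blocks from step ①, and gradients flow through it via a straight-through estimator.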
3. Distillation
(1) Distillation, also called the teacher-student model, belongs to transfer learning.
Principle of distillation: first pre-train a large model (the teacher), then use it to teach a small model (the student); the large model's outputs act as a neuron-level prior for the small model. The goal is for the small model to approach the accuracy of the large model while running much faster.
(2) Common methods: covered in detail in a separate article.
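The teacher-student principle above is usually realized with a soft-target loss: the student matches the teacher's temperature-softened outputs as well as the true labels. A minimal sketch following Hinton et al.'s formulation (the temperature, weighting, and toy logits are illustrative choices):

```python
# Minimal sketch of a knowledge-distillation loss: a weighted sum of the
# soft (teacher) cross-entropy and the hard (true-label) cross-entropy.
import math

def softmax(logits, T=1.0):
    """Temperature-scaled softmax; higher T gives softer targets."""
    exps = [math.exp(z / T) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy(p_target, p_pred, eps=1e-12):
    return -sum(t * math.log(p + eps) for t, p in zip(p_target, p_pred))

def distill_loss(teacher_logits, student_logits, true_label, T=4.0, alpha=0.7):
    """alpha weights the soft (teacher) term vs. the hard (label) term."""
    soft_t = softmax(teacher_logits, T)
    soft_s = softmax(student_logits, T)
    hard_s = softmax(student_logits)  # T = 1 for the real labels
    one_hot = [1.0 if i == true_label else 0.0
               for i in range(len(student_logits))]
    # T*T rescales the soft term's gradients, as in the original paper.
    return (alpha * (T * T) * cross_entropy(soft_t, soft_s)
            + (1 - alpha) * cross_entropy(one_hot, hard_s))

teacher = [6.0, 1.0, 0.5]   # confident pre-trained teacher
student = [2.0, 1.5, 0.2]   # undertrained student
print(distill_loss(teacher, student, true_label=0))
```

The soft term is what carries the teacher's "dark knowledge": relative probabilities of the wrong classes that a one-hot label cannot express.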
4. AutoML
(1) NAS (neural architecture search):
① First define a set of candidate neural network structures.
② Randomly combine the data with network structures (let the controller choose the next component by itself).
③ One structure predicts the next, and the pieces are finally assembled into a neural network; the quality of that network is then judged by training it.
(2) Simulated annealing-based search (Light-NAS, PaddleSlim, etc.).
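Simulated-annealing search over architectures can be sketched on a toy search space. Everything here is illustrative: the operation names, the scoring table, and the `evaluate` function (which stands in for actually training and validating each candidate network):

```python
# Minimal sketch of simulated-annealing architecture search: mutate one
# layer choice at a time, always accept improvements, and accept
# regressions with a probability that shrinks as the temperature cools.
import math
import random

SEARCH_SPACE = [
    ["conv3x3", "conv5x5"],    # layer 1 choices
    ["maxpool", "avgpool"],    # layer 2 choices
    ["conv3x3", "identity"],   # layer 3 choices
]

def evaluate(arch):
    """Hypothetical score; a real system trains the candidate network."""
    score = {"conv3x3": 3, "conv5x5": 2, "maxpool": 2,
             "avgpool": 1, "identity": 1}
    return sum(score[op] for op in arch)

def anneal(steps=300, t0=2.0, seed=0):
    rng = random.Random(seed)
    arch = [rng.choice(ops) for ops in SEARCH_SPACE]
    best = arch[:]
    for step in range(steps):
        temp = t0 * (1 - step / steps) + 1e-6  # linear cooling schedule
        cand = arch[:]
        i = rng.randrange(len(SEARCH_SPACE))   # mutate one random layer
        cand[i] = rng.choice(SEARCH_SPACE[i])
        delta = evaluate(cand) - evaluate(arch)
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            arch = cand
        if evaluate(arch) > evaluate(best):
            best = arch[:]
    return best

print(anneal())
```

Early on, the high temperature lets the search escape local optima; as it cools, the search settles on the best combination it has visited.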
(3) DARTS: gradient-based architecture search.
① Unlike NAS based on reinforcement learning, DARTS relaxes the discrete search space so that it can be optimized directly by gradient descent.
② Eight candidate operations are defined (3×3/5×5 separable convolution, 3×3/5×5 dilated convolution, max pooling, average pooling, identity/no-op, zero).
③ First initialize N nodes; the operation on each edge between two nodes is a choice among the given candidate operations.
④ The choice among the candidate operations on each edge is relaxed with a softmax over architecture parameters; the mixture probabilities and the network weights are optimized jointly. (As one operation's weight increases, the weights of the other operations decrease, until effectively only one choice remains.)
⑤ The final network structure is read off from the mixture probabilities (keep the highest-probability operation on each edge).
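The softmax relaxation in steps ④-⑤ can be shown on scalars. This is a minimal sketch with made-up stand-in operations; real DARTS applies the same mixture to feature maps produced by the eight candidate operations:

```python
# Minimal sketch of DARTS-style continuous relaxation: an edge's output is
# a softmax-weighted mixture of all candidate operations, and the final
# architecture keeps only the highest-probability operation per edge.
import math

OPS = {
    "identity": lambda x: x,
    "double":   lambda x: 2 * x,   # stand-in for a convolution
    "zero":     lambda x: 0.0,
}

def softmax(alphas):
    exps = [math.exp(a) for a in alphas]
    s = sum(exps)
    return [e / s for e in exps]

def mixed_op(x, alphas):
    """Edge output: sum_i softmax(alpha)_i * op_i(x)."""
    probs = softmax(alphas)
    return sum(p * op(x) for p, op in zip(probs, OPS.values()))

def discretize(alphas):
    """After training, keep only the highest-probability operation."""
    probs = softmax(alphas)
    return max(zip(OPS, probs), key=lambda kv: kv[1])[0]

alphas = [0.1, 2.0, -1.0]   # architecture parameters (learned in DARTS)
print(round(mixed_op(1.0, alphas), 3))
print(discretize(alphas))   # -> double
```

Because `mixed_op` is differentiable in `alphas`, the architecture parameters can be trained by the same gradient descent that trains the network weights, which is exactly what makes DARTS "differentiable" search.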