Model Compression: Pruning / Quantization / Distillation / AutoML
2022-06-10 03:40:00 【Network sky (LUOC)】
Motivation
Deep learning models have high computational complexity and redundant parameters.
Solutions
(1) Linear or nonlinear quantization.
(2) Structured or unstructured pruning.
(3) Network architecture search.
(4) Low-rank decomposition of weight matrices; knowledge distillation.
Goal
Optimize accuracy, performance, storage, etc., so that the corresponding models can be deployed in constrained scenarios and on constrained devices.
1. Pruning
(1) Pruning positions are generally determined by weight magnitude: the smaller a weight, the less the corresponding neuron contributes.
(2) Pruning granularity: remove single weight elements, whole weight vectors, or entire neurons from a layer. Because matrix operations are parallelized, removing a single element or a single vector does not actually reduce computation; moreover, some hardware does not accelerate sparse-matrix operations, so the usual pruning operation removes whole neurons directly.
(3) To identify which neurons to cut, look at their activations: the closer a neuron's activations are to 0, the less useful it is.
(4) Pruning pipeline: train, prune, keep the surviving weights, then retrain (fine-tune).
(5) Training tip: for pruning, the optimizer used during training should be neither too aggressive nor too gentle, otherwise it destroys what the network has already learned. SGD is commonly used (it is mild); the Adam optimizer is comparatively aggressive.
(6) Pruning schemes:
① Prune individual weight elements by position, according to some rule.
② Prune whole weight vectors, according to some rule.
③ Prune entire convolution kernels (the overall magnitude of a kernel determines the pruning position).
④ Directly prune whole channels.
⑤ Overall, pruning divides into structured pruning and unstructured pruning.
(7) Implementation principle: drive weights down and keep only those with significant magnitude; useless weights become smaller and smaller and fade away. This is typically done with L1 regularization on the weights: penalizing the norm pushes part of the weights toward zero while the relative importance of the remaining weights grows.
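The magnitude-based structured pruning described above can be sketched in a few lines. This is a minimal, pure-Python illustration (the function names and toy weights are my own; real pipelines use framework utilities such as `torch.nn.utils.prune`):

```python
# Minimal sketch of magnitude-based structured pruning: rank neurons by the
# L1 norm of their weight vector and keep only the strongest ones.

def l1_norm(vec):
    """L1 norm of one neuron's weight vector."""
    return sum(abs(w) for w in vec)

def prune_neurons(weight_rows, keep_ratio):
    """Return the sorted indices of the neurons that survive pruning.

    weight_rows: list of weight vectors, one per neuron.
    keep_ratio:  fraction of neurons to keep (0 < keep_ratio <= 1).
    """
    n_keep = max(1, int(len(weight_rows) * keep_ratio))
    ranked = sorted(range(len(weight_rows)),
                    key=lambda i: l1_norm(weight_rows[i]),
                    reverse=True)
    return sorted(ranked[:n_keep])

weights = [
    [0.9, -1.2, 0.8],     # strong neuron
    [0.01, 0.02, -0.01],  # weak neuron, activations near 0
    [-0.7, 0.6, 1.1],     # strong neuron
    [0.05, -0.03, 0.02],  # weak neuron
]
survivors = prune_neurons(weights, keep_ratio=0.5)
print(survivors)  # -> [0, 2]
```

After selecting the survivors, the remaining rows form a smaller dense layer, which is then retrained, matching step (4) above.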
2. Quantization
(1) Basic theory:
Precision: full precision is generally FP32, used to store model weights; low precision is generally FP16, INT8, etc., which computes faster.
Mixed precision: use both FP32 and FP16 in the model. FP16 halves memory, but some parameters and operators still have to stay in FP32.
Quantization principle: values are generally quantized to INT8, i.e., the weights are mapped into the INT8 range for fast computation. (The quantization mapping is generally not an equal split of the whole range; because weights are generally small, the region around the origin can be treated as an approximately equal partition.)
(2) By weight storage, quantized networks include binary neural networks, ternary weight networks, and XNOR networks.
(3) In industry, FP32 is generally used to train the model (pursuing accuracy), and INT8 is used for the inference part (improving performance).
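The FP32-to-INT8 mapping mentioned above can be made concrete with the standard affine (scale/zero-point) scheme. This is a minimal sketch in pure Python; the function names are my own, and real frameworks handle per-channel scales and calibration on top of this:

```python
# Minimal sketch of affine INT8 quantization: map a float range
# [min_v, max_v] onto the integer range [-128, 127].

def quant_params(min_v, max_v, qmin=-128, qmax=127):
    """Compute the scale and zero-point of the affine mapping."""
    scale = (max_v - min_v) / (qmax - qmin)
    zero_point = round(qmin - min_v / scale)
    return scale, zero_point

def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))  # clamp to the INT8 range

def dequantize(q, scale, zero_point):
    return (q - zero_point) * scale

scale, zp = quant_params(-1.0, 1.0)   # weights are generally small
q = quantize(0.5, scale, zp)
x = dequantize(q, scale, zp)
print(q, x)  # the round trip loses at most about scale/2 of precision
```

Values outside the calibrated range simply saturate at -128 or 127, which is why choosing the float range (calibration) matters so much in practice.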
(4) Basic code steps (details can be learned from the official documentation):
① Wrap the network into blocks (after wrapping a sub-block, quantize it as a whole; for example, fuse conv + BN + ReLU into one module). Note that the blocks must use operators that quantization supports.
② Prepare evaluation tools (e.g., top-1/top-5 accuracy, latency, storage size, etc.).
③ Train the original network (train first, quantize afterwards).
④ Start quantizing (compare the evaluation metrics before and after quantization).
⑤ Do QAT (quantization-aware training): accuracy may drop after quantization, so depending on performance requirements, do further training.
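The core trick behind step ⑤ is "fake quantization": during QAT the forward pass quantizes and immediately dequantizes, so the network trains against the rounding error it will see after real INT8 conversion. A minimal sketch, with an assumed weight range of roughly [-1, 1]:

```python
# Minimal sketch of fake quantization as used in QAT: quantize-then-
# dequantize so the value stays a float but carries INT8 precision.

def fake_quant(x, scale, zero_point=0, qmin=-128, qmax=127):
    """Same dtype in and out, but rounded to the INT8 grid."""
    q = round(x / scale) + zero_point
    q = max(qmin, min(qmax, q))       # clamp to the INT8 range
    return (q - zero_point) * scale   # back to float

scale = 2.0 / 255                     # assumed range: weights in [-1, 1]
weights = [0.731, -0.408, 0.052, -0.999]
fq = [fake_quant(w, scale) for w in weights]
errors = [abs(a - b) for a, b in zip(weights, fq)]
print(fq)
print(max(errors) <= scale / 2 + 1e-9)  # rounding error bounded by scale/2
```

In a real framework this function is inserted automatically into the wrapped blocks from step ①, and gradients flow through it via a straight-through estimator.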
3. Distillation
(1) Distillation, also called the teacher-student model, belongs to transfer learning.
Principle of distillation: first pre-train a large model (the teacher), then use it to teach a small model (the student); the large model's outputs act as a neuron-level prior for the small model. The goal is for the small model to approach the accuracy of the large model while running much faster.
(2) Common methods: covered in detail in a separate article.
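The teacher-student principle above is usually realized with a soft-target loss: the student matches the teacher's temperature-softened outputs as well as the true labels. A minimal sketch following Hinton et al.'s formulation (the temperature, weighting, and toy logits are illustrative choices):

```python
# Minimal sketch of a knowledge-distillation loss: a weighted sum of the
# soft (teacher) cross-entropy and the hard (true-label) cross-entropy.
import math

def softmax(logits, T=1.0):
    """Temperature-scaled softmax; higher T gives softer targets."""
    exps = [math.exp(z / T) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy(p_target, p_pred, eps=1e-12):
    return -sum(t * math.log(p + eps) for t, p in zip(p_target, p_pred))

def distill_loss(teacher_logits, student_logits, true_label, T=4.0, alpha=0.7):
    """alpha weights the soft (teacher) term vs. the hard (label) term."""
    soft_t = softmax(teacher_logits, T)
    soft_s = softmax(student_logits, T)
    hard_s = softmax(student_logits)  # T = 1 for the real labels
    one_hot = [1.0 if i == true_label else 0.0
               for i in range(len(student_logits))]
    # T*T rescales the soft term's gradients, as in the original paper.
    return (alpha * (T * T) * cross_entropy(soft_t, soft_s)
            + (1 - alpha) * cross_entropy(one_hot, hard_s))

teacher = [6.0, 1.0, 0.5]   # confident pre-trained teacher
student = [2.0, 1.5, 0.2]   # undertrained student
print(distill_loss(teacher, student, true_label=0))
```

The soft term is what carries the teacher's "dark knowledge": relative probabilities of the wrong classes that a one-hot label cannot express.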
4. AutoML
(1) NAS (neural architecture search):
① First define a set of candidate neural network structures.
② Randomly combine the data with network structures (let the controller choose the next component by itself).
③ One structure predicts the next, and the pieces are finally assembled into a neural network; the quality of that network is then judged by training it.
(2) Simulated annealing-based search (Light-NAS, PaddleSlim, etc.).
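Simulated-annealing search over architectures can be sketched on a toy search space. Everything here is illustrative: the operation names, the scoring table, and the `evaluate` function (which stands in for actually training and validating each candidate network):

```python
# Minimal sketch of simulated-annealing architecture search: mutate one
# layer choice at a time, always accept improvements, and accept
# regressions with a probability that shrinks as the temperature cools.
import math
import random

SEARCH_SPACE = [
    ["conv3x3", "conv5x5"],    # layer 1 choices
    ["maxpool", "avgpool"],    # layer 2 choices
    ["conv3x3", "identity"],   # layer 3 choices
]

def evaluate(arch):
    """Hypothetical score; a real system trains the candidate network."""
    score = {"conv3x3": 3, "conv5x5": 2, "maxpool": 2,
             "avgpool": 1, "identity": 1}
    return sum(score[op] for op in arch)

def anneal(steps=300, t0=2.0, seed=0):
    rng = random.Random(seed)
    arch = [rng.choice(ops) for ops in SEARCH_SPACE]
    best = arch[:]
    for step in range(steps):
        temp = t0 * (1 - step / steps) + 1e-6  # linear cooling schedule
        cand = arch[:]
        i = rng.randrange(len(SEARCH_SPACE))   # mutate one random layer
        cand[i] = rng.choice(SEARCH_SPACE[i])
        delta = evaluate(cand) - evaluate(arch)
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            arch = cand
        if evaluate(arch) > evaluate(best):
            best = arch[:]
    return best

print(anneal())
```

Early on, the high temperature lets the search escape local optima; as it cools, the search settles on the best combination it has visited.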
(3) DARTS: gradient-based architecture search.
① Unlike NAS based on reinforcement learning, DARTS relaxes the discrete search space so that it can be optimized directly by gradient descent.
② Eight candidate operations are defined (3×3/5×5 separable convolution, 3×3/5×5 dilated convolution, max pooling, average pooling, identity/no-op, zero).
③ First initialize N nodes; the operation on each edge between two nodes is a choice among the given candidate operations.
④ The choice among the candidate operations on each edge is relaxed with a softmax over architecture parameters; the mixture probabilities and the network weights are optimized jointly. (As one operation's weight increases, the weights of the other operations decrease, until effectively only one choice remains.)
⑤ The final network structure is read off from the mixture probabilities (keep the highest-probability operation on each edge).
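The softmax relaxation in steps ④-⑤ can be shown on scalars. This is a minimal sketch with made-up stand-in operations; real DARTS applies the same mixture to feature maps produced by the eight candidate operations:

```python
# Minimal sketch of DARTS-style continuous relaxation: an edge's output is
# a softmax-weighted mixture of all candidate operations, and the final
# architecture keeps only the highest-probability operation per edge.
import math

OPS = {
    "identity": lambda x: x,
    "double":   lambda x: 2 * x,   # stand-in for a convolution
    "zero":     lambda x: 0.0,
}

def softmax(alphas):
    exps = [math.exp(a) for a in alphas]
    s = sum(exps)
    return [e / s for e in exps]

def mixed_op(x, alphas):
    """Edge output: sum_i softmax(alpha)_i * op_i(x)."""
    probs = softmax(alphas)
    return sum(p * op(x) for p, op in zip(probs, OPS.values()))

def discretize(alphas):
    """After training, keep only the highest-probability operation."""
    probs = softmax(alphas)
    return max(zip(OPS, probs), key=lambda kv: kv[1])[0]

alphas = [0.1, 2.0, -1.0]   # architecture parameters (learned in DARTS)
print(round(mixed_op(1.0, alphas), 3))
print(discretize(alphas))   # -> double
```

Because `mixed_op` is differentiable in `alphas`, the architecture parameters can be trained by the same gradient descent that trains the network weights, which is exactly what makes DARTS "differentiable" search.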