Deep Learning Model Compression and Acceleration Techniques (VII): Hybrid Methods
2022-07-07 20:18:00 【Breeze_】
Compression and acceleration of deep learning models exploits the redundancy in neural network parameters and network structure to simplify a model, producing one with fewer parameters and a more streamlined structure without degrading its performance on the task. The compressed model needs less computation and memory than the original, so it can serve a much wider range of applications. As deep learning becomes ever more widespread, the strong demand for deployable models has drawn particular attention to "small models" that occupy little memory and require few computing resources while still maintaining high accuracy. Compressing and accelerating deep learning models by exploiting neural network redundancy has therefore attracted broad interest in both academia and industry, and new work in this area appears constantly.
This article is a study summary based on the survey "A Survey of Compression and Acceleration of Deep Learning Models", published in the Journal of Software in 2021.
Related links:
Deep Learning Model Compression and Acceleration Techniques (I): Parameter Pruning
Deep Learning Model Compression and Acceleration Techniques (II): Parameter Quantization
Deep Learning Model Compression and Acceleration Techniques (III): Low-Rank Decomposition
Deep Learning Model Compression and Acceleration Techniques (IV): Parameter Sharing
Deep Learning Model Compression and Acceleration Techniques (V): Compact Networks
Deep Learning Model Compression and Acceleration Techniques (VII): Hybrid Methods
Summary

| Compression and acceleration technique | Description |
|---|---|
| Parameter pruning (A) | Design a criterion for evaluating parameter importance, judge the importance of network parameters against it, and delete redundant parameters |
| Parameter quantization (A) | Quantize the network parameters from 32-bit full-precision floating point down to lower bit widths |
| Low-rank decomposition (A) | Decompose high-dimensional parameter tensors into sparse low-dimensional tensors to reduce their dimensionality |
| Parameter sharing (A) | Map the internal parameters of the network using structured matrices or clustering methods |
| Compact networks (B) | Design new lightweight networks at three levels: convolution kernel, special layer, and network structure |
| Knowledge distillation (B) | Distill the knowledge of a larger teacher model into a smaller student model |
| Hybrid methods (A+B) | Combinations of the preceding methods |

A: compresses parameters; B: compresses structure
Hybrid Methods
Definition
A hybrid method combines several of the commonly used model compression and acceleration techniques.
Characteristics
Hybrid methods can integrate the advantages of multiple compression and acceleration techniques and further strengthen the overall effect; they will remain an important research direction in deep learning model compression and acceleration.
1. Combining parameter pruning and parameter quantization
- Ullrich et al. [165] use a regularization term based on soft weight sharing to achieve parameter quantization and parameter pruning within a single model-retraining procedure.
- Tung et al. [166] propose CLIP-Q (compression learning by in-parallel pruning-quantization), an integrated compression and acceleration framework that performs parameter pruning and parameter quantization in parallel.
- Han et al. [167] propose Deep Compression, which combines parameter pruning, parameter quantization, and Huffman coding and achieves strong compression results; building on it, they carry out software/hardware co-design and propose the Efficient Inference Engine (EIE) framework [168].
- Dubey et al. [169] likewise combine these three methods for network compression. A minimal sketch of the prune-then-quantize idea follows this list.
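The sketch below assumes a single weight tensor and plain NumPy; the sparsity level and bit width are illustrative, and it is not the exact procedure of any cited paper (Deep Compression [167], for instance, quantizes by k-means weight clustering and then applies Huffman coding rather than the uniform quantization shown here).

```python
import numpy as np

def prune_by_magnitude(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of the weights."""
    threshold = np.quantile(np.abs(weights), sparsity)
    mask = np.abs(weights) > threshold
    return weights * mask, mask

def quantize_uniform(weights, num_bits=8):
    """Symmetric uniform quantization of a (possibly pruned) weight tensor."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.abs(weights).max() / qmax if np.any(weights) else 1.0
    q = np.round(weights / scale).astype(np.int8)
    return q, scale

if __name__ == "__main__":
    w = np.random.randn(1000).astype(np.float32)   # stand-in for one layer's weights
    w_pruned, mask = prune_by_magnitude(w, sparsity=0.7)
    q, scale = quantize_uniform(w_pruned, num_bits=8)
    w_restored = q.astype(np.float32) * scale      # dequantized weights used at inference
    print(f"sparsity: {1 - mask.mean():.2f}, "
          f"max abs error: {np.abs(w_restored - w_pruned).max():.4f}")
```

After this step, the sparse integer weights can be stored in a compressed format and, as in [167], shrunk further with entropy coding such as Huffman coding.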
2. Combining parameter pruning and parameter sharing
- Louizos et al. [170] adopt a Bayesian approach: a sparsity-inducing prior distribution prunes the network, and the posterior uncertainty determines the optimal fixed-point precision for encoding the weights.
- Ji et al. [171] prune by reordering input/output dimensions, clustering the small, irregularly distributed weights into structured groups, which achieves better hardware utilization and higher sparsity.
- Zhang et al. [172] use a regularizer that not only encourages sparsity but also learns which parameter groups should share a common value, explicitly identifying highly correlated neurons. A toy sketch of coupling sharing with pruning follows this list.
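The toy illustration below assumes plain NumPy and a single weight matrix: the weights are clustered so that each cluster shares one value, and the cluster nearest zero is pruned. It only sketches the general idea and is not a reimplementation of the works above, which achieve sharing and sparsity through regularizers, Bayesian priors, or structured reordering during training.

```python
import numpy as np

def kmeans_1d(x, k, iters=20):
    """Plain 1-D k-means over the flattened weights; returns k centroids."""
    centroids = np.linspace(x.min(), x.max(), k)
    for _ in range(iters):
        assign = np.argmin(np.abs(x[:, None] - centroids[None, :]), axis=1)
        for j in range(k):
            if np.any(assign == j):
                centroids[j] = x[assign == j].mean()
    return centroids

def share_and_prune(weights, k=16):
    """Map every weight to one of k shared values and zero out the near-zero cluster."""
    flat = weights.ravel()
    centroids = kmeans_1d(flat, k)
    centroids[np.argmin(np.abs(centroids))] = 0.0    # prune the cluster closest to zero
    assign = np.argmin(np.abs(flat[:, None] - centroids[None, :]), axis=1)
    return centroids[assign].reshape(weights.shape)

if __name__ == "__main__":
    w = np.random.randn(64, 64).astype(np.float32)
    w_shared = share_and_prune(w, k=16)
    print("distinct values:", np.unique(w_shared).size,
          "zero fraction:", float((w_shared == 0).mean()))
```

Because every weight now takes one of at most k shared values, only the cluster indices and the small codebook of centroids need to be stored.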
3. Combining parameter quantization and knowledge distillation
- Polino et al. [173] propose a quantized training method that adds a knowledge distillation loss. Both a floating-point model and a quantized model are maintained: the quantized model computes the forward loss and the gradients, which are used to update the floating-point model, and before each forward pass the quantized model is refreshed from the updated floating-point model.
- Mishra et al. [174] propose using a high-precision teacher model to guide the training of a low-precision student model, with three variants: train the teacher and the quantized student jointly; let a pre-trained teacher guide the quantized student trained from scratch; or pre-train both the teacher and the student, quantize the student, and then fine-tune it under the teacher's guidance. A PyTorch sketch of the quantization-plus-distillation idea follows this list.
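The sketch below is a minimal PyTorch illustration under stated assumptions: a 4-bit symmetric uniform quantizer, a straight-through estimator, toy layer sizes, and a distillation temperature of 2 are all illustrative choices. It mirrors the scheme of Polino et al. [173] only in outline (the forward pass runs on quantized weights, the distillation loss comes from a frozen full-precision teacher, and gradients update the full-precision student), not in its exact details.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QuantizedLinear(nn.Linear):
    """Linear layer whose forward pass uses on-the-fly quantized weights."""
    def forward(self, x, num_bits=4):
        qmax = 2 ** (num_bits - 1) - 1
        scale = self.weight.detach().abs().max().clamp(min=1e-8) / qmax
        w_q = torch.round(self.weight / scale).clamp(-qmax, qmax) * scale
        # Straight-through estimator: forward uses quantized weights,
        # backward flows through the full-precision self.weight.
        w_ste = self.weight + (w_q - self.weight).detach()
        return F.linear(x, w_ste, self.bias)

teacher = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 10)).eval()  # frozen teacher
student = nn.Sequential(QuantizedLinear(32, 64), nn.ReLU(), QuantizedLinear(64, 10))
optimizer = torch.optim.SGD(student.parameters(), lr=0.01)

x = torch.randn(8, 32)                     # dummy input batch
with torch.no_grad():
    t_logits = teacher(x)
s_logits = student(x)                      # forward pass runs on quantized weights
T = 2.0                                    # distillation temperature (assumed)
loss = F.kl_div(F.log_softmax(s_logits / T, dim=1),
                F.softmax(t_logits / T, dim=1),
                reduction="batchmean") * T * T
loss.backward()                            # gradients update the full-precision weights
optimizer.step()
```

In practice the distillation loss is usually mixed with the ordinary task loss on the ground-truth labels, and the quantized copy of the student is what is ultimately deployed.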
References
[165] Ullrich K, Meeds E, Welling M. Soft weight-sharing for neural network compression. arXiv preprint arXiv:1702.04008, 2017.
[166] Tung F, Mori G. CLIP-Q: Deep network compression learning by in-parallel pruning-quantization. In: Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition. 2018. 7873-7882.
[167] Han S, Mao H, Dally WJ. Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding. arXiv preprint arXiv:1510.00149, 2015.
[168] Han S, Liu X, Mao H, et al. EIE: Efficient inference engine on compressed deep neural network. ACM SIGARCH Computer Architecture News, 2016, 44(3): 243-254.
[169] Dubey A, Chatterjee M, Ahuja N. Coreset-based neural network compression. In: Proc. of the European Conf. on Computer Vision (ECCV). 2018. 454-470.
[170] Louizos C, Ullrich K, Welling M. Bayesian compression for deep learning. In: Advances in Neural Information Processing Systems. 2017. 3288-3298.
[171] Ji Y, Liang L, Deng L, et al. TETRIS: Tile-matching the tremendous irregular sparsity. In: Advances in Neural Information Processing Systems. 2018. 4115-4125.
[172] Zhang D, Wang H, Figueiredo M, et al. Learning to share: Simultaneous parameter tying and sparsification in deep learning. In: Proc. of the 6th Int'l Conf. on Learning Representations. 2018.
[173] Polino A, Pascanu R, Alistarh D. Model compression via distillation and quantization. arXiv preprint arXiv:1802.05668, 2018.
[174] Mishra A, Marr D. Apprentice: Using knowledge distillation techniques to improve low-precision network accuracy. arXiv preprint arXiv:1711.05852, 2017.