Quantization research
2022-07-03 11:53:00 【Diros1g】
1. Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding
1.1 Methods
Prune the network: keep only the important connections;
Quantize the weights: let many connections share the same weight value;
Huffman coding: compress the result further with Huffman coding;
1.1.1 Pruning
1. Learn the connections by training the network normally;
2. Prune the small-weight connections: delete every connection whose weight is below a threshold;
3. Retrain the network to learn the final weights of the remaining sparse connections;
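A minimal sketch of steps 2 and 3 in PyTorch, assuming a single global magnitude threshold and a mask that is re-applied during retraining (the paper derives thresholds per layer, so this is illustrative, not the paper's exact procedure):

import torch
import torch.nn as nn

def magnitude_prune(model: nn.Module, threshold: float):
    """Zero every weight whose magnitude is below `threshold` and
    return per-parameter masks so retraining can keep them pruned."""
    masks = {}
    for name, param in model.named_parameters():
        if name.endswith("weight"):
            mask = (param.abs() >= threshold).float()
            param.data.mul_(mask)   # step 2: delete small-weight connections
            masks[name] = mask
    return masks

# Step 3: retrain; after each optimizer step, re-apply the masks so the
# pruned connections stay at zero:
#   for name, param in model.named_parameters():
#       if name in masks:
#           param.data.mul_(masks[name])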
1.1.2 Quantization and weight sharing
Quantization
Weight sharing
Weight sharing uses simple K-means: the weights of each layer are clustered, and all weights belonging to the same cluster share a single weight value. One thing to watch out for: weights are never shared across layers;
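A minimal sketch of per-layer k-means weight sharing; scikit-learn's KMeans and 16 clusters (i.e. 4-bit indices) are illustrative choices, and the paper's fine-tuning of the shared centroids during retraining is omitted:

import torch
from sklearn.cluster import KMeans

def share_weights(weight: torch.Tensor, n_clusters: int = 16):
    """Cluster one layer's weights and snap each weight to its cluster
    centroid, so all weights in a cluster share one value.
    Called per layer: weights are never shared across layers."""
    flat = weight.detach().cpu().numpy().reshape(-1, 1)
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(flat)
    shared = km.cluster_centers_[km.labels_].reshape(weight.shape)
    return torch.tensor(shared, dtype=weight.dtype), km.labels_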
1.1.3 Huffman coding
Huffman coding first builds a tree from the character frequencies, then derives a specific code for each character from the structure of the tree: frequent characters get shorter codes and rare characters get longer ones. This reduces the average length of the encoded string and so achieves lossless data compression.
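A minimal sketch of building a Huffman code table in Python, using heapq; the frequencies come straight from the input, and exact code assignments depend on tie-breaking:

import heapq
from collections import Counter

def huffman_codes(data):
    """Build a Huffman code table for the symbols in `data`.
    Frequent symbols receive shorter bit strings."""
    freq = Counter(data)
    # Heap entries: (frequency, tie_breaker, tree); a tree is either a
    # leaf symbol or a (left, right) pair of subtrees.
    heap = [(f, i, sym) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, tie, (left, right)))
        tie += 1
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):      # internal node: recurse
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:                            # leaf: a symbol
            codes[tree] = prefix or "0"  # single-symbol edge case
    walk(heap[0][2], "")
    return codes

print(huffman_codes("aaaabbbcc d"))  # e.g. {'a': '0', 'b': '10', ...}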
1.2 Results
Pruning: reduces the number of connections to 1/13 ~ 1/9 of the original;
Quantization: reduces each connection from 32 bits to 5 bits;
Overall:
- AlexNet is compressed 35×, from 240 MB down to 6.9 MB;
- VGG-16 is compressed 49×, from 552 MB down to 11.3 MB;
- Inference is 3~4× faster with 3~7× better energy efficiency;
1.3 Experimental requirements
1. Training must save the complete model, not just model.state_dict(); however, most of our current weight files hold only the parameter state rather than the full model.
2. The complete network structure is required;
3. Enough training data is required;
References:
1. [Deep neural network compression] Deep Compression
2. Deep compression: compressing deep neural networks with pruning, trained quantization and Huffman coding
3. Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding
4. PyTorch official documentation
5. Understanding Huffman Coding
2. PyTorch's official quantization
https://pytorch.apachecn.org/#/
https://pytorch.org/docs/stable/quantization.html
2.0 Quantized data type: Quantized Tensor
A quantized tensor can store data of type int8/uint8/int32 and carries the quantization parameters scale and zero_point with it:
>>> import torch
>>> x = torch.rand(2, 3, dtype=torch.float32)
>>> x
tensor([[0.6839, 0.4741, 0.7451],
        [0.9301, 0.1742, 0.6835]])
>>> xq = torch.quantize_per_tensor(x, scale=0.5, zero_point=8, dtype=torch.quint8)
>>> xq
tensor([[0.5000, 0.5000, 0.5000],
        [1.0000, 0.0000, 0.5000]], size=(2, 3), dtype=torch.quint8,
       quantization_scheme=torch.per_tensor_affine, scale=0.5, zero_point=8)
>>> xq.int_repr()
tensor([[ 9,  9,  9],
        [10,  8,  9]], dtype=torch.uint8)
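To use a quantized tensor in ordinary floating-point arithmetic it must first be dequantized; continuing the session above, dequantize() reconstructs scale * (q - zero_point):

>>> xq.dequantize()
tensor([[0.5000, 0.5000, 0.5000],
        [1.0000, 0.0000, 0.5000]])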
2.1 Two quantization methods
2.2 Post Training Static Quantization (static quantization after training): torch.quantize_per_tensor
The scale and zero_point must be specified by hand. A model quantized this way can neither be trained (no backpropagation) nor run inference directly; it must be dequantized before any computation.
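For reference, PyTorch's eager-mode static quantization workflow automates picking scale and zero_point by observing calibration data and then converting the model; a minimal sketch, where the toy model, the "fbgemm" backend, and the random calibration batch are illustrative assumptions:

import torch
import torch.nn as nn

class M(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()      # float -> quint8
        self.fc = nn.Linear(4, 4)
        self.relu = nn.ReLU()
        self.dequant = torch.quantization.DeQuantStub()  # quint8 -> float
    def forward(self, x):
        return self.dequant(self.relu(self.fc(self.quant(x))))

model = M().eval()
model.qconfig = torch.quantization.get_default_qconfig("fbgemm")
prepared = torch.quantization.prepare(model)        # insert observers
prepared(torch.randn(8, 4))                         # calibration pass records ranges
quantized = torch.quantization.convert(prepared)    # int8 weights + quantized ops
print(quantized(torch.randn(2, 4)))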
2.3 Post Training Dynamic Quantization (dynamic quantization after training): torch.quantization.quantize_dynamic
The framework automatically selects the most appropriate scale and zero_point, so nothing needs to be specified by hand. The quantized model can run inference, but cannot be trained (no backpropagation).
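A minimal usage sketch; the toy two-layer model is an illustrative assumption, and quantize_dynamic is the documented entry point:

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2)).eval()
# Linear weights are converted to int8 up front; activation scale and
# zero_point are chosen dynamically at runtime, so no calibration data
# is needed.
qmodel = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)
print(qmodel(torch.randn(1, 4)))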