Quantization research notes
2022-07-03 11:53:00 【Diros1g】
1.Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding
1.1 Method
Prune the network: keep only the important connections;
Quantize the weights: let many connections share the same weight value through quantization;
Huffman coding: compress further with Huffman coding.
1.1.1 Pruning
1. Train the network normally to learn which connections matter;
2. Prune the low-weight connections: delete every connection whose weight is below a threshold;
3. Retrain the network so the remaining sparse connections recover the lost accuracy.
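The pruning step above can be sketched in a few lines. This is an illustrative magnitude-based threshold filter, not the paper's actual implementation; the weights and threshold are made-up example values, and in practice the surviving weights would then be retrained.

```python
# Magnitude pruning sketch: zero out every weight below a threshold.
def prune_weights(weights, threshold):
    """Keep weights with |w| >= threshold; set the rest to zero."""
    return [w if abs(w) >= threshold else 0.0 for w in weights]

weights = [0.8, -0.03, 0.5, 0.01, -0.6, 0.02]
pruned = prune_weights(weights, threshold=0.1)
print(pruned)                                    # -> [0.8, 0.0, 0.5, 0.0, -0.6, 0.0]
print(pruned.count(0.0) / len(pruned))           # sparsity -> 0.5
```

After pruning, only the nonzero entries (and their indices) need to be stored, which is where the compression comes from.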
1.1.2 Quantization and weight sharing
Quantization

Weight sharing
Weight sharing uses a very simple k-means: the weights of each layer are clustered, and all weights belonging to the same cluster share the same value (the cluster centroid). One thing to watch out for: weights are not shared across layers.
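The per-layer k-means clustering can be sketched as follows. This is a tiny 1-D Lloyd's-algorithm illustration with made-up weights and initial centroids, not the paper's code; it shows how every weight in a cluster gets replaced by the cluster centroid.

```python
# 1-D k-means for weight sharing: each weight is replaced by its cluster centroid.
def kmeans_1d(values, centroids, iters=20):
    """Lloyd's algorithm on scalars; returns (centroids, assignments)."""
    for _ in range(iters):
        # assign each value to the nearest centroid
        assign = [min(range(len(centroids)), key=lambda j: abs(v - centroids[j]))
                  for v in values]
        # move each centroid to the mean of its assigned values
        for j in range(len(centroids)):
            members = [v for v, a in zip(values, assign) if a == j]
            if members:
                centroids[j] = sum(members) / len(members)
    return centroids, assign

layer_weights = [0.9, 1.1, -0.5, -0.4, 0.95, -0.45]
centroids, assign = kmeans_1d(layer_weights, centroids=[1.0, -1.0])
shared = [centroids[a] for a in assign]   # every weight replaced by its centroid
print(centroids)
print(shared)
```

Storage then only needs the small codebook of centroids plus a short cluster index per weight, instead of a full 32-bit float per weight.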
1.1.3 Huffman coding
Huffman coding first builds a tree from the character frequencies, then derives a specific code for each character from the tree structure: frequent characters get shorter codes, rare characters get longer ones. This reduces the average length of the encoded string and thus achieves lossless compression.
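A minimal Huffman-coding sketch of the idea just described (a generic textbook illustration, not the paper's implementation): repeatedly merge the two least-frequent subtrees, prepending one bit to every code inside them.

```python
# Build Huffman codes from symbol frequencies.
import heapq
from collections import Counter

def huffman_codes(text):
    freq = Counter(text)
    # heap entry: (frequency, tiebreak, {symbol: code-so-far})
    heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        # merging prepends '0' to one subtree's codes and '1' to the other's
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (f1 + f2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

codes = huffman_codes("aaaabbc")
print(codes)   # 'a' is most frequent, so its code is the shortest
encoded = "".join(codes[ch] for ch in "aaaabbc")
print(len(encoded), "bits vs", 8 * len("aaaabbc"), "bits uncoded")
```

In Deep Compression this step is applied to the cluster indices and sparse-matrix metadata, whose value distributions are highly non-uniform and therefore compress well.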
1.2 Results
Pruning: reduces the number of connections to 1/13–1/9 of the original;
Quantization: reduces each connection from 32 bits to 5 bits;
Overall:
- AlexNet is compressed 35×, from 240 MB down to 6.9 MB;
- VGG-16 is compressed 49×, from 552 MB down to 11.3 MB;
- Inference runs 3–4× faster, with 3–7× better energy efficiency.
1.3 Experimental requirements
1. Training must save the complete model, not just model.state_dict(); however, most of our current weight files store only parameter states rather than the full model.
2. The complete network structure is required.
3. Enough training data must be available.
References:
1. 【Deep neural network compression】Deep Compression
2. Deep compression: compressing deep neural networks with pruning, trained quantization and Huffman coding
3. Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding
4. PyTorch official documentation
5. Understanding Huffman Coding
2. PyTorch's official quantization
https://pytorch.apachecn.org/#/
https://pytorch.org/docs/stable/quantization.html
2.0 Quantized data type: Quantized Tensor
A quantized tensor stores data of type int8/uint8/int32 and carries the quantization parameters scale and zero_point with it.
>>> x = torch.rand(2, 3, dtype=torch.float32)
>>> x
tensor([[0.6839, 0.4741, 0.7451],
        [0.9301, 0.1742, 0.6835]])
>>> xq = torch.quantize_per_tensor(x, scale=0.5, zero_point=8, dtype=torch.quint8)
>>> xq
tensor([[0.5000, 0.5000, 0.5000],
        [1.0000, 0.0000, 0.5000]], size=(2, 3), dtype=torch.quint8,
       quantization_scheme=torch.per_tensor_affine, scale=0.5, zero_point=8)
>>> xq.int_repr()
tensor([[ 9,  9,  9],
        [10,  8,  9]], dtype=torch.uint8)
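The transcript above can be reproduced in plain Python to make the affine mapping explicit. This is a sketch of the assumed formula q = round(x / scale) + zero_point (clamped to the uint8 range), with dequantization x̂ = (q − zero_point) · scale; the inputs are the values from the transcript.

```python
# Affine (scale/zero_point) quantization to uint8, and back.
def quantize(x, scale, zero_point, qmin=0, qmax=255):
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))          # clamp to the uint8 range

def dequantize(q, scale, zero_point):
    return (q - zero_point) * scale

scale, zp = 0.5, 8
for x in [0.6839, 0.9301, 0.1742]:
    q = quantize(x, scale, zp)
    print(x, "->", q, "->", dequantize(q, scale, zp))
# 0.6839 -> 9 -> 0.5 ; 0.9301 -> 10 -> 1.0 ; 0.1742 -> 8 -> 0.0
```

This shows why the dequantized tensor printed above only takes values 0.0, 0.5, 1.0: with scale 0.5, every representable value is a multiple of 0.5.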
2.1 Two quantization methods
2.2 Post Training Static Quantization (torch.quantize_per_tensor)
The scale and zero_point must be specified by hand. The quantized result can neither be trained (no back-propagation) nor used for computation directly; it must be dequantized first.
2.3 Post Training Dynamic Quantization (torch.quantization.quantize_dynamic)
The system automatically selects the most appropriate scale and zero_point, so nothing needs to be specified by hand. The quantized model can run inference, but cannot be trained (no back-propagation).
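The "automatic" part of dynamic quantization can be illustrated by deriving scale and zero_point from the observed value range. This is a generic min/max scheme for uint8 with made-up activation values; PyTorch's internal observers are more sophisticated, so treat this as a conceptual sketch only.

```python
# Pick scale/zero_point automatically from the observed min/max range.
def choose_qparams(values, qmin=0, qmax=255):
    lo = min(values + [0.0])                # the range must include 0
    hi = max(values + [0.0])
    scale = (hi - lo) / (qmax - qmin)
    zero_point = round(qmin - lo / scale)   # integer that maps back to 0.0
    return scale, zero_point

activations = [-1.0, 0.2, 0.8, 3.0]
scale, zp = choose_qparams(activations)
quantized = [round(v / scale) + zp for v in activations]
print(scale, zp)
print(quantized)   # the min maps near qmin, the max near qmax
```

Because the parameters are recomputed from each tensor's actual range, no manual calibration is needed, which is exactly the convenience the text attributes to quantize_dynamic.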