Quantitative calculation research
2022-07-03 11:53:00 【Diros1g】
1.Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding
1.1 Methods
- Pruning: keep only the important connections;
- Trained quantization: share weights among connections through weight quantization;
- Huffman coding: compress the result further with Huffman coding;
1.1.1 Pruning
1. Learn the connectivity through normal network training;
2. Prune the small-weight connections: delete every connection whose weight is below a threshold;
3. Retrain the network so that the remaining sparse connections learn the final parameters;
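The three steps above can be sketched in a few lines. This is a minimal, framework-free illustration of magnitude pruning; the weight values and the threshold are made-up example numbers, not from the paper:

```python
def prune(weights, threshold):
    """Magnitude pruning: zero out every connection whose
    absolute weight falls below the threshold."""
    return [w if abs(w) >= threshold else 0.0 for w in weights]

# toy "layer" of weights; small-magnitude connections get deleted
weights = [0.8, -0.03, 0.5, 0.01, -0.6, 0.02]
pruned = prune(weights, threshold=0.1)
removed = pruned.count(0.0)  # number of connections deleted
```

In the real method, the surviving weights are then retrained while the zeroed positions stay fixed at zero, which is what recovers the accuracy lost by pruning.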
1.1.2 Quantization and weight sharing
Weight sharing uses a very simple k-means: the weights of each layer are clustered, and all weights belonging to the same cluster share a single weight value. One thing to watch out for: weights are never shared across layers.
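The per-layer clustering can be illustrated with a tiny 1-D k-means. This is a simplified pure-Python sketch (linear centroid initialization, toy weight values), not the paper's actual implementation:

```python
def kmeans_share(weights, k, iters=20):
    """Cluster a layer's weights into k centroids (k >= 2); every
    weight in a cluster is then replaced by its shared centroid."""
    lo, hi = min(weights), max(weights)
    # linearly initialized centroids over the weight range
    centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    for _ in range(iters):
        # assign each weight to its nearest centroid
        clusters = [[] for _ in range(k)]
        for w in weights:
            j = min(range(k), key=lambda c: abs(w - centroids[c]))
            clusters[j].append(w)
        # move each centroid to the mean of its cluster
        for j, cl in enumerate(clusters):
            if cl:
                centroids[j] = sum(cl) / len(cl)
    # quantized layer: every weight becomes its shared centroid
    return [min(centroids, key=lambda c: abs(w - c)) for w in weights]

layer = [0.11, 0.09, -0.48, -0.52, 0.92, 0.88]
shared = kmeans_share(layer, k=3)  # only 3 distinct values remain
```

After sharing, the layer only needs to store k float centroids plus a small integer index per weight, which is where the compression comes from.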
1.1.3 Huffman coding
Huffman coding first builds a tree from the character frequencies, then derives a code for each character from the tree structure: frequent characters get short codes and rare characters get longer ones. This lowers the average length of the encoded string and achieves lossless compression.
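The tree construction described above can be sketched with a frequency-ordered heap; this is a small illustrative implementation, not the paper's code:

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Build a Huffman code table: frequent characters receive
    shorter codes, rare characters longer ones."""
    freq = Counter(text)
    # heap entries: (frequency, tie-breaker, {char: code-so-far})
    heap = [(f, i, {ch: ""}) for i, (ch, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    if len(heap) == 1:  # degenerate single-symbol input
        return {ch: "0" for ch in heap[0][2]}
    tick = len(heap)
    while len(heap) > 1:
        # merge the two least frequent subtrees
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        # prepend a bit: 0 for the lighter subtree, 1 for the heavier
        merged = {ch: "0" + code for ch, code in c1.items()}
        merged.update({ch: "1" + code for ch, code in c2.items()})
        heapq.heappush(heap, (f1 + f2, tick, merged))
        tick += 1
    return heap[0][2]

codes = huffman_codes("aaaabbc")
encoded = "".join(codes[ch] for ch in "aaaabbc")
```

In "aaaabbc" the frequent character `a` gets a 1-bit code while `b` and `c` get 2-bit codes, so the 7 characters encode into 10 bits instead of a fixed-width 14.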
1.2 Results
Pruning: reduces the number of connections to 1/13~1/9 of the original;
Quantization: reduces each connection from 32 bits to 5 bits;
Final effect:
- AlexNet is compressed 35x, from 240MB down to 6.9MB;
- VGG-16 is compressed 49x, from 552MB down to 11.3MB;
- Inference is 3~4x faster and 3~7x more energy-efficient.
1.3 Experimental requirements
1. Training must save the complete model, not just model.state_dict(); however, most weight files available today store only the parameter state rather than the full model.
2. The complete network structure is required.
3. Sufficient training data is required.
2. PyTorch official quantization
https://pytorch.apachecn.org/#/
https://pytorch.org/docs/stable/quantization.html
2.0 Quantized Tensor data type
A quantized tensor can store int8/uint8/int32 data and carries the quantization parameters scale and zero_point with it.
>>> x = torch.rand(2,3, dtype=torch.float32)
>>> x
tensor([[0.6839, 0.4741, 0.7451],
        [0.9301, 0.1742, 0.6835]])
>>> xq = torch.quantize_per_tensor(x, scale = 0.5, zero_point = 8, dtype=torch.quint8)
>>> xq
tensor([[0.5000, 0.5000, 0.5000],
        [1.0000, 0.0000, 0.5000]], size=(2, 3), dtype=torch.quint8,
       quantization_scheme=torch.per_tensor_affine, scale=0.5, zero_point=8)
>>> xq.int_repr()
tensor([[ 9,  9,  9],
        [10,  8,  9]], dtype=torch.uint8)
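The affine quantization behind this example is simple arithmetic: q = round(x / scale) + zero_point, and the dequantized value is (q - zero_point) * scale. A minimal pure-Python sketch (no PyTorch required) reproducing the scale=0.5, zero_point=8 example above:

```python
def quantize(x, scale, zero_point):
    # real value -> integer code (affine / asymmetric quantization)
    return round(x / scale) + zero_point

def dequantize(q, scale, zero_point):
    # integer code -> approximate real value
    return (q - zero_point) * scale

xs = [0.6839, 0.4741, 0.7451, 0.9301, 0.1742, 0.6835]
qs = [quantize(x, 0.5, 8) for x in xs]     # the int_repr() values
deq = [dequantize(q, 0.5, 8) for q in qs]  # values the quantized tensor prints
```

With such a coarse scale of 0.5, every real value snaps to the nearest multiple of 0.5, which is exactly why the printed quantized tensor only contains 0.0, 0.5 and 1.0.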
2.1 Two quantization methods
2.2 Post Training Static Quantization, static quantization after training: torch.quantize_per_tensor
scale and zero_point must be specified manually. A tensor quantized this way cannot be trained (no back-propagation), and it cannot take part in ordinary float computation directly; it must be dequantized before the computation.
2.3 Post Training Dynamic Quantization, dynamic quantization after training: torch.quantization.quantize_dynamic
The framework automatically picks the most suitable scale and zero_point; no manual specification is needed. The quantized model can run inference, but it cannot be trained (no back-propagation).
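The "automatic" choice of scale and zero_point can be illustrated with the standard min/max scheme. This is a simplified pure-Python sketch of the idea, not PyTorch's actual observer logic:

```python
def choose_qparams(xs, qmin=0, qmax=255):
    """Pick scale/zero_point from the observed value range,
    the way a min/max observer would for uint8."""
    lo, hi = min(min(xs), 0.0), max(max(xs), 0.0)  # range must include 0
    scale = (hi - lo) / (qmax - qmin)
    zero_point = round(qmin - lo / scale)
    return scale, max(qmin, min(qmax, zero_point))

xs = [-1.0, -0.25, 0.0, 0.5, 1.5]
scale, zp = choose_qparams(xs)
# quantize then dequantize: round-trip error is bounded by scale/2
q = [max(0, min(255, round(x / scale) + zp)) for x in xs]
back = [(v - zp) * scale for v in q]
```

Because the parameters are derived from each tensor's observed range at run time, dynamic quantization needs no calibration from the user, at the cost of computing min/max on the fly.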