Raki's paper-reading notes: Memory Replay with Data Compression for Continual Learning
2022-06-11 19:22:00 【Sleepy Raki】
Abstract & Introduction & Related Work
- Research task
  - Continual learning
- Existing methods and challenges
  - Existing work is mainly based on a small memory buffer containing a small number of original samples, which cannot fully characterize the old data distribution
  - Existing work often requires training additional parameters or distilling old features
- Key ideas
  - In this work, we propose memory replay with data compression to reduce the storage cost of old training samples, thereby increasing the number of samples that can be stored in the memory buffer
  - We propose a novel method based on determinantal point processes (DPPs) to efficiently determine an appropriate compression quality for the currently arriving training samples; in this way, a naive data compression algorithm with a properly selected quality can save more compressed data in the limited storage space, greatly improving recent strong baselines
- Experimental conclusion
  - Extensive experiments show that using a simple data compression algorithm with a properly selected quality, more compressed data can be saved in the limited storage space, thereby greatly improving memory replay
Unlike the "artificial" memory replay in DNNs, an important feature of biological memory is that it encodes and replays old experiences in a highly compressed form to overcome catastrophic forgetting.

Data compression
Data compression aims to improve the storage efficiency of files and includes lossless and lossy compression. Lossless compression must perfectly reconstruct the original data from the compressed data, which limits its compression ratio (Shannon, 1948). By comparison, lossy compression achieves much higher compression ratios by discarding part of the original information, so it is widely used in practice. Typical hand-crafted methods include JPEG (or JPG) (Wallace, 1992), the most commonly used lossy compression algorithm (Mentzer et al., 2020), WebP (Lian & Shilei, 2012), and JPEG2000 (Rabbani, 2002). On the other hand, neural compression methods usually rely on optimizing the Shannon rate-distortion trade-off, implemented with RNNs (Toderici et al., 2015; 2017), autoencoders (Agustsson et al., 2017), and GANs (Mentzer et al., 2020).
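To make the lossless/lossy distinction concrete, here is a minimal sketch of my own (not from the paper) that uses Pillow to encode the same image losslessly as PNG and lossily as JPEG at several qualities, then compares the resulting sizes; the noise image is just a stand-in for a real photo.

```python
from io import BytesIO

from PIL import Image  # pip install Pillow

def compressed_size(img: Image.Image, fmt: str, **kwargs) -> int:
    """Size in bytes of `img` encoded with the given format and options."""
    buf = BytesIO()
    img.save(buf, format=fmt, **kwargs)
    return buf.tell()

# Stand-in image (Gaussian noise); any RGB photo works the same way.
img = Image.effect_noise((256, 256), 64).convert("RGB")

print("PNG (lossless):", compressed_size(img, "PNG"))
for quality in (90, 50, 10):
    # Lower JPEG quality -> smaller file, at the cost of fidelity.
    print(f"JPEG q={quality:3d}:", compressed_size(img, "JPEG", quality=quality))
```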
METHOD
MEMORY REPLAY WITH DATA COMPRESSION
If the compression rate is too high, training suffers. Intuitively, this leads to a quality-quantity trade-off: given limited storage space, lowering the data compression quality $q$ increases the number of compressed samples $N_q^{mb}$ that can be stored in the memory buffer, and vice versa.
Here, we evaluate the proposed idea by compressing images with JPEG (Wallace, 1992), a simple but commonly used lossy compression algorithm. JPEG can save images at a quality in the range [1, 100], where lowering the quality yields smaller file sizes. Using a memory buffer equivalent to 20 original images per class (Hou et al., 2019), we run a grid search over JPEG quality with representative memory replay methods such as LUCIR (Hou et al., 2019) and PODNet (Douillard et al., 2020). As shown in Figure 2, memory replay of compressed data with a properly selected quality can greatly exceed the performance of replaying original data. However, a quality that is either too high or too low hurts performance. In particular, the quality that achieves the best performance varies across memory replay methods, but for a given method it is consistent across different numbers of incremental phases.
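The buffer arithmetic behind this setup is easy to sketch. The snippet below is my own illustration, with stand-in noise images and a hypothetical byte budget equal to 20 raw 224x224 RGB images; it estimates how many JPEG-compressed samples $N_q^{mb}$ fit in the buffer at each quality $q$.

```python
from io import BytesIO

from PIL import Image  # pip install Pillow

def jpeg_bytes(img: Image.Image, quality: int) -> int:
    """Size in bytes of `img` after JPEG compression at `quality`."""
    buf = BytesIO()
    img.convert("RGB").save(buf, format="JPEG", quality=quality)
    return buf.tell()

def buffer_capacity(images, quality: int, budget_bytes: int) -> int:
    """Estimate N_q^{mb}: how many compressed samples fit in `budget_bytes`."""
    avg_size = sum(jpeg_bytes(im, quality) for im in images) / len(images)
    return int(budget_bytes // avg_size)

# Hypothetical budget: raw storage of 20 uncompressed 224x224 RGB images.
budget = 20 * 224 * 224 * 3
# Stand-in images (Gaussian noise); real exemplars compress differently.
images = [Image.effect_noise((224, 224), 64).convert("RGB") for _ in range(8)]
for q in (90, 50, 10):
    print(f"q={q:3d}  N_q^mb ~= {buffer_capacity(images, q, budget)}")
```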
QUALITY-QUANTITY TRADE-OFF
The smaller $q$ is (i.e., the higher the compression), the larger the dataset that can be kept; the problem is therefore to select an appropriate $q$ for memory replay.
Figure: t-SNE visualization of features of the original subset (light dots) and its compressed subsets (dark dots) after learning the 5-phase ImageNet-sub with LUCIR. From left to right, the number of images per class grows from 37 and 85 to 200, while the JPEG quality drops from 90 and 50 to 10. Five classes from the latest task are plotted in different colors; the overlap between light and dark dots shows how well the compressed subset covers the original distribution.
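A figure like this can be produced along the following lines. The sketch below is my own, with randomly generated stand-in features rather than real LUCIR features; it embeds the original and compressed feature sets jointly with scikit-learn's t-SNE so their 2D coordinates are comparable.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Stand-ins for CNN features of original exemplars and their
# JPEG-compressed versions (compression perturbs the features).
feat_original = rng.normal(size=(200, 64))
feat_compressed = feat_original + rng.normal(scale=0.3, size=(200, 64))

# Embed both sets jointly so the 2D coordinates share one space.
emb = TSNE(n_components=2, random_state=0).fit_transform(
    np.vstack([feat_original, feat_compressed])
)
plt.scatter(*emb[:200].T, s=8, alpha=0.3, label="original (light)")
plt.scatter(*emb[200:].T, s=8, alpha=0.9, label="compressed (dark)")
plt.legend()
plt.show()
```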
The appropriate $q$ is chosen by optimizing the following maximum likelihood:
The construction of $D_q^{mb}$ can essentially be seen as a sampling problem, and we apply determinantal point processes (DPPs) to model the conditional likelihood. DPPs are not just an elegant probabilistic sampling model that describes the probability of each possible subset through a determinant; they also give this probability a geometric interpretation as the volume spanned by the elements of the subset.

Due to its high complexity, this objective is difficult to optimize directly; we therefore turn to optimizing $M_q^*$.
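For intuition about the determinant-as-volume view of DPPs, here is a minimal sketch of my own (not the paper's actual objective): given a positive semi-definite kernel $L$ built from feature similarities, the unnormalized probability of a subset is the determinant of the corresponding submatrix.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical unit-norm feature vectors for 8 candidate samples.
feats = rng.normal(size=(8, 16))
feats /= np.linalg.norm(feats, axis=1, keepdims=True)

# L-ensemble kernel from pairwise similarities (positive semi-definite).
L = feats @ feats.T

def dpp_score(subset) -> float:
    """Unnormalized DPP probability of `subset`: det(L_S).

    Geometrically this is the squared volume spanned by the subset's
    feature vectors, so diverse subsets receive higher probability.
    """
    idx = np.asarray(subset)
    return float(np.linalg.det(L[np.ix_(idx, idx)]))

print(dpp_score([0, 1, 2]))  # a diverse subset: volume > 0
print(dpp_score([0, 0, 0]))  # a degenerate subset: volume ~ 0
```

Normalizing these scores by $\det(L + I)$ would turn them into proper probabilities over all possible subsets.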


VALIDATE OUR METHOD WITH GRID SEARCH RESULTS

We now use the grid search results to validate the quality determined by our method: on ImageNet-sub, LUCIR and PODNet achieved their best performance at JPEG quality 50 and 75, respectively,
because these are the smallest qualities satisfying $|R_q - 1| < \epsilon$. Therefore, the quality determined by our method is consistent with the grid search results, while saving more than 100 times the computation cost. Interestingly, for each quality $q$, whether $|R_q - 1| < \epsilon$ holds is consistent between each individual incremental phase and the average over all incremental phases. We further discuss more dynamic behaviors of $R_q$ in Appendix D.4.
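The selection rule itself reduces to a few lines. In the sketch below the $R_q$ values are made up for illustration; in the paper they come from the DPP-based analysis and are computed without any retraining.

```python
# Hypothetical R_q values per candidate JPEG quality (made up for
# illustration; the paper derives R_q from the DPP-based analysis).
R = {10: 0.62, 25: 0.81, 50: 0.97, 75: 0.99, 90: 1.00}
epsilon = 0.05

# Pick the smallest quality with |R_q - 1| < epsilon: the strongest
# compression whose buffer still approximates the original distribution.
q_star = min(q for q, r in R.items() if abs(r - 1) < epsilon)
print(q_star)  # -> 50
```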
EXPERIMENT





CONCLUSION
In this work, we have proposed using data compression with a properly selected compression quality so that more compressed data can be saved in a limited storage space, thereby greatly improving the efficiency of memory replay. To determine the compression quality efficiently and avoid repeated training, we provide a novel method based on determinantal point processes (DPPs). Our method is validated on class-incremental learning and on semi-supervised continual learning for object detection. Our work not only provides an important yet under-explored baseline, but also opens up a promising new direction for continual learning. Further work could develop compression algorithms adaptive to incremental data to improve the compression rate, or propose new regularization methods to constrain the distribution changes caused by data compression. Meanwhile, the DPP-based theoretical analysis can serve as a general framework to incorporate the optimizable variables of memory replay, such as the strategy for selecting prototypes. In addition, our work also shows how to save a batch of training data in a limited storage space so as to best describe its distribution, which could promote broader applications in data compression and data selection.
Remark
Some improvement is achieved through data compression, though I couldn't follow the pile of complicated formulas. The paper shows that compressing data so that more images can be stored for replay yields better performance. Not bad; I would like this paper a lot more if it were simpler.