Deep Learning (Incremental Learning) - ICCV2022: Contrastive Continual Learning
2022-07-28 06:09:00 【Food to doubt life】
Preface
The incremental-learning papers at CVPR2022 are somewhat complicated: many of them propose new experimental settings for evaluating models, which I personally do not care for, so I have not summarized many of them. One interesting exception combines causal inference with incremental learning: 《Distilling Causal Effect of Data in Class-Incremental Learning》. The incremental-learning papers at ICCV2022 are more conventional, and much of the work combines self-supervision with incremental learning. This post summarizes 《Contrastive Continual Learning》.
Contrastive Continual Learning
Generally speaking, supervised algorithms gradually filter out features that are irrelevant to the task. For example, in a binary classification task of pears versus apples, the feature extractor only needs to encode color-related features for the classifier to separate the two, which shows that supervised features are relatively limited. If we could instead train a feature extractor that encodes redundant features, then even if the extractor later forgets some features, the redundant ones might still help represent new classes. Based on this observation, the authors propose using self-supervision to train a feature extractor with redundant features, while using knowledge distillation to resist catastrophic forgetting.
Suppose the current batch contains $N$ images from the current task. Two data augmentations are applied to each image, yielding $2N$ images, and these $2N$ images form a set $S$. Let $p_i$ be the positive set of image $i$, which contains the augmented views of image $i$ as well as images of the same class as image $i$, and let $z_i$ be the feature extractor's output for image $i$. The loss function of Contrastive Continual Learning is then
$$L_{asym}^{sup}=\sum_{i \in S}\frac{-1}{|p_i|}\sum_{j\in p_i}\log \frac{\exp(z_i\cdot z_j/T)}{\sum_{k \neq i }\exp(z_i\cdot z_k/T)}\tag{1.0}$$
This is essentially a variant of InfoNCE, where $T$ is a temperature hyperparameter. Notably, old (buffered) samples are used only as negatives in the contrastive loss, i.e. they appear only in the denominator $\sum_{k \neq i}\exp(z_i\cdot z_k/T)$. As shown in the figure below (the IRD knowledge-distillation loss is described later), the authors found this asymmetric design works better.
Let $M$ be the set of images in the batch; the $L^{sup}$ in the figure above is
$$L^{sup}=\sum_{i \in M}\frac{-1}{|p_i|}\sum_{j\in p_i}\log \frac{\exp(z_i\cdot z_j/T)}{\sum_{k \neq i }\exp(z_i\cdot z_k/T)}$$
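To make the asymmetric design concrete, below is a minimal PyTorch-style sketch of the loss in Eq. (1.0). The function name, the tensor layout, and the `is_current` mask marking current-task views are my own assumptions for illustration, not the authors' reference implementation.

```python
import torch
import torch.nn.functional as F

def asym_supcon_loss(z, labels, is_current, T=0.5):
    """Sketch of the asymmetric supervised contrastive loss (Eq. 1.0).

    z:          (n, d) features of all augmented views in the batch
                (current-task views plus buffered old samples).
    labels:     (n,) class labels.
    is_current: (n,) bool, True for current-task views; only these act as
                anchors/positives, old samples serve purely as negatives.
    """
    z = F.normalize(z, dim=1)
    sim = torch.matmul(z, z.T) / T                       # pairwise similarities
    self_mask = torch.eye(z.size(0), dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float('-inf'))      # exclude k == i from the denominator
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)

    # positives: same label, not self, and drawn from the current task
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    pos_mask &= is_current.unsqueeze(0)

    loss = 0.0
    anchors = torch.where(is_current)[0]                 # anchors i in S (current task only)
    for i in anchors:
        pos = torch.where(pos_mask[i])[0]
        if len(pos) > 0:
            loss = loss - log_prob[i, pos].mean()        # -1/|p_i| * sum over positives
    return loss
```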
To resist catastrophic forgetting, the authors introduce knowledge distillation. For image $i$, one can compute a vector
$$P_i=[P_{i,1},P_{i,2},\dots,P_{i,i-1},P_{i,i+1},\dots,P_{i,2N}]\tag{2.0}$$
where
$$P_{i,j}=\frac{\exp(z_i\cdot z_j/T)}{\sum_{k \neq i }^{2N}\exp(z_i\cdot z_k/T)}$$
For image $i$, let $P_o$ denote the vector of Eq. (2.0) computed by the old model and $P_n$ the one computed by the new model. The knowledge-distillation loss is then
$$L_{IRD}=-\sum_{i=1}^{2N}P_{o}\log P_{n}\tag{3.0}$$
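Here is a minimal sketch of Eq. (2.0) and Eq. (3.0) in the same PyTorch style, assuming $P_{i,j}$ is the softmax probability defined above; the helper names and temperature default are illustrative, not the paper's code.

```python
import torch
import torch.nn.functional as F

def instance_sim_dist(z, T=0.5):
    """Instance-wise similarity distribution P_i (Eq. 2.0), sketch.

    Returns a (2N, 2N-1) matrix whose row i is the softmax over the
    similarities of z_i to every other sample (k != i)."""
    z = F.normalize(z, dim=1)
    sim = torch.matmul(z, z.T) / T
    n = z.size(0)
    off_diag = ~torch.eye(n, dtype=torch.bool, device=z.device)
    sim = sim[off_diag].view(n, n - 1)             # drop the k == i entries
    return F.softmax(sim, dim=1)

def ird_loss(z_old, z_new, T=0.5):
    """IRD loss (Eq. 3.0): cross-entropy between the old model's and the
    new model's instance-wise distributions."""
    p_old = instance_sim_dist(z_old, T).detach()   # the old model is frozen
    p_new = instance_sim_dist(z_new, T)
    return -(p_old * torch.log(p_new)).sum(dim=1).sum()
```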
The total loss function is
$$L=L_{asym}^{sup}+\lambda L_{IRD}$$
$\lambda$ is a hyperparameter. The process above trains only the feature extractor. Once the feature extractor has been trained, the authors freeze it, add a classifier on top, and train the classifier with the retained old data together with the current data; the total amount of old data kept in the buffer is fixed.
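Putting the pieces together, here is a hedged sketch of the representation-learning stage for one task, using the two losses sketched above plus hypothetical `augment` and `buffer` helpers (these names, the optimizer, and the schedule are my assumptions, not the paper's implementation).

```python
import copy
import torch

def train_task(encoder, old_encoder, buffer, task_loader, lam=1.0, T=0.5, epochs=100):
    """One task of contrastive continual learning: train the feature
    extractor with L_asym^sup + lambda * L_IRD, then snapshot it."""
    opt = torch.optim.SGD(encoder.parameters(), lr=0.1)
    for _ in range(epochs):
        for x, y in task_loader:
            xb, yb = buffer.sample()                               # old samples: negatives only
            views = torch.cat([augment(x), augment(x), xb])        # two views per current image
            labels = torch.cat([y, y, yb])
            is_current = torch.cat([torch.ones(2 * len(y)),
                                    torch.zeros(len(yb))]).bool()

            z = encoder(views)
            loss = asym_supcon_loss(z, labels, is_current, T)
            if old_encoder is not None:                            # distill from the previous model
                with torch.no_grad():
                    z_old = old_encoder(views)
                loss = loss + lam * ird_loss(z_old[is_current], z[is_current], T)

            opt.zero_grad(); loss.backward(); opt.step()

    buffer.update(task_loader)                                     # buffer size stays fixed
    return copy.deepcopy(encoder).eval()                          # old model for the next task
```

After all tasks (or after each task, for evaluation), a classifier is trained on top of the frozen encoder using the buffered old data plus the current data, as described above.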
Experiments
To verify that self-supervision encodes more features than supervised training, the authors ran an experiment: after continual learning, the feature extractor is frozen and a classifier is trained on all training data. The comparison with a supervised algorithm on CIFAR10 is shown in the figure below.
Look at the two plots on the right: they show the accuracy on CIFAR10 obtained by freezing the feature extractor after learning task $i$ and training a classifier on all CIFAR10 data. Self-supervised learning can indeed encode some redundant features, and this redundancy helps represent the classes of subsequent tasks.
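For reference, a minimal sketch of this linear-probe evaluation (freeze the continually-learned encoder, fit only a linear head on the full training set); the encoder, loader, and dimensions are assumptions for illustration.

```python
import torch
import torch.nn as nn

def linear_probe(encoder, train_loader, num_classes=10, feat_dim=512, epochs=10):
    """Freeze the encoder and train only a linear classifier on all data."""
    for p in encoder.parameters():
        p.requires_grad = False
    encoder.eval()
    head = nn.Linear(feat_dim, num_classes)
    opt = torch.optim.SGD(head.parameters(), lr=0.1)
    ce = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in train_loader:
            with torch.no_grad():
                feats = encoder(x)                 # frozen features
            loss = ce(head(feats), y)
            opt.zero_grad(); loss.backward(); opt.step()
    return head
```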
The experimental comparison with other methods is shown in the figure below.
Reflections
Incremental learning focuses on how to retain previously acquired knowledge through some strategy. This paper does not propose a new strategy for that; instead, it asks how to make the learned features more conducive to encoding new knowledge (a problem that is hard to solve reliably and partly comes down to luck, e.g., when the new task has a severe domain shift). In other words, the paper only offers a strategy for constructing a more robust feature space; it does not seem to address the root problem of incremental learning, which feels a bit like drifting off-topic in an essay. Still, it is worth publishing: the insight it gives is that a self-supervised model can encode some redundant features, and such redundant features may help encode new tasks.
Self-supervision generally introduces extra data augmentations. If supervised training used the same augmentations, could the model also encode redundant features?