Deep Learning (Incremental Learning) - ICCV2022: Contrastive Continual Learning
2022-07-28 06:09:00 【Food to doubt life】
Preface
The incremental-learning papers at CVPR2022 are somewhat complicated: many of them propose new experimental settings for evaluating models, which I personally do not care for, so I have not summarized many of them. One interesting exception combines causal inference with incremental learning: 《Distilling Causal Effect of Data in Class-Incremental Learning》. The incremental-learning papers at ICCV2022 are more conventional, and much of the work combines self-supervision with incremental learning. This post summarizes 《Contrastive Continual Learning》.
Contrastive Continual Learning
Generally speaking, supervised algorithms gradually filter out features that are irrelevant to the task. For example, in a binary classification task of pears versus apples, the feature extractor only needs to encode color-related features for the classifier to separate the two, which shows that supervised features are relatively limited. If we could instead train a feature extractor that encodes redundant features, then even if the extractor later forgets some features, the redundant ones might still help represent new classes. Based on this observation, the authors propose using self-supervision to train a feature extractor with redundant features, while using knowledge distillation to resist catastrophic forgetting.
Suppose the current batch contains $N$ images from the current task. Two data augmentations are applied to each image, yielding $2N$ images, and these $2N$ images form a set $S$. Let $p_i$ be the positive set of image $i$, which contains the augmented views of image $i$ as well as images of the same class as image $i$, and let $z_i$ be the feature extractor's output for image $i$. The loss function of Contrastive Continual Learning is then
$$L_{asym}^{sup}=\sum_{i \in S}\frac{-1}{|p_i|}\sum_{j\in p_i}\log \frac{\exp(z_i\cdot z_j/T)}{\sum_{k \neq i }\exp(z_i\cdot z_k/T)}\tag{1.0}$$
This is essentially a variant of InfoNCE, where $T$ is a temperature hyperparameter. Notably, old (buffered) samples are used only as negatives in the contrastive loss, i.e. they appear only in the denominator $\sum_{k \neq i}\exp(z_i\cdot z_k/T)$. As shown in the figure below (the IRD knowledge-distillation loss is described later), the authors found this asymmetric design works better.
Let $M$ be the set of images in the batch; the $L^{sup}$ in the figure above is
$$L^{sup}=\sum_{i \in M}\frac{-1}{|p_i|}\sum_{j\in p_i}\log \frac{\exp(z_i\cdot z_j/T)}{\sum_{k \neq i }\exp(z_i\cdot z_k/T)}$$
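To make the asymmetric design concrete, below is a minimal PyTorch-style sketch of the loss in Eq. (1.0). The function name, the tensor layout, and the `is_current` mask marking current-task views are my own assumptions for illustration, not the authors' reference implementation.

```python
import torch
import torch.nn.functional as F

def asym_supcon_loss(z, labels, is_current, T=0.5):
    """Sketch of the asymmetric supervised contrastive loss (Eq. 1.0).

    z:          (n, d) features of all augmented views in the batch
                (current-task views plus buffered old samples).
    labels:     (n,) class labels.
    is_current: (n,) bool, True for current-task views; only these act as
                anchors/positives, old samples serve purely as negatives.
    """
    z = F.normalize(z, dim=1)
    sim = torch.matmul(z, z.T) / T                       # pairwise similarities
    self_mask = torch.eye(z.size(0), dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float('-inf'))      # exclude k == i from the denominator
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)

    # positives: same label, not self, and drawn from the current task
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    pos_mask &= is_current.unsqueeze(0)

    loss = 0.0
    anchors = torch.where(is_current)[0]                 # anchors i in S (current task only)
    for i in anchors:
        pos = torch.where(pos_mask[i])[0]
        if len(pos) > 0:
            loss = loss - log_prob[i, pos].mean()        # -1/|p_i| * sum over positives
    return loss
```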
To resist catastrophic forgetting, the authors introduce knowledge distillation. For image $i$, one can compute a vector
$$P_i=[P_{i,1},P_{i,2},\dots,P_{i,i-1},P_{i,i+1},\dots,P_{i,2N}]\tag{2.0}$$
where
$$P_{i,j}=\frac{\exp(z_i\cdot z_j/T)}{\sum_{k \neq i }^{2N}\exp(z_i\cdot z_k/T)}$$
For image $i$, let $P_o$ denote the vector of Eq. (2.0) computed by the old model and $P_n$ the one computed by the new model. The knowledge-distillation loss is then
$$L_{IRD}=-\sum_{i=1}^{2N}P_{o}\log P_{n}\tag{3.0}$$
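Here is a minimal sketch of Eq. (2.0) and Eq. (3.0) in the same PyTorch style, assuming $P_{i,j}$ is the softmax probability defined above; the helper names and temperature default are illustrative, not the paper's code.

```python
import torch
import torch.nn.functional as F

def instance_sim_dist(z, T=0.5):
    """Instance-wise similarity distribution P_i (Eq. 2.0), sketch.

    Returns a (2N, 2N-1) matrix whose row i is the softmax over the
    similarities of z_i to every other sample (k != i)."""
    z = F.normalize(z, dim=1)
    sim = torch.matmul(z, z.T) / T
    n = z.size(0)
    off_diag = ~torch.eye(n, dtype=torch.bool, device=z.device)
    sim = sim[off_diag].view(n, n - 1)             # drop the k == i entries
    return F.softmax(sim, dim=1)

def ird_loss(z_old, z_new, T=0.5):
    """IRD loss (Eq. 3.0): cross-entropy between the old model's and the
    new model's instance-wise distributions."""
    p_old = instance_sim_dist(z_old, T).detach()   # the old model is frozen
    p_new = instance_sim_dist(z_new, T)
    return -(p_old * torch.log(p_new)).sum(dim=1).sum()
```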
The total loss function is
$$L=L_{asym}^{sup}+\lambda L_{IRD}$$
$\lambda$ is a hyperparameter. The process above trains only the feature extractor. Once the feature extractor has been trained, the authors freeze it, add a classifier on top, and train the classifier with the retained old data together with the current data; the total amount of old data kept in the buffer is fixed.
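Putting the pieces together, here is a hedged sketch of the representation-learning stage for one task, using the two losses sketched above plus hypothetical `augment` and `buffer` helpers (these names, the optimizer, and the schedule are my assumptions, not the paper's implementation).

```python
import copy
import torch

def train_task(encoder, old_encoder, buffer, task_loader, lam=1.0, T=0.5, epochs=100):
    """One task of contrastive continual learning: train the feature
    extractor with L_asym^sup + lambda * L_IRD, then snapshot it."""
    opt = torch.optim.SGD(encoder.parameters(), lr=0.1)
    for _ in range(epochs):
        for x, y in task_loader:
            xb, yb = buffer.sample()                               # old samples: negatives only
            views = torch.cat([augment(x), augment(x), xb])        # two views per current image
            labels = torch.cat([y, y, yb])
            is_current = torch.cat([torch.ones(2 * len(y)),
                                    torch.zeros(len(yb))]).bool()

            z = encoder(views)
            loss = asym_supcon_loss(z, labels, is_current, T)
            if old_encoder is not None:                            # distill from the previous model
                with torch.no_grad():
                    z_old = old_encoder(views)
                loss = loss + lam * ird_loss(z_old[is_current], z[is_current], T)

            opt.zero_grad(); loss.backward(); opt.step()

    buffer.update(task_loader)                                     # buffer size stays fixed
    return copy.deepcopy(encoder).eval()                          # old model for the next task
```

After all tasks (or after each task, for evaluation), a classifier is trained on top of the frozen encoder using the buffered old data plus the current data, as described above.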
Experiments
To verify that self-supervision encodes more features than supervised training, the authors ran an experiment: after continual learning, the feature extractor is frozen and a classifier is trained on all training data. The comparison with a supervised algorithm on CIFAR10 is shown in the figure below.
Look at the two plots on the right: they show the accuracy on CIFAR10 obtained by freezing the feature extractor after learning task $i$ and training a classifier on all CIFAR10 data. Self-supervised learning can indeed encode some redundant features, and this redundancy helps represent the classes of subsequent tasks.
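For reference, a minimal sketch of this linear-probe evaluation (freeze the continually-learned encoder, fit only a linear head on the full training set); the encoder, loader, and dimensions are assumptions for illustration.

```python
import torch
import torch.nn as nn

def linear_probe(encoder, train_loader, num_classes=10, feat_dim=512, epochs=10):
    """Freeze the encoder and train only a linear classifier on all data."""
    for p in encoder.parameters():
        p.requires_grad = False
    encoder.eval()
    head = nn.Linear(feat_dim, num_classes)
    opt = torch.optim.SGD(head.parameters(), lr=0.1)
    ce = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in train_loader:
            with torch.no_grad():
                feats = encoder(x)                 # frozen features
            loss = ce(head(feats), y)
            opt.zero_grad(); loss.backward(); opt.step()
    return head
```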
The experimental comparison with other methods is shown in the figure below.
Reflections
Incremental learning focuses on how to retain previously acquired knowledge through some strategy. This paper does not propose a new strategy for that; instead, it asks how to make the learned features more conducive to encoding new knowledge (a problem that is hard to solve reliably and partly comes down to luck, e.g., when the new task has a severe domain shift). In other words, the paper only offers a strategy for constructing a more robust feature space; it does not seem to address the root problem of incremental learning, which feels a bit like drifting off-topic in an essay. Still, it is worth publishing: the insight it gives is that a self-supervised model can encode some redundant features, and such redundant features may help encode new tasks.
Self-supervision generally introduces extra data augmentations. If supervised training used the same augmentations, could the model also encode redundant features?