Deep Learning (Incremental Learning) -- ICCV 2021: SS-IL: Separated Softmax for Incremental Learning
Preface
This paper tackles catastrophic forgetting in continual learning from the perspective of class imbalance.
When only a small amount of old data is retained, the old and new data are imbalanced, so during training the model pays too much attention to the new data and neglects the old data, which leads to catastrophic forgetting.
This post briefly introduces the method proposed in the paper, describes some interesting experiments, and closes with my own views on the work.
Method
Motivation
The author first introduces two kinds of knowledge distillation losses, GKD and TKD; their differences are shown in the figure below.
Suppose the old model covers 50 classes and the new model covers 60, with each task containing 10 classes. GKD computes its distillation loss as follows:
- Apply softmax to the first 50 logits output by the new model.
- Apply softmax to the 50 logits output by the old model.
- Compute the cross entropy between the results of steps one and two, with the result of step two serving as the target distribution.
TKD computes its distillation loss as follows (a code sketch of both losses is given after this list):
- Split the first 50 logits output by the new model by task, applying softmax to each group of 10 logits, giving 5 groups.
- Split the 50 logits output by the old model the same way, applying softmax to each group of 10 logits, giving 5 groups.
- Compute the cross entropy group by group between the results of steps one and two, with the result of step two serving as the target distribution.
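To make the two losses concrete, here is a minimal PyTorch sketch under the setup above (my own illustration, not the authors' released code); the tensor names and the temperature parameter `T` are assumptions.

```python
import torch
import torch.nn.functional as F

def gkd_loss(new_logits, old_logits, n_old=50, T=2.0):
    """GKD: a single softmax over all old-class logits.

    new_logits: (B, 60) logits from the new model
    old_logits: (B, 50) logits from the old model
    """
    log_p_new = F.log_softmax(new_logits[:, :n_old] / T, dim=1)
    p_old = F.softmax(old_logits / T, dim=1)          # target distribution
    return -(p_old * log_p_new).sum(dim=1).mean()     # cross entropy

def tkd_loss(new_logits, old_logits, task_size=10, n_old=50, T=2.0):
    """TKD: a separate softmax per task, over each group of 10 logits."""
    loss = 0.0
    for start in range(0, n_old, task_size):
        sl = slice(start, start + task_size)
        log_p_new = F.log_softmax(new_logits[:, sl] / T, dim=1)
        p_old = F.softmax(old_logits[:, sl] / T, dim=1)
        loss = loss - (p_old * log_p_new).sum(dim=1).mean()
    return loss
```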
The author points out that GKD aggravates the model's classification bias, as shown in the figure below.
Look at the figure on the right. For example, Current Task = 3 shows the predictions of the model from Task 2 on the training data of Task 3: green indicates the proportion of samples classified into Task 2, and light pink the proportion classified into Task 1. Clearly the model tends to assign samples to the most recently learned task. If GKD is used for knowledge distillation, the model will therefore focus only on the most recently learned old task and neglect earlier tasks.
In the section *Bias caused by ordinary cross-entropy*, the author argues that cross entropy aggravates classification bias and gives a brief explanation from the gradient perspective. I don't quite agree with that account. With one-hot labels, cross entropy is equivalent to maximum likelihood estimation, and maximum likelihood treats categories with more training data as more likely to appear in real scenarios, so the model tends to assign test samples to the categories with more training data. This, in my view, is why cross entropy exhibits classification bias under class imbalance.
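To make the gradient argument concrete, here is the standard derivation (my own sketch, not quoted from the paper). With softmax probability $p_k$ for class $k$ and a one-hot label $y$, the gradient of the cross-entropy loss with respect to logit $z_k$ is

$$\frac{\partial L_{CE}}{\partial z_k} = p_k - y_k .$$

For a training sample from a new class, every old-class logit therefore receives a positive gradient $p_k > 0$ and is pushed down. Since new-class samples dominate an imbalanced batch, the old-class logits are suppressed far more often than they are reinforced.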
Whether it is GKD or plain cross entropy, both apply softmax over all outputs of the classifier, which the author believes may cause the classification bias (quoted below). The author therefore proposes Separated-Softmax.
- Above two observations suggest that the main reason for the prediction bias could be to compute the softmax probability by combining the old and new tasks altogether. Motivated by this, we propose Separated-Softmax for Incremental Learning (SS-IL) in the next section.
Separated-Softmax
The method retains some old data. Suppose there are now 50 old classes and 10 new classes. When an image belongs to an old class, softmax is computed over the first 50 outputs of the classifier; when an image belongs to a new class, softmax is computed over the last 10 outputs (a minimal sketch follows below). In addition, the author uses TKD for knowledge distillation. The author also controls the ratio of old and new classes within each batch, which amounts to re-sampling, but the article does not examine the impact of this trick.
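For concreteness, here is a minimal sketch of how the separated-softmax cross entropy could be computed (my own reconstruction from the description above, not the authors' code); the 50/10 class split and the tensor names are assumptions.

```python
import torch
import torch.nn.functional as F

def separated_softmax_ce(logits, labels, n_old=50):
    """Cross entropy with softmax restricted to the label's own group.

    logits: (B, 60) classifier outputs (50 old classes + 10 new classes)
    labels: (B,) class indices in [0, 60)
    """
    old_mask = labels < n_old
    loss = logits.new_zeros(())
    if old_mask.any():
        # Old-class samples: softmax over the first 50 logits only.
        loss = loss + F.cross_entropy(logits[old_mask, :n_old],
                                      labels[old_mask], reduction="sum")
    if (~old_mask).any():
        # New-class samples: softmax over the last 10 logits only,
        # with labels shifted into [0, 10).
        loss = loss + F.cross_entropy(logits[~old_mask, n_old:],
                                      labels[~old_mask] - n_old,
                                      reduction="sum")
    return loss / labels.numel()
```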
Experiments

The accuracy in the initial stage should be essentially the same across methods. The author's explanation for the discrepancy is that End2End's hyperparameters are not well tuned and that iCaRL's NME classifier is not good enough. This explanation is not convincing: the initial stage is trained with plain cross entropy, so the hyperparameters should be tuned so that every method fits the initial data to roughly the same degree.
Ablation Experiment 
among L C E − S S L_{CE-SS} LCE−SS That is, the author proposed Separated-Softmax loss, It can be seen that the scheme proposed by the author can indeed slow down the classification preference
Reflections
The motivation and approach of this paper are rather strange. How can applying softmax over all classifier outputs be the cause of classification bias? Moreover, the author splits the classifier outputs into two parts by old and new classes and applies softmax to each part separately, which is somewhat counter-intuitive. For example, suppose there are four classes labeled 1, 2, 3, 4, where classes 1 and 2 are old and classes 3 and 4 are new. When we feed in an image of class 4, how can we guarantee that the logit for class 4 is larger than the logits for classes 1, 2, and 3? Training only guarantees that the logit of class 4 is larger than that of class 3.
Another contribution of the paper is the observation that TKD works better than GKD; judging from the confusion matrices, TKD indeed shows a weaker degree of classification bias than GKD.
Overall, the paper does have some highlights and can offer some inspiration, but I personally don't think it meets the bar for ICCV publication.
In recent years the quality of incremental-learning papers has been rather mediocre; there are even many top-venue papers that tailor the experimental setup to the characteristics of their method. Such papers offer little insight. I prefer papers that tackle incremental learning from a new perspective, even if their performance is not state of the art. What this field lacks is new perspectives, not combinations of tricks that chase benchmark numbers. I hope reviewers will be more tolerant of papers that offer new perspectives, instead of over-emphasizing performance.