Deep Learning (Incremental Learning) - (ICCV) Striking a Balance between Stability and Plasticity for Class Incremental Learning
2022-07-28 06:09:00 【Food to doubt life】
Preface
This paper was published at ICCV 2021. It combines incremental learning with self-supervision, and the problem it studies is class-incremental learning.
This post summarizes the methods proposed in the paper, gives a brief analysis of the experimental part, and finally offers my views on the article.
Method
The paper proposes three methods, namely SPB, SPB-I, and SPB-M; this post introduces them in turn.
SPB
SPB is a variant of UCIR. When the training data of task $T$ arrives, the author uses the feature extractor to extract embeddings of task $T$'s training data and normalizes the embeddings (presumably L2 normalization). The normalized embeddings of the same class are averaged to obtain a class prototype, which is used to initialize the corresponding weight of the classifier (a single FC layer). For example, the classifier (a single FC layer) can be regarded as a $C \times D$ matrix, where $D$ is the output dimension of the feature extractor and $C$ is the number of classes. When $N$ new classes arrive, the matrix is expanded to $(C+N) \times D$, and the added $N$ rows are initialized in the way described above.
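As a minimal sketch of this initialization (my own illustration, not the authors' code; the function name and argument layout are hypothetical), in PyTorch:

```python
import torch
import torch.nn.functional as F

def expand_classifier(weight, feats, labels, new_classes):
    """weight: (C, D) old classifier matrix; feats: (M, D) embeddings of the
    new task's training data; labels: (M,) class ids; new_classes: iterable
    of the N new class ids. Returns the expanded (C+N, D) matrix."""
    new_rows = []
    for c in new_classes:
        emb = F.normalize(feats[labels == c], dim=1)  # L2-normalize embeddings
        new_rows.append(emb.mean(dim=0))              # class prototype = mean
    return torch.cat([weight, torch.stack(new_rows)], dim=0)
```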
Let the output of the feature extractor for image $x_i$ be $f(x_i)$, and its L2-normalized version be $\overline{f(x_i)}$. Let the classifier weight corresponding to class $c$ be $w_c$ (i.e., the $c$-th row of the matrix in the example above), and its L2-normalized version be $\overline{w_c}$. Then the classifier's output for class $c$ is
$$P_c(x_i)=\frac{\exp(\lambda\,\overline{f(x_i)}^{T}\,\overline{w_c})}{\sum_j \exp(\lambda\,\overline{f(x_i)}^{T}\,\overline{w_j})}\tag{1.0}$$
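A sketch of this cosine classifier, assuming the weight matrix from the example above; `cosine_logits` is a hypothetical helper, not code from the paper:

```python
def cosine_logits(feat, weight, lam):
    """Eq. (1.0) before the softmax: scaled cosine similarities."""
    f = F.normalize(feat, dim=1)    # (B, D) L2-normalized features
    w = F.normalize(weight, dim=1)  # (C, D) L2-normalized class weights
    return lam * f @ w.t()          # (B, C); softmax of a row gives P_c(x_i)
```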
where $\lambda$ is a hyperparameter. The paper distills knowledge at the feature level to prevent forgetting. For image $x_i$, let the L2-normalized output of the old feature extractor be $\overline{f^o(x_i)}$ and that of the new feature extractor be $\overline{f^n(x_i)}$. The knowledge distillation loss is then
$$L_{em}=\left\|\overline{f^n(x_i)}-\overline{f^o(x_i)}\right\|^2\tag{2.0}$$
Let the cross-entropy loss be $L_{ce}$, the number of old classes be $N_{oc}$, and the number of new classes be $N_{nc}$. The overall loss of SPB is then
$$L=\frac{N_{nc}}{N_{oc}}L_{ce}+\frac{N_{oc}}{N_{nc}}L_{em}\tag{3.0}$$
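Putting Eqs. (2.0) and (3.0) together, a hedged sketch (assuming the logits come from `cosine_logits` above; `spb_loss` is a hypothetical name):

```python
def spb_loss(logits, targets, feat_new, feat_old, n_old, n_new):
    """Eqs. (2.0)-(3.0): cross-entropy plus feature-level distillation,
    weighted by the old/new class counts N_oc and N_nc."""
    l_ce = F.cross_entropy(logits, targets)
    l_em = (F.normalize(feat_new, dim=1)
            - F.normalize(feat_old, dim=1)).pow(2).sum(dim=1).mean()
    return (n_new / n_old) * l_ce + (n_old / n_new) * l_em
```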
As the number of learned classes grows, the weight on $L_{em}$ becomes larger and larger, which helps prevent forgetting.
SPB itself proposes nothing new.
SPB-I
SPB-I introduces self-supervision on top of SPB. Self-supervision encodes some redundant features that may be useful when learning future tasks. SPB-I introduces two self-supervised tasks, contrastive learning and rotation prediction. In effect, these tasks build a more robust feature space; they do not solve catastrophic forgetting.
Contrastive learning
Given an image, the author applies $N$ data augmentations. An image and its augmented version form a positive pair, while different images form negative pairs. The output of the feature extractor passes through a two-layer FC head (denoted $\delta$), and the contrastive loss is
$$L_{in}=-\sum_{i}\log\frac{\exp(\lambda\,\overline{\delta(f(x_i))}^{T}\,\overline{\delta(f(x'_i))}/T)}{\sum_{x_j \in \{x^{ng},\,x'_i\}} \exp(\lambda\,\overline{\delta(f(x_j))}^{T}\,\overline{\delta(f(x_i))}/T)}\tag{4.0}$$
where $x^{ng}$ denotes the set of negative examples of image $x_i$, and $x'_i$ is the augmented version of $x_i$.
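A sketch of the idea, not the paper's exact implementation: the $\lambda$ scaling of Eq. (4.0) is folded into the temperature $T$ here, and negatives are simply the other images' augmented views within the batch:

```python
def contrastive_loss(z1, z2, T=0.5):
    """InfoNCE-style loss over a batch: z1/z2 are delta(f(x)) for the two
    views; positives sit on the diagonal of the similarity matrix."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    sim = z1 @ z2.t() / T                                # (B, B) similarities
    targets = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(sim, targets)                 # row-wise -log softmax
```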
Rotation prediction
An image is rotated by a certain angle and then fed into the feature extractor. The output of the feature extractor (without global pooling) is processed by two residual BasicBlocks and a cosine classifier. There are four possible rotation angles, $\{0°, 90°, 180°, 270°\}$, and the model must predict the rotation angle of the image, i.e., a four-way classification. By default, SPB-I uses this self-supervised task.
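A sketch of how such a rotation batch can be built (illustration only; the paper's prediction head with two residual BasicBlocks and a cosine classifier is omitted):

```python
def make_rotation_batch(x):
    """x: (B, C, H, W) -> rotated images (4B, C, H, W) and labels (4B,),
    one copy of the batch per rotation angle k*90 degrees, k in {0,1,2,3}."""
    views = [torch.rot90(x, k, dims=(2, 3)) for k in range(4)]
    labels = torch.arange(4).repeat_interleave(x.size(0))
    return torch.cat(views, dim=0), labels
```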
SPB-M
SPB-M is a modification of SPB (not of SPB-I). An image is rotated before being fed into the model. There are $\gamma$ rotation angles, and each angle has its own classifier (the classifiers for 90° and 270°, for instance, are different), so there are $\gamma$ classifiers in total. All $\gamma$ classifiers are used for classification prediction, and the corresponding loss function is
$$L_{mp}=\frac{1}{\gamma}\sum_{b=1}^{\gamma} L_{ce}^{b}\tag{5.0}$$
With $L_{em}$ the knowledge distillation loss defined above, the total loss function is
$$L=\frac{N_{nc}}{N_{oc}}L_{mp}+\frac{N_{oc}}{N_{nc}}L_{em}\tag{6.0}$$
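A sketch of Eq. (5.0), reusing `cosine_logits` from above; `heads` and `rotated_feats` are hypothetical containers for the $\gamma$ classifiers and rotated-view features:

```python
def multi_head_loss(heads, rotated_feats, targets, lam):
    """Eq. (5.0): average cross-entropy over the gamma rotation-specific
    cosine classifiers; heads[b] is the weight matrix for rotation b and
    rotated_feats[b] the features of the batch rotated by angle b."""
    losses = [F.cross_entropy(cosine_logits(f, w, lam), targets)
              for w, f in zip(heads, rotated_feats)]
    return sum(losses) / len(losses)
```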
Experiment
This paper works in the data-free setting: no old data is stored. The comparison with other methods on CIFAR100 and ImageNet-subset is as follows:
In the initial stage, the proposed method is 4%~8% higher than all baselines. Generally speaking, a fluctuation of 1%~2% in the initial stage is normal (the anti-forgetting machinery has barely come into play at that point); an initial gap this large is enough to suggest problems with the experiments, but it seems the reviewers did not catch this fatal issue.
Since additional data augmentations are used, the performance improvement of the model may come from the augmentation itself. The author also noticed this and therefore performed the following ablation experiments.
According to the first table, the author verifies that using the self-supervised data augmentations alone does not improve the model's performance, which indicates that the improvement mainly comes from self-supervision itself.
Reflection
The experimental part of this paper has some problems. Unlike 《Deep learning (Incremental learning)——ICCV2022: Contrastive Continual Learning》, this paper does not touch on what role self-supervision plays in incremental learning, and it has a strong flavor of an A+B paper, which is not very much to my taste. Overall, though, it does show that self-supervision helps build a more robust feature space.