Deep Learning (Incremental Learning) - (ICCV) Striking a Balance between Stability and Plasticity for Class Incremental Learning
2022-07-28 06:09:00 【Food to doubt life】
Preface
This paper was published at ICCV 2021. It combines incremental learning with self-supervision, and the problem it studies is class-incremental learning.
This post summarizes the methods proposed in the paper, gives a brief analysis of the experimental part, and finally offers my views on the article.
Method
The paper proposes three methods, namely SPB, SPB-I, and SPB-M; this post introduces them in turn.
SPB
SPB is a variant of UCIR. When the training data of task $T$ arrives, the author uses the feature extractor to extract embeddings of task $T$'s training data and normalizes the embeddings (presumably L2 normalization). The normalized embeddings of the same class are averaged to obtain a class prototype, which is used to initialize the corresponding weight of the classifier (a single FC layer). For example, the classifier (a single FC layer) can be regarded as a $C \times D$ matrix, where $D$ is the output dimension of the feature extractor and $C$ is the number of classes. When $N$ new classes arrive, the matrix is expanded to $(C+N) \times D$, and the added $N$ rows are initialized in the way described above.
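As a minimal sketch of this initialization (my own illustration, not the authors' code; the function name and argument layout are hypothetical), in PyTorch:

```python
import torch
import torch.nn.functional as F

def expand_classifier(weight, feats, labels, new_classes):
    """weight: (C, D) old classifier matrix; feats: (M, D) embeddings of the
    new task's training data; labels: (M,) class ids; new_classes: iterable
    of the N new class ids. Returns the expanded (C+N, D) matrix."""
    new_rows = []
    for c in new_classes:
        emb = F.normalize(feats[labels == c], dim=1)  # L2-normalize embeddings
        new_rows.append(emb.mean(dim=0))              # class prototype = mean
    return torch.cat([weight, torch.stack(new_rows)], dim=0)
```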
Let the output of the feature extractor for image $x_i$ be $f(x_i)$, and its L2-normalized version be $\overline{f(x_i)}$. Let the classifier weight corresponding to class $c$ be $w_c$ (i.e., the $c$-th row of the matrix in the example above), and its L2-normalized version be $\overline{w_c}$. Then the classifier's output for class $c$ is
$$P_c(x_i)=\frac{\exp(\lambda\,\overline{f(x_i)}^{T}\,\overline{w_c})}{\sum_j \exp(\lambda\,\overline{f(x_i)}^{T}\,\overline{w_j})}\tag{1.0}$$
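A sketch of this cosine classifier, assuming the weight matrix from the example above; `cosine_logits` is a hypothetical helper, not code from the paper:

```python
def cosine_logits(feat, weight, lam):
    """Eq. (1.0) before the softmax: scaled cosine similarities."""
    f = F.normalize(feat, dim=1)    # (B, D) L2-normalized features
    w = F.normalize(weight, dim=1)  # (C, D) L2-normalized class weights
    return lam * f @ w.t()          # (B, C); softmax of a row gives P_c(x_i)
```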
where $\lambda$ is a hyperparameter. The paper distills knowledge at the feature level to prevent forgetting. For image $x_i$, let the L2-normalized output of the old feature extractor be $\overline{f^o(x_i)}$ and that of the new feature extractor be $\overline{f^n(x_i)}$. The knowledge distillation loss is then
$$L_{em}=\left\|\overline{f^n(x_i)}-\overline{f^o(x_i)}\right\|^2\tag{2.0}$$
Let the cross-entropy loss be $L_{ce}$, the number of old classes be $N_{oc}$, and the number of new classes be $N_{nc}$. The overall loss of SPB is then
$$L=\frac{N_{nc}}{N_{oc}}L_{ce}+\frac{N_{oc}}{N_{nc}}L_{em}\tag{3.0}$$
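Putting Eqs. (2.0) and (3.0) together, a hedged sketch (assuming the logits come from `cosine_logits` above; `spb_loss` is a hypothetical name):

```python
def spb_loss(logits, targets, feat_new, feat_old, n_old, n_new):
    """Eqs. (2.0)-(3.0): cross-entropy plus feature-level distillation,
    weighted by the old/new class counts N_oc and N_nc."""
    l_ce = F.cross_entropy(logits, targets)
    l_em = (F.normalize(feat_new, dim=1)
            - F.normalize(feat_old, dim=1)).pow(2).sum(dim=1).mean()
    return (n_new / n_old) * l_ce + (n_old / n_new) * l_em
```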
As the number of learned classes grows, the weight on $L_{em}$ becomes larger and larger, which helps prevent forgetting.
SPB itself proposes nothing new.
SPB-I
SPB-I introduces self-supervision on top of SPB. Self-supervision encodes some redundant features that may be useful when learning future tasks. SPB-I introduces two self-supervised tasks, contrastive learning and rotation prediction. In effect, these tasks build a more robust feature space; they do not solve catastrophic forgetting.
Contrastive learning
Given an image, the author applies $N$ data augmentations. An image and its augmented version form a positive pair, while different images form negative pairs. The output of the feature extractor passes through a two-layer FC head (denoted $\delta$), and the contrastive loss is
$$L_{in}=-\sum_{i}\log\frac{\exp(\lambda\,\overline{\delta(f(x_i))}^{T}\,\overline{\delta(f(x'_i))}/T)}{\sum_{x_j \in \{x^{ng},\,x'_i\}} \exp(\lambda\,\overline{\delta(f(x_j))}^{T}\,\overline{\delta(f(x_i))}/T)}\tag{4.0}$$
where $x^{ng}$ denotes the set of negative examples of image $x_i$, and $x'_i$ is the augmented version of $x_i$.
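A sketch of the idea, not the paper's exact implementation: the $\lambda$ scaling of Eq. (4.0) is folded into the temperature $T$ here, and negatives are simply the other images' augmented views within the batch:

```python
def contrastive_loss(z1, z2, T=0.5):
    """InfoNCE-style loss over a batch: z1/z2 are delta(f(x)) for the two
    views; positives sit on the diagonal of the similarity matrix."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    sim = z1 @ z2.t() / T                                # (B, B) similarities
    targets = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(sim, targets)                 # row-wise -log softmax
```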
Rotation prediction
An image is rotated by a certain angle and then fed into the feature extractor. The output of the feature extractor (without global pooling) is processed by two residual BasicBlocks and a cosine classifier. There are four possible rotation angles, $\{0°, 90°, 180°, 270°\}$, and the model must predict the rotation angle of the image, i.e., a four-way classification. By default, SPB-I uses this self-supervised task.
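A sketch of how such a rotation batch can be built (illustration only; the paper's prediction head with two residual BasicBlocks and a cosine classifier is omitted):

```python
def make_rotation_batch(x):
    """x: (B, C, H, W) -> rotated images (4B, C, H, W) and labels (4B,),
    one copy of the batch per rotation angle k*90 degrees, k in {0,1,2,3}."""
    views = [torch.rot90(x, k, dims=(2, 3)) for k in range(4)]
    labels = torch.arange(4).repeat_interleave(x.size(0))
    return torch.cat(views, dim=0), labels
```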
SPB-M
SPB-M is a modification of SPB (not of SPB-I). An image is rotated before being fed into the model. There are $\gamma$ rotation angles, and each angle has its own classifier (the classifiers for 90° and 270°, for instance, are different), so there are $\gamma$ classifiers in total. All $\gamma$ classifiers are used for classification prediction, and the corresponding loss function is
$$L_{mp}=\frac{1}{\gamma}\sum_{b=1}^{\gamma} L_{ce}^{b}\tag{5.0}$$
With $L_{em}$ the knowledge distillation loss defined above, the total loss function is
$$L=\frac{N_{nc}}{N_{oc}}L_{mp}+\frac{N_{oc}}{N_{nc}}L_{em}\tag{6.0}$$
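A sketch of Eq. (5.0), reusing `cosine_logits` from above; `heads` and `rotated_feats` are hypothetical containers for the $\gamma$ classifiers and rotated-view features:

```python
def multi_head_loss(heads, rotated_feats, targets, lam):
    """Eq. (5.0): average cross-entropy over the gamma rotation-specific
    cosine classifiers; heads[b] is the weight matrix for rotation b and
    rotated_feats[b] the features of the batch rotated by angle b."""
    losses = [F.cross_entropy(cosine_logits(f, w, lam), targets)
              for w, f in zip(heads, rotated_feats)]
    return sum(losses) / len(losses)
```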
Experiment
This paper works in the data-free setting: no old data is stored. The comparison with other methods on CIFAR100 and ImageNet-subset is as follows:
In the initial stage, the proposed method is 4%~8% higher than all baselines. Generally speaking, a fluctuation of 1%~2% in the initial stage is normal (the anti-forgetting machinery has barely come into play at that point); an initial gap this large is enough to suggest problems with the experiments, but it seems the reviewers did not catch this fatal issue.
Since additional data augmentations are used, the performance improvement of the model may come from the augmentation itself. The author also noticed this and therefore performed the following ablation experiments.
According to the first table, the author verifies that using the self-supervised data augmentations alone does not improve the model's performance, which indicates that the improvement mainly comes from self-supervision itself.
Reflection
The experimental part of this paper has some problems. Unlike 《Deep learning (Incremental learning)——ICCV2022: Contrastive Continual Learning》, this paper does not touch on what role self-supervision plays in incremental learning, and it has a strong flavor of an A+B paper, which is not very much to my taste. Overall, though, it does show that self-supervision helps build a more robust feature space.