Dynamically Expandable Representation for Class Incremental Learning -- DER
2022-07-02 07:58:00 【MezereonXP】
This post introduces a training method in the spirit of representation learning, aimed at class incremental learning, from the CVPR 2021 paper "DER: Dynamically Expandable Representation for Class Incremental Learning".
First, we need to introduce some preliminary concepts, namely class incremental learning and representation learning.
Class incremental learning
In traditional classification learning, all categories are usually available at training time, and testing likewise covers data from every class.
In the real world, we often cannot define all categories up front and collect all of the corresponding data. More realistically, we start with data from only some of the categories and first train a classifier on them; when new categories appear later, we adjust the network structure and carry out data collection, training, and testing again.
Representation learning / metric learning
Representation learning, or metric learning, aims to learn a representation of the data (usually in the form of a vector) such that representations of samples from the same class are close together and representations of samples from different classes are far apart; the distance here can be, for example, the Euclidean distance.
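To make this idea concrete, below is a minimal sketch (not from the paper) of a contrastive-style metric learning loss based on Euclidean distance; the margin value is an illustrative assumption.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(z1, z2, same_class, margin=1.0):
    """Pull same-class embeddings together, push different-class ones at least `margin` apart.
    z1, z2: (batch, dim) embedding tensors; same_class: (batch,) float tensor of 0/1."""
    dist = F.pairwise_distance(z1, z2)                        # Euclidean distance per pair
    pull = same_class * dist.pow(2)                           # same class: minimize distance
    push = (1 - same_class) * F.relu(margin - dist).pow(2)    # different class: enforce the margin
    return (pull + push).mean()

z1, z2 = torch.randn(8, 64), torch.randn(8, 64)
same = torch.randint(0, 2, (8,)).float()
print(contrastive_loss(z1, z2, same))
```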
When doing class incremental learning, we can often reuse the previously trained representation extractor and fine-tune it on the new data.
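As a rough illustration of this reuse-and-fine-tune pattern, the sketch below keeps a stand-in pretrained extractor, attaches a new classification head covering the enlarged label set, and fine-tunes the extractor with a smaller learning rate; the module shapes and learning rates are assumptions, not values from the paper.

```python
import torch
import torch.nn as nn

# Stand-in for a previously trained feature extractor (pretend its weights were loaded).
extractor = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
                          nn.AdaptiveAvgPool2d(1), nn.Flatten())
classifier = nn.Linear(64, 60)   # new head covering old + new classes

optimizer = torch.optim.SGD([
    {"params": extractor.parameters(), "lr": 1e-3},   # small LR: gently adapt the reused extractor
    {"params": classifier.parameters(), "lr": 1e-2},  # larger LR: train the new head from scratch
], lr=1e-3, momentum=0.9)
```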
Here, the paper divides class incremental learning methods into three classes:
- Regularization-based methods
- Distillation-based methods
- Structure-based methods
Regularization-based methods generally rest on strong assumptions; they mainly rely on estimation (for example, of parameter importance) to constrain how the parameters are fine-tuned.
Distillation-based methods depend on the quantity and quality of the data used.
Structure-based methods introduce additional new parameters to model the data of the new categories.
This taxonomy is actually incomplete: one could also use traditional metric learning to learn a feature-extracting "front end" and then fine-tune only the back-end classifier, but the article does not seem to discuss this approach.
The basic flow
As shown in the figure above, the method is essentially a feature-concatenation process. First, we train on the data of some categories and obtain a feature extractor $\Phi_{t-1}$. For a new feature extractor $\mathcal{F}_t$ and an image $x\in \tilde{\mathcal{D}}_t$, the concatenated feature can be written as:
$$u = \Phi_t(x) = [\Phi_{t-1}(x),\ \mathcal{F}_t(x)]$$
This feature is then fed into a classifier $\mathcal{H}_t$, whose output is:
$$p_{\mathcal{H}_t}(y\mid x) = \mathrm{Softmax}(\mathcal{H}_t(u))$$
The prediction is then:
$$\hat{y} = \arg\max\, p_{\mathcal{H}_t}(y\mid x)$$
Therefore, the basic training loss is simply the cross-entropy loss:
$$\mathcal{L}_{\mathcal{H}_t} = -\frac{1}{|\tilde{\mathcal{D}}_t|}\sum_{i=1}^{|\tilde{\mathcal{D}}_t|}\log\big(p_{\mathcal{H}_t}(y=y_i\mid x_i)\big)$$
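The following is a minimal PyTorch-style sketch of this step under my own assumptions about module shapes (it is not the authors' implementation): the old extractor $\Phi_{t-1}$ is kept fixed, a new extractor $\mathcal{F}_t$ is added, their outputs are concatenated, and the classifier $\mathcal{H}_t$ is trained with cross-entropy.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ExpandedNet(nn.Module):
    """Sketch of u = Phi_t(x) = [Phi_{t-1}(x), F_t(x)] followed by H_t."""
    def __init__(self, old_extractor, new_extractor, old_dim, new_dim, num_classes):
        super().__init__()
        self.old_extractor = old_extractor        # Phi_{t-1}; kept fixed here (an assumption)
        self.new_extractor = new_extractor        # F_t, trained at step t
        self.classifier = nn.Linear(old_dim + new_dim, num_classes)  # H_t

    def forward(self, x):
        with torch.no_grad():                     # do not update the old representation
            f_old = self.old_extractor(x)
        f_new = self.new_extractor(x)
        u = torch.cat([f_old, f_new], dim=1)      # feature concatenation
        return self.classifier(u)                 # logits; Softmax is applied inside the loss

# Usage with stand-in extractors (shapes are illustrative):
old_extractor = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128))
new_extractor = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128))
model = ExpandedNet(old_extractor, new_extractor, old_dim=128, new_dim=128, num_classes=60)

x = torch.randn(4, 3, 32, 32)
y = torch.randint(0, 60, (4,))
loss_ce = F.cross_entropy(model(x), y)            # the cross-entropy term L_{H_t} above
```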
If we replace the classifier $\mathcal{H}_t$ with a classifier $\mathcal{H}_a$ for the new-category feature, we obtain a loss $\mathcal{L}_{\mathcal{H}_a}$ on the new-category feature.
The combined loss takes the form:
$$\mathcal{L}_{ER} = \mathcal{L}_{\mathcal{H}_t} + \lambda_a\,\mathcal{L}_{\mathcal{H}_a}$$
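A small sketch of how this combination could look in code; treating $\mathcal{H}_a$ as a separate linear head on the new feature only, the grouping of old classes into one label, and the value of $\lambda_a$ are all my assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-ins for quantities computed earlier in the training step.
f_new = torch.randn(4, 128)                 # F_t(x), the new feature from the sketch above
loss_ce = torch.tensor(0.0)                 # placeholder for L_{H_t} (computed as shown earlier)

# H_a: an auxiliary head applied to the new feature only. Mapping all old classes
# to a single aggregated label is an assumption made here for illustration.
aux_head = nn.Linear(128, 20 + 1)           # 20 new classes + 1 "old" label
y_aux = torch.randint(0, 21, (4,))          # labels remapped for the auxiliary task
loss_aux = F.cross_entropy(aux_head(f_new), y_aux)   # L_{H_a}

lambda_a = 1.0                              # illustrative weight
loss_er = loss_ce + lambda_a * loss_aux     # L_ER = L_{H_t} + lambda_a * L_{H_a}
```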
To reduce the growth in parameters caused by adding new categories, a mask mechanism is introduced: a mask over the channels is learned, controlled by a variable $e_l$.
$$f_l' = f_l\odot m_l,\qquad m_l = \sigma(s\,e_l)$$
where $\sigma(\cdot)$ denotes the sigmoid activation function and $s$ is a scaling factor.
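A minimal sketch of this channel mask follows; the value of the scaling factor $s$ and the initialization of $e_l$ are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ChannelMask(nn.Module):
    """Learnable soft mask over channels: m_l = sigmoid(s * e_l), f_l' = f_l * m_l."""
    def __init__(self, num_channels, s=10.0):             # s: scaling factor (illustrative value)
        super().__init__()
        self.e = nn.Parameter(torch.zeros(num_channels))   # e_l, learned per channel
        self.s = s

    def forward(self, f):
        m = torch.sigmoid(self.s * self.e)                 # m_l in (0, 1)
        return f * m.view(1, -1, 1, 1), m                  # broadcast over (N, C, H, W)

mask = ChannelMask(num_channels=64)
f = torch.randn(2, 64, 8, 8)
f_masked, m = mask(f)
```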
A sparsity loss is introduced to encourage the model to compress its parameters as much as possible, i.e., to make the mask drop more channels:
$$\mathcal{L}_S = \frac{\sum_{l=1}^{L} K_l\,\|m_{l-1}\|_1\,\|m_l\|_1}{\sum_{l=1}^{L} K_l\, c_{l-1}\, c_l}$$
where $L$ is the number of layers, $K_l$ is the kernel size of the $l$-th convolutional layer, and $c_l$ is the number of channels of layer $l$.
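Below is one way to compute this term, under my reading of the formula: $\|m_0\|_1$ is taken as the unmasked number of input channels, and $K_l$ as the kernel size of layer $l$ (9 for a 3x3 kernel in this sketch); the layer configuration in the example is purely illustrative.

```python
import torch

def sparsity_loss(masks, kernel_sizes, channels):
    """L_S = sum_l K_l * ||m_{l-1}||_1 * ||m_l||_1 / sum_l K_l * c_{l-1} * c_l.
    masks[l] is the channel mask of layer l+1; channels lists c_0, ..., c_L."""
    num, den = 0.0, 0.0
    prev_mask_sum = float(channels[0])        # ||m_0||_1: unmasked input channels (an assumption)
    prev_channels = channels[0]
    for m, k, c in zip(masks, kernel_sizes, channels[1:]):
        num = num + k * prev_mask_sum * m.abs().sum()   # masked (soft) parameter count
        den = den + k * prev_channels * c               # full parameter count
        prev_mask_sum, prev_channels = m.abs().sum(), c
    return num / den

# Example: two conv layers with 3x3 kernels, channels 3 -> 64 -> 128.
masks = [torch.sigmoid(torch.randn(64)), torch.sigmoid(torch.randn(128))]
print(sparsity_loss(masks, kernel_sizes=[9, 9], channels=[3, 64, 128]))
```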
Finally, we obtain the overall loss:
$$\mathcal{L}_{DER} = \mathcal{L}_{\mathcal{H}_t} + \lambda_a\,\mathcal{L}_{\mathcal{H}_a} + \lambda_s\,\mathcal{L}_S$$
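Putting the three terms together in a training step might look like the tiny helper below; the default weights are illustrative assumptions, not values from the paper.

```python
def der_loss(loss_ce, loss_aux, loss_sparsity, lambda_a=1.0, lambda_s=0.1):
    """L_DER = L_{H_t} + lambda_a * L_{H_a} + lambda_s * L_S (weights are illustrative)."""
    return loss_ce + lambda_a * loss_aux + lambda_s * loss_sparsity
```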
Experimental analysis
First, the dataset setup; three datasets are used:
- CIFAR-100
- ImageNet-1000
- ImageNet-100
For the 100 classes of CIFAR-100, training proceeds over 5, 10, 20, or 50 incremental steps. With 5 incremental steps, for instance, 20 new classes are added at each step. This data split is denoted CIFAR100-B0.
Another incremental setting first trains on 50 classes and then adds the remaining 50 classes over 2, 5, or 10 incremental steps. This is denoted CIFAR100-B50.
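To make the two split protocols concrete, here is a small sketch of how the class increments could be generated; a fixed class order is used for simplicity (the actual class ordering is not discussed above).

```python
def make_splits(num_classes=100, base=0, steps=5):
    """Return a list of class-id lists, one entry per training stage.
    CIFAR100-B0: base=0, steps in {5, 10, 20, 50}.
    CIFAR100-B50: base=50, steps in {2, 5, 10} over the remaining classes."""
    classes = list(range(num_classes))           # fixed class order, for simplicity
    splits = []
    if base > 0:
        splits.append(classes[:base])            # initial training stage on `base` classes
    rest = classes[base:]
    per_step = len(rest) // steps
    for i in range(steps):
        splits.append(rest[i * per_step:(i + 1) * per_step])
    return splits

# CIFAR100-B0 with 5 incremental steps: 20 new classes per step.
print([len(s) for s in make_splits(100, base=0, steps=5)])    # [20, 20, 20, 20, 20]
# CIFAR100-B50: 50 base classes, then 10 steps of 5 classes each.
print([len(s) for s in make_splits(100, base=50, steps=10)])  # [50, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5]
```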
Here we only present the results on the CIFAR-100 dataset; for more details, please see the paper.
As shown in the figure above, the final average accuracy of this method is higher than that of the other incremental learning methods. It is worth noting that when the mask mechanism is used, i.e., when the mask is used to prune the parameters, the resulting model has far fewer parameters while the accuracy is still maintained.