Dynamically Expandable Representation for Class Incremental Learning -- DER
2022-07-02 07:58:00 【MezereonXP】
This post introduces a training method in the spirit of representation learning, aimed at class incremental learning, from the CVPR 2021 paper "DER: Dynamically Expandable Representation for Class Incremental Learning".
First, we need to introduce some preliminary concepts, namely class incremental learning and representation learning.
Class incremental learning
In traditional classification learning, all categories are usually available at training time, and testing likewise covers data from every class.
In the real world, we often cannot define all categories up front and collect all of the corresponding data. More realistically, we start with data from only some of the categories and first train a classifier on them; when new categories appear later, we adjust the network structure and carry out data collection, training, and testing again.
Representation learning / metric learning
Representation learning, or metric learning, aims to learn a representation of the data (usually in the form of a vector) such that representations of samples from the same class are close together and representations of samples from different classes are far apart; the distance here can be, for example, the Euclidean distance.
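To make this idea concrete, below is a minimal sketch (not from the paper) of a contrastive-style metric learning loss based on Euclidean distance; the margin value is an illustrative assumption.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(z1, z2, same_class, margin=1.0):
    """Pull same-class embeddings together, push different-class ones at least `margin` apart.
    z1, z2: (batch, dim) embedding tensors; same_class: (batch,) float tensor of 0/1."""
    dist = F.pairwise_distance(z1, z2)                        # Euclidean distance per pair
    pull = same_class * dist.pow(2)                           # same class: minimize distance
    push = (1 - same_class) * F.relu(margin - dist).pow(2)    # different class: enforce the margin
    return (pull + push).mean()

z1, z2 = torch.randn(8, 64), torch.randn(8, 64)
same = torch.randint(0, 2, (8,)).float()
print(contrastive_loss(z1, z2, same))
```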
When doing class incremental learning, we can often reuse the previously trained representation extractor and fine-tune it on the new data.
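As a rough illustration of this reuse-and-fine-tune pattern, the sketch below keeps a stand-in pretrained extractor, attaches a new classification head covering the enlarged label set, and fine-tunes the extractor with a smaller learning rate; the module shapes and learning rates are assumptions, not values from the paper.

```python
import torch
import torch.nn as nn

# Stand-in for a previously trained feature extractor (pretend its weights were loaded).
extractor = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
                          nn.AdaptiveAvgPool2d(1), nn.Flatten())
classifier = nn.Linear(64, 60)   # new head covering old + new classes

optimizer = torch.optim.SGD([
    {"params": extractor.parameters(), "lr": 1e-3},   # small LR: gently adapt the reused extractor
    {"params": classifier.parameters(), "lr": 1e-2},  # larger LR: train the new head from scratch
], lr=1e-3, momentum=0.9)
```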
Here, the paper divides class incremental learning methods into three classes:
- Regularization-based methods
- Distillation-based methods
- Structure-based methods
Regularization-based methods generally rest on strong assumptions; they mainly rely on estimation (for example, of parameter importance) to constrain how the parameters are fine-tuned.
Distillation-based methods depend on the quantity and quality of the data used.
Structure-based methods introduce additional new parameters to model the data of the new categories.
This taxonomy is actually incomplete: one could also use traditional metric learning to learn a feature-extracting "front end" and then fine-tune only the back-end classifier, but the article does not seem to discuss this approach.
The basic flow
As shown in the figure above, the method is essentially a feature-concatenation process. First, we train on the data of some categories and obtain a feature extractor $\Phi_{t-1}$. For a new feature extractor $\mathcal{F}_t$ and an image $x\in \tilde{\mathcal{D}}_t$, the concatenated feature can be written as:
$$u = \Phi_t(x) = [\Phi_{t-1}(x),\ \mathcal{F}_t(x)]$$
This feature is then fed into a classifier $\mathcal{H}_t$, whose output is:
$$p_{\mathcal{H}_t}(y\mid x) = \mathrm{Softmax}(\mathcal{H}_t(u))$$
The prediction is then:
$$\hat{y} = \arg\max\, p_{\mathcal{H}_t}(y\mid x)$$
Therefore, the basic training loss is simply the cross-entropy loss:
$$\mathcal{L}_{\mathcal{H}_t} = -\frac{1}{|\tilde{\mathcal{D}}_t|}\sum_{i=1}^{|\tilde{\mathcal{D}}_t|}\log\big(p_{\mathcal{H}_t}(y=y_i\mid x_i)\big)$$
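The following is a minimal PyTorch-style sketch of this step under my own assumptions about module shapes (it is not the authors' implementation): the old extractor $\Phi_{t-1}$ is kept fixed, a new extractor $\mathcal{F}_t$ is added, their outputs are concatenated, and the classifier $\mathcal{H}_t$ is trained with cross-entropy.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ExpandedNet(nn.Module):
    """Sketch of u = Phi_t(x) = [Phi_{t-1}(x), F_t(x)] followed by H_t."""
    def __init__(self, old_extractor, new_extractor, old_dim, new_dim, num_classes):
        super().__init__()
        self.old_extractor = old_extractor        # Phi_{t-1}; kept fixed here (an assumption)
        self.new_extractor = new_extractor        # F_t, trained at step t
        self.classifier = nn.Linear(old_dim + new_dim, num_classes)  # H_t

    def forward(self, x):
        with torch.no_grad():                     # do not update the old representation
            f_old = self.old_extractor(x)
        f_new = self.new_extractor(x)
        u = torch.cat([f_old, f_new], dim=1)      # feature concatenation
        return self.classifier(u)                 # logits; Softmax is applied inside the loss

# Usage with stand-in extractors (shapes are illustrative):
old_extractor = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128))
new_extractor = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128))
model = ExpandedNet(old_extractor, new_extractor, old_dim=128, new_dim=128, num_classes=60)

x = torch.randn(4, 3, 32, 32)
y = torch.randint(0, 60, (4,))
loss_ce = F.cross_entropy(model(x), y)            # the cross-entropy term L_{H_t} above
```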
If we replace the classifier $\mathcal{H}_t$ with a classifier $\mathcal{H}_a$ for the new-category feature, we obtain a loss $\mathcal{L}_{\mathcal{H}_a}$ on the new-category feature.
The combined loss takes the form:
$$\mathcal{L}_{ER} = \mathcal{L}_{\mathcal{H}_t} + \lambda_a\,\mathcal{L}_{\mathcal{H}_a}$$
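A small sketch of how this combination could look in code; treating $\mathcal{H}_a$ as a separate linear head on the new feature only, the grouping of old classes into one label, and the value of $\lambda_a$ are all my assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-ins for quantities computed earlier in the training step.
f_new = torch.randn(4, 128)                 # F_t(x), the new feature from the sketch above
loss_ce = torch.tensor(0.0)                 # placeholder for L_{H_t} (computed as shown earlier)

# H_a: an auxiliary head applied to the new feature only. Mapping all old classes
# to a single aggregated label is an assumption made here for illustration.
aux_head = nn.Linear(128, 20 + 1)           # 20 new classes + 1 "old" label
y_aux = torch.randint(0, 21, (4,))          # labels remapped for the auxiliary task
loss_aux = F.cross_entropy(aux_head(f_new), y_aux)   # L_{H_a}

lambda_a = 1.0                              # illustrative weight
loss_er = loss_ce + lambda_a * loss_aux     # L_ER = L_{H_t} + lambda_a * L_{H_a}
```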
To reduce the growth in parameters caused by adding new categories, a mask mechanism is introduced: a mask over the channels is learned, controlled by a variable $e_l$.
$$f_l' = f_l\odot m_l,\qquad m_l = \sigma(s\,e_l)$$
where $\sigma(\cdot)$ denotes the sigmoid activation function and $s$ is a scaling factor.
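A minimal sketch of this channel mask follows; the value of the scaling factor $s$ and the initialization of $e_l$ are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ChannelMask(nn.Module):
    """Learnable soft mask over channels: m_l = sigmoid(s * e_l), f_l' = f_l * m_l."""
    def __init__(self, num_channels, s=10.0):             # s: scaling factor (illustrative value)
        super().__init__()
        self.e = nn.Parameter(torch.zeros(num_channels))   # e_l, learned per channel
        self.s = s

    def forward(self, f):
        m = torch.sigmoid(self.s * self.e)                 # m_l in (0, 1)
        return f * m.view(1, -1, 1, 1), m                  # broadcast over (N, C, H, W)

mask = ChannelMask(num_channels=64)
f = torch.randn(2, 64, 8, 8)
f_masked, m = mask(f)
```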
A sparsity loss is introduced to encourage the model to compress its parameters as much as possible, i.e., to make the mask drop more channels:
$$\mathcal{L}_S = \frac{\sum_{l=1}^{L} K_l\,\|m_{l-1}\|_1\,\|m_l\|_1}{\sum_{l=1}^{L} K_l\, c_{l-1}\, c_l}$$
where $L$ is the number of layers, $K_l$ is the kernel size of the $l$-th convolutional layer, and $c_l$ is the number of channels of layer $l$.
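Below is one way to compute this term, under my reading of the formula: $\|m_0\|_1$ is taken as the unmasked number of input channels, and $K_l$ as the kernel size of layer $l$ (9 for a 3x3 kernel in this sketch); the layer configuration in the example is purely illustrative.

```python
import torch

def sparsity_loss(masks, kernel_sizes, channels):
    """L_S = sum_l K_l * ||m_{l-1}||_1 * ||m_l||_1 / sum_l K_l * c_{l-1} * c_l.
    masks[l] is the channel mask of layer l+1; channels lists c_0, ..., c_L."""
    num, den = 0.0, 0.0
    prev_mask_sum = float(channels[0])        # ||m_0||_1: unmasked input channels (an assumption)
    prev_channels = channels[0]
    for m, k, c in zip(masks, kernel_sizes, channels[1:]):
        num = num + k * prev_mask_sum * m.abs().sum()   # masked (soft) parameter count
        den = den + k * prev_channels * c               # full parameter count
        prev_mask_sum, prev_channels = m.abs().sum(), c
    return num / den

# Example: two conv layers with 3x3 kernels, channels 3 -> 64 -> 128.
masks = [torch.sigmoid(torch.randn(64)), torch.sigmoid(torch.randn(128))]
print(sparsity_loss(masks, kernel_sizes=[9, 9], channels=[3, 64, 128]))
```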
Finally, we obtain the overall loss:
$$\mathcal{L}_{DER} = \mathcal{L}_{\mathcal{H}_t} + \lambda_a\,\mathcal{L}_{\mathcal{H}_a} + \lambda_s\,\mathcal{L}_S$$
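Putting the three terms together in a training step might look like the tiny helper below; the default weights are illustrative assumptions, not values from the paper.

```python
def der_loss(loss_ce, loss_aux, loss_sparsity, lambda_a=1.0, lambda_s=0.1):
    """L_DER = L_{H_t} + lambda_a * L_{H_a} + lambda_s * L_S (weights are illustrative)."""
    return loss_ce + lambda_a * loss_aux + lambda_s * loss_sparsity
```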
Experimental analysis
First, the dataset setup; three datasets are used:
- CIFAR-100
- ImageNet-1000
- ImageNet-100
For the 100 classes of CIFAR-100, training proceeds over 5, 10, 20, or 50 incremental steps. With 5 incremental steps, for instance, 20 new classes are added at each step. This data split is denoted CIFAR100-B0.
Another incremental setting first trains on 50 classes and then adds the remaining 50 classes over 2, 5, or 10 incremental steps. This is denoted CIFAR100-B50.
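To make the two split protocols concrete, here is a small sketch of how the class increments could be generated; a fixed class order is used for simplicity (the actual class ordering is not discussed above).

```python
def make_splits(num_classes=100, base=0, steps=5):
    """Return a list of class-id lists, one entry per training stage.
    CIFAR100-B0: base=0, steps in {5, 10, 20, 50}.
    CIFAR100-B50: base=50, steps in {2, 5, 10} over the remaining classes."""
    classes = list(range(num_classes))           # fixed class order, for simplicity
    splits = []
    if base > 0:
        splits.append(classes[:base])            # initial training stage on `base` classes
    rest = classes[base:]
    per_step = len(rest) // steps
    for i in range(steps):
        splits.append(rest[i * per_step:(i + 1) * per_step])
    return splits

# CIFAR100-B0 with 5 incremental steps: 20 new classes per step.
print([len(s) for s in make_splits(100, base=0, steps=5)])    # [20, 20, 20, 20, 20]
# CIFAR100-B50: 50 base classes, then 10 steps of 5 classes each.
print([len(s) for s in make_splits(100, base=50, steps=10)])  # [50, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5]
```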
Here we only present the results on the CIFAR-100 dataset; for more details, please see the paper.
As shown in the figure above, the final average accuracy of this method is higher than that of the other incremental learning methods. It is worth noting that when the mask mechanism is used, i.e., when the mask is used to prune the parameters, the resulting model has far fewer parameters while the accuracy is still maintained.