
[Meta-Learning] Classic Works: MAML and Reptile (a Demo to Understand the Meta-Learning Mechanism)

2022-06-22 06:54:00 chad_lee

Meta Learning

MAML ICML’17

MAML is agnostic to both model architecture and task; the only requirement is that the model is parameterized.

MAML produces an initialization of the weights from which other models can be fine-tuned with a small number of samples. In this sense, MAML's inputs and role are the same as those of pre-training.

Algorithm

MAML is best explained with this figure; the left panel shows the algorithm flow for the case $\left|\mathcal{T}_{i}\right|=1$.

First, the model has an initial parameter $\theta$.

Sample a task $\mathcal{T}_{i} \sim p(\mathcal{T})$. Train on this task's $K$ training samples (also called the support set): compute the gradient $\nabla_{\theta} \mathcal{L}_{\mathcal{T}_{i}}\left(f_{\theta}\right)$ (line 5 of the algorithm), then take one gradient step (possibly several) to obtain $\theta_{i}^{\prime}=\theta-\alpha \nabla_{\theta} \mathcal{L}_{\mathcal{T}_{i}}\left(f_{\theta}\right)$ (line 6).

If this were ordinary training, the next step would be to sample another batch of data and continue training with $\theta^{\prime}$ as the starting point. MAML instead does the following:

Then, with $\theta^{\prime}$, train on task $i$'s test data (also called the query set): compute the gradient $\nabla_{\theta} \mathcal{L}_{\mathcal{T}_{i}}\left(f_{\theta_{i}^{\prime}}\right)$ and use it to update $\theta \leftarrow \theta-\beta \nabla_{\theta} \sum_{\mathcal{T}_{i} \sim p(\mathcal{T})} \mathcal{L}_{\mathcal{T}_{i}}\left(f_{\theta_{i}^{\prime}}\right)$ (line 8).

That completes one task (or one batch of tasks); move on to the next task (or batch of tasks).
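To make the two-level loop concrete, here is a minimal second-order MAML sketch in PyTorch for a toy regression model. The `forward` helper, the explicit parameter list, and the task format `(x_s, y_s, x_q, y_q)` are illustrative assumptions, not the paper's code:

```python
import torch
import torch.nn.functional as F

def forward(params, x):
    # Toy two-layer MLP; params = [w1, b1, w2, b2].
    w1, b1, w2, b2 = params
    return torch.relu(x @ w1 + b1) @ w2 + b2

def maml_outer_step(params, task_batch, inner_lr=0.01, outer_lr=0.001, inner_steps=1):
    meta_loss = 0.0
    for x_s, y_s, x_q, y_q in task_batch:          # support / query sets per task
        fast = list(params)
        for _ in range(inner_steps):               # inner loop: theta -> theta' (lines 5-6)
            loss = F.mse_loss(forward(fast, x_s), y_s)
            grads = torch.autograd.grad(loss, fast, create_graph=True)
            fast = [p - inner_lr * g for p, g in zip(fast, grads)]
        # Query-set loss evaluated at the adapted parameters theta' (line 8).
        meta_loss = meta_loss + F.mse_loss(forward(fast, x_q), y_q)
    # The outer gradient differentiates through the inner update (second-order term).
    grads = torch.autograd.grad(meta_loss, params)
    return [(p - outer_lr * g).detach().requires_grad_() for p, g in zip(params, grads)]
```

Note that `create_graph=True` is what keeps the inner update differentiable; dropping it (and with it the second-order term) gives essentially the first-order approximation discussed below.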

The formulation is the same for classification and regression tasks; only the loss function and the data handling differ:
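For reference, the usual choices (as in the MAML paper) are mean-squared error for regression and cross-entropy for classification, written here with $\phi$ denoting the adapted parameters:

$$\mathcal{L}_{\mathcal{T}_{i}}\left(f_{\phi}\right)=\sum_{x^{(j)}, y^{(j)} \sim \mathcal{T}_{i}}\left\|f_{\phi}\left(x^{(j)}\right)-y^{(j)}\right\|_{2}^{2} \qquad \text{(regression)}$$

$$\mathcal{L}_{\mathcal{T}_{i}}\left(f_{\phi}\right)=-\sum_{x^{(j)}, y^{(j)} \sim \mathcal{T}_{i}} y^{(j)} \log f_{\phi}\left(x^{(j)}\right) \qquad \text{(classification)}$$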

Experiments

Omniglot dataset: 1623 characters, with 20 images of each. [This dataset is relatively easy, though; accuracies are already at 99%+.]

MiniImagenet dataset: 64 training classes, 12 validation classes, 24 test classes. [The current SOTA is already about twice MAML's accuracy.]

The Omniglot and MiniImagenet image recognition tasks are the most common recently used few-shot learning benchmarks.

Training follows the “N-way K-shot” setting: every training and test task contains $N$ classes with $K$ samples per class.
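As an illustration of how such tasks are drawn, here is a minimal N-way K-shot task sampler; the `data` dict (class label → list of examples) and the query-set size are illustrative assumptions:

```python
import random

def sample_task(data, n_way=5, k_shot=1, k_query=15):
    """Sample one N-way K-shot task: N classes, K support and k_query query samples each."""
    classes = random.sample(list(data), n_way)               # pick N classes
    support, query = [], []
    for new_label, c in enumerate(classes):                  # relabel classes 0..N-1
        examples = random.sample(data[c], k_shot + k_query)
        support += [(x, new_label) for x in examples[:k_shot]]
        query   += [(x, new_label) for x in examples[k_shot:]]
    return support, query
```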


Reptile: On First-Order Meta-Learning Algorithms, arXiv'18

A paper from OpenAI that follows up on and improves MAML; it has 700+ citations.

The core idea

As described above, MAML's procedure is somewhat convoluted: the model's initial parameters $\theta^{0}$ are first trained on dataset A to obtain $\theta^{\prime}$; then a dataset B is sampled, the gradient of $\theta^{\prime}$ on B is computed, and that gradient is applied to $\theta^{0}$, giving $\theta^{1}$:
$$\theta^{1}=\theta^{0}-\varepsilon \nabla_{\theta} \sum_{\mathcal{T}_{i} \sim p(\mathcal{T})} \mathcal{L}_{\mathcal{T}_{i}}\left(f_{\theta^{\prime}}\right)$$
MAML trains a good initialization $\theta$ this way (better than pre-training), but computing the meta-gradient involves second derivatives, so the paper also offers a first-order approximation (FOMAML) to simplify the computation.
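Concretely, the first-order approximation drops the Jacobian of the inner update and treats the query-set gradient taken at $\theta_{i}^{\prime}$ as if it were the gradient with respect to $\theta$:

$$\nabla_{\theta} \mathcal{L}_{\mathcal{T}_{i}}\left(f_{\theta_{i}^{\prime}}\right) \approx \nabla_{\theta_{i}^{\prime}} \mathcal{L}_{\mathcal{T}_{i}}\left(f_{\theta_{i}^{\prime}}\right)$$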

Reptile, proposed in this paper, simplifies the parameter update one step further than FOMAML: there is no need to compute the gradient of a second loss at all; the difference $\theta^{0}-\theta^{\prime}$ is used directly as the gradient to update the parameters:
$$\theta^{1}=\theta^{0}-\varepsilon\left(\theta^{0}-\theta^{\prime}\right)$$
[Figure: Reptile's update path, from $\theta^{0}$ to $\theta^{\prime}$, then the step toward $\theta^{1}$]

As the figure above shows, the initial parameters $\theta^{0}$ are trained for several batches on dataset A, arriving at $\theta^{\prime}$; we then go back to $\theta^{0}$, compute $\theta^{0}-\theta^{\prime}$, and apply the update to obtain $\theta^{1}$. In ordinary training we would not go back to $\theta^{0}$; we would simply keep updating from $\theta^{\prime}$.

In particular, if each dataset contains only one datum / one batch of data (i.e., a single inner gradient step), Reptile degenerates into ordinary training.
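A minimal Reptile sketch in PyTorch, assuming `model` is an arbitrary `nn.Module` and `task_batches` is a hypothetical iterable of `(x, y)` batches sampled from one task:

```python
import copy
import torch
import torch.nn.functional as F

def reptile_step(model, task_batches, inner_lr=0.02, epsilon=0.1):
    inner = copy.deepcopy(model)                      # run the inner loop from theta0
    opt = torch.optim.SGD(inner.parameters(), lr=inner_lr)
    for x, y in task_batches:                         # several SGD batches on this task
        opt.zero_grad()
        F.mse_loss(inner(x), y).backward()
        opt.step()                                    # ends at theta'
    with torch.no_grad():                             # theta1 = theta0 - eps * (theta0 - theta')
        for p0, p1 in zip(model.parameters(), inner.parameters()):
            p0 += epsilon * (p1 - p0)
```

There is no second loss and no gradient through the inner loop; the interpolation toward $\theta^{\prime}$ is the entire meta-update.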

The differences among Pre-train, MAML, and Reptile:

[Figure: comparison of Pre-train, MAML, and Reptile]

Task division

OpenAI's Reptile blog post contains a demo that vividly shows what Reptile can do and also helps clarify what a task is.

[Screenshot: the interactive drawing demo from the Reptile blog post]

The page has a built-in 3-way few-shot classifier already trained with Reptile. The user draws three training samples (one for each of the three classes), and the model is fine-tuned on those three samples; this is 3-way 1-shot. The user then draws a test sample on the right, and the model outputs a probability distribution over the three classes.

Here, the four drawings together (three support samples plus one query) constitute a single 3-way 1-shot task.
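At test time the demo's adaptation is just a few gradient steps on the three drawings. A sketch of that step, assuming `meta_model` is the Reptile-trained network and `support` / `query` hold the user's drawings:

```python
import copy
import torch
import torch.nn.functional as F

def adapt_and_predict(meta_model, support, query, lr=0.01, steps=5):
    model = copy.deepcopy(meta_model)                 # leave the meta-weights untouched
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    x_s, y_s = support                                # 3 drawings, labels 0..2 (1 per class)
    for _ in range(steps):                            # fine-tune on the support set
        opt.zero_grad()
        F.cross_entropy(model(x_s), y_s).backward()
        opt.step()
    with torch.no_grad():
        return F.softmax(model(query), dim=-1)        # probabilities over the 3 classes
```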

ProtoNet, MAML, Reptile, MetaOptNet, and R2-D2 use essentially the same dataset splits and task divisions. Their papers also note that, to keep experimental comparisons fair, they follow the setup of earlier work and define no new tasks, differing only in some training tricks: ProtoNet trains with a “higher way” (more classes per task) for better performance, and R2-D2 trains the model with a random number of shots.
