
[Meta-Learning] Classic Works: MAML and Reptile (a Demo to Understand the Meta-Learning Mechanism)

2022-06-22 06:54:00 chad_lee

Meta Learning

MAML ICML’17

MAML is agnostic to both model architecture and task; the only requirement is that the model is parameterized.

MAML produces an initialization of the weights from which other models can be fine-tuned with a small number of samples. In this sense, MAML's inputs and role are the same as those of pre-training.

Algorithm

MAML is best explained with this figure; the left panel shows the algorithm flow for the case $\left|\mathcal{T}_{i}\right|=1$.

First, the model has an initial parameter $\theta$.

Sample a task $\mathcal{T}_{i} \sim p(\mathcal{T})$. Train on this task's $K$ training samples (also called the support set): compute the gradient $\nabla_{\theta} \mathcal{L}_{\mathcal{T}_{i}}\left(f_{\theta}\right)$ (line 5 of the algorithm), then take one gradient step (possibly several) to obtain $\theta_{i}^{\prime}=\theta-\alpha \nabla_{\theta} \mathcal{L}_{\mathcal{T}_{i}}\left(f_{\theta}\right)$ (line 6).

If this were ordinary training, the next step would be to sample another batch of data and continue training with $\theta^{\prime}$ as the starting point. MAML instead does the following:

Then, with $\theta^{\prime}$, train on task $i$'s test data (also called the query set): compute the gradient $\nabla_{\theta} \mathcal{L}_{\mathcal{T}_{i}}\left(f_{\theta_{i}^{\prime}}\right)$ and use it to update $\theta \leftarrow \theta-\beta \nabla_{\theta} \sum_{\mathcal{T}_{i} \sim p(\mathcal{T})} \mathcal{L}_{\mathcal{T}_{i}}\left(f_{\theta_{i}^{\prime}}\right)$ (line 8).

That completes one task (or one batch of tasks); move on to the next task (or batch of tasks).
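To make the two-level loop concrete, here is a minimal second-order MAML sketch in PyTorch for a toy regression model. The `forward` helper, the explicit parameter list, and the task format `(x_s, y_s, x_q, y_q)` are illustrative assumptions, not the paper's code:

```python
import torch
import torch.nn.functional as F

def forward(params, x):
    # Toy two-layer MLP; params = [w1, b1, w2, b2].
    w1, b1, w2, b2 = params
    return torch.relu(x @ w1 + b1) @ w2 + b2

def maml_outer_step(params, task_batch, inner_lr=0.01, outer_lr=0.001, inner_steps=1):
    meta_loss = 0.0
    for x_s, y_s, x_q, y_q in task_batch:          # support / query sets per task
        fast = list(params)
        for _ in range(inner_steps):               # inner loop: theta -> theta' (lines 5-6)
            loss = F.mse_loss(forward(fast, x_s), y_s)
            grads = torch.autograd.grad(loss, fast, create_graph=True)
            fast = [p - inner_lr * g for p, g in zip(fast, grads)]
        # Query-set loss evaluated at the adapted parameters theta' (line 8).
        meta_loss = meta_loss + F.mse_loss(forward(fast, x_q), y_q)
    # The outer gradient differentiates through the inner update (second-order term).
    grads = torch.autograd.grad(meta_loss, params)
    return [(p - outer_lr * g).detach().requires_grad_() for p, g in zip(params, grads)]
```

Note that `create_graph=True` is what keeps the inner update differentiable; dropping it (and with it the second-order term) gives essentially the first-order approximation discussed below.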

The formulation is the same for classification and regression tasks; only the loss function and the data handling differ:
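For reference, the usual choices (as in the MAML paper) are mean-squared error for regression and cross-entropy for classification, written here with $\phi$ denoting the adapted parameters:

$$\mathcal{L}_{\mathcal{T}_{i}}\left(f_{\phi}\right)=\sum_{x^{(j)}, y^{(j)} \sim \mathcal{T}_{i}}\left\|f_{\phi}\left(x^{(j)}\right)-y^{(j)}\right\|_{2}^{2} \qquad \text{(regression)}$$

$$\mathcal{L}_{\mathcal{T}_{i}}\left(f_{\phi}\right)=-\sum_{x^{(j)}, y^{(j)} \sim \mathcal{T}_{i}} y^{(j)} \log f_{\phi}\left(x^{(j)}\right) \qquad \text{(classification)}$$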

Experiments

Omniglot dataset: 1623 characters, with 20 images of each. [This dataset is relatively easy, though; accuracies are already at 99%+.]

MiniImagenet dataset: 64 training classes, 12 validation classes, 24 test classes. [The current SOTA is already about twice MAML's accuracy.]

The Omniglot and MiniImagenet image recognition tasks are the most common recently used few-shot learning benchmarks.

Training follows the “N-way K-shot” setting: every training and test task contains $N$ classes with $K$ samples per class.
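As an illustration of how such tasks are drawn, here is a minimal N-way K-shot task sampler; the `data` dict (class label → list of examples) and the query-set size are illustrative assumptions:

```python
import random

def sample_task(data, n_way=5, k_shot=1, k_query=15):
    """Sample one N-way K-shot task: N classes, K support and k_query query samples each."""
    classes = random.sample(list(data), n_way)               # pick N classes
    support, query = [], []
    for new_label, c in enumerate(classes):                  # relabel classes 0..N-1
        examples = random.sample(data[c], k_shot + k_query)
        support += [(x, new_label) for x in examples[:k_shot]]
        query   += [(x, new_label) for x in examples[k_shot:]]
    return support, query
```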


Reptile: On First-Order Meta-Learning Algorithms, arXiv'18

A paper from OpenAI that follows up on and improves MAML; it has 700+ citations.

The core idea

As described above, MAML's procedure is somewhat convoluted: the model's initial parameters $\theta^{0}$ are first trained on dataset A to obtain $\theta^{\prime}$; then a dataset B is sampled, the gradient of $\theta^{\prime}$ on B is computed, and that gradient is applied to $\theta^{0}$, giving $\theta^{1}$:
$$\theta^{1}=\theta^{0}-\varepsilon \nabla_{\theta} \sum_{\mathcal{T}_{i} \sim p(\mathcal{T})} \mathcal{L}_{\mathcal{T}_{i}}\left(f_{\theta^{\prime}}\right)$$
MAML trains a good initialization $\theta$ this way (better than pre-training), but computing the meta-gradient involves second derivatives, so the paper also offers a first-order approximation (FOMAML) to simplify the computation.
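Concretely, the first-order approximation drops the Jacobian of the inner update and treats the query-set gradient taken at $\theta_{i}^{\prime}$ as if it were the gradient with respect to $\theta$:

$$\nabla_{\theta} \mathcal{L}_{\mathcal{T}_{i}}\left(f_{\theta_{i}^{\prime}}\right) \approx \nabla_{\theta_{i}^{\prime}} \mathcal{L}_{\mathcal{T}_{i}}\left(f_{\theta_{i}^{\prime}}\right)$$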

Reptile, proposed in this paper, simplifies the parameter update one step further than FOMAML: there is no need to compute the gradient of a second loss at all; the difference $\theta^{0}-\theta^{\prime}$ is used directly as the gradient to update the parameters:
$$\theta^{1}=\theta^{0}-\varepsilon\left(\theta^{0}-\theta^{\prime}\right)$$
[Figure: Reptile's update path, from $\theta^{0}$ to $\theta^{\prime}$, then the step toward $\theta^{1}$]

As the figure above shows, the initial parameters $\theta^{0}$ are trained for several batches on dataset A, arriving at $\theta^{\prime}$; we then go back to $\theta^{0}$, compute $\theta^{0}-\theta^{\prime}$, and apply the update to obtain $\theta^{1}$. In ordinary training we would not go back to $\theta^{0}$; we would simply keep updating from $\theta^{\prime}$.

In particular, if each dataset contains only one datum / one batch of data (i.e., a single inner gradient step), Reptile degenerates into ordinary training.
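A minimal Reptile sketch in PyTorch, assuming `model` is an arbitrary `nn.Module` and `task_batches` is a hypothetical iterable of `(x, y)` batches sampled from one task:

```python
import copy
import torch
import torch.nn.functional as F

def reptile_step(model, task_batches, inner_lr=0.02, epsilon=0.1):
    inner = copy.deepcopy(model)                      # run the inner loop from theta0
    opt = torch.optim.SGD(inner.parameters(), lr=inner_lr)
    for x, y in task_batches:                         # several SGD batches on this task
        opt.zero_grad()
        F.mse_loss(inner(x), y).backward()
        opt.step()                                    # ends at theta'
    with torch.no_grad():                             # theta1 = theta0 - eps * (theta0 - theta')
        for p0, p1 in zip(model.parameters(), inner.parameters()):
            p0 += epsilon * (p1 - p0)
```

There is no second loss and no gradient through the inner loop; the interpolation toward $\theta^{\prime}$ is the entire meta-update.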

The differences among Pre-train, MAML, and Reptile:

[Figure: comparison of Pre-train, MAML, and Reptile]

Task division

OpenAI's Reptile blog post contains a demo that vividly shows what Reptile can do and also helps clarify what a task is.

[Screenshot: the interactive drawing demo from the Reptile blog post]

The page has a built-in 3-way few-shot classifier already trained with Reptile. The user draws three training samples (one for each of the three classes), and the model is fine-tuned on those three samples; this is 3-way 1-shot. The user then draws a test sample on the right, and the model outputs a probability distribution over the three classes.

Here, the four drawings together (three support samples plus one query) constitute a single 3-way 1-shot task.
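At test time the demo's adaptation is just a few gradient steps on the three drawings. A sketch of that step, assuming `meta_model` is the Reptile-trained network and `support` / `query` hold the user's drawings:

```python
import copy
import torch
import torch.nn.functional as F

def adapt_and_predict(meta_model, support, query, lr=0.01, steps=5):
    model = copy.deepcopy(meta_model)                 # leave the meta-weights untouched
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    x_s, y_s = support                                # 3 drawings, labels 0..2 (1 per class)
    for _ in range(steps):                            # fine-tune on the support set
        opt.zero_grad()
        F.cross_entropy(model(x_s), y_s).backward()
        opt.step()
    with torch.no_grad():
        return F.softmax(model(query), dim=-1)        # probabilities over the 3 classes
```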

ProtoNet, MAML, Reptile, MetaOptNet, and R2-D2 use essentially the same dataset splits and task divisions. Their papers also note that, to keep experimental comparisons fair, they follow the setup of earlier work and define no new tasks, differing only in some training tricks: ProtoNet trains with a “higher way” (more classes per task) for better performance, and R2-D2 trains the model with a random number of shots.
