Interpretation of MAML (Model-Agnostic Meta-Learning)
2022-06-22 11:00:00 【Qianyu QY】
Paper: proceedings.mlr.press/v70/finn17a/finn17a.pdf
5.1 Introduction
Model-Agnostic: applicable to any model trained with gradient descent, and to a variety of learning tasks (e.g., classification, regression, policy-gradient RL).
Meta-Learning: train the model on a large number of learning tasks so that it can learn a new task from only a small number of training samples (i.e., fast fine-tuning). Each task ends up with its own adapted model.
This requires fusing prior experience with a small amount of new information, while avoiding overfitting.
The core of the method is to train the model's initial parameters such that the model reaches good performance on a new task after only a few gradient-update steps on a small number of that task's samples.
5.2 Method
First, Algorithm 1 of the paper is used to learn the network's initial weights; the model is then fine-tuned on the new task.
Interpreting Algorithm 1:
The purpose of MAML: learn initialization weights for the network such that the network achieves good results after training for only one or a few steps on a new task.
Analogous to ordinary deep learning: the training tasks in Algorithm 1, together with the samples of the test task used for fine-tuning after the weights are initialized, serve the same purpose as the training set and test set in ordinary deep learning.
Formally: let $\theta$ be the network's initial weights. On a new task $\tau_i$, one gradient-update step on the task's training set yields the weights $\theta'_i = \theta - \alpha \nabla_{\theta} \mathcal{L}_i(\theta)$ (with inner step size $\alpha$); the updated weights $\theta'_i$ are then used to compute the loss $\mathcal{L}_i(\theta'_i)$ on the test set of $\tau_i$. MAML's aim is to minimize the sum of these losses over the different new tasks $\tau_i$:
$$L = \min_{\theta} \sum_{\tau_i \sim p(\tau)} \mathcal{L}_i (\theta'_i)$$
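As a concrete illustration, here is a minimal JAX sketch of the inner update and this meta-objective, assuming a toy linear-regression loss; the names (`loss_fn`, `inner_update`, `meta_loss`), the task tuples, and the inner step size `alpha` are illustrative assumptions, not the paper's code:

```python
import jax
import jax.numpy as jnp

def loss_fn(params, x, y):
    # Toy regression loss standing in for L_i; params = (w, b).
    w, b = params
    pred = x @ w + b
    return jnp.mean((pred - y) ** 2)

def inner_update(params, x_train, y_train, alpha=0.01):
    # One gradient step on the task's training set: theta -> theta'_i.
    grads = jax.grad(loss_fn)(params, x_train, y_train)
    return jax.tree_util.tree_map(lambda p, g: p - alpha * g, params, grads)

def meta_loss(params, tasks, alpha=0.01):
    # Sum over tasks of the *test-set* loss at the adapted weights theta'_i.
    total = 0.0
    for x_train, y_train, x_test, y_test in tasks:
        adapted = inner_update(params, x_train, y_train, alpha)
        total = total + loss_fn(adapted, x_test, y_test)
    return total
```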
Taking this $L$ as the total loss function, gradient descent is performed on the network weights $\theta$:
$$\theta \leftarrow \theta - \beta \nabla_{\theta} \sum_{\tau_i \sim p(\tau)} \mathcal{L}_i (\theta'_i) = \theta - \beta \sum_{\tau_i \sim p(\tau)} \nabla_{\theta} \mathcal{L}_i (\theta'_i)$$
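Continuing the same illustrative sketch, the outer update is one gradient step on `meta_loss` with respect to the initial weights; because `jax.grad` differentiates through `inner_update`, the second-order terms analyzed below are included automatically (`beta` is the assumed meta step size):

```python
def outer_step(params, tasks, beta=0.001):
    # theta <- theta - beta * grad_theta sum_i L_i(theta'_i).
    # Differentiating through inner_update keeps the second-derivative terms.
    meta_grads = jax.grad(meta_loss)(params, tasks)
    return jax.tree_util.tree_map(lambda p, g: p - beta * g, params, meta_grads)
```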
Computing $\nabla_{\theta} \mathcal{L}_i (\theta'_i)$:
Borrowing the notation of Hung-yi Lee's lecture notes, let $\phi = \theta$ and $\hat\theta = \theta'_i$, so that $\nabla_{\theta} \mathcal{L}_i (\theta'_i) = \nabla_{\phi} l(\hat\theta)$. Each component of $\nabla_{\phi} l(\hat\theta)$ decomposes by the chain rule as:

$$\frac{\partial l(\hat\theta)}{\partial \phi_i} = \sum_j \frac{\partial l(\hat\theta)}{\partial \hat\theta_j} \frac{\partial \hat\theta_j}{\partial \phi_i}$$
where $\hat\theta$ is obtained from $\phi$ by the inner gradient step $\hat\theta_j = \phi_j - \alpha \, \partial l(\phi) / \partial \phi_j$. Each derivative $\partial \hat\theta_j / \partial \phi_i$ in the sum above is therefore:

$$\frac{\partial \hat\theta_j}{\partial \phi_i} = \begin{cases} -\alpha \dfrac{\partial^2 l(\phi)}{\partial \phi_i \, \partial \phi_j}, & i \neq j \\[2mm] 1 - \alpha \dfrac{\partial^2 l(\phi)}{\partial \phi_i^2}, & i = j \end{cases}$$
Computing these second derivatives is expensive, so the MAML paper proposes a first-order approximation: assume the second derivatives are zero, which simplifies the derivative to:
$$\frac{\partial \hat\theta_j}{\partial \phi_i} \approx \begin{cases} 0, & i \neq j \\ 1, & i = j \end{cases}$$
After this simplification, $\nabla_{\phi} l(\hat\theta) \rightarrow \nabla_{\hat\theta} l(\hat\theta)$, and the original gradient-descent formula becomes:
$$\theta \leftarrow \theta - \beta \sum_{\tau_i \sim p(\tau)} \nabla_{\theta'_i} \mathcal{L}_i (\theta'_i)$$
That is, the gradient is computed directly at each updated $\theta'_i$ and applied to the pre-update $\theta$.
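In the same illustrative JAX setup, the first-order version replaces differentiation through the inner update with the gradient evaluated at $\theta'_i$, applied directly to $\theta$:

```python
def first_order_outer_step(params, tasks, alpha=0.01, beta=0.001):
    # First-order MAML: take the test-set gradient at theta'_i and apply it
    # to theta, skipping the second-derivative terms entirely.
    grad_sum = jax.tree_util.tree_map(jnp.zeros_like, params)
    for x_train, y_train, x_test, y_test in tasks:
        adapted = inner_update(params, x_train, y_train, alpha)  # theta'_i
        g = jax.grad(loss_fn)(adapted, x_test, y_test)           # grad at theta'_i
        grad_sum = jax.tree_util.tree_map(jnp.add, grad_sum, g)
    return jax.tree_util.tree_map(lambda p, g: p - beta * g, params, grad_sum)
```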
Questions:
1. Why sample multiple tasks randomly in a loop for learning?
Answer: to construct enough distinct tasks that the network is thoroughly trained, so that when it faces a new task only a few update steps are needed to reach good results (a task-sampler sketch follows this Q&A list).
2. Why do the first gradient computation and the second gradient computation use different samples from the same task, i.e., the support set and the query set?
Answer: the former is the task's training set, used to compute $\theta'_i$; the latter is the task's test set, used to compute the loss $\mathcal{L}_i(\theta'_i)$.
3. Compared with first pre-training on many tasks (computing the gradient only once each time) and then fine-tuning on the new task, what are MAML's advantages?
Answer: pre-training optimizes the network's performance on all tasks simultaneously; when that "optimal" model is then tuned on a new task, it may get stuck in a local optimum. MAML instead optimizes the model's performance after a few steps of training on a new task, i.e., it optimizes for the optimum reachable in the future; so while the initialization itself may not be optimal on any single task, it avoids being trapped in solutions that are suboptimal for adaptation (a sketch contrasting the two objectives follows below).
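On question 1: the paper's regression experiment, for example, draws each task as a sinusoid with random amplitude and phase. Below is a hypothetical sampler compatible with the sketches above; the support/query split from question 2 appears here too, and the set sizes and helper name are assumptions:

```python
import numpy as np

def sample_sine_task(n_support=10, n_query=10):
    # One task = one sinusoid; amplitude, phase, and input ranges follow
    # the MAML paper's regression experiment.
    amplitude = np.random.uniform(0.1, 5.0)
    phase = np.random.uniform(0.0, np.pi)
    x = np.random.uniform(-5.0, 5.0, size=(n_support + n_query, 1))
    y = (amplitude * np.sin(x - phase)).ravel()
    x, y = jnp.asarray(x), jnp.asarray(y)
    # Support set feeds the inner update; query set feeds the meta-loss.
    return x[:n_support], y[:n_support], x[n_support:], y[n_support:]

# One meta-iteration: sample a batch of tasks, take one outer step.
# params = (jnp.zeros((1,)), jnp.zeros(()))  # (w, b) for the toy loss
# params = outer_step(params, [sample_sine_task() for _ in range(25)])
```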
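On question 3: the difference between the two objectives can be made explicit in the same illustrative sketch; multi-task pre-training scores $\theta$ itself, while MAML's `meta_loss` scores $\theta$ one adaptation step ahead (following the comparison in Hung-yi Lee's lecture):

```python
def pretrain_loss(params, tasks):
    # Multi-task pre-training: how well does theta itself do on every task?
    return sum(loss_fn(params, x_train, y_train)
               for x_train, y_train, _, _ in tasks)

# meta_loss (above): how well does theta do *after* one inner step per task?
# The two objectives generally have different minimizers.
```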
For more details, please refer to:
https://zhuanlan.zhihu.com/p/57864886
https://www.bilibili.com/video/BV1w4411872t?p=7&vd_source=383540c0e1a6565a222833cc51962ed9