[Multi-task learning] Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts KDD18
2022-08-01 20:01:00 【chad_lee】
At the model level: for a single objective we often spend a lot of effort finding strong features and removing redundant ones before feeding the input to the model, in order to improve its performance. After switching to MTL, each task needs different strong features and has different "negative" features to exclude, so the purpose of MTL is to find, as far as possible, each task's own strong and negative features.
At the optimization level: multiple tasks optimize the model at the same time, and some tasks can dominate the optimization process and drown out the others.
From the perspective of supervision signals: MTL is not only a set of tasks, it is also a form of data augmentation. Each task effectively gets k-1 extra supervision signals to aid its learning, and some features can be learned better from other tasks. The quality of these extra signals depends on the similarity between tasks; dissimilar tasks act as noise instead.
#SB, MoE, MMoE
"Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts", Google, KDD 2018

Shared-Bottom
Multiple tasks share the same bottom network. The bottom network outputs a feature vector for each sample, and each subtask attaches its own small NN tower on top of it.
Pros: simple, and the risk of overfitting is low (it is hard to overfit several tasks at once, so the tasks can be said to supervise one another and penalize overfitting). The more related the tasks are, the better they complement each other.
Cons: if the tasks are only weakly related (contradictory or conflicting), their optimization directions for the shared bottom network may be opposite.
Bottom output: $f(x)$; subtask tower: $h^k$; output of each subtask: $y^k = h^k(f(x))$
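The shared-bottom formula $y^k = h^k(f(x))$ can be sketched in a few lines of NumPy. This is only an illustration under assumptions: the single ReLU layer for the bottom, the linear towers, the layer sizes, and the names `shared_bottom_forward`, `W_bottom`, and `tower_weights` are all hypothetical and not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def shared_bottom_forward(x, W_bottom, tower_weights):
    """Return one output per task: y^k = h^k(f(x))."""
    shared = relu(x @ W_bottom)  # f(x): one feature vector per sample
    # Every tower h^k (a linear layer here, by assumption) reads the same f(x).
    return [shared @ W_k for W_k in tower_weights]

# Hypothetical sizes, chosen only for the demo.
batch, d_in, hidden, num_tasks = 4, 8, 16, 2
x = rng.normal(size=(batch, d_in))
W_bottom = rng.normal(size=(d_in, hidden))
tower_weights = [rng.normal(size=(hidden, 1)) for _ in range(num_tasks)]

outputs = shared_bottom_forward(x, W_bottom, tower_weights)
print([y.shape for y in outputs])
```

Because `W_bottom` receives gradients from every tower during training, conflicting tasks pull it in different directions, which is the drawback described above.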
One-gate-MoE
The input is fed to three independent experts (three NNs) and, at the same time, to a gate. The gate outputs the probability of each expert being selected; the three expert outputs are then combined in a weighted sum and passed to the towers:
$$y^{k}=h^{k}\left(\sum_{i=1}^{n} g_{i} f_{i}(x)\right),$$
where $g()$ is a multi-class classification module with $\sum_{i=1}^{n} g(x)_{i}=1$ (here $g_i$ is shorthand for $g(x)_i$), $f_{i}(), i=1, \cdots, n$ are the $n$ expert networks, $k$ indexes the $k$ tasks, and $h^k$ is the NN tower that follows.
This is equivalent to treating the input as a query and adding attention over the bottom outputs. Judging by the formula, it is old wine in a new bottle: $h^k$ sits outside the parentheses, so the different towers still receive the same input, and the task-conflict problem is not solved.
However, MoE can address domain adaptation and is used in cross-domain settings.
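A minimal NumPy sketch of the one-gate formula makes the limitation visible: the gate mixes the experts once, and every tower reads that same mixture. The single-layer ReLU experts, the linear gate and towers, the sizes, and the names `omoe_forward`, `expert_weights`, `W_gate`, and `tower_weights` are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def omoe_forward(x, expert_weights, W_gate, tower_weights):
    """One-gate MoE: y^k = h^k(sum_i g_i f_i(x))."""
    # f_i(x) for each expert, stacked to shape (batch, n_experts, hidden)
    experts = np.stack([np.maximum(x @ W, 0.0) for W in expert_weights], axis=1)
    gate = softmax(x @ W_gate)  # g(x), shape (batch, n_experts), rows sum to 1
    mixed = (gate[:, :, None] * experts).sum(axis=1)  # weighted sum of experts
    # Every tower receives the same `mixed` vector.
    outputs = [mixed @ W_k for W_k in tower_weights]
    return outputs, gate, mixed

# Hypothetical sizes, chosen only for the demo.
batch, d_in, hidden, n_experts, num_tasks = 4, 8, 16, 3, 2
x = rng.normal(size=(batch, d_in))
expert_weights = [rng.normal(size=(d_in, hidden)) for _ in range(n_experts)]
W_gate = rng.normal(size=(d_in, n_experts))
tower_weights = [rng.normal(size=(hidden, 1)) for _ in range(num_tasks)]

outputs, gate, mixed = omoe_forward(x, expert_weights, W_gate, tower_weights)
print(gate.sum(axis=1))  # each row of gate weights sums to 1
```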
MMoE
The natural fix is to stop feeding every $h^k$ the same sum: different $h^k$ get different inputs.
So each task is assigned its own gate. The gate's role is then no longer attention; instead, it selects the important features and filters out the redundant ones individually for each task:
$$
\begin{aligned}
f^{k}(x) &=\sum_{i=1}^{n} g_{i}^{k}(x) f_{i}(x) \\
g^{k}(x) &=\operatorname{softmax}\left(W_{gk}x\right)
\end{aligned}
$$
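where $g^k$ is a linear transformation followed by a softmax, and the output of task $k$ is $y^k = h^k(f^k(x))$.
In theory there exists a solution in which each gate filters features for its own task; whether the model can actually be optimized into that solution is hard to say.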
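The multi-gate formulas can be sketched the same way in NumPy. The point of the sketch is that each task computes its own mixture $f^k(x)$, so the towers no longer share one input. The single-layer ReLU experts, the linear towers, the sizes, and the names `mmoe_forward`, `expert_weights`, `gate_weights`, and `tower_weights` are assumptions made for illustration, and the weights are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mmoe_forward(x, expert_weights, gate_weights, tower_weights):
    """MMoE: f^k(x) = sum_i g^k_i(x) f_i(x), y^k = h^k(f^k(x))."""
    # Experts are shared across tasks: shape (batch, n_experts, hidden)
    experts = np.stack([np.maximum(x @ W, 0.0) for W in expert_weights], axis=1)
    outputs, tower_inputs = [], []
    for W_gk, W_k in zip(gate_weights, tower_weights):
        # g^k(x) = softmax(W_gk x); W_gk is stored transposed, as (d_in, n_experts)
        gate_k = softmax(x @ W_gk)
        f_k = (gate_k[:, :, None] * experts).sum(axis=1)  # task-specific mixture
        tower_inputs.append(f_k)
        outputs.append(f_k @ W_k)  # h^k, a linear tower by assumption
    return outputs, tower_inputs

# Hypothetical sizes, chosen only for the demo.
batch, d_in, hidden, n_experts, num_tasks = 4, 8, 16, 3, 2
x = rng.normal(size=(batch, d_in))
expert_weights = [rng.normal(size=(d_in, hidden)) for _ in range(n_experts)]
gate_weights = [rng.normal(size=(d_in, n_experts)) for _ in range(num_tasks)]
tower_weights = [rng.normal(size=(hidden, 1)) for _ in range(num_tasks)]

outputs, tower_inputs = mmoe_forward(x, expert_weights, gate_weights, tower_weights)
print([f_k.shape for f_k in tower_inputs])
```

With a single gate shared by all tasks, `tower_inputs` would contain identical arrays, which recovers the one-gate MoE case.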
Experiments
