当前位置:网站首页>[Multi-task model] Progressive Layered Extraction: A Novel Multi-Task Learning Model for Personalized (RecSys'20)
[Multi-task model] Progressive Layered Extraction: A Novel Multi-Task Learning Model for Personalized (RecSys'20)
2022-08-01 20:01:00 【chad_lee】
Tencent's video recommendation team,建模的目标包含用户的多种不同的行为:点击,分享,评论等等.每次请求,The ranking points of the candidates are calculated according to the formula:
score = p V T R w V T R × p V C R w V C R × p S H R w S H R × … × p C M R w C M × f ( video l e n ) \text { score }=p V T R^{w V T R} \times p V C R^{w V C R} \times p S H R^{w S H R} \times \ldots \times p_{C M R}^{w C M} \times f(\text { video } l e n) score =pVTRwVTR×pVCRwVCR×pSHRwSHR×…×pCMRwCM×f( video len)
其中w是超参,表示相对重要性

There are often complex relationships between multiple targets,Therefore, the phenomenon of seesaw often appears when modeling multiple targets at the same time,i.e. multiple tasksnegative transfer的问题:

GCG
MMOEIn theory, there is an optimal situation where features can be automatically selected,But this situation depends:1、gateCan you choose;2、也依赖expertCan produce a variety of characteristics(所有expert输出类似,无可奈何).
因此本文提出的Customized Gate ControlMake this problem a little easier,Divide experts into big peers and small peers,both sharedexpert们,每个task也有专门的expert们,A little less difficult.

这样EA只被taskA训,EB只被taskB训,Guaranteed at least.
input是x,任务k的输出是
y k ( x ) = t k ( g k ( x ) ) y^{k}(x)=t^{k}\left(g^{k}(x)\right) yk(x)=tk(gk(x))
其中 t k t^k tk是这个任务的NN tower, g k ( x ) g^{k}(x) gk(x) 是第kThe output of the gating network for each task:
g k ( x ) = w k ( x ) S k ( x ) g^{k}(x)=w^{k}(x) S^{k}(x) gk(x)=wk(x)Sk(x)
其中x是原始输入, w k ( x ) w^{k}(x) wk(x)是一个加权函数,Corresponds to the weight of each expert respectively,是一个softmax的输出:
w k ( x ) = Softmax ( W g k x ) w^{k}(x)=\operatorname{Softmax}\left(W_{g}^{k} x\right) wk(x)=Softmax(Wgkx)
其中 W g k ∈ R ( m k + m s ) × d W_{g}^{k} \in R^{\left(m_{k}+m_{s}\right) \times d} Wgk∈R(mk+ms)×d,mk和ms是 shared experts 和 specific experts 的个数. S k ( x ) S^{k}(x) Sk(x)is the output vector of all expertscontackcalled togetherselected matrix:
S k ( x ) = [ E ( k , 1 ) T , E ( k , 2 ) T , … , E ( k , m k ) T , E ( s , 1 ) T , E ( s , 2 ) T , … , E ( s , m s ) T ] T S^{k}(x)=\left[E_{(k, 1)}^{T}, E_{(k, 2)}^{T}, \ldots, E_{\left(k, m_{k}\right)}^{T}, E_{(s, 1)}^{T}, E_{(s, 2)}^{T}, \ldots, E_{\left(s, m_{s}\right)}^{T}\right]^{T} Sk(x)=[E(k,1)T,E(k,2)T,…,E(k,mk)T,E(s,1)T,E(s,2)T,…,E(s,ms)T]T
PLE
But there are also problems after dividing small peers,不同taskThe role of the auxiliary supervision signal is small again(Because the difference from the independent model is only one sharedexpert,能力有限).所以PLEIt is to connect several layers of expert networks,让共享expert更强一些.

优化方法
Half of the multi-objective task optimization is to set different weights for different subtasks,损失函数加权:
L ( θ 1 , … … , θ K , θ s ) = ∑ k = 1 K ω k L k ( θ k , θ s ) L\left(\theta_{1}, \ldots \ldots, \theta_{K}, \theta_{s}\right)=\sum_{k=1}^{K} \omega_{k} L_{k}\left(\theta_{k}, \theta_{s}\right) L(θ1,……,θK,θs)=k=1∑KωkLk(θk,θs)
But this paper considers the problem of inconsistency in the training sample space in more detail:

比如用户只有点击后才能进行分享和评论.本文是在 Loss 上进行一定的优化,联合训练这些任务,在计算每个任务的损失时需要把样本空间相同的合并,并忽略不在自己样本空间的样本,即不同的任务仍使用其各自样本空间中的样本.I understand it to mean a time when the model is updated,不会同时用SHR和CTR的loss来更新
At the same time, this paper also considers different tasks to set a dynamic weight,比如task k的初始loss权重为 ω k , 0 \omega_{k, 0} ωk,0,那么在第t个epoch的时候loss权重为:
ω k ( t ) = ω k , 0 × γ k t \omega_{k}^{(t)}=\omega_{k, 0} \times \gamma_{k}^{t} ωk(t)=ωk,0×γkt
其中 γ k t \gamma_{k}^{t} γkt is the update rate of the previous step.
边栏推荐
猜你喜欢
随机推荐
57:第五章:开发admin管理服务:10:开发【从MongoDB的GridFS中,获取文件,接口】;(从GridFS中,获取文件的SOP)(不使用MongoDB的服务,可以排除其自动加载类)
数值矩阵的图形表示
nacos安装与配置
MongoDB快速上手
latex论文神器--服务器部署overleaf
【节能学院】推进农业水价综合改革的意见解读
mysql解压版简洁式本地配置方式
ARTS_202207W2
9月备考PMP,应该从哪里备考?
Risc-v Process Attack
LabVIEW 使用VISA Close真的关闭COM口了吗
Win10, the middle mouse button cannot zoom in and out in proe/creo
常用命令备查
【软考软件评测师】基于规则说明的测试技术下篇
BN BatchNorm + BatchNorm的替代新方法KNConvNets
有用的网站
安装win32gui失败,解决问题
锐捷交换机基础配置
突破边界,华为存储的破壁之旅
Arthas 常用命令








