
[Multimodal] TransRec: Learning Transferable Recommendation from Mixture-of-Modality Feedback (arXiv '22)

2022-07-25 12:00:00 【chad_ lee】

"TransRec: Learning Transferable Recommendation from Mixture-of-Modality Feedback", arXiv '22

Pre-training and large models have become very popular in NLP and CV, producing BERT, GPT-3, ViT and others, and realizing the one4all paradigm, in which one general large model serves almost all downstream tasks. Recommender systems have been slow to move in this direction: a model's transferability is limited, it usually applies only to business scenarios within a single company, and portability and generality in a broad sense remain out of reach.

The article first points out that this is mainly because recommender systems (RS) rely too heavily on user ID and item ID information. The ID-based collaborative filtering (CF) paradigm frees RS from complex content modeling, and DL + GCN gave CF a period of performance gains, so it has come to dominate the field. But ID-based RS has hit a serious performance bottleneck and is approaching its ceiling, and because IDs cannot be shared across systems, such models are almost impossible to transfer.

The paper therefore proposes moving from IDs back to content-based recommendation, in order to build a general recommender system for large-scale mixture-of-modality data.

Mixture-of-modality scenario


General recommendation is built on a common scenario in which the items a user interacts with are of **mixture-of-modality (MoM)** form: an item can be text, visual (images, video, etc.), or both. The paper first pre-trains the model on a MoM source domain, so that it can then be transferred to downstream tasks in any domain.

The dataset comes from the news recommendation scenario of QQ Browser and covers 7 days of logs.

TransRec

[Figure: TransRec model architecture]

Item Encoder

The item encoders are a pre-trained BERT and a pre-trained ResNet-18, i.e., the yellow and green blocks in the figure above.

For a text item $i$, the word token sequence $\boldsymbol{t}=\left[t_{1}, t_{2}, \ldots, t_{k}\right]$ is fed into BERT and then passed through self-attention pooling to obtain the final representation of the text item:
$\boldsymbol{Z}_{i, t}=\operatorname{SelfAtt}(\operatorname{BERT}(\boldsymbol{t}))$
For an image item $i$, the feature map output by ResNet is passed through an MLP to obtain the final representation of the image item:
$\boldsymbol{Z}_{i, v}=\operatorname{MLP}(\operatorname{ResNet}(\boldsymbol{v}))$
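The two item encoders can be sketched roughly as follows. This is a minimal NumPy illustration, not the paper's implementation: the BERT and ResNet outputs are replaced by random stand-in arrays, and the additive form of the attention pooling, the two-layer MLP, and all names and dimensions (`self_attention_pool`, `mlp`, `k`, `d`, `d_v`) are assumptions made for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention_pool(H, W, b, q):
    """Pool token vectors H of shape (k, d) into one vector of shape (d,).

    Assumed additive attention: score_j = q . tanh(W h_j + b).
    """
    scores = np.tanh(H @ W + b) @ q   # (k,)
    alpha = softmax(scores)           # attention weights, sum to 1
    return alpha @ H                  # weighted average of token vectors

def mlp(x, W1, b1, W2, b2):
    # Assumed two-layer MLP with ReLU in between.
    return np.maximum(x @ W1 + b1, 0.0) @ W2 + b2

rng = np.random.default_rng(0)
k, d, d_v = 6, 8, 16                  # tokens, item dim, visual feature dim

bert_out = rng.normal(size=(k, d))    # stand-in for BERT(t)
resnet_out = rng.normal(size=d_v)     # stand-in for flattened ResNet(v)

W, b, q = rng.normal(size=(d, d)), np.zeros(d), rng.normal(size=d)
z_text = self_attention_pool(bert_out, W, b, q)    # plays the role of Z_{i,t}

W1, b1 = rng.normal(size=(d_v, d)), np.zeros(d)
W2, b2 = rng.normal(size=(d, d)), np.zeros(d)
z_image = mlp(resnet_out, W1, b1, W2, b2)          # plays the role of Z_{i,v}
```

Both branches end in vectors of the same size `d`, which is what allows text and image items to appear in the same user sequence.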

User Encoder

A user is represented by their item interaction sequence, so the input of the user encoder is the embeddings of the items the user interacted with. A BERT (denoted $BERT_u$) encodes the interaction sequence into the user embedding, which is then matched against item embeddings by similarity. This BERT is unidirectional, and the output at the last item is used as the representation of the sequence:
$\begin{aligned} &\boldsymbol{S}^{u}=\boldsymbol{Z}^{u}+\boldsymbol{P}^{u} \\ &\boldsymbol{U}^{u}=E_{u}\left(\boldsymbol{S}^{u}\right)=\operatorname{BERT}_{\mathrm{u}}\left(\boldsymbol{S}^{u}\right) \end{aligned}$
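A rough sketch of the unidirectional user encoder is given below. It makes several assumptions that are not stated in the text: $P^u$ is treated as a positional embedding, a single causal self-attention layer stands in for the full $BERT_u$, and the names `causal_self_attention` and `encode_user` are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def causal_self_attention(S, Wq, Wk, Wv):
    """Single-head attention where position t only attends to positions <= t."""
    n, d = S.shape
    Q, K, V = S @ Wq, S @ Wk, S @ Wv
    scores = Q @ K.T / np.sqrt(d)                       # (n, n)
    future = np.triu(np.ones((n, n), dtype=bool), k=1)  # True above the diagonal
    scores = np.where(future, -np.inf, scores)          # hide future items
    return softmax(scores, axis=-1) @ V                 # (n, d)

def encode_user(Z, P, Wq, Wk, Wv):
    S = Z + P                         # S^u = Z^u + P^u
    H = causal_self_attention(S, Wq, Wk, Wv)
    return H[-1]                      # output at the last item = user vector

rng = np.random.default_rng(0)
n, d = 5, 8
Z = rng.normal(size=(n, d))           # item embeddings from the item encoders
P = rng.normal(size=(n, d))           # assumed positional embeddings
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

user_vec = encode_user(Z, P, Wq, Wk, Wv)
```

Because of the causal mask, the output at each position depends only on earlier items, which is what makes left-to-right next-item prediction possible.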

Training methods

Two-stage pre-training

User encoder pre-training

The user encoder is first pre-trained in a self-supervised manner. Specifically, left-to-right generative pre-training is used to predict the next item in the user interaction sequence, i.e., the unidirectional BERT is pre-trained with a softmax cross-entropy loss as the objective:
$\begin{aligned} &\tilde{\boldsymbol{y}}_{t}=\operatorname{Softmax}\left(\operatorname{ReLU}\left(\boldsymbol{S}'_{t} \boldsymbol{W}^{U}+\boldsymbol{b}^{U}\right)\right) \\ &\mathcal{L}_{\mathrm{UEP}}=-\sum_{u \in U} \sum_{t \in[1, \ldots, n]} \boldsymbol{y}_{t} \log \left(\tilde{\boldsymbol{y}}_{t}\right) \end{aligned}$
Here $\boldsymbol{W}^{U}$ and $\boldsymbol{b}^{U}$ are discarded afterwards, because the inner product is used for similarity matching. $\boldsymbol{S}'_{t}$ is the representation of the sequence.
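The pre-training objective for one user can be written out as a short NumPy sketch. It assumes that $\boldsymbol{y}_t$ is a one-hot vector over an item vocabulary of size `V`, so only the log-probability of the true next item contributes. The function name `next_item_loss` and all shapes are hypothetical.

```python
import numpy as np

def next_item_loss(S_prime, W_U, b_U, targets):
    """Softmax cross-entropy for next-item prediction, summed over positions.

    S_prime: (n, d) sequence representations S'_t
    W_U, b_U: (d, V) and (V,) output projection, discarded after pre-training
    targets: (n,) index of the true next item at each position
    """
    logits = np.maximum(S_prime @ W_U + b_U, 0.0)            # ReLU(S'_t W^U + b^U)
    logits = logits - logits.max(axis=1, keepdims=True)      # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # With one-hot y_t, y_t . log(y~_t) picks the target's log-probability.
    return -log_probs[np.arange(len(targets)), targets].sum()

rng = np.random.default_rng(0)
n, d, V = 4, 8, 10
S_prime = rng.normal(size=(n, d))
W_U, b_U = rng.normal(size=(d, V)), np.zeros(V)
targets = rng.integers(0, V, size=n)

loss = next_item_loss(S_prime, W_U, b_U, targets)
```

As a sanity check, if the projection is all zeros the prediction is uniform over the `V` items and the loss equals `n * log(V)`.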

End-to-end two-tower training

The item encoder and the user encoder are trained jointly. The purpose is similar to that of earlier mixture-of-experts approaches: to let the text and image feature encoders adapt to the current domain as quickly as possible. Training uses the Contrastive Predictive Coding (CPC) method, whose idea is shown in the figure above: a sequence is split into two segments, and the first segment is used to predict all items in the second, which matches the task setting of collaborative filtering:
$\mathcal{L}_{\mathrm{CPC}}=-\sum_{u \in U}\left[\sum_{t=n+1}^{n+l} \log \left(\sigma\left(\boldsymbol{r}_{u, t}\right)\right)+\sum_{g=1}^{j} \log \left(1-\sigma\left(\boldsymbol{r}_{u, g}\right)\right)\right]$
where $g$ indexes randomly sampled negative items.
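The CPC objective for a single user can be sketched as follows. The sketch assumes the score $\boldsymbol{r}_{u,\cdot}$ is the inner product between the user vector (encoded from the first $n$ items) and an item embedding, in line with the inner-product matching mentioned earlier. The names `cpc_loss` and `log_sigmoid` are hypothetical.

```python
import numpy as np

def log_sigmoid(x):
    # log(sigma(x)) = -log(1 + exp(-x)), computed stably.
    return -np.logaddexp(0.0, -x)

def cpc_loss(user_vec, pos_items, neg_items):
    """Loss for one user.

    user_vec:  (d,)   user representation from the first n items
    pos_items: (l, d) embeddings of the next l items (second segment)
    neg_items: (j, d) embeddings of j randomly sampled negative items
    """
    r_pos = pos_items @ user_vec      # assumed inner-product scores r_{u,t}
    r_neg = neg_items @ user_vec      # scores r_{u,g}
    # log(1 - sigma(r)) equals log(sigma(-r)).
    return -(log_sigmoid(r_pos).sum() + log_sigmoid(-r_neg).sum())

rng = np.random.default_rng(0)
d, l, j = 8, 3, 5
user_vec = rng.normal(size=d)
pos_items = rng.normal(size=(l, d))
neg_items = rng.normal(size=(j, d))

loss = cpc_loss(user_vec, pos_items, neg_items)
```

Summing this quantity over all users gives the full $\mathcal{L}_{\mathrm{CPC}}$. The loss shrinks as positive scores grow and negative scores fall, which pushes the user vector toward items in the second segment and away from the random negatives.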

Experiments


Within a domain, ID-based methods perform worse than methods based on modality content, and pre-training also outperforms training directly.


The paper also examines how much the amount of data limits the model: the more pre-training data, the larger the performance gain for TransRec, and industry has a practically endless stream of data to scale with.

Copyright notice
This article was written by [chad_ lee]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/206/202207251110592154.html
