
Sequence modeling trends in recommender systems in 2022, as seen from top-conference papers

2022-07-06 02:24:00 kaiyuan_sjtu


Author | Schrödinger's Cat

I recently read several papers on sequence modeling from the 2022 top conferences. The models all look complex and profound, but on closer inspection these papers essentially change the input, and the model is only adjusted to match that input. Let's see how the recent top conferences play it.


Background

The goal of sequence modeling is to mine users' interests from their historical behaviors, and then recommend items that match those interests.

Let me first introduce two classic papers on sequence modeling.

The first is what I consider the origin of this line of work: Alibaba's DIN, whose model structure is shown in the figure below. The paper argues that, when scoring a target item, the items in the user behavior sequence should carry different weights, and it uses target attention to compute those weights. DIN's user behavior input is a one-dimensional item sequence, which gains an extra embedding dimension after the embedding layer. Keep this input format in mind: the later methods are all about enriching it.

▲ DIN
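To make this input format concrete, here is a minimal PyTorch sketch of DIN-style target attention. The MLP scorer, the feature combination, and all shapes are my assumptions, not Alibaba's code; note that the original paper even skips the softmax normalization used here.

```python
# Minimal sketch of DIN-style target attention (assumptions mine, not the paper's code).
import torch
import torch.nn as nn

class TargetAttention(nn.Module):
    def __init__(self, emb_dim: int, hidden: int = 36):
        super().__init__()
        # Small MLP that scores each (behavior, target) pair; the concatenated
        # feature combination below is one common choice, not necessarily DIN's exact one.
        self.score = nn.Sequential(
            nn.Linear(emb_dim * 4, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, behaviors, target, mask):
        # behaviors: (B, N, d) embeddings of the user's historical items
        # target:    (B, d)    embedding of the candidate item
        # mask:      (B, N)    1 for real behaviors, 0 for padding
        t = target.unsqueeze(1).expand_as(behaviors)                    # (B, N, d)
        feats = torch.cat([behaviors, t, behaviors - t, behaviors * t], dim=-1)
        logits = self.score(feats).squeeze(-1)                          # (B, N)
        logits = logits.masked_fill(mask == 0, -1e9)
        weights = torch.softmax(logits, dim=-1)   # DIN itself does not normalize the weights
        return (weights.unsqueeze(-1) * behaviors).sum(dim=1)           # (B, d) user interest
```

The resulting user-interest vector is then concatenated with the other features and fed into the prediction MLP.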

The second is the classic paper on long sequences: Alibaba's SIM, whose model structure is shown in the figure below. It builds on DIN by lengthening the user behavior sequence; if compute allows, mindlessly applying DIN to the whole long sequence is not a bad option. Back to SIM, here is what the paper does: keep the short-term behavior sequence and use DIEN (a DIN variant) to extract the user's short-term interests; handle the long sequence with a two-stage approach, first retrieving the top-k items and then computing target attention over them.

▲ SIM
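As a rough sketch of the two-stage idea (a hard-search-style variant, simplified by me): a cheap first stage keeps only the long-sequence items sharing the target item's category and picks the top-k, then the second stage reuses the target attention above. The top-k scoring rule and shapes are assumptions, not SIM's implementation.

```python
# Two-stage long-sequence sketch in the spirit of SIM (my simplification).
# Reuses the TargetAttention sketch above.
import torch

def sim_two_stage(long_seq_emb, long_seq_cate, target_emb, target_cate,
                  attention, topk: int = 50):
    # long_seq_emb:  (B, L, d) embeddings of the long behavior sequence
    # long_seq_cate: (B, L)    category id of each behavior
    # target_emb:    (B, d)    target item embedding
    # target_cate:   (B,)      target item category id
    same_cate = long_seq_cate == target_cate.unsqueeze(1)            # stage 1: hard search
    scores = (long_seq_emb * target_emb.unsqueeze(1)).sum(-1)        # cheap inner-product score
    scores = scores.masked_fill(~same_cate, float("-inf"))
    topk_scores, idx = scores.topk(topk, dim=-1)                     # (B, k)
    gathered = torch.gather(
        long_seq_emb, 1, idx.unsqueeze(-1).expand(-1, -1, long_seq_emb.size(-1)))
    mask = torch.isfinite(topk_scores).long()                        # drop slots with no match
    return attention(gathered, target_emb, mask)                     # stage 2: target attention
```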

As a digression, Alibaba's MIMN has shown that the longer the sequence, the higher the AUC.


Trick 1: add side info to the sequence

This is a paper from eBay at WSDM 2022.


Paper title: Sequential Modeling with Multiple Attributes for Watchlist Recommendation in E-Commerce

Paper link: https://arxiv.org/pdf/2110.11072.pdf

Earlier models used plain item sequences; eBay additionally feeds in item attributes, such as price. These attributes are dynamic: when the price changes, for example, the user's interest may change with it.

Therefore the input changes from a one-dimensional sequence into a 2D matrix, and the attention computation becomes 2D accordingly (attention2D). The model structure is shown below.

▲ attention2D

The computation works as follows:

1. Embedding layer: the input carries attribute information and is a 2D matrix; after embedding it gains an extra embedding dimension, giving shape N × C × d, where N is the sequence length, C is the number of attributes, and d is the embedding dimension.

2. Now that an attribute dimension has been added, how do the embeddings turn into Q, K, V? The same linear transformation is applied per attribute (also called a channel), i.e.:

▲ Linear2D (i indexes the item, j the attribute)

3. How do the 2D Q and K produce attention scores? The paper introduces scores at three granularities: a 4-dimensional score representing the interaction between attribute j of item i and attribute j′ of item i′, which is the finest-grained interaction; a 2-dimensional score obtained by summing over all channels of an item, representing item-level interaction; and a 2-dimensional score obtained by summing over all items of a channel, representing channel-level interaction. The weighted sum of these three scores is the final score (the weighting coefficients are learned as well).

▲ The 2D attention scores

4. Up to this point, after the attention is computed, the output still has two dimensions: the attribute dimension and the embedding dimension. The dimensionality reduction happens before the prediction layer: the attribute dimension is pooled away, and the remaining embedding vector is fed into an MLP.

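Putting steps 2 and 3 together, here is a minimal sketch of how such a 2D attention could be written. This is my reading of the description above (per-channel linear maps, three score granularities mixed by learned weights), not eBay's implementation; scaling and multi-head details are omitted, and the output stays 2D per item, as step 4 assumes.

```python
# Minimal sketch of the attention2D idea described above (my reading, not the paper's code).
import torch
import torch.nn as nn

class Attention2D(nn.Module):
    def __init__(self, num_channels: int, emb_dim: int):
        super().__init__()
        # One linear map per attribute/channel (the "Linear2D" of step 2).
        self.q_proj = nn.ModuleList([nn.Linear(emb_dim, emb_dim) for _ in range(num_channels)])
        self.k_proj = nn.ModuleList([nn.Linear(emb_dim, emb_dim) for _ in range(num_channels)])
        self.mix = nn.Parameter(torch.ones(3) / 3)   # learned weights for the 3 granularities

    def forward(self, x):
        # x: (B, N, C, d) -- N items, C attribute channels, d embedding dim
        q = torch.stack([p(x[:, :, j]) for j, p in enumerate(self.q_proj)], dim=2)
        k = torch.stack([p(x[:, :, j]) for j, p in enumerate(self.k_proj)], dim=2)
        # Finest granularity: every (item i, attr c) with every (item j, attr e).
        fine = torch.einsum('bicd,bjed->bicje', q, k)        # (B, N, C, N, C)
        item_level = fine.sum(dim=(2, 4))                    # (B, N, N): sum over channels
        chan_level = fine.sum(dim=(1, 3))                    # (B, C, C): sum over items
        # Broadcast the coarser scores back and take the learned weighted sum.
        score = (self.mix[0] * fine
                 + self.mix[1] * item_level[:, :, None, :, None]
                 + self.mix[2] * chan_level[:, None, :, None, :])
        attn = torch.softmax(score.flatten(3), dim=-1).view_as(score)
        # Weighted sum over (item, channel) keys; the output keeps the (N, C, d) layout.
        return torch.einsum('bicje,bjed->bicd', attn, x)
```

After this module, pooling over the attribute dimension and an MLP would produce the prediction, as described in step 4.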

Trick 2: widen the sequence

This is a paper from Alibaba at WSDM 2022.


Paper title: Triangle Graph Interest Network for Click-through Rate Prediction

Paper link: https://arxiv.org/pdf/2202.02698.pdf

However long you make the sequence, it is still only this one user's sequence. This paper directly borrows items from other users' behaviors. It builds an item graph from all users' behavior sequences: each node on the graph is an item, and an edge means that some user clicked the two items consecutively.

The paper first defines triangles on the graph. The exact definition of a triangle is not the point; you can simply think of a triangle as neighbors on the graph.

▲ triangle
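As a toy illustration of the graph construction, and of reading triangles as connected neighbors, here is a small Python sketch. The edge rule (consecutive clicks) follows the description above, while the function names and the enumeration limit are mine, not the paper's.

```python
# Toy sketch of the item co-occurrence graph and triangle enumeration (assumptions mine).
from collections import defaultdict
from itertools import combinations

def build_item_graph(user_sequences):
    """user_sequences: iterable of per-user lists of item ids, in click order."""
    adj = defaultdict(set)
    for seq in user_sequences:
        for a, b in zip(seq, seq[1:]):          # consecutive clicks -> an edge
            if a != b:
                adj[a].add(b)
                adj[b].add(a)
    return adj

def triangles_of(adj, item, limit=10):
    """Triangles containing `item`: pairs of its neighbors that are themselves
    connected (a simplified reading of the paper's triangle set)."""
    tris = []
    for u, v in combinations(adj[item], 2):
        if v in adj[u]:
            tris.append((item, u, v))
            if len(tris) >= limit:
                break
    return tris

# usage
adj = build_item_graph([[1, 2, 3, 2], [2, 3, 4], [1, 3]])
print(triangles_of(adj, 2))   # e.g. [(2, 1, 3)]: items 1 and 3 neighbor item 2 and each other
```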

The TGIN model structure is shown below. It may look complicated, but it is actually very simple. The red box in the figure is an attention network similar to DIN's; the left side processes the triangles, with one attention network per order of triangles (multi-order triangles can be understood as multi-hop neighbors on the graph).

▲ TGIN

The TriangleNet of a given order: each order has multiple triangles, and each triangle contains 3 items. First comes intra-triangle aggregation (intra), a plain average pooling; then inter-triangle aggregation (inter), a multi-head self-attention.

▲ TriangleNet
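Here is a minimal sketch of that intra/inter aggregation for triangles of one order, following the description above (intra: average pooling over the 3 items; inter: multi-head self-attention across triangles). The final pooling and all shapes are my assumptions, not TGIN's implementation.

```python
# Minimal sketch of TriangleNet-style intra/inter aggregation (assumptions mine).
import torch
import torch.nn as nn

class TriangleNet(nn.Module):
    def __init__(self, emb_dim: int, num_heads: int = 2):
        super().__init__()
        # emb_dim must be divisible by num_heads.
        self.inter = nn.MultiheadAttention(emb_dim, num_heads, batch_first=True)

    def forward(self, triangles):
        # triangles: (B, T, 3, d) -- T triangles of this order, 3 item embeddings each
        intra = triangles.mean(dim=2)                # (B, T, d) intra: average pooling
        inter, _ = self.inter(intra, intra, intra)   # (B, T, d) inter: self-attention
        return inter.mean(dim=1)                     # (B, d) pooled interest for this order
```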


Trick 3: segment and label the sequence

This is another paper from Alibaba at WSDM 2022.


Paper title: Modeling Users' Contextualized Page-wise Feedback for Click-Through Rate Prediction in E-commerce Search

Paper link: https://guyulongcs.github.io/files/WSDM2022_RACP.pdf

The sequence is organized by page. A page contains not only the clicked POIs (positive feedback) but also the un-clicked ones (negative feedback), so the model can capture the context within each page and the evolution of interest across pages. What can page-level information teach the model? A user may skip an item not because they dislike it, but because a cheaper item of the same type appears on the same page.


The model structure is shown below. With this kind of two-level input, the model usually also has two levels: one layer extracts information within a page (intra), and another aggregates across pages (inter). This model has three layers; the extra one in the middle is the interest backtracking layer. One more detail: the pages are filtered, restricted to the same category as the current query vector.

▲ RACP

Let's focus on the interest backtracking layer, the green part of the figure above. Typical models only consider the relevance between long-term interests and the target item, while ignoring the consistency of short-term interests. In this paper, short-term interest means the interest represented by each page.

This layer introduces a query vector representing the user's current interest (this is a search paper, so real query vectors exist). The earlier pages no longer use real query vectors; instead they use attention query vectors. The attention query vector is propagated page by page toward earlier pages through a GRU and influences the intra-page attention computation.
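Here is a rough sketch of the backtracking idea as I understand it: the real query vector attends over the most recent page, a GRU cell then turns the resulting page interest into the attention query vector for the previous page, and so on toward older pages. Everything concrete here (shapes, the dot-product attention, leaving the inter-page aggregation to the caller) is an assumption, not RACP's code.

```python
# Rough sketch of an interest-backtracking layer (assumptions mine, not the paper's code).
import torch
import torch.nn as nn

class InterestBacktracking(nn.Module):
    def __init__(self, emb_dim: int):
        super().__init__()
        self.gru_cell = nn.GRUCell(emb_dim, emb_dim)

    def attend(self, page, q):
        # page: (B, M, d) items shown on one page; q: (B, d) attention query
        w = torch.softmax((page * q.unsqueeze(1)).sum(-1), dim=-1)    # (B, M)
        return (w.unsqueeze(-1) * page).sum(1)                        # (B, d) page interest

    def forward(self, pages, query_vec):
        # pages: (B, P, M, d) -- P pages, newest last; query_vec: (B, d) current query
        q = query_vec
        page_interests = []
        for p in reversed(range(pages.size(1))):      # backtrack from newest to oldest page
            interest = self.attend(pages[:, p], q)    # intra-page attention
            page_interests.append(interest)
            q = self.gru_cell(interest, q)            # pass the attention query backward
        return torch.stack(page_interests, dim=1)     # (B, P, d), newest page first
```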


Summary

Above I introduced several ways these papers play with the input side of the sequence.

Trick 1: add item attributes. The choice of attribute matters a lot: it has to be something changeable that users are sensitive to, such as price or subsidies; it cannot be some irrelevant attribute.

Trick 2: add more items, not by making the sequence longer, but through a graph (essentially, through other users), introducing items that this user has never seen (or interacted with) but might be interested in. Adding these items can be done offline, and you can dress it up with some fancy-sounding techniques. Personally, I feel the reason this works is that the model gets to learn more co-occurrence. When choosing POIs for a user, the model originally only learns the co-occurrence relations among the POIs in that user's own history; introducing more co-occurrence relations in a reasonable way (such as through other users' behaviors) broadens the model's view.

Trick 3: segment the sequence (by page, by session) and mix it (not just a click sequence), so as to reconstruct the environment in which the user made choices as faithfully as possible, infer the user's click logic within each segment, and analyze the user's decision psychology.

