What These 2022 Papers Reveal About Trends in Sequence Modeling for Recommender Systems
2022-06-10 01:24:00 【PaperWeekly】

Author | Schrödinger's Cat
I recently read several 2022 top-conference papers on sequence modeling. The models all look complex and sophisticated, but on closer inspection, what these papers really change is the input; the model is designed only to fit that input. Let's look at how the latest top-conference papers approach the problem.

Background
The goal of sequence modeling is to mine a user's interests from their historical behavior and then recommend items the user is likely to be interested in.
First, let me introduce two classic papers on sequence modeling.
The first is what I consider the pioneering work: Alibaba's DIN, whose model structure is shown in the figure below. The paper argues that when predicting for a target item, the items in the user's behavior sequence should carry different weights, and it uses target attention to compute those weights. In DIN, the user behavior is a one-dimensional sequence of items; after embedding, it gains one extra embedding dimension, giving a tensor of shape $N \times d$ (sequence length by embedding size). Keep this input format in mind: the later methods all enrich it.

▲ DIN
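To make the $N \times d$ input and the target-attention idea concrete, here is a minimal pure-Python sketch, not DIN's actual implementation. It assumes a common choice for DIN's activation unit: a small MLP over the concatenation [e, t, e - t, e * t] of a behavior embedding e and the target embedding t. The weights from `random_params` are hypothetical random values standing in for learned parameters. Following DIN, the attention weights are not softmax-normalized.

```python
import random


def mlp_score(features, w1, b1, w2, b2):
    """Tiny one-hidden-layer MLP (ReLU) that maps a feature vector to a scalar."""
    hidden = [max(0.0, sum(w * x for w, x in zip(row, features)) + b)
              for row, b in zip(w1, b1)]
    return sum(w * h for w, h in zip(w2, hidden)) + b2


def din_attention(behaviors, target, params):
    """DIN-style target attention (sketch).

    behaviors: N embeddings of d floats each (the N x d input described above).
    target: embedding of the target item (d floats).
    Returns the weighted sum of behaviors and the per-behavior weights.
    """
    d = len(target)
    pooled = [0.0] * d
    weights = []
    for e in behaviors:
        # Assumed activation-unit input: [e, t, e - t, e * t] (length 4d).
        feats = (e + target
                 + [a - b for a, b in zip(e, target)]
                 + [a * b for a, b in zip(e, target)])
        w = mlp_score(feats, *params)
        weights.append(w)
        pooled = [p + w * x for p, x in zip(pooled, e)]
    return pooled, weights


def random_params(in_dim, hidden, seed=0):
    """Hypothetical random MLP weights standing in for learned parameters."""
    rng = random.Random(seed)
    w1 = [[rng.gauss(0, 0.1) for _ in range(in_dim)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.gauss(0, 0.1) for _ in range(hidden)]
    return w1, b1, w2, 0.0


rng = random.Random(1)
d = 4
behaviors = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(5)]
target = [rng.gauss(0, 1) for _ in range(d)]
user_vec, weights = din_attention(behaviors, target, random_params(4 * d, 8))
print(len(user_vec), len(weights))  # 4 5
```

Each behavior's weight depends on the target, so the same history yields a different user vector for each candidate item.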
The second is the classic paper on long sequences: Alibaba's SIM, whose model structure is shown in the figure below. Building on DIN, it lengthens the user behavior sequence. If computing resources allow, simply applying DIN to the full sequence is not a bad approach. Back to SIM, the paper does the following: it keeps the short-term behavior sequence and uses DIEN (a variant of DIN) to extract the user's short-term interests; for the long sequence it uses a two-stage approach, first retrieving the top-k items and then computing target attention over them.

▲ SIM
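The two-stage treatment of the long sequence can be sketched as follows. This is a simplified illustration, not SIM's code: stage 1 uses either a "hard" search (keep behaviors in the target's category) or a "soft" search (top-k by inner product with the target embedding); stage 2 is plain scaled-dot-product target attention, whereas SIM itself uses a richer attention module. The short-term DIEN branch is omitted. Behaviors are assumed to be ordered oldest first.

```python
import heapq
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def general_search(behaviors, target, k, categories=None, target_category=None):
    """Stage 1 (sketch): pick candidate behaviors for the target.

    Hard search: keep the k most recent behaviors sharing the target's category.
    Soft search: keep the k behaviors with the largest inner product with the target.
    """
    if categories is not None:
        idx = [i for i, c in enumerate(categories) if c == target_category]
        return idx[-k:]
    return heapq.nlargest(k, range(len(behaviors)),
                          key=lambda i: dot(behaviors[i], target))


def target_attention(behaviors, indices, target):
    """Stage 2 (sketch): softmax target attention over the retrieved behaviors."""
    if not indices:
        return [0.0] * len(target)
    scale = math.sqrt(len(target))
    scores = [dot(behaviors[i], target) / scale for i in indices]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    out = [0.0] * len(target)
    for w, i in zip(exps, indices):
        out = [o + (w / z) * x for o, x in zip(out, behaviors[i])]
    return out


behaviors = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
target = [1.0, 0.0]
top = general_search(behaviors, target, k=2)
print(top)  # [0, 2]
interest = target_attention(behaviors, top, target)
```

Only the k retrieved behaviors enter the attention step, which is what keeps the cost manageable when the full history is very long.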
A brief digression: Alibaba's MIMN showed experimentally that the longer the sequence, the higher the AUC.


Approach 1: Adding side info to the sequence
This is an eBay paper at WSDM 2022.

Paper title:
Sequential Modeling with Multiple Attributes for Watchlist Recommendation in E-Commerce
Paper link:
https://arxiv.org/pdf/2110.11072.pdf
Previous sequences contained only items. eBay additionally includes item attributes, such as price. These attributes change over time: when a price changes, for example, the user may become interested.
The input therefore changes from a one-dimensional sequence to a 2D matrix, and the attention computation correspondingly becomes 2D (attention2D). The model structure is shown below.

▲ attention2D
The computation proceeds as follows:
1. Embedding layer: the input now carries attribute information and is a 2D matrix. Embedding adds one more dimension, giving shape $N \times C \times d$, where $N$ is the sequence length, $C$ is the number of attributes, and $d$ is the embedding dimension;
2. With the extra attribute dimension, how is the embedding turned into Q, K and V? The same linear transformation is applied to each attribute (also called a channel), namely:
$Q_{i,j} = E_{i,j} W^Q, \quad K_{i,j} = E_{i,j} W^K, \quad V_{i,j} = E_{i,j} W^V$

▲ Linear2D: i indexes items and j indexes attributes
3. How are attention scores computed from the 2D Q and K? The paper introduces three granularities. $A^{(1)} \in \mathbb{R}^{N \times C \times N \times C}$ is 4D data, where $A^{(1)}_{i,j,i',j'}$ represents the interaction between attribute $j$ of item $i$ and attribute $j'$ of item $i'$; this is the finest-grained interaction. $A^{(2)} \in \mathbb{R}^{N \times N}$ is 2D, obtained by summing over all channels of each item, and represents item-level interaction. $A^{(3)} \in \mathbb{R}^{C \times C}$ is 2D, obtained by summing over all items of each channel, and represents channel-level interaction. The weighted sum of these three scores is the final score (the weighting coefficients are also learned).

▲ 2D attention scores
4. After the attention step, the output for each item is still two-dimensional: an attribute dimension and an embedding dimension. Dimensionality is reduced before the prediction layer: the attribute dimension is pooled, and the remaining embedding is fed into an MLP.
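The four steps above can be sketched end to end in pure Python. This is an illustrative reading of the description, not the paper's code, and several details are assumptions: a single projection matrix is shared across attributes (as described in step 2); the item-level and channel-level scores are obtained by summing the 4D score tensor over the channel indices and item indices respectively; the three scores are combined by broadcasting, scaled by $\sqrt{d}$, and softmax-normalized over all $(i', j')$ pairs; and the mixing weights `alpha` are fixed here although the paper learns them. `rand_matrix` is a hypothetical helper for random weights.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def rand_matrix(rng, d):
    """Hypothetical random d x d projection standing in for learned weights."""
    return [[rng.gauss(0, 0.5) for _ in range(d)] for _ in range(d)]


def linear2d(E, W):
    """Apply the same projection W to every (item, attribute) embedding E[i][j]."""
    return [[[sum(e[k] * W[k][m] for k in range(len(e))) for m in range(len(W[0]))]
             for e in row] for row in E]


def attention2d(E, Wq, Wk, Wv, alpha=(1.0, 1.0, 1.0)):
    """E: N x C x d nested lists. Returns an N x C x d output."""
    N, C, d = len(E), len(E[0]), len(E[0][0])
    Q, K, V = linear2d(E, Wq), linear2d(E, Wk), linear2d(E, Wv)
    # Finest granularity: A1[i][j][i2][j2] = <Q[i][j], K[i2][j2]>.
    A1 = [[[[dot(Q[i][j], K[i2][j2]) for j2 in range(C)] for i2 in range(N)]
           for j in range(C)] for i in range(N)]
    # Item level (assumed): sum A1 over both channel indices -> N x N.
    A2 = [[sum(A1[i][j][i2][j2] for j in range(C) for j2 in range(C))
           for i2 in range(N)] for i in range(N)]
    # Channel level (assumed): sum A1 over both item indices -> C x C.
    A3 = [[sum(A1[i][j][i2][j2] for i in range(N) for i2 in range(N))
           for j2 in range(C)] for j in range(C)]
    a1, a2, a3 = alpha
    values = [V[i2][j2] for i2 in range(N) for j2 in range(C)]
    out = []
    for i in range(N):
        row = []
        for j in range(C):
            # Broadcast A2 over channels and A3 over items, then softmax over (i2, j2).
            scores = [(a1 * A1[i][j][i2][j2] + a2 * A2[i][i2] + a3 * A3[j][j2]) / math.sqrt(d)
                      for i2 in range(N) for j2 in range(C)]
            m = max(scores)
            ws = [math.exp(s - m) for s in scores]
            z = sum(ws)
            row.append([sum(w / z * v[k] for w, v in zip(ws, values)) for k in range(d)])
        out.append(row)
    return out


def pool_attributes(H):
    """Step 4: mean-pool the attribute dimension, N x C x d -> N x d, ready for an MLP."""
    return [[sum(h[k] for h in row) / len(row) for k in range(len(row[0]))] for row in H]


rng = random.Random(0)
N, C, d = 3, 2, 4
E = [[[rng.gauss(0, 1) for _ in range(d)] for _ in range(C)] for _ in range(N)]
H = attention2d(E, rand_matrix(rng, d), rand_matrix(rng, d), rand_matrix(rng, d))
print(len(H), len(H[0]), len(H[0][0]))  # 3 2 4
print(len(pool_attributes(H)[0]))  # 4
```

The 4D score tensor grows as $(NC)^2$, which is why the coarser item-level and channel-level views are useful complements rather than replacements.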

Approach 2: Widening the sequence
This is an Alibaba paper at WSDM 2022.

Paper title:
Triangle Graph Interest Network for Click-through Rate Prediction
Paper link:
https://arxiv.org/pdf/2202.02698.pdf
However long the sequence is, it contains only this user's own behavior. This paper looks directly at the items that other users interacted with. It builds an item graph from the behavior sequences of all users: each node is an item, and an edge between two items means some user clicked those two items consecutively.
The paper first defines triangles on this graph. The exact definition is not the key point; you can simply think of a triangle as a neighborhood on the graph.

▲ triangle
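The graph construction can be sketched in a few lines. This is an illustration under simplifying assumptions, not TGIN's pipeline: edges are undirected and unweighted, and a "triangle" is taken to be three mutually connected items (a 3-clique). TGIN's precise triangle definition and its multi-order extension may differ. Item IDs are assumed to be sortable strings.

```python
from collections import defaultdict
from itertools import combinations


def build_item_graph(sequences):
    """Undirected item graph: an edge joins two items some user clicked consecutively."""
    adj = defaultdict(set)
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            if a != b:
                adj[a].add(b)
                adj[b].add(a)
    return adj


def triangles_of(adj, item):
    """Triangles containing `item`: pairs of its neighbors that are themselves connected."""
    return sorted(
        tuple(sorted((item, u, v)))
        for u, v in combinations(sorted(adj[item]), 2)
        if v in adj[u]
    )


sequences = [["a", "b", "c"], ["c", "a"], ["b", "d"], ["d", "c"]]
adj = build_item_graph(sequences)
print(triangles_of(adj, "b"))  # [('a', 'b', 'c'), ('b', 'c', 'd')]
```

Because the graph pools every user's clicks, item "d" becomes reachable from "a" through "b" and "c" even though no single sequence links "a" and "d" directly; this is the extra co-occurrence signal the approach exploits. Building the graph and enumerating triangles can run offline.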
The TGIN model structure is shown below. It looks complex but is actually simple. The red box in the figure is a DIN-like network that computes attention; on the left are the networks that process triangles, each branch computing attention over triangles of a different order (which can be understood as multi-hop neighbors on the graph).

▲ TGIN
TriangleNet for a given order: each order has multiple triangles, and each triangle contains 3 items. The items within each triangle are first aggregated internally (intra) with plain average pooling; the triangles are then aggregated with each other (inter) using multi-head self-attention.

▲ TriangleNet
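The intra/inter aggregation for one order can be sketched as below. To keep it short, the multi-head self-attention uses identity projections (each head simply attends over its own slice of the feature dimension); a real implementation would learn query, key, value, and output projections. The embedding table `emb` is a hypothetical input, and how TGIN combines the per-triangle outputs afterwards is not shown.

```python
import math


def avg_pool(vectors):
    """Intra-triangle aggregation: element-wise mean of the item embeddings."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    z = sum(es)
    return [e / z for e in es]


def multi_head_self_attention(X, num_heads):
    """Scaled dot-product self-attention with identity projections (simplification)."""
    d = len(X[0])
    assert d % num_heads == 0, "feature size must split evenly across heads"
    h = d // num_heads
    out = [[0.0] * d for _ in X]
    for head in range(num_heads):
        sl = slice(head * h, (head + 1) * h)
        Xh = [x[sl] for x in X]
        for i, q in enumerate(Xh):
            w = softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(h) for k in Xh])
            for t in range(h):
                out[i][head * h + t] = sum(wj * v[t] for wj, v in zip(w, Xh))
    return out


def triangle_net(triangles, emb, num_heads=2):
    """One order of a TriangleNet-style block (sketch).

    triangles: list of 3-item tuples; emb: dict mapping item -> embedding.
    """
    tri_vecs = [avg_pool([emb[item] for item in tri]) for tri in triangles]  # intra
    return multi_head_self_attention(tri_vecs, num_heads)  # inter


emb = {"a": [1.0, 0.0, 0.0, 1.0], "b": [0.0, 1.0, 0.0, 1.0],
       "c": [0.0, 0.0, 1.0, 1.0], "d": [1.0, 1.0, 0.0, 0.0]}
out = triangle_net([("a", "b", "c"), ("b", "c", "d")], emb)
print(len(out), len(out[0]))  # 2 4
```

Average pooling treats the three items of a triangle as an unordered set, while self-attention lets each triangle's representation be adjusted by the other triangles of the same order.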

Approach 3: Segmenting and labeling the sequence
Alibaba, WSDM 2022

Paper title:
Modeling Users’ Contextualized Page-wise Feedback for Click-Through Rate Prediction in E-commerce Search
Paper link:
https://guyulongcs.github.io/files/WSDM2022_RACP.pdf
The sequence is organized by page. A page contains not only clicked items (positive feedback) but also unclicked items (negative feedback), which lets the model capture the context within a page and the evolution of interest across pages. What can page-level information teach the model? A user may skip an item not because they dislike it, but because a cheaper item of the same type appears on the same page.

The model structure is shown below . Generally, the input of this two-layer structure , The model also has two layers , One layer extracts the information in the page (intra), A layer of aggregation between pages (inter). This model has three layers , There's a layer in the middle backtrack layer . And one more detail ,page It is filtered , Limit it to the same category as the current query vector .

▲ RACP
Let's focus on the interest backtracking layer, the green part of the figure above. Typical models focus only on the relevance between long-term interests and the target item, while ignoring the consistency of short-term interests. In this paper, a short-term interest is the interest represented by each page.
This layer introduces a query vector for the user's current interest (this is a search paper, so a real query vector exists). The remaining query vectors are no longer real queries but attention query vectors. They are passed leftward, layer by layer, through a GRU and influence the attention computation within each page.
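A minimal sketch of the backtracking idea follows, under several assumptions that are not taken from the paper's code: pages are ordered oldest first with the current query at the right, so "passing left" means walking from the most recent page toward older ones; each item vector is assumed to already encode its click/no-click feedback; intra-page attention is plain scaled dot product; the GRU is a bias-free cell with hypothetical random weights; and the final inter-page aggregation is omitted. `filter_pages` mirrors the same-category filtering mentioned above.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


class GRUCell:
    """Minimal bias-free GRU cell with hypothetical random weights."""

    def __init__(self, d, seed=0):
        rng = random.Random(seed)

        def _mat():
            return [[rng.gauss(0, 0.3) for _ in range(d)] for _ in range(d)]

        self.Wz, self.Uz, self.Wr, self.Ur, self.Wh, self.Uh = (_mat() for _ in range(6))

    def __call__(self, x, h):
        z = [sigmoid(a + b) for a, b in zip(matvec(self.Wz, x), matvec(self.Uz, h))]
        r = [sigmoid(a + b) for a, b in zip(matvec(self.Wr, x), matvec(self.Ur, h))]
        rh = [ri * hi for ri, hi in zip(r, h)]
        n = [math.tanh(a + b) for a, b in zip(matvec(self.Wh, x), matvec(self.Uh, rh))]
        return [(1 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]


def filter_pages(pages, page_categories, query_category):
    """Keep only pages whose category matches the current query's category."""
    return [p for p, c in zip(pages, page_categories) if c == query_category]


def attend(query, items):
    """Intra-page attention pooling of the page's item vectors."""
    d = len(query)
    scores = [sum(q * x for q, x in zip(query, it)) / math.sqrt(d) for it in items]
    m = max(scores)
    ws = [math.exp(s - m) for s in scores]
    z = sum(ws)
    return [sum(w / z * it[k] for w, it in zip(ws, items)) for k in range(d)]


def backtrack(pages, query, gru):
    """Walk from the most recent page back to the oldest. Each page is pooled with
    the current attention query; the GRU turns that page summary into the
    attention query used for the next (older) page."""
    q = query
    page_reps = []
    for page in reversed(pages):
        p = attend(q, page)
        page_reps.append(p)
        q = gru(p, q)
    page_reps.reverse()  # restore oldest-first order
    return page_reps


pages = [[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], [[0.0, 0.0, 1.0]]]
reps = backtrack(pages, [1.0, 0.0, 0.0], GRUCell(3))
print(len(reps))  # 2
```

Only the most recent page is attended with the real query; every older page sees a query that has been updated by the pages after it, which is how short-term interest propagates back through the sequence.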

Summary
Above, I introduced several ways of reworking the sequence on the input side.
Approach 1: add item attributes. The choice of attribute matters a great deal: it must be one that changes over time and to which users are highly sensitive, such as price or subsidies, not an irrelevant attribute.
Approach 2: add more items, not by making the sequence longer but through the graph (essentially through other users), introducing items the user has not seen (or not interacted with) but may be interested in. These additional items can be computed offline, and they can be dressed up with fancier, more impressive-sounding methods. My personal impression is that this approach works because it learns more co-occurrence. When choosing items for a user, the model otherwise learns only the co-occurrence among items in that user's own history; introducing more co-occurrence relations in a reasonable way (such as through other users' historical behavior) broadens the model's view.
Approach 3: sequence segmentation (by page or by session) and mixed sequences (not just click sequences) restore, as faithfully as possible, the environment in which the user made their choices, so the model can infer the user's click logic within each segment and analyze the user's decision psychology.