【GCN-CTR】DC-GNN: Decoupled GNN for Improving and Accelerating Large-Scale E-commerce Retrieval (WWW'22)
2022-07-25 13:09:00 【chad_lee】
《DC-GNN: Decoupled Graph Neural Networks for Improving and Accelerating Large-Scale E-commerce Retrieval》(WWW’22)
In industrial settings, with tens of billions of nodes and hundreds of billions of edges, an end-to-end GNN-based CTR model is prohibitively expensive. The paper therefore decouples the whole GNN framework into three stages: pre-training, aggregation, and CTR prediction.
In effect, the paper shifts the computation forward: the cost of graph convolution is converted into the cost of offline graph mining.
Graphs and datasets

The dataset is roughly 7 days of Taobao logs. There are three node types: user, query, and item, each with rich attributes (device, age, etc.).
There are three edge types: query searches for item, user browses item, and user issues query.
In total there are about 900 million nodes and 10 billion edges.
Method

Pre-training
For each node, random walks (RW) first generate three subgraphs of that node; a GNN encoder then performs convolutional encoding over each subgraph to obtain the node's embedding. Two pre-training tasks follow:
Link Prediction
$$\mathcal{L}_{link}=\sum_{\left(q, i_{p}\right) \in \mathcal{E}}\left(-\log \sigma\left(f_{s}\left(q, i_{p}\right)\right)-\sum_{k} \log \left(1-\sigma\left(f_{s}\left(q, i_{n}^{k}\right)\right)\right)\right)$$
Edge-connected pairs are positives, and k negatives are sampled per positive using hard negative mining:

One option is to pick K-hop neighbors as negatives, where K controls the difficulty; the other is to keep the graph structure unchanged and replace a positive sample with a globally sampled negative.
The authors argue that both practices push the GNN to focus on learning node attributes rather than over-relying on graph-structural features, thereby alleviating over-smoothing.
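As a concrete illustration, the link-prediction loss above can be sketched in PyTorch. This is a minimal sketch, assuming $f_s$ is a dot-product scorer and that the hard negatives have already been sampled; neither detail is specified by the excerpt:

```python
import torch
import torch.nn.functional as F

def link_prediction_loss(q_emb, pos_emb, neg_emb):
    """Link-prediction loss with k sampled negatives per positive edge.

    q_emb:   (B, d)    query embeddings
    pos_emb: (B, d)    embeddings of the item connected to each query
    neg_emb: (B, k, d) k hard negatives per query (K-hop or global samples)
    """
    # Assumption: f_s is a dot product between the two embeddings.
    pos_score = (q_emb * pos_emb).sum(-1)                    # (B,)
    neg_score = torch.einsum('bd,bkd->bk', q_emb, neg_emb)   # (B, k)
    # -log sigma(pos) - sum_k log(1 - sigma(neg));
    # note log(1 - sigma(x)) == logsigmoid(-x).
    return -F.logsigmoid(pos_score).sum() - F.logsigmoid(-neg_score).sum()
```

Swapping the negative sampler (K-hop vs. global) only changes how `neg_emb` is built; the loss itself is unchanged.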
Multi-view graph contrastive learning
Contrastive learning is applied between the embeddings from the second and third subgraphs: the same node's embeddings in the two views form a positive pair, while different nodes are negatives. The InfoNCE loss is computed only between nodes of the same type, so there are three contrastive losses.
All losses are summed into the total loss:
$$\begin{aligned}
\mathcal{L}_{query}&=\sum_{q_{1} \in v_{q},\, v_{q} \in \mathcal{V}}-\log \frac{\exp \left(w f_{s}\left(q_{1}, q_{2}\right)\right)}{\sum_{v} \exp \left(w f_{s}\left(q_{1}, v_{2}\right)\right)}\\
\mathcal{L}_{user}&=\sum_{u_{1} \in v_{u},\, v_{u} \in \mathcal{V}}-\log \frac{\exp \left(w f_{s}\left(u_{1}, u_{2}\right)\right)}{\sum_{v} \exp \left(w f_{s}\left(u_{1}, v_{2}\right)\right)}\\
\mathcal{L}_{ad}&=\sum_{i_{1} \in v_{i},\, v_{i} \in \mathcal{V}}-\log \frac{\exp \left(w f_{s}\left(i_{1}, i_{2}\right)\right)}{\sum_{v} \exp \left(w f_{s}\left(i_{1}, v_{2}\right)\right)}\\
\mathcal{L}_{contra}&=\mathcal{L}_{query}+\mathcal{L}_{user}+\mathcal{L}_{ad}\\
\mathcal{L}_{total}&=\mathcal{L}_{link}+\lambda_{1} \mathcal{L}_{contra}+\lambda_{2}\|\theta\|_{2}^{2}
\end{aligned}$$
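The per-type InfoNCE term can be sketched as follows. This is a minimal sketch, assuming $f_s$ is a dot product and that the two views' embeddings are row-aligned (row i of each matrix is the same node); the scaling weight $w$ plays the role of an inverse temperature:

```python
import torch
import torch.nn.functional as F

def type_wise_infonce(z1, z2, w=1.0):
    """InfoNCE over one node type (query, user, or item).

    z1, z2: (N, d) embeddings of the same N nodes in the two views;
    row i of z1 and row i of z2 belong to the same node.
    """
    logits = w * (z1 @ z2.t())            # (N, N) pairwise scores w * f_s
    labels = torch.arange(z1.size(0))     # positives sit on the diagonal
    # cross_entropy(logits, labels) == -log softmax over each row,
    # which is exactly the InfoNCE term per anchor node.
    return F.cross_entropy(logits, labels, reduction='sum')
```

`L_contra` is then just the sum of three such calls, one per node type, and `L_total` adds the link loss and L2 regularization with weights `lambda_1`, `lambda_2`.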
Deep Aggregation
After the first stage, each node already has an embedding $X$. Sampling is then performed again: for each node, three subgraphs of different node types are sampled. For example, if the target node is a user, each of its three subgraphs contains, besides the target node itself, only user, only query, or only item nodes respectively.
Then, on top of the existing subgraphs, as in SIGN, the convolved vectors of different orders $\left[X, AX, A^{2}X, A^{3}X\right]$ are directly concatenated as model input. But this notation is misleading: it should be $1+3\times3=10$ vectors concatenated, because there are three subgraphs, each contributing matrices up to third order, so in fact it is:
$$\left[X, A_1X, A_1^{2}X, A_1^{3}X, A_2X, A_2^{2}X, A_2^{3}X, A_3X, A_3^{2}X, A_3^{3}X\right]$$
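The key point of this SIGN-style design is that the multi-hop features are plain matrix products and can be precomputed offline, with no GNN in the serving path. A minimal sketch, assuming dense normalized adjacency matrices (the real system would use sparse, per-subgraph matrices):

```python
import torch

def sign_features(X, adjs, K=3):
    """Precompute SIGN-style multi-hop features.

    X:    (N, d) stage-one node embeddings
    adjs: list of normalized adjacency matrices, one per typed subgraph
    Returns [X, A_1 X, ..., A_1^K X, ..., A_m^K X] concatenated,
    shape (N, (1 + len(adjs) * K) * d).
    """
    feats = [X]
    for A in adjs:
        h = X
        for _ in range(K):
            h = A @ h          # one more hop of propagation on this subgraph
            feats.append(h)
    return torch.cat(feats, dim=-1)
```

With three subgraphs and K=3 this yields exactly the 10 concatenated blocks described above.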
CTR Prediction
As shown in Figure 1.3, the concatenated vectors are fed into a two-tower model for CTR prediction. Negative samples are exposed items that were not clicked:
$$\mathcal{L}_{CTR}=\sum\left(-\log \sigma\left(f_{s}\left((q, u), i_{clk}\right)\right)-\sum_{k} \log \left(1-\sigma\left(f_{s}\left((q, u), i_{pv}^{k}\right)\right)\right)\right)$$
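A minimal two-tower sketch of this stage: one tower encodes the (query, user) side, the other the item side, and the score is their dot product. The tower widths and MLP shapes are illustrative assumptions, not taken from the paper:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoTowerCTR(nn.Module):
    """Two-tower scorer: f_s((q, u), i) = <tower_qu(q, u), tower_item(i)>."""
    def __init__(self, d_in, d_hid=64):
        super().__init__()
        self.qu_tower = nn.Sequential(nn.Linear(2 * d_in, d_hid), nn.ReLU(),
                                      nn.Linear(d_hid, d_hid))
        self.item_tower = nn.Sequential(nn.Linear(d_in, d_hid), nn.ReLU(),
                                        nn.Linear(d_hid, d_hid))

    def score(self, q, u, i):
        qu = self.qu_tower(torch.cat([q, u], dim=-1))
        return (qu * self.item_tower(i)).sum(-1)

def ctr_loss(model, q, u, i_clk, i_pv):
    """i_clk: (B, d) clicked items; i_pv: (B, k, d) exposed-not-clicked items."""
    pos = model.score(q, u, i_clk)                          # (B,)
    B, k, d = i_pv.shape
    neg = model.score(q.repeat_interleave(k, 0),            # tile (q, u) per negative
                      u.repeat_interleave(k, 0),
                      i_pv.reshape(B * k, d)).view(B, k)
    return -(F.logsigmoid(pos).sum() + F.logsigmoid(-neg).sum())
```

Because the item tower depends only on the item features, item embeddings can be indexed offline and retrieval reduces to nearest-neighbor search over dot products, which is the usual motivation for the two-tower form.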
Experiments

DC-GNN-Pf and DC-GNN-Pt skip Deep Aggregation and feed the pretrained embeddings directly into the CTR model, with the embeddings either fixed (Pf) or fine-tuned (Pt).