【GCN-CTR】DC-GNN: Decoupled GNN for Improving and Accelerating Large-Scale E-commerce Retrieval WWW22
2022-07-25 13:09:00 【chad_lee】
《DC-GNN: Decoupled Graph Neural Networks for Improving and Accelerating Large-Scale E-commerce Retrieval》(WWW’22)
In industrial settings, directly training an end-to-end GNN-based CTR model on a graph with roughly a billion nodes and ten billion edges is prohibitively expensive, so the paper decouples the whole GNN framework into three stages: pre-training, aggregation, and CTR prediction.
In effect, though, the paper moves the computational cost forward: the cost of graph convolution at serving time is converted into the cost of offline graph mining.
Graphs and datasets

The dataset is built from roughly 7 days of Taobao logs. There are three node types: user, query, and item, each with rich attributes (device, age, etc.).
There are three edge types: query-searches-item, user-browses-item, and user-issues-query.
In total there are about 900 million nodes and 10 billion edges.
Method

Pre-training
For every node, random walks (RW) are first used to generate three subgraphs rooted at that node; a GNN encoder then performs convolutional encoding on each subgraph to obtain the node's embedding, followed by two pre-training tasks:
Link Prediction
$$\mathcal{L}_{link}=\sum_{\left(q, i_{p}\right) \in \mathcal{E}}\left(-\log \sigma\left(f_{s}\left(q, i_{p}\right)\right)-\sum_{k} \log \left(1-\sigma\left(f_{s}\left(q, i_{n}^{k}\right)\right)\right)\right)$$
Node pairs connected by an edge are positives, and k negatives are collected for each positive. The negatives come from hard negative mining, via two strategies:
One is to sample K-hop neighbors as negatives, where K controls the difficulty; the other is to keep the graph structure unchanged and replace a positive sample with a negative collected globally.
The authors argue that both practices push the GNN to pay more attention to learning node attributes and to avoid over-relying on graph structural features, thereby alleviating over-smoothing.
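As a concrete illustration, below is a minimal PyTorch sketch of the link-prediction loss above. It assumes f_s is a dot-product similarity and that the k hard negatives per positive pair have already been mined (both are assumptions for illustration; the paper's exact scoring function may differ):

```python
import torch
import torch.nn.functional as F

def link_prediction_loss(z_q, z_i_pos, z_i_neg):
    """Sketch of L_link: BCE over one positive edge and k mined negatives.

    z_q:     (B, d)    query-node embeddings from the GNN encoder
    z_i_pos: (B, d)    embeddings of items linked to each query (positives)
    z_i_neg: (B, k, d) k hard negatives per query (e.g. K-hop or global swaps)
    """
    pos_score = (z_q * z_i_pos).sum(-1)                    # f_s(q, i_p), dot product assumed
    neg_score = torch.einsum('bd,bkd->bk', z_q, z_i_neg)   # f_s(q, i_n^k)
    pos_term = -F.logsigmoid(pos_score)                    # -log sigma(.)
    neg_term = -F.logsigmoid(-neg_score).sum(-1)           # -sum_k log(1 - sigma(.)), via log(1-sigma(x)) = logsigmoid(-x)
    return (pos_term + neg_term).mean()
```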
Multi-view graph contrastive learning
Contrastive learning is applied to the embeddings from the second and third subgraphs: the same node's embeddings in the two views form a positive pair, and different nodes form negatives. The InfoNCE loss is computed only between nodes of the same type, so there are three contrastive losses.
All the losses are added up into the total loss:
$$\begin{aligned} \mathcal{L}_{query}&=\sum_{q_{1} \in v_{q}}-\log \frac{\exp \left(w f_{s}\left(q_{1}, q_{2}\right)\right)}{\sum_{v} \exp \left(w f_{s}\left(q_{1}, v_{2}\right)\right)}\\ \mathcal{L}_{user} &=\sum_{u_{1} \in v_{u}, v_{u} \in \mathcal{V}}-\log \frac{\exp \left(w f_{s}\left(u_{1}, u_{2}\right)\right)}{\sum_{v} \exp \left(w f_{s}\left(u_{1}, v_{2}\right)\right)} \\ \mathcal{L}_{ad} &=\sum_{i_{1} \in v_{i}, v_{i} \in \mathcal{V}}-\log \frac{\exp \left(w f_{s}\left(i_{1}, i_{2}\right)\right)}{\sum_{v} \exp \left(w f_{s}\left(i_{1}, v_{2}\right)\right)}\\ \mathcal{L}_{contra}&=\mathcal{L}_{query}+\mathcal{L}_{user}+\mathcal{L}_{ad} \\ \mathcal{L}_{total}&=\mathcal{L}_{link}+\lambda_{1} \mathcal{L}_{contra}+\lambda_{2}\|\theta\|_{2}^{2} \end{aligned}$$
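Each per-type InfoNCE term of this form can be written compactly as cross-entropy over pairwise in-view similarities. A minimal sketch, assuming f_s is a dot product and w acts as an inverse temperature (the lambda values below are placeholders, not the paper's settings):

```python
import torch
import torch.nn.functional as F

def infonce_per_type(z_view1, z_view2, w=1.0):
    """One per-type InfoNCE term (the form of L_query / L_user / L_ad).

    z_view1, z_view2: (N, d) embeddings of the same N nodes of one type,
    from the two contrastive subgraph views; row i of each tensor is the
    same node, so diagonal entries are the positive pairs.
    """
    logits = w * (z_view1 @ z_view2.t())               # (N, N) scores w * f_s
    labels = torch.arange(z_view1.size(0), device=logits.device)
    return F.cross_entropy(logits, labels)             # -log softmax over same-type negatives

def total_loss(l_link, l_query, l_user, l_ad, params, lam1=0.1, lam2=1e-5):
    """L_total = L_link + lam1 * L_contra + lam2 * ||theta||^2 (lambdas are placeholders)."""
    l_contra = l_query + l_user + l_ad
    l2 = sum(p.pow(2).sum() for p in params)           # L2 regularization over parameters
    return l_link + lam1 * l_contra + lam2 * l2
```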
Deep Aggregation
After the first stage, every node already has an embedding $X$. Sampling is then performed again: for each node, three subgraphs of different node types are sampled. For example, if the target node is a user, three subgraphs are drawn for it, and each subgraph contains, besides the target node, only user, only query, or only item nodes, respectively.
Then, on top of these subgraphs, exactly as in SIGN, the vectors from convolutions of different orders are directly concatenated, $\left[X, AX, A^{2}X, A^{3}X\right]$, as the model input. This notation is misleading, though: $1+3\times 3=10$ vectors should be concatenated, because there are three subgraphs and each contributes three orders of propagation, so the input is actually:
$$\left[X, A_{1}X, A_{1}^{2}X, A_{1}^{3}X, A_{2}X, A_{2}^{2}X, A_{2}^{3}X, A_{3}X, A_{3}^{2}X, A_{3}^{3}X\right]$$
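Since this aggregation is a fixed linear precomputation over the frozen stage-one embeddings, it can be done offline, which is where the speedup comes from. A minimal sketch, assuming A_1, A_2, A_3 are normalized sparse adjacency matrices of the three typed subgraphs (the names and normalization are assumptions):

```python
import torch

def precompute_deep_aggregation(x, adjs, order=3):
    """Precompute the 1 + 3*order concatenated feature blocks described above.

    x:    (N, d) frozen node embeddings from the pre-training stage
    adjs: [A_1, A_2, A_3], sparse normalized adjacencies of the three
          typed subgraphs (user-only / query-only / item-only neighborhoods)
    Returns an (N, (1 + len(adjs) * order) * d) matrix:
    [X, A_1 X, A_1^2 X, A_1^3 X, ..., A_3 X, A_3^2 X, A_3^3 X].
    """
    blocks = [x]
    for a in adjs:
        h = x
        for _ in range(order):
            h = torch.sparse.mm(a, h)   # one more hop of propagation on this subgraph
            blocks.append(h)
    return torch.cat(blocks, dim=1)
```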
CTR Prediction
As shown in Figure 1.3, the concatenated vectors are fed into a two-tower model for CTR prediction. Negative samples are exposed items that were not clicked:
$$\mathcal{L}_{CTR}=\sum\left(-\log \sigma\left(f_{s}\left((q, u), i_{clk}\right)\right)-\sum_{k} \log \left(1-\sigma\left(f_{s}\left((q, u), i_{pv}^{k}\right)\right)\right)\right)$$
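For concreteness, here is a minimal two-tower sketch: one tower encodes the (query, user) side, the other the item side, and f_s is taken as a dot product between tower outputs. Hidden sizes and the exact tower architecture are placeholders, not the paper's configuration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoTowerCTR(nn.Module):
    """Minimal two-tower sketch: a (query, user) tower and an item tower,
    scored by a dot product standing in for f_s((q, u), i)."""

    def __init__(self, in_dim, hidden=256, out=64):   # sizes are placeholders
        super().__init__()
        self.qu_tower = nn.Sequential(
            nn.Linear(2 * in_dim, hidden), nn.ReLU(), nn.Linear(hidden, out))
        self.item_tower = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, out))

    def forward(self, q_feat, u_feat, i_feat):
        qu = self.qu_tower(torch.cat([q_feat, u_feat], dim=-1))   # (B, out)
        it = self.item_tower(i_feat)                              # (B, out) or (B, K, out)
        if it.dim() == 3:                                         # K candidate items per request
            return torch.einsum('bo,bko->bk', qu, it)
        return (qu * it).sum(-1)

def ctr_loss(score_clk, score_pv):
    """L_CTR: clicked items are positives, exposed-but-unclicked are negatives.

    score_clk: (B,)   score of the clicked item per request
    score_pv:  (B, K) scores of K exposed-but-unclicked items
    """
    return (-F.logsigmoid(score_clk) - F.logsigmoid(-score_pv).sum(-1)).mean()
```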
Experiments

DC-GNN-Pf and DC-GNN-Pt denote skipping Deep Aggregation and feeding the pre-training output embeddings directly into the CTR stage, with the embeddings either fixed (Pf) or fine-tuned (Pt).