
【GCN-RS】Region or Global? A Principle for Negative Sampling in Graph-based Recommendation (TKDE’22)

2022-07-25 13:09:00 【chad_ lee】


Intermediate region


In GCN-RS, negative samples should be drawn from the region at an intermediate distance from the user. Items that are too close have usually already been aggregated into the user's representation, and items that are too far away carry little information. Items that are neither too near nor too far can be regarded as hard samples that carry a lot of information.

Define the items that are k hops away from the user as the intermediate region $R_{med}$. $R_{med}$ can be obtained with a layer-by-layer breadth-first search.
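The layer-by-layer search can be sketched as follows. This is a minimal illustration, not the authors' code: the adjacency dictionaries `user_items` and `item_users`, the helper `invert`, and the toy graph are all hypothetical, and "k hops" is assumed here to mean k edges in the user-item bipartite graph (so items sit at odd distances 1, 3, 5, ...).

```python
from collections import deque


def invert(user_items):
    """Build the item -> users adjacency from the user -> items adjacency."""
    item_users = {}
    for user, items in user_items.items():
        for item in items:
            item_users.setdefault(item, []).append(user)
    return item_users


def intermediate_region(user, user_items, item_users, k):
    """Return the items whose shortest-path distance from `user` is exactly k edges.

    Assumption: a "hop" is one edge of the bipartite user-item graph.
    """
    start = ("u", user)
    dist = {start: 0}
    queue = deque([start])
    region = set()
    while queue:
        kind, idx = queue.popleft()
        d = dist[(kind, idx)]
        if d == k:
            # Layer k reached: collect items and do not expand further.
            if kind == "i":
                region.add(idx)
            continue
        if kind == "u":
            neighbours = [("i", i) for i in user_items.get(idx, ())]
        else:
            neighbours = [("u", u) for u in item_users.get(idx, ())]
        for nxt in neighbours:
            if nxt not in dist:
                dist[nxt] = d + 1
                queue.append(nxt)
    return region


# Toy interaction graph (hypothetical): user -> clicked items.
USER_ITEMS = {0: [0, 1], 1: [1, 2], 2: [2, 3]}
ITEM_USERS = invert(USER_ITEMS)
R_MED = intermediate_region(0, USER_ITEMS, ITEM_USERS, 3)
```

In the toy graph, user 0's own items are at distance 1, and the only item first reached at distance 3 is item 2 (through user 1, who shares item 1).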

From the intermediate region $R_{med}$, M negative samples are selected to form the negative candidate set $C_{u}$. Two strategies are then applied to $C_{u}$ to obtain hard negative samples.

The authors claim that the intermediate region $R_{med}$ is much smaller than the entire item set. However, in my own statistics on the datasets, with only 3 hops most $R_{med}$ already contain almost the entire dataset.

Strategy one: positive-sample assistance

This is similar to the MixUp technique. To obtain hard negative samples for a user, the selection depends not only on the user $u$ but also on the user's positive sample $v$. For a positive pair $(u, v)$, the probability that a negative sample $v_{n}^{p}$ in $C_{u}$ is picked is:
$p_{n}\left(v_{n}^{p} \mid(u, v)\right)=\frac{\sigma\left(\alpha\left(\mathbf{e}_{u}^{*} \cdot \mathbf{e}_{v_{n}^{p}}^{*}\right)+(1-\alpha)\left(\mathbf{e}_{v}^{*} \cdot \mathbf{e}_{v_{n}^{p}}^{*}\right)\right)}{\sum_{v_{i} \in \mathcal{C}_{u}} \sigma\left(\alpha\left(\mathbf{e}_{u}^{*} \cdot \mathbf{e}_{v_{i}}^{*}\right)+(1-\alpha)\left(\mathbf{e}_{v}^{*} \cdot \mathbf{e}_{v_{i}}^{*}\right)\right)}$
where $\alpha$ is a hyperparameter that balances the influence of the user and the item. Intuitively, the formula means that within $C_{u}$, the closer an item's embedding is to the user $u$ and to the positive sample $v$ (as measured by inner product), the greater its probability of being chosen as a negative sample. Sampling k negatives from $p_n$ gives the negative sample set: $\mathcal{P}_{k}=\left\{v_{n}^{p}\right\}$
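A small sketch of this sampling distribution, using plain Python lists as embeddings. The function names, the toy embeddings, and the choice to sample with replacement via `random.Random.choices` are assumptions made for illustration; the post does not say whether the k negatives are drawn with or without replacement.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def positive_assisted_probs(e_u, e_v, candidates, alpha):
    """Probability of each candidate in C_u being picked for the pair (u, v).

    Score: sigmoid(alpha * <e_u, e_c> + (1 - alpha) * <e_v, e_c>),
    normalised over all candidates.
    """
    scores = [
        sigmoid(alpha * dot(e_u, e_c) + (1.0 - alpha) * dot(e_v, e_c))
        for e_c in candidates
    ]
    total = sum(scores)
    return [s / total for s in scores]


def sample_negatives(probs, k, rng):
    """Draw k candidate indices according to probs (with replacement: an assumption)."""
    return rng.choices(range(len(probs)), weights=probs, k=k)


# Toy embeddings (hypothetical values).
E_U = [1.0, 0.0]
E_V = [0.0, 1.0]
CANDIDATES = [[1.0, 1.0], [-1.0, -1.0], [0.0, 0.0]]
PROBS = positive_assisted_probs(E_U, E_V, CANDIDATES, alpha=0.5)
PICKED = sample_negatives(PROBS, k=2, rng=random.Random(0))
```

The candidate aligned with both $u$ and $v$ receives the largest probability, and the one pointing away from both receives the smallest.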

10%: pseudo-labels.

10~20%: hard negative samples.

Strategy two: exposed but not clicked

Not many of these can be selected, because exposed-but-unclicked items carry a strong bias. Only **"the hardest one"** among the exposed-but-unclicked items is selected:
$v_{n}^{e}=\underset{v_{i} \in \mathcal{M}_{u}}{\operatorname{argmax}} \sigma\left(\beta\left(\mathbf{e}_{u}^{*} \cdot \mathbf{e}_{v_{i}}^{*}\right)\right)$

$\beta=\left\{\begin{array}{lr} 1, & \text { if } v_{e} \text { not in } \mathcal{C}_{u} \\ \text { number of exposed items, } & \text { if } v_{e} \text { in } \mathcal{C}_{u} \end{array}\right.$

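$\mathcal{M}_{u}$ is the set of items that were exposed to the user but not clicked. $\beta$ acts as a cumulative score based on exposure counts, but if the item is not in the candidate set drawn from the intermediate region, the exposures are not counted and $\beta = 1$.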
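The argmax selection can be sketched as below. The phrase "number of exposed items" is ambiguous in the formula; this sketch assumes it means how many times the item was exposed to the user, supplied through a hypothetical `exposure_count` dictionary. All names and values are illustrative only.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def hardest_exposed_negative(e_u, exposed, embeddings, candidate_set, exposure_count):
    """Return the exposed-but-unclicked item with the highest sigmoid(beta * <e_u, e_i>).

    Assumption: beta is the item's exposure count if the item is in C_u, else 1.
    Returns None when `exposed` is empty.
    """
    best_item, best_score = None, float("-inf")
    for item in exposed:
        beta = exposure_count[item] if item in candidate_set else 1
        score = sigmoid(beta * dot(e_u, embeddings[item]))
        if score > best_score:
            best_item, best_score = item, score
    return best_item


# Toy data (hypothetical values).
E_U = [1.0, 0.0]
EMBEDDINGS = {"a": [0.5, 0.0], "b": [0.4, 0.0], "c": [0.6, 0.0]}
EXPOSED = ["a", "b", "c"]
EXPOSURE_COUNT = {"a": 1, "b": 3, "c": 1}
HARDEST = hardest_exposed_negative(E_U, EXPOSED, EMBEDDINGS, {"b"}, EXPOSURE_COUNT)
```

Here item "b" has a lower raw score than "c", but it lies in the candidate set and was exposed three times, so its weighted score is the highest.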

Negative sample fusion

Because GCN-RS essentially propagates messages iteratively over embeddings on the user-item graph, the two negative sampling strategies are combined in the embedding space, where k is the number of negative samples:
$\begin{gathered} \mathbf{e}_{v_{n}}^{*}=\underset{v_{n}^{p} \in \mathcal{P}_{k}}{\operatorname{merge}}\left(\mathbf{e}_{v_{n}^{e}}^{*}, \mathbf{e}_{v_{n}^{p}}^{*}\right) \\ \operatorname{merge}\left(\mathbf{e}_{v_{n}^{e}}, \mathbf{e}_{v_{n}^{p}}\right)=\frac{1}{k} \cdot \mathbf{e}_{v_{n}^{e}}+\left(1-\frac{1}{k}\right) \cdot \mathbf{e}_{v_{n}^{p}} \end{gathered}$
So in the end, one negative sample is synthesized for each positive sample, and the model is trained on it with a margin hinge loss:
$\mathcal{L}=\max \left(0, \mathbf{e}_{u}^{*} \cdot \mathbf{e}_{v_{n}}^{*}-\mathbf{e}_{u}^{*} \cdot \mathbf{e}_{v}^{*}+\gamma\right)$
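The fusion and the loss can be sketched together. The pairwise `merge` follows the formula above; how the merge operator aggregates over all of $\mathcal{P}_{k}$ is not spelled out in this post, so averaging the pairwise merges in `fuse_negatives` is an assumption, as are the function names and toy vectors.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def merge(e_exposed, e_pos_assisted, k):
    """Pairwise merge: (1/k) * e_exposed + (1 - 1/k) * e_pos_assisted."""
    w = 1.0 / k
    return [w * a + (1.0 - w) * b for a, b in zip(e_exposed, e_pos_assisted)]


def fuse_negatives(e_exposed, pos_assisted_embeddings):
    """Fuse the exposed negative with the k positive-assisted negatives.

    Assumption: the pairwise merges are averaged over P_k.
    """
    k = len(pos_assisted_embeddings)
    merged = [merge(e_exposed, e_p, k) for e_p in pos_assisted_embeddings]
    dim = len(e_exposed)
    return [sum(m[d] for m in merged) / k for d in range(dim)]


def hinge_loss(e_u, e_v, e_neg, gamma):
    """Margin hinge loss: max(0, <e_u, e_neg> - <e_u, e_v> + gamma)."""
    return max(0.0, dot(e_u, e_neg) - dot(e_u, e_v) + gamma)


# Toy vectors (hypothetical values).
E_EXPOSED = [1.0, 0.0]
P_K = [[0.0, 1.0], [0.0, 3.0]]
E_NEG = fuse_negatives(E_EXPOSED, P_K)
LOSS = hinge_loss([1.0, 1.0], [2.0, 0.0], E_NEG, gamma=1.0)
```

With k = 2, the exposed negative and each positive-assisted negative are weighted equally in every pairwise merge. The loss is zero once the positive item scores higher than the fused negative by at least the margin $\gamma$.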

Copyright notice
This article was written by [chad_ lee]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/206/202207251110591900.html