Learning to Pre-train Graph Neural Networks
I previously put together a post summarizing graph pre-training papers. Since then, many more papers on pre-training for graphs have appeared, but however much they vary, the core stays the same: self-supervised learning at the node level and the graph level.
Why are self-supervised pre-training strategies effective?
- Layered structure: with the lower layers fixed, the upper layers can then be trained.
- Multi-task learning: it pushes the model toward a more general inductive bias.
- In-domain pre-training: the model learns more knowledge relevant to the downstream domain.
However, there is always a gap between pre-training and fine-tuning, and how to close this gap has become a thorny problem. In this post I go through several solutions.
Learning to Pre-train Graph Neural Networks
This paper is from AAAI 2021. Its core question is: how can the optimization gap between GNN pre-training and fine-tuning be alleviated?
The authors first formalize GNN pre-training as a two-stage process:
- Pre-training. First, pre-train on a large-scale graph dataset, i.e., update the parameters $\theta$ to minimize the pre-training loss: $\theta_0 = \arg\min_{\theta} L^{pre}(f_{\theta}; D^{pre})$
- Fine-tuning. Fine-tune on the downstream data: starting from the $\theta_0$ obtained in the previous step, run gradient descent on the downstream training set: $\theta_1 = \theta_0 - \eta \nabla_{\theta_0} L^{fine}(f_{\theta_0}; D^{tr})$
The authors argue that there is a mismatch between these two steps: although fine-tuning starts from $\theta_0$, $\theta_0$ is a fixed initialization obtained without ever seeing the fine-tuning data, so pre-training never considers how the model will be adapted downstream. This creates an optimization deviation between pre-training and fine-tuning, which to some extent hurts how well the pre-trained model transfers.
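A minimal PyTorch-style sketch of this two-stage pipeline (the model, loss functions, and data loaders here are hypothetical placeholders, not the paper's code, which is linked at the end of this section):

```python
import torch

def pretrain(model, pre_loader, pre_loss_fn, epochs=100, lr=1e-3):
    """Stage 1: theta_0 = argmin_theta L_pre(f_theta; D_pre)."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for batch in pre_loader:
            opt.zero_grad()
            loss = pre_loss_fn(model, batch)   # self-supervised loss on D_pre
            loss.backward()
            opt.step()
    return model.state_dict()                  # this is theta_0

def finetune(model, theta_0, train_loader, fine_loss_fn, epochs=30, lr=1e-4):
    """Stage 2: theta_1 = theta_0 - eta * grad L_fine(f_theta_0; D_tr)."""
    model.load_state_dict(theta_0)             # theta_0 is just a fixed initialization
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        for batch in train_loader:
            opt.zero_grad()
            loss = fine_loss_fn(model, batch)  # supervised loss on the downstream D_tr
            loss.backward()
            opt.step()
    return model
```

The point of the sketch is that `pretrain` never sees `train_loader`, which is exactly the gap the paper targets.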
The authors therefore propose a self-supervised pre-training strategy, L2P-GNN. In my view its two key points are:
- Do fine-tuning during pre-training. Since the gap exists, simply simulate fine-tuning-style updates within the pre-training process. This borrows the idea of meta-learning: learning how to learn.
- Do self-supervised learning at both the node level and the graph level.

The model architecture is shown in the figure above. The two important parts are task construction and dual adaptation, discussed below.
Task Construction
To do fine-tuning-style updates during pre-training, the authors' idea is simply to split the data of each task into a training part and a testing part. Each of the multiple pre-training tasks is split this way, yielding a support set and a query set.
To simulate fine-tuning on a downstream training set, the loss is first optimized directly on the support set to obtain transferable prior knowledge, and the adapted model is then evaluated on the query set.
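For illustration, one simple way to build such a task is to treat each graph as a task and randomly split its edges into a support set and a query set (a hedged sketch; the exact splitting in the released code may differ):

```python
import random

def build_task(edges, support_ratio=0.5, seed=0):
    """Split one graph's edges into a support set and a query set.

    `edges` is a list of (u, v) pairs; in the L2P-GNN setup each graph
    corresponds to one pre-training task.
    """
    rng = random.Random(seed)
    shuffled = edges[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * support_ratio)
    return shuffled[:cut], shuffled[cut:]   # (support set, query set)

# Example: a tiny graph with 5 edges.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
support_set, query_set = build_task(edges)
```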
Dual Adaptation
To narrow the gap between the pre-training and fine-tuning processes, it is crucial that pre-training optimizes the model's ability to adapt quickly to new tasks. To encode both local and global information into the prior, the authors propose dual adaptation, i.e., updates at both the node level and the graph level (a code sketch follows the list below).
- Node-level adaptation. Consistent with earlier work, node pairs are sampled from the support set and the loss is computed as $L^{node}(\psi; S^c_G) = \sum -\ln\big(\sigma(h^T_u h_v)\big) - \ln\big(\sigma(h^T_u h_{v'})\big)$, where $v'$ is a negatively sampled node. The node-level parameters are then updated: $\psi' = \psi - \alpha \frac{\partial \sum L^{node}(\psi; S^c_G)}{\partial \psi}$
- Graph-level adaptation. Similarly, the loss is computed over sampled subgraphs (the graph representation is obtained by pooling): $L^{graph}(\omega; S_G) = \sum -\ln\big(\sigma(h^T_{S^c_G} h_G)\big) - \ln\big(\sigma(h^T_{S^c_G} h_{G'})\big)$. The graph-level parameters are then updated: $\omega' = \omega - \beta \frac{\partial L^{graph}(\omega; S_G)}{\partial \omega}$
- Optimization of the prior. After node-level and graph-level adaptation, the global prior $\theta$ has been adapted into task-specific parameters $\theta' = \{\psi', \omega'\}$. The loss of $\theta'$ on the query set is then back-propagated to optimize $\theta$:
$\theta \leftarrow \theta - \gamma \frac{\partial \sum L(\theta'; Q_G)}{\partial \theta}$, with $L(\theta'; Q_G) = \frac{1}{k}\sum L^{node}(\psi'; Q^c_G) + L^{graph}(\omega'; Q_G)$
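The dual adaptation step is essentially a MAML-style inner/outer loop: adapt on the support set, then back-propagate the query-set loss through that adaptation. A minimal sketch under assumed interfaces (`node_loss(params, data)` and `graph_loss(params, data)` are hypothetical helpers returning scalar losses, not the authors' code):

```python
import torch

def dual_adaptation_step(psi, omega, tasks, node_loss, graph_loss,
                         alpha=0.01, beta=0.01, gamma=1e-3):
    """One meta-update over a batch of pre-training tasks.

    psi, omega : lists of tensors with requires_grad=True
                 (node-level / graph-level parameters of the prior theta)
    tasks      : iterable of (support, query) splits, e.g. from build_task
    """
    meta_loss = 0.0
    for support, query in tasks:
        # Inner loop: adapt node-level parameters on the support set.
        grads_psi = torch.autograd.grad(node_loss(psi, support), psi, create_graph=True)
        psi_adapted = [p - alpha * g for p, g in zip(psi, grads_psi)]

        # Inner loop: adapt graph-level parameters on the support set.
        grads_omega = torch.autograd.grad(graph_loss(omega, support), omega, create_graph=True)
        omega_adapted = [w - beta * g for w, g in zip(omega, grads_omega)]

        # Outer loop: evaluate the task-specific parameters theta' on the query set.
        meta_loss = meta_loss + node_loss(psi_adapted, query) + graph_loss(omega_adapted, query)

    # Back-propagate through the inner updates and take one step on the prior theta = {psi, omega}.
    grads = torch.autograd.grad(meta_loss, psi + omega)
    with torch.no_grad():
        for p, g in zip(psi + omega, grads):
            p -= gamma * g
```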
paper:https://yuanfulu.github.io/publication/AAAI-L2PGNN.pdf
code:https://github.com/rootlu/L2P-GNN

Adaptive Transfer Learning on GNN
This paper is from KDD 2021. Traditional pre-training schemes are not designed with downstream adaptation in mind, so consistency between the upstream and downstream tasks cannot be guaranteed. The authors therefore use meta-learning to design an adaptive auxiliary loss weighting model that controls the consistency between the upstream self-supervised tasks and the downstream target task.
- Traditional approach: learn self-supervised tasks on large amounts of unlabeled data, then use the node representations learned by the self-supervised tasks to assist learning of the target task.
- The authors' transfer method: fine-tune the parameters with a joint loss, which adaptively preserves the useful information from the pre-training stage. Concretely, the cosine similarity between the gradients of the auxiliary task and the target task is used to learn adaptive auxiliary loss weights that quantify how consistent the auxiliary task is with the target task (see the sketch below).
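As a rough illustration of the gradient-similarity idea (not the authors' released weighting model, which is learned via meta-learning), the weight on an auxiliary loss can be driven by the cosine similarity between its gradient and the target-task gradient:

```python
import torch
import torch.nn.functional as F

def auxiliary_weight(model, target_loss, aux_loss):
    """Weight an auxiliary loss by the cosine similarity of its gradient
    with the target-task gradient, clamped to be non-negative."""
    params = [p for p in model.parameters() if p.requires_grad]
    g_target = torch.autograd.grad(target_loss, params, retain_graph=True)
    g_aux = torch.autograd.grad(aux_loss, params, retain_graph=True)
    flat_t = torch.cat([g.reshape(-1) for g in g_target])
    flat_a = torch.cat([g.reshape(-1) for g in g_aux])
    cos = F.cosine_similarity(flat_t, flat_a, dim=0)
    return torch.clamp(cos, min=0.0)   # ignore auxiliary gradients that conflict with the target

# Joint loss for fine-tuning (the weight is treated as a constant here):
# loss = target_loss + auxiliary_weight(model, target_loss, aux_loss) * aux_loss
```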
paper:https://arxiv.org/abs/2107.08765