Learning to pre train graph neural networks
2022-07-25 12:02:00 【Shangshanxianger】
I previously put together a post surveying graph pre-training. Since then, papers on pre-training for graphs have kept appearing, but they rarely stray far from the same recipe: self-supervised learning at the node level and at the graph level.
Why are self-supervised pre-training strategies effective?
- Layer-wise structure: lower layers can be fixed while the upper layers are trained.
- Multi-task learning biases the model toward more general representations.
- Pre-training in the same domain lets the model pick up domain knowledge.
But there is always a gap between pre-training and fine-tuning, and closing that gap is a thorny problem. In this post I go through several proposed solutions.
Learning to Pre-train Graph Neural Networks
This paper comes from AAAI 2021. Its core question is: how can we alleviate the optimization gap between GNN pre-training and fine-tuning?
First, the authors formalize GNN pre-training as a two-stage process (a minimal sketch of this pipeline follows the list):
- Pre-training. Pre-train on a large-scale graph dataset, i.e. update the parameters $\theta$ to minimize $\theta_0 = \arg\min_{\theta} L^{pre}(f_{\theta}; D^{pre})$.
- Fine-tuning. Fine-tune on the downstream data: starting from $\theta_0$, take gradient-descent steps $\theta_1 = \theta_0 - \eta \nabla_{\theta_0} L^{fine}(f_{\theta_0}; D^{tr})$.
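To make the two stages concrete, here is a minimal PyTorch-style sketch of this conventional pipeline; the loss functions and data loaders are passed in as hypothetical placeholders, not the paper's code. Note that stage 1 never looks at how stage 2 will use the parameters, which is exactly the gap discussed next.

```python
import torch

def pretrain(model, loader, pretrain_loss, epochs=100, lr=1e-3):
    """Stage 1: theta_0 = argmin_theta L^pre(f_theta; D^pre)."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for batch in loader:
            opt.zero_grad()
            pretrain_loss(model, batch).backward()   # self-supervised objective
            opt.step()
    return model  # parameters theta_0

def finetune(model, loader, finetune_loss, epochs=50, lr=1e-4):
    """Stage 2: repeated steps theta_1 = theta_0 - eta * grad L^fine(f_theta_0; D^tr)."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for batch in loader:
            opt.zero_grad()
            finetune_loss(model, batch).backward()   # supervised downstream objective
            opt.step()
    return model  # parameters theta_1
```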
The authors argue that there is a disconnect between these two steps: fine-tuning starts from $\theta_0$, but $\theta_0$ is fixed before it ever sees the fine-tuning data, so pre-training never considers how the model will be adapted downstream. This causes an optimization bias between pre-training and fine-tuning, which to some extent hurts how well the pre-trained model transfers.
Therefore, the authors propose a self-supervised pre-training strategy, L2P-GNN. The two key points, in my view, are:
- Do "fine-tuning" inside pre-training. Since the gap exists, simulate the fine-tuning procedure during pre-training, borrowing the meta-learning idea of learning how to learn.
- Do self-supervised learning at both the node level and the graph level.

The model architecture is shown in the figure above; the two important parts are task construction and dual adaptation.
Task Construction
To simulate fine-tuning during pre-training, the authors' idea is simply to split the data into a "training" part and a "testing" part. Each of the multiple pre-training tasks is split this way, yielding a support set and a query set.
To mimic fine-tuning on a downstream training set, the model is trained directly on the support set to acquire transferable prior knowledge, and its adapted performance is then evaluated on the query set.
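As a rough sketch of this support/query split, assuming each pre-training graph comes as an edge-index tensor of shape (2, num_edges); the split ratio and helper name are my own illustration, not the released code:

```python
import torch

def build_task(edge_index, support_ratio=0.5):
    """Split one graph's edges into a support set ("train") and a query set
    ("test"), so that fine-tuning can be simulated inside pre-training."""
    num_edges = edge_index.size(1)
    perm = torch.randperm(num_edges)
    cut = int(num_edges * support_ratio)
    support_edges = edge_index[:, perm[:cut]]   # adapt the prior on these edges
    query_edges = edge_index[:, perm[cut:]]     # evaluate the adapted prior on these
    return support_edges, query_edges
```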
Dual Adaptation
To narrow the gap between pre-training and fine-tuning, pre-training must optimize the model's ability to adapt quickly to new tasks. To encode both local and global information into the prior, the authors propose dual adaptation, i.e. updates at both the node level and the graph level (a toy sketch of one such step follows this list):
- Node-level adaptation. Consistent with the earlier post's approach, node pairs are sampled and the loss is computed: $L^{node}(\psi; S^c_G) = \sum -\ln\big(\sigma(h^T_u h_v)\big) - \ln\big(\sigma(h^T_u h_{v'})\big)$. The node-level parameters are then updated: $\psi' = \psi - \alpha \frac{\partial \sum L^{node}(\psi; S^c_G)}{\partial \psi}$.
- Graph-level adaptation. Similarly, the loss is computed with sampled subgraphs (the graph representation is obtained by pooling): $L^{graph}(\omega; S_G) = \sum -\ln\big(\sigma(h^T_{S^c_G} h_G)\big) - \ln\big(\sigma(h^T_{S^c_G} h_{G'})\big)$. The graph-level parameters are then updated: $\omega' = \omega - \beta \frac{\partial L^{graph}(\omega; S_G)}{\partial \omega}$.
- Optimization of the prior knowledge. After node-level and graph-level adaptation, the global prior $\theta$ has been adapted into task-specific knowledge $\theta' = \{\psi', \omega'\}$, which is then used to optimize $\theta$ via back-propagation on the query set: $\theta \leftarrow \theta - \gamma \frac{\partial \sum L(\theta'; Q_G)}{\partial \theta}$, where $L(\theta'; Q_G) = \frac{1}{k}\sum L^{node}(\psi'; Q^c_G) + L^{graph}(\omega'; Q_G)$.
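To make the dual-adaptation step more tangible, here is a toy MAML-style sketch in PyTorch. The simplifications are mine, not the authors': the "encoder" is a single weight matrix `psi` (node embeddings are `x @ psi`), graph and subgraph embeddings are mean-pooled and projected by `omega`, negative pairs are passed in precomputed, and the negative term uses the usual negated score inside the sigmoid.

```python
import torch
import torch.nn.functional as F

def link_loss(h, pos, neg):
    """Node-level loss: -ln sigma(h_u^T h_v) over positive pairs plus the
    corresponding negative-sampling term over negative pairs."""
    pos_score = (h[pos[0]] * h[pos[1]]).sum(-1)
    neg_score = (h[neg[0]] * h[neg[1]]).sum(-1)
    return -(F.logsigmoid(pos_score).sum() + F.logsigmoid(-neg_score).sum())

def dual_adaptation_step(psi, omega, tasks, alpha=0.01, beta=0.01, gamma=1e-3):
    """One meta-step: adapt (psi, omega) -> (psi', omega') on each task's support
    set, evaluate the adapted parameters on the query set, then back-propagate
    the query loss into the shared prior theta = (psi, omega)."""
    meta_loss = 0.0
    for x, sup_pos, sup_neg, qry_pos, qry_neg in tasks:
        # Node-level adaptation on the support set: psi -> psi'
        h = x @ psi
        (g_psi,) = torch.autograd.grad(link_loss(h, sup_pos, sup_neg), psi,
                                       create_graph=True)
        psi_adapt = psi - alpha * g_psi

        # Graph-level adaptation on the support set: omega -> omega'
        h = x @ psi_adapt
        h_sub = h[sup_pos.flatten()].mean(0) @ omega     # pooled "subgraph" summary
        h_g = h.mean(0) @ omega                          # pooled whole-graph embedding
        (g_omega,) = torch.autograd.grad(-F.logsigmoid(h_sub @ h_g), omega,
                                         create_graph=True)
        omega_adapt = omega - beta * g_omega

        # Query loss under theta' = (psi', omega'); the graph term keeps only a positive pair
        h_q = x @ psi_adapt
        graph_term = -F.logsigmoid((h_q[qry_pos.flatten()].mean(0) @ omega_adapt)
                                   @ (h_q.mean(0) @ omega_adapt))
        meta_loss = meta_loss + link_loss(h_q, qry_pos, qry_neg) + graph_term

    # Meta-update of the prior: theta <- theta - gamma * d(query loss)/d(theta)
    g_psi, g_omega = torch.autograd.grad(meta_loss, (psi, omega))
    with torch.no_grad():
        psi -= gamma * g_psi
        omega -= gamma * g_omega
    return float(meta_loss)
```

Here `psi` and `omega` would be tensors with `requires_grad=True`, and every task tuple holds node features `x` of shape N×d plus support/query positive and negative edge tensors of shape 2×E.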
paper:https://yuanfulu.github.io/publication/AAAI-L2PGNN.pdf
code:https://github.com/rootlu/L2P-GNN

Adaptive Transfer Learning on GNN
From KDD 2021. Traditional pre-training schemes are not designed with downstream adaptation in mind, so they cannot guarantee consistency between the upstream and downstream tasks. The authors therefore use meta-learning to design an adaptive auxiliary loss weighting model that controls the consistency between the upstream self-supervised tasks and the downstream target task.
- Traditional approach: learn self-supervised tasks on a large amount of unlabeled data, then use the node representations learned by those tasks to assist learning of the target task.
- The authors' transfer method: fine-tune the parameters with a joint loss, which adaptively preserves the useful information from the pre-training stage. The adaptive auxiliary loss weights are learned from the gradient similarity (cosine similarity between the auxiliary-task and target-task gradients), which quantifies the consistency between each auxiliary task and the target task; a rough sketch follows.
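A rough PyTorch sketch of the gradient-similarity idea; the exact weighting rule below (cosine similarity between the flattened auxiliary and target gradients, clipped at zero) is my simplified reading rather than the paper's learned weighting model, and `target_loss_fn` / `aux_loss_fn` are hypothetical placeholders.

```python
import torch
import torch.nn.functional as F

def weighted_joint_step(model, target_loss_fn, aux_loss_fn, batch, opt):
    """One fine-tuning step with an adaptively weighted auxiliary loss: an
    auxiliary task whose gradient disagrees with the target-task gradient
    gets a smaller weight in the joint loss."""
    params = [p for p in model.parameters() if p.requires_grad]

    target_loss = target_loss_fn(model, batch)   # downstream objective
    aux_loss = aux_loss_fn(model, batch)         # self-supervised auxiliary objective

    g_target = torch.autograd.grad(target_loss, params,
                                   retain_graph=True, allow_unused=True)
    g_aux = torch.autograd.grad(aux_loss, params,
                                retain_graph=True, allow_unused=True)

    def flatten(grads):
        # Treat parameters untouched by a loss as having zero gradient.
        return torch.cat([torch.zeros_like(p).reshape(-1) if g is None else g.reshape(-1)
                          for p, g in zip(params, grads)])

    weight = F.cosine_similarity(flatten(g_target), flatten(g_aux), dim=0).clamp(min=0.0)

    opt.zero_grad()
    (target_loss + weight * aux_loss).backward()  # joint loss with adaptive weight
    opt.step()
    return float(weight)
```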
paper:https://arxiv.org/abs/2107.08765