GCC: Graph Contrastive Coding for Graph Neural NetworkPre-Training
2022-07-02 19:52:00 【yihanyifan】
Abstract
Graph representation learning has emerged as a powerful technique for solving real-world problems. Various downstream graph learning tasks have benefited from its recent developments, such as node classification, similarity search, and graph classification. However, existing graph representation learning techniques focus on domain-specific problems and train a dedicated model for each graph dataset, which is usually non-transferable to out-of-domain data. Inspired by recent advances in pre-training for natural language processing and computer vision, we design Graph Contrastive Coding (GCC), a self-supervised graph neural network pre-training framework, to capture universal network topological properties across multiple networks. We design GCC's pre-training task as subgraph instance discrimination in and across networks, and leverage contrastive learning to empower graph neural networks to learn intrinsic and transferable structural representations. We conduct extensive experiments on three graph learning tasks and ten graph datasets. The results show that GCC, pre-trained on a collection of diverse datasets, can achieve competitive or better performance than its task-specific, trained-from-scratch counterparts. This suggests that the pre-training and fine-tuning paradigm holds great potential for graph representation learning.
INTRODUCTION
In this paper, we propose the Graph Contrastive Coding (GCC) framework to learn structural representations across graphs. Specifically, drawing on the idea of contrastive learning, we design the graph pre-training task as instance discrimination. Its basic idea is to sample instances from the graphs, treat each of them as a distinct class of its own, and learn to encode and discriminate between these instances. GCC needs to answer three questions so that it can learn transferable structural patterns: (1) what are the instances? (2) what are the discrimination rules? (3) how to encode the instances?
The pre-training task of this paper is subgraph instance discrimination. For each vertex, its r-ego network is treated as one instance; GCC aims to distinguish subgraphs sampled from a specific vertex's r-ego network from subgraphs sampled from other vertices' r-ego networks.
Main contributions:
- We formulate the problem of GNN pre-training across multiple graphs and identify its main challenges.
- We design the pre-training task as subgraph instance discrimination, to capture universal and transferable structural patterns from multiple input graphs.
- We propose the GCC framework to learn structural graph representations, guided by contrastive learning.
- We conduct extensive experiments to demonstrate that, for out-of-domain tasks, GCC can offer performance comparable or superior to dedicated, task-specific models.
RELATED WORK
Related work mainly covers three topics: vertex similarity, contrastive learning, and graph pre-training.
Vertex Similarity
This includes neighborhood similarity, structural similarity, and attribute similarity.
Graph Pre-Training
- Skip-gram based models
  Several graph embedding methods inspired by word2vec, such as LINE, DeepWalk, node2vec, and metapath2vec, are mostly based on neighborhood similarity of nodes and cannot be applied to graphs beyond the ones they were trained on. In contrast, the GCC model proposed in this paper is based on structural similarity and can transfer to graphs unseen during training.
- Pre-training graph neural networks
  Prior to this paper, there were several works on pre-training for graphs; they either rely on the node/edge attributes of the graph or define graph-specific training tasks. The model in this paper: (1) does not use graph labels; (2) does not rely on attribute-specific learning tasks.
GRAPH CONTRASTIVE CODING (GCC)
Figure 2 shows an overview of the pre-training and fine-tuning stages of GCC.
The GNN Pre-Training Problem
Given a collection of graphs from different domains, our goal is to pre-train a GNN model in a self-supervised manner to capture the structural patterns within these graphs. The model should be able to benefit downstream tasks on different datasets. The underlying assumption is that common and transferable structural patterns, such as motifs, exist across different graphs, which is evident in the network science literature. As an illustrative scenario, we pre-train a GNN model on the Facebook, IMDB, and DBLP graphs with self-supervised learning, and then apply it to the node classification task on the US-Airport network, as shown in Figure 2.
Formally, the GNN pre-training problem is to learn a function f that maps vertices to low-dimensional feature vectors, such that f has the following two properties:
- First, structural similarity: vertices with similar topological structures are mapped close to each other in the vector space.
- Second, transferability: f is compatible with vertices and graphs unseen during pre-training.
GCC Pre-Training
Given a collection of graphs, our goal is to pre-train a universal graph neural network encoder to capture the structural patterns behind these graphs. To achieve this, we need to design a proper self-supervised task and learning objective for graph-structured data.
We propose to use subgraph instance discrimination as the pre-training task and InfoNCE [35] as the learning objective. The pre-training task treats each subgraph instance as a distinct class of its own and learns to discriminate between these instances. The promise is that it can produce representations that capture the similarities between these subgraph instances.
From the perspective of dictionary look-up, given a query q and a dictionary of K+1 keys {k_0, ..., k_K}, contrastive learning looks up the single key k_+ in the dictionary that matches q. This paper adopts InfoNCE:

$$\mathcal{L} = -\log \frac{\exp\left(\mathbf{q}^{\top}\mathbf{k}_{+}/\tau\right)}{\sum_{i=0}^{K}\exp\left(\mathbf{q}^{\top}\mathbf{k}_{i}/\tau\right)} \qquad (1)$$

where τ is the temperature hyperparameter, and f_q and f_k are two GNN encoders that encode the query x_q and each key x_k into d-dimensional representations, denoted by q = f_q(x_q) and k = f_k(x_k).
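To make the objective above concrete, here is a minimal PyTorch sketch of the InfoNCE loss in Eq. (1); the batch layout, dimension d = 16, and dictionary size K = 4 are illustrative assumptions, not the paper's settings.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(q, keys, tau=0.07):
    """InfoNCE over a dictionary of K+1 keys.

    q    : (d,) query embedding, assumed L2-normalized
    keys : (K+1, d) key embeddings; by convention keys[0] is the positive k+
    tau  : temperature hyperparameter
    """
    logits = (keys @ q) / tau                  # (K+1,) similarities q . k_i
    target = torch.zeros(1, dtype=torch.long)  # positive key sits at index 0
    return F.cross_entropy(logits.unsqueeze(0), target)

# toy usage: d = 16, K = 4 negative keys
q = F.normalize(torch.randn(16), dim=0)
keys = F.normalize(torch.randn(5, 16), dim=1)
print(info_nce_loss(q, keys))
```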
To instantiate each component in GCC, we need to answer the following three questions:
- Q1: How to define subgraph instances in graphs?
- Q2: How to define (dis)similar instance pairs in and across graphs, i.e., for a query x_q, which key x_k is the matched one?
- Q3: What are the proper graph encoders f_q and f_k?
Q1: Design (subgraph) instances in graphs.
Instances in graphs are not clearly defined. Moreover, our pre-training focuses on purely structural representations, without additional input features/attributes. This makes the natural choice of treating a single vertex as an instance infeasible, since single vertices are not well suited for discriminating between one another.
To solve this problem, we propose to use subgraphs as contrastive instances by extending each single vertex to its local structure. Specifically, for a vertex v, we define an instance as its r-ego network:
Definition 3.1 (r-ego network). For a vertex v, its r-neighborhood is defined as S_v = {u : d(u, v) ≤ r}, where d(u, v) is the shortest-path distance between u and v in graph G. The r-ego network of vertex v, denoted G_v, is the subgraph induced by S_v.
The left side of Figure 3 shows two examples of 2-ego networks. GCC treats each r-ego network as a distinct class of its own and encourages the model to distinguish similar instances from dissimilar ones.
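As a quick illustration of Definition 3.1 (not from the paper's codebase), the following NetworkX snippet extracts an r-ego network; the karate-club graph and the radius r = 2 are toy assumptions.

```python
import networkx as nx

G = nx.karate_club_graph()           # toy stand-in for the input network
v, r = 0, 2                          # ego vertex and radius
G_v = nx.ego_graph(G, v, radius=r)   # subgraph induced by S_v = {u : d(u, v) <= r}
print(G_v.number_of_nodes(), G_v.number_of_edges())
```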
Q2: Define (dis)similar instances.
In GCC, we treat two random data augmentations of the same r-ego network as a similar instance pair, and define data augmentation as graph sampling [27]. Graph sampling is a technique for deriving representative subgraph samples from an original graph. Suppose we want to augment vertex v's r-ego network G_v; GCC's graph sampling follows three steps: random walk with restart (RWR) [50], subgraph induction, and anonymization.
- Random walk with restart. We start a random walk on G from the ego vertex v. The walk iteratively travels to a neighbor with probability proportional to the edge weight. In addition, at each step, the walk returns to the starting vertex v with a positive probability.
- Subgraph induction. The random walk with restart collects a set of vertices around v, denoted S̃_v. The subgraph G̃_v induced by S̃_v is regarded as an augmented version of the ego network G_v. This step is also known as induced subgraph random walk sampling (ISRW).
- Anonymization. We anonymize the sampled graph G̃_v by re-labeling its vertices as {1, 2, ..., |S̃_v|}, in arbitrary order.
We repeat the above procedure twice to create two data augmentations, which form a similar instance pair (x_q, x_{k+}). If two subgraphs are augmented from different r-ego networks, we treat them as a dissimilar instance pair (x_q, x_k) with k ≠ k+. It is worth noting that all of the above graph operations, i.e., random walk with restart, subgraph induction, and anonymization, are available in the DGL package.
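The paper relies on DGL for these operations; as a library-agnostic sketch of the three-step pipeline (RWR, subgraph induction, anonymization), here is a simplified NetworkX version that assumes an unweighted graph and a fixed walk length, both hypothetical simplifications.

```python
import random
import networkx as nx

def rwr_augment(G, v, restart_prob=0.5, walk_length=64):
    """One augmentation of v's ego network: RWR, subgraph induction, anonymization."""
    visited, cur = {v}, v
    for _ in range(walk_length):
        neighbors = list(G.neighbors(cur))
        if not neighbors or random.random() < restart_prob:
            cur = v                                # restart at the ego vertex
        else:
            cur = random.choice(neighbors)         # unweighted step (simplification)
        visited.add(cur)
    sub = G.subgraph(visited).copy()               # subgraph induction
    mapping = {u: i for i, u in enumerate(sub.nodes())}
    return nx.relabel_nodes(sub, mapping)          # anonymization: labels 0..|S~_v|-1

G = nx.karate_club_graph()
x_q, x_k = rwr_augment(G, 0), rwr_augment(G, 0)    # two augmentations -> similar pair
```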
Q3: Define graph encoders.
Given two sampled subgraphs x_q and x_k, GCC encodes them via two graph neural network encoders f_q and f_k. Technically, any graph neural network [4] can serve as the encoder here, and the GCC model is not sensitive to different choices. In practice, we adopt the Graph Isomorphism Network (GIN) [59], a state-of-the-art graph neural network model, as our graph encoder. Recall that our pre-training focuses on structural representations, while most GNN models require vertex features/attributes as input. To bridge this gap, we propose to initialize vertex features from the graph structure of each sampled subgraph. Specifically, we define the generalized positional embedding as follows:
Definition 3.2 (Generalized positional embedding). For each sampled subgraph, its generalized positional embedding is defined by the eigenvectors of its normalized graph Laplacian, i.e., the columns of U in the eigen-decomposition I − D^{−1/2} A D^{−1/2} = U Λ U^⊤.
As shown in the paper, the positional embedding in sequence models can be viewed as the Laplacian eigenvectors of a path graph. This motivates us to generalize positional embedding from path graphs to arbitrary graphs. The reason for using the normalized graph Laplacian rather than the unnormalized one is that a path graph is a regular graph (i.e., of constant degree), while real-world graphs are often irregular with skewed degree distributions. In addition to the generalized positional embedding, we also add the one-hot encoding of vertex degrees [59] and a binary indicator of the ego vertex as vertex features. After being encoded by the graph encoder, the final d-dimensional output vectors are normalized by their L2 norm.
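Below is a minimal sketch of this vertex-feature construction, assuming NetworkX/NumPy; keeping the eigenvectors of the smallest eigenvalues, the embedding dimension, and the degree cutoff are illustrative choices of ours, not values from the paper.

```python
import numpy as np
import networkx as nx

def vertex_features(g, pos_dim=4, max_degree=16, ego=0):
    """Generalized positional embedding + one-hot degree + ego indicator."""
    n = g.number_of_nodes()
    L = nx.normalized_laplacian_matrix(g).toarray()
    _, eigvecs = np.linalg.eigh(L)                      # ascending eigenvalues
    pos = eigvecs[:, :min(pos_dim, n)]
    if n < pos_dim:                                     # pad small subgraphs
        pos = np.pad(pos, ((0, 0), (0, pos_dim - n)))
    deg = np.zeros((n, max_degree))
    ind = np.zeros((n, 1))
    for i, u in enumerate(g.nodes()):
        deg[i, min(g.degree(u), max_degree - 1)] = 1.0  # one-hot (clipped) degree
        ind[i, 0] = 1.0 if u == ego else 0.0            # binary ego-vertex indicator
    return np.hstack([pos, deg, ind])                   # (n, pos_dim + max_degree + 1)

g = nx.ego_graph(nx.karate_club_graph(), 0, radius=2)
X = vertex_features(g)   # input features for the GNN encoder
```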
A running example
Figure 3 illustrates a running example of GCC pre-training. For simplicity, we set the dictionary size to 3. GCC first randomly augments two subgraphs x_q and x_{k0} from one 2-ego network on the left of Figure 3. Meanwhile, two other subgraphs x_{k1} and x_{k2} are generated from the noise distribution; in this case, they are randomly augmented from the other 2-ego network on the left of Figure 3. Then the two graph encoders f_q and f_k map the query and the three keys to low-dimensional vectors q and {k0, k1, k2}. Finally, the contrastive loss in Eq. (1) encourages the model to recognize (x_q, x_{k0}) as a similar instance pair and to distinguish it from the dissimilar instances (i.e., {x_{k1}, x_{k2}}).
GCC Fine-Tuning
Downstream tasks in graph learning fall into two categories, graph-level and node-level, where the target is to predict labels of graphs or of nodes, respectively. For graph-level tasks, the input graph itself can be encoded by GCC to obtain its representation. For node-level tasks, a node's representation can be defined by encoding its r-ego network (or subgraphs augmented from its r-ego network).
Freezing vs. full fine-tuning
GCC provides two fine-tuning strategies for downstream tasks, the freezing mode and the fine-tuning mode, as sketched below.
- In the freezing mode, we freeze the parameters of the pre-trained graph encoder f_q and use it as a static feature extractor; a classifier for the specific downstream task is then trained on top of the extracted features.
- In the fine-tuning mode, the graph encoder f_q is initialized with the pre-trained parameters and trained end-to-end together with the classifier on the downstream task.
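The two modes differ only in whether gradients flow into the encoder; a minimal PyTorch sketch follows, with nn.Linear standing in for the pre-trained GIN encoder and the downstream classifier (both placeholders).

```python
import torch.nn as nn
from torch.optim import Adam

encoder = nn.Linear(64, 64)      # placeholder for the pre-trained GIN encoder f_q
classifier = nn.Linear(64, 10)   # placeholder downstream task head

# freezing mode: encoder acts as a static feature extractor
for p in encoder.parameters():
    p.requires_grad = False
optimizer = Adam(classifier.parameters())

# fine-tuning mode: encoder (initialized from pre-training) trains end-to-end
for p in encoder.parameters():
    p.requires_grad = True
optimizer = Adam(list(encoder.parameters()) + list(classifier.parameters()))
```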
GCC as a local algorithm
As a graph algorithm, GCC falls into the category of local algorithms: its random-walk-based graph sampling explores local structures, so GCC only involves local explorations of the input (large-scale) network. This property makes GCC scalable to large-scale graph learning tasks and friendly to distributed computing settings.