Paper information
Paper title: Rethinking the Setting of Semi-supervised Learning on Graphs
Authors: Ziang Li, Ming Ding, Weikai Li, Zihan Wang, Ziyu Zeng, Yukuo Cen, Jie Tang
Source: arXiv, 2022
Paper link: download
Paper code: download
1 Introduction
This paper studies the over-tuning phenomenon in semi-supervised GNN models and proposes a fair framework for comparing models.
2 The Risk of Over-tuning of Semi-supervised Learning on Graphs
2.1 Semi-Supervised Learning on Graphs
Three commonly used datasets: Cora, Citeseer, and Pubmed.
2.2 An Analysis of Over-tuning in Current GNNs
Over-tuning is ubiquitous in GNNs: the hyperparameters of a GNN model overfit the validation set.
This paper tests five representative GNN frameworks (GCN, GAT, APPNP, GDC-GCN, ADC) on the Cora dataset and compares their accuracy under different validation-set sizes. Grid search is used to select the optimal hyperparameters for each model, with the validation-set size varied from 100 to 500. For each validation set, the model is trained with the best searched hyperparameters and its accuracy on the test set is reported. The results are shown in Figure 1.
Figure 1 shows that GNN models usually perform better with larger validation sets. Since the validation set can only affect the model through its hyperparameters, we can conclude that the model benefits from the validation labels via hyperparameter tuning. Increasing the validation-set size from 100 to 500 improves accuracy by as much as 1%~3%, which is enough to show that over-tuning already exists.
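To make the experiment concrete, here is a minimal sketch of such an over-tuning check. The helpers `train_model(graph, train_nodes, cfg)` and `evaluate(model, graph, nodes)` are hypothetical stand-ins for any GNN training/evaluation routine, and the search grid is illustrative rather than the paper's actual search space.

```python
# Minimal sketch of the Section 2.2 over-tuning experiment (not the authors' code).
# `train_model` and `evaluate` are hypothetical callables; the grid is illustrative.
from itertools import product

def over_tuning_experiment(graph, train_nodes, val_pool, test_nodes,
                           train_model, evaluate,
                           val_sizes=(100, 200, 300, 400, 500)):
    grid = {"lr": [0.01, 0.005], "hidden": [16, 64], "dropout": [0.5, 0.8]}
    results = {}
    for size in val_sizes:
        val_nodes = val_pool[:size]             # growing validation set, as in Figure 1
        best_cfg, best_val = None, -1.0
        for values in product(*grid.values()):  # exhaustive grid search
            cfg = dict(zip(grid.keys(), values))
            model = train_model(graph, train_nodes, cfg)
            acc = evaluate(model, graph, val_nodes)
            if acc > best_val:
                best_cfg, best_val = cfg, acc
        # Retrain with the selected hyperparameters and record test accuracy.
        model = train_model(graph, train_nodes, best_cfg)
        results[size] = evaluate(model, graph, test_nodes)
    return results
```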
2.3 ValidUtil: Exploring the Limits of Over-tuning
Directly adding the validation set to the training set is usually not allowed, since this is considered data leakage. Instead, the paper proposes ValidUtil: for each validation node $i$, a pseudo-hyperparameter $\hat{y}_{i}$ is introduced and tuned like an ordinary hyperparameter against validation accuracy.
The results show that the model achieves the best validation performance only when $\hat{y}_{i}=y_{i}$. Figure 2 shows the impact of these pseudo-hyperparameters on the experiment.
We find that even the 20~60 pseudo-hyperparameters introduced by ValidUtil can already bring a leap in performance for some models. When pseudo-hyperparameters are added for all 500 nodes in the validation set, PPNP becomes comparable to the SOTA methods in Table 2.
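To make the idea concrete, below is a minimal ValidUtil-style sketch. It assumes the pseudo-hyperparameter $\hat{y}_{i}$ is injected as an extra training label for validation node $i$ and tuned one node at a time by validation accuracy; the exact injection mechanism follows the paper, and `train_model` / `evaluate` are the same hypothetical helpers as above.

```python
# ValidUtil-style sketch (illustrative, not the authors' implementation).
# Assumption: y_hat_i is injected as an extra training label for validation node i.
def validutil_search(graph, train_labels, val_nodes, num_classes,
                     base_cfg, train_model, evaluate):
    pseudo = {}                                  # selected y_hat_i per validation node
    for i in val_nodes:                          # one pseudo-hyperparameter per node
        best_label, best_acc = None, -1.0
        for y_hat in range(num_classes):         # grid over the label set
            labels = {**train_labels, **pseudo, i: y_hat}
            model = train_model(graph, labels, base_cfg)
            acc = evaluate(model, graph, val_nodes)  # validation accuracy drives the choice
            if acc > best_acc:
                best_label, best_acc = y_hat, acc
        pseudo[i] = best_label                   # in practice this converges to the true y_i
    # The final "tuned" model has effectively absorbed the validation labels.
    return train_model(graph, {**train_labels, **pseudo}, base_cfg)
```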
Remarks: Although ValidUtil works purely by exploiting validation labels, it is fully valid under the current setting. If we regard GNN+ValidUtil as a black-box model, its training process looks quite normal. ValidUtil actually uses the labels very inefficiently, since each hyperparameter can only learn the information of a single node, but this is enough to test our hypothesis. The current setting cannot prevent validation labels from being "leaked" during hyperparameter tuning. We believe there exist more effective ways to define influential hyperparameters. Such hyperparameters may be entangled with features or model structures, so that they can obtain information from multiple validation labels. According to Figure 1, such influential hyperparameters may already exist in some models and are not easy to detect. Therefore, there is an urgent need for a new semi-supervised learning benchmark on graphs that avoids over-tuning and allows fair and stable comparison of GNN models.
3 IGB: An Independent and Identically Distributed Graph Benchmark
3.1 Overview
The new benchmark has two goals: avoiding over-tuning and being more robust.
To avoid over-tuning, this paper divides the nodes into labeled and unlabeled sets. One can learn the best model on the labeled set in any way and evaluate its performance on the unlabeled (test) set. If hyperparameter search is needed, part of the labeled set can be treated as a validation set. Since the validation labels are already exposed, the over-tuning problem is eliminated. This setting is closer to real-world scenarios and enables fair comparison between models with different hyperparameters. To easily migrate GNNs to this new setting, Section 3.2 introduces a simple and powerful way to create validation sets.
This paper also expects the performance of a model to be stable under different random seeds. A common way of reporting performance in machine learning is to repeat the test and report the average. Therefore, this paper tests the performance of a model on multiple i.i.d. graphs, which are obtained by sampling. The evaluation pipeline consists of the following steps:
- 1. Divide the labeled set into training and validation sets.
- 2. Find the best hyper-parameters using grid search on the training and validation sets from the first step.
- 3. Train the model with the best hyper-parameters on the full labeled nodes.
- 4. Test the performance of the model from the third step on the unlabeled (test) sets.
- 5. Repeat the above steps on each graph in a dataset and report the average accuracy.
The first two steps aim to find the best hyperparameters of the GNN model. We believe this approach yields satisfactory hyperparameters for many GNN models. If there are other reasonable ways to determine the optimal hyperparameters using only the labeled set, they are also encouraged to replace the first two steps of this pipeline. In this way, over-tuning is avoided, because all label information in the validation set is directly exposed in the third step. A compact sketch of the five steps is given below.
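The sketch below walks through the five steps, assuming hypothetical `split_labeled`, `train_model`, and `evaluate` helpers, an arbitrary hyperparameter grid, and a `test_nodes` attribute on each graph object; it is not the official benchmark code.

```python
# Minimal sketch of the IGB evaluation pipeline (Steps 1-5 above), not the official code.
from itertools import product
from statistics import mean

def igb_evaluate(graphs, grid, train_model, evaluate, split_labeled):
    accs = []
    for g in graphs:                                   # Step 5: every i.i.d. graph in the dataset
        train_nodes, val_nodes = split_labeled(g)      # Step 1: split the labeled set
        best_cfg, best_val = None, -1.0
        for values in product(*grid.values()):         # Step 2: grid search on train/val
            cfg = dict(zip(grid.keys(), values))
            model = train_model(g, train_nodes, cfg)
            acc = evaluate(model, g, val_nodes)
            if acc > best_val:
                best_cfg, best_val = cfg, acc
        full_labeled = train_nodes + val_nodes         # Step 3: retrain on all labeled nodes
        model = train_model(g, full_labeled, best_cfg)
        accs.append(evaluate(model, g, g.test_nodes))  # Step 4: test on unlabeled nodes (hypothetical attribute)
    return mean(accs)                                  # Step 5: report the average accuracy
```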
3.3 Datasets
IGB consists of four datasets: AMiner, Facebook, NELL, and Flickr. Each dataset contains 100 undirected connected graphs, sampled from the original large graph with the random-walk method of Section 3.4. We also report the average node overlap rate of a pair of sampled graphs, i.e., the ratio of common nodes to the total number of nodes, and the coverage, defined as the ratio of the union of the 100 sampled graphs to the original large graph. Low overlap and high coverage are preferred. The statistics of the datasets are shown in Table 3.
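For concreteness, these two statistics can be computed from node-ID sets as in the sketch below. It assumes one plausible reading of the overlap rate, |A∩B| / |A∪B| for a pair of sampled node sets A and B, and represents each sampled graph by its set of node IDs; this is an illustration of the definitions, not the paper's script.

```python
# Sketch of the overlap-rate and coverage statistics described above (assumed reading
# of the definitions; each sampled graph is represented by its set of node IDs).
from itertools import combinations

def overlap_and_coverage(sampled_node_sets, original_nodes):
    # Average pairwise overlap: common nodes divided by the size of the pair's union.
    overlaps = [len(a & b) / len(a | b) for a, b in combinations(sampled_node_sets, 2)]
    avg_overlap = sum(overlaps) / len(overlaps)
    # Coverage: how much of the original graph the union of the samples touches.
    union = set().union(*sampled_node_sets)
    coverage = len(union) / len(original_nodes)
    return avg_overlap, coverage
```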
3.4 Sampling Algorithm
The simplest way to make the node-label distribution of a subgraph similar to that of the original graph is vertex sampling. However, it does not meet our expectations because it generates disconnected subgraphs. To obtain a benchmark of nearly i.i.d. subgraphs, we must carefully design the sampling strategy and its principles. Specifically, we expect the sampling strategy to have the following properties:
- 1. The sampled subgraph is a connected graph.
- 2. The distribution of the subgraph’s node labels is close to that of the original graph.
- 3. The distribution of the subgraph’s edge categories (edge category is defined by the combination of its two endpoints’ labels) is close to that of the original graph.
First, random walk satisfies the first property well. We start sampling from a node $u=n_0$ and choose the next node according to the transition probability:
$P_{u, v}=\left\{\begin{array}{ll}\frac{1}{d_{u}}, & \text { if }(u, v) \in E \\0, & \text { otherwise }\end{array}\right.$
where $P_{u, v}$ is the transition probability from $u$ to $v$, and $d_{u}$ denotes the degree of node $u$.
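As a quick illustration, a uniform random walk with this transition probability can be sketched as follows; the adjacency-list representation, target size, and stopping rule are assumptions made for the sketch rather than the paper's exact sampler.

```python
# Sketch of random-walk node sampling with the transition probability above.
# `adj` is an adjacency list: node -> list of neighbours (undirected, connected graph assumed).
import random

def random_walk_nodes(adj, start, target_size, max_steps=100_000):
    visited = {start}
    u = start
    for _ in range(max_steps):
        if len(visited) >= target_size:
            break
        v = random.choice(adj[u])   # P(u, v) = 1 / d_u for every neighbour v of u
        visited.add(v)
        u = v
    return visited                  # node set inducing a connected subgraph
```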
To ensure the second and third properties, we adopt a rejection-sampling-style strategy. Here, KL divergence is introduced as a measure of the difference between two distributions. To obtain subgraphs whose node-label distribution ("Node KL") and edge-category distribution ("Edge KL") have relatively low KL divergence from those of the original graph, we set predefined thresholds to decide whether to accept a sampled graph. The results before and after adding the thresholds are compared in Table 4.
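The acceptance test can then be sketched as a KL-divergence check against predefined thresholds. The distributions are over node labels or edge categories as described above; the smoothing constant and threshold values below are illustrative choices, not the paper's.

```python
# Sketch of the KL-divergence acceptance check used to reject dissimilar subgraphs.
# `sub_counts` / `orig_counts` map a label (or edge category) to its count.
import math

def kl_divergence(sub_counts, orig_counts, eps=1e-9):
    keys = set(orig_counts) | set(sub_counts)
    p_total = sum(sub_counts.values()) + eps * len(keys)
    q_total = sum(orig_counts.values()) + eps * len(keys)
    kl = 0.0
    for k in keys:
        p = (sub_counts.get(k, 0) + eps) / p_total   # subgraph distribution
        q = (orig_counts.get(k, 0) + eps) / q_total  # original-graph distribution
        kl += p * math.log(p / q)
    return kl

def accept_subgraph(node_kl, edge_kl, node_threshold=0.05, edge_threshold=0.05):
    # Keep the sampled subgraph only if both distributions stay close to the original.
    return node_kl <= node_threshold and edge_kl <= edge_threshold
```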
3.5 Benchmarking Results
Results on these four datasets:
3.6 The Stability of IGB
This paper verifies the stability of IGB in two ways. First, it verifies stability when evaluating models on different graphs, since each IGB dataset contains 100 nearly i.i.d. graphs. Specifically, we compare the accuracy variance over 100 AMiner subgraphs with that over 100 random data splits of Cora. Figure 3 shows that, even when each AMiner graph uses a random data split, evaluation on IGB is still more stable than on the Cora-style splits.
Second, we focus on the stability of IGB when evaluating models with different random seeds. In a stable benchmark, the ranking of different models should not change easily when the random seed changes. To test this, we use the "inversion number" of the rankings as a measure.
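As a sketch, the inversion number between two model rankings (for example, produced with two different random seeds) counts how many pairs of models swap their relative order; rankings are assumed to be ordered lists of model names.

```python
# Sketch of the ranking "inversion number": the count of model pairs whose relative
# order differs between two rankings (e.g. obtained with two different random seeds).
from itertools import combinations

def inversion_number(ranking_a, ranking_b):
    pos_b = {model: i for i, model in enumerate(ranking_b)}
    inversions = 0
    for x, y in combinations(ranking_a, 2):   # x appears before y in ranking_a
        if pos_b[x] > pos_b[y]:               # ...but after y in ranking_b
            inversions += 1
    return inversions

# Example: two seeds swap the order of one pair of models -> inversion number 1.
# inversion_number(["GCN", "GAT", "APPNP"], ["GAT", "GCN", "APPNP"]) == 1
```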
4 Conclusion
This paper revisits the semi-supervised setting on graphs, explains the over-tuning problem, and verifies its significance through ValidUtil. It also proposes a new benchmark, IGB, with a more stable evaluation pipeline and a random-walk-based (RW) sampling algorithm that improves the stability of evaluation. The authors hope that IGB can benefit the community.
Revision history
2022-07-07 Article created