Generative Graph Models
第八章传统的图生成方法>
The previous parts of this book introduced a wide variety of methods for learning representations of graphs. In this final part of the book, we will discuss a distinct but closely related task: the problem of > graph generation.
本书前面的部分介绍了学习图形表示的各种方法。在本书的最后部分,我们将讨论一个截然不同但密切相关的任务:图形生成问题。
The goal of graph generation is to build models that can generate realistic(逼真的) graph structures. In some ways, we can view this graph generation problem as the mirror image of the graph embedding problem. Instead of assuming that we are given a graph structure G = (V,E) as input to our model, in graph generation we want the output of our model to be a graph G. Of course, simply generating an arbitrary graph is not necessarily that challenging. For instance, it is trivial to generate a fully connected graph or a graph with no edges. The key challenge in graph generation, however, is generating graphs that have certain desirable properties. As we will see in the following chapters, the way in which we define “desirable properties”—and how we perform graph generation—varies drastically between different approaches.
图形生成的目标是构建能够生成逼真图形结构的模型。在某种程度上,我们可以把这个图生成问题看作是图嵌入问题的镜像。在图生成过程中,我们希望模型的输出是图G,而不是假设我们得到一个图结构G=(V,E)作为我们模型的输入。当然,简单地生成一个任意图并不一定具有那么大的挑战性。例如,生成一个完全连通的图或一个没有边的图是微不足道的。然而,图形生成中的关键挑战是生成具有某些所需属性的图形。正如我们将在接下来的章节中看到的,我们定义“理想属性”的方式--以及我们如何执行图形生成--在不同的方法之间有很大的不同
In this chapter, we begin with a discussion of traditional approaches to graph generation. These tradiational approaches pre-date most research on graph representation learning—and even machine learning research in general. The methods we will discuss in this chapter thus provide the backdrop to motivate the deep learning-based approaches that we will introduce in Chapter 9.
在本章中,我们首先讨论传统的图形生成方法。这些传统的方法早于大多数关于图形表示学习的研究,甚至比一般的机器学习研究更早。因此,我们将在本章中讨论的方法为激励我们将在第9章介绍的基于深度学习的方法提供了背景。
8.1 Overview of Traditional Approaches 传统方法概述
Traditional approaches to graph generation generally involve specifying some kind of generative process, which defines how the edges in a graph are created. In most cases we can frame this generative process as a way of specifying the probability or likelihood P(A[u, v] = 1) of an edge existing between two nodes u and v. The challenge is crafting some sort of generative process that is both tractable and also able to generate graphs with non-trivial properties or acteristics. Tractability is essential because we want to be able to sample or analyze the graphs that are generated. However, we also want these graphs to have some properties that make them good models for the kinds of graphs we see in the real world.
传统的图形生成方法通常涉及指定某种生成过程,该过程定义了图形中的边是如何创建的。在大多数情况下,我们可以将这种生成过程框定为指定存在于两个节点u和v之间的边的概率或可能性P(A[u,v]=1)的一种方式。挑战在于设计出一种既易于处理又能够生成具有非平凡性质或特征的图的生成过程。在大多数情况下,我们可以将这种生成过程框定为指定存在于两个节点u和v之间的边的概率或可能性P(A[u,v]=1)。易操纵性是必不可少的,因为我们希望能够对生成的图表进行采样或分析。然而,我们也希望这些图具有一些属性,使它们成为我们在现实世界中看到的各种图的良好模型。
The three approaches we review in this subsection represent a small but representative subset of the traditional graph generation approaches that exist in the literature. For a more thorough survey and discussion, we recommend Newman [2018] as a useful resource.
我们在这一小节中回顾的三种方法代表了文献中存在的传统图形生成方法中的一小部分但具有代表性。要进行更深入的调查和讨论,我们推荐纽曼[2018]作为一个有用的资源。
8.2 ER模型
Perhaps the simplest and most well-known generative model of graphs is the ER model . In this model we define the likelihood of an edge occurring between any pair of nodes as
也许最简单和最著名的图生成模型是ER模型。在此模型中,我们将任意节点对之间出现边的可能性定义为
where r ∈ [0,1] is parameter controlling the density of the graph. In other words, the ER model simply assumes that the probability of an edge occurring between any pairs of nodes is equal to r.
其中r∈[0,1]是控制图形密度的参数。换句话说,ER模型简单地假设在任何节点对之间出现边的概率等于r。
The ER model is attractive due to its simplicity. To generate a random ER graph, we simply choose (or sample) how many nodes we want, set the density parameter r, and then use Equation (8.1) to generate the adjacency matrix. Since the edge probabilities are all independent, the time complexity to generate a graph is O(|V|2), i.e., linear in the size of the adjacency matrix.
ER模型由于其简单性而颇具吸引力。要生成随机ER图,我们只需选择(或采样)需要多少个节点,设置密度参数r,然后使用等式(8.1)生成邻接矩阵。由于边的概率都是独立的,所以生成一个图的时间复杂度是O(|V|2),即邻接矩阵的大小是线性的。
The downside of the ER model, however, is that it does not generate very realistic graphs. In particular, the only property that we can control in the ER model is the density of the graph, since the parameter r is equal (in expectation) to the average degree in the graph. Other graph properties—such as the degree distribution, existence of community structures, node clustering coefficients, and the occurrence of structural motifs—are not captured by the ER model. It is well known that graphs generated by the ER model fail to reflect the distribution of these more complex graph properties, which are known to be important in the structure and function of real-world graphs.
然而,ER模型的缺点是它不能生成非常逼真的图形。特别地,在ER模型中我们可以控制的唯一属性是图的密度,因为参数r等于(在期望中)图中的平均度。ER模型没有捕获其他图属性,例如度分布、社区结构的存在、节点聚类系数和结构基元的出现。众所周知,ER模型生成的图不能反映这些更复杂的图属性的分布,而这些属性在现实世界图的结构和功能中是非常重要的。
8.3 Stochastic Block Models 随机块图
Many traditional graph generation approaches seek to improve the ER model by better capturing additional properties of real-world graphs, which the ER model ignores. One prominent example is the class of stochastic block models (SBMs), which seek to generate graphs with community structure.
许多传统的图形生成方法试图通过更好地捕捉真实图形的附加属性来改进ER模型,而ER模型忽略了这一点。一个突出的例子是随机区块模型(SBM),它寻求生成具有社区结构的图。
In a basic SBM model, we specify a number γ of different blocks: C1, ...,Cγ. Every node u ∈ V then has a probability piof belonging to block i, i.e. pi= P(u ∈ Ci),∀u ∈ V, i = 1, ..., γ wherePγ i=1pi= 1. Edge probabilities are then specified by a block-to-block probability matrix P ∈ [0,1]γ×γ, where C[i, j] gives the probability of an edge occuring between a node in block Ciand a node in block Cj. The generative process for the basic SBM model is as follows:
在基本的SBM模型中,我们指定不同块的数量γ:C1,...,Cγ。然后,每个节点u∈V具有属于块i的概率pof,即pi=P(u∈ci),∀u∈V,i=1,...,γ,其中Pγi=1pi=1。然后,由块到块概率矩阵P∈[0,1]γ×γ指定边概率,其中C[i,j]给出块Ci中的节点和块Cj中的节点之间出现边的概率。基本SBM模型的生成过程如下: