DGL Chapter 1 (official tutorial) personal notes
2022-07-28 17:11:00 【Name filling】
The DGL library has a friendly official Chinese tutorial. Most of what follows is pasted from it and kept here as personal notes.
Chapter 1: Graphs
DGLGraph, DGL's core data structure, provides a graph-centric programming abstraction. It offers interfaces for handling the graph structure, the node and edge features, and the computations that can be performed with these components.
1.1 Basic concepts of graphs
This section covers the basic concepts: graphs, graph representations, weighted vs. unweighted graphs, homogeneous vs. heterogeneous graphs, and multigraphs.
1.2 Graphs, nodes, and edges
DGL represents each node by a unique integer, called its node ID, and each edge by the pair of IDs of its two endpoints. Each edge also has an edge ID. Edges in DGL are directed: the edge (u, v) points from node u to node v.
To refer to multiple nodes, DGL uses a one-dimensional integer tensor of node IDs (for example, an instance of PyTorch's Tensor class), which DGL calls a "node tensor". To refer to multiple edges, DGL uses a tuple of two node tensors (U, V), where (U[i], V[i]) denotes an edge from U[i] to V[i].
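As a rough illustration of the (U, V) convention, the plain-Python sketch below pairs up two ID lists and treats the position i as the edge ID. It uses ordinary lists instead of tensors, and the helper name `edges_from_uv` is made up for these notes; it is not part of DGL.

```python
# Plain-Python sketch of the (U, V) edge convention; not DGL code.
# `edges_from_uv` is a hypothetical helper name used only in these notes.

def edges_from_uv(U, V):
    """Return a list of (edge_id, src, dst) triples, one per position i."""
    if len(U) != len(V):
        raise ValueError("U and V must have the same length")
    return [(i, U[i], V[i]) for i in range(len(U))]

U = [0, 0, 0, 1]
V = [1, 2, 3, 3]
edge_list = edges_from_uv(U, V)
for eid, src, dst in edge_list:
    print(f"edge {eid}: {src} -> {dst}")
```

Keeping the two endpoint lists separate, rather than storing a list of pairs, is what lets the same structure be held in two flat tensors.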
One way to create a DGLGraph object is the dgl.graph() function, which takes a set of edges as input. DGL also supports creating graph objects from other data sources.

The following code snippet uses dgl.graph() to build a DGLGraph object for a graph with 4 nodes. It also shows how to use some of the APIs for querying the graph structure.
import dgl
import torch as th
# Edges 0->1, 0->2, 0->3, 1->3
u, v = th.tensor([0, 0, 0, 1]), th.tensor([1, 2, 3, 3])
g = dgl.graph((u, v))
print(g)
Using backend: pytorch
Graph(num_nodes=4, num_edges=4,
ndata_schemes={}
edata_schemes={})
# Get the node IDs
print(g.nodes())
tensor([0, 1, 2, 3])
# Get the source and destination nodes of each edge
print(g.edges())
(tensor([0, 0, 0, 1]), tensor([1, 2, 3, 3]))
# Get the endpoints and the edge ID of each edge
print(g.edges(form='all'))
(tensor([0, 0, 0, 1]), tensor([1, 2, 3, 3]), tensor([0, 1, 2, 3]))
# If the node with the largest ID has no edges, specify the number of nodes explicitly when creating the graph.
g = dgl.graph((u, v), num_nodes=8)
For an undirected graph, you need to create edges in both directions for each edge. The dgl.to_bidirected() function does this: as the following snippet shows, it converts the original graph into one that also contains the reverse edges.
bg = dgl.to_bidirected(g)
bg.edges()
(tensor([0, 0, 0, 1, 1, 2, 3, 3]), tensor([1, 2, 3, 0, 3, 0, 0, 1]))
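To make "adding reverse edges" concrete, here is a small plain-Python sketch of the idea on the same four edges. It is only an illustration and not how DGL implements dgl.to_bidirected(); the sorting is my own choice for readability, and although it happens to reproduce the edge order printed above for this input, that should not be relied on in general.

```python
# Plain-Python sketch of what "adding reverse edges" means; not DGL's implementation.
edges = [(0, 1), (0, 2), (0, 3), (1, 3)]

# Add (v, u) for every (u, v); the set drops any duplicates.
both_directions = sorted(set(edges) | {(v, u) for u, v in edges})

src = [u for u, _ in both_directions]
dst = [v for _, v in both_directions]
print(src)
print(dst)
```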
DGL can store node and edge IDs as either 32-bit or 64-bit integers, but node IDs and edge IDs must use the same type. The conversions in both directions are shown below.
edges = th.tensor([2, 5, 3]), th.tensor([3, 5, 0])  # Edges 2->3, 5->5, 3->0
g64 = dgl.graph(edges)  # DGL uses int64 by default
print(g64.idtype)
torch.int64
g32 = dgl.graph(edges, idtype=th.int32)  # Build a graph with int32 IDs
g32.idtype
torch.int32
g64_2 = g32.long()  # Convert to int64
g64_2.idtype
torch.int64
g32_2 = g64.int()  # Convert to int32
g32_2.idtype
torch.int32
1.3 Node and edge features
The nodes and edges of a DGLGraph object can carry multiple user-defined, named features that store properties of the graph's nodes and edges.
These features are accessed through the ndata and edata interfaces.
For example, the following code creates two node features (named 'x' and 'y') and one edge feature (named 'x').
import dgl
import torch as th
g = dgl.graph((th.tensor([0, 0, 1, 5]), th.tensor([1, 2, 2, 0])))  # 6 nodes, 4 edges
# Alternative using plain Python lists: g = dgl.graph(([0, 0, 1, 5], [1, 2, 2, 0]))
g
Graph(num_nodes=6, num_edges=4,
ndata_schemes={}
edata_schemes={})
g.ndata['x'] = th.ones(g.num_nodes(), 3)  # Node feature of length 3
g.edata['x'] = th.ones(g.num_edges(), dtype=th.int32)  # Scalar integer feature
g
Graph(num_nodes=6, num_edges=4,
ndata_schemes={'x': Scheme(shape=(3,), dtype=torch.float32)}
edata_schemes={'x': Scheme(shape=(), dtype=torch.int32)})
# Features with different names can have different shapes
g.ndata['y'] = th.randn(g.num_nodes(), 5)  # The nodes now have two features, 'x' and 'y'
g.ndata['x'][1]  # Get the feature of node 1
tensor([1., 1., 1.])
g.edata['x'][th.tensor([0, 3])]  # Get the features of edges 0 and 3
tensor([1, 1], dtype=torch.int32)
Important notes about the ndata and edata interfaces:
- Only features of numeric types (such as single-precision float, double-precision float, and integer) are allowed. A feature can be a scalar, a vector, or a multi-dimensional tensor.
- Each node feature has a unique name, and so does each edge feature. A node feature and an edge feature can share the same name (such as 'x' in the example code above).
- A feature is created by tensor assignment, which assigns the feature to every node or every edge in the graph. The first dimension of the tensor must equal the number of nodes or edges in the graph. You cannot assign a feature to a subset of the nodes or edges.
- Features with the same name must have the same dimensionality and data type.
- Feature tensors are row-major: each slice along the first dimension stores the feature of one node or one edge (see the indexing of g.ndata['x'] and g.edata['x'] in the example above).
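The "first dimension must match" rule from the list above can be pictured with a small plain-Python check. This is only a sketch of the constraint, written with nested lists rather than tensors; `check_feature_rows` is a hypothetical helper name and does not exist in DGL.

```python
# Plain-Python sketch of the "first dimension must match" rule; not DGL code.
# `check_feature_rows` is a hypothetical helper name.

def check_feature_rows(feature_rows, num_items):
    """Raise if the feature does not provide exactly one row per node (or edge)."""
    if len(feature_rows) != num_items:
        raise ValueError(f"expected {num_items} rows, got {len(feature_rows)}")
    return True

num_nodes = 6
full = [[1.0, 1.0, 1.0] for _ in range(num_nodes)]  # one row per node
partial = full[:3]                                   # rows for only 3 of the nodes

ok = check_feature_rows(full, num_nodes)
try:
    check_feature_rows(partial, num_nodes)
    rejected = False
except ValueError:
    rejected = True
print(ok, rejected)
```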
For a weighted graph, you can store the weights as an edge feature, as shown below.
# Edges 0->1, 0->2, 0->3, 1->3
edges = th.tensor([0, 0, 0, 1]), th.tensor([1, 2, 3, 3])
weights = th.tensor([0.1, 0.6, 0.9, 0.7])  # The weight of each edge
g = dgl.graph(edges)
g.edata['w'] = weights  # Store the weights as an edge feature named 'w'
g
Graph(num_nodes=4, num_edges=4,
ndata_schemes={}
edata_schemes={'w': Scheme(shape=(), dtype=torch.float32)})
edges
(tensor([0, 0, 0, 1]), tensor([1, 2, 3, 3]))
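Because the weight at position i belongs to edge i, the weights can be combined with the endpoint lists directly. The plain-Python sketch below sums the incoming weights of each node for the edges above. It is an illustration with ordinary lists, not a DGL API.

```python
# Plain-Python sketch: sum the incoming edge weights of every node.
U = [0, 0, 0, 1]
V = [1, 2, 3, 3]
weights = [0.1, 0.6, 0.9, 0.7]   # weights[i] belongs to edge i
num_nodes = 4

weighted_in_degree = [0.0] * num_nodes
for dst, w in zip(V, weights):
    weighted_in_degree[dst] += w

print(weighted_in_degree)
```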
1.4 Creating graphs from external sources
You can construct a DGLGraph object from external sources, including:
- Conversion from external Python libraries for graphs and sparse matrices (NetworkX and SciPy).
- Loading graph data from disk.
This section does not cover functions that generate a graph by transforming other graphs; see the API reference manual for an overview of those.
Creating graphs from external libraries
import dgl
import torch as th
import scipy.sparse as sp
spmat = sp.rand(100, 100, density=0.05)  # 100 x 100 sparse matrix with 5% nonzero entries
dgl.from_scipy(spmat)  # From SciPy
Graph(num_nodes=100, num_edges=500,
ndata_schemes={}
edata_schemes={})
import networkx as nx
nx_g = nx.path_graph(5)  # A path graph 0-1-2-3-4
dgl.from_networkx(nx_g)  # From NetworkX
Graph(num_nodes=5, num_edges=8,
ndata_schemes={}
edata_schemes={})
Note that when constructing from nx.path_graph(5), the resulting DGLGraph has 8 edges instead of 4. This is because nx.path_graph(5) constructs an undirected NetworkX graph (networkx.Graph), while the edges of a DGLGraph are always directed. When converting an undirected NetworkX graph to a DGLGraph, DGL internally turns each undirected edge into two directed edges. Using a directed NetworkX graph (networkx.DiGraph) avoids this behavior.
nxg = nx.DiGraph([(2, 1), (1, 2), (2, 3), (0, 0)])
dgl.from_networkx(nxg)
Graph(num_nodes=4, num_edges=4,
ndata_schemes={}
edata_schemes={})
Loading graphs from disk
Comma-separated values (CSV)
CSV is a common format for storing nodes, edges, and their features in tabular form:
nodes.csv
| age | title |
|---|---|
| 43 | 1 |
| 23 | 3 |
| … | … |
edges.csv
| src | dst | weight |
|---|---|---|
| 0 | 1 | 0.4 |
| 0 | 3 | 0.9 |
| … | … | … |
Many well-known Python libraries (such as Pandas) can load this kind of data into Python objects (such as numpy.ndarray), which can then be used to build a DGLGraph object. If the backend framework also provides utilities for saving tensors to disk and loading them back (such as torch.save() and torch.load()), the same principle applies.
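As a minimal sketch of the CSV route, the snippet below parses an edges.csv-style table with Python's standard csv module instead of Pandas, just to keep it dependency-free. The in-memory text stands in for a real file and contains only the two rows shown above; the final step of handing the lists to DGL is left as a comment.

```python
import csv
import io

# Sample content using the two rows shown for edges.csv;
# a real file would be opened from disk instead.
edges_csv = io.StringIO("src,dst,weight\n0,1,0.4\n0,3,0.9\n")

src, dst, weight = [], [], []
for row in csv.DictReader(edges_csv):
    src.append(int(row["src"]))
    dst.append(int(row["dst"]))
    weight.append(float(row["weight"]))

print(src, dst, weight)
# With DGL installed, these lists could then be turned into tensors and used
# as in the earlier sections: dgl.graph((src, dst)) for the structure and
# g.edata['w'] for the weights.
```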
JSON/GML format
If speed is not a major concern, you can use the utilities NetworkX provides for parsing various data formats; DGL can then create graphs from these sources indirectly.
DGL binary format
DGL provides APIs for saving graphs to disk in a binary format and loading them back. Besides the graph structure, these APIs also handle feature data and graph-level label data. DGL also supports loading from and saving to S3/HDFS directly. The reference manual gives more details on this usage.
1.5 Heterogeneous graphs
Unlike a homogeneous graph, a heterogeneous graph can have nodes and edges of different types. Each node type and edge type has its own independent ID space and its own features. In the figure below, for example, the IDs of the "user" nodes and the "game" nodes both start from 0, and the two node types have different features.
An example heterogeneous graph. It has two node types ("user" and "game") and two edge types ("follows" and "plays").
In DGL, a heterogeneous graph is made up of a series of subgraphs, one per relation. Each relation is defined by a string triplet (source node type, edge type, destination node type). Because this definition removes any ambiguity in the edge type, DGL calls these triplets canonical edge types.
The following code is an example of creating a heterogeneous graph in DGL.
import dgl
import torch as th
# Create a heterogeneous graph with three node types and three edge types
graph_data = {
('drug', 'interacts', 'drug'): (th.tensor([0, 1]), th.tensor([1, 2])),
('drug', 'interacts', 'gene'): (th.tensor([0, 1]), th.tensor([2, 3])),
('drug', 'treats', 'disease'): (th.tensor([1]), th.tensor([2]))
}
g = dgl.heterograph(graph_data)
print("g.ntypes:", g.ntypes)
print("g.etypes:", g.etypes)
print("g.canonical_etypes:", g.canonical_etypes)
g.ntypes: ['disease', 'drug', 'gene']
g.etypes: ['interacts', 'interacts', 'treats']
g.canonical_etypes: [('drug', 'interacts', 'drug'), ('drug', 'interacts', 'gene'), ('drug', 'treats', 'disease')]
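The output above shows why the full triplet is needed: the bare name 'interacts' appears twice. The plain-Python sketch below groups the canonical edge types by name to find such ambiguous names. It is only an illustration of the idea and does not use DGL.

```python
from collections import defaultdict

# The three canonical edge types from the example above.
canonical_etypes = [
    ("drug", "interacts", "drug"),
    ("drug", "interacts", "gene"),
    ("drug", "treats", "disease"),
]

# Group the relations by their bare edge type name.
by_name = defaultdict(list)
for src_type, etype, dst_type in canonical_etypes:
    by_name[etype].append((src_type, etype, dst_type))

# A name is ambiguous when more than one relation shares it.
ambiguous = sorted(name for name, rels in by_name.items() if len(rels) > 1)
node_types = sorted({t for s, _, d in canonical_etypes for t in (s, d)})
print(ambiguous)
print(node_types)
```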
Note that homogeneous graphs and bipartite graphs are just special heterogeneous graphs that contain a single relation.
# A homogeneous graph
dgl.heterograph({
('node_type', 'edge_type', 'node_type'): (u, v)})
# A bipartite graph
dgl.heterograph({
('source_type', 'edge_type', 'destination_type'): (u, v)})
Graph(num_nodes={'destination_type': 4, 'source_type': 2},
num_edges={('source_type', 'edge_type', 'destination_type'): 4},
metagraph=[('source_type', 'destination_type', 'edge_type')])
The metagraph associated with a heterogeneous graph is the schema of the graph. It specifies type constraints on the sets of nodes and on the edges between them. A node u in the metagraph corresponds to a node type in the heterogeneous graph, and an edge (u, v) in the metagraph indicates that the heterogeneous graph has edges from nodes of type u to nodes of type v.
g
Graph(num_nodes={'disease': 3, 'drug': 3, 'gene': 4},
num_edges={('drug', 'interacts', 'drug'): 2, ('drug', 'interacts', 'gene'): 2, ('drug', 'treats', 'disease'): 1},
metagraph=[('drug', 'drug', 'interacts'), ('drug', 'gene', 'interacts'), ('drug', 'disease', 'treats')])
g.metagraph().edges()
OutMultiEdgeDataView([('drug', 'drug'), ('drug', 'gene'), ('drug', 'disease')])
g.metagraph().nodes()
NodeView(('drug', 'gene', 'disease'))
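The metagraph edges and the per-type node counts printed above can be reproduced by hand from the relation dictionary. The sketch below does this in plain Python; it assumes that each node type has (largest ID + 1) nodes, which matches the counts shown for this example, and it is not how DGL builds the metagraph internally.

```python
# Plain-Python sketch; not how DGL builds the metagraph internally.
graph_data = {
    ("drug", "interacts", "drug"): ([0, 1], [1, 2]),
    ("drug", "interacts", "gene"): ([0, 1], [2, 3]),
    ("drug", "treats", "disease"): ([1], [2]),
}

# One metagraph edge per relation: (source node type, destination node type).
meta_edges = [(s, d) for (s, _, d) in graph_data]

# Assumption for this sketch: each node type has (largest ID + 1) nodes.
num_nodes = {}
for (s, _, d), (src_ids, dst_ids) in graph_data.items():
    num_nodes[s] = max(num_nodes.get(s, 0), max(src_ids) + 1)
    num_nodes[d] = max(num_nodes.get(d, 0), max(dst_ids) + 1)

print(meta_edges)
print(dict(sorted(num_nodes.items())))
```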
Working with multiple types
Once multiple node and edge types are involved, you need to specify the node or edge type when calling DGLGraph APIs for type-specific information. In addition, nodes and edges of different types have separate IDs.
# Get the total number of nodes in the graph
print("g.num_nodes():", g.num_nodes())
# Get the number of drug nodes
print("g.num_nodes('drug'):", g.num_nodes('drug'))
# Nodes of different types have separate IDs, so without a node type there is no well-defined return value.
# g.nodes()
# DGLError: Node type name must be specified if there are more than one node types.
print("g.nodes('drug'):", g.nodes('drug'))
g.num_nodes(): 10
g.num_nodes('drug'): 3
g.nodes('drug'): tensor([0, 1, 2])
To set or get features for a specific node or edge type, DGL provides two new forms of syntax: g.nodes['node_type'].data['feat_name'] and g.edges['edge_type'].data['feat_name'].
# Set/get the "hv" feature of nodes of type "drug"
g.nodes['drug'].data['hv'] = th.ones(3, 1)
g.nodes['drug'].data['hv']
tensor([[1.],
[1.],
[1.]])
# Set/get the "he" feature of edges of type "treats"
g.edges['treats'].data['he'] = th.zeros(1, 1)
g.edges['treats'].data['he']
tensor([[0.]])
If the graph has only one node type or only one edge type, you do not need to specify it.
g = dgl.heterograph({
('drug', 'interacts', 'drug'): (th.tensor([0, 1]), th.tensor([1, 2])),
('drug', 'is similar', 'drug'): (th.tensor([0, 1]), th.tensor([2, 3]))
})
g.nodes()
tensor([0, 1, 2, 3])
# To set or get features when there is a single node or edge type, the new syntax is not needed
g.ndata['hv'] = th.ones(4, 1)
g.ndata['hv']
tensor([[1.],
[1.],
[1.],
[1.]])
Loading heterogeneous graphs from disk
Comma-separated values (CSV)
A common way to store a heterogeneous graph is to keep each node type and each edge type in its own CSV file. Here is an example.
Data folder
data/
|-- drug.csv # drug nodes
|-- gene.csv # gene nodes
|-- disease.csv # disease nodes
|-- drug-interact-drug.csv # drug-drug interaction edges
|-- drug-interact-gene.csv # drug-gene interaction edges
|-- drug-treat-disease.csv # drug-disease treatment edges
As with homogeneous graphs, you can use a package such as Pandas to parse the CSV files into NumPy arrays or framework tensors, build a dictionary of relations, and use it to construct the heterogeneous graph. The same approach works for other popular file formats such as GML or JSON.
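A possible shape for the "build a dictionary of relations" step is sketched below with the standard csv module. The notes do not say what the edge files contain, so the `src,dst` column names and the rows are assumptions made for this sketch (the rows simply reuse the edges from the earlier example), and in-memory strings stand in for the three edge files.

```python
import csv
import io

# Stand-ins for the three edge files. The `src,dst` columns and the rows are
# assumptions for this sketch; the rows reuse the edges of the earlier example.
files = {
    ("drug", "interacts", "drug"): "src,dst\n0,1\n1,2\n",
    ("drug", "interacts", "gene"): "src,dst\n0,2\n1,3\n",
    ("drug", "treats", "disease"): "src,dst\n1,2\n",
}

graph_data = {}
for canonical_etype, text in files.items():
    rows = list(csv.DictReader(io.StringIO(text)))
    src = [int(r["src"]) for r in rows]
    dst = [int(r["dst"]) for r in rows]
    graph_data[canonical_etype] = (src, dst)

print(graph_data)
# With DGL installed, the ID lists could be converted to tensors and the
# dictionary passed to dgl.heterograph(graph_data), as in the earlier example.
```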
DGL binary format
DGL provides dgl.save_graphs() and dgl.load_graphs() for saving heterogeneous graphs in binary format and loading them back.
Edge type subgraphs
You can create a subgraph of a heterogeneous graph by specifying the relations to keep; the associated features are copied as well.
g = dgl.heterograph({
('drug', 'interacts', 'drug'): (th.tensor([0, 1]), th.tensor([1, 2])),
('drug', 'interacts', 'gene'): (th.tensor([0, 1]), th.tensor([2, 3])),
('drug', 'treats', 'disease'): (th.tensor([1]), th.tensor([2]))
})
g.nodes['drug'].data['hv'] = th.ones(3, 1)
# Keep the relations ('drug', 'interacts', 'drug') and ('drug', 'treats', 'disease').
# Nodes of type 'drug' and 'disease' are kept as well.
eg = dgl.edge_type_subgraph(g,[('drug', 'interacts', 'drug'),
('drug', 'treats', 'disease')])
eg
Graph(num_nodes={'disease': 3, 'drug': 3},
num_edges={('drug', 'interacts', 'drug'): 2, ('drug', 'treats', 'disease'): 1},
metagraph=[('drug', 'drug', 'interacts'), ('drug', 'disease', 'treats')])
eg.nodes['drug'].data['hv']
tensor([[1.],
[1.],
[1.]])
dgl.edge_type_subgraph(g, [('drug', 'interacts', 'gene')])
Graph(num_nodes={'drug': 3, 'gene': 4},
num_edges={('drug', 'interacts', 'gene'): 2},
metagraph=[('drug', 'gene', 'interacts')])
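The selection logic can be pictured in plain Python: keep the chosen relations, and keep every node type that appears as an endpoint of one of them. This is only a sketch of the idea and not DGL's implementation of dgl.edge_type_subgraph().

```python
# Plain-Python sketch of selecting relations; not DGL's implementation.
relations = {
    ("drug", "interacts", "drug"): ([0, 1], [1, 2]),
    ("drug", "interacts", "gene"): ([0, 1], [2, 3]),
    ("drug", "treats", "disease"): ([1], [2]),
}
keep = [("drug", "interacts", "drug"), ("drug", "treats", "disease")]

sub_relations = {rel: relations[rel] for rel in keep}
# Node types that appear as an endpoint of a kept relation.
kept_node_types = sorted({t for (s, _, d) in sub_relations for t in (s, d)})
print(kept_node_types)
```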
Converting heterogeneous graphs to homogeneous graphs
Heterogeneous graphs provide a clean interface for managing nodes and edges of different types together with their features. This is particularly helpful when:
- The features of different node and edge types have different data types or sizes.
- You want to apply different operations to different node and edge types.
If neither applies and you do not want to distinguish node and edge types in your model, DGL lets you convert a heterogeneous graph to a homogeneous graph with the dgl.to_homogeneous() API. It works as follows:
g = dgl.heterograph({
('drug', 'interacts', 'drug'): (th.tensor([0, 1]), th.tensor([1, 2])),
('drug', 'treats', 'disease'): (th.tensor([1]), th.tensor([2]))})
g.nodes['drug'].data['hv'] = th.zeros(3, 1)
g.nodes['disease'].data['hv'] = th.ones(3, 1)
g.edges['interacts'].data['he'] = th.zeros(2, 1)
g.edges['treats'].data['he'] = th.zeros(1, 2)
# Features are not merged by default
hg = dgl.to_homogeneous(g)
'hv' in hg.ndata
False
hg
Graph(num_nodes=6, num_edges=3,
ndata_schemes={'_ID': Scheme(shape=(), dtype=torch.int64), '_TYPE': Scheme(shape=(), dtype=torch.int64)}
edata_schemes={'_ID': Scheme(shape=(), dtype=torch.int64), '_TYPE': Scheme(shape=(), dtype=torch.int64)})
# Copy edge features
# For the features being copied, DGL assumes that the features to merge have the same size and data type across node or edge types
hg = dgl.to_homogeneous(g, edata=['he'])  # Fails: 'he' has a different shape for each edge type
# DGLError: Cannot concatenate column ‘he’ with shape Scheme(shape=(2,), dtype=torch.float32) and shape Scheme(shape=(1,), dtype=torch.float32)
# Copy node features
hg = dgl.to_homogeneous(g, ndata=['hv'])
hg.ndata['hv']
tensor([[1.],
[1.],
[1.],
[0.],
[0.],
[0.]])
The original node and edge types and the type-specific IDs are stored in ndata and edata.
# Order of the node types in the heterogeneous graph
g.ntypes
['disease', 'drug']
# Original node types
hg.ndata[dgl.NTYPE]
tensor([0, 0, 0, 1, 1, 1])
# Original type-specific node IDs
hg.ndata[dgl.NID]
tensor([0, 1, 2, 0, 1, 2])
# Order of the edge types in the heterogeneous graph
g.etypes
['interacts', 'treats']
# Original edge types
hg.edata[dgl.ETYPE]
tensor([0, 0, 1])
# Original type-specific edge IDs
hg.edata[dgl.EID]
tensor([0, 1, 0])
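The node bookkeeping printed above can be reproduced by hand: walk through the node types in order and, for each node, record the index of its type and its original ID. The plain-Python sketch below does this for the example; it illustrates the meaning of the two stored fields and is not DGL's implementation.

```python
# Plain-Python sketch of the ID bookkeeping; not DGL's implementation.
ntypes = ["disease", "drug"]            # order of node types, as printed above
num_nodes = {"disease": 3, "drug": 3}

ntype_ids = []   # plays the role of hg.ndata[dgl.NTYPE]
orig_ids = []    # plays the role of hg.ndata[dgl.NID]
for type_index, ntype in enumerate(ntypes):
    for node_id in range(num_nodes[ntype]):
        ntype_ids.append(type_index)
        orig_ids.append(node_id)

# Recover (node type, original ID) for node 4 of the homogeneous graph.
recovered = (ntypes[ntype_ids[4]], orig_ids[4])
print(ntype_ids, orig_ids, recovered)
```

Storing both fields is what makes the conversion reversible: any node of the homogeneous graph can be mapped back to its type and its ID within that type.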
For modeling purposes, you may want to group some relations together and apply the same operation to them. To do this, first extract an edge type subgraph of the heterogeneous graph, then convert that subgraph to a homogeneous graph.
g = dgl.heterograph({
('drug', 'interacts', 'drug'): (th.tensor([0, 1]), th.tensor([1, 2])),
('drug', 'interacts', 'gene'): (th.tensor([0, 1]), th.tensor([2, 3])),
('drug', 'treats', 'disease'): (th.tensor([1]), th.tensor([2]))
})
sub_g = dgl.edge_type_subgraph(g, [('drug', 'interacts', 'drug'),
('drug', 'interacts', 'gene')])
h_sub_g = dgl.to_homogeneous(sub_g)
h_sub_g
Graph(num_nodes=7, num_edges=4,
ndata_schemes={'_ID': Scheme(shape=(), dtype=torch.int64), '_TYPE': Scheme(shape=(), dtype=torch.int64)}
edata_schemes={'_ID': Scheme(shape=(), dtype=torch.int64), '_TYPE': Scheme(shape=(), dtype=torch.int64)})
1.6 Using DGLGraph on a GPU
You can create a DGLGraph on the GPU by passing two GPU tensors during construction. Another way is to use the to() API to copy a DGLGraph to the GPU, which copies the graph structure and the feature data to the specified device.
import dgl
import torch as th
u, v = th.tensor([0, 1, 2]), th.tensor([2, 3, 4])
g = dgl.graph((u, v))
g.ndata['x'] = th.rand(5, 3)
g.device
device(type='cpu')
cuda_g = g.to('cuda:0')  # Accepts any device object from the backend framework
cuda_g.device
device(type='cuda', index=0)
cuda_g.ndata['x'].device  # The feature data is copied to the GPU as well
device(type='cuda', index=0)
# A graph built from GPU tensors is also on the GPU
u, v = u.to('cuda:0'), v.to('cuda:0')
g = dgl.graph((u, v))
g.device
device(type='cuda', index=0)
Any operation involving a GPU graph runs on the GPU. This requires all tensor arguments to already be on the GPU, and the results (graphs or tensors) will be on the GPU too. In addition, a GPU graph only accepts feature data that is on the GPU.
cuda_g.in_degrees()  # In-degrees
tensor([0, 0, 1, 1, 1], device='cuda:0')
cuda_g.in_edges([2, 3, 4])  # Non-tensor arguments are accepted
(tensor([0, 1, 2], device='cuda:0'), tensor([2, 3, 4], device='cuda:0'))
cuda_g.in_edges(th.tensor([2, 3, 4]).to('cuda:0'))  # Tensor arguments must already be on the GPU
(tensor([0, 1, 2], device='cuda:0'), tensor([2, 3, 4], device='cuda:0'))
cuda_g.ndata['h'] = th.randn(5, 4)  # Fails: the feature tensor is on the CPU
# Cannot assign node feature "h" on device cpu to a graph on device cuda:0. Call DGLGraph.to() to copy the graph to the same device.