当前位置:网站首页>[paper reading | deep reading] rolne: improving the quality of network embedding with structural role proximity

[paper reading | deep reading] rolne: improving the quality of network embedding with structural role proximity

2022-07-07 02:17:00 Haihong Pro

Preface

Hello!
Thank you very much for reading Haihong's article , If there are mistakes in the text , You are welcome to point out ~
 
Self introduction. ଘ(੭ˊᵕˋ)੭
nickname : Sea boom
label : Program the ape |C++ player | Student
brief introduction : because C Language and programming , Then I turned to computer science , Won a National Scholarship , I was lucky to win some national awards in the competition 、 Provincial award … It has been insured .
Learning experience : Solid foundation + Take more notes + Knock more code + Think more + Learn English well !
 
Only hard work
 

Know what it is Know why !

 
This article only records what you are interested in

brief introduction

Link to the original text :https://link.springer.com/chapter/10.1007/978-3-030-62005-9_2

meeting :International Conference on Web Information Systems Engineering (ICIS)

year :2020

Abstract

Node Structural role It is the basic information of network structure , It provides a better perspective for understanding the network structure

Most network embedding learning algorithms try to preserve the neighborhood information of nodes

However , These methods It is difficult to identify the structural role closeness of nodes

We have come up with a new method RolNE

  • The method is passed by Clustering the degree vector of nodes to learn the structural role closeness of nodes
  • And use Aggregate functions To learn node embedding including neighborhood information and structural role closeness

Experiments on multiple data sets show that , Our algorithm outperforms other state-of-the-art baselines on downstream tasks .

1 Introduction

The Internet describes the complex information in daily life . for example

  • Email communication constitutes a social network between people [1]
  • The traffic between cities constitutes the traffic network

How to efficiently perform analysis tasks on large-scale networks , Such as node classification and clustering [2], It has always been the basis and focus of research in this field

At present, the mainstream method is to use network representation learning to learn the characteristics of nodes in the network

The network embedding algorithm transforms the network information into a low dimensional dense real vector , As the input of downstream machine learning algorithm


at present , The work of network embedded learning algorithm mainly focuses on maintaining the microstructure of the network , Such as the first-order proximity between nodes 、 Second order proximity and higher order proximity

  • DeepWalk[7] In the network embedded learning task word2vec[13]
  • More work [8,9,14,15,28] The definition of neighborhood is extended from different ranges , And capture neighborhood information to improve DeepWalk
  • GraphSAGE[16] It is an important network embedding learning algorithm proposed in recent years . The algorithm mainly learns to extract feature information from the local neighborhood of nodes , Then learn the embedding of nodes

However , The structural information in the network not only has microstructure [3], It also contains mesoscopic structure , Such as Structural roles are close


Structural role closeness describes the similarity between nodes with similar roles in the network , For example, the edge of the chain 、 The center of the star and the bridge between the two communities [3]

Especially in e-mail network and transmission network , The role of node structure is of great significance for characterizing nodes
for example , In social networks made up of email connections , The secretaries of the two departments are far apart in the network , It does have similar structural roles

And the first-order proximity of capturing node neighborhood information 、 The second-order proximity is different from the high-order proximity , Structural role proximity attempts to capture the similarity between nodes with the same structural role

Pictured 1 Shown , node a and b Far away , But they have similar structural roles

 Insert picture description here

Intuitively , We believe that nodes with similar roles should have similar embeddedness

  • But if two nodes are far away in the network and have no common neighbors , Embedding learning based on neighborhood information cannot capture the structural role similarity of nodes
  • If the classification of nodes in the network depends more on the proximity of structural roles , The embedding of capturing structural role proximity will perform better

Existing network embedding learning algorithms that maintain structural role similarity usually require manual annotation of topological features [17], Carry out complex feature Engineering

Then calculate the similarity score between nodes to obtain the structural similarity between nodes [10]

When these algorithms define the structural roles of nodes corresponding to different topologies in the local network neighborhood

For example, nodes on the chain 、 Star center in discrete space , They need to predefine such discrete roles , This requires domain experts and manual inspection of the network structure

We designed a new algorithm RolNE, The algorithm

  • Use a simple heuristic method to find the structural role of nodes
  • Then an aggregation function is used to jointly learn the structural role proximity of nodes and the neighborhood information of nodes , To improve the quality of node embedding

We summarize two main contributions of our research :

  • First , We use heuristic methods to find nodes with similar structural roles , These nodes do not need to manually label the topological features in the network or calculate the similarity between the two nodes .
  • secondly , Our method effectively uses neighbor information and structural role closeness to learn node embedding . Compared with the previous methods that only capture the neighborhood characteristics of nodes or only maintain the equivalence of node structure , Improve the quality of network embedding . The learned node embedding includes the similarity of node structure roles , It also contains the order information of node connection .

2 Related Work

The early work is mainly through Matrix decomposition To get node embedding

  • LLE( Local linear embedding )[4] Suppose that the embedding of nodes and their neighborhoods is located in the local linear region of the manifold
  • LE( Laplacian characteristic mapping )[5] The embedding of each node is obtained by solving the eigenvalue of the Laplace matrix corresponding to the network
  • GraRep[6] Perform singular value decomposition on a specific relational matrix , Feature dimension reduction , Get node embedding

DeepWalk[7] Based on Random walk The node embedding learning algorithm of has developed rapidly . The algorithm uses random walks to capture the structural relationship between vertices

  • DeepWalk The algorithm first uses random walks to generate sequences , Then use the skip grammar model to learn node embedding .
  • Node2vec[8] The algorithm proposes a method based on DeepWalk Biased random walk algorithm to generate sequences , Explore the neighborhood structure more effectively
  • Harp[9] Merge the relevant nodes in the network into super nodes , And then use DeepWalk and Node2vec Generate node embedding
  • Line[11] It is an edge modeling algorithm , It uses the direct connection between nodes to describe the first-order similarity relationship , The public neighbors of two nodes that are not directly connected are used to describe the secondary similarity relationship between nodes

With the continuous expansion of the application scope of deep learning , In recent years, there have been many uses Deep learning Network embedded learning method

  • DNGR[18] Use random surfing model to capture network structure information directly , And apply the deep learning method to PPMI matrix . In order to enhance the robustness of the model
  • DNGR Use stacked denoising automatic encoder to learn multi-layer embedding
  • SDNE[12] A semi supervised depth model with multi-layer nonlinear functions is proposed to capture highly nonlinear network structures
  • GraphSAGE[16] Graph neural network is used to gather node neighborhood information to generate node embedding
  • GCN[19] Improved the idea of neighborhood aggregation , The information of each neighborhood is normalized . This normalization is not a simple average , Instead, it introduces the differences between each neighbor , And reduce the weight of high neighbors
  • [26] Use anonymous traversal and graph anchors LDA Build a theme model on the graph , To reduce complexity and effectively generate structured topics .

The above method is mainly to save the neighborhood information of nodes , But the structural role of nodes is also essential information in the network

  • RolX[17] It is a method of using network structure to identify the role of nodes , This method needs to enumerate the local structural characteristics of nodes , Such as degree 、 Triangle count and PageRank[21] fraction
  • SNS[20] Use ORVA[22] To help calculate the Graphlet Degree vector , Then calculate the Euclidean distance between vectors
  • Struct2vec Coding nodes with similar structural roles into multi-layer graphs , And capture structural information through random walks
  • GraphWave[23] Use the thermal wavelet diffusion model to learn the structural information of nodes
  • GraphCSC[27] Use link information and centrality information to learn the low dimensional vector representation of network vertices .GraphCSC The learned embedding can retain different central information of nodes

The above method basically only learns part of the structure information of nodes , Such as neighborhood information or structural role information

The embedded information of nodes is not comprehensive

3 RolNE

The core idea of the algorithm is Gather feature information from local neighborhoods of nodes and node sets with similar roles

This method should have two characteristics :

  • a) The embedding distance of nodes should be closely related to the proximity of node structure roles , That is, nodes with similar structural roles should have similar embeddedness , Nodes with different structural roles should be away from
  • b) The embedding distance of nodes should have a strong correlation with the neighborhood information of nodes , That is, the embedding distance of directly connected nodes or nodes with public neighbors should be relatively close .

We proposed RolNE, It mainly has two steps :

  1. By clustering the degree vector of nodes , Find nodes with similar structural roles
  2. utilize Aggregation function Learn node embedding including neighborhood information and structural role information

The training process includes many iterations

  • In each iteration , Nodes will gather information from nodes in the local neighborhood and nodes with the same structural role
  • As the number of iterations increases , Nodes will go deeper 、 Collect more information in a wider range

Pictured 1 Shown , node a and b The same degree , The sum of degrees of adjacent nodes is also roughly the same , The two nodes have similar structural roles

 Insert picture description here
Intuitively speaking , We can use these two-dimensional vectors ( Degree vector ) To determine whether nodes have similar structural roles

It is to judge the similarity of structural roles according to the degree vector of nodes


RolNE Algorithm diagram of :

 Insert picture description here


In order to find similar structural role nodes , For each node in the graph u u u, We Calculate the degree vector of this node x ( u ) x(u) x(u)

The degree vector contains two dimensions

  • The first dimension is Degree of node
  • The second dimension is Sum of degrees of neighborhood of nodes

 Insert picture description here
After generating the degree vectors of all nodes , Use Kmeans[24] The algorithm clusters nodes , Get clusters of nodes with similar structural roles R R R

Simply put : Use degree vector as the classification standard of node structure roles , In the use of Kmeans Clustering , obtain R


polymerization

  • K K K Indicates the number of iterations of the aggregation operation
  • k k k Indicates the current step
  • h u k h^k_u huk Indicates in step k k k Embedded node u u u

For each node u ∈ V u∈V uV, We calculate it Direct neighbors h N ( u ) k h^k_{N(u)} hN(u)k The average of the embedding sum of

 Insert picture description here

And then calculate Its nodes are in the same structure role cluster h R ( u ) k h^k_{R(u)} hR(u)k The average value of the embedded sum in

 Insert picture description here


We Embed these two with the current embedding of the node h u k − 1 h^{k-1}_u huk1 Connect , And input the connected vector into a completely connected layer

Nonlinear activation function is used in the full connectivity layer , Output is h u k h^k_u huk, As input for the next iteration

 Insert picture description here
After completing all iterations

  • Output h u k h^k_u huk Will be used as the final embedding of each node
  • W k W^k Wk yes k k k Layer weight matrix

We adopt and GraphSAGE The same loss function , The graph based loss function is applied to Z u Z_u Zu, All parameters pass Stochastic gradient descent Training

The whole training process is unsupervised , The loss function is as follows

 Insert picture description here

  • Q Q Q Is the number of negative samples
  • v v v It can be reached by random walking with a fixed length u u u Nearby nodes , And with u u u Nodes belonging to the same cluster

4 Experimental Estimate

4.1 Barbell Graph

 Insert picture description here

 Insert picture description here
We can see ,RolNE It has been correctly learned that nodes with similar structural roles have similar embeddedness , And project all node vectors with the same color onto the space closer to each other

On the given barbell diagram , Blue and black nodes are better than green 、 Pink and purple nodes are further away from red nodes

This model will learn nodes with similar structure together , The distance between nodes with the same color is close , The node order of projection is also consistent with the original figure

The nodes closest to red are orange 、 green 、 Pink 、 violet 、 Black and blue

RolNE The structure information and neighborhood information of the original graph are effectively captured

4.2 Mirror Karate Club

 Insert picture description here
 Insert picture description here

4.3 Air Traffic Network

 Insert picture description here

4.4 Enron Email Network

 Insert picture description here

5 Conclusion

We have come up with a new method RolNE

  • It can capture the neighborhood characteristics of nodes
  • It can also capture the structural role of nodes

Through and with DeepWalk、Node2vec、GraphSAGE、struc2vec And other latest technologies , The effectiveness of the algorithm on multiple data sets is proved

Different algorithms capture different structural features , and RolNE Capture more comprehensive structural information , Rich node embeddedness is generated

After reading the summary

The general idea of this article is not complicated

The main step is

  1. Use degree vector as standard , Use k-means The algorithm clustering , Get a cluster R
  2. Then use the aggregate function : Multiple iterations , Using the loss function ( reference GraphSAGE) And multilayer neural networks , Make the same structural roles and directly connected nodes embedded closer in the embedded representation

Is probably : Cluster first , Then take a loss function as the optimization goal , Use neural network to train parameters , Make the same structural roles and directly connected nodes embedded closer in the embedded representation

Details , Mainly the last loss function , Do not know much about , But the overall idea is a little understood

Conclusion

The article is only used as a personal study note , Record from 0 To 1 A process of

Hope to help you a little bit , If you have any mistakes, you are welcome to correct them

 Insert picture description here

原网站

版权声明
本文为[Haihong Pro]所创,转载请带上原文链接,感谢
https://yzsam.com/2022/188/202207061835381851.html