当前位置:网站首页>[paper reading | deep reading] rolne: improving the quality of network embedding with structural role proximity
[paper reading | deep reading] rolne: improving the quality of network embedding with structural role proximity
2022-07-07 02:17:00 【Haihong Pro】
Catalog
Preface
Hello!
Thank you very much for reading Haihong's article , If there are mistakes in the text , You are welcome to point out ~
Self introduction. ଘ(੭ˊᵕˋ)੭
nickname : Sea boom
label : Program the ape |C++ player | Student
brief introduction : because C Language and programming , Then I turned to computer science , Won a National Scholarship , I was lucky to win some national awards in the competition 、 Provincial award … It has been insured .
Learning experience : Solid foundation + Take more notes + Knock more code + Think more + Learn English well !
Only hard work
Know what it is Know why !
This article only records what you are interested in
brief introduction
Link to the original text :https://link.springer.com/chapter/10.1007/978-3-030-62005-9_2
meeting :International Conference on Web Information Systems Engineering (ICIS)
year :2020
Abstract
Node Structural role It is the basic information of network structure , It provides a better perspective for understanding the network structure
Most network embedding learning algorithms try to preserve the neighborhood information of nodes
However , These methods It is difficult to identify the structural role closeness of nodes
We have come up with a new method RolNE
- The method is passed by Clustering the degree vector of nodes to learn the structural role closeness of nodes
- And use Aggregate functions To learn node embedding including neighborhood information and structural role closeness
Experiments on multiple data sets show that , Our algorithm outperforms other state-of-the-art baselines on downstream tasks .
1 Introduction
The Internet describes the complex information in daily life . for example
- Email communication constitutes a social network between people [1]
- The traffic between cities constitutes the traffic network
How to efficiently perform analysis tasks on large-scale networks , Such as node classification and clustering [2], It has always been the basis and focus of research in this field
At present, the mainstream method is to use network representation learning to learn the characteristics of nodes in the network
The network embedding algorithm transforms the network information into a low dimensional dense real vector , As the input of downstream machine learning algorithm
at present , The work of network embedded learning algorithm mainly focuses on maintaining the microstructure of the network , Such as the first-order proximity between nodes 、 Second order proximity and higher order proximity
- DeepWalk[7] In the network embedded learning task word2vec[13]
- More work [8,9,14,15,28] The definition of neighborhood is extended from different ranges , And capture neighborhood information to improve DeepWalk
- GraphSAGE[16] It is an important network embedding learning algorithm proposed in recent years . The algorithm mainly learns to extract feature information from the local neighborhood of nodes , Then learn the embedding of nodes
However , The structural information in the network not only has microstructure [3], It also contains mesoscopic structure , Such as Structural roles are close
Structural role closeness describes the similarity between nodes with similar roles in the network , For example, the edge of the chain 、 The center of the star and the bridge between the two communities [3]
Especially in e-mail network and transmission network , The role of node structure is of great significance for characterizing nodes
for example , In social networks made up of email connections , The secretaries of the two departments are far apart in the network , It does have similar structural roles
And the first-order proximity of capturing node neighborhood information 、 The second-order proximity is different from the high-order proximity , Structural role proximity attempts to capture the similarity between nodes with the same structural role
Pictured 1 Shown , node a and b Far away , But they have similar structural roles
Intuitively , We believe that nodes with similar roles should have similar embeddedness
- But if two nodes are far away in the network and have no common neighbors , Embedding learning based on neighborhood information cannot capture the structural role similarity of nodes
- If the classification of nodes in the network depends more on the proximity of structural roles , The embedding of capturing structural role proximity will perform better
Existing network embedding learning algorithms that maintain structural role similarity usually require manual annotation of topological features [17], Carry out complex feature Engineering
Then calculate the similarity score between nodes to obtain the structural similarity between nodes [10]
When these algorithms define the structural roles of nodes corresponding to different topologies in the local network neighborhood
For example, nodes on the chain 、 Star center in discrete space , They need to predefine such discrete roles , This requires domain experts and manual inspection of the network structure
We designed a new algorithm RolNE, The algorithm
- Use a simple heuristic method to find the structural role of nodes
- Then an aggregation function is used to jointly learn the structural role proximity of nodes and the neighborhood information of nodes , To improve the quality of node embedding
We summarize two main contributions of our research :
- First , We use heuristic methods to find nodes with similar structural roles , These nodes do not need to manually label the topological features in the network or calculate the similarity between the two nodes .
- secondly , Our method effectively uses neighbor information and structural role closeness to learn node embedding . Compared with the previous methods that only capture the neighborhood characteristics of nodes or only maintain the equivalence of node structure , Improve the quality of network embedding . The learned node embedding includes the similarity of node structure roles , It also contains the order information of node connection .
2 Related Work
The early work is mainly through Matrix decomposition To get node embedding
- LLE( Local linear embedding )[4] Suppose that the embedding of nodes and their neighborhoods is located in the local linear region of the manifold
- LE( Laplacian characteristic mapping )[5] The embedding of each node is obtained by solving the eigenvalue of the Laplace matrix corresponding to the network
- GraRep[6] Perform singular value decomposition on a specific relational matrix , Feature dimension reduction , Get node embedding
DeepWalk[7] Based on Random walk The node embedding learning algorithm of has developed rapidly . The algorithm uses random walks to capture the structural relationship between vertices
- DeepWalk The algorithm first uses random walks to generate sequences , Then use the skip grammar model to learn node embedding .
- Node2vec[8] The algorithm proposes a method based on DeepWalk Biased random walk algorithm to generate sequences , Explore the neighborhood structure more effectively
- Harp[9] Merge the relevant nodes in the network into super nodes , And then use DeepWalk and Node2vec Generate node embedding
- Line[11] It is an edge modeling algorithm , It uses the direct connection between nodes to describe the first-order similarity relationship , The public neighbors of two nodes that are not directly connected are used to describe the secondary similarity relationship between nodes
With the continuous expansion of the application scope of deep learning , In recent years, there have been many uses Deep learning Network embedded learning method
- DNGR[18] Use random surfing model to capture network structure information directly , And apply the deep learning method to PPMI matrix . In order to enhance the robustness of the model
- DNGR Use stacked denoising automatic encoder to learn multi-layer embedding
- SDNE[12] A semi supervised depth model with multi-layer nonlinear functions is proposed to capture highly nonlinear network structures
- GraphSAGE[16] Graph neural network is used to gather node neighborhood information to generate node embedding
- GCN[19] Improved the idea of neighborhood aggregation , The information of each neighborhood is normalized . This normalization is not a simple average , Instead, it introduces the differences between each neighbor , And reduce the weight of high neighbors
- [26] Use anonymous traversal and graph anchors LDA Build a theme model on the graph , To reduce complexity and effectively generate structured topics .
The above method is mainly to save the neighborhood information of nodes , But the structural role of nodes is also essential information in the network
- RolX[17] It is a method of using network structure to identify the role of nodes , This method needs to enumerate the local structural characteristics of nodes , Such as degree 、 Triangle count and PageRank[21] fraction
- SNS[20] Use ORVA[22] To help calculate the Graphlet Degree vector , Then calculate the Euclidean distance between vectors
- Struct2vec Coding nodes with similar structural roles into multi-layer graphs , And capture structural information through random walks
- GraphWave[23] Use the thermal wavelet diffusion model to learn the structural information of nodes
- GraphCSC[27] Use link information and centrality information to learn the low dimensional vector representation of network vertices .GraphCSC The learned embedding can retain different central information of nodes
The above method basically only learns part of the structure information of nodes , Such as neighborhood information or structural role information
The embedded information of nodes is not comprehensive
3 RolNE
The core idea of the algorithm is Gather feature information from local neighborhoods of nodes and node sets with similar roles
This method should have two characteristics :
- a) The embedding distance of nodes should be closely related to the proximity of node structure roles , That is, nodes with similar structural roles should have similar embeddedness , Nodes with different structural roles should be away from
- b) The embedding distance of nodes should have a strong correlation with the neighborhood information of nodes , That is, the embedding distance of directly connected nodes or nodes with public neighbors should be relatively close .
We proposed RolNE, It mainly has two steps :
- By clustering the degree vector of nodes , Find nodes with similar structural roles
- utilize Aggregation function Learn node embedding including neighborhood information and structural role information
The training process includes many iterations
- In each iteration , Nodes will gather information from nodes in the local neighborhood and nodes with the same structural role
- As the number of iterations increases , Nodes will go deeper 、 Collect more information in a wider range
Pictured 1 Shown , node a and b The same degree , The sum of degrees of adjacent nodes is also roughly the same , The two nodes have similar structural roles
Intuitively speaking , We can use these two-dimensional vectors ( Degree vector ) To determine whether nodes have similar structural roles
It is to judge the similarity of structural roles according to the degree vector of nodes
RolNE Algorithm diagram of :
In order to find similar structural role nodes , For each node in the graph u u u, We Calculate the degree vector of this node x ( u ) x(u) x(u)
The degree vector contains two dimensions
- The first dimension is Degree of node
- The second dimension is Sum of degrees of neighborhood of nodes
After generating the degree vectors of all nodes , Use Kmeans[24] The algorithm clusters nodes , Get clusters of nodes with similar structural roles R R R
Simply put : Use degree vector as the classification standard of node structure roles , In the use of Kmeans Clustering , obtain R
polymerization
- K K K Indicates the number of iterations of the aggregation operation
- k k k Indicates the current step
- h u k h^k_u huk Indicates in step k k k Embedded node u u u
For each node u ∈ V u∈V u∈V, We calculate it Direct neighbors h N ( u ) k h^k_{N(u)} hN(u)k The average of the embedding sum of
And then calculate Its nodes are in the same structure role cluster h R ( u ) k h^k_{R(u)} hR(u)k The average value of the embedded sum in
We Embed these two with the current embedding of the node h u k − 1 h^{k-1}_u huk−1 Connect , And input the connected vector into a completely connected layer
Nonlinear activation function is used in the full connectivity layer , Output is h u k h^k_u huk, As input for the next iteration
After completing all iterations
- Output h u k h^k_u huk Will be used as the final embedding of each node
- W k W^k Wk yes k k k Layer weight matrix
We adopt and GraphSAGE The same loss function , The graph based loss function is applied to Z u Z_u Zu, All parameters pass Stochastic gradient descent Training
The whole training process is unsupervised , The loss function is as follows
- Q Q Q Is the number of negative samples
- v v v It can be reached by random walking with a fixed length u u u Nearby nodes , And with u u u Nodes belonging to the same cluster
4 Experimental Estimate
4.1 Barbell Graph
We can see ,RolNE It has been correctly learned that nodes with similar structural roles have similar embeddedness , And project all node vectors with the same color onto the space closer to each other
On the given barbell diagram , Blue and black nodes are better than green 、 Pink and purple nodes are further away from red nodes
This model will learn nodes with similar structure together , The distance between nodes with the same color is close , The node order of projection is also consistent with the original figure
The nodes closest to red are orange 、 green 、 Pink 、 violet 、 Black and blue
RolNE The structure information and neighborhood information of the original graph are effectively captured
4.2 Mirror Karate Club
4.3 Air Traffic Network
4.4 Enron Email Network
5 Conclusion
We have come up with a new method RolNE
- It can capture the neighborhood characteristics of nodes
- It can also capture the structural role of nodes
Through and with DeepWalk、Node2vec、GraphSAGE、struc2vec And other latest technologies , The effectiveness of the algorithm on multiple data sets is proved
Different algorithms capture different structural features , and RolNE Capture more comprehensive structural information , Rich node embeddedness is generated
After reading the summary
The general idea of this article is not complicated
The main step is
- Use degree vector as standard , Use k-means The algorithm clustering , Get a cluster R
- Then use the aggregate function : Multiple iterations , Using the loss function ( reference GraphSAGE) And multilayer neural networks , Make the same structural roles and directly connected nodes embedded closer in the embedded representation
Is probably : Cluster first , Then take a loss function as the optimization goal , Use neural network to train parameters , Make the same structural roles and directly connected nodes embedded closer in the embedded representation
Details , Mainly the last loss function , Do not know much about , But the overall idea is a little understood
Conclusion
The article is only used as a personal study note , Record from 0 To 1 A process of
Hope to help you a little bit , If you have any mistakes, you are welcome to correct them
边栏推荐
- 3D laser slam: time synchronization of livox lidar hardware
- FLIR blackfly s usb3 industrial camera: white balance setting method
- Schedulx v1.4.0 and SaaS versions are released, and you can experience the advanced functions of cost reduction and efficiency increase for free!
- Golang foundation - data type
- ROS学习(26)动态参数配置
- 【论文阅读|深读】 GraphSAGE:Inductive Representation Learning on Large Graphs
- Sensor: DS1302 clock chip and driver code
- [leetcode] day97 remove linked list elements
- Big guys gather | nextarch foundation cloud development meetup is coming!
- FLIR blackfly s industrial camera: auto exposure configuration and code
猜你喜欢
Flir Blackfly S 工业相机:配置多个摄像头进行同步拍摄
一片葉子兩三萬?植物消費爆火背後的“陽謀”
解密函数计算异步任务能力之「任务的状态及生命周期管理」
Cisp-pte practice explanation (II)
Flir Blackfly S USB3 工业相机:白平衡设置方法
Recognition of C language array
centos8安裝mysql報錯:The GPG keys listed for the “MySQL 8.0 Community Server“ repository are already ins
ROS学习(23)action通信机制
CISP-PTE实操练习讲解(二)
【论文阅读|深读】DNGR:Deep Neural Networks for Learning Graph Representations
随机推荐
Flir Blackfly S USB3 工业相机:白平衡设置方法
3D激光SLAM:Livox激光雷达硬件时间同步
The foreground downloads network pictures without background processing
【LeetCode】Day97-移除链表元素
ROS学习(22)TF变换
@Before, @after, @around, @afterreturning execution sequence
ROS学习(24)plugin插件
ROS学习(二十)机器人SLAM功能包——rgbdslam的安装与测试
组合导航:中海达iNAV2产品描述及接口描述
Flir Blackfly S 工业相机:配置多个摄像头进行同步拍摄
Jacob Steinhardt, assistant professor of UC Berkeley, predicts AI benchmark performance: AI has made faster progress in fields such as mathematics than expected, but the progress of robustness benchma
ROS学习(26)动态参数配置
XML to map tool class xmlmaputils (tool class V)
Freeswitch dials extension number source code tracking
Flir Blackfly S USB3 工业相机:计数器和定时器的使用方法
Modify the system time of Px4 flight control
MetaForce原力元宇宙开发搭建丨佛萨奇2.0系统开发
[unique] what is the [chain storage structure]?
A new path for enterprise mid Platform Construction -- low code platform
The GPG keys listed for the "MySQL 8.0 community server" repository are already ins