当前位置:网站首页>[paper reading | deep reading] rolne: improving the quality of network embedding with structural role proximity
[paper reading | deep reading] rolne: improving the quality of network embedding with structural role proximity
2022-07-07 02:17:00 【Haihong Pro】
Catalog
Preface
Hello!
Thank you very much for reading Haihong's article , If there are mistakes in the text , You are welcome to point out ~
Self introduction. ଘ(੭ˊᵕˋ)੭
nickname : Sea boom
label : Program the ape |C++ player | Student
brief introduction : because C Language and programming , Then I turned to computer science , Won a National Scholarship , I was lucky to win some national awards in the competition 、 Provincial award … It has been insured .
Learning experience : Solid foundation + Take more notes + Knock more code + Think more + Learn English well !
Only hard work
Know what it is Know why !
This article only records what you are interested in
brief introduction
Link to the original text :https://link.springer.com/chapter/10.1007/978-3-030-62005-9_2
meeting :International Conference on Web Information Systems Engineering (ICIS)
year :2020
Abstract
Node Structural role It is the basic information of network structure , It provides a better perspective for understanding the network structure
Most network embedding learning algorithms try to preserve the neighborhood information of nodes
However , These methods It is difficult to identify the structural role closeness of nodes
We have come up with a new method RolNE
- The method is passed by Clustering the degree vector of nodes to learn the structural role closeness of nodes
- And use Aggregate functions To learn node embedding including neighborhood information and structural role closeness
Experiments on multiple data sets show that , Our algorithm outperforms other state-of-the-art baselines on downstream tasks .
1 Introduction
The Internet describes the complex information in daily life . for example
- Email communication constitutes a social network between people [1]
- The traffic between cities constitutes the traffic network
How to efficiently perform analysis tasks on large-scale networks , Such as node classification and clustering [2], It has always been the basis and focus of research in this field
At present, the mainstream method is to use network representation learning to learn the characteristics of nodes in the network
The network embedding algorithm transforms the network information into a low dimensional dense real vector , As the input of downstream machine learning algorithm
at present , The work of network embedded learning algorithm mainly focuses on maintaining the microstructure of the network , Such as the first-order proximity between nodes 、 Second order proximity and higher order proximity
- DeepWalk[7] In the network embedded learning task word2vec[13]
- More work [8,9,14,15,28] The definition of neighborhood is extended from different ranges , And capture neighborhood information to improve DeepWalk
- GraphSAGE[16] It is an important network embedding learning algorithm proposed in recent years . The algorithm mainly learns to extract feature information from the local neighborhood of nodes , Then learn the embedding of nodes
However , The structural information in the network not only has microstructure [3], It also contains mesoscopic structure , Such as Structural roles are close
Structural role closeness describes the similarity between nodes with similar roles in the network , For example, the edge of the chain 、 The center of the star and the bridge between the two communities [3]
Especially in e-mail network and transmission network , The role of node structure is of great significance for characterizing nodes
for example , In social networks made up of email connections , The secretaries of the two departments are far apart in the network , It does have similar structural roles
And the first-order proximity of capturing node neighborhood information 、 The second-order proximity is different from the high-order proximity , Structural role proximity attempts to capture the similarity between nodes with the same structural role
Pictured 1 Shown , node a and b Far away , But they have similar structural roles
Intuitively , We believe that nodes with similar roles should have similar embeddedness
- But if two nodes are far away in the network and have no common neighbors , Embedding learning based on neighborhood information cannot capture the structural role similarity of nodes
- If the classification of nodes in the network depends more on the proximity of structural roles , The embedding of capturing structural role proximity will perform better
Existing network embedding learning algorithms that maintain structural role similarity usually require manual annotation of topological features [17], Carry out complex feature Engineering
Then calculate the similarity score between nodes to obtain the structural similarity between nodes [10]
When these algorithms define the structural roles of nodes corresponding to different topologies in the local network neighborhood
For example, nodes on the chain 、 Star center in discrete space , They need to predefine such discrete roles , This requires domain experts and manual inspection of the network structure
We designed a new algorithm RolNE, The algorithm
- Use a simple heuristic method to find the structural role of nodes
- Then an aggregation function is used to jointly learn the structural role proximity of nodes and the neighborhood information of nodes , To improve the quality of node embedding
We summarize two main contributions of our research :
- First , We use heuristic methods to find nodes with similar structural roles , These nodes do not need to manually label the topological features in the network or calculate the similarity between the two nodes .
- secondly , Our method effectively uses neighbor information and structural role closeness to learn node embedding . Compared with the previous methods that only capture the neighborhood characteristics of nodes or only maintain the equivalence of node structure , Improve the quality of network embedding . The learned node embedding includes the similarity of node structure roles , It also contains the order information of node connection .
2 Related Work
The early work is mainly through Matrix decomposition To get node embedding
- LLE( Local linear embedding )[4] Suppose that the embedding of nodes and their neighborhoods is located in the local linear region of the manifold
- LE( Laplacian characteristic mapping )[5] The embedding of each node is obtained by solving the eigenvalue of the Laplace matrix corresponding to the network
- GraRep[6] Perform singular value decomposition on a specific relational matrix , Feature dimension reduction , Get node embedding
DeepWalk[7] Based on Random walk The node embedding learning algorithm of has developed rapidly . The algorithm uses random walks to capture the structural relationship between vertices
- DeepWalk The algorithm first uses random walks to generate sequences , Then use the skip grammar model to learn node embedding .
- Node2vec[8] The algorithm proposes a method based on DeepWalk Biased random walk algorithm to generate sequences , Explore the neighborhood structure more effectively
- Harp[9] Merge the relevant nodes in the network into super nodes , And then use DeepWalk and Node2vec Generate node embedding
- Line[11] It is an edge modeling algorithm , It uses the direct connection between nodes to describe the first-order similarity relationship , The public neighbors of two nodes that are not directly connected are used to describe the secondary similarity relationship between nodes
With the continuous expansion of the application scope of deep learning , In recent years, there have been many uses Deep learning Network embedded learning method
- DNGR[18] Use random surfing model to capture network structure information directly , And apply the deep learning method to PPMI matrix . In order to enhance the robustness of the model
- DNGR Use stacked denoising automatic encoder to learn multi-layer embedding
- SDNE[12] A semi supervised depth model with multi-layer nonlinear functions is proposed to capture highly nonlinear network structures
- GraphSAGE[16] Graph neural network is used to gather node neighborhood information to generate node embedding
- GCN[19] Improved the idea of neighborhood aggregation , The information of each neighborhood is normalized . This normalization is not a simple average , Instead, it introduces the differences between each neighbor , And reduce the weight of high neighbors
- [26] Use anonymous traversal and graph anchors LDA Build a theme model on the graph , To reduce complexity and effectively generate structured topics .
The above method is mainly to save the neighborhood information of nodes , But the structural role of nodes is also essential information in the network
- RolX[17] It is a method of using network structure to identify the role of nodes , This method needs to enumerate the local structural characteristics of nodes , Such as degree 、 Triangle count and PageRank[21] fraction
- SNS[20] Use ORVA[22] To help calculate the Graphlet Degree vector , Then calculate the Euclidean distance between vectors
- Struct2vec Coding nodes with similar structural roles into multi-layer graphs , And capture structural information through random walks
- GraphWave[23] Use the thermal wavelet diffusion model to learn the structural information of nodes
- GraphCSC[27] Use link information and centrality information to learn the low dimensional vector representation of network vertices .GraphCSC The learned embedding can retain different central information of nodes
The above method basically only learns part of the structure information of nodes , Such as neighborhood information or structural role information
The embedded information of nodes is not comprehensive
3 RolNE
The core idea of the algorithm is Gather feature information from local neighborhoods of nodes and node sets with similar roles
This method should have two characteristics :
- a) The embedding distance of nodes should be closely related to the proximity of node structure roles , That is, nodes with similar structural roles should have similar embeddedness , Nodes with different structural roles should be away from
- b) The embedding distance of nodes should have a strong correlation with the neighborhood information of nodes , That is, the embedding distance of directly connected nodes or nodes with public neighbors should be relatively close .
We proposed RolNE, It mainly has two steps :
- By clustering the degree vector of nodes , Find nodes with similar structural roles
- utilize Aggregation function Learn node embedding including neighborhood information and structural role information
The training process includes many iterations
- In each iteration , Nodes will gather information from nodes in the local neighborhood and nodes with the same structural role
- As the number of iterations increases , Nodes will go deeper 、 Collect more information in a wider range
Pictured 1 Shown , node a and b The same degree , The sum of degrees of adjacent nodes is also roughly the same , The two nodes have similar structural roles
Intuitively speaking , We can use these two-dimensional vectors ( Degree vector ) To determine whether nodes have similar structural roles
It is to judge the similarity of structural roles according to the degree vector of nodes
RolNE Algorithm diagram of :
In order to find similar structural role nodes , For each node in the graph u u u, We Calculate the degree vector of this node x ( u ) x(u) x(u)
The degree vector contains two dimensions
- The first dimension is Degree of node
- The second dimension is Sum of degrees of neighborhood of nodes
After generating the degree vectors of all nodes , Use Kmeans[24] The algorithm clusters nodes , Get clusters of nodes with similar structural roles R R R
Simply put : Use degree vector as the classification standard of node structure roles , In the use of Kmeans Clustering , obtain R
polymerization
- K K K Indicates the number of iterations of the aggregation operation
- k k k Indicates the current step
- h u k h^k_u huk Indicates in step k k k Embedded node u u u
For each node u ∈ V u∈V u∈V, We calculate it Direct neighbors h N ( u ) k h^k_{N(u)} hN(u)k The average of the embedding sum of
And then calculate Its nodes are in the same structure role cluster h R ( u ) k h^k_{R(u)} hR(u)k The average value of the embedded sum in
We Embed these two with the current embedding of the node h u k − 1 h^{k-1}_u huk−1 Connect , And input the connected vector into a completely connected layer
Nonlinear activation function is used in the full connectivity layer , Output is h u k h^k_u huk, As input for the next iteration
After completing all iterations
- Output h u k h^k_u huk Will be used as the final embedding of each node
- W k W^k Wk yes k k k Layer weight matrix
We adopt and GraphSAGE The same loss function , The graph based loss function is applied to Z u Z_u Zu, All parameters pass Stochastic gradient descent Training
The whole training process is unsupervised , The loss function is as follows
- Q Q Q Is the number of negative samples
- v v v It can be reached by random walking with a fixed length u u u Nearby nodes , And with u u u Nodes belonging to the same cluster
4 Experimental Estimate
4.1 Barbell Graph
We can see ,RolNE It has been correctly learned that nodes with similar structural roles have similar embeddedness , And project all node vectors with the same color onto the space closer to each other
On the given barbell diagram , Blue and black nodes are better than green 、 Pink and purple nodes are further away from red nodes
This model will learn nodes with similar structure together , The distance between nodes with the same color is close , The node order of projection is also consistent with the original figure
The nodes closest to red are orange 、 green 、 Pink 、 violet 、 Black and blue
RolNE The structure information and neighborhood information of the original graph are effectively captured
4.2 Mirror Karate Club
4.3 Air Traffic Network
4.4 Enron Email Network
5 Conclusion
We have come up with a new method RolNE
- It can capture the neighborhood characteristics of nodes
- It can also capture the structural role of nodes
Through and with DeepWalk、Node2vec、GraphSAGE、struc2vec And other latest technologies , The effectiveness of the algorithm on multiple data sets is proved
Different algorithms capture different structural features , and RolNE Capture more comprehensive structural information , Rich node embeddedness is generated
After reading the summary
The general idea of this article is not complicated
The main step is
- Use degree vector as standard , Use k-means The algorithm clustering , Get a cluster R
- Then use the aggregate function : Multiple iterations , Using the loss function ( reference GraphSAGE) And multilayer neural networks , Make the same structural roles and directly connected nodes embedded closer in the embedded representation
Is probably : Cluster first , Then take a loss function as the optimization goal , Use neural network to train parameters , Make the same structural roles and directly connected nodes embedded closer in the embedded representation
Details , Mainly the last loss function , Do not know much about , But the overall idea is a little understood
Conclusion
The article is only used as a personal study note , Record from 0 To 1 A process of
Hope to help you a little bit , If you have any mistakes, you are welcome to correct them
边栏推荐
- The use of video in the wiper component causes full screen dislocation
- Introduction to RC oscillator and crystal oscillator
- Livox激光雷达硬件时间同步---PPS方法
- Time synchronization of livox lidar hardware -- PPS method
- ROS学习(21)机器人SLAM功能包——orbslam的安装与测试
- 强化学习如何用于医学影像?埃默里大学最新《强化学习医学影像分析》综述,阐述最新RL医学影像分析概念、应用、挑战与未来方向
- Analyze "C language" [advanced] paid knowledge [II]
- Flir Blackfly S工业相机:颜色校正讲解及配置与代码设置方法
- 微服务架构介绍
- 处理streamlit库上传的图片文件
猜你喜欢
FLIR blackfly s usb3 industrial camera: how to use counters and timers
Tiflash source code reading (IV) design and implementation analysis of tiflash DDL module
低代码平台中的数据连接方式(上)
红外相机:巨哥红外MAG32产品介绍
3D laser slam: time synchronization of livox lidar hardware
STM32F4---PWM输出
Blackfly S USB3工业相机:缓冲区处理
Flir Blackfly S 工业相机:配置多个摄像头进行同步拍摄
SchedulX V1.4.0及SaaS版发布,免费体验降本增效高级功能!
使用Ceres进行slam必须要弄清楚的几个类和函数
随机推荐
Lidar: introduction and usage of ouster OS
将截断字符串或二进制数据
ROS学习(24)plugin插件
Web开发小妙招:巧用ThreadLocal规避层层传值
Flir Blackfly S 工业相机 介绍
Big guys gather | nextarch foundation cloud development meetup is coming!
Shell script quickly counts the number of lines of project code
Draco - glTF模型压缩利器
The last line of defense of cloud primary mixing department: node waterline design
Analyze "C language" [advanced] paid knowledge [i]
ROS learning (XX) robot slam function package -- installation and testing of rgbdslam
#yyds干货盘点# 解决名企真题:最大差值
Make DIY welding smoke extractor with lighting
Errors made in the development of merging the quantity of data in the set according to attributes
ROS learning (24) plugin
The use of video in the wiper component causes full screen dislocation
ROS learning (25) rviz plugin
ROS learning (26) dynamic parameter configuration
Sensor: introduction of soil moisture sensor (xh-m214) and STM32 drive code
ROS learning (22) TF transformation