当前位置:网站首页>[paper reading notes] community oriented attributed network embedding

[paper reading notes] community oriented attributed network embedding

2020-11-10 07:59:00 Qinze

[ Paper reading notes ] Community-oriented attributed network embedding


The structure of this paper

  1. solve the problem
  2. Main contributions
  3. Algorithm principle
  4. reference

(1) solve the problem

Most of the existing methods only retain the local structure of nodes in the process of network representation , Ignore their community information and rich attribute information .


(2) Main contributions

Contribution 1: A novel representation method of attribute network is proposed (COANE), Introduce topic model LDA Generate node community information , Guide random walk to capture community information in the network .

Contribution 2: A community oriented random walk strategy is designed —— The edge of the community is introduced to adaptively control the range of random walk , this It solves the problem that the random walk method can't sample the effective walk sequence in dense network .


(3) Algorithm principle

Firstly, it briefly introduces the algorithm used to generate node community information LDA Model ( Please refer to other materials or the original paper for details ).LDA It's an iterative algorithm , Gibbs sampling is used to estimate the following two probability distributions in the form of statistical frequency ( The number of communities needs to be given in advance ).

  • Given the community c, Where nodes v In the community c The probability of

  • Given a sequence of nodes s, Choose a node from which to belong to the community c Probability )

It first gives the sequence s Every node in vi Randomly assign a community ci. Based on the sampling sequence , Every node is traversed , And its community is constantly updated based on the following probability , until LDA The model parameters converge .

COANE The algorithm includes the following three parts :

  1. Random walk based on edge generates node sequence .
  2. Based on the above sampling node sequence , Train two... With different parameters LDA Models are learned separately based on structure ( Short for LDA-S) And based on text properties ( Short for LDA-X) Community distribution of .
  3. adopt Skip-Gram Model to learn the vector representation of nodes .

Next we start from the algorithm pseudo code step by step analysis COANE Principle of algorithm .

COANE The pseudo code of the algorithm is shown in the figure below :

From the pseudo code above, I can see that the input of the algorithm is shown in the figure 、 Parameters of random walk 、 Number of communities (LDA Parameters of the topic model )、 The moment the edge appears Mm、 Edge expansion speed Ms. The output is the embedding matrix of the network .

So what is edge based random walk ? How to achieve it ? First , In the early days, we generated a sequence of nodes by random walk and input LDA-S Model update node ownership . After several iterations of node community ownership update , There will be margins between nodes belonging to different communities , As shown in the figure below . The gap between different communities is also becoming more and more obvious , That is, the membership degree of nodes to more likely communities will gradually increase , In the picture, the color is deepened .

You can find , After repeated training LDA The model can find the community structure more accurately . Therefore, based on LDA Estimated probability of community belonging , We can calculate the transition probability of two nodes in the random walk process as follows :

Mmax It's a by 0 It's going up to 1 Parameters of . When the maximum margin Mmax=0 When , Random walk based on margin is equivalent to traditional DeepWalk Random walk . With Mmax An increase in , The probability of node cross community access is reduced .Mmax It can be thought of as confidence in the division of communities , This confidence level is increasing , Because after constant training LDA The community structure discovered by the model is more and more obvious .

The above is random walk based on edge , The edge follows LDA The training level of the model is more and more obvious , The constraints of community on random walk are becoming stronger and stronger .

By pseudo code No 13,14 Line , Each random walk sequence is used to iteratively update two LDA Model .

Now? , The node sequence obtained by random walk based on edge is input into Skip-Gram model training , We can get node vector representation based on node topology and community structure . So how to fuse node attribute information ?

We mentioned a text attribute based model LDA Model (LDA The probability distribution model is the subject processing , Here we just need to sample the sequence of text ), Again , We do edge based random walks in the graph , Replace the node in the sequence with its corresponding text attribute , These texts are aggregated to produce a single document , Input to text-based LDA In the model ,. Final LDA The model can generate the topic probability distribution of the document , In this paper, the distribution is directly regarded as the representation vector of the text attributes of nodes , Finally, the fusion node topology can be obtained by merging it into the original structure based representation vector 、 The node representation of community information and attribute information .

The above is the whole content of the network representation learning algorithm .


(4) reference

Gao Y, Gong M, Xie Y, et al. Community-oriented attributed network embedding[J]. Knowledge-Based Systems, 2020, 193: 105418.


版权声明
本文为[Qinze]所创,转载请带上原文链接,感谢