Paper Notes: Graph Attention Networks (GAT)
2022-07-06 02:14:00 【Min Fan】
Abstract: Sharing my understanding of the paper. See the original: Petar Velickovic, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, Yoshua Bengio, Graph attention networks, ICLR 2018, 1–12. It can be downloaded from arXiv: 1710.10903v3. Its influence is hard to overestimate!
1. Contributions of the paper
- Overcomes the shortcomings of existing graph convolution methods.
- No time-consuming matrix operations (such as inversion).
- No need to know the graph structure in advance.
- Applicable to both inductive and transductive problems.
2. Basic idea
Use neighbor information to map the original attributes of the nodes in the graph to a new space, so as to support later learning tasks.
This idea is probably common to different graph neural networks.
3. Scheme
| Symbol | Meaning | Remarks |
|---|---|---|
| $N$ | Number of nodes | |
| $F$ | Number of original features | |
| $F'$ | Number of new features | 4 in the example |
| $\mathbf{h}$ | Node feature set | $\{\overrightarrow{h}_1, \dots, \overrightarrow{h}_N\}$ |
| $\overrightarrow{h}_i$ | Features of node $i$ | Belongs to $\mathbb{R}^F$ |
| $\mathbf{h}'$ | New node feature set | $\{\overrightarrow{h}'_1, \dots, \overrightarrow{h}'_N\}$ |
| $\overrightarrow{h}'_i$ | New features of node $i$ | Belongs to $\mathbb{R}^{F'}$ |
| $\mathbf{W}$ | Feature mapping matrix | Belongs to $\mathbb{R}^{F \times F'}$; shared by all nodes |
| $\mathcal{N}_i$ | Neighborhood set of node $i$ | Includes $i$ itself; cardinality 6 in the example |
| $\overrightarrow{\mathbf{a}}$ | Feature weight vector | Belongs to $\mathbb{R}^{2F'}$; shared by all nodes; corresponds to a single-layer network |
| $\alpha_{ij}$ | Influence of node $j$ on $i$ | The influences of all neighbors of a node sum to 1 |
| $\overrightarrow{\alpha}_{ij}$ | Influence vector of node $j$ on $i$ | Length $K$; corresponds to the multi-head case |
Map the node features to the new space, and use an attention mechanism $a$ to compute the relationship between nodes:
$$e_{ij} = a(\mathbf{W}\overrightarrow{h}_i, \mathbf{W}\overrightarrow{h}_j) \tag{1}$$
Here $e_{ij}$ is computed only when $j$ is a neighbor of $i$ in the graph.
Apply softmax so that the weights corresponding to node $i$ sum to 1:
$$\alpha_{ij} = \mathrm{softmax}_j(e_{ij}) = \frac{\exp(e_{ij})}{\sum_{k \in \mathcal{N}_i} \exp(e_{ik})} \tag{2}$$
Since $a$ converts a column vector of length $2F'$ into a scalar, it can be written as a row vector of the same length, $\overrightarrow{\mathbf{a}}^{\mathrm{T}}$. With an activation function added, it can be realized by a single-layer neural network:
$$\alpha_{ij} = \frac{\exp\left(\mathrm{LeakyReLU}\left(\overrightarrow{\mathbf{a}}^{\mathrm{T}}[\mathbf{W}\overrightarrow{h}_i \,\|\, \mathbf{W}\overrightarrow{h}_j]\right)\right)}{\sum_{k \in \mathcal{N}_i} \exp\left(\mathrm{LeakyReLU}\left(\overrightarrow{\mathbf{a}}^{\mathrm{T}}[\mathbf{W}\overrightarrow{h}_i \,\|\, \mathbf{W}\overrightarrow{h}_k]\right)\right)} \tag{3}$$
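To make the computation concrete, here is a minimal PyTorch sketch of Eqs. (1)–(3) on a toy graph. All names (`h`, `W`, `a`, `adj`) and the random data are illustrative assumptions, not taken from the paper's code:

```python
import torch

torch.manual_seed(0)
N, F, F_out = 5, 3, 4                            # N nodes, F original features, F' new features
h = torch.randn(N, F)                            # node feature matrix
W = torch.randn(F, F_out, requires_grad=True)    # shared mapping matrix W
a = torch.randn(2 * F_out, requires_grad=True)   # shared attention vector a
adj = torch.eye(N, dtype=torch.bool)             # adjacency with self-loops
adj[0, 1] = adj[1, 0] = True                     # one illustrative edge

Wh = h @ W                                       # map every node to the new space
# e_ij = LeakyReLU(a^T [Wh_i || Wh_j]); split a into halves acting on i and j
e = torch.nn.functional.leaky_relu(
    (Wh @ a[:F_out]).unsqueeze(1) + (Wh @ a[F_out:]).unsqueeze(0),
    negative_slope=0.2)
e = e.masked_fill(~adj, float("-inf"))           # e_ij exists only for neighbors
alpha = torch.softmax(e, dim=1)                  # Eqs. (2)/(3): each row sums to 1
```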
Figure 1. The core scheme of GAT. Left: with $F' = 4$, the new space mapped to by $\mathbf{W}$ is 4-dimensional, so $\overrightarrow{\mathbf{a}}$ has $2F' = 8$ dimensions; the vector $\overrightarrow{\mathbf{a}}$ is shared by all nodes. Right: $K = 3$ heads.
3.1 Scheme 1: Single head
$$\overrightarrow{h}'_i = \sigma\left(\sum_{j \in \mathcal{N}_i} \alpha_{ij} \mathbf{W} \overrightarrow{h}_j\right) \tag{4}$$
All neighbor nodes are first mapped to the new space (e.g., 4-dimensional), then a weighted sum is taken according to their influence, and a nonlinear activation function such as sigmoid is applied; the final result is a 4-dimensional vector.
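Continuing the sketch above, Eq. (4) is then a single weighted aggregation (sigmoid standing in for $\sigma$):

```python
h_new = torch.sigmoid(alpha @ Wh)   # Eq. (4): weighted sum of mapped neighbors -> (N, F') = (5, 4)
```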
3.2 Scheme 2: Multi-head concatenation
$$\overrightarrow{h}'_i = \Big\Vert_{k=1}^{K} \sigma\left(\sum_{j \in \mathcal{N}_i} \alpha^k_{ij} \mathbf{W}^k \overrightarrow{h}_j\right) \tag{5}$$
Each of the $K$ heads produces its own new vector; the right side of Figure 1 shows $K = 3$ heads, so the final vector is $3 \times 4 = 12$-dimensional.
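A hedged continuation of the same sketch for the multi-head case; each head gets its own illustrative $\mathbf{W}^k$ and $\overrightarrow{\mathbf{a}}^k$:

```python
K = 3
heads = [(torch.randn(F, F_out), torch.randn(2 * F_out)) for _ in range(K)]

def head_preact(Wk, ak):
    """Weighted neighbor sum of one head, before the nonlinearity."""
    Wh_k = h @ Wk
    e_k = torch.nn.functional.leaky_relu(
        (Wh_k @ ak[:F_out]).unsqueeze(1) + (Wh_k @ ak[F_out:]).unsqueeze(0),
        negative_slope=0.2)
    alpha_k = torch.softmax(e_k.masked_fill(~adj, float("-inf")), dim=1)
    return alpha_k @ Wh_k

# Eq. (5): activate each head, then concatenate -> (N, K * F') = (5, 12)
h_concat = torch.cat([torch.sigmoid(head_preact(Wk, ak)) for Wk, ak in heads], dim=1)
```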
3.3 Scheme 3: Multi-head average
$$\overrightarrow{h}'_i = \sigma\left(\frac{1}{K} \sum_{k=1}^{K} \sum_{j \in \mathcal{N}_i} \alpha^k_{ij} \mathbf{W}^k \overrightarrow{h}_j\right) \tag{6}$$
Simply averaging instead, the final vector is 4-dimensional.
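Reusing `head_preact` from the previous sketch, Eq. (6) averages the heads before the single activation:

```python
# Eq. (6): average the K pre-activation head outputs, then activate once -> (N, F') = (5, 4)
h_avg = torch.sigmoid(torch.stack([head_preact(Wk, ak) for Wk, ak in heads]).mean(dim=0))
```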
4. Questions
Question: how are $\mathbf{W}$ and $\overrightarrow{\mathbf{a}}$ here learned?
My guess: from related work, i.e., the necessary knowledge is obtained within the graph neural network; this paper just wants to describe its distinct core technique.
If the output of this network is used as the input of other networks (with the final output being class labels, etc.), learning can proceed accordingly.
Tang Wentao's explanation: it is essentially equivalent to matrix multiplication (linear regression), as can be seen from the paper's code: in the training phase, the whole training set is fed in (the samples' feature matrix and adjacency matrix), and the predicted labels of the training set are obtained through $\mathbf{W}$ and $\overrightarrow{\mathbf{a}}$ (first compute each sample's self-attention weights over all samples, then mask them with the adjacency matrix, then normalize the weights; this constitutes one self-attention layer), after which the loss is computed and back-propagated.
Question: why use LeakyReLU when computing the influence, but sigmoid when computing the final feature vector?
A forced explanation: the former is merely to be different from the latter (not necessary), while the latter is to remove linearity (necessary).
Tang Wentao's explanation: computing the influence with LeakyReLU pays more attention to the neighbor nodes that are more positively correlated with the target node.
Using sigmoid for the final feature vector is presumably to prevent the values from becoming too large and affecting the next layer's learning, because the self-attention mechanism is relatively unstable (based on my previous experiments) and has high requirements on the range and density of values (a small range such as 0–1, and relatively dense).
Besides, as can be seen from the source code accompanying the GAT paper, the author uses only a two-layer self-attention network for all datasets, and dropout is set to 0.5–0.8 throughout; evidently it is rather easy to overfit.
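As a hypothetical illustration of that training view (the output layer `W_out`, the labels, and the cross-entropy loss below are made-up assumptions, not the paper's code), attaching any downstream loss lets autograd produce gradients for $\mathbf{W}$ and $\overrightarrow{\mathbf{a}}$ from the sketch above:

```python
labels = torch.randint(0, 2, (N,))                 # illustrative class labels
W_out = torch.randn(F_out, 2, requires_grad=True)  # hypothetical output layer

logits = torch.sigmoid(alpha @ Wh) @ W_out         # one attention layer (Eq. 4) + classifier
loss = torch.nn.functional.cross_entropy(logits, labels)
loss.backward()                                    # backprop reaches W and a
print(W.grad.shape, a.grad.shape)                  # torch.Size([3, 4]) torch.Size([8])
```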
5. Summary
- Use $\mathbf{W}$ to linearly map the features into the new space.
- Use $\overrightarrow{\mathbf{a}}$ to compute the influence $\alpha_{ij}$ of each neighbor. $\overrightarrow{\mathbf{a}}$ acts only on the corresponding attributes and is unaffected by the number of neighbors. The computation of $\alpha_{ij}$ involves the LeakyReLU activation function.
- Use multiple heads to increase stability.
- Taking the mean and applying a nonlinear activation function do not change the dimension of the vector.