当前位置:网站首页>[point cloud processing paper crazy reading classic version 14] - dynamic graph CNN for learning on point clouds
[point cloud processing paper crazy reading classic version 14] - dynamic graph CNN for learning on point clouds
2022-07-03 09:08:00 【LingbinBu】
DGCNN:Dynamic Graph CNN for Learning on Point Clouds
Abstract
- background : For many applications in computer graphics ,point clouds It is a flexible geometric representation , Usually most 3D Raw output of the acquisition device
- problem : Even though point clouds Of hand-designed Features have been proposed a long time ago , But recently in image It's very hot convolutional neural networks(CNNs) Indicates that CNN Applied to the point clouds The value of .Point clouds Lack of topology information , So we need to design a model that can recover topology information , So as to enrich point clouds Express the purpose of ability
- Method : stay CNN There is an embedded one called EdgeConv Module
- EdgeConv The module functions in graphs On , Dynamically evaluate each layer of the network graphs Calculate
- EdgeConv Modules can be exported , And can be embedded in any existing network
- EdgeConv The module considers local neighborhood information and global shape information
- EdgeConv It has sort invariance
- In feature space multi-layer systems affinity Collect the potential long-distance semantic features in the original embedding .
- Code :

Method
- In order to mine local geometry , Construct a local neighborhood graph, And apply convolution operation on the edge , Edges connect adjacent pairs of points .
- Article graph Unfixed , Dynamically update at every layer of the network , in other words , One point k N N kNN kNN The elements in the set change from layer to layer in the network , It's through embeddings Sequence calculation .
- Proximity and input in feature space are different , This will lead to the nonlocal diffusion of information in the whole point cloud .
Edge Convolution
remember X = { x 1 , … , x n } ⊆ R F \mathbf{X}=\left\{\mathbf{x}_{1}, \ldots, \mathbf{x}_{n}\right\} \subseteq \mathbb{R}^{F} X={ x1,…,xn}⊆RF Enter a point cloud for , among n n n Is the number of points , F F F It's the dimension of a point , In the simplest case , F = 3 F=3 F=3, Every point includes 3D coordinate x i = ( x i , y i , z i ) \mathbf{x}_{i}=\left(x_{i}, y_{i}, z_{i}\right) xi=(xi,yi,zi), In other cases , Colors will also be included 、 Normal vector, etc , At other layers of the network , F F F Represents the characteristic dimension of the point .
We calculate a directed graph G = ( V , E ) \mathcal{G}=(\mathcal{V}, \mathcal{E}) G=(V,E) , Represents the local point cloud structure , among V = { 1 , … , n } \mathcal{V}=\{1, \ldots, n\} V={ 1,…,n}, E ⊆ V × V \mathcal{E} \subseteq \mathcal{V} \times \mathcal{V} E⊆V×V They are vertices and edges . In the simplest case , Construct a X \mathrm{X} X Of KNN graph G \mathcal{G} G. The graph Include self-loop, Each node points to itself . Define the edge feature as e i j = h Θ ( x i , x j ) \boldsymbol{e}_{i j}=h_{\Theta}\left(\mathbf{x}_{i}, \mathbf{x}_{j}\right) eij=hΘ(xi,xj), among h Θ : R F × h_{\Theta}: \mathbb{R}^{F} \times hΘ:RF× R F → R F ′ \mathbb{R}^{F} \rightarrow \mathbb{R}^{F^{\prime}} RF→RF′ It's a nonlinear function , The learnable parameters are Θ \boldsymbol{\Theta} Θ.
Last , By using channel Symmetric aggregation operation in units □ \square □ (e.g., ∑ \sum ∑ or max) Definition EdgeConv operation , Aggregate on edge features associated with all edges emitted from each vertex . In the i i i Vertex EdgeConv The output is expressed as :
x i ′ = □ j : ( i , j ) ∈ E h Θ ( x i , x j ) \mathbf{x}_{i}^{\prime}=\mathop {\square}\limits_{ {j:(i, j) \in \mathcal{E}}} h_{\Theta}\left(\mathbf{x}_{i}, \mathbf{x}_{j}\right) xi′=j:(i,j)∈E□hΘ(xi,xj)
among x i \mathbf{x}_{i} xi yes central point, { x j : ( i , j ) ∈ E } \left\{\mathbf{x}_{j}:(i, j) \in \mathcal{E}\right\} { xj:(i,j)∈E} yes x i \mathbf{x}_{i} xi Of neigboured points. To make a long story short , Given with n n n Point F F F Dimensional point cloud ,EdgeConv Will produce an equal number of points F ′ F^{\prime} F′ Dimensional point cloud .

Choice of h h h and □ \square □
h h h The choice of :
- Convolution :
x i m ′ = ∑ j : ( i , j ) ∈ E θ m ⋅ x j . x_{i m}^{\prime}=\sum_{j:(i, j) \in \mathcal{E}} \boldsymbol{\theta}_{m} \cdot \mathbf{x}_{j} . xim′=j:(i,j)∈E∑θm⋅xj.
among Θ = ( θ 1 , … , θ M ) \Theta=\left(\theta_{1}, \ldots, \theta_{M}\right) Θ=(θ1,…,θM) Yes M M M Different filters Weight coding . Every θ m \theta_{m} θm All have a relationship with x \mathbf{x} x The same dimension , ⋅ \cdot ⋅ Represent inner product . - PointNet type :
h Θ ( x i , x j ) = h Θ ( x i ) , h_{\Theta}\left(\mathbf{x}_{i}, \mathbf{x}_{j}\right)=h_{\Theta}\left(\mathbf{x}_{i}\right), hΘ(xi,xj)=hΘ(xi),
Only encode global shape information , Without considering the local neighborhood structure , Count as EdgeConv A special case of . - Atzmon Proposed PCNN type :
x i m ′ = ∑ j ∈ V ( h θ ( x j ) ) g ( u ( x i , x j ) ) , x_{i m}^{\prime}=\sum_{j \in \mathcal{V}}\left(h_{\boldsymbol{\theta}\left(\mathbf{x}_{j}\right)}\right) g\left(u\left(\mathbf{x}_{i}, \mathbf{x}_{j}\right)\right), xim′=j∈V∑(hθ(xj))g(u(xi,xj)),
among g g g It's Gauss kernel, u u u It is used to calculate the distance in European space . - PointNet++ type :
h Θ ( x i , x j ) = h Θ ( x j − x i ) . h_{\boldsymbol{\Theta}}\left(\mathbf{x}_{i}, \mathbf{x}_{j}\right)=h_{\boldsymbol{\Theta}}\left(\mathbf{x}_{j}-\mathbf{x}_{i}\right) . hΘ(xi,xj)=hΘ(xj−xi).
Encode only local information , Divide the whole shape into many blocks , Lost global structure information . - The symmetric edge function used in this paper :
h Θ ( x i , x j ) = h ˉ Θ ( x i , x j − x i ) . h_{\boldsymbol{\Theta}}\left(\mathbf{x}_{i}, \mathbf{x}_{j}\right)=\bar{h}_{\boldsymbol{\Theta}}\left(\mathbf{x}_{i}, \mathbf{x}_{j}-\mathbf{x}_{i}\right) . hΘ(xi,xj)=hˉΘ(xi,xj−xi).
It combines the global shape structure ( By way of x i \mathbf{x}_{i} xi Is determined by the coordinates of the center ), Local neighborhood information is also considered ( adopt x j − x i \mathbf{x}_{j}-\mathbf{x}_{i} xj−xi obtain ).
Specially , It can also be expressed by the following formula EdgeConv The operation of :
e i j m ′ = ReLU ( θ m ⋅ ( x j − x i ) + ϕ m ⋅ x i ) , e_{i j m}^{\prime}=\operatorname{ReLU}\left(\boldsymbol{\theta}_{m} \cdot\left(\mathbf{x}_{j}-\mathbf{x}_{i}\right)+\boldsymbol{\phi}_{m} \cdot \mathbf{x}_{i}\right), eijm′=ReLU(θm⋅(xj−xi)+ϕm⋅xi),
And then execute :
x i m ′ = max j : ( i , j ) ∈ E e i j m ′ , x_{i m}^{\prime}=\max _{j:(i, j) \in \mathcal{E}} e_{i j m}^{\prime}, xim′=j:(i,j)∈Emaxeijm′,
among Θ = ( θ 1 , … , θ M , ϕ 1 , … , ϕ M ) \Theta=\left(\theta_{1}, \ldots, \theta_{M}, \phi_{1}, \ldots, \phi_{M}\right) Θ=(θ1,…,θM,ϕ1,…,ϕM).
□ \square □ Choose as max
Dynamic Graph Update
Our experiments show that , Use the nearest neighbor in the feature space generated by each layer to recalculate graph It is useful to . This is our method with fixed input graph Working on graph CNN A key difference . This dynamic graph Update is the name of our architecture DGCNN Why .
In each layer , It's different graph G ( l ) = ( V ( l ) , E ( l ) ) \mathcal{G}^{(l)}=\left(\mathcal{V}^{(l)}, \mathcal{E}^{(l)}\right) G(l)=(V(l),E(l)), Among them the first l l l The form of the edge of the layer is ( i , j i 1 ) , … , ( i , j i k l ) \left(i, j_{i 1}\right), \ldots,\left(i, j_{i k_{l}}\right) (i,ji1),…,(i,jikl), That is to say x j i 1 ( l ) , … , x j i k l ( l ) \mathbf{x}_{j_{i 1}}^{(l)}, \ldots, x_{j_{i k_{l}}}^{(l)} xji1(l),…,xjikl(l) It's distance x i ( l ) \mathbf{x}_{i}^{(l)} xi(l) Current k l k_{l} kl A little bit . Our network learns how to construct graph G \mathcal{G} G, It is not fixed before the network starts to predict . At the time of implementation , Calculate the distance matrix in the distance space , Then take the nearest for each single point k k k A little bit .
Properties
Permutation Invariance
Consider that the output of each layer is :
x i ′ = max j : ( i , j ) ∈ E h Θ ( x i , x j ) , \mathbf{x}_{i}^{\prime}=\max _{j:(i, j) \in \mathcal{E}} h_{\boldsymbol{\Theta}}\left(\mathbf{x}_{i}, \mathbf{x}_{j}\right), xi′=j:(i,j)∈EmaxhΘ(xi,xj),
because max Is a symmetric function , So the output layer x i ′ \mathrm{x}_{i}^{\prime} xi′ Relative to input x j \mathbf{x}_{j} xj It's sort invariant . The global maximum pool operation is also sort invariant for the characteristics of aggregation points .
Translation Invariance
Our operation has a part translation invariance nature , Because the edge function formula is not affected by translation , It can also be selectively affected translation influence . Consider at point x j \mathbf{x}_{j} xj Sum point x i \mathbf{x}_{i} xi Pan on , When translation T T T when , Yes :
e i j m ′ = θ m ⋅ ( x j + T − ( x i + T ) ) + ϕ m ⋅ ( x i + T ) = θ m ⋅ ( x j − x i ) + ϕ m ⋅ ( x i + T ) \begin{aligned} e_{i j m}^{\prime} &=\boldsymbol{\theta}_{m} \cdot\left(\mathbf{x}_{j}+T-\left(\mathbf{x}_{i}+T\right)\right)+\boldsymbol{\phi}_{m} \cdot\left(\mathbf{x}_{i}+T\right) \\ &=\boldsymbol{\theta}_{m} \cdot\left(\mathbf{x}_{j}-\mathbf{x}_{i}\right)+\boldsymbol{\phi}_{m} \cdot\left(\mathbf{x}_{i}+T\right) \end{aligned} eijm′=θm⋅(xj+T−(xi+T))+ϕm⋅(xi+T)=θm⋅(xj−xi)+ϕm⋅(xi+T)
If you make ϕ m = 0 \boldsymbol{\phi}_{m}=\mathbf{0} ϕm=0 when , Only consider x j − x i \mathbf{x}_{j}-\mathbf{x}_{i} xj−xi, Then the operation is completely translational . But the model will lose the acquisition of local information , So it's still part translation invariance.


experiment
Classification





Part Segmentation





Indoor Scene Segmentation



expectation
- Improve the speed of the model by combining faster data formats , Use multivariate groups instead , Don't look for the relationship between point pairs
- Design a non shared transformer The Internet , At every local patch They are all different , More flexibility
New words
- dub v go by the name of …
- arguably adv Can be said to be
- emanate v issue
边栏推荐
- Binary tree sorting (C language, char type)
- Use the interface colmap interface of openmvs to generate the pose file required by openmvs mvs
- Wonderful review | i/o extended 2022 activity dry goods sharing
- Digital statistics DP acwing 338 Counting problem
- LeetCode 57. Insert interval
- Mortgage Calculator
- I made mistakes that junior programmers all over the world would make, and I also made mistakes that I shouldn't have made
- AcWing 786. Number k
- DOM render mount patch responsive system
- Parameters of convolutional neural network
猜你喜欢
too many open files解决方案
Digital statistics DP acwing 338 Counting problem
Facial expression recognition based on pytorch convolution -- graduation project
Slice and index of array with data type
Complex character + number pyramid
高斯消元 AcWing 883. 高斯消元解线性方程组
剑指 Offer II 091. 粉刷房子
Binary tree sorting (C language, int type)
Binary tree sorting (C language, char type)
【点云处理之论文狂读经典版13】—— Adaptive Graph Convolutional Neural Networks
随机推荐
Parameters of convolutional neural network
浅谈企业信息化建设
LeetCode 532. K-diff number pairs in array
Low code momentum, this information management system development artifact, you deserve it!
我们有个共同的名字,XX工
Memory search acwing 901 skiing
高斯消元 AcWing 883. 高斯消元解线性方程组
Sending and receiving of request parameters
On a un nom en commun, maître XX.
AcWing 787. 归并排序(模板)
DOM 渲染系统(render mount patch)响应式系统
LeetCode 871. Minimum refueling times
How to check whether the disk is in guid format (GPT) or MBR format? Judge whether UEFI mode starts or legacy mode starts?
网络安全必会的基础知识
Binary tree sorting (C language, int type)
php public private protected
LeetCode 30. Concatenate substrings of all words
数字化管理中台+低代码,JNPF开启企业数字化转型的新引擎
Gaussian elimination acwing 883 Gauss elimination for solving linear equations
Vs2019 configuration opencv3 detailed graphic tutorial and implementation of test code