
[Point cloud processing paper intensive reading, classic edition 14] Dynamic Graph CNN for Learning on Point Clouds

2022-07-03 09:08:00 【LingbinBu】

DGCNN：Dynamic Graph CNN for Learning on Point Clouds

Abstract
Method
Experiments
Outlook
New Words

Abstract

Background: For many applications in computer graphics, point clouds are a flexible geometric representation, and they are often the raw output of most 3D acquisition devices.
Problem: Hand-designed features for point clouds were proposed long ago, but the recent success of convolutional neural networks (CNNs) in image analysis suggests the value of applying CNNs to point clouds. Point clouds lack topological information, so a model that can recover topology is needed in order to enrich their representational power.
Method: A module called EdgeConv is proposed that can be plugged into CNNs.
- EdgeConv acts on graphs that are dynamically computed in each layer of the network
- EdgeConv is differentiable and can be plugged into any existing architecture
- EdgeConv takes into account both local neighborhood information and global shape information
- EdgeConv is permutation invariant
- In multi-layer systems, affinity in feature space captures semantic characteristics over potentially long distances in the original embedding.
Code:
- TensorFlow version
- PyTorch version

Method

To exploit local geometric structure, a local neighborhood graph is constructed and convolution-like operations are applied on its edges, which connect neighboring pairs of points.
The graph in this paper is not fixed. It is dynamically updated after each layer of the network. In other words, the $k$-nearest-neighbor ($k$NN) set of a point changes from layer to layer and is computed from the sequence of embeddings.
Proximity in feature space differs from proximity in the input space, which leads to nonlocal diffusion of information throughout the point cloud.

Edge Convolution

Let $\mathbf{X}=\left\{\mathbf{x}_{1}, \ldots, \mathbf{x}_{n}\right\} \subseteq \mathbb{R}^{F}$ denote the input point cloud, where $n$ is the number of points and $F$ is the dimension of each point. In the simplest case, $F = 3$ and every point contains its 3D coordinates $\mathbf{x}_{i}=\left(x_{i}, y_{i}, z_{i}\right)$. In other cases, a point may also carry color, surface normal, and so on. In the deeper layers of the network, $F$ denotes the feature dimension of a point.

We compute a directed graph $\mathcal{G}=(\mathcal{V}, \mathcal{E})$ representing the local point cloud structure, where $\mathcal{V}=\{1, \ldots, n\}$ and $\mathcal{E} \subseteq \mathcal{V} \times \mathcal{V}$ are the vertices and edges, respectively. In the simplest case, $\mathcal{G}$ is constructed as the $k$-nearest-neighbor ($k$NN) graph of $\mathbf{X}$. The graph includes self-loops, meaning that each node also points to itself. The edge features are defined as $\boldsymbol{e}_{i j}=h_{\Theta}\left(\mathbf{x}_{i}, \mathbf{x}_{j}\right)$, where $h_{\Theta}: \mathbb{R}^{F} \times \mathbb{R}^{F} \rightarrow \mathbb{R}^{F^{\prime}}$ is a nonlinear function with learnable parameters $\boldsymbol{\Theta}$.
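The graph construction described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `knn_graph` and the toy `points` array are hypothetical, and a brute-force pairwise distance matrix is used for simplicity.

```python
import numpy as np

def knn_graph(x, k):
    """Return an (n, k) array holding the neighbor indices of each point.

    x: (n, F) array of points (or point features).
    Each point is its own nearest neighbor (distance 0), which yields
    the self-loop mentioned in the text.
    """
    # Pairwise squared Euclidean distances: ||a||^2 - 2 a.b + ||b||^2.
    sq = np.sum(x * x, axis=1)
    dist = sq[:, None] - 2.0 * x @ x.T + sq[None, :]
    # Indices of the k smallest distances in each row, nearest first.
    return np.argsort(dist, axis=1)[:, :k]

# Toy point cloud (hypothetical values), n = 4, F = 3.
points = np.array([[0.0, 0.0, 0.0],
                   [1.0, 0.0, 0.0],
                   [0.0, 3.0, 0.0],
                   [5.0, 5.0, 5.0]])
neighbors = knn_graph(points, k=2)
# Row i lists the targets j of the directed edges (i, j).
```

Row `i` of `neighbors` is the edge set $\{j:(i, j) \in \mathcal{E}\}$, so the graph is directed: point 3 has point 2 as a neighbor, but not the other way around.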

Finally, the EdgeConv operation is defined by applying a channel-wise symmetric aggregation operation $\square$ (e.g., $\sum$ or max) to the edge features associated with all edges emanating from each vertex. The output of EdgeConv at the $i$-th vertex is:
$\mathbf{x}_{i}^{\prime}=\mathop {\square}\limits_{ {j:(i, j) \in \mathcal{E}}} h_{\Theta}\left(\mathbf{x}_{i}, \mathbf{x}_{j}\right)$
where $\mathbf{x}_{i}$ is the central point and $\left\{\mathbf{x}_{j}:(i, j) \in \mathcal{E}\right\}$ are the neighboring points of $\mathbf{x}_{i}$. In short, given an $F$-dimensional point cloud with $n$ points, EdgeConv produces an $F^{\prime}$-dimensional point cloud with the same number of points.
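To make the definition concrete, here is a minimal sketch of the generic operation with a pluggable edge function and aggregation. The names `generic_edge_conv` and `h_example` are hypothetical, and `h_example` has no learnable parameters. It only illustrates how an $n \times F$ input becomes an $n \times F^{\prime}$ output.

```python
import numpy as np

def generic_edge_conv(x, neighbors, h, aggregate=np.max):
    """x: (n, F); neighbors: (n, k) indices; h maps (x_i, x_j) to an (F',) vector."""
    out = []
    for i in range(len(x)):
        # Edge features for all edges leaving vertex i, shape (k, F').
        edge_feats = np.stack([h(x[i], x[j]) for j in neighbors[i]])
        # Channel-wise symmetric aggregation over the k edges.
        out.append(aggregate(edge_feats, axis=0))
    return np.stack(out)

def h_example(x_i, x_j):
    # Hypothetical edge function: concatenate x_i and x_j - x_i, so F' = 2F.
    return np.concatenate([x_i, x_j - x_i])

x = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])   # n = 3, F = 2
neighbors = np.array([[0, 1], [1, 0], [2, 0]])       # k = 2, with self-loops
out_max = generic_edge_conv(x, neighbors, h_example, np.max)
out_sum = generic_edge_conv(x, neighbors, h_example, np.sum)
```

Both `out_max` and `out_sum` have shape `(3, 4)`: the number of points is unchanged and only the feature dimension changes.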

Choice of $h$ and $\square$

Choices of $h$:

Standard convolution:
$x_{i m}^{\prime}=\sum_{j:(i, j) \in \mathcal{E}} \boldsymbol{\theta}_{m} \cdot \mathbf{x}_{j} .$
where $\Theta=\left(\theta_{1}, \ldots, \theta_{M}\right)$ encodes the weights of $M$ different filters. Each $\theta_{m}$ has the same dimension as $\mathbf{x}$, and $\cdot$ denotes the inner product.
PointNet style:
$h_{\Theta}\left(\mathbf{x}_{i}, \mathbf{x}_{j}\right)=h_{\Theta}\left(\mathbf{x}_{i}\right),$
This encodes only global shape information and ignores the local neighborhood structure. It can be regarded as a special case of EdgeConv.
PCNN style, proposed by Atzmon et al.:
$x_{i m}^{\prime}=\sum_{j \in \mathcal{V}}\left(h_{\boldsymbol{\theta}}\left(\mathbf{x}_{j}\right)\right) g\left(u\left(\mathbf{x}_{i}, \mathbf{x}_{j}\right)\right),$
where $g$ is a Gaussian kernel and $u$ computes the pairwise distance in Euclidean space.
PointNet++ style:
$h_{\boldsymbol{\Theta}}\left(\mathbf{x}_{i}, \mathbf{x}_{j}\right)=h_{\boldsymbol{\Theta}}\left(\mathbf{x}_{j}-\mathbf{x}_{i}\right) .$
This encodes only local information. It treats the whole shape as a collection of small patches and loses the global structure information.
The asymmetric edge function used in this paper:
$h_{\boldsymbol{\Theta}}\left(\mathbf{x}_{i}, \mathbf{x}_{j}\right)=\bar{h}_{\boldsymbol{\Theta}}\left(\mathbf{x}_{i}, \mathbf{x}_{j}-\mathbf{x}_{i}\right) .$
It combines the global shape structure (captured by the coordinates of the patch centers $\mathbf{x}_{i}$) with local neighborhood information (captured by $\mathbf{x}_{j}-\mathbf{x}_{i}$).
In particular, the EdgeConv operation can be written as:
$e_{i j m}^{\prime}=\operatorname{ReLU}\left(\boldsymbol{\theta}_{m} \cdot\left(\mathbf{x}_{j}-\mathbf{x}_{i}\right)+\boldsymbol{\phi}_{m} \cdot \mathbf{x}_{i}\right),$
followed by:
$x_{i m}^{\prime}=\max _{j:(i, j) \in \mathcal{E}} e_{i j m}^{\prime},$
where $\Theta=\left(\theta_{1}, \ldots, \theta_{M}, \phi_{1}, \ldots, \phi_{M}\right)$.

$\square$ is chosen as max.
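The two formulas for $e_{i j m}^{\prime}$ and $x_{i m}^{\prime}$ translate almost directly into array operations. The following is a minimal NumPy sketch, not the authors' implementation: the function name `edge_conv`, the hand-picked weights, and the hard-coded neighbor indices are all assumptions made for illustration.

```python
import numpy as np

def edge_conv(x, neighbors, theta, phi):
    """x: (n, F); neighbors: (n, k) indices; theta, phi: (M, F). Returns (n, M)."""
    x_j = x[neighbors]          # (n, k, F) neighbor features
    x_i = x[:, None, :]         # (n, 1, F) central point, broadcast over k
    # e_ijm = ReLU(theta_m . (x_j - x_i) + phi_m . x_i)
    e = (x_j - x_i) @ theta.T + x_i @ phi.T     # (n, k, M)
    e = np.maximum(e, 0.0)
    # x'_im = max over the neighbors j of e_ijm (channel-wise)
    return e.max(axis=1)

x = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])   # n = 3, F = 2
neighbors = np.array([[0, 1], [1, 0], [2, 0]])       # k = 2, with self-loops
theta = np.array([[1.0, 0.0]])                       # M = 1 filter (hypothetical weights)
phi = np.array([[0.0, 1.0]])
out = edge_conv(x, neighbors, theta, phi)
```

With these weights, `theta` looks at the x-offset to each neighbor and `phi` looks at the y-coordinate of the center, so the three outputs are 1, 0 and 2 respectively.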

Dynamic Graph Update

Our experiments show that it is beneficial to recompute the graph using the nearest neighbors in the feature space produced by each layer. This is a key difference between our method and graph CNNs that work on a fixed input graph. This dynamic graph update is the reason our architecture is named Dynamic Graph CNN (DGCNN).

At each layer there is a different graph $\mathcal{G}^{(l)}=\left(\mathcal{V}^{(l)}, \mathcal{E}^{(l)}\right)$, where the edges of the $l$-th layer have the form $\left(i, j_{i 1}\right), \ldots,\left(i, j_{i k_{l}}\right)$ such that $\mathbf{x}_{j_{i 1}}^{(l)}, \ldots, \mathbf{x}_{j_{i k_{l}}}^{(l)}$ are the $k_{l}$ points closest to $\mathbf{x}_{i}^{(l)}$. Our network learns how to construct the graph $\mathcal{G}$ used in each layer, rather than taking it as a constant fixed before the network is evaluated. In the implementation, a pairwise distance matrix is computed in feature space, and then the closest $k$ points are taken for each point.
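The dynamic update amounts to rebuilding the $k$NN graph from the current features before every EdgeConv layer. Below is a minimal sketch under stated assumptions: the helper names `knn` and `edge_conv`, the layer widths in `dims`, and the random weights (standing in for learned parameters) are all hypothetical.

```python
import numpy as np

def knn(x, k):
    # Pairwise squared distances in the current feature space, then k nearest.
    sq = np.sum(x * x, axis=1)
    dist = sq[:, None] - 2.0 * x @ x.T + sq[None, :]
    return np.argsort(dist, axis=1)[:, :k]

def edge_conv(x, idx, theta, phi):
    # ReLU(theta . (x_j - x_i) + phi . x_i), then max over neighbors.
    x_i = x[:, None, :]
    e = np.maximum((x[idx] - x_i) @ theta.T + x_i @ phi.T, 0.0)
    return e.max(axis=1)

rng = np.random.default_rng(0)
x = rng.normal(size=(32, 3))     # layer-0 input: 32 points with 3D coordinates
dims = [3, 16, 32]               # hypothetical feature widths per layer
k = 8
graphs = []
for f_in, f_out in zip(dims[:-1], dims[1:]):
    idx = knn(x, k)              # graph rebuilt from the current features
    graphs.append(idx)
    theta = rng.normal(size=(f_out, f_in))   # random stand-ins for learned weights
    phi = rng.normal(size=(f_out, f_in))
    x = edge_conv(x, idx, theta, phi)
```

`graphs[0]` is built from the 3D coordinates, while `graphs[1]` is built from the 16-dimensional features of the first layer, so two points that are far apart in space can become neighbors if their features are similar.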

Properties

Permutation Invariance

Consider the output of a layer:
$\mathbf{x}_{i}^{\prime}=\max _{j:(i, j) \in \mathcal{E}} h_{\boldsymbol{\Theta}}\left(\mathbf{x}_{i}, \mathbf{x}_{j}\right),$
Since max is a symmetric function, the layer output $\mathbf{x}_{i}^{\prime}$ is invariant to permutations of the input $\mathbf{x}_{j}$. The global max pooling operation that aggregates the point features is also permutation invariant.
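This property can be checked numerically. The sketch below shuffles the input points, recomputes the graph, and compares the globally max-pooled feature. The helper names and random weights are assumptions for illustration, and the check relies on the random points having no exact distance ties.

```python
import numpy as np

def knn(x, k):
    sq = np.sum(x * x, axis=1)
    dist = sq[:, None] - 2.0 * x @ x.T + sq[None, :]
    return np.argsort(dist, axis=1)[:, :k]

def edge_conv(x, idx, theta, phi):
    x_i = x[:, None, :]
    e = np.maximum((x[idx] - x_i) @ theta.T + x_i @ phi.T, 0.0)
    return e.max(axis=1)         # max over neighbors: order of x_j is irrelevant

rng = np.random.default_rng(0)
x = rng.normal(size=(20, 3))
theta = rng.normal(size=(8, 3))  # random stand-ins for learned weights
phi = rng.normal(size=(8, 3))

def global_feature(points):
    feats = edge_conv(points, knn(points, 5), theta, phi)
    return feats.max(axis=0)     # global max pooling over all points

perm = rng.permutation(len(x))
same = np.allclose(global_feature(x), global_feature(x[perm]))
```

Here `same` is expected to be `True`: reordering the points reorders the per-point features, but neither the max over neighbors nor the global max pooling depends on that order.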

Translation Invariance

Our operator is partially translation invariant, because the edge function contains one term that is unaffected by translation and another term that can selectively be affected by translation. Consider a translation $T$ applied to both $\mathbf{x}_{j}$ and $\mathbf{x}_{i}$. Then:
$\begin{aligned} e_{i j m}^{\prime} &=\boldsymbol{\theta}_{m} \cdot\left(\mathbf{x}_{j}+T-\left(\mathbf{x}_{i}+T\right)\right)+\boldsymbol{\phi}_{m} \cdot\left(\mathbf{x}_{i}+T\right) \\ &=\boldsymbol{\theta}_{m} \cdot\left(\mathbf{x}_{j}-\mathbf{x}_{i}\right)+\boldsymbol{\phi}_{m} \cdot\left(\mathbf{x}_{i}+T\right) \end{aligned}$
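If $\boldsymbol{\phi}_{m}=\mathbf{0}$, only $\mathbf{x}_{j}-\mathbf{x}_{i}$ is considered and the operator is fully translation invariant. In that case, however, the model loses the global shape information carried by $\mathbf{x}_{i}$ and sees the shape only as an unordered set of patches, so $\boldsymbol{\phi}_{m}$ is kept and the operator remains only partially translation invariant.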
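The derivation can be verified on a small example. In this sketch the points, weights, and translation are hand-picked hypothetical values. The neighbor graph is computed once and reused, since Euclidean distances do not change under translation.

```python
import numpy as np

def edge_conv(x, idx, theta, phi):
    x_i = x[:, None, :]
    e = np.maximum((x[idx] - x_i) @ theta.T + x_i @ phi.T, 0.0)
    return e.max(axis=1)

x = np.array([[0.0, 0.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 1.0, 1.0]])
# k = 2 nearest neighbors from direct differences (self-loop comes first).
diff = x[:, None, :] - x[None, :, :]
idx = np.argsort((diff ** 2).sum(axis=-1), axis=1)[:, :2]

theta = np.eye(3)                # hypothetical weights, M = 3
phi = np.ones((3, 3))
T = np.array([1.0, 1.0, 1.0])    # translation applied to every point

# With phi = 0 only x_j - x_i is used, so T cancels.
zero_phi = np.zeros_like(phi)
invariant_without_phi = np.allclose(edge_conv(x, idx, theta, zero_phi),
                                    edge_conv(x + T, idx, theta, zero_phi))
# With phi != 0 the term phi . (x_i + T) depends on T.
invariant_with_phi = np.allclose(edge_conv(x, idx, theta, phi),
                                 edge_conv(x + T, idx, theta, phi))
```

`invariant_without_phi` is `True` and `invariant_with_phi` is `False`, matching the two cases discussed above: for the point $(1,1,1)$, for example, the self-loop term $\boldsymbol{\phi}_{m} \cdot \mathbf{x}_{i}$ changes from 3 to 6 after the translation.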

Experiments

Classification

Part Segmentation

Indoor Scene Segmentation

Outlook

Improve the speed of the model by incorporating faster data structures, and consider relationships among larger tuples of points rather than only between pairs of points.
Design a non-shared transformer network that works differently on each local patch, adding flexibility to the model.