[Point Cloud Processing Papers: Intensive Reading, Classics Edition 12] FoldingNet: Point Cloud Auto-encoder via Deep Grid Deformation

2022-07-03 09:08:00 【LingbinBu】

FoldingNet: Point Cloud Auto-encoder via Deep Grid Deformation

Abstract
Introduction
FoldingNet Auto-encoder on Point Cloud
- Graph-based Encoder Architecture
- Folding-based Decoder Architecture
Theoretical Analysis
Experiments
New Words

Abstract

Background: Recent deep learning networks consume point clouds directly as point sets and have achieved good results on point cloud classification and segmentation.
Problem: Unsupervised learning on point clouds remains challenging.
Method: This paper proposes an end-to-end deep auto-encoder.
Details:
1. On the encoder side, a graph-based enhancement is applied to PointNet to improve its ability to extract local structure.
2. On the decoder side, a folding-based method deforms a canonical 2D grid onto the underlying surface of the point cloud, capturing fine structure while also achieving low reconstruction error.
Advantages:
1. The proposed decoder uses very few parameters, yet yields a more discriminative representation.
2. The proposed decoder is a generic architecture: in theory, it can reconstruct the surface of an arbitrary point cloud from a 2D grid.
Code:
1. http://www.merl.com/research/license#FoldingNet
2. https://github.com/AnTao97/UnsupervisedPointCloudReconstruction (PyTorch version, comparing unsupervised point cloud methods)

Introduction

As shown in Table 1, in the decoder stage we want to apply a "virtual force" that deforms a 2D grid onto the surface of a 3D object. This deformation should be influenced and constrained by the interconnections among the grid neighborhoods.

FoldingNet Auto-encoder on Point Cloud

The pipeline of the auto-encoder is shown in Figure 2. The input to the encoder is an $n \times 3$ matrix; each row holds the 3D position $(x, y, z)$ of one point. The output is an $m \times 3$ matrix holding the positions of the reconstructed points. The number of reconstructed points $m$ does not have to equal the number of input points $n$. Let the input point set be $S$ and the reconstructed point set be $\widehat{S}$; the reconstruction error is then computed with the (extended) Chamfer distance:
$\begin{aligned} d_{C H}(S, \widehat{S})=\max &\left\{\frac{1}{|S|} \sum_{\mathbf{x} \in S} \min _{\widehat{\mathbf{x}} \in \widehat{S}}\|\mathbf{x}-\widehat{\mathbf{x}}\|_{2},\right. \left.\frac{1}{|\widehat{S}|} \sum_{\widehat{\mathbf{x}} \in \widehat{S}} \min _{\mathbf{x} \in S}\|\widehat{\mathbf{x}}-\mathbf{x}\|_{2}\right\} \end{aligned}$
The term $\min _{\widehat{\mathbf{x}} \in \widehat{S}}\|\mathbf{x}-\widehat{\mathbf{x}}\|_{2}$ forces every 3D point $\mathbf{x}$ in the original point cloud to have a matching 3D point $\widehat{\mathbf{x}}$ in the reconstructed point cloud, and the term $\min _{\mathbf{x} \in S}\|\widehat{\mathbf{x}}-\mathbf{x}\|_{2}$ does the reverse. The max operation requires the distance from $S$ to $\widehat{S}$ and the distance from $\widehat{S}$ to $S$ to be small at the same time. The encoder computes a representation (a codeword) of each input point cloud, and the decoder reconstructs the point cloud from this codeword. In this paper, the codeword length is set to 512.
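
The distance above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation; the function name `chamfer_distance` and the toy point sets are assumptions made for this example, and the brute-force pairwise distance matrix is only practical for small point clouds.

```python
import numpy as np

def chamfer_distance(S, S_hat):
    """Extended Chamfer distance between S (n x 3) and S_hat (m x 3)."""
    # Pairwise Euclidean distances, shape (n, m).
    diff = S[:, None, :] - S_hat[None, :, :]
    dist = np.linalg.norm(diff, axis=2)
    # Mean distance from each original point to its nearest reconstructed point.
    forward = dist.min(axis=1).mean()
    # Mean distance from each reconstructed point to its nearest original point.
    backward = dist.min(axis=0).mean()
    # Taking the max requires both directions to be small.
    return max(forward, backward)

# Toy example (hypothetical data): n = 2 original points, m = 3 reconstructed points.
S = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
S_hat = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 1.0, 0.0]])
d = chamfer_distance(S, S_hat)
```

In this toy case every original point is matched exactly, so the forward term is zero. The extra reconstructed point sits at distance 1 from its nearest original point, which makes the backward term, and therefore the max, equal to 1/3.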

Graph-based Encoder Architecture

The graph-based encoder follows the same design as KCNet, a network that performs supervised learning on point cloud neighborhood graphs. The encoder is a sequence of MLP layers and graph-based max-pooling layers. The graph takes the positions of the input points as its vertices and is built with KNN, giving a K-nearest-neighbor graph (K-NNG). In the experiments, $K$ is set to 16.
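
As a rough sketch of the graph construction, the K-NNG can be stored as an index array listing each point's $K$ nearest neighbors. The helper name `knn_graph`, the brute-force search, and the choice to exclude a point from its own neighbor list are assumptions made for this example.

```python
import numpy as np

def knn_graph(points, k):
    """Return an (n, k) index array; row i lists the k nearest points to point i."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.linalg.norm(diff, axis=2)      # (n, n) pairwise distances
    np.fill_diagonal(dist, np.inf)           # assumption: a point is not its own neighbor
    return np.argsort(dist, axis=1)[:, :k]   # indices of the k smallest distances per row

# Four collinear points: the nearest neighbor of each is easy to check by hand.
line = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [3.0, 0.0, 0.0], [10.0, 0.0, 0.0]])
nearest = knn_graph(line, k=1)

# A small random cloud with K = 16, the value used in the experiments.
rng = np.random.default_rng(0)
cloud = rng.random((128, 3))
neighbors = knn_graph(cloud, k=16)
```

For full-size clouds a spatial index such as a k-d tree would normally replace the quadratic distance matrix.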

First, for each point $v$, compute its $3 \times 3$ local covariance matrix and vectorize it into a $1 \times 9$ vector. The local covariance of $v$ is computed from the 3D positions of the one-hop neighbors of $v$ in the K-NNG (including $v$ itself);
The $n \times 3$ matrix of point positions and the $n \times 9$ matrix of local covariances are concatenated into an $n \times 12$ matrix, which is fed to a 3-layer MLP;
The output of the MLP is fed to two consecutive graph layers, each of which applies max-pooling over the neighborhood of every node.
Specifically, suppose the K-NN graph has adjacency matrix $\mathbf{A}$ and the input matrix to the graph layer is $\mathbf{X}$; the output matrix is then:
$\mathbf{Y}=\mathbf{A}_{\max }(\mathbf{X}) \mathbf{K}$
where $\mathbf{K}$ is a feature mapping matrix, and the $(i, j)$-th entry of the matrix $\mathbf{A}_{\max }(\mathbf{X})$ is:
$\left(\mathbf{A}_{\max }(\mathbf{X})\right)_{i j}=\operatorname{ReLU}\left(\max _{k \in \mathcal{N}(i)} x_{k j}\right)$
The local max-pooling operation in the formula above essentially computes a local signature based on the graph structure. This signature can represent the (aggregated) topology information of the local neighborhood. By stacking graph-based max-pooling layers, the network propagates topology information to larger areas.
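
The encoder steps above can be sketched as follows. This is an illustration under stated assumptions rather than the paper's code: the function names are hypothetical, the covariance is normalized by the number of points in the neighborhood, and $\mathcal{N}(i)$ is taken to be whatever the `neighbors` index array lists for node $i$.

```python
import numpy as np

def local_covariance(points, neighbors):
    """Per-point 3x3 covariance over the point and its one-hop neighbors, flattened to 9 values."""
    n = points.shape[0]
    feats = np.zeros((n, 9))
    for i in range(n):
        idx = np.concatenate(([i], neighbors[i]))   # include the point itself
        local = points[idx]
        centered = local - local.mean(axis=0)
        cov = centered.T @ centered / len(idx)      # assumption: 1/N normalization
        feats[i] = cov.reshape(9)
    return feats

def graph_max_pool_layer(X, neighbors, K):
    """Y = A_max(X) K with (A_max(X))_ij = ReLU(max over k in N(i) of x_kj)."""
    pooled = X[neighbors].max(axis=1)               # (n, k, d) -> (n, d)
    return np.maximum(pooled, 0.0) @ K

# Tiny hand-checkable example: 3 points, each connected to the other two.
points = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0]])
neighbors = np.array([[1, 2], [0, 2], [0, 1]])
mlp_input = np.hstack([points, local_covariance(points, neighbors)])  # n x 12

X = np.array([[1.0, -1.0], [2.0, -2.0], [3.0, -3.0]])
Y = graph_max_pool_layer(X, neighbors, np.eye(2))
```

With the identity as feature mapping matrix, `Y` is just the ReLU of the neighborhood maximum: the negative second feature is clipped to zero at every node.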

Folding-based Decoder Architecture

The proposed decoder uses two consecutive 3-layer MLPs to warp a fixed 2D grid into the shape of the input point cloud. The input codeword is obtained from the encoder.

Before the codeword is fed to the decoder, it is replicated $m$ times to form an $m \times 512$ matrix, which is then concatenated with the $m \times 2$ matrix holding the coordinates of the $m$ grid points. The result of the concatenation is an $m \times 514$ matrix.
The concatenated matrix is processed row by row by a 3-layer MLP, whose output is an $m \times 3$ matrix.
This output matrix is concatenated with the replicated codeword matrix again (giving an $m \times 515$ matrix) and fed to another 3-layer MLP, which produces the final $m \times 3$ reconstructed output.

In this paper, $n$ is 2048, and $m$ is 2025, the closest square number to 2048.
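
The data flow of the decoder can be sketched as a forward pass with random weights. Only the matrix sizes ($m \times 514$, $m \times 515$, $m \times 3$, $m = 2025$) come from the text; the hidden widths of 512, the grid range $[-1, 1]$, the ReLU activations, and all function names are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(sizes):
    """Random weights for illustration only; a trained network would learn these."""
    return [(rng.normal(0.0, 0.1, (a, b)), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]

def mlp(x, layers):
    """Apply the same perceptron to every row of x (ReLU between layers)."""
    for i, (W, b) in enumerate(layers):
        x = x @ W + b
        if i < len(layers) - 1:
            x = np.maximum(x, 0.0)
    return x

def make_grid(side):
    """side * side points on a square, shape (side**2, 2)."""
    axis = np.linspace(-1.0, 1.0, side)
    u, v = np.meshgrid(axis, axis)
    return np.stack([u.ravel(), v.ravel()], axis=1)

def folding_decoder(codeword, grid, fold1, fold2):
    m = grid.shape[0]
    tiled = np.tile(codeword, (m, 1))                       # m x 512
    first = mlp(np.concatenate([grid, tiled], axis=1), fold1)    # m x 514 -> m x 3
    return mlp(np.concatenate([first, tiled], axis=1), fold2)    # m x 515 -> m x 3

codeword = rng.normal(size=512)            # stands in for the encoder output
grid = make_grid(45)                       # m = 45 * 45 = 2025
fold1 = init_mlp([514, 512, 512, 3])       # hidden widths are assumed
fold2 = init_mlp([515, 512, 512, 3])
recon = folding_decoder(codeword, grid, fold1, fold2)
```

Because the grid is fixed and each MLP is shared across rows, the only input that changes from shape to shape is the codeword, which is consistent with the small parameter count claimed for the decoder.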

Definition

Concatenating the replicated codeword with low-dimensional grid points and then applying a point-wise MLP is called a folding operation.

The folding operation is essentially a universal 2D-to-3D mapping. To see why intuitively, let the matrix $\mathbf{U}$ hold the input 2D grid points, one grid point per row. Denote the $i$-th row of $\mathbf{U}$ by $\mathbf{u}_i$ and the codeword produced by the encoder by $\boldsymbol{\theta}$. After concatenation, the $i$-th row of the matrix fed to the MLP is $\left[\mathbf{u}_{i}, \boldsymbol{\theta}\right]$. Because the MLP is applied to every row of the input matrix in parallel, the $i$-th row of the output matrix can be written as $f\left(\left[\mathbf{u}_{i}, \boldsymbol{\theta}\right]\right)$, where $f$ is the function computed by the MLP. This function can be viewed as a parameterized high-dimensional function in which the codeword $\boldsymbol{\theta}$ is the parameter that guides the structure of the function (the folding operation). Because MLPs are good at approximating nonlinear functions, they can carry out elaborate folding operations on the 2D grid. The high-dimensional codeword essentially stores the structural features that drive the deformation of the 2D grid, which makes the folding operation more diverse.

The proposed decoder has two folding operations. The first folds the 2D grid from 2D space into 3D space, and the second folds within 3D space; see Table 1.
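
Writing the two point-wise MLPs as $f_1$ and $f_2$ (names introduced here for illustration; the text above only uses a single $f$), the two folding operations compose into one expression for the $i$-th reconstructed point:

$\widehat{\mathbf{x}}_{i}=f_{2}\left(\left[f_{1}\left(\left[\mathbf{u}_{i}, \boldsymbol{\theta}\right]\right), \boldsymbol{\theta}\right]\right), \quad \mathbf{u}_{i} \in \mathbb{R}^{2}, \quad f_{1}\left(\left[\mathbf{u}_{i}, \boldsymbol{\theta}\right]\right) \in \mathbb{R}^{3}, \quad \widehat{\mathbf{x}}_{i} \in \mathbb{R}^{3}$

The inner term is the first folding (2D to 3D) and the outer term is the second folding (3D to 3D); the same codeword $\boldsymbol{\theta}$ is supplied to both.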

Theoretical Analysis

Theorem 1

The proposed encoder structure is permutation invariant: if the rows of the input point cloud matrix are permuted arbitrarily, the resulting codeword stays the same.
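
The property can be checked numerically on a deliberately simplified stand-in. The toy encoder below is not the paper's encoder: it assumes a shared per-point layer followed by a max over all points, which is one common way to obtain a symmetric function, and the weights `W` and the name `toy_encoder` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 8))   # hypothetical shared per-point weights

def toy_encoder(points):
    """Shared per-point layer followed by a max over points (a symmetric function)."""
    per_point = np.maximum(points @ W, 0.0)   # (n, 8); every row is processed identically
    return per_point.max(axis=0)              # (8,); does not depend on row order

points = rng.normal(size=(100, 3))
shuffled = points[rng.permutation(100)]       # same points, rows reordered
same = np.allclose(toy_encoder(points), toy_encoder(shuffled))
```

Reordering the rows only reorders the per-point features, and a max over rows ignores their order, so the two codewords agree.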

Theorem 2

There exists a 2-layer MLP that can reconstruct an arbitrary point cloud from a 2D grid using the folding operation.