[Point Cloud Processing: Intensive Paper Reading, Classic Edition 12] FoldingNet: Point Cloud Auto-encoder via Deep Grid Deformation
2022-07-03 09:08:00 【LingbinBu】
FoldingNet: Point Cloud Auto-encoder via Deep Grid Deformation
Abstract
- Background: Recent deep learning networks operate directly on point clouds represented as point sets, and have achieved good results on point cloud classification and segmentation.
- Problem: Unsupervised learning on point clouds remains challenging.
- Method: This paper presents an end-to-end deep auto-encoder.
- Details:
  - On the encoder side, a graph-based enhancement strengthens PointNet's ability to extract local structure.
  - On the decoder side, a folding operation deforms a canonical 2D grid onto the underlying surface of the point cloud, capturing fine structure while achieving low reconstruction error.
- Advantages:
  - The proposed decoder uses very few parameters, yet it produces a more discriminative representation.
  - The proposed decoder is a generic architecture that can, in theory, reconstruct the surface of an arbitrary point cloud from a 2D grid.
- Code:

Introduction
As shown in Table 1, in the decoder stage we want to apply a "virtual force" that deforms a 2D grid into the surface of a 3D object. This deformation should be influenced and constrained by the interconnections between neighboring grid points.

FoldingNet Auto-encoder on Point Cloud
The pipeline of the auto-encoder is shown in Figure 2. The encoder takes an $n \times 3$ matrix as input, each row of which holds the 3D position $(x, y, z)$ of one point. The output is an $m \times 3$ matrix holding the positions of the reconstructed points. The number of reconstructed points $m$ does not have to equal the number of input points $n$. Let $S$ be the input point set and $\widehat{S}$ the reconstructed point set. The reconstruction error is then computed with the (extended) Chamfer distance:
$$
d_{CH}(S, \widehat{S}) = \max\left\{ \frac{1}{|S|} \sum_{\mathbf{x} \in S} \min_{\widehat{\mathbf{x}} \in \widehat{S}} \|\mathbf{x}-\widehat{\mathbf{x}}\|_{2},\ \frac{1}{|\widehat{S}|} \sum_{\widehat{\mathbf{x}} \in \widehat{S}} \min_{\mathbf{x} \in S} \|\widehat{\mathbf{x}}-\mathbf{x}\|_{2} \right\}
$$
The term $\min_{\widehat{\mathbf{x}} \in \widehat{S}}\|\mathbf{x}-\widehat{\mathbf{x}}\|_{2}$ forces every 3D point $\mathbf{x}$ in the original point cloud to have a matching 3D point $\widehat{\mathbf{x}}$ in the reconstructed point cloud, and the term $\min_{\mathbf{x} \in S}\|\widehat{\mathbf{x}}-\mathbf{x}\|_{2}$ enforces the match in the opposite direction. The max operation requires the distance from $S$ to $\widehat{S}$ and the distance from $\widehat{S}$ to $S$ to be small at the same time. The encoder computes a representation (the codeword) of each input point cloud, and the decoder reconstructs the point cloud from this codeword. In this paper, the codeword length is set to 512.
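As a rough illustration of the distance defined above, here is a minimal NumPy sketch. The function name `chamfer_distance` and the brute-force pairwise distance matrix are choices made for this example only; a practical implementation would likely use a nearest-neighbor search structure or a batched GPU version.

```python
import numpy as np

def chamfer_distance(S, S_hat):
    """Extended Chamfer distance between point sets S (n, 3) and S_hat (m, 3)."""
    # Pairwise Euclidean distances, shape (n, m).
    diff = S[:, None, :] - S_hat[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    # Mean distance from each original point to its nearest reconstructed point.
    forward = dist.min(axis=1).mean()
    # Mean distance from each reconstructed point to its nearest original point.
    backward = dist.min(axis=0).mean()
    # The max keeps both directions small at the same time.
    return max(forward, backward)

# Toy example: S_hat contains both points of S plus one extra point.
S = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
S_hat = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 1.0, 0.0]])
d = chamfer_distance(S, S_hat)
```

In this toy case the forward term is 0 (every point of `S` is matched exactly), while the extra point in `S_hat` lies at distance 1 from its nearest neighbor in `S`, so the backward term, and hence the result, is 1/3. This shows why both directions are needed: the forward term alone would not penalize the spurious point.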
Graph-based Encoder Architecture
The graph-based encoder follows the same design as KCNet, which uses point cloud neighborhood graphs for supervised learning. The encoder is a concatenation of MLP layers and graph-based max-pooling layers. The graph is the K-nearest-neighbor graph (K-NNG) built by taking the positions of the input points as vertices. In the experiments, $K$ is set to 16.
- First, for each point $v$, compute its $3 \times 3$ local covariance matrix and vectorize it into a $1 \times 9$ vector. The local covariance of $v$ is computed from the 3D positions of the one-hop neighbors of $v$ in the K-NNG (including $v$ itself);
- The $n \times 3$ matrix of point positions and the $n \times 9$ matrix of local covariances are concatenated into an $n \times 12$ matrix, which is fed into a 3-layer MLP;
- The output of the MLP is fed into two consecutive graph layers, each of which applies max pooling over the neighborhood of every node.
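The first two steps of the list above (local covariance and the $n \times 12$ input matrix) can be sketched as follows. This is only an illustration under stated assumptions: the helper names `knn_indices` and `local_covariance_features` are made up for this example, the neighborhood is taken to be the point plus its $K$ nearest neighbors, the covariance is normalized by the neighborhood size, and the brute-force distance matrix is used only because the example is small.

```python
import numpy as np

def knn_indices(points, k):
    """For every point, indices of the point itself and its k nearest neighbors."""
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
    # Each point is at distance 0 from itself, so it appears in its own list
    # (assuming no more than k exact duplicates of a point).
    return np.argsort(d2, axis=1)[:, :k + 1]

def local_covariance_features(points, k=16):
    """Vectorized 3x3 local covariance of every point, shape (n, 9)."""
    idx = knn_indices(points, k)                    # (n, k + 1)
    neigh = points[idx]                             # (n, k + 1, 3)
    centered = neigh - neigh.mean(axis=1, keepdims=True)
    # One 3x3 covariance matrix per point (assumed normalization: 1 / (k + 1)).
    cov = np.einsum("nki,nkj->nij", centered, centered) / neigh.shape[1]
    return cov.reshape(len(points), 9)

rng = np.random.default_rng(0)
points = rng.normal(size=(256, 3))                  # small random stand-in cloud
cov_features = local_covariance_features(points, k=16)
# n x 3 positions joined with n x 9 covariances give the n x 12 encoder input.
encoder_input = np.concatenate([points, cov_features], axis=1)
```

The resulting `encoder_input` has one 12-dimensional row per point, which is the shape expected by the first 3-layer MLP.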
More precisely, for a graph layer, suppose the K-NN graph has adjacency matrix $\mathbf{A}$ and the layer has input matrix $\mathbf{X}$. The output matrix is then:
$$
\mathbf{Y}=\mathbf{A}_{\max}(\mathbf{X})\,\mathbf{K}
$$
where $\mathbf{K}$ is the feature mapping matrix and the $(i, j)$-th entry of the matrix $\mathbf{A}_{\max}(\mathbf{X})$ is:
$$
\left(\mathbf{A}_{\max}(\mathbf{X})\right)_{ij}=\operatorname{ReLU}\left(\max_{k \in \mathcal{N}(i)} x_{kj}\right)
$$
The local max pooling in the formula above essentially computes a local signature based on the graph structure. This signature can represent the (aggregated) topology information of the local neighborhood. Through the graph-based max-pooling layers, the network propagates topology information to larger areas.
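A single graph layer of this form can be sketched in a few lines of NumPy. The function name `graph_max_pool_layer` is hypothetical, the neighborhoods $\mathcal{N}(i)$ are passed in explicitly as index lists (whether $i$ belongs to $\mathcal{N}(i)$ is left to the caller), and the toy values are chosen only to make the arithmetic easy to follow.

```python
import numpy as np

def graph_max_pool_layer(X, neighbors, K):
    """Y = A_max(X) K, with (A_max(X))_ij = ReLU(max over k in N(i) of x_kj)."""
    # X: (n, c_in) features; neighbors: (n, k) indices of N(i); K: (c_in, c_out).
    pooled = X[neighbors].max(axis=1)      # channel-wise max over each neighborhood
    a_max = np.maximum(pooled, 0.0)        # ReLU
    return a_max @ K                       # feature mapping

# Toy graph with 3 nodes, 2 feature channels, and 2 neighbors per node.
X = np.array([[1.0, -2.0],
              [3.0, -1.0],
              [-4.0, -5.0]])
neighbors = np.array([[0, 1],
                      [1, 2],
                      [2, 0]])
K = np.eye(2)                              # identity mapping, to expose A_max(X)
Y = graph_max_pool_layer(X, neighbors, K)
```

With the identity as feature mapping, `Y` equals $\mathbf{A}_{\max}(\mathbf{X})$: the rows are `[3, 0]`, `[3, 0]` and `[1, 0]`. Node 2 has a negative first feature of its own, but it picks up the value 1 from its neighbor, which is the "spreading" of local information described above.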
Folding-based Decoder Architecture
The proposed decoder uses two consecutive 3-layer MLPs to warp a fixed 2D grid so that it "fits" the shape of the input point cloud. The input codeword is obtained from the encoder.
- Before the codeword is fed into the decoder, it is replicated $m$ times into an $m \times 512$ matrix, which is then concatenated with an $m \times 2$ matrix holding the coordinates of the $m$ grid points. The result of the concatenation is an $m \times 514$ matrix.
- The concatenated matrix is processed row by row by a 3-layer MLP, and the output is an $m \times 3$ matrix.
- This output matrix is concatenated with the replicated codeword matrix again and fed into another 3-layer MLP, which produces the final reconstructed output of size $m \times 3$.
In this paper, $n$ is set to 2048 and $m$ to 2025, the square number closest to 2048 (a $45 \times 45$ grid).
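The data flow of the decoder steps above can be sketched with NumPy as follows. This is a shape-level sketch, not a trained model: the weights are random, the hidden width of 512, the ReLU activations and the grid range $[-1, 1]$ are assumptions made for this example, and all function names (`mlp3`, `init_mlp3`, `make_grid`, `folding_decoder`) are hypothetical.

```python
import numpy as np

def mlp3(x, weights):
    """Row-wise 3-layer perceptron: ReLU after the first two layers, linear output."""
    (W1, b1), (W2, b2), (W3, b3) = weights
    h = np.maximum(x @ W1 + b1, 0.0)
    h = np.maximum(h @ W2 + b2, 0.0)
    return h @ W3 + b3

def init_mlp3(rng, d_in, d_hidden, d_out):
    """Random (untrained) weights, for shape checking only."""
    dims = [d_in, d_hidden, d_hidden, d_out]
    return [(rng.normal(scale=0.05, size=(a, b)), np.zeros(b))
            for a, b in zip(dims[:-1], dims[1:])]

def make_grid(side):
    """side * side points of a regular 2D grid on [-1, 1]^2 (assumed range)."""
    t = np.linspace(-1.0, 1.0, side)
    u, v = np.meshgrid(t, t)
    return np.stack([u.ravel(), v.ravel()], axis=1)      # (m, 2)

def folding_decoder(codeword, grid, fold1, fold2):
    m = grid.shape[0]
    tiled = np.tile(codeword, (m, 1))                    # (m, 512) replicated codeword
    x1 = np.concatenate([grid, tiled], axis=1)           # (m, 514)
    intermediate = mlp3(x1, fold1)                       # (m, 3) after the first MLP
    x2 = np.concatenate([intermediate, tiled], axis=1)   # (m, 515)
    return mlp3(x2, fold2)                               # (m, 3) reconstruction

rng = np.random.default_rng(0)
codeword = rng.normal(size=512)                          # stand-in for the encoder output
grid = make_grid(45)                                     # 45 * 45 = 2025 grid points
fold1 = init_mlp3(rng, 514, 512, 3)
fold2 = init_mlp3(rng, 515, 512, 3)
reconstruction = folding_decoder(codeword, grid, fold1, fold2)
```

Note that the grid is fixed and the same two MLPs are shared by all grid points. Only the codeword changes from one shape to another, which is why the decoder needs so few parameters.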
Definition
The concatenation of the replicated codeword with low-dimensional grid points, followed by a point-wise MLP, is called a folding operation.
The folding operation is essentially a universal 2D-to-3D mapping. To see intuitively why, let the matrix $\mathbf{U}$ hold the input 2D grid points, one per row. Denote the $i$-th row of $\mathbf{U}$ by $\mathbf{u}_i$ and the codeword produced by the encoder by $\boldsymbol{\theta}$. After concatenation, the $i$-th row of the MLP input matrix is $[\mathbf{u}_i, \boldsymbol{\theta}]$. Because the MLP is applied to each row of the input matrix in parallel, the $i$-th row of the output matrix can be written as $f([\mathbf{u}_i, \boldsymbol{\theta}])$, where $f$ is the function realized by the MLP. This function can be viewed as a parameterized high-dimensional function in which the codeword $\boldsymbol{\theta}$ is the parameter that guides the structure of the function (the folding operation). Since MLPs are good at approximating nonlinear functions, they can perform elaborate folding operations on the 2D grid. The high-dimensional codeword essentially stores a structural feature that drives the deformation of the 2D grid, which makes the folding operation more diverse.
The proposed decoder has two successive folding operations. The first folds the 2D grid from 2D space into 3D space, and the second folds within 3D space; see Table 1.
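Using the notation above, the two folding operations can be written as one composition. The names $f_1$, $f_2$ and $\mathbf{p}_i$ are introduced here only for illustration; they stand for the two 3-layer MLPs and the intermediate 3D point.

$$
\mathbf{p}_i = f_1\left(\left[\mathbf{u}_i, \boldsymbol{\theta}\right]\right) \in \mathbb{R}^{3}, \qquad
\widehat{\mathbf{x}}_i = f_2\left(\left[\mathbf{p}_i, \boldsymbol{\theta}\right]\right) \in \mathbb{R}^{3}, \qquad i = 1, \ldots, m
$$

With a 512-dimensional codeword, the input of $f_1$ has $2 + 512 = 514$ dimensions and the input of $f_2$ has $3 + 512 = 515$ dimensions, and the rows $\widehat{\mathbf{x}}_i$ together form the reconstructed point set $\widehat{S}$.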
Theoretical Analysis
Theorem 1
The proposed encoder structure is permutation invariant: if the rows of the input point cloud matrix are permuted arbitrarily, the resulting codeword stays the same.
Theorem 2
There exists a 2-layer MLP that can reconstruct an arbitrary point cloud from a 2D grid using the folding operation.
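The mechanism behind Theorem 1 can be checked numerically on a toy model. The sketch below is a deliberately simplified stand-in, not the graph-based encoder itself: it assumes a single point-wise layer followed by a max over all points, and the name `toy_encoder` is hypothetical. It only illustrates that row-wise operations followed by a symmetric pooling do not depend on the order of the rows.

```python
import numpy as np

def toy_encoder(points, W):
    """Point-wise layer followed by a symmetric (max) pooling over all points."""
    features = np.maximum(points @ W, 0.0)   # the same map is applied to every row
    return features.max(axis=0)              # the order of the rows no longer matters

rng = np.random.default_rng(0)
points = rng.normal(size=(100, 3))
W = rng.normal(size=(3, 8))
perm = rng.permutation(100)                  # arbitrary reordering of the rows

codeword_a = toy_encoder(points, W)
codeword_b = toy_encoder(points[perm], W)
same = np.allclose(codeword_a, codeword_b)
```

Here `same` evaluates to `True`. In the actual encoder the graph layers also fit this pattern, because the K-NN graph is defined by point positions rather than by row indices.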
Experiments
Visualization of the Training Process

Point Cloud Interpolation


Transfer Classification Accuracy


Semi-supervised Learning: What Happens when Labeled Data are Rare

Effectiveness of the Folding-Based Decoder

New words
- downside (n.): shortcoming, deficiency