[Point Cloud Processing Paper Intensive Reading, Frontier Edition 10] — MVTN: Multi-View Transformation Network for 3D Shape Recognition
2022-07-03 09:09:00 【LingbinBu】
MVTN: Multi-View Transformation Network for 3D Shape Recognition
Abstract
- Problem: In multi-view projection methods for point cloud processing, the viewpoints are usually chosen heuristically and kept identical for all shapes.
- Method: Propose an approach that learns how to set these viewpoints better.
- Details: Introduces the Multi-View Transformation Network (MVTN), which regresses the best viewpoints for 3D shape recognition; the whole network is designed to be differentiable. MVTN can be trained end-to-end with any multi-view network for 3D shape recognition. The paper integrates MVTN into a new adaptive multi-view pipeline that can handle both 3D meshes and point clouds.
- Code: https://github.com/ajhamdi/MVTN (PyTorch)

Related Work
- MVTN learns a spatial transformation of the input data without any extra supervision and without modifying the learning procedure.
Method
Overview of Multi-View 3D Recognition
The training of a multi-view network can be expressed as:
$$\underset{\boldsymbol{\theta}_{\mathbf{C}}}{\arg \min } \sum_{n}^{N} L\left(\mathbf{C}\left(\mathbf{X}_{n}\right), y_{n}\right)=\underset{\boldsymbol{\theta}_{\mathbf{C}}}{\arg \min } \sum_{n}^{N} L\left(\mathbf{C}\left(\mathbf{R}\left(\mathbf{S}_{n}, \mathbf{u}_{0}\right)\right), y_{n}\right)$$
where $L$ is the task-specific loss function, $N$ is the number of 3D shapes in the dataset, and $y_{n}$ is the label of the $n$-th 3D shape $\mathbf{S}_{n}$. $\mathbf{u}_{0} \in \mathbb{R}^{\tau}$ is the set of $\tau$ scene parameters fixed for the entire dataset; these parameters describe the properties that affect the rendered images, including viewpoints, lighting, color, and background. $\mathbf{R}$ is the renderer: it takes a shape $\mathbf{S}_{n}$ and the parameters $\mathbf{u}_{0}$ as input and produces the $M$ view images $\mathbf{X}_{n}$ of that shape. In MVCNN, $\mathbf{C}=\operatorname{MLP}\left(\max _{i} \mathbf{f}\left(\mathbf{x}_{i}\right)\right)$, where $\mathbf{f}: \mathbb{R}^{h \times w \times c} \rightarrow \mathbb{R}^{d}$ is a 2D CNN backbone; in ViewGCN, $\mathbf{C}=\operatorname{MLP}\left(\operatorname{cat}_{\mathrm{GCN}}\left(\mathbf{f}\left(\mathbf{x}_{i}\right)\right)\right)$, where $\operatorname{cat}_{\mathrm{GCN}}$ aggregates view features learned by a graph convolutional network. $\boldsymbol{\theta}_{\mathbf{C}}$ denotes the parameters of the multi-view network $\mathbf{C}$. In the experiments, the scene parameters $\mathbf{u}$ are the azimuth and elevation angles of cameras pointing at the object center, so $\tau=2M$.
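To make the MVCNN-style aggregation concrete, below is a minimal PyTorch sketch of $\mathbf{C}=\operatorname{MLP}\left(\max _{i} \mathbf{f}\left(\mathbf{x}_{i}\right)\right)$; the ResNet-18 backbone, feature width, and class count are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class MVCNNHead(nn.Module):
    """Sketch of C = MLP(max_i f(x_i)): a shared 2D CNN backbone f encodes each
    view, views are fused by element-wise max-pooling, and an MLP classifies.
    Backbone choice and sizes are illustrative, not the paper's configuration."""
    def __init__(self, num_classes=40, feat_dim=512):
        super().__init__()
        backbone = resnet18(weights=None)
        backbone.fc = nn.Identity()            # keep the 512-d pooled feature
        self.f = backbone                      # shared per-view backbone f
        self.mlp = nn.Linear(feat_dim, num_classes)

    def forward(self, x):                      # x: (B, M, 3, H, W) view images
        B, M = x.shape[:2]
        feats = self.f(x.flatten(0, 1))        # (B*M, feat_dim), one pass per view
        pooled = feats.view(B, M, -1).max(dim=1).values   # max over the M views
        return self.mlp(pooled)                # class logits
```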

Multi-View Transformation Network (MVTN)

Previous multi-view methods take the images $\mathbf{X}$ rendered with fixed scene parameters $\mathbf{u}_{0}$ as the only representation of a 3D shape. In contrast, this paper considers a more general setting in which $\mathbf{u}$ is a variable bounded by $\pm \mathbf{u}_{\text {bound }}$, where $\mathbf{u}_{\text {bound }}$ is a positive vector that defines the allowed range of the scene parameters. The bounds for the azimuth and elevation of each view are set to $180^{\circ}$ and $90^{\circ}$, respectively.
Differentiable Renderer
The renderer $\mathbf{R}$ takes a 3D shape $\mathbf{S}$ (mesh or point cloud) and the scene parameters $\mathbf{u}$ as input and outputs the corresponding $M$ images $\left\{\mathbf{x}_{i}\right\}_{i=1}^{M}$. Because $\mathbf{R}$ is differentiable, the gradients $\frac{\partial \mathbf{x}_{i}}{\partial \mathbf{u}}$ can be back-propagated from every image to the scene parameters, which enables an end-to-end learning framework.
When $\mathbf{S}$ is a 3D mesh, $\mathbf{R}$ has two components: a rasterizer and a shader. First, given the camera viewpoints, the rasterizer transforms the mesh from world coordinates to view coordinates and assigns faces to pixels. The shader then creates multiple values per pixel according to the face assignment and blends these values.
When $\mathbf{S}$ is a point cloud, $\mathbf{R}$ can use an alpha-blending compositing mechanism.
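Below is a minimal sketch of such a differentiable point cloud rendering step using PyTorch3D; the camera distance, image size, point radius, and the dummy point cloud are illustrative assumptions.

```python
import torch
from pytorch3d.structures import Pointclouds
from pytorch3d.renderer import (
    look_at_view_transform, FoVPerspectiveCameras,
    PointsRasterizationSettings, PointsRasterizer,
    PointsRenderer, AlphaCompositor,
)

M = 4                                            # number of views
azim = torch.tensor([0.0, 90.0, 180.0, 270.0])   # would come from MVTN
elev = torch.full((M,), 30.0)
# Cameras looking at the object center from the given angles; the transform is
# built from differentiable torch ops, so gradients can reach azim and elev.
R, T = look_at_view_transform(dist=2.2, elev=elev, azim=azim)
cameras = FoVPerspectiveCameras(R=R, T=T)

renderer = PointsRenderer(
    rasterizer=PointsRasterizer(
        cameras=cameras,
        raster_settings=PointsRasterizationSettings(
            image_size=224, radius=0.02, points_per_pixel=8),
    ),
    compositor=AlphaCompositor(),                # alpha-blends points per pixel
)

points = torch.rand(2048, 3) - 0.5               # dummy unit-cube point cloud
feats = torch.ones_like(points)                  # constant per-point color
clouds = Pointclouds(points=[points] * M, features=[feats] * M)
images = renderer(clouds)                        # (M, 224, 224, 3) rendered views
```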

View-Points Conditioned on 3D Shape
The scene parameters $\mathbf{u}$ are made a function of the 3D shape by learning a Multi-View Transformation Network (MVTN) $\mathbf{G}: \mathbb{R}^{P \times 3} \rightarrow \mathbb{R}^{\tau}$ with parameters $\boldsymbol{\theta}_{\mathbf{G}}$, where $P$ is the number of points sampled from the shape $\mathbf{S}$. The training of MVTN can be expressed as:
$$\underset{\boldsymbol{\theta}_{\mathbf{C}}, \boldsymbol{\theta}_{\mathbf{G}}}{\arg \min } \sum_{n}^{N} L\left(\mathbf{C}\left(\mathbf{R}\left(\mathbf{S}_{n}, \mathbf{u}_{n}\right)\right), y_{n}\right) \quad \text { s.t. } \quad \mathbf{u}_{n}=\mathbf{u}_{\text {bound }} \cdot \tanh \left(\mathbf{G}\left(\mathbf{S}_{n}\right)\right)$$
where $\mathbf{G}$ encodes the 3D shape and predicts the optimal viewpoints for the task-specific multi-view network $\mathbf{C}$. Since the only goal of $\mathbf{G}$ is to predict viewpoints, its architecture can be simple and lightweight: a simple point encoder (such as the shared MLP in PointNet) processes the $P$ points sampled from $\mathbf{S}$ and produces a coarse global shape feature of dimension $b$, and a shallow MLP then regresses the scene parameters $\mathbf{u}_{n}$ from this global feature. To force the predicted parameters $\mathbf{u}$ into the range $\pm \mathbf{u}_{\text {bound }}$, a tanh function scales $\mathbf{u}$ into $\pm \mathbf{u}_{\text {bound }}$.
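A minimal PyTorch sketch of such a regressor follows; the layer widths and the interleaved (azimuth, elevation) output ordering are assumptions for illustration, not the paper's exact design.

```python
import torch
import torch.nn as nn

class MVTNRegressor(nn.Module):
    """Sketch of G: a PointNet-style shared MLP pools a coarse global feature,
    then a shallow MLP regresses 2M parameters (azimuth and elevation per view),
    squashed into +/- u_bound by tanh. Widths and the interleaved ordering
    (azim_1, elev_1, ..., azim_M, elev_M) are illustrative assumptions."""
    def __init__(self, num_views=4, feat_dim=256):
        super().__init__()
        # per-view bounds: azimuth +/-180 deg, elevation +/-90 deg (interleaved)
        self.register_buffer(
            "u_bound", torch.tensor([180.0, 90.0]).repeat(num_views))
        self.point_encoder = nn.Sequential(        # shared MLP over the P points
            nn.Conv1d(3, 64, 1), nn.ReLU(),
            nn.Conv1d(64, feat_dim, 1), nn.ReLU(),
        )
        self.head = nn.Sequential(                 # shallow regression MLP
            nn.Linear(feat_dim, 128), nn.ReLU(),
            nn.Linear(128, 2 * num_views),
        )

    def forward(self, points):                     # points: (B, P, 3)
        x = self.point_encoder(points.transpose(1, 2))   # (B, feat_dim, P)
        global_feat = x.max(dim=2).values                # coarse global feature
        return torch.tanh(self.head(global_feat)) * self.u_bound  # (B, 2M)
```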
MVTN for 3D Shape Classification
To train MVTN for 3D shape classification, we use a cross-entropy loss, although other loss functions and regularizers could be used as well. The multi-view network ($\mathbf{C}$) and MVTN ($\mathbf{G}$) are trained jointly with the same loss. An advantage of this architecture is that it can handle 3D point clouds directly: when $\mathbf{S}$ is a point cloud, $\mathbf{R}$ is simply defined as a differentiable point cloud renderer.
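A sketch of one joint training step under this setup is shown below; `MVTNRegressor` and `MVCNNHead` are the illustrative sketches above, while `render_views`, `loader`, and the optimizer settings are hypothetical stand-ins.

```python
import torch
import torch.nn.functional as F

# Joint training step: gradients flow from the loss through the classifier C and
# the differentiable renderer R back into the view regressor G (end-to-end).
# render_views is a stand-in for the rendering step (e.g. the PyTorch3D snippet).
mvtn = MVTNRegressor(num_views=4)
classifier = MVCNNHead(num_classes=40)
optimizer = torch.optim.AdamW(
    list(mvtn.parameters()) + list(classifier.parameters()), lr=1e-3)

for points, labels in loader:            # points: (B, P, 3), labels: (B,)
    u = mvtn(points)                     # per-shape azimuth/elevation angles
    images = render_views(points, u)     # (B, M, 3, H, W), differentiable in u
    logits = classifier(images)
    loss = F.cross_entropy(logits, labels)
    optimizer.zero_grad()
    loss.backward()                      # updates both theta_C and theta_G
    optimizer.step()
```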
MVTN for 3D Shape Retrieval
We take the feature representation from the last layer before the classifier of $\mathbf{C}$, project these features into another space with LFDA dimensionality reduction, and use the projected features as the signature describing a shape. At test time, the shape signature is used to retrieve the most similar shapes from the test set.
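As a hedged sketch of this retrieval pipeline, the snippet below uses the LFDA implementation from the `metric-learn` package; the projection dimension, the Euclidean metric, and the dummy feature arrays standing in for the penultimate-layer features of $\mathbf{C}$ are assumptions, not details confirmed by the paper.

```python
import numpy as np
from metric_learn import LFDA
from sklearn.neighbors import NearestNeighbors

# train_feats / test_feats stand for the penultimate-layer features of C,
# extracted beforehand; dummy arrays keep the sketch self-contained.
rng = np.random.default_rng(0)
train_feats = rng.normal(size=(400, 512))
train_labels = rng.integers(0, 40, size=400)
test_feats = rng.normal(size=(100, 512))

lfda = LFDA(n_components=128)               # projection dim is illustrative
lfda.fit(train_feats, train_labels)         # supervised fit on training features
signatures = lfda.transform(test_feats)     # per-shape retrieval signatures

index = NearestNeighbors(metric="euclidean").fit(signatures)
dists, idx = index.kneighbors(signatures[:1], n_neighbors=10)  # top-10 matches
```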
Experiments
3D Shape Classification


3D Shape Retrieval


Rotation Robustness

Occlusion Robustness


Ablation Study
Number of Views

Choice of Backbone and Point Encoders

Choice of Multi-View Network

Time and Memory Requirements
