
[Point Cloud Processing: Intensive Paper Reading, Frontier Edition 11] Unsupervised Point Cloud Pre-training via Occlusion Completion

2022-07-03 08:53:00 [LingbinBu]

OcCo: Unsupervised Point Cloud Pre-training via Occlusion Completion

Abstract
Introduction
Method
- Generating Occlusions
- The Completion Task
Experiments
- OcCo Pre-Training Setup
- Fine-Tuning Setup
Analysis
Discussion
Vocabulary

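Abstract

Method: a pre-training method for point clouds called Occlusion Completion (OcCo).
Technical details:
1. Mask the points that are occluded from a camera viewpoint.
2. Train an encoder-decoder model to reconstruct the occluded points.
3. Use the encoder weights to initialize downstream point cloud tasks.
Applications: object classification, part-based segmentation, and semantic segmentation.
Code: https://github.com/hansen7/OcCo (supports PyTorch and TensorFlow)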

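Introduction

OcCo has the following properties:

- It improves sample efficiency in few-shot learning experiments.
- It improves generalization on classification and segmentation tasks.
- Local minima are easier to find after fine-tuning.
- It yields more semantically meaningful representations, as described via network dissection.
- Classification quality holds up better under jittering, translation, and rotation.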

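Method

Let $\mathcal{P}$ denote a point cloud in 3D Euclidean space, $\mathcal{P}=\left\{p_{1}, p_{2}, \ldots, p_{n}\right\}$, where each point $p_{i}$ is a vector containing the coordinates $\left(x_{i}, y_{i}, z_{i}\right)$ and other features such as color and normal. We first describe the occlusion mapping $o(\cdot)$ and then introduce the completion model $c(\cdot)$. Pseudocode and architectural details are given in the appendix of the paper.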

Generating Occlusions

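Define a randomised occlusion mapping $o: \mathbb{P} \rightarrow \mathbb{P}$, where $\mathbb{P}$ is the space of point clouds. It maps a complete point cloud $\mathcal{P}$ to an occluded point cloud $\tilde{\mathcal{P}}$. The mapping builds $\tilde{\mathcal{P}}$ by removing the points of $\mathcal{P}$ that cannot be seen from a given viewpoint, in three steps:

1. Project the complete point cloud from the world frame into the camera frame of the chosen viewpoint.
2. Determine which points are occluded from that viewpoint.
3. Back-project the points from the camera frame to the world frame.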

Viewing the point cloud from a camera

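A pinhole camera defines the mapping from the 3D world frame to a particular camera frame:

$\left[\begin{array}{c} x_{\mathrm{cam}} \\ y_{\mathrm{cam}} \\ z_{\mathrm{cam}} \end{array}\right]=\mathbf{K}\left[\mathbf{R} \mid \mathbf{t}\right]\left[\begin{array}{c} x \\ y \\ z \\ 1 \end{array}\right] \qquad (1)$

where $(x, y, z)$ are the coordinates of the original point cloud in the world frame, and the camera viewpoint is determined by the rotation matrix $\mathbf{R}$ and the translation vector $\mathbf{t}$. The camera intrinsics $\mathbf{K}$ are determined by the focal length $f$, the skewness $\gamma$, and the image width $w$ and height $h$. Given these parameters, the coordinates $\left(x_{\mathrm{cam}}, y_{\mathrm{cam}}, z_{\mathrm{cam}}\right)$ of a point in the camera frame can be computed.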
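As a rough illustration of this mapping, the sketch below applies $\mathbf{K}\left[\mathbf{R} \mid \mathbf{t}\right]$ to a single world point in plain Python. The layout of $\mathbf{K}$, with the principal point at the image centre $(w/2, h/2)$, is an assumption: the text only says that $\mathbf{K}$ is determined by $f$, $\gamma$, $w$ and $h$. The helper names `matvec`, `intrinsics` and `world_to_camera` are made up for this example, and the intrinsic values are the ones quoted in the pre-training setup.

```python
def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def intrinsics(f, gamma, w, h):
    """Build K. Assumption: the principal point sits at the image centre."""
    return [[f, gamma, w / 2.0],
            [0.0, f, h / 2.0],
            [0.0, 0.0, 1.0]]


def world_to_camera(point, K, R, t):
    """Apply K [R | t] to the homogeneous world point (x, y, z, 1)."""
    rotated = matvec(R, point)
    shifted = [r + ti for r, ti in zip(rotated, t)]
    return matvec(K, shifted)


K = intrinsics(f=1000.0, gamma=0.0, w=1600.0, h=1200.0)
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # identity rotation
t = [0.0, 0.0, 0.0]                                       # zero translation

p_cam = world_to_camera([0.1, -0.2, 2.0], K, R, t)
# p_cam is approximately [1700.0, 1000.0, 2.0]
```

A real pipeline would vectorize this over the whole point cloud with NumPy or a deep learning framework; plain lists are used here only to keep the arithmetic visible.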

Determining occluded points

The point $\left(x_{\mathrm{cam}}, y_{\mathrm{cam}}, z_{\mathrm{cam}}\right)$ can be read in two ways:

- as a 3D point $\left(x_{\mathrm{cam}}, y_{\mathrm{cam}}, z_{\mathrm{cam}}\right)$ in the camera frame;
- as a 2D pixel at $\left(f x_{\mathrm{cam}} / z_{\mathrm{cam}}, f y_{\mathrm{cam}} / z_{\mathrm{cam}}\right)$ with depth $z_{\mathrm{cam}}$.

Under the second reading, projected points that share the same pixel coordinates but differ in depth may occlude one another. To decide which points are occluded, we first reconstruct a polygon mesh with Delaunay triangulation and then remove the points that belong to hidden surfaces, which are identified by z-buffering.
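The depth test can be approximated without building a mesh. The sketch below is a simplified point-based z-buffer, not the Delaunay-plus-z-buffering procedure described above: it bins points into pixels with the pixel formula from the text and keeps only the nearest point in each pixel. The function name `visible_indices` and the `cell` and `eps` parameters are assumptions made for illustration, and all depths are assumed to be positive.

```python
import math


def visible_indices(points_cam, f, cell=1.0, eps=1e-6):
    """Return the indices of points that pass a per-pixel depth test.

    points_cam: iterable of (x_cam, y_cam, z_cam) with z_cam > 0.
    f:          focal length used in the pixel formula (f*x/z, f*y/z).
    cell:       pixel size (hypothetical parameter); larger cells occlude more.
    eps:        depth tolerance for treating two points as equally near.
    """
    nearest = {}   # pixel -> smallest depth seen in that pixel
    pixels = []
    for x, y, z in points_cam:
        pixel = (math.floor(f * x / z / cell), math.floor(f * y / z / cell))
        pixels.append(pixel)
        if pixel not in nearest or z < nearest[pixel]:
            nearest[pixel] = z
    # A point is visible if nothing in its pixel is closer to the camera.
    return [i for i, (point, pixel) in enumerate(zip(points_cam, pixels))
            if point[2] <= nearest[pixel] + eps]


points = [(0.0, 0.0, 1.0),   # nearest point in pixel (0, 0)
          (0.0, 0.0, 2.0),   # same pixel, farther away: occluded
          (0.5, 0.0, 1.0)]   # different pixel: visible
visible = visible_indices(points, f=10.0)
# visible == [0, 2]
```

Because it works on isolated points rather than surfaces, this variant lets a far point show through the gaps between near points; reconstructing a mesh first, as the text describes, is what closes those gaps.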

Mapping back from camera frame to world frame

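Once the occluded points are removed, the remaining points are mapped back to the original world frame by inverting Equation 1. The randomised occlusion mapping $o(\cdot)$ is therefore constructed as follows:

1. Fix an initial point cloud $\mathcal{P}$.
2. Specify the camera intrinsic matrix $\mathbf{K}$ and the extrinsics $\left[\left[\mathbf{R}_{1} \mid \mathbf{t}_{1}\right], \ldots,\left[\mathbf{R}_{V} \mid \mathbf{t}_{V}\right]\right]$ for multiple viewpoints, where $V$ is the number of viewpoints.
3. For each viewpoint $v \in[V]$, project $\mathcal{P}$ into the corresponding camera frame using Equation 1.
4. Find the occluded points and remove them.
5. Back-project the remaining points to the world frame, which gives the final occluded point cloud $\tilde{\mathcal{P}}_{v}$ for each viewpoint $v \in[V]$.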
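The construction above can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: $\mathbf{K}$ is assumed to be upper triangular with the principal point at the image centre, the inverse of Equation 1 is written as $\mathbf{R}^{\top}(\mathbf{K}^{-1} p_{\mathrm{cam}} - \mathbf{t})$, and the visibility test is passed in as a function. The placeholder `drop_farthest` only stands in for a real hidden-point test, and all function names are hypothetical.

```python
def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def transpose(m):
    return [list(col) for col in zip(*m)]


def invert_intrinsics(K):
    """Closed-form inverse of K = [[fx, g, cx], [0, fy, cy], [0, 0, 1]]."""
    fx, g, cx = K[0]
    fy, cy = K[1][1], K[1][2]
    return [[1.0 / fx, -g / (fx * fy), (g * cy - cx * fy) / (fx * fy)],
            [0.0, 1.0 / fy, -cy / fy],
            [0.0, 0.0, 1.0]]


def to_camera(p, K, R, t):
    """Equation 1 written as K (R p + t)."""
    return matvec(K, [a + b for a, b in zip(matvec(R, p), t)])


def to_world(p_cam, K_inv, R, t):
    """Inverse of Equation 1: R^T (K^-1 p_cam - t), since R^T = R^-1."""
    q = matvec(K_inv, p_cam)
    return matvec(transpose(R), [a - b for a, b in zip(q, t)])


def occlude(points, K, views, visible_indices):
    """Return one occluded point cloud per view (steps 3 to 5)."""
    K_inv = invert_intrinsics(K)
    clouds = []
    for R, t in views:
        cam = [to_camera(p, K, R, t) for p in points]
        keep = visible_indices(cam)
        clouds.append([to_world(cam[i], K_inv, R, t) for i in keep])
    return clouds


def drop_farthest(points_cam):
    """Placeholder visibility test: discard the single deepest point."""
    far = max(range(len(points_cam)), key=lambda i: points_cam[i][2])
    return [i for i in range(len(points_cam)) if i != far]


K = [[1000.0, 0.0, 800.0], [0.0, 1000.0, 600.0], [0.0, 0.0, 1.0]]
views = [
    ([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]], [0.0, 0.0, 0.0]),
    ([[0.0, 0.0, 1.0], [0.0, 1.0, 0.0], [-1.0, 0.0, 0.0]], [0.0, 0.0, 5.0]),
]
cloud = [[0.0, 0.0, 1.0], [0.5, 0.0, 3.0], [1.0, 0.0, 2.0]]
occluded = occlude(cloud, K, views, drop_farthest)
# View 1 keeps points 0 and 2; view 2 keeps points 1 and 2.
```

Passing the visibility test as an argument keeps the projection and back-projection logic independent of how hidden points are detected, so a mesh-based test can be swapped in without touching the rest.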

The Completion Task

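Given an occluded point cloud $\tilde{\mathcal{P}}$ produced by the occlusion mapping $o(\cdot)$, the goal of the completion task is to learn a completion mapping $c: \mathbb{P} \rightarrow \mathbb{P}$ that produces the completed point cloud $\hat{\mathcal{P}}=c(\tilde{\mathcal{P}})$ from $\tilde{\mathcal{P}}$. The completion mapping is accurate if $\mathbb{E}_{\tilde{\mathcal{P}} \sim o(\mathcal{P})} \ell(c(\tilde{\mathcal{P}}), \mathcal{P}) \rightarrow 0$, where $\ell(\cdot, \cdot)$ is a loss function. The completion model is an encoder-decoder network: the encoder maps the occluded point cloud to a vector, and the decoder completes the point cloud from that vector. After pre-training, the encoder weights can be used as the initialization for downstream tasks.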

Experiments

OcCo Pre-Training Setup

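ModelNet40 is used as the pre-training dataset in all experiments. The camera intrinsics are set to $\left\{f=1000, \gamma=0, w=1600, h=1200\right\}$. For each point cloud, 10 viewpoints are selected at random; the viewpoints differ in rotation, and the translation is set to zero.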
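One possible way to draw such viewpoints is sketched below. How the original implementation samples rotations is not stated here, so this is only an assumption: three Euler angles are drawn uniformly and composed into a rotation matrix, which is simple but not uniform over all 3D rotations. The names `rotation_from_euler` and `sample_views` are hypothetical.

```python
import math
import random


def rotation_from_euler(ax, ay, az):
    """Compose R = Rz(az) Ry(ay) Rx(ax) from three angles in radians."""
    cx, sx = math.cos(ax), math.sin(ax)
    cy, sy = math.cos(ay), math.sin(ay)
    cz, sz = math.cos(az), math.sin(az)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]


def sample_views(num_views=10, seed=None):
    """Return (R, t) pairs with random rotations and zero translation."""
    rng = random.Random(seed)
    views = []
    for _ in range(num_views):
        angles = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(3)]
        views.append((rotation_from_euler(*angles), [0.0, 0.0, 0.0]))
    return views


views = sample_views(num_views=10, seed=0)
```

Each returned pair plays the role of one extrinsic $\left[\mathbf{R}_{v} \mid \mathbf{t}_{v}\right]$; fixing the seed makes the generated occlusions reproducible.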

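In the completion model, the encoder can be PointNet, PCN, or DGCNN. The decoder uses a folding operation and reconstructs the shape in two steps: it first converts the 1024-dimensional vector that encodes the occluded input into a coarse point cloud $\hat{\mathcal{P}}_{\text {coarse }}$ of 1024 points, and then applies a $4 \times 4$ 2D grid to every point of $\hat{\mathcal{P}}_{\text {coarse }}$ to reconstruct a fine shape $\hat{\mathcal{P}}_{\text {fine }}$ of 16384 points ($1024 \times 16$). Chamfer Distance (CD) is used as the loss between the prediction $\hat{\mathcal{P}}$ and the ground truth $\mathcal{P}$: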
$\begin{aligned} \mathrm{CD}(\hat{\mathcal{P}}, \mathcal{P}) &= \frac{1}{|\hat{\mathcal{P}}|} \sum_{\hat{x} \in \hat{\mathcal{P}}} \min _{x \in \mathcal{P}}\|\hat{x}-x\|_{2}+\frac{1}{|\mathcal{P}|} \sum_{x \in \mathcal{P}} \min _{\hat{x} \in \hat{\mathcal{P}}}\|x-\hat{x}\|_{2} \end{aligned}$
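The final loss of the completion model is a weighted sum of the Chamfer distances on the coarse and fine shapes, where $\alpha$ weights the fine term: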
$\ell:=\operatorname{CD}\left(\hat{\mathcal{P}}_{\text {coarse }}, \mathcal{P}_{\text {coarse }}\right)+\alpha \mathrm{CD}\left(\hat{\mathcal{P}}_{\text {fine }}, \mathcal{P}_{\text {fine }}\right)$
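To make the two formulas concrete, the sketch below computes a brute-force Chamfer distance and the weighted loss on a toy example. Several things here are assumptions for illustration only: the `expand` helper replaces the learned folding step with a fixed planar $4 \times 4$ grid offset, the value `alpha=0.5` is arbitrary, and the clouds are tiny (8 coarse points instead of 1024) so that the quadratic-time distance stays fast.

```python
import math


def dist(p, q):
    """Euclidean distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def chamfer(pred, target):
    """Symmetric Chamfer distance: mean nearest-neighbour distance both ways."""
    def one_way(src, dst):
        return sum(min(dist(p, q) for q in dst) for p in src) / len(src)
    return one_way(pred, target) + one_way(target, pred)


def expand(coarse, side=4, half_width=0.05):
    """Attach a side x side planar grid to every coarse point.

    Stand-in for the learned folding step; only the point count
    (len(coarse) * side * side) mirrors the description in the text.
    """
    axis = [-half_width + 2.0 * half_width * i / (side - 1) for i in range(side)]
    return [(x + u, y + v, z) for x, y, z in coarse for u in axis for v in axis]


def completion_loss(pred_coarse, pred_fine, gt_coarse, gt_fine, alpha):
    """CD on the coarse shapes plus alpha times CD on the fine shapes."""
    return chamfer(pred_coarse, gt_coarse) + alpha * chamfer(pred_fine, gt_fine)


gt_coarse = [(float(i), 0.0, 0.0) for i in range(8)]
gt_fine = expand(gt_coarse)                                # 8 * 16 = 128 points
pred_coarse = [(x, y, z + 0.1) for x, y, z in gt_coarse]   # shifted by 0.1 in z
pred_fine = expand(pred_coarse)

loss = completion_loss(pred_coarse, pred_fine, gt_coarse, gt_fine, alpha=0.5)
# Each Chamfer term is 0.1 + 0.1, so loss is approximately 0.2 + 0.5 * 0.2 = 0.3
```

With 1024 coarse points the same expansion yields $1024 \times 16 = 16384$ fine points, at which size a nearest-neighbour structure or a GPU implementation would be needed in place of the brute-force loops.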