【Sparse-to-Dense】《Sparse-to-Dense: Depth Prediction from Sparse Depth Samples and a Single Image》

2022-07-02 07:43:00 【bryant_meng】

ICRA-2018

Contents

1 Background and Motivation
2 Related Work
3 Advantages / Contributions
4 Method
5 Experiments
- 5.1 Datasets
- 5.2 RESULTS
6 Conclusion（own） / Future work

1 Background and Motivation

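Depth sensing and depth estimation are critical in engineering applications such as robotics, autonomous driving, augmented reality (AR) and 3D mapping.

However, each of the existing depth-sensing solutions has limitations when deployed in practice: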

1）3D LiDARs are cost-prohibitive
2）Structured-light-based depth sensors (e.g. Kinect) are sunlight-sensitive and power-consuming
3）stereo cameras require a large baseline and careful calibration for accurate triangulation, and usually fail in featureless regions

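Monocular cameras are small, low-cost, energy-efficient and ubiquitous in consumer electronics, so monocular depth estimation has become an active area of research.

However, the accuracy and reliability of such methods are still far from practical (despite significant improvements in recent years).

The authors estimate depth from an RGB image combined with sparse depth measurements: a few sparse depth samples drastically improve depth reconstruction performance.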

2 Related Work

RGB-based depth prediction
- hand-crafted features
- probabilistic graphical models
- Non-parametric approaches
- Semi-supervised learning
- unsupervised learning
Depth reconstruction from sparse samples
Sensor fusion

3 Advantages / Contributions

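Monocular depth prediction from RGB + sparse depth

PS: the network architecture itself is not novel, and using sparse depth as an extra modality also borrows from earlier work (although the sampling scheme is different).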

4 Method

1）Overall Architecture

The network follows an encoder-decoder design, and the decoder can use UpProj (up-projection) modules for up-sampling.

2）Depth Sampling

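Samples are drawn with a Bernoulli probability $p = \frac{m}{n}$ (like coin flips: each outcome is independent of the others), where $m$ is the target number of depth samples and $n$ is the number of valid depth pixels.

A Bernoulli trial (Bernoulli experiment) is a random experiment repeated independently under identical conditions, with only two possible outcomes: the event occurs or it does not. When such a trial is repeated independently n times, the series is called an n-fold Bernoulli experiment, or a Bernoulli scheme.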

$D(i,j) = D^*(i,j)$ with probability $\frac{m}{n}$, and $D(i,j) = 0$ otherwise

$D^*$: the complete depth map (dense depth map)

$D$: the sparse depth map
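The sampling step can be sketched in a few lines of NumPy. This is only an illustration, not the authors' code: the function name `sample_sparse_depth`, the convention that 0 marks a missing depth value, and the synthetic 228x304 depth map are all assumptions made for the example.

```python
import numpy as np

def sample_sparse_depth(dense_depth, num_samples, rng):
    """Keep each valid pixel of the dense map with probability p = m / n."""
    valid = dense_depth > 0                  # assumption: 0 means "no measurement"
    n = int(valid.sum())                     # n: number of valid depth pixels
    p = min(1.0, num_samples / n) if n > 0 else 0.0
    # One independent coin flip per pixel
    keep = valid & (rng.random(dense_depth.shape) < p)
    return np.where(keep, dense_depth, 0.0)  # unsampled pixels are set to 0

rng = np.random.default_rng(0)
dense = rng.uniform(0.5, 10.0, size=(228, 304))  # synthetic stand-in for D*
sparse = sample_sparse_depth(dense, num_samples=200, rng=rng)
print(np.count_nonzero(sparse))  # near 200, but the exact count is random
```

Because every pixel is an independent trial, the number of samples actually kept only equals $m$ in expectation; it varies from draw to draw.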

3）Data Augmentation

Scale / Rotation / Color Jitter / Color Normalization / Flips

Scaling and rotation use nearest-neighbor interpolation to avoid creating spurious sparse depth points.
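A small 1-D example shows why the interpolation mode matters for sparse depth. It is a toy sketch under stated assumptions (a single image row, a scale factor of 1.5, zeros meaning "no measurement"), not the authors' augmentation pipeline.

```python
import numpy as np

row = np.zeros(12)
row[3], row[8] = 2.0, 5.0   # two sparse depth samples; 0 means "no measurement"

scale = 1.5
new_len = int(round(len(row) * scale))
# Centre of each output pixel, expressed in input-pixel coordinates
src = (np.arange(new_len) + 0.5) / scale - 0.5

# Nearest neighbour: every output pixel copies one input pixel
nearest_idx = np.clip(np.rint(src).astype(int), 0, len(row) - 1)
nearest = row[nearest_idx]

# Linear interpolation: blends real samples with the surrounding zeros
linear = np.interp(src, np.arange(len(row)), row)

print("nearest:", np.unique(nearest[nearest > 0]))
print("linear :", np.unique(np.round(linear[linear > 0], 3)))
```

Nearest-neighbor resampling only ever copies measured values, whereas linear interpolation averages a measurement with the empty pixels next to it and so produces depth values that were never observed.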

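4）Loss Function

l1
l2: sensitive to outliers; tends to over-smooth boundaries instead of producing sharp transitions
berHu

berHu combines l1 and l2: it behaves like l1 for small errors and like l2 for large ones.

Letting the results speak for themselves, the authors settle on l1.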
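The three candidate losses can be compared with a minimal NumPy sketch. The function names are made up for this example, and the berHu threshold `c` is assumed to be 20% of the largest absolute error in the batch, which is a common choice; treat it as an illustration and not as the authors' implementation.

```python
import numpy as np

def l1_loss(pred, target):
    return np.mean(np.abs(pred - target))

def l2_loss(pred, target):
    return np.mean((pred - target) ** 2)

def berhu_loss(pred, target, ratio=0.2):
    """Reverse Huber: l1 below the threshold c, scaled l2 above it."""
    err = np.abs(pred - target)
    c = ratio * err.max()        # assumption: c = 20% of the max absolute error
    if c == 0:
        return 0.0               # perfect prediction
    quadratic = (err ** 2 + c ** 2) / (2 * c)
    return np.mean(np.where(err <= c, err, quadratic))

target = np.array([1.0, 2.0, 3.0, 4.0])
pred = np.array([1.1, 2.0, 2.9, 8.0])   # the last pixel is an outlier
print(l1_loss(pred, target), l2_loss(pred, target), berhu_loss(pred, target))
```

With one outlier in the toy data, l2 is dominated by that single pixel, l1 is affected the least, and berHu lands in between.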

5 Experiments

5.1 Datasets

NYU-Depth-v2
- 464 different indoor scenes, 249 for training + 215 for testing
- the small labeled test set of 654 images is used to evaluate the final performance

KITTI Odometry Dataset

The KITTI dataset is more challenging for depth prediction, since the maximum distance is 100 meters as opposed to only 10 meters in the NYU-Depth-v2 dataset.

Evaluation metrics

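RMSE: root mean squared error

$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{k=1}^{N}(\hat{y}_k - y_k)^2}$

REL: mean absolute relative error

$\mathrm{REL} = \frac{1}{N}\sum_{k=1}^{N}\frac{|\hat{y}_k - y_k|}{y_k}$

$\delta_i$: percentage of predicted pixels whose relative error is within a threshold

$\delta_i = \frac{\mathrm{card}\left(\left\{\hat{y}_k : \max\left(\frac{\hat{y}_k}{y_k}, \frac{y_k}{\hat{y}_k}\right) < 1.25^i\right\}\right)}{\mathrm{card}\left(\left\{y_k\right\}\right)}$

where

card: the cardinality of a set (simply put, a count of its elements)
$\hat{y}$: prediction
$y$: ground truth (GT)
$N$: number of evaluated pixels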
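As a rough sketch, the three metrics can be computed with NumPy as follows. The function name `depth_metrics` and the convention that a ground-truth value of 0 means "no measurement" are assumptions for illustration; predictions are assumed to be strictly positive.

```python
import numpy as np

def depth_metrics(pred, gt):
    """Return RMSE, REL and the delta_i accuracies for i = 1, 2, 3."""
    valid = gt > 0                       # assumption: 0 marks pixels without GT
    pred, gt = pred[valid], gt[valid]
    rmse = np.sqrt(np.mean((pred - gt) ** 2))
    rel = np.mean(np.abs(pred - gt) / gt)
    ratio = np.maximum(pred / gt, gt / pred)
    # Fraction of pixels whose ratio stays below 1.25 ** i
    deltas = {i: float(np.mean(ratio < 1.25 ** i)) for i in (1, 2, 3)}
    return rmse, rel, deltas

gt = np.array([1.0, 2.0, 4.0, 0.0])      # the last pixel has no ground truth
pred = np.array([1.1, 2.0, 6.0, 3.0])
rmse, rel, deltas = depth_metrics(pred, gt)
print(rmse, rel, deltas)
```

Lower is better for RMSE and REL, while higher is better for the $\delta_i$ values.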

For more related metrics, see "Monocular depth estimation metrics: SILog, SqRel, AbsRel, RMSE, RMSE (log)".

5.2 RESULTS

1）Architecture Evaluation
DeConv3 outperforms DeConv2.

UpProj outperforms DeConv3 (with an even larger receptive field of 4x4, the UpProj module outperforms the others).

2）Comparison with the State-of-the-Art

NYU-Depth-v2 Dataset
sd is short for sparse-depth, i.e. the input contains no RGB.
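To make the input modalities concrete, here is a small sketch of how the three kinds of input could be assembled as arrays. The dictionary keys, the 228x304 size and the random data are placeholders, and stacking the sparse depth map as a fourth channel next to RGB is an assumption made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
height, width = 228, 304                        # arbitrary size for the example

rgb = rng.random((height, width, 3), dtype=np.float32)
sparse_depth = np.zeros((height, width), dtype=np.float32)
rows = rng.integers(0, height, 200)
cols = rng.integers(0, width, 200)
sparse_depth[rows, cols] = 3.0                  # a few fake depth samples

inputs = {
    "rgb": rgb,                                 # color only, 3 channels
    "sd": sparse_depth[..., None],              # sparse depth only, 1 channel
    "rgbd": np.concatenate([rgb, sparse_depth[..., None]], axis=-1),  # 4 channels
}
for name, array in inputs.items():
    print(name, array.shape)
```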

(Figure: visualization of the predicted depth maps)

KITTI Dataset

3）On Number of Depth Samples
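With sparse depth alone, samples on the order of $10^1$ are already enough to rival RGB, and at $10^2$ the accuracy takes a big leap.

The more samples there are, the less RGB matters (the performance gap between RGBd and sd shrinks as the sample size increases), haha.

This observation indicates that the information extracted from the sparse sample set dominates the prediction when the sample size is sufficiently large, and in this case the color cue becomes almost irrelevant. (With full sampling, the network can simply output whatever it was given as input; never mind RGB, at that point it hardly has anything to do with the neural network either, haha.)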

On KITTI, the effect of the number of samples is much the same.

4）Application: Dense Map from Visual Odometry Features

5）Application: LiDAR Super-Resolution

6 Conclusion（own） / Future work

presentation
https://www.bilibili.com/video/av66343637/

Here are some other multi-modal approaches to monocular depth prediction:

《Multi-modal Auto-Encoders as Joint Estimators for Robotics Scene Understanding》
Robotics: Science and Systems-2016
《Parse Geometry from a Line: Monocular Depth Estimation with Partial Laser Observation》
ICRA-2017

This one feels cheaper to deploy than the authors' approach.