【Sparse-to-Dense】《Sparse-to-Dense:Depth Prediction from Sparse Depth Samples and a Single Image》
2022-07-02 07:43:00 【bryant_meng】

ICRA-2018
1 Background and Motivation
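Depth sensing and depth estimation are crucial in engineering applications such as robotics, autonomous driving, augmented reality (AR) and 3D mapping.
However, existing depth estimation approaches all have limitations of one kind or another when deployed: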
1)3D LiDARs are cost-prohibitive
2)Structured-light-based depth sensors (e.g. Kinect) are sunlight-sensitive and power-consuming
3)Stereo cameras require a large baseline and careful calibration for accurate triangulation, and usually fail in featureless regions
Monocular cameras are small, cheap, energy-efficient, and ubiquitous in consumer electronics, so monocular depth estimation has become an active area of exploration.
However, the accuracy and reliability of such methods are still far from being practical (despite significant improvements in recent years).
The authors estimate depth from an RGB image combined with sparse depth measurements; a few sparse depth samples drastically improve depth reconstruction performance.

2 Related Work
- RGB-based depth prediction
- hand-crafted features
- probabilistic graphical models
- Non-parametric approaches
- Semi-supervised learning
- unsupervised learning
- Depth reconstruction from sparse samples
- Sensor fusion
3 Advantages / Contributions
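Monocular depth prediction from RGB + sparse depth
PS: the network architecture is not particularly novel, and the multi-modal use of sparse depth also borrows from earlier work (though the sampling scheme is different).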
4 Method
Overall architecture
The network follows an encoder-decoder design.
The UpProj module has the following form:
2)Depth Sampling
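Sampling follows a Bernoulli probability (e.g. coin flips, where each outcome is independent of the others), $p = \frac{m}{n}$, where $m$ is the target number of depth samples and $n$ is the number of valid depth pixels in the dense depth map.
A Bernoulli experiment is a random experiment repeated independently under identical conditions, characterized by having only two possible outcomes: the event either occurs or does not. If the experiment is repeated independently n times, the series of trials is called an n-fold Bernoulli trial, or a Bernoulli scheme.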

$D^*$: the complete depth map (dense depth map)
$D$: sparse depth map
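The sampling step above can be sketched in a few lines. This is only a minimal illustration, not the authors' code: the function name `sample_sparse_depth`, the use of nested Python lists instead of image tensors, and the convention that a depth of 0 marks an invalid pixel are all assumptions made for the example.

```python
import random

def sample_sparse_depth(dense, m, rng=None):
    """Build a sparse depth map D from a dense one D* by Bernoulli sampling.

    Assumptions (not from the post): `dense` is a list of rows of floats,
    and a value of 0 marks a pixel with no valid depth.
    """
    rng = rng or random.Random()
    # n: number of valid depth pixels that can be sampled
    n = sum(1 for row in dense for d in row if d > 0)
    # p = m / n, clamped to 1 in case m exceeds n
    p = min(1.0, m / n) if n > 0 else 0.0
    # Each valid pixel is kept independently with probability p;
    # every other pixel is set to 0 (no measurement).
    sparse = [
        [d if (d > 0 and rng.random() < p) else 0.0 for d in row]
        for row in dense
    ]
    return sparse, p

# Hypothetical 4x5 dense map with one invalid pixel, targeting about 5 samples.
dense_map = [[2.0] * 5 for _ in range(4)]
dense_map[0][0] = 0.0
sparse_map, keep_prob = sample_sparse_depth(dense_map, m=5, rng=random.Random(0))
```

Because every pixel is an independent coin flip, the number of samples actually kept varies around $m$ from image to image rather than being exactly $m$.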
3)Data Augmentation
Scale / Rotation / Color Jitter / Color Normalization / Flips
Scale and rotation use nearest-neighbor interpolation to avoid creating spurious sparse depth points.
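To see why nearest-neighbor interpolation matters for sparse depth, here is a small 1D toy comparison. It is an illustrative sketch only: `resize_nearest` and `resize_linear` are hypothetical helpers written for this example, and a value of 0 is assumed to mean "no depth sample".

```python
def resize_nearest(row, out_len):
    """Resize a 1D row by copying the nearest source value."""
    n = len(row)
    return [row[min(n - 1, int(k * n / out_len))] for k in range(out_len)]

def resize_linear(row, out_len):
    """Resize a 1D row by linear interpolation between neighbours."""
    n = len(row)
    if n < 2 or out_len < 2:
        raise ValueError("need at least two input and two output samples")
    out = []
    for k in range(out_len):
        x = k * (n - 1) / (out_len - 1)  # position in source coordinates
        lo = int(x)
        hi = min(n - 1, lo + 1)
        w = x - lo
        out.append((1 - w) * row[lo] + w * row[hi])
    return out

# One real depth sample (4.0 m) surrounded by missing measurements (0).
sparse_row = [0.0, 0.0, 4.0, 0.0]
nearest = resize_nearest(sparse_row, 8)  # only ever contains 0.0 or 4.0
linear = resize_linear(sparse_row, 8)    # blends 4.0 with the 0 placeholders
```

Linear interpolation mixes the real measurement with the 0 placeholders and produces in-between values that were never measured, while nearest-neighbor only copies existing values.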
4)loss function
- l1
- l2: sensitive to outliers; tends to over-smooth boundaries instead of producing sharp transitions
- berHu

berHu combines l1 and l2.
Letting the experimental results decide, the authors go with l1.
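The three candidate losses can be written out as a short sketch. It works on flat lists of per-pixel values to stay dependency-free, which is an assumption of this example; a real training setup would use tensor operations. For berHu, the threshold `c` is taken as 20% of the largest absolute error, following the usual reverse-Huber formulation; the `ratio` parameter name is hypothetical.

```python
def l1_loss(pred, target):
    """Mean absolute error."""
    errs = [abs(p - t) for p, t in zip(pred, target)]
    return sum(errs) / len(errs)

def l2_loss(pred, target):
    """Mean squared error; large errors (outliers) dominate."""
    errs = [(p - t) ** 2 for p, t in zip(pred, target)]
    return sum(errs) / len(errs)

def berhu_loss(pred, target, ratio=0.2):
    """Reverse Huber: l1 for small errors, scaled l2 for large ones."""
    errs = [abs(p - t) for p, t in zip(pred, target)]
    c = ratio * max(errs)  # threshold between the l1 and l2 regimes
    if c == 0:
        return 0.0         # all errors are zero
    vals = [e if e <= c else (e * e + c * c) / (2 * c) for e in errs]
    return sum(vals) / len(vals)

# Hypothetical predictions and ground truth (metres), with one outlier.
pred = [1.0, 2.0, 5.0]
target = [1.0, 2.5, 1.0]
losses = (l1_loss(pred, target), l2_loss(pred, target), berhu_loss(pred, target))
```

On this toy input the single large error contributes far more to l2 and berHu than to l1, which matches the note above that l2 is sensitive to outliers.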
5 Experiments
5.1 Datasets
NYU-Depth-v2
464 different indoor scenes: 249 for training + 215 for testing
the small labeled test dataset with 654 images is used for evaluating the final performance
KITTI Odometry Dataset
The KITTI dataset is more challenging for depth prediction, since the maximum distance is 100 meters as opposed to only 10 meters in the NYU-Depth-v2 dataset.
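Evaluation metrics
RMSE: root mean squared error, $\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{k} (\hat{y}_k - y_k)^2}$

REL: mean absolute relative error, $\mathrm{REL} = \frac{1}{N} \sum_{k} \frac{|\hat{y}_k - y_k|}{y_k}$

$\delta_i$: percentage of predicted pixels whose relative error is within a threshold, $\delta_i = \frac{\mathrm{card}\left(\{\hat{y}_k : \max\{\hat{y}_k / y_k,\ y_k / \hat{y}_k\} < 1.25^i\}\right)}{\mathrm{card}\left(\{y_k\}\right)}$

where
- card: the cardinality of a set (simply put, a count of its elements)
- $\hat{y}$: prediction
- $y$: ground truth (GT)
- $N$: number of evaluated pixels, indexed by $k$
For more evaluation metrics, see the post "Monocular depth estimation metrics: SILog, SqRel, AbsRel, RMSE, RMSE(log)".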
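As a rough sketch of how these three metrics can be computed, the function below takes flat lists of predicted and ground-truth depths. The name `depth_metrics` and the rule that pixels with non-positive ground truth are skipped are assumptions of this example, not details from the post.

```python
import math

def depth_metrics(pred, gt, i=1):
    """Return (RMSE, REL, delta_i) over pixels with valid ground truth."""
    # Assumption: ground-truth depth <= 0 means "no measurement" and is skipped.
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    n = len(pairs)
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    rel = sum(abs(p - g) / g for p, g in pairs) / n
    threshold = 1.25 ** i
    # card of the set of pixels whose ratio max(pred/gt, gt/pred) is below the threshold
    inliers = sum(1 for p, g in pairs if p > 0 and max(p / g, g / p) < threshold)
    delta = inliers / n
    return rmse, rel, delta

# Hypothetical values in metres; the last ground-truth pixel is invalid.
rmse, rel, delta1 = depth_metrics([1.0, 2.0, 4.0, 3.0], [1.0, 2.2, 2.0, 0.0])
```

Lower is better for RMSE and REL, while higher is better for $\delta_i$, since it counts the fraction of pixels that land within the tolerance.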
5.2 RESULTS
1)Architecture Evaluation
DeConv3 outperforms DeConv2,
and UpProj outperforms DeConv3 (even larger receptive field of 4x4, the UpProj module outperforms the others)
2)Comparison with the State-of-the-Art
NYU-Depth-v2 Dataset
sd is short for sparse-depth, i.e. the input has no RGB
A look at the visualized results
KITTI Dataset
3)On Number of Depth Samples
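With sparse depth alone, samples on the order of $10^1$ are already comparable to RGB, and at $10^2$ there is a big leap.
The more samples there are, the less RGB matters (performance gap between RGBd and sd shrinks as the sample size increases), haha
This observation indicates that the information extracted from the sparse sample set dominates the prediction when the sample size is sufficiently large, and in this case the color cue becomes almost irrelevant. (With full sampling, the network just outputs whatever it was given as input; never mind RGB, even the neural network barely matters at that point, haha)
Now the effect on KITTI
Much the same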
4)Application: Dense Map from Visual Odometry Features

5)Application: LiDAR Super-Resolution
6 Conclusion(own) / Future work
presentation
https://www.bilibili.com/video/av66343637/
Below are some other multi-modal monocular depth prediction methods
《Multi-modal Auto-Encoders as Joint Estimators for Robotics Scene Understanding》
Robotics: Science and Systems-2016


《Parse Geometry from a Line: Monocular Depth Estimation with Partial Laser Observation》
ICRA-2017

This one feels cheaper to deploy than the authors' approach
