
【Sparse-to-Dense】《Sparse-to-Dense：Depth Prediction from Sparse Depth Samples and a Single Image》

2022-07-02 07:44:00 【bryant_ meng】


ICRA-2018

Table of contents

1 Background and Motivation
2 Related Work
3 Advantages / Contributions
4 Method
5 Experiments
- 5.1 Datasets
- 5.2 RESULTS
6 Conclusion（own） / Future work

1 Background and Motivation

Depth sensing and estimation are vital in engineering applications such as robotics, autonomous driving, augmented reality (AR), and 3D mapping.

However, each of the existing depth sensing approaches has limitations in practical deployment:

1）3D LiDARs are cost-prohibitive
2）Structured-light-based depth sensors (e.g. Kinect) are sunlight-sensitive and power-consuming
3）Stereo cameras require a large baseline and careful calibration for accurate triangulation, and usually fail in featureless regions

Monocular cameras are small, cheap, energy-efficient, and ubiquitous in consumer electronics, so monocular depth estimation has become a popular research topic.

However, the accuracy and reliability of such methods are still far from practical (although there has been significant improvement over the years).

The authors estimate depth from an RGB image combined with sparse depth measurements; a few sparse depth samples drastically improve depth reconstruction performance.


2 Related Work

RGB-based depth prediction
- hand-crafted features
- probabilistic graphical models
- Non-parametric approaches
- Semi-supervised learning
- unsupervised learning
Depth reconstruction from sparse samples
Sensor fusion

3 Advantages / Contributions

Monocular depth prediction from RGB + sparse depth.

P.S. The network architecture itself is not novel, and using sparse depth as an extra modality also borrows from others' ideas (although the sampling method is different).

4 Method

1）Overall structure

The network takes an encoder-decoder form.

The upsampling layers in the decoder use the UpProj module.

2）Depth Sampling

The sparse depth is sampled with a Bernoulli probability $p = \frac{m}{n}$ (e.g., like flipping a coin, where each outcome is independent of the others), where $m$ is the target number of depth samples and $n$ is the number of valid depth pixels.

A Bernoulli trial is a random experiment, repeated independently under identical conditions, that has only two possible outcomes: the event either happens or it does not. If the experiment is repeated independently $n$ times, the series of trials is called an $n$-fold Bernoulli experiment, or a Bernoulli scheme.

$D(i, j) = \begin{cases} D^*(i, j), & \text{with probability } p \\ 0, & \text{otherwise} \end{cases}$, where $p = \frac{m}{n}$

$D^*$: the complete (dense) depth map

$D$: the sparse depth map
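
As a rough illustration of the sampling step, the sketch below keeps each valid pixel of a dense depth map independently with probability $p = m/n$, where $m$ is the target number of samples and $n$ the number of valid pixels. It is a minimal pure-Python sketch, not the authors' code: the function name `sample_sparse_depth`, the list-of-rows representation, and the convention that 0 marks a missing depth value are all assumptions made here.

```python
import random


def sample_sparse_depth(dense, m, rng=None):
    """Keep each valid pixel with probability p = m / n and zero out the rest.

    `dense` is a list of rows of depth values; 0 marks a pixel without valid
    depth (an assumption of this sketch, not something fixed by the paper).
    """
    rng = rng or random.Random()
    n = sum(1 for row in dense for d in row if d > 0)  # number of valid depth pixels
    p = min(1.0, m / n) if n > 0 else 0.0              # Bernoulli keep-probability
    return [[d if d > 0 and rng.random() < p else 0.0 for d in row] for row in dense]


# Toy 30x40 dense depth map with strictly positive depths.
dense = [[1.0 + 0.01 * (r * 40 + c) for c in range(40)] for r in range(30)]
sparse = sample_sparse_depth(dense, m=200, rng=random.Random(0))
kept = sum(1 for row in sparse for d in row if d > 0)
print(kept)  # around 200, but generally not exactly 200
```

Because every pixel is drawn independently, the number of kept samples equals $m$ only in expectation and varies from one draw to the next.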

3）Data Augmentation

Scale / Rotation / Color Jitter / Color Normalization / Flips

Scaling and rotation use nearest-neighbor interpolation to avoid creating spurious sparse depth points.
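
To see why the interpolation mode matters for sparse depth, the sketch below resizes a single row that holds one measured depth and zeros elsewhere. It is a simplified 1-D illustration under assumptions made here (the helper names `resize_nearest` and `resize_linear` are hypothetical, and 0 stands for "no sample"); real augmentation pipelines work on 2-D images.

```python
def resize_nearest(row, new_len):
    """Resize a 1-D row by copying the nearest source value."""
    scale = len(row) / new_len
    return [row[min(int((i + 0.5) * scale), len(row) - 1)] for i in range(new_len)]


def resize_linear(row, new_len):
    """Resize a 1-D row by linear interpolation between neighbouring values."""
    scale = len(row) / new_len
    out = []
    for i in range(new_len):
        # Source coordinate of the output pixel centre, clamped to the row.
        x = min(max((i + 0.5) * scale - 0.5, 0.0), len(row) - 1.0)
        lo = int(x)
        hi = min(lo + 1, len(row) - 1)
        w = x - lo
        out.append((1 - w) * row[lo] + w * row[hi])
    return out


sparse_row = [0.0, 0.0, 2.0, 0.0]  # one measured depth of 2.0, zeros mean "no sample"
near = resize_nearest(sparse_row, 8)
lin = resize_linear(sparse_row, 8)
print(near)  # [0.0, 0.0, 0.0, 0.0, 2.0, 2.0, 0.0, 0.0]
print(lin)   # [0.0, 0.0, 0.0, 0.5, 1.5, 1.5, 0.5, 0.0]
```

Nearest-neighbor resizing only copies existing values, so every nonzero output is a real measurement. Linear interpolation blends the measurement with the surrounding zeros and produces depths such as 0.5 and 1.5 that were never measured.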

4）Loss function

l1
l2: sensitive to outliers; tends to over-smooth boundaries instead of producing sharp transitions
berHu

berHu is a combination of l1 and l2.

Letting the experimental results decide, the authors use l1.
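
The three candidate losses can be sketched as follows. berHu (reverse Huber) is commonly defined as $|e|$ for $|e| \le c$ and $\frac{e^2 + c^2}{2c}$ otherwise. This is an illustrative sketch on flat lists, not the authors' training code; the function names are hypothetical, and the default threshold of 20% of the largest error in the batch is an assumption borrowed from common practice.

```python
def l1_loss(pred, target):
    """Mean absolute error."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def l2_loss(pred, target):
    """Mean squared error."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def berhu_loss(pred, target, c=None):
    """Reverse Huber: linear for errors up to c, quadratic beyond it.

    `c` must be positive when given explicitly.
    """
    errs = [abs(p - t) for p, t in zip(pred, target)]
    largest = max(errs)
    if largest == 0:
        return 0.0  # perfect prediction
    if c is None:
        c = 0.2 * largest  # assumed heuristic: 20% of the largest error
    return sum(e if e <= c else (e * e + c * c) / (2 * c) for e in errs) / len(errs)


pred = [1.0, 2.0, 3.0, 10.0]   # the last prediction is a large outlier
target = [1.1, 2.0, 2.8, 4.0]
print(l1_loss(pred, target))            # about 1.575
print(l2_loss(pred, target))            # about 9.0125
print(berhu_loss(pred, target, c=1.0))  # about 4.7
```

In this toy example the single outlier dominates l2 far more than l1, which matches the note above that l2 is sensitive to outliers.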

5 Experiments

5.1 Datasets

NYU-Depth-v2
464 different indoor scenes: 249 for training + 215 for testing
The small labeled test set of 654 images is used to evaluate the final performance
KITTI Odometry Dataset

The KITTI dataset is more challenging for depth prediction, since the maximum distance is 100 meters as opposed to only 10 meters in the NYU-Depth-v2 dataset.

Evaluation metrics

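RMSE: root mean squared error

$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{k=1}^{N} (\hat{y}_k - y_k)^2}$, with $N$ the number of evaluated pixels and $k$ the pixel index

REL: mean absolute relative error

$\mathrm{REL} = \frac{1}{N} \sum_{k=1}^{N} \frac{|\hat{y}_k - y_k|}{y_k}$

$\delta_i$: percentage of predicted pixels whose relative error is within a threshold

$\delta_i = \frac{\mathrm{card}(\{\hat{y}_k : \max(\hat{y}_k / y_k,\ y_k / \hat{y}_k) < 1.25^i\})}{\mathrm{card}(\{y_k\})}$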
where

card: the cardinality of a set (simply put, the number of elements it contains)
$\hat{y}$: prediction
$y$: ground truth (GT)
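
The three metrics (RMSE, REL, and $\delta_i$) can be sketched in a few lines. This is a minimal illustration, not an official evaluation script: it assumes the predictions and ground truth have already been flattened into lists of strictly positive depths over valid pixels, uses the usual threshold of $1.25^i$ for $\delta_i$, and the function names are hypothetical.

```python
import math


def rmse(pred, gt):
    """Root mean squared error."""
    return math.sqrt(sum((p - g) ** 2 for p, g in zip(pred, gt)) / len(gt))


def rel(pred, gt):
    """Mean absolute relative error."""
    return sum(abs(p - g) / g for p, g in zip(pred, gt)) / len(gt)


def delta(pred, gt, i=1):
    """Fraction of pixels with max(pred/gt, gt/pred) below 1.25**i."""
    hits = sum(1 for p, g in zip(pred, gt) if max(p / g, g / p) < 1.25 ** i)
    return hits / len(gt)


pred = [1.0, 2.2, 3.0, 8.0]
gt = [1.0, 2.0, 4.0, 5.0]
print(rmse(pred, gt))                             # about 1.584
print(rel(pred, gt))                              # about 0.2375
print([delta(pred, gt, i) for i in (1, 2, 3)])    # [0.5, 0.75, 1.0]
```

Lower is better for RMSE and REL, while higher is better for $\delta_i$.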

For more evaluation metrics, see the post "Monocular depth estimation metrics: SILog, SqRel, AbsRel, RMSE, RMSE(log)".

5.2 RESULTS

1）Architecture Evaluation
DeConv3 performs better than DeConv2, and UpProj performs better than DeConv3 (with an even larger receptive field of 4x4, the UpProj module outperforms the others).

2）Comparison with the State-of-the-Art

NYU-Depth-v2 Dataset
sd is short for sparse depth, i.e., the input contains no RGB.

For the visual results, see the qualitative comparison figure in the paper.

KITTI Dataset

3）On Number of Depth Samples

With sparse samples on the order of $10^1$, the result is already comparable to RGB-only prediction; at $10^2$ samples there is a large jump.

The more samples there are, the less RGB matters (the performance gap between RGBd and sd shrinks as the sample size increases), haha.

This observation indicates that the information extracted from the sparse sample set dominates the prediction when the sample size is sufficiently large, and in this case the color cue becomes almost irrelevant. (With full sampling, the network simply outputs what it was given as input. Never mind RGB having little to do with it; even the neural network has little to do with it, haha.)

Now look at the effect on KITTI: the trend is essentially the same, with only minor differences.

4）Application: Dense Map from Visual Odometry Features

5）Application: LiDAR Super-Resolution

6 Conclusion（own） / Future work

Presentation:
https://www.bilibili.com/video/av66343637/

Here are some other multimodal monocular depth prediction methods:

《Multi-modal Auto-Encoders as Joint Estimators for Robotics Scene Understanding》
Robotics: Science and Systems-2016
《Parse Geometry from a Line: Monocular Depth Estimation with Partial Laser Observation》
ICRA-2017

These feel cheaper to deploy in practice than the authors' approach.