【FastDepth】《FastDepth：Fast Monocular Depth Estimation on Embedded Systems》

2022-07-02 07:44:00 【bryant_ meng】

ICRA-2019

Table of contents

1 Background and Motivation
2 Related Work
3 Advantages / Contributions
4 Method
5 Experiments
6 Conclusion (own) / Future work

1 Background and Motivation

The goal is to accelerate existing monocular depth estimation models so that they achieve low latency without losing accuracy and can be deployed on a micro aerial vehicle, where they support robotic tasks such as mapping, localization, and obstacle avoidance.

2 Related Work

Monocular Depth Estimation
Efficient Neural Networks
Network Pruning

3 Advantages / Contributions

The paper accelerates a monocular depth estimation model through:

a low-complexity and low-latency decoder design
a state-of-the-art pruning algorithm (NetAdapt)
hardware-specific compilation (deployment with TVM, optimizing depthwise convolution)

4 Method

1）Overall structure
A simple U-Net-style encoder-decoder. The skip connections use addition rather than concatenation, to avoid increasing the number of feature map channels.
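
To make the channel argument concrete, here is a minimal sketch of the two ways to merge a skip connection. Feature maps are modeled as nested lists of shape [channels][height][width]; the helper names `skip_add`, `skip_concat`, and `make_feat` and the shapes are illustrative assumptions, not taken from the paper's code.

```python
def make_feat(channels, height, width, value):
    """Build a constant feature map of shape [channels][height][width]."""
    return [[[value] * width for _ in range(height)] for _ in range(channels)]


def skip_add(decoder_feat, encoder_feat):
    """Element-wise addition: both inputs must share the same shape."""
    return [
        [[d + e for d, e in zip(d_row, e_row)] for d_row, e_row in zip(d_ch, e_ch)]
        for d_ch, e_ch in zip(decoder_feat, encoder_feat)
    ]


def skip_concat(decoder_feat, encoder_feat):
    """Channel-wise concatenation: output has C_dec + C_enc channels."""
    return decoder_feat + encoder_feat


dec = make_feat(4, 2, 2, 1.0)   # hypothetical decoder feature map
enc = make_feat(4, 2, 2, 0.5)   # hypothetical encoder feature map
added = skip_add(dec, enc)
concatenated = skip_concat(dec, enc)
print(len(added), len(concatenated))
```

With addition the next decoder layer still sees 4 input channels; with concatenation it would see 8, which makes that layer's convolution more expensive.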

Each upsample layer works as follows:

conv5 (a 5×5 depthwise separable convolution) + nearest-neighbor interpolation (compared with bilinear interpolation, the underlying implementation is simpler and more widely supported)
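
A rough sketch of why this layer is cheap, under stated assumptions: the multiply-accumulate (MAC) formulas below are the usual ones for a stride-1, same-padded convolution, and the layer shape (1024 input channels, 512 output channels, 7×7 feature map) is an illustrative choice rather than a number quoted from the post. The function names are hypothetical.

```python
def conv_macs(c_in, c_out, k, h, w):
    """MACs of a standard k x k convolution (stride 1, same padding)."""
    return c_in * c_out * k * k * h * w


def dw_separable_macs(c_in, c_out, k, h, w):
    """MACs of a depthwise k x k conv followed by a 1 x 1 pointwise conv."""
    depthwise = c_in * k * k * h * w      # one k x k filter per input channel
    pointwise = c_in * c_out * h * w      # 1 x 1 conv mixes the channels
    return depthwise + pointwise


def nearest_upsample_2x(channel):
    """Double height and width of a 2-D map by repeating each pixel."""
    out = []
    for row in channel:
        wide = [v for v in row for _ in range(2)]  # repeat along the width
        out.append(wide)
        out.append(list(wide))                     # repeat along the height
    return out


standard = conv_macs(1024, 512, 5, 7, 7)
separable = dw_separable_macs(1024, 512, 5, 7, 7)
print(standard / separable)

upsampled = nearest_upsample_2x([[1, 2], [3, 4]])
print(upsampled)
```

Nearest-neighbor upsampling involves no arithmetic at all, only copying, which is part of why it is simple to implement on any backend.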

2）Network Pruning

Pruning is done with the NetAdapt method, from:

《NetAdapt: Platform-Aware Neural Network Adaptation for Mobile Applications》

The approach is fairly brute-force and direct.
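
As a rough illustration of the idea, the sketch below implements a simplified greedy loop in the spirit of NetAdapt: at each iteration, every layer produces one proposal that removes just enough of its own filters to hit the per-step latency target, and the proposal with the best accuracy is kept. The real algorithm measures latency on the target device and fine-tunes each proposal before evaluating it; here `toy_latency` and `toy_accuracy` are made-up stand-ins, and all names and numbers are assumptions for illustration only.

```python
def netadapt_prune(filters, latency_fn, accuracy_fn, budget, step):
    """Greedy layer-wise pruning until latency_fn(filters) <= budget.

    filters: number of filters kept per layer.
    step: latency reduction each iteration must achieve.
    """
    current = list(filters)
    while latency_fn(current) > budget:
        target = latency_fn(current) - step
        best, best_acc = None, float("-inf")
        for i in range(len(current)):
            proposal = list(current)
            # Prune layer i only, until this iteration's target is met.
            while proposal[i] > 1 and latency_fn(proposal) > target:
                proposal[i] -= 1
            if latency_fn(proposal) > target:
                continue  # layer i alone cannot reach the target
            acc = accuracy_fn(proposal)  # stand-in for fine-tune + evaluate
            if acc > best_acc:
                best, best_acc = proposal, acc
        if best is None:
            break  # no single-layer proposal reaches the target
        current = best
    return current


COSTS = [1, 2, 4]             # hypothetical latency cost per filter, per layer
IMPORTANCE = [3.0, 2.0, 1.0]  # hypothetical accuracy weight per layer


def toy_latency(filters):
    return sum(c * n for c, n in zip(COSTS, filters))


def toy_accuracy(filters):
    return sum(w * n ** 0.5 for w, n in zip(IMPORTANCE, filters))


pruned = netadapt_prune([16, 16, 16], toy_latency, toy_accuracy, budget=60, step=8)
print(pruned, toy_latency(pruned))
```

The design point worth noting is that the search is driven by a directly measured resource (latency) rather than an indirect proxy such as parameter count.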

3）Network Compilation

TVM is used to speed up depthwise convolution (DWConv).

For reference:

TVM is an open-source compiler framework that supports instruction generation for GPUs, CPUs, and FPGAs.
TVM's defining feature is that it optimizes instruction generation based on the graph and operator structure to maximize hardware execution efficiency. On the front end it connects to deep learning frameworks such as TensorFlow and PyTorch; on the back end it targets GPU, CPU, ARM, TPU, and other hardware.
TVM is an end-to-end instruction generator. It takes a model from a deep learning framework as input, transforms and optimizes the graph, and finally generates instructions to complete deployment to the hardware.

TVM has two main features:

It compiles deep learning models from frameworks such as Keras, MXNet, PyTorch, TensorFlow, CoreML, and DarkNet into minimal deployable modules for a variety of hardware backends.
It automatically generates and optimizes tensor operators for multiple backends to achieve better performance.

[Images: two views of the overall TVM framework]

5 Experiments

5.1 Datasets

NYU Depth v2

Evaluation metrics

$\delta_1$ (the percentage of predicted pixels where the relative error is within 25%): higher is better.
RMSE (root mean squared error): lower is better.
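
A minimal sketch of both metrics on flattened lists of positive depth values. It assumes the common convention that "within 25%" means max(pred/gt, gt/pred) < 1.25; the function names and the sample values are made up for illustration.

```python
import math


def delta1(pred, gt, threshold=1.25):
    """Fraction of pixels whose ratio max(pred/gt, gt/pred) is below threshold."""
    hits = sum(1 for p, g in zip(pred, gt) if max(p / g, g / p) < threshold)
    return hits / len(gt)


def rmse(pred, gt):
    """Root mean squared error between predicted and ground-truth depths."""
    return math.sqrt(sum((p - g) ** 2 for p, g in zip(pred, gt)) / len(gt))


pred_depth = [1.0, 2.0, 3.0, 5.0]  # hypothetical predictions (meters)
gt_depth = [1.0, 2.2, 4.0, 5.0]    # hypothetical ground truth (meters)
print(delta1(pred_depth, gt_depth), rmse(pred_depth, gt_depth))
```

In this toy example three of the four pixels fall within the threshold, so $\delta_1$ is 0.75; the third pixel (3.0 vs. 4.0) has a ratio of about 1.33 and is counted as a miss.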

5.2 Final Results and Comparison With Prior Work

Experimental platform

The NVIDIA Jetson TX2 series modules provide excellent speed and energy efficiency for embedded AI computing devices. Each module is equipped with an NVIDIA Pascal GPU, up to 8 GB of memory, 59.7 GB/s of memory bandwidth, and a range of standard hardware interfaces, bringing real AI computing to the edge.

Compared with the encoder, the decoder takes up more of the runtime, so it needs to be the focus of optimization.

Results are compared with other methods on the Jetson TX2 in high-performance (max-N) mode.

Results are also reported for the Jetson TX2 in high-energy-efficiency (max-Q) mode.

In the visualization results, the error is highest at boundaries and at distant objects.

The difference between (c) and (d) is the skip connections; (d) is much more refined.