
[FastDepth] "FastDepth: Fast Monocular Depth Estimation on Embedded Systems"

2022-07-02 06:26:00 【bryant_meng】


ICRA-2019

Contents

1 Background and Motivation
2 Related Work
3 Advantages / Contributions
4 Method
5 Experiments
6 Conclusion (own) / Future work

1 Background and Motivation

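The goal is to speed up existing monocular depth estimation models so that they reach low latency without losing much accuracy and can be deployed on a micro aerial vehicle, supporting robotic tasks such as mapping, localization, and obstacle avoidance.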

2 Related Work

Monocular Depth Estimation
Efficient Neural Networks
Network Pruning

3 Advantages / Contributions

Speeding up a monocular depth estimation model through:

a low-complexity and low-latency decoder design
a state-of-the-art pruning algorithm (NetAdapt)
Hardware-specific compilation (TVM, used to optimize depthwise convolution for deployment)

4 Method

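1) Overall architecture
[Figure: overall network architecture]
A plain, no-frills U-Net-style encoder-decoder. The skip connections use addition rather than concatenation, to avoid increasing the number of feature map channels.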
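To make the channel argument concrete, here is a minimal sketch comparing an additive skip connection with a concatenating one. It uses nested Python lists laid out as channels x height x width; the helper names (`shape`, `add_skip`, `concat_skip`, `constant_map`) and the tiny 2x2 feature maps are illustrative assumptions, not code from the paper.

```python
def shape(t):
    """Return (channels, height, width) of a nested-list feature map."""
    return (len(t), len(t[0]), len(t[0][0]))


def add_skip(decoder_feat, encoder_feat):
    """Additive skip: element-wise sum, so the channel count is unchanged."""
    if shape(decoder_feat) != shape(encoder_feat):
        raise ValueError("additive skip needs matching shapes")
    return [
        [[d + e for d, e in zip(d_row, e_row)] for d_row, e_row in zip(d_ch, e_ch)]
        for d_ch, e_ch in zip(decoder_feat, encoder_feat)
    ]


def concat_skip(decoder_feat, encoder_feat):
    """Concatenating skip: stacks along the channel axis, so channels add up."""
    return decoder_feat + encoder_feat


def constant_map(channels, height, width, value):
    """Hypothetical helper: a feature map filled with one value."""
    return [[[value] * width for _ in range(height)] for _ in range(channels)]


dec = constant_map(2, 2, 2, 1.0)
enc = constant_map(2, 2, 2, 2.0)
print(shape(add_skip(dec, enc)))     # (2, 2, 2)
print(shape(concat_skip(dec, enc)))  # (4, 2, 2)
```

With concatenation, the next decoder convolution would have to consume twice as many input channels, which is the extra cost the additive design avoids.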

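Details of each upsample layer:

[Figure: upsample layer]

conv5 (a 5x5 depthwise separable convolution) followed by nearest-neighbor interpolation (compared with bilinear interpolation, its low-level implementation is simpler and more widely supported)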
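A rough back-of-the-envelope sketch of why this layer is cheap: it counts multiply-accumulate operations (MACs) for a standard 5x5 convolution versus a depthwise separable one, and shows a 2x upsampling step. The layer size (7x7 map, 1024 to 512 channels), the stride-1 "same"-padding assumption, ignoring biases, and writing the upsampling as nearest-neighbor are all assumptions of this sketch.

```python
def conv_macs(h, w, c_in, c_out, k):
    """MACs of a standard k x k convolution producing an h x w output."""
    return h * w * c_in * c_out * k * k


def depthwise_separable_macs(h, w, c_in, c_out, k):
    """MACs of a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = h * w * c_in * k * k   # one k x k filter per input channel
    pointwise = h * w * c_in * c_out   # 1 x 1 conv mixes the channels
    return depthwise + pointwise


def nearest_upsample_2x(channel):
    """Double height and width of one H x W channel by repeating each value."""
    out = []
    for row in channel:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


standard = conv_macs(7, 7, 1024, 512, 5)
separable = depthwise_separable_macs(7, 7, 1024, 512, 5)
print(standard, separable, standard / separable)
print(nearest_upsample_2x([[1, 2], [3, 4]]))
# [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
```

The saving factor is 1 / (1/c_out + 1/k^2), so with a 5x5 kernel it approaches 25 as the number of output channels grows.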

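2) Network Pruning

Pruning is done with NetAdapt:

"NetAdapt: Platform-Aware Neural Network Adaptation for Mobile Applications"

[Figure: NetAdapt]

The approach is fairly brute-force and direct; the figure below makes it more intuitive.

[Figure: NetAdapt, illustrated]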
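The greedy flavor of NetAdapt can be sketched on a toy model. In each iteration, every layer in turn is pruned just enough to meet a tighter resource target, and the proposal that keeps the best accuracy is kept. Everything below is a simplified assumption: `toy_latency` and `toy_accuracy` are hypothetical stand-ins for on-device measurement and for accuracy after short-term fine-tuning, and the real algorithm also chooses which filters to remove and finishes with long-term fine-tuning.

```python
import math


def toy_latency(channels, in_channels=3):
    """Hypothetical latency proxy: sum of (input x output) channel products."""
    total, prev = 0, in_channels
    for c in channels:
        total += prev * c
        prev = c
    return total


def toy_accuracy(channels):
    """Hypothetical accuracy proxy with diminishing returns per layer."""
    return sum(math.sqrt(c) for c in channels)


def netadapt_sketch(channels, latency_fn, accuracy_fn, budget, step):
    """Greedy NetAdapt-style loop over per-layer filter counts."""
    channels = list(channels)
    while latency_fn(channels) > budget:
        target = latency_fn(channels) - step  # tighten the constraint
        proposals = []
        for i in range(len(channels)):
            cand = list(channels)
            # Shrink only layer i until the tightened target is met.
            while cand[i] > 1 and latency_fn(cand) > target:
                cand[i] -= 1
            if latency_fn(cand) <= target:
                proposals.append(cand)
        if not proposals:
            break  # no single layer can meet the target on its own
        channels = max(proposals, key=accuracy_fn)  # keep the best proposal
    return channels


original = [16, 32, 64]
pruned = netadapt_sketch(original, toy_latency, toy_accuracy, budget=1500, step=200)
print(pruned, toy_latency(pruned))
```

The budget and step values are arbitrary; a smaller step gives a finer search at the cost of more iterations.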

3) Network Compilation

TVM is used to speed up depthwise convolution (DWConv).
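For readers unfamiliar with the operator, here is a naive reference implementation of depthwise convolution in plain Python. This is not TVM code and says nothing about how TVM schedules the operator; it only shows what is being computed (each channel filtered by its own kernel, with no mixing across channels). Stride 1, no padding, and the nested-list layout are assumptions of this sketch.

```python
def depthwise_conv2d(x, kernels):
    """Naive depthwise convolution (cross-correlation, as in DL frameworks).

    x:       C x H x W nested lists
    kernels: C x K x K nested lists, one kernel per channel
    Returns  C x (H-K+1) x (W-K+1), i.e. stride 1 and no padding.
    """
    out = []
    for channel, kernel in zip(x, kernels):
        k = len(kernel)
        h, w = len(channel), len(channel[0])
        out_channel = []
        for i in range(h - k + 1):
            row = []
            for j in range(w - k + 1):
                acc = 0
                for di in range(k):
                    for dj in range(k):
                        acc += channel[i + di][j + dj] * kernel[di][dj]
                row.append(acc)
            out_channel.append(row)
        out.append(out_channel)
    return out


x = [
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    [[1, 1, 1], [1, 1, 1], [1, 1, 1]],
]
kernels = [
    [[1, 0], [0, 1]],
    [[1, 1], [1, 1]],
]
print(depthwise_conv2d(x, kernels))
# [[[6, 8], [12, 14]], [[4, 4], [4, 4]]]
```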

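For reference:

TVM is an open-source compiler framework that generates code for GPUs, CPUs, and FPGAs.
TVM's defining feature is that it optimizes code generation at both the graph level and the operator level to maximize hardware efficiency. On the front end it connects to deep learning frameworks such as TensorFlow and PyTorch; on the back end it targets hardware such as GPUs, CPUs, ARM devices, and TPUs.
TVM is an end-to-end code generator: it takes a model from a deep learning framework, applies graph transformations and basic optimizations, and finally generates code to deploy the model on the hardware.

TVM has two main features:

It compiles deep learning models from Keras, MXNet, PyTorch, TensorFlow, CoreML, and DarkNet into minimal deployable modules for a variety of hardware backends.
It can automatically generate and optimize tensor operators for multiple backends to achieve better performance.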

The following figures give a feel for the overall framework:

[Figures: TVM framework overview]

5 Experiments

5.1 Datasets

NYU Depth v2

[Figure]

Evaluation metrics

$\delta_1$ (the percentage of predicted pixels where the relative error is within 25%); higher is better
RMSE (root mean squared error); lower is better
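Both metrics are easy to compute. The sketch below assumes flat lists of strictly positive depth values for valid pixels only, and uses the commonly adopted form of the threshold metric, max(pred/gt, gt/pred) < 1.25, as its reading of "relative error within 25%"; that interpretation and the function names are assumptions of this example.

```python
import math


def delta1(pred, target, threshold=1.25):
    """Fraction of pixels whose ratio max(p/t, t/p) is below the threshold."""
    hits = sum(1 for p, t in zip(pred, target) if max(p / t, t / p) < threshold)
    return hits / len(pred)


def rmse(pred, target):
    """Root mean squared error over all pixels."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred))


pred = [1.0, 2.0, 4.0, 3.0]
target = [1.0, 2.0, 2.0, 3.0]
print(delta1(pred, target))  # 0.75
print(rmse(pred, target))    # 1.0
```

In the example, three of the four pixels match exactly and one is off by a factor of 2, so $\delta_1$ is 0.75 and the RMSE is 1.0.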

5.2 Final Results and Comparison With Prior Work

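Experimental platform

[Figure]

The NVIDIA Jetson TX2 series of modules delivers excellent speed and power efficiency for embedded AI computing devices. Equipped with an NVIDIA Pascal GPU, up to 8 GB of memory, 59.7 GB/s of memory bandwidth, and a range of standard hardware interfaces, each of these supercomputer modules brings true AI computing to the edge.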

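Compared with the encoder, the decoder accounts for more of the runtime, so it is the main target for optimization.
[Figure]
Comparison with other methods on the Jetson TX2 in high performance (max-N) mode

Results on the Jetson TX2 in high energy-efficiency (max-Q) mode
[Figure]
The visualized results are shown below; the error is highest at boundaries and at distant objects.

The difference between (c) and (d) is the skip connections; (d) is much more finely detailed.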