PyTorch DepthNet Training on Still Box dataset

Project page

This code can reproduce the results of our paper published at UAVg-17. If you use this repo in your work, please cite us with the following BibTeX:

@Article{isprs-annals-IV-2-W3-67-2017,
AUTHOR = {Pinard, C. and Chevalley, L. and Manzanera, A. and Filliat, D.},
TITLE = {END-TO-END DEPTH FROM MOTION WITH STABILIZED MONOCULAR VIDEOS},
JOURNAL = {ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences},
VOLUME = {IV-2/W3},
YEAR = {2017},
PAGES = {67--74},
URL = {https://www.isprs-ann-photogramm-remote-sens-spatial-inf-sci.net/IV-2-W3/67/2017/},
DOI = {10.5194/isprs-annals-IV-2-W3-67-2017}
}


End-to-end depth from motion with stabilized monocular videos

  • This code shows how a purely translational camera movement can be leveraged to compute a very precise depth map, even at distances of more than 300 times the displacement.
  • Thus, for a camera movement of 30cm (the nominal displacement used here), you can see as far as 100m.

See our second paper for information about using this code on real videos with speed estimation:

Multi range Real-time depth inference from a monocular stabilized footage using a Fully Convolutional Neural Network

Click below for the video:

youtube video

DepthNet

DepthNet is a network designed to infer a depth map directly from a pair of stabilized images (a minimal sketch of the assumed input layout follows the list below).

  • No information is given about the movement direction
  • DepthNet is fully convolutional, which means it is completely robust to optical center faults
  • This network only works for pinhole-like pictures
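
Below is a minimal sketch of the input convention this implies; the channel-wise concatenation of the pair is an assumption for illustration, not the repo's exact API.

```python
# Hypothetical input construction: two stabilized RGB frames stacked along the
# channel axis, giving the network a single 6-channel tensor to infer depth from.
import torch

frame_t0 = torch.rand(1, 3, 512, 512)  # first stabilized frame (B, C, H, W)
frame_t1 = torch.rand(1, 3, 512, 512)  # second frame, translation-only motion
pair = torch.cat([frame_t0, frame_t1], dim=1)  # (1, 6, 512, 512) network input
print(pair.shape)
```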

Still Box


Still Box is a dataset created specifically for supervised training of depth map inference for stabilized aerial footage. It aims to mimic typical drone footage of static scenes; depth is impossible to infer from a single image, as shapes appear at all kinds of sizes and positions.

  • You can download it here
  • The dataset webpage also provides a tutorial on how to read the data

Training

Requirements

[sudo] pip3 install -r requirements.txt

If you want to log some outputs from the validation set with the --log-output option, you need the OpenCV Python bindings to convert depth maps to RGB with a rainbow colormap.

If you don't have OpenCV, grayscale images will be logged instead.
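
A minimal sketch of that conversion, assuming the depth map is a float NumPy array (the function name is illustrative, not the repo's API):

```python
# Convert a depth map to an image for logging: rainbow colormap with OpenCV
# when available, plain grayscale otherwise.
import numpy as np

def depth_for_logging(depth):
    gray = (255 * depth / max(depth.max(), 1e-6)).astype(np.uint8)
    try:
        import cv2
        return cv2.applyColorMap(gray, cv2.COLORMAP_RAINBOW)  # BGR color image
    except ImportError:
        return gray  # grayscale fallback when OpenCV is missing
```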

Usage

Best results are obtained by training on Still Box 64 and then fine-tuning successively up to the resolution you target. Here are the parameters used for the paper (note how the learning rate and batch size change; training was done on a single GTX 980Ti).

python3 train.py -j8 --lr 0.01 /path/to/still_box/64/ --log-output --activation-function elu --bn
python3 train.py -j8 --lr 0.01 /path/to/still_box/128/ --log-output --activation-function elu --bn --pretrained /path/to/DepthNet64
python3 train.py -j8 --lr 0.001 /path/to/still_box/256/ --log-output --activation-function elu --bn -b64 --pretrained /path/to/DepthNet128
python3 train.py -j8 --lr 0.001 /path/to/still_box/512/ --log-output --activation-function elu --bn -b16 --pretrained /path/to/DepthNet256

Note: you can skip the 128 and 256 trainings if you don't have time; results will be only slightly worse. However, you need to do the 64 training first, as stated in our first paper. This might have something to do with either the size of the 64 dataset (in terms of scene count) or the fact that its feature maps are reduced down to 1x1, making the last convolution an FC-equivalent operation (see the sketch below).
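
A quick sanity check of that intuition (the number of stride-2 layers is an assumption for illustration, not the exact architecture):

```python
# Each stride-2 convolution halves the spatial size: six of them collapse a
# 64x64 feature map to 1x1, where a convolution behaves like a fully
# connected layer.
size = 64
for layer in range(6):
    size //= 2
print(size)  # -> 1
```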

Pretrained networks

Best results were obtained with ELU for the depth activation (not mentioned in the original paper), along with BatchNorm.

Name                           Training set   Error (m)   Download
DepthNet_elu_bn_64.pth.tar     64             4.65        Link
DepthNet_elu_bn_128.pth.tar    128            3.08        Link
DepthNet_elu_bn_256.pth.tar    256            2.29        Link
DepthNet_elu_bn_512.pth.tar    512            1.97        Link

All the networks have the same size and same structure.

Custom FOV and focal length

Every image in Still Box has a 90° FOV (field of view); the focal length (in pixels) is then respectively:

  • 32px for 64x64 images
  • 64px for 128x128 images
  • 128px for 256x256 images
  • 256px for 512x512 images

Training is not flexible with respect to focal length; for a custom focal length you will have to run a dedicated training.

If you need to use a custom focal length and FOV, you can simply crop and resize the pictures.

Say you have a picture of width w with an associated FOV fov. To get an equivalent picture from one of the datasets, you can first crop the Still Box pictures so that their FOV matches fov (cropping doesn't affect the focal length in pixels), and then resize them to w. Note that DepthNet can take rectangular pictures as input.

cropped_w = 2*f0*tan(pi*fov/360)

where f0 is the Still Box focal length in pixels (half the picture width, e.g. 256px for the 512x512 set) and fov is in degrees.

We naturally recommend doing this operation offline; the metadata in metadata.json won't need to be altered.
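
Here is a minimal sketch of that offline preparation, assuming plain image files; the function name, paths, and use of Pillow are illustrative:

```python
# Crop a Still Box picture to match a target FOV (cropping keeps the focal
# length in pixels), then resize it to the target picture size.
# Valid for target FOVs below 90 degrees.
from math import pi, tan
from PIL import Image

def match_fov(still_box_img, target_w, target_h, fov_deg, f0):
    # f0 is the Still Box focal length in pixels (e.g. 256 for the 512x512 set)
    W, H = still_box_img.size
    cropped_w = round(2 * f0 * tan(pi * fov_deg / 360))
    cropped_h = round(cropped_w * target_h / target_w)  # keep target aspect
    left, top = (W - cropped_w) // 2, (H - cropped_h) // 2
    crop = still_box_img.crop((left, top, left + cropped_w, top + cropped_h))
    return crop.resize((target_w, target_h), Image.BILINEAR)

img = match_fov(Image.open('scene/0001.jpg'), 640, 480, 60, 256)  # hypothetical
```

For example, with the 512x512 set (f0 = 256) and a 60° target FOV, cropped_w = 2*256*tan(30°) ≈ 296px.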

With a pretrained DepthNet

If you can resize your test pictures, DepthNet is, thanks to its fully convolutional architecture, flexible with respect to FOV, as long as it stays below 90° (or the maximum FOV encountered during training). Referring back to our width w and FOV fov, for a network trained with a particular focal length f, the width to resize to is:

resized_w = 2*f*tan(pi*fov/360)

That way, you won't have to run a dedicated training or even download the Still Box dataset.
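
As a quick numeric check (f = 256 for DepthNet512, per the focal lengths listed above):

```python
# Resize rule: a picture of width w and FOV fov has focal w / (2*tan(fov/2)),
# so matching the training focal f requires w = 2*f*tan(fov/2).
from math import pi, tan

def resized_width(f, fov_deg):
    return round(2 * f * tan(pi * fov_deg / 360))

print(resized_width(256, 60))  # ~296px for a 60 degree picture on DepthNet512
print(resized_width(256, 90))  # 512px: at 90 degrees, same size as training
```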


/!\ These equations are only valid for pinhole-equivalent cameras. Be sure to correct distortion before using DepthNet.

Testing Inference

The run_inference.py script lets you run inference on a folder of images and save the depth maps as different visualizations.

A simple Still Box scene of 512x512 pictures for testing can be downloaded here. Otherwise, any folder containing jpg images will do, provided you follow the guidelines above.

python3 run_inference.py --output-depth --no-resize --dataset-dir /path/to/stub_box --pretrained /path/to/DepthNet512 --frame-shift 3 --output-dir /path/to/save/outputs

Visualise training

Training can be visualized via tensorboard by launching this command in another terminal:

tensorboard --logdir=/path/to/DepthNet/Results

You can then access the board from any computer on the local network by browsing to machine_ip:6006, just as with a regular tensorboard server. More info here.

Owner

Clément Pinard
PhD ENSTA Paris, Deep Learning Engineer @ ContentSquare