Official PyTorch implementation of DD3D: Is Pseudo-Lidar needed for Monocular 3D Object detection? (ICCV 2021), Dennis Park*, Rares Ambrus*, Vitor Guizilini, Jie Li, and Adrien Gaidon.

Overview

DD3D: "Is Pseudo-Lidar needed for Monocular 3D Object detection?"

Install // Datasets // Experiments // Models // License // Reference

Full video

Official PyTorch implementation of DD3D: Is Pseudo-Lidar needed for Monocular 3D Object detection? (ICCV 2021), Dennis Park*, Rares Ambrus*, Vitor Guizilini, Jie Li, and Adrien Gaidon.

Installation

We recommend using docker (see nvidia-docker2 instructions) to have a reproducible environment. To setup your environment, type in a terminal (only tested in Ubuntu 18.04):

git clone https://github.com/TRI-ML/dd3d.git
cd dd3d
# If you want to use docker (recommended)
make docker-build # CUDA 10.2
# Alternative docker image for cuda 11.1
# make docker-build DOCKERFILE=Dockerfile-cu111

Please check the version of your nvidia driver and cuda compatibility to determine which Dockerfile to use.

We will list below all commands as if run directly inside our container. To run any of the commands in a container, you can either start the container in interactive mode with make docker-dev to land in a shell where you can type those commands, or you can do it in one step:

# single GPU
make docker-run COMMAND="<some-command>"
# multi GPU
make docker-run-mpi COMMAND="<some-command>"

If you want to use features related to AWS (for caching the output directory) and Weights & Biases (for experiment management/visualization), then you should create associated accounts and configure your shell with the following environment variables before building the docker image:

export AWS_SECRET_ACCESS_KEY="<something>"
export AWS_ACCESS_KEY_ID="<something>"
export AWS_DEFAULT_REGION="<something>"
export WANDB_ENTITY="<something>"
export WANDB_API_KEY="<something>"

You should also enable these features in configuration, such as WANDB.ENABLED and SYNC_OUTPUT_DIR_S3.ENABLED.

Datasets

By default, datasets are assumed to be downloaded in /data/datasets/<dataset-name> (can be a symbolic link). The dataset root is configurable by DATASET_ROOT.

KITTI

The KITTI 3D dataset used in our experiments can be downloaded from the KITTI website. For convenience, we provide the standard splits used in 3DOP for training and evaluation:

# download a standard splits subset of KITTI
curl -s https://tri-ml-public.s3.amazonaws.com/github/dd3d/mv3d_kitti_splits.tar | sudo tar xv -C /data/datasets/KITTI3D

The dataset must be organized as follows:

<DATASET_ROOT>
    └── KITTI3D
        ├── mv3d_kitti_splits
        │   ├── test.txt
        │   ├── train.txt
        │   ├── trainval.txt
        │   └── val.txt
        ├── testing
        │   ├── calib
        |   │   ├── 000000.txt
        |   │   ├── 000001.txt
        |   │   └── ...
        │   └── image_2
        │       ├── 000000.png
        │       ├── 000001.png
        │       └── ...
        └── training
            ├── calib
            │   ├── 000000.txt
            │   ├── 000001.txt
            │   └── ...
            ├── image_2
            │   ├── 000000.png
            │   ├── 000001.png
            │   └── ...
            └── label_2
                ├── 000000.txt
                ├── 000001.txt
                └── ..

nuScenes

The nuScenes dataset (v1.0) can be downloaded from the nuScenes website. The dataset must be organized as follows:

<DATASET_ROOT>
    └── nuScenes
        ├── samples
        │   ├── CAM_FRONT
        │   │   ├── n008-2018-05-21-11-06-59-0400__CAM_FRONT__1526915243012465.jpg
        │   │   ├── n008-2018-05-21-11-06-59-0400__CAM_FRONT__1526915243512465.jpg
        │   │   ├── ...
        │   │  
        │   ├── CAM_FRONT_LEFT
        │   │   ├── n008-2018-05-21-11-06-59-0400__CAM_FRONT_LEFT__1526915243004917.jpg
        │   │   ├── n008-2018-05-21-11-06-59-0400__CAM_FRONT_LEFT__1526915243504917.jpg
        │   │   ├── ...
        │   │  
        │   ├── ...
        │  
        ├── v1.0-trainval
        │   ├── attribute.json
        │   ├── calibrated_sensor.json
        │   ├── category.json
        │   ├── ...
        │  
        ├── v1.0-test
        │   ├── attribute.json
        │   ├── calibrated_sensor.json
        │   ├── category.json
        │   ├── ...
        │  
        ├── v1.0-mini
        │   ├── attribute.json
        │   ├── calibrated_sensor.json
        │   ├── category.json
        │   ├── ...

Pre-trained DD3D models

The DD3D models pre-trained on dense depth estimation using DDAD15M can be downloaded here:

backbone download
DLA34 model
V2-99 model

(Optional) Eigen-clean subset of KITTI raw.

To train our Pseudo-Lidar detector, we curated a new subset of KITTI (raw) dataset and use it to fine-tune its depth network. This subset can be downloaded here. Each row contains left and right image pairs. The KITTI raw dataset can be download here.

Validating installation

To validate and visualize the dataloader (including data augmentation), run the following:

./scripts/visualize_dataloader.py +experiments=dd3d_kitti_dla34 SOLVER.IMS_PER_BATCH=4

To validate the entire training loop (including evaluation and visualization), run the overfit experiment (trained on test set):

./scripts/train.py +experiments=dd3d_kitti_dla34_overfit
experiment backbone train mem. (GB) train time (hr) train log Box AP (%) BEV AP (%) download
config DLA-34 6 0.25 log 84.54 88.83 model

Experiments

Configuration

We use hydra to configure experiments, specifically following this pattern to organize and compose configurations. The experiments under configs/experiments describe the delta from the default configuration, and can be run as follows:

# omit the '.yaml' extension from the experiment file.
./scripts/train.py +experiments=<experiment-file> <config-override>

The configuration is modularized by various components such as datasets, backbones, evaluators, and visualizers, etc.

Using multiple GPUs

The training script supports (single-node) multi-GPU for training and evaluation via mpirun. This is most conveniently executed by the make docker-run-mpi command (see above). Internally, IMS_PER_BATCH parameters of the optimizer and the evaluator denote the total size of batch that is sharded across available GPUs while training or evaluating. They are required to be set as a multuple of available GPUs.

Evaluation

One can run only evaluation using the pretrained models:

./scripts/train.py +experiments=<some-experiment> EVAL_ONLY=True MODEL.CKPT=<path-to-pretrained-model>
# use smaller batch size for single-gpu
./scripts/train.py +experiments=<some-experiment> EVAL_ONLY=True MODEL.CKPT=<path-to-pretrained-model> TEST.IMS_PER_BATCH=4

Gradient accumulation

If you have insufficient GPU memory for any experiment, you can use gradient accumulation by configuring ACCUMULATE_GRAD_BATCHES, at the cost of longer training time. For instance, if the experiment requires at least 400 of GPU memory (e.g. V2-99, KITTI) and you have only 128 (e.g., 8 x 16G GPUs), then you can update parameters at every 4th step:

# The original batch size is 64.
./scripts/train.py +experiments=dd3d_kitti_v99 SOLVER.IMS_PER_BATCH=16 SOLVER.ACCUMULATE_GRAD_BATCHES=4

Models

All experiments here use 8 A100 40G GPUs, and use gradient accumulation when more GPU memory is needed. We subsample nuScenes validation set by a factor of 8 (2Hz ⟶ 0.25Hz) to save training time.

KITTI

experiment backbone train mem. (GB) train time (hr) train log Box AP (%) BEV AP (%) download
config DLA-34 256 4.5 log 16.92 24.77 model
config V2-99 400 9.0 log 23.90 32.01 model

nuScenes

experiment backbone train mem. (GB) train time (hr) train log mAP (%) NDS download
config DLA-34 TBD TBD TBD) TBD TBD TBD
config V2-99 TBD TBD TBD TBD TBD TBD

License

The source code is released under the MIT license. We note that some code in this repository is adapted from the following repositories:

Reference

@inproceedings{park2021dd3d,
  author = {Dennis Park and Rares Ambrus and Vitor Guizilini and Jie Li and Adrien Gaidon},
  title = {Is Pseudo-Lidar needed for Monocular 3D Object detection?},
  booktitle = {IEEE/CVF International Conference on Computer Vision (ICCV)},
  primaryClass = {cs.CV},
  year = {2021},
}
Owner
Toyota Research Institute - Machine Learning
Toyota Research Institute - Machine Learning
a generic C++ library for image analysis

VIGRA Computer Vision Library Copyright 1998-2013 by Ullrich Koethe This file is part of the VIGRA computer vision library. You may use,

Ullrich Koethe 378 Dec 30, 2022
Deep Learning Specialization by Andrew Ng, deeplearning.ai.

Deep Learning Specialization on Coursera Master Deep Learning, and Break into AI This is my personal projects for the course. The course covers deep l

Engen 1.5k Jan 07, 2023
Multi-objective constrained optimization for energy applications via tree ensembles

Multi-objective constrained optimization for energy applications via tree ensembles

C⚙G - Imperial College London 1 Nov 19, 2021
IRON Kaggle project done while doing IRONHACK Bootcamp where we had to analyze and use a Machine Learning Project to predict future sales

IRON Kaggle project done while doing IRONHACK Bootcamp where we had to analyze and use a Machine Learning Project to predict future sales. In this case, we ended up using XGBoost because it was the o

1 Jan 04, 2022
Points2Surf: Learning Implicit Surfaces from Point Clouds (ECCV 2020 Spotlight)

Points2Surf: Learning Implicit Surfaces from Point Clouds (ECCV 2020 Spotlight)

Philipp Erler 329 Jan 06, 2023
Pmapper is a super-resolution and deconvolution toolkit for python 3.6+

pmapper pmapper is a super-resolution and deconvolution toolkit for python 3.6+. PMAP stands for Poisson Maximum A-Posteriori, a highly flexible and a

NASA Jet Propulsion Laboratory 8 Nov 06, 2022
⚖️🔁🔮🕵️‍♂️🦹🖼️ Code for *Measuring the Contribution of Multiple Model Representations in Detecting Adversarial Instances* paper.

Measuring the Contribution of Multiple Model Representations in Detecting Adversarial Instances This repository contains the code for Measuring the Co

Daniel Steinberg 0 Nov 06, 2022
Pytorch code for semantic segmentation using ERFNet

ERFNet (PyTorch version) This code is a toolbox that uses PyTorch for training and evaluating the ERFNet architecture for semantic segmentation. For t

Edu 394 Jan 01, 2023
Zero-Cost Proxies for Lightweight NAS

Zero-Cost-NAS Companion code for the ICLR2021 paper: Zero-Cost Proxies for Lightweight NAS tl;dr A single minibatch of data is used to score neural ne

SamsungLabs 108 Dec 20, 2022
Zen-NAS: A Zero-Shot NAS for High-Performance Deep Image Recognition

Zen-NAS: A Zero-Shot NAS for High-Performance Deep Image Recognition How Fast Compare to Other Zero-Shot NAS Proxies on CIFAR-10/100 Pre-trained Model

190 Dec 29, 2022
To model the probability of a soccer coach leave his/her team during Campeonato Brasileiro for 10 chosen teams and considering years 2018, 2019 and 2020.

To model the probability of a soccer coach leave his/her team during Campeonato Brasileiro for 10 chosen teams and considering years 2018, 2019 and 2020.

Larissa Sayuri Futino Castro dos Santos 1 Jan 20, 2022
Software that can generate photos from paintings, turn horses into zebras, perform style transfer, and more.

CycleGAN PyTorch | project page | paper Torch implementation for learning an image-to-image translation (i.e. pix2pix) without input-output pairs, for

Jun-Yan Zhu 11.5k Dec 30, 2022
Official PyTorch implementation of GDWCT (CVPR 2019, oral)

This repository provides the official code of GDWCT, and it is written in PyTorch. Paper Image-to-Image Translation via Group-wise Deep Whitening-and-

WonwoongCho 135 Dec 02, 2022
ElasticFace: Elastic Margin Loss for Deep Face Recognition

This is the official repository of the paper: ElasticFace: Elastic Margin Loss for Deep Face Recognition Paper on arxiv: arxiv Model Log file Pretrain

Fadi Boutros 113 Dec 14, 2022
SegNet model implemented using keras framework

keras-segnet Implementation of SegNet-like architecture using keras. Current version doesn't support index transferring proposed in SegNet article, so

185 Aug 30, 2022
A graph adversarial learning toolbox based on PyTorch and DGL.

GraphWar: Arms Race in Graph Adversarial Learning NOTE: GraphWar is still in the early stages and the API will likely continue to change. 🚀 Installat

Jintang Li 54 Jan 05, 2023
NER for Indian languages

CL-NERIL: A Cross-Lingual Model for NER in Indian Languages Code for the paper - https://arxiv.org/abs/2111.11815 Setup Setup a virtual environment Th

Akshara P 0 Nov 24, 2021
Patient-Survival - Using Python, I developed a Machine Learning model using classification techniques such as Random Forest and SVM classifiers to predict a patient's survival status that have undergone breast cancer surgery.

Patient-Survival - Using Python, I developed a Machine Learning model using classification techniques such as Random Forest and SVM classifiers to predict a patient's survival status that have underg

Nafis Ahmed 1 Dec 28, 2021
Code for Neurips2021 Paper "Topology-Imbalance Learning for Semi-Supervised Node Classification".

Topology-Imbalance Learning for Semi-Supervised Node Classification Introduction Code for NeurIPS 2021 paper "Topology-Imbalance Learning for Semi-Sup

Victor Chen 40 Nov 23, 2022
机器学习、深度学习、自然语言处理等人工智能基础知识总结。

说明 机器学习、深度学习、自然语言处理基础知识总结。 目前主要参考李航老师的《统计学习方法》一书,也有一些内容例如XGBoost、聚类、深度学习相关内容、NLP相关内容等是书中未提及的。

Peter 445 Dec 12, 2022