Pseudo lidar - (CVPR 2019) Pseudo-LiDAR from Visual Depth Estimation: Bridging the Gap in 3D Object Detection for Autonomous Driving

Overview

Pseudo-LiDAR from Visual Depth Estimation: Bridging the Gap in 3D Object Detection for Autonomous Driving

This paper has been accepted by the Conference on Computer Vision and Pattern Recognition (CVPR) 2019.

by Yan Wang, Wei-Lun Chao, Divyansh Garg, Bharath Hariharan, Mark Campbell and Kilian Q. Weinberger

Citation

@inproceedings{wang2019pseudo,
  title={Pseudo-LiDAR from Visual Depth Estimation: Bridging the Gap in 3D Object Detection for Autonomous Driving},
  author={Wang, Yan and Chao, Wei-Lun and Garg, Divyansh and Hariharan, Bharath and Campbell, Mark and Weinberger, Kilian},
  booktitle={CVPR},
  year={2019}
}

Update

  • 2nd July 2020: Added a Jupyter notebook to visualize point clouds. It is in the ./visualization folder.
  • 29th July 2019: submission.py now saves the disparity as a NumPy file instead of a PNG file. Also fixed generate_lidar.py.
  • I have modified the official AVOD code slightly, so you can now train and test pseudo-LiDAR with AVOD directly. Please check the code at https://github.com/mileyan/avod_pl.

Introduction

3D object detection is an essential task in autonomous driving. Recent techniques excel with highly accurate detection rates, provided the 3D input data is obtained from precise but expensive LiDAR technology. Approaches based on cheaper monocular or stereo imagery data have, until now, resulted in drastically lower accuracies, a gap that is commonly attributed to poor image-based depth estimation. However, in this paper we argue that the data representation, rather than its quality, accounts for the majority of the difference. Taking the inner workings of convolutional neural networks into consideration, we propose to convert image-based depth maps to pseudo-LiDAR representations, essentially mimicking the LiDAR signal. With this representation we can apply different existing LiDAR-based detection algorithms. On the popular KITTI benchmark, our approach achieves impressive improvements over the existing state-of-the-art in image-based performance, raising the detection accuracy of objects within 30 m range from the previous state-of-the-art of 22% to an unprecedented 74%. At the time of submission, our algorithm held the highest entry on the KITTI 3D object detection leaderboard for stereo-image-based approaches.
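The conversion at the heart of this idea can be written down explicitly. In standard rectified-stereo notation (our symbols, assuming a left camera with focal lengths f_U and f_V, principal point (c_U, c_V), and stereo baseline b), a predicted disparity Y(u, v) at pixel (u, v) gives the depth D(u, v) by triangulation, and each pixel is then back-projected to a 3D point (x, y, z) in the left camera's frame:

```latex
\begin{aligned}
z &= D(u,v) = \frac{f_U \, b}{Y(u,v)}, \\
x &= \frac{(u - c_U)\, z}{f_U}, \\
y &= \frac{(v - c_V)\, z}{f_V}.
\end{aligned}
```

Collecting these points over all pixels with a valid depth gives the pseudo-LiDAR point cloud, which can then be transformed into the LiDAR frame with the calibration matrices and passed to a LiDAR-based detector.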

Usage

1. Overview

We provide guidance and code to train a stereo depth estimator and a 3D object detector using the KITTI object detection benchmark. We also provide our pre-trained models.

2. Stereo depth estimation models

We provide our PSMNet model, pre-trained on the Scene Flow dataset and fine-tuned on the 3,712 training images of the KITTI detection benchmark.

We also directly provide the pseudo-LiDAR point clouds and the ground planes of training and testing images estimated by this pre-trained model.

We also provide code to train your own stereo depth estimator and to prepare the point clouds and ground planes. If you want to use our pseudo-LiDAR data for 3D object detection, you may skip the following steps and move directly on to the object detection models (Section 3).

2.1 Dependencies

  • Python 3.5+
  • numpy, scikit-learn, scipy
  • KITTI 3D object detection dataset

2.2 Download the dataset

You need to download the KITTI dataset from here, including left and right color images, Velodyne point clouds, camera calibration matrices, and training labels. You also need to download the image set files from here. Then you need to organize the data in the following way.

KITTI/object/
    
    train.txt
    val.txt
    test.txt 
    
    training/
        calib/
        image_2/ #left image
        image_3/ #right image
        label_2/
        velodyne/ 

    testing/
        calib/
        image_2/
        image_3/
        velodyne/

The Velodyne (LiDAR) point clouds are used ONLY as ground truth for training the stereo depth estimator (e.g., PSMNet).

2.3 Generate ground-truth image disparities

Use the script ./preprocessing/generate_disp.py to process all Velodyne files listed in train.txt. The resulting disparities are our training ground truth. Alternatively, you can download them directly from disparity. Name this folder disparity and put it inside the training folder.

python ./preprocessing/generate_disp.py --data_path ./KITTI/object/training/ --split_file ./KITTI/object/train.txt
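The idea behind this step can be sketched as projecting each LiDAR point into the left image and converting its depth to a disparity. The function below is an illustrative NumPy version under stated assumptions, not the code of generate_disp.py: velodyne_to_disparity is our name, it assumes the KITTI calibration matrices P2 (3×4), R0_rect (3×3) and Tr_velo_to_cam (3×4) have already been parsed into arrays, and it assumes a stereo baseline of about 0.54 m, roughly the KITTI value. When several points fall on the same pixel, it keeps the nearest one.

```python
import numpy as np


def velodyne_to_disparity(points, P2, R0_rect, Tr_velo_to_cam, height, width, baseline=0.54):
    """Project a Velodyne scan into the left image and return a sparse disparity map.

    points: (N, 3) or (N, 4) LiDAR points (x, y, z[, reflectance]).
    baseline: assumed stereo baseline in metres. Pixels with no LiDAR return stay 0.
    """
    xyz = points[:, :3]
    homo = np.hstack([xyz, np.ones((xyz.shape[0], 1))])
    # LiDAR frame -> reference camera frame -> rectified camera frame.
    cam = (R0_rect @ (Tr_velo_to_cam @ homo.T)).T
    cam = cam[cam[:, 2] > 0]  # keep only points in front of the camera
    # Rectified camera frame -> pixel coordinates via the left projection matrix.
    img = (P2 @ np.hstack([cam, np.ones((cam.shape[0], 1))]).T).T
    u = np.round(img[:, 0] / img[:, 2]).astype(int)
    v = np.round(img[:, 1] / img[:, 2]).astype(int)
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    u, v, depth = u[inside], v[inside], cam[inside, 2]
    f_u = P2[0, 0]
    disparity = np.zeros((height, width), dtype=np.float32)
    # Disparity = f_u * b / depth; np.maximum.at keeps the nearest point per pixel.
    np.maximum.at(disparity, (v, u), (f_u * baseline / depth).astype(np.float32))
    return disparity
```

Using np.maximum.at rather than plain fancy-index assignment makes the result well defined when several points hit the same pixel.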

2.4 Train the stereo model

You can train any stereo disparity model you like; here we give an example of training PSMNet. The modified code is in the psmnet subfolder. Make sure you follow the README inside this folder to install the correct Python version and libraries. I strongly suggest using conda environments to manage them, since different parts of this project use different Python versions. Download the PSMNet model pretrained on the Scene Flow dataset from here.

# train psmnet with 4 TITAN X GPUs.
python ./psmnet/finetune_3d.py --maxdisp 192 \
     --model stackhourglass \
     --datapath ./KITTI/object/training/ \
     --split_file ./KITTI/object/train.txt \
     --epochs 300 \
     --lr_scale 50 \
     --loadmodel ./pretrained_sceneflow.tar \
     --savemodel ./psmnet/kitti_3d/  --btrain 12
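Because the LiDAR-derived ground truth is sparse, only pixels that actually have a disparity value should contribute to the training loss. As far as we know, PSMNet uses a smooth L1 loss restricted to valid pixels; the NumPy sketch below illustrates that idea and is not copied from finetune_3d.py. The name masked_smooth_l1 is ours, and the validity rule (0 < d < maxdisp, matching --maxdisp 192) is an assumption.

```python
import numpy as np


def masked_smooth_l1(pred, target, maxdisp=192):
    """Mean smooth L1 (Huber with delta = 1) over pixels with valid ground truth.

    Pixels where target == 0 (no LiDAR return) or target >= maxdisp are ignored.
    """
    mask = (target > 0) & (target < maxdisp)
    if not mask.any():
        return 0.0
    diff = np.abs(pred[mask] - target[mask])
    # Quadratic for errors below 1 px, linear above, so outliers do not dominate.
    loss = np.where(diff < 1.0, 0.5 * diff ** 2, diff - 0.5)
    return float(loss.mean())
```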

2.5 Predict the point clouds

Predict the disparities.
# training
python ./psmnet/submission.py \
    --loadmodel ./psmnet/kitti_3d/finetune_300.tar \
    --datapath ./KITTI/object/training/ \
    --save_path ./KITTI/object/training/predict_disparity
# testing
python ./psmnet/submission.py \
    --loadmodel ./psmnet/kitti_3d/finetune_300.tar \
    --datapath ./KITTI/object/testing/ \
    --save_path ./KITTI/object/testing/predict_disparity
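Before converting the predictions to point clouds, it can be worth comparing them with the ground-truth disparities from step 2.3 on the training split. The sketch below is a hypothetical helper (disparity_errors is our name): it assumes the prediction and ground truth are same-sized arrays and that 0 in the ground truth means "no LiDAR measurement". It reports the mean absolute error and a KITTI-style D1 outlier rate, counting a pixel as an outlier when its error exceeds both 3 px and 5% of the true disparity.

```python
import numpy as np


def disparity_errors(pred, gt):
    """Return (mean absolute error, D1 outlier rate) over pixels with gt > 0."""
    valid = gt > 0
    if not valid.any():
        return float("nan"), float("nan")
    err = np.abs(pred[valid] - gt[valid])
    # KITTI-style outlier: error above 3 px AND above 5% of the true disparity.
    outliers = (err > 3.0) & (err > 0.05 * gt[valid])
    return float(err.mean()), float(outliers.mean())
```

Assuming one .npy file per frame named by its id, a prediction can be loaded with np.load("./KITTI/object/training/predict_disparity/000000.npy"); how you load the ground truth depends on the format of your disparity folder.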
Convert the disparities to point clouds.
# training
python ./preprocessing/generate_lidar.py  \
    --calib_dir ./KITTI/object/training/calib/ \
    --save_dir ./KITTI/object/training/pseudo-lidar_velodyne/ \
    --disparity_dir ./KITTI/object/training/predict_disparity \
    --max_high 1
# testing
python ./preprocessing/generate_lidar.py  \
    --calib_dir ./KITTI/object/testing/calib/ \
    --save_dir ./KITTI/object/testing/pseudo-lidar_velodyne/ \
    --disparity_dir ./KITTI/object/testing/predict_disparity \
    --max_high 1

To generate point clouds from depth maps (e.g., from DORN) instead of disparities, add --is_depth to the command.
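The conversion itself follows the back-projection geometry of pseudo-LiDAR. The NumPy sketch below is illustrative and not the code of generate_lidar.py; to_pseudo_lidar is our name. It assumes a 0.54 m baseline, intrinsics f_u, f_v, c_u, c_v of the rectified left camera, and the KITTI matrices R0_rect and Tr_velo_to_cam for moving points from the camera frame into the Velodyne frame (x forward, y left, z up). We read --max_high as a height cutoff in that frame and mirror it with max_high; that interpretation, and the constant reflectance of 1 in the fourth column, are assumptions. Setting is_depth=True mirrors --is_depth by skipping the disparity-to-depth step.

```python
import numpy as np


def to_pseudo_lidar(disp_or_depth, f_u, f_v, c_u, c_v, R0_rect, Tr_velo_to_cam,
                    baseline=0.54, max_high=1.0, is_depth=False):
    """Back-project a disparity (or depth) map into an (N, 4) float32 Velodyne-frame cloud."""
    if is_depth:
        depth = disp_or_depth.astype(np.float64)
    else:
        disp = np.clip(disp_or_depth.astype(np.float64), 0, None)
        # Triangulation: depth = f_u * b / disparity; zero disparity means no depth.
        depth = np.where(disp > 0, f_u * baseline / np.maximum(disp, 1e-6), 0.0)
    rows, cols = depth.shape
    u, v = np.meshgrid(np.arange(cols), np.arange(rows))  # column and row indices
    valid = depth > 0
    z = depth[valid]
    x = (u[valid] - c_u) * z / f_u
    y = (v[valid] - c_v) * z / f_v
    cam_rect = np.stack([x, y, z])  # (3, N) in the rectified camera frame
    # Rectified -> reference camera frame, then invert the rigid velo->cam transform.
    cam_ref = np.linalg.solve(R0_rect, cam_rect)
    R, t = Tr_velo_to_cam[:, :3], Tr_velo_to_cam[:, 3:]
    velo = (R.T @ (cam_ref - t)).T  # (N, 3): x forward, y left, z up
    # Keep points in front of the sensor and below the assumed height cutoff.
    keep = (velo[:, 0] >= 0) & (velo[:, 2] < max_high)
    velo = velo[keep]
    reflectance = np.ones((velo.shape[0], 1))  # placeholder intensity (assumption)
    return np.hstack([velo, reflectance]).astype(np.float32)
```

The result can be written in the KITTI Velodyne binary layout with cloud.tofile(path) and read back with np.fromfile(path, dtype=np.float32).reshape(-1, 4).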

2.6 Generate ground plane

If you want to train an AVOD model for 3D object detection, you need to generate ground planes from pseudo-lidar point clouds.

#training
python ./preprocessing/kitti_process_RANSAC.py \
    --calib ./KITTI/object/training/calib/ \
    --lidar_dir  ./KITTI/object/training/pseudo-lidar_velodyne/ \
    --planes_dir ./KITTI/object/training/pseudo-lidar_planes/
#testing
python ./preprocessing/kitti_process_RANSAC.py \
    --calib ./KITTI/object/testing/calib/ \
    --lidar_dir  ./KITTI/object/testing/pseudo-lidar_velodyne/ \
    --planes_dir ./KITTI/object/testing/pseudo-lidar_planes/

3. Object Detection models

AVOD model

Download the code from https://github.com/kujason/avod and install the Python dependencies.

Follow their README to prepare the data and then replace (1) files in velodyne with those in pseudo-lidar_velodyne and (2) files in planes with those in pseudo-lidar_planes. Note that you should still keep the folder names as velodyne and planes.

Follow their README to train the pyramid_cars_with_aug_example model. You can also download our pretrained model and evaluate it directly. However, if you want to submit your results to the leaderboard, you need to train the model on trainval.txt.

Frustum-PointNets model

Download the code from https://github.com/charlesq34/frustum-pointnets and install the Python dependencies.

Follow their README to prepare the data and then replace files in velodyne with those in pseudo-lidar_velodyne. Note that you should still keep the folder name as velodyne.

Follow their README to train the v1 model. You can also download our pretrained model and evaluate it directly.

Results

The main results of our pseudo-LiDAR method on the validation set:

[Figure: main results of the pseudo-LiDAR method on the KITTI validation set]

You can download the AVOD validation results from here.

Contact

If you have any questions, please feel free to email us.

Yan Wang, Harry Chao, Div Garg
