Roach: End-to-End Urban Driving by Imitating a Reinforcement Learning Coach

Overview

CARLA-Roach

This is the official code release of the paper
End-to-End Urban Driving by Imitating a Reinforcement Learning Coach
by Zhejun Zhang, Alexander Liniger, Dengxin Dai, Fisher Yu and Luc van Gool, accepted at ICCV 2021.

It contains the code for benchmarking, off-policy data collection, on-policy data collection, RL training and IL training with DAGGER. It also contains trained models of RL experts and IL agents. The supplementary videos can be found at the paper's homepage.

Installation

Please refer to INSTALL.md for installation. We use AWS EC2, but you can also install and run all experiments on your computer or cluster.

Quick Start: Collect an expert dataset using Roach

Roach is an end-to-end trained agent that drives better and more naturally than hand-crafted CARLA experts. To collect a dataset from Roach, use run/data_collect_bc.sh and modify the following arguments:

  • save_to_wandb: set to False if you don't want to upload the dataset to W&B.
  • dataset_root: local directory for saving the dataset.
  • test_suites: default is eu_data which collects data in Town01 for the NoCrash-dense benchmark. Available configurations are found here. You can also create your own configuration.
  • n_episodes: number of episodes to collect; each episode is saved to a separate h5 file.
  • agent/cilrs/obs_configs: observation (i.e. sensor) configuration, default is central_rgb_wide. Available configurations are found here. You can also create your own configuration.
  • inject_noise: default is True. As introduced in CILRS, triangular noise is injected into steering and throttle so that the ego-vehicle does not always follow the lane center. This is very useful for imitation learning.
  • actors.hero.terminal.kwargs.max_time: maximum duration of an episode, in seconds.
  • Stop the episode early if a traffic rule is violated, so that the collected dataset is error-free:
    • actors.hero.terminal.kwargs.no_collision: default is True.
    • actors.hero.terminal.kwargs.no_run_rl: default is False.
    • actors.hero.terminal.kwargs.no_run_stop: default is False.
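The arguments above can be collected in one place and turned into `key=value` strings. This is a minimal sketch, assuming the script accepts overrides in the `key=value` notation this README uses elsewhere (for example `save_to_wandb=False`). The helper `to_override`, the dataset path and the numeric values are hypothetical and only illustrate the format.

```python
def to_override(key, value):
    """Format one setting as a `key=value` override string (hypothetical helper)."""
    if value is None:
        return f"{key}=null"  # this README writes unset values as null
    return f"{key}={value}"


# Settings listed in this section; path and numbers are illustrative assumptions.
quick_start = {
    "save_to_wandb": False,
    "dataset_root": "/tmp/roach_dataset",           # hypothetical local directory
    "test_suites": "eu_data",
    "n_episodes": 10,                                # illustrative value
    "agent/cilrs/obs_configs": "central_rgb_wide",
    "inject_noise": True,
    "actors.hero.terminal.kwargs.max_time": 300,     # illustrative value, seconds
    "actors.hero.terminal.kwargs.no_collision": True,
    "actors.hero.terminal.kwargs.no_run_rl": False,
    "actors.hero.terminal.kwargs.no_run_stop": False,
}

overrides = [to_override(key, value) for key, value in quick_start.items()]
print(" ".join(overrides))
```

The resulting strings would then be placed in run/data_collect_bc.sh next to the arguments that are already there.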

Benchmark

To benchmark checkpoints, use run/benchmark.sh and modify the arguments to select different settings. We recommend g4dn.xlarge with 50 GB free disk space for video recording. Use screen if you want to run it in the background:

screen -L -Logfile ~/screen.log -d -m run/benchmark.sh

Trained Models

The trained models are hosted here on W&B. Given the corresponding W&B run path, our code will automatically download and load the checkpoint with the configuration yaml file.

The following checkpoints are used to produce the results reported in our paper.

  • To benchmark the Autopilot, use benchmark() with agent="roaming".
  • To benchmark the RL experts, use benchmark() with agent="ppo" and set agent.ppo.wb_run_path to one of the following.
    • iccv21-roach/trained-models/1929isj0: Roach
    • iccv21-roach/trained-models/1ch63m76: PPO+beta
    • iccv21-roach/trained-models/10pscpih: PPO+exp
  • To benchmark the IL agents, use benchmark() with agent="cilrs" and set agent.cilrs.wb_run_path to one of the following.
    • Checkpoints trained for the NoCrash benchmark, at DAGGER iteration 5:
      • iccv21-roach/trained-models/39o1h862: L_A(AP)
      • iccv21-roach/trained-models/v5kqxe3i: L_A
      • iccv21-roach/trained-models/t3x557tv: L_K
      • iccv21-roach/trained-models/1w888p5d: L_K+L_V
      • iccv21-roach/trained-models/2tfhqohp: L_K+L_F
      • iccv21-roach/trained-models/3vudxj38: L_K+L_V+L_F
      • iccv21-roach/trained-models/31u9tki7: L_K+L_F(c)
      • iccv21-roach/trained-models/aovrm1fs: L_K+L_V+L_F(c)
    • Checkpoints trained for the LeaderBoard benchmark, at DAGGER iteration 5:
      • iccv21-roach/trained-models/1myvm4mw: L_A(AP)
      • iccv21-roach/trained-models/nw226h5h: L_A
      • iccv21-roach/trained-models/12uzu2lu: L_K
      • iccv21-roach/trained-models/3ar2gyqw: L_K+L_V
      • iccv21-roach/trained-models/9rcwt5fh: L_K+L_F
      • iccv21-roach/trained-models/2qq2rmr1: L_K+L_V+L_F
      • iccv21-roach/trained-models/zwadqx9z: L_K+L_F(c)
      • iccv21-roach/trained-models/21trg553: L_K+L_V+L_F(c)

Available Test Suites

Set argument test_suites to one of the following.

  • NoCrash-busy
    • eu_test_tt: NoCrash, busy traffic, train town & train weather
    • eu_test_tn: NoCrash, busy traffic, train town & new weather
    • eu_test_nt: NoCrash, busy traffic, new town & train weather
    • eu_test_nn: NoCrash, busy traffic, new town & new weather
    • eu_test: eu_test_tt/tn/nt/nn, all 4 conditions in one file
  • NoCrash-dense
    • nocrash_dense: NoCrash, dense traffic, all 4 conditions
  • LeaderBoard
    • lb_test_tt: LeaderBoard, busy traffic, train town & train weather
    • lb_test_tn: LeaderBoard, busy traffic, train town & new weather
    • lb_test_nt: LeaderBoard, busy traffic, new town & train weather
    • lb_test_nn: LeaderBoard, busy traffic, new town & new weather
    • lb_test: lb_test_tt/tn/nt/nn, all 4 conditions in one file
  • LeaderBoard-all
    • cc_test: LeaderBoard, busy traffic, all 76 routes, dynamic weather
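The per-condition suite names follow a regular pattern: a benchmark prefix (`eu` or `lb`), then one letter for the town and one for the weather (`t` for train, `n` for new). The sketch below decodes that pattern. The function `describe_suite` is a hypothetical helper written for illustration; it is not part of this repository and it only covers the eight `*_test_xx` names listed above.

```python
import re

BENCHMARKS = {"eu": "NoCrash", "lb": "LeaderBoard"}
SPLITS = {"t": "train", "n": "new"}


def describe_suite(name):
    """Decode names such as eu_test_tn into a human-readable description."""
    match = re.fullmatch(r"(eu|lb)_test_([tn])([tn])", name)
    if match is None:
        raise ValueError(f"not a per-condition test suite: {name!r}")
    bench, town, weather = match.groups()
    return (
        f"{BENCHMARKS[bench]}, busy traffic, "
        f"{SPLITS[town]} town & {SPLITS[weather]} weather"
    )


print(describe_suite("eu_test_tn"))
print(describe_suite("lb_test_nt"))
```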

Collect Datasets

We recommend g4dn.xlarge for dataset collection. Make sure you have enough disk space attached to the instance.

Collect Off-Policy Datasets

To collect off-policy datasets, use run/data_collect_bc.sh and modify the arguments to select different settings. You can use Roach (given a checkpoint) or the Autopilot to collect off-policy datasets. In our paper, before the DAGGER training, the IL agents are initialized via behavior cloning (BC) using an off-policy dataset collected in this way.

Some arguments you may want to modify:

  • Set save_to_wandb=False if you don't want to upload the dataset to W&B.
  • Select the environment for collecting data by setting the argument test_suites to one of the following:
    • eu_data: NoCrash, train town & train weather. We collect n_episodes=80 for the BC dataset on NoCrash, which is around 75 GB and 6 hours of data.
    • lb_data: LeaderBoard, train town & train weather. We collect n_episodes=160 for the BC dataset on LeaderBoard, which is around 150 GB and 12 hours of data.
    • cc_data: CARLA Challenge, all six maps (Town1-6), dynamic weather. We collect n_episodes=240 for the BC dataset on CARLA Challenge, which is around 150 GB and 18 hours of data.
  • For RL experts, the checkpoint to use is set via agent.ppo.wb_run_path and agent.ppo.wb_ckpt_step.
    • agent.ppo.wb_run_path is the W&B run path where the RL training is logged and the checkpoints are saved.
    • agent.ppo.wb_ckpt_step is the step of the checkpoint you want to use. If it's an integer, the script will find the checkpoint closest to that step. If it's null, the latest checkpoint will be used.
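The selection rule for `agent.ppo.wb_ckpt_step` can be sketched as follows. This is an illustration of the rule as described above, not the repository's code: `select_checkpoint_step` and the list of step numbers are hypothetical, "latest" is assumed to mean the highest step, and ties between two equally close checkpoints are resolved by list order.

```python
def select_checkpoint_step(available_steps, wanted_step=None):
    """Pick a checkpoint step: closest to wanted_step, or the latest if it is None."""
    if not available_steps:
        raise ValueError("no checkpoints available")
    if wanted_step is None:
        # null in the configuration: assume the latest checkpoint is the highest step
        return max(available_steps)
    # integer: the checkpoint whose step is closest to the requested one
    return min(available_steps, key=lambda step: abs(step - wanted_step))


steps = [1000, 2000, 3000]  # hypothetical checkpoint steps of one W&B run
print(select_checkpoint_step(steps, 2400))
print(select_checkpoint_step(steps, None))
```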

Collect On-Policy Datasets

To collect on-policy datasets, use run/data_collect_dagger.sh and modify the arguments to select different settings. You can use Roach or the Autopilot to label on-policy (DAGGER) datasets generated by an IL agent (given a checkpoint). This is done by running data_collect.py with an IL agent as the driver and Roach/Autopilot as the coach, so the expert supervision is generated and recorded on the fly.

Most settings are the same as for collecting off-policy BC datasets. The differences are:

  • Set agent.cilrs.wb_run_path to the W&B run path where the IL training is logged and the checkpoints are saved.
  • By adjusting n_episodes, we make sure that the size of the DAGGER dataset at each iteration is around 20% of the BC dataset size.
    • For RL experts, we use an n_episodes that is half of the n_episodes of the BC dataset.
    • For the Autopilot, we use an n_episodes that is the same as the n_episodes of the BC dataset.
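As a worked example of the rule above, the sketch below computes n_episodes for one DAGGER iteration from the n_episodes of the BC dataset. The function `dagger_n_episodes` and the coach labels `"rl_expert"` and `"autopilot"` are hypothetical names chosen for this illustration.

```python
def dagger_n_episodes(bc_n_episodes, coach):
    """Return n_episodes for one DAGGER iteration, given the BC dataset's n_episodes."""
    if coach == "rl_expert":
        return bc_n_episodes // 2  # half of the BC dataset's n_episodes
    if coach == "autopilot":
        return bc_n_episodes       # same as the BC dataset's n_episodes
    raise ValueError(f"unknown coach: {coach!r}")


# BC n_episodes values taken from the off-policy section: 80 (eu_data), 160 (lb_data)
print(dagger_n_episodes(80, "rl_expert"))
print(dagger_n_episodes(160, "autopilot"))
```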

Train RL Experts

To train RL experts, use run/train_rl.sh and modify the arguments to select different settings. We recommend using g4dn.4xlarge for training the RL experts; you will need around 50 GB free disk space for videos and checkpoints. We train RL experts on CARLA 0.9.10.1 because 0.9.11 crashes more often for unknown reasons.

Train IL Agents

To train IL agents, use run/train_il.sh and modify the arguments to select different settings. Training IL agents does not require CARLA and is a GPU-heavy task. Therefore, we recommend using AWS p-instances or your cluster to run the IL training. Our implementation follows DA-RB (paper, repo), which trains a CILRS (paper, repo) agent using DAGGER.

The training starts with training the basic CILRS via behavior cloning using an off-policy dataset.

  1. Collect an off-policy (BC) dataset.
  2. Train the IL model.
  3. Benchmark the trained model.

Then repeat the following DAGGER steps until the model achieves decent results.

  1. Collect an on-policy DAGGER dataset.
  2. Train the IL model.
  3. Benchmark the trained model.

For the BC training, the following arguments have to be set.

  • Datasets
    • dagger_datasets: a vector of strings, for BC training it should only contain the path (local or W&B) to the BC dataset.
  • Measurement vector
    • agent.cilrs.env_wrapper.kwargs.input_states can be a subset of [speed,vec,cmd]
      • speed: scalar ego-vehicle speed
      • vec: 2D vector pointing to the next GNSS waypoint
      • cmd: one-hot vector of high-level command
  • Branching
    • For 6 branches:
      • agent.cilrs.policy.kwargs.number_of_branches=6
      • agent.cilrs.training.kwargs.branch_weights=[1.0,1.0,1.0,1.0,1.0,1.0]
    • For 1 branch:
      • agent.cilrs.policy.kwargs.number_of_branches=1
      • agent.cilrs.training.kwargs.branch_weights=[1.0]
  • Action Loss
    • L1 action loss
      • agent.cilrs.env_wrapper.kwargs.action_distribution=null
      • agent.cilrs.training.kwargs.action_kl=false
    • KL loss
      • agent.cilrs.env_wrapper.kwargs.action_distribution="beta_shared"
      • agent.cilrs.training.kwargs.action_kl=true
  • Value Loss
    • Disable
      • agent.cilrs.env_wrapper.kwargs.value_as_supervision=false
      • agent.cilrs.training.kwargs.value_weight=0.0
    • Enable
      • agent.cilrs.env_wrapper.kwargs.value_as_supervision=true
      • agent.cilrs.training.kwargs.value_weight=0.001
  • Pre-trained action/value head
    • agent.cilrs.rl_run_path and agent.cilrs.rl_ckpt_step are used to initialize the IL agent's action/value heads with Roach's action/value head.
  • Feature Loss
    • Disable
      • agent.cilrs.env_wrapper.kwargs.dim_features_supervision=0
      • agent.cilrs.training.kwargs.features_weight=0.0
    • Enable
      • agent.cilrs.env_wrapper.kwargs.dim_features_supervision=256
      • agent.cilrs.training.kwargs.features_weight=0.05
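The branching and loss options above come in pairs of settings that have to be switched together. The sketch below gathers them into one dictionary so that the pairs stay consistent. The function `bc_loss_settings` and its parameter names are hypothetical; the keys and values are the ones listed above, and only the branch counts mentioned in this README (1 and 6) are accepted.

```python
def bc_loss_settings(kl_loss, value_loss, feature_loss, branches=1):
    """Build the paired BC training settings listed in this README (hypothetical helper)."""
    if branches not in (1, 6):
        raise ValueError("this README only lists 1 or 6 branches")
    env = "agent.cilrs.env_wrapper.kwargs"
    train = "agent.cilrs.training.kwargs"
    return {
        # Branching
        "agent.cilrs.policy.kwargs.number_of_branches": branches,
        f"{train}.branch_weights": [1.0] * branches,
        # Action loss: KL loss if kl_loss is True, otherwise L1 action loss
        f"{env}.action_distribution": "beta_shared" if kl_loss else None,
        f"{train}.action_kl": kl_loss,
        # Value loss
        f"{env}.value_as_supervision": value_loss,
        f"{train}.value_weight": 0.001 if value_loss else 0.0,
        # Feature loss
        f"{env}.dim_features_supervision": 256 if feature_loss else 0,
        f"{train}.features_weight": 0.05 if feature_loss else 0.0,
    }


settings = bc_loss_settings(kl_loss=True, value_loss=True, feature_loss=False, branches=6)
for key, value in settings.items():
    print(key, value)
```

A value of `None` corresponds to `null` in the configuration.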

During the DAGGER training, a trained IL agent will be loaded and you cannot change the configuration any more. You will have to set:

  • agent.cilrs.wb_run_path: the W&B run path where the previous IL training was logged and the checkpoints are saved.
  • agent.cilrs.wb_ckpt_step: the step of the checkpoint you want to use. Leaving it as null loads the latest checkpoint.
  • dagger_datasets: a vector of strings, the W&B run paths or local paths to the DAGGER datasets and the BC dataset in time-reversed order, for example [PATH_DAGGER_DATA_2, PATH_DAGGER_DATA_1, PATH_DAGGER_DATA_0, BC_DATA].
  • train_epochs: optional; increase it if you want to train for more epochs.
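The time-reversed order of dagger_datasets can be produced from the datasets in the order they were collected. This is a minimal sketch; `build_dagger_datasets` is a hypothetical helper and the path strings are the placeholders used in the example above.

```python
def build_dagger_datasets(bc_dataset, dagger_in_collection_order):
    """Return datasets newest first, with the BC dataset last."""
    return list(reversed(dagger_in_collection_order)) + [bc_dataset]


collected = ["PATH_DAGGER_DATA_0", "PATH_DAGGER_DATA_1", "PATH_DAGGER_DATA_2"]
print(build_dagger_datasets("BC_DATA", collected))
```

For the BC training, passing an empty list of DAGGER datasets yields a vector that contains only the BC dataset.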

Citation

Please cite our work if you find it useful:

@inproceedings{zhang2021roach,
  title = {End-to-End Urban Driving by Imitating a Reinforcement Learning Coach},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  author = {Zhang, Zhejun and Liniger, Alexander and Dai, Dengxin and Yu, Fisher and Van Gool, Luc},
  year = {2021},
}

License

This software is released under a CC-BY-NC 4.0 license, which allows personal and research use only. For a commercial license, please contact the authors. You can view a license summary here.

Portions of source code taken from external sources are annotated with links to original files and their corresponding licenses.

Acknowledgements

This work was supported by Toyota Motor Europe and was carried out at the TRACE Lab at ETH Zurich (Toyota Research on Automated Cars in Europe - Zurich).
