Roach: End-to-End Urban Driving by Imitating a Reinforcement Learning Coach

Overview

CARLA-Roach

This is the official code release of the paper
End-to-End Urban Driving by Imitating a Reinforcement Learning Coach
by Zhejun Zhang, Alexander Liniger, Dengxin Dai, Fisher Yu and Luc van Gool, accepted at ICCV 2021.

It contains the code for benchmark, off-policy data collection, on-policy data collection, RL training and IL training with DAGGER. It also contains trained models of RL experts and IL agents. The supplementary videos can be found at the paper's homepage.

Installation

Please refer to INSTALL.md for installation. We use AWS EC2, but you can also install and run all experiments on your computer or cluster.

Quick Start: Collect an expert dataset using Roach

Roach is an end-to-end trained agent that drives better and more naturally than hand-crafted CARLA experts. To collect a dataset from Roach, use run/data_collect_bc.sh and modify the following arguments:

  • save_to_wandb: set to False if you don't want to upload the dataset to W&B.
  • dataset_root: local directory for saving the dataset.
  • test_suites: default is eu_data which collects data in Town01 for the NoCrash-dense benchmark. Available configurations are found here. You can also create your own configuration.
  • n_episodes: how many episodes to collect, each episode will be saved to a separate h5 file.
  • agent/cilrs/obs_configs: observation (i.e. sensor) configuration, default is central_rgb_wide. Available configurations are found here. You can also create your own configuration.
  • inject_noise: default is True. As introduced in CILRS, triangular noise is injected to steering and throttle such that the ego-vehicle does not always follow the lane center. Very useful for imitation learning.
  • actors.hero.terminal.kwargs.max_time: Maximum duration of an episode, in seconds.
  • Early stop the episode if traffic rule is violated, such that the collected dataset is error-free.
    • actors.hero.terminal.kwargs.no_collision: default is True.
    • actors.hero.terminal.kwargs.no_run_rl: default is False.
    • actors.hero.terminal.kwargs.no_run_stop: default is False.

Benchmark

To benchmark checkpoints, use run/benchmark.sh and modify the arguments to select different settings. We recommend g4dn.xlarge with 50 GB free disk space for video recording. Use screen if you want to run it in the background

screen -L -Logfile ~/screen.log -d -m run/benchmark.sh

Trained Models

The trained models are hosted here on W&B. Given the corresponding W&B run path, our code will automatically download and load the checkpoint with the configuration yaml file.

The following checkpoints are used to produce the results reported in our paper.

  • To benchmark the Autopilot, use benchmark() with agent="roaming".
  • To benchmark the RL experts, use benchmark() with agent="ppo" and set agent.ppo.wb_run_path to one of the following.
    • iccv21-roach/trained-models/1929isj0: Roach
    • iccv21-roach/trained-models/1ch63m76: PPO+beta
    • iccv21-roach/trained-models/10pscpih: PPO+exp
  • To benchmark the IL agents, use benchmark() with agent="cilrs" and set agent.cilrs.wb_run_path to one of the following.
    • Checkpoints trained for the NoCrash benchmark, at DAGGER iteration 5:
      • iccv21-roach/trained-models/39o1h862: L_A(AP)
      • iccv21-roach/trained-models/v5kqxe3i: L_A
      • iccv21-roach/trained-models/t3x557tv: L_K
      • iccv21-roach/trained-models/1w888p5d: L_K+L_V
      • iccv21-roach/trained-models/2tfhqohp: L_K+L_F
      • iccv21-roach/trained-models/3vudxj38: L_K+L_V+L_F
      • iccv21-roach/trained-models/31u9tki7: L_K+L_F(c)
      • iccv21-roach/trained-models/aovrm1fs: L_K+L_V+L_F(c)
    • Checkpoints trained for the LeaderBoard benchmark, at DAGGER iteration 5:
      • iccv21-roach/trained-models/1myvm4mw: L_A(AP)
      • iccv21-roach/trained-models/nw226h5h: L_A
      • iccv21-roach/trained-models/12uzu2lu: L_K
      • iccv21-roach/trained-models/3ar2gyqw: L_K+L_V
      • iccv21-roach/trained-models/9rcwt5fh: L_K+L_F
      • iccv21-roach/trained-models/2qq2rmr1: L_K+L_V+L_F
      • iccv21-roach/trained-models/zwadqx9z: L_K+L_F(c)
      • iccv21-roach/trained-models/21trg553: L_K+L_V+L_F(c)

Available Test Suites

Set argument test_suites to one of the following.

  • NoCrash-busy
    • eu_test_tt: NoCrash, busy traffic, train town & train weather
    • eu_test_tn: NoCrash, busy traffic, train town & new weather
    • eu_test_nt: NoCrash, busy traffic, new town & train weather
    • eu_test_nn: NoCrash, busy traffic, new town & new weather
    • eu_test: eu_test_tt/tn/nt/nn, all 4 conditions in one file
  • NoCrash-dense
    • nocrash_dense: NoCrash, dense traffic, all 4 conditions
  • LeaderBoard:
    • lb_test_tt: LeaderBoard, busy traffic, train town & train weather
    • lb_test_tn: LeaderBoard, busy traffic, train town & new weather
    • lb_test_nt: LeaderBoard, busy traffic, new town & train weather
    • lb_test_nn: LeaderBoard, busy traffic, new town & new weather
    • lb_test: lb_test_tt/tn/nt/nn all, 4 conditions in one file
  • LeaderBoard-all
    • cc_test: LeaderBoard, busy traffic, all 76 routes, dynamic weather

Collect Datasets

We recommend g4dn.xlarge for dataset collecting. Make sure you have enough disk space attached to the instance.

Collect Off-Policy Datasets

To collect off-policy datasets, use run/data_collect_bc.sh and modify the arguments to select different settings. You can use Roach (given a checkpoint) or the Autopilot to collect off-policy datasets. In our paper, before the DAGGER training the IL agents are initialized via behavior cloning (BC) using an off-policy dataset collected in this way.

Some arguments you may want to modify:

  • Set save_to_wandb=False if you don't want to upload the dataset to W&B.
  • Select the environment for collecting data by setting the argument test_suites to one of the following
    • eu_data: NoCrash, train town & train weather. We collect n_episodes=80 for BC dataset on NoCrash, that is around 75 GB and 6 hours of data.
    • lb_data: LeaderBoard, train town & train weather. We collect n_episodes=160 for BC dataset on LeaderBoard, that is around 150 GB and 12 hours of data.
    • cc_data: CARLA Challenge, all six maps (Town1-6), dynamic weather. We collect n_episodes=240 for BC dataset on CARLA Challenge, that is around 150 GB and 18 hours of data.
  • For RL experts, the used checkpoint is set via agent.ppo.wb_run_path and agent.ppo.wb_ckpt_step.
    • agent.ppo.wb_run_path is the W&B run path where the RL training is logged and the checkpoints are saved.
    • agent.ppo.wb_ckpt_step is the step of the checkpoint you want to use. If it's an integer, the script will find the checkpoint closest to that step. If it's null, the latest checkpoint will be used.

Collect On-Policy Datasets

To collect on-policy datasets, use run/data_collect_dagger.sh and modify the arguments to select different settings. You can use Roach or the Autopilot to label on-policy (DAGGER) datasets generated by an IL agent (given a checkpoint). This is done by running the data_collect.py using an IL agent as the driver, and Roach/Autopilot as the coach. So the expert supervisions are generated and recorded on the fly.

Most things are the same as collecting off-policy BC datasets. Here are some changes:

  • Set agent.cilrs.wb_run_path to the W&B run path where the IL training is logged and the checkpoints are saved.
  • By adjusting n_episodes we make sure the size of the DAGGER dataset at each iteration to be around 20% of the BC dataset size.
    • For RL experts we use an n_episodes which is the half of n_episodes of the BC dataset.
    • For the Autopilot we use an n_episodes which is the same as n_episodes of the BC dataset.

Train RL Experts

To train RL experts, use run/train_rl.sh and modify the arguments to select different settings. We recommend to use g4dn.4xlarge for training the RL experts, you will need around 50 GB free disk space for videos and checkpoints. We train RL experts on CARLA 0.9.10.1 because 0.9.11 crashes more often for unknown reasons.

Train IL Agents

To train IL agents, use run/train_il.sh and modify the arguments to select different settings. Training IL agents does not require CARLA and it's a GPU-heavy task. Therefore, we recommend to use AWS p-instances or your cluster to run the IL training. Our implementation follows DA-RB (paper, repo), which trains a CILRS (paper, repo) agent using DAGGER.

The training starts with training the basic CILRS via behavior cloning using an off-policy dataset.

  1. Collect off-policy DAGGER dataset.
  2. Train the IL model.
  3. Benchmark the trained model.

Then repeat the following DAGGER steps until the model achieves decent results.

  1. Collect on-policy DAGGER dataset.
  2. Train the IL model.
  3. Benchmark the trained model.

For the BC training,the following arguments have to be set.

  • Datasets
    • dagger_datasets: a vector of strings, for BC training it should only contain the path (local or W&B) to the BC dataset.
  • Measurement vector
    • agent.cilrs.env_wrapper.kwargs.input_states can be a subset of [speed,vec,cmd]
    • speed: scalar ego_vehicle speed
    • vec: 2D vector pointing to the next GNSS waypoint
    • cmd: one-hot vector of high-level command
  • Branching
    • For 6 branches:
      • agent.cilrs.policy.kwargs.number_of_branches=6
      • agent.cilrs.training.kwargs.branch_weights=[1.0,1.0,1.0,1.0,1.0,1.0]
    • For 1 branch:
      • agent.cilrs.policy.kwargs.number_of_branches=1
      • agent.cilrs.training.kwargs.branch_weights=[1.0]
  • Action Loss
    • L1 action loss
      • agent.cilrs.env_wrapper.kwargs.action_distribution=null
      • agent.cilrs.training.kwargs.action_kl=false
    • KL loss
      • agent.cilrs.env_wrapper.kwargs.action_distribution="beta_shared"
      • agent.cilrs.training.kwargs.action_kl=true
  • Value Loss
    • Disable
      • agent.cilrs.env_wrapper.kwargs.value_as_supervision=false
      • agent.cilrs.training.kwargs.value_weight=0.0
    • Enable
      • agent.cilrs.env_wrapper.kwargs.value_as_supervision=true
      • agent.cilrs.training.kwargs.value_weight=0.001
  • Pre-trained action/value head
    • agent.cilrs.rl_run_path and agent.cilrs.rl_ckpt_step are used to initialize the IL agent's action/value heads with Roach's action/value head.
  • Feature Loss
    • Disable
      • agent.cilrs.env_wrapper.kwargs.dim_features_supervision=0
      • agent.cilrs.training.kwargs.features_weight=0.0
    • Enable
      • agent.cilrs.env_wrapper.kwargs.dim_features_supervision=256
      • agent.cilrs.training.kwargs.features_weight=0.05

During the DAGGER training, a trained IL agent will be loaded and you cannot change the configuration any more. You will have to set

  • agent.cilrs.wb_run_path: the W&B run path where the previous IL training was logged and the checkpoints are saved.
  • agent.cilrs.wb_ckpt_step: the step of the checkpoint you want to use. Leave it as null will load the latest checkpoint.
  • dagger_datasets: vector of strings, W&B run path or local path to DAGGER datasets and the BC dataset in time-reversed order, for example [PATH_DAGGER_DATA_2, PATH_DAGGER_DATA_1, PATH_DAGGER_DATA_0, BC_DATA]
  • train_epochs: optionally you can change it if you want to train for more epochs.

Citation

Please cite our work if you found it useful:

@inproceedings{zhang2021roach,
  title = {End-to-End Urban Driving by Imitating a Reinforcement Learning Coach},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  author = {Zhang, Zhejun and Liniger, Alexander and Dai, Dengxin and Yu, Fisher and Van Gool, Luc},
  year = {2021},
}

License

This software is released under a CC-BY-NC 4.0 license, which allows personal and research use only. For a commercial license, please contact the authors. You can view a license summary here.

Portions of source code taken from external sources are annotated with links to original files and their corresponding licenses.

Acknowledgements

This work was supported by Toyota Motor Europe and was carried out at the TRACE Lab at ETH Zurich (Toyota Research on Automated Cars in Europe - Zurich).

Owner
Zhejun Zhang
PhD Candidate at CVL, ETH Zurich
Zhejun Zhang
Learning Time-Critical Responses for Interactive Character Control

Learning Time-Critical Responses for Interactive Character Control Abstract This code implements the paper Learning Time-Critical Responses for Intera

Movement Research Lab 227 Dec 31, 2022
PyTorch implementation for Convolutional Networks with Adaptive Inference Graphs

Convolutional Networks with Adaptive Inference Graphs (ConvNet-AIG) This repository contains a PyTorch implementation of the paper Convolutional Netwo

Andreas Veit 176 Dec 07, 2022
Research code for Arxiv paper "Camera Motion Agnostic 3D Human Pose Estimation"

GMR(Camera Motion Agnostic 3D Human Pose Estimation) This repo provides the source code of our arXiv paper: Seong Hyun Kim, Sunwon Jeong, Sungbum Park

Seong Hyun Kim 1 Feb 07, 2022
Local Multi-Head Channel Self-Attention for FER2013

LHC-Net Local Multi-Head Channel Self-Attention This repository is intended to provide a quick implementation of the LHC-Net and to replicate the resu

12 Jan 04, 2023
This package is for running the semantic SLAM algorithm using extracted planar surfaces from the received detection

Semantic SLAM This package can perform optimization of pose estimated from VO/VIO methods which tend to drift over time. It uses planar surfaces extra

Hriday Bavle 125 Dec 02, 2022
Dealing With Misspecification In Fixed-Confidence Linear Top-m Identification

Dealing With Misspecification In Fixed-Confidence Linear Top-m Identification This repository is the official implementation of [Dealing With Misspeci

0 Oct 25, 2021
BlockUnexpectedPackets - Preventing BungeeCord CPU overload due to Layer 7 DDoS attacks by scanning BungeeCord's logs

BlockUnexpectedPackets This script automatically blocks DDoS attacks that are sp

SparklyPower 3 Mar 31, 2022
Repository for Traffic Accident Benchmark for Causality Recognition (ECCV 2020)

Causality In Traffic Accident (Under Construction) Repository for Traffic Accident Benchmark for Causality Recognition (ECCV 2020) Overview Data Prepa

Tackgeun 21 Nov 20, 2022
Identifying Stroke Indicators Using Rough Sets

Identifying Stroke Indicators Using Rough Sets With the spirit of reproducible research, this repository contains all the codes required to produce th

Muhammad Salman Pathan 0 Jun 09, 2022
This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows" on Semantic Segmentation.

Swin Transformer for Semantic Segmentation of satellite images This repo contains the supported code and configuration files to reproduce semantic seg

23 Oct 10, 2022
Paaster is a secure by default end-to-end encrypted pastebin built with the objective of simplicity.

Follow the development of our desktop client here Paaster Paaster is a secure by default end-to-end encrypted pastebin built with the objective of sim

Ward 211 Dec 25, 2022
2021 credit card consuming recommendation

2021 credit card consuming recommendation

Wang, Chung-Che 7 Mar 08, 2022
Deal or No Deal? End-to-End Learning for Negotiation Dialogues

Introduction This is a PyTorch implementation of the following research papers: (1) Hierarchical Text Generation and Planning for Strategic Dialogue (

Facebook Research 1.4k Dec 29, 2022
Image Segmentation and Object Detection in Pytorch

Image Segmentation and Object Detection in Pytorch Pytorch-Segmentation-Detection is a library for image segmentation and object detection with report

Daniil Pakhomov 732 Dec 10, 2022
The Habitat-Matterport 3D Research Dataset - the largest-ever dataset of 3D indoor spaces.

Habitat-Matterport 3D Dataset (HM3D) The Habitat-Matterport 3D Research Dataset is the largest-ever dataset of 3D indoor spaces. It consists of 1,000

Meta Research 62 Dec 27, 2022
Repository for the Bias Benchmark for QA dataset.

BBQ Repository for the Bias Benchmark for QA dataset. Authors: Alicia Parrish, Angelica Chen, Nikita Nangia, Vishakh Padmakumar, Jason Phang, Jana Tho

ML² AT CILVR 18 Nov 18, 2022
Irrigation controller for Home Assistant

Irrigation Unlimited This integration is for irrigation systems large and small. It can offer some complex arrangements without large and messy script

Robert Cook 176 Jan 02, 2023
Hand tracking demo for DIY Smart Glasses with a remote computer doing the work

CameraStream This is a demonstration that streams the image from smartglasses to a pc, does the hand recognition on the remote pc and streams the proc

Teemu Laurila 20 Oct 13, 2022
An implementation of "Learning human behaviors from motion capture by adversarial imitation"

Merel-MoCap-GAIL An implementation of Merel et al.'s paper on generative adversarial imitation learning (GAIL) using motion capture (MoCap) data: Lear

Yu-Wei Chao 34 Nov 12, 2022
A high-performance anchor-free YOLO. Exceeding yolov3~v5 with ONNX, TensorRT, NCNN, and Openvino supported.

YOLOX is an anchor-free version of YOLO, with a simpler design but better performance! It aims to bridge the gap between research and industrial communities. For more details, please refer to our rep

7.7k Jan 06, 2023