Code and models for ICCV2021 paper "Robust Object Detection via Instance-Level Temporal Cycle Confusion".

Overview

Robust Object Detection via Instance-Level Temporal Cycle Confusion

This repo contains the implementation of the ICCV 2021 paper, Robust Object Detection via Instance-Level Temporal Cycle Confusion.

Screenshot

Building reliable object detectors that are robust to domain shifts, such as various changes in context, viewpoint, and object appearances, is critical for real world applications. In this work, we study the effectiveness of auxiliary self-supervised tasks to improve out-of-distribution generalization of object detectors. Inspired by the principle of maximum entropy, we introduce a novel self-supervised task, instance-level cycle confusion (CycConf), which operates on the region features of the object detectors. For each object, the task is to find the most different object proposals in the adjacent frame in a video and then cycle back to itself for self-supervision. CycConf encourages the object detector to explore invariant structures across instances under various motion, which leads to improved model robustness in unseen domains at test time. We observe consistent out-of-domain performance improvements when training object detectors in tandem with self-supervised tasks on various domain adaptation benchmarks with static images (Cityscapes, Foggy Cityscapes, Sim10K) and large-scale video datasets (BDD100K and Waymo open data).

Installation

Environment

  • CUDA 10.2
  • Python >= 3.7
  • Pytorch >= 1.6
  • THe Detectron2 version matches Pytorch and CUDA versions.

Dependencies

  1. Create a virtual env.
  • python3 -m pip install --user virtualenv
  • python3 -m venv cyc-conf
  • source cyc-conf/bin/activate
  1. Install dependencies.
  • pip install -r requirements.txt

  • Install Pytorch 1.9

pip3 install torch torchvision

Check out the previous Pytorch versions here.

  • Install Detectron2 Build Detectron2 from Source (gcc & g++ >= 5.4) python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'

Or, you can install Pre-built detectron2 (example for CUDA 10.2, Pytorch 1.9)

python -m pip install detectron2 -f \ https://dl.fbaipublicfiles.com/detectron2/wheels/cu102/torch1.9/index.html

More details can be found here.

Data Preparation

BDD100K

  1. Download the BDD100K MOT 2020 dataset (MOT 2020 Images and MOT 2020 Labels) and the detection labels (Detection 2020 Labels) here and the detailed description is available here. Put the BDD100K data under datasets/ in this repo. After downloading the data, the folder structure should be like below:
├── datasets
│   ├── bdd100k
│   │   ├── images
│   │   │    └── track
│   │   │        ├── train
│   │   │        ├── val
│   │   │        └── test
│   │   └── labels
│   │        ├── box_track_20
│   │        │   ├── train
│   │        │   └── val
│   │        └── det_20
│   │            ├── det_train.json
│   │            └── det_val.json
│   ├── waymo

Convert the labels of the MOT 2020 data (train & val sets) into COCO format by running:

python3 datasets/bdd100k2coco.py -i datasets/bdd100k/labels/box_track_20/val/ -o datasets/bdd100k/labels/track/bdd100k_mot_val_coco.json -m track
python3 datasets/bdd100k2coco.py -i datasets/bdd100k/labels/box_track_20/train/ -o datasets/bdd100k/labels/track/bdd100k_mot_train_coco.json -m track
  1. Split the original videos into different domains (time of day). Run the following command:
python3 -m datasets.domain_splits_bdd100k

This script will first extract the domain attributes from the BDD100K detection set and then map them to the tracking set sequences. After the processing steps, you would see two additional folders domain_splits and per_seq under the datasets/bdd100k/labels/box_track_20. The domain splits of all attributes in BDD100K detection set can be found at datasets/bdd100k/labels/domain_splits.

Waymo

  1. Download the Waymo dataset here. Put the Waymo raw data under datasets/ in this repo. After downloading the data, the folder structure should be like below:
├── datasets
│   ├── bdd100k
│   ├── waymo
│   │   └── raw

Convert the raw TFRecord data files into COCO format by running:

python3 -m datasets.waymo2coco

Note that this script takes a long time to run, be prepared to keep it running for over a day.

  1. Convert the BDD100K dataset labels into 3 classes (originally 8). This needs to be done in order to match the 3 classes of the Waymo dataset. Run the following command:
python3 -m datasets.convert_bdd_3cls

Get Started

For joint training,

python3 -m tools.train_net --config-file [config_file] --num-gpus 8

For evaluation,

python3 -m tools.train_net --config-file [config_file] --num-gpus [num] --eval-only

This command will load the latest checkpoint in the folder. If you want to specify a different checkpoint or evaluate the pretrained checkpoints, you can run

python3 -m tools.train_net --config-file [config_file] --num-gpus [num] --eval-only MODEL.WEIGHTS [PATH_TO_CHECKPOINT]

Benchmark Results

Dataset Statistics

Dataset Split Seq frames/seq. boxes classes
BDD100K Daytime train 757 204 1.82M 8
val 108 204 287K 8
BDD100K Night train 564 204 895K 8
val 71 204 137K 8
Waymo Open Data train 798 199 3.64M 3
val 202 199 886K 3

Out of Domain Evaluation

BDD100K Daytime to Night. The base detector is Faster R-CNN with ResNet-50.

Model AP AP50 AP75 APs APm APl Config Checkpoint
Faster R-CNN 17.84 31.35 17.68 4.92 16.15 35.56 link link
+ Rotation 18.58 32.95 18.15 5.16 16.93 36.00 link link
+ Jigsaw 17.47 31.22 16.81 5.08 15.80 33.84 link link
+ Cycle Consistency 18.35 32.44 18.07 5.04 17.07 34.85 link link
+ Cycle Confusion 19.09 33.58 19.14 5.70 17.68 35.86 link link

BDD100K Night to Daytime.

Model AP AP50 AP75 APs APm APl Config Checkpoint
Faster R-CNN 19.14 33.04 19.16 5.38 21.42 40.34 link link
+ Rotation 19.07 33.25 18.83 5.53 21.32 40.06 link link
+ Jigsaw 19.22 33.87 18.71 5.67 22.35 38.57 link link
+ Cycle Consistency 18.89 33.50 18.31 5.82 21.01 39.13 link link
+ Cycle Confusion 19.57 34.34 19.26 6.06 22.55 38.95 link link

Waymo Front Left to BDD100K Night.

Model AP AP50 AP75 APs APm APl Config Checkpoint
Faster R-CNN 10.07 19.62 9.05 2.67 10.81 18.62 link link
+ Rotation 11.34 23.12 9.65 3.53 11.73 21.60 link link
+ Jigsaw 9.86 19.93 8.40 2.77 10.53 18.82 link link
+ Cycle Consistency 11.55 23.44 10.00 2.96 12.19 21.99 link link
+ Cycle Confusion 12.27 26.01 10.24 3.44 12.22 23.56 link link

Waymo Front Right to BDD100K Night.

Model AP AP50 AP75 APs APm APl Config Checkpoint
Faster R-CNN 8.65 17.26 7.49 1.76 8.29 19.99 link link
+ Rotation 9.25 18.48 8.08 1.85 8.71 21.08 link link
+ Jigsaw 8.34 16.58 7.26 1.61 8.01 18.09 link link
+ Cycle Consistency 9.11 17.92 7.98 1.78 9.36 19.18 link link
+ Cycle Confusion 9.99 20.58 8.30 2.18 10.25 20.54 link link

Citation

If you find this repository useful for your publications, please consider citing our paper.

@article{wang2021robust,
  title={Robust Object Detection via Instance-Level Temporal Cycle Confusion},
  author={Wang, Xin and Huang, Thomas E and Liu, Benlin and Yu, Fisher and Wang, Xiaolong and Gonzalez, Joseph E and Darrell, Trevor},
  journal={International Conference on Computer Vision (ICCV)},
  year={2021}
}
Owner
Xin Wang
Researcher from Microsoft Research. Prev. Ph.D. student at UC Berkeley.
Xin Wang
Barlow Twins and HSIC

Barlow Twins and HSIC Unofficial Pytorch implementation for Barlow Twins and HSIC_SSL on small datasets (CIFAR10, STL10, and Tiny ImageNet). Correspon

Yao-Hung Hubert Tsai 49 Nov 24, 2022
PyTorch Implementation of DiffGAN-TTS: High-Fidelity and Efficient Text-to-Speech with Denoising Diffusion GANs

DiffGAN-TTS - PyTorch Implementation PyTorch implementation of DiffGAN-TTS: High

Keon Lee 157 Jan 01, 2023
Linear image-to-image translation

Linear (Un)supervised Image-to-Image Translation Examples for linear orthogonal transformations in PCA domain, learned without pairing supervision. Tr

Eitan Richardson 40 Aug 31, 2022
Adaptive Prototype Learning and Allocation for Few-Shot Segmentation (CVPR 2021)

ASGNet The code is for the paper "Adaptive Prototype Learning and Allocation for Few-Shot Segmentation" (accepted to CVPR 2021) [arxiv] Overview data/

Gen Li 91 Dec 23, 2022
Heterogeneous Temporal Graph Neural Network

Heterogeneous Temporal Graph Neural Network This repository contains the datasets and source code of HTGNN. run_mag.ipynb is the training and testing

15 Dec 22, 2022
Change is Everywhere: Single-Temporal Supervised Object Change Detection in Remote Sensing Imagery (ICCV 2021)

Change is Everywhere Single-Temporal Supervised Object Change Detection in Remote Sensing Imagery by Zhuo Zheng, Ailong Ma, Liangpei Zhang and Yanfei

Zhuo Zheng 125 Dec 13, 2022
A computational optimization project towards the goal of gerrymandering the results of a hypothetical election in the UK.

A computational optimization project towards the goal of gerrymandering the results of a hypothetical election in the UK.

Emma 1 Jan 18, 2022
ShuttleNet: Position-aware Fusion of Rally Progress and Player Styles for Stroke Forecasting in Badminton (AAAI'22)

ShuttleNet: Position-aware Rally Progress and Player Styles Fusion for Stroke Forecasting in Badminton (AAAI 2022) Official code of the paper ShuttleN

Wei-Yao Wang 11 Nov 30, 2022
High-Resolution Image Synthesis with Latent Diffusion Models

Latent Diffusion Models Requirements A suitable conda environment named ldm can be created and activated with: conda env create -f environment.yaml co

CompVis Heidelberg 5.6k Jan 04, 2023
“袋鼯麻麻——智能购物平台”能够精准地定位识别每一个商品

“袋鼯麻麻——智能购物平台”能够精准地定位识别每一个商品,并且能够返回完整地购物清单及顾客应付的实际商品总价格,极大地降低零售行业实际运营过程中巨大的人力成本,提升零售行业无人化、自动化、智能化水平。

thomas-yanxin 192 Jan 05, 2023
METER: Multimodal End-to-end TransformER

METER Code and pre-trained models will be publicized soon. Citation @article{dou2021meter, title={An Empirical Study of Training End-to-End Vision-a

Zi-Yi Dou 257 Jan 06, 2023
Code accompanying the paper "How Tight Can PAC-Bayes be in the Small Data Regime?"

How Tight Can PAC-Bayes be in the Small Data Regime? This is the code to reproduce all experiments for the following paper: @inproceedings{Foong:2021:

5 Dec 21, 2021
A python module for configuration of block devices

Blivet is a python module for system storage configuration. CI status Licence See COPYING Installation From Fedora repositories Blivet is available in

78 Dec 14, 2022
Source code for GNN-LSPE (Graph Neural Networks with Learnable Structural and Positional Representations)

Graph Neural Networks with Learnable Structural and Positional Representations Source code for the paper "Graph Neural Networks with Learnable Structu

Vijay Prakash Dwivedi 180 Dec 22, 2022
Image-generation-baseline - MUGE Text To Image Generation Baseline

MUGE Text To Image Generation Baseline Requirements and Installation More detail

23 Oct 17, 2022
TSIT: A Simple and Versatile Framework for Image-to-Image Translation

TSIT: A Simple and Versatile Framework for Image-to-Image Translation This repository provides the official PyTorch implementation for the following p

Liming Jiang 255 Nov 23, 2022
Pytorch implementation of Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors

Make-A-Scene - PyTorch Pytorch implementation (inofficial) of Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors (https://arxiv.org/

Casual GAN Papers 259 Dec 28, 2022
Self-Supervised Image Denoising via Iterative Data Refinement

Self-Supervised Image Denoising via Iterative Data Refinement Yi Zhang1, Dasong Li1, Ka Lung Law2, Xiaogang Wang1, Hongwei Qin2, Hongsheng Li1 1CUHK-S

Zhang Yi 72 Jan 01, 2023
Fusion-DHL: WiFi, IMU, and Floorplan Fusion for Dense History of Locations in Indoor Environments

Fusion-DHL: WiFi, IMU, and Floorplan Fusion for Dense History of Locations in Indoor Environments Paper: arXiv (ICRA 2021) Video : https://youtu.be/CC

Sachini Herath 68 Jan 03, 2023
TensorFlow GNN is a library to build Graph Neural Networks on the TensorFlow platform.

TensorFlow GNN This is an early (alpha) release to get community feedback. It's under active development and we may break API compatibility in the fut

889 Dec 30, 2022