The implementation of "Bootstrapping Semantic Segmentation with Regional Contrast".

Overview

ReCo - Regional Contrast

This repository contains the source code of ReCo and baselines from the paper, Bootstrapping Semantic Segmentation with Regional Contrast, introduced by Shikun Liu, Shuaifeng Zhi, Edward Johns, and Andrew Davison.

Check out our project page for more qualitative results.

Datasets

ReCo is evaluated on three datasets, CityScapes, PASCAL VOC and SUN RGB-D, in the full label mode; CityScapes and PASCAL VOC are additionally evaluated in the partial label mode.

  • For CityScapes, please download the original dataset from the official CityScapes site: leftImg8bit_trainvaltest.zip and gtFine_trainvaltest.zip. Create the dataset/cityscapes folder and extract both archives into it.
  • For Pascal VOC, please download the original training images from the official PASCAL site: VOCtrainval_11-May-2012.tar and the augmented labels here: SegmentationClassAug.zip. Extract the JPEGImages and SegmentationClassAug folders into the corresponding dataset/pascal folder.
  • For SUN RGB-D, please download the train dataset here: SUNRGBD-train_images.tgz, test dataset here: SUNRGBD-test_images.tgz and labels here: sunrgbd_train_test_labels.tar.gz. Extract and place them into the corresponding dataset/sun folder.

After making sure all datasets have been downloaded and placed correctly, run python dataset/{DATASET}_preprocess.py for each dataset to pre-process it for the experiments. The preprocessing scripts also generate partial labels for the CityScapes and Pascal VOC datasets with three random seeds. Feel free to modify the partial label size and random seed to suit your own research setting.
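As a rough illustration of what "partial labels" means here, the sketch below keeps a random subset of pixels for each class in one label map and marks every other pixel as unlabelled. It follows the partial flag convention described in this README (p0 keeps a single pixel per class per image; p1, p5 and p25 keep 1%, 5% and 25%), but it is not the repository's preprocessing code: the helper name make_partial_label and the ignore value 255 are assumptions.

```python
import numpy as np

IGNORE_INDEX = 255  # assumption: label value used for unlabelled pixels


def make_partial_label(label, partial, rng):
    """Keep a random subset of pixels per class; all others become IGNORE_INDEX.

    Hypothetical helper. partial: "p0" keeps 1 pixel per class,
    "p1" / "p5" / "p25" keep 1% / 5% / 25% of each class's pixels.
    """
    partial_label = np.full_like(label, IGNORE_INDEX)
    for cls in np.unique(label):
        if cls == IGNORE_INDEX:
            continue
        # Flat indices of all pixels belonging to this class.
        idx = np.flatnonzero(label == cls)
        if partial == "p0":
            n_keep = 1
        else:
            ratio = int(partial[1:]) / 100
            n_keep = max(1, int(round(ratio * idx.size)))
        # Sample without replacement so each kept pixel is distinct.
        keep = rng.choice(idx, size=n_keep, replace=False)
        partial_label.flat[keep] = cls
    return partial_label
```

Passing an explicitly seeded np.random.default_rng(seed) makes the sampled pixels reproducible, which mirrors the idea of generating partial labels with fixed random seeds.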

For the lazy ones: just download the off-the-shelf pre-processed datasets here: CityScapes, Pascal VOC and SUN RGB-D.

Training Supervised and Semi-supervised Models

In this paper, we introduce two novel training modes for semi-supervised learning.

  1. Full Labels Partial Dataset: A sparse subset of training images has full ground-truth labels, with the remaining data unlabelled.
  2. Partial Labels Full Dataset: All images have some labels, but covering only a sparse subset of pixels.
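In the first mode, a fixed subset of images keeps its full labels and the rest are treated as unlabelled. A minimal sketch of a seeded split, assuming plain uniform sampling over image indices (the repository's own split may also enforce class coverage, which this sketch does not) and following the num_labels convention in this README where 0 means every image is labelled. split_labelled is a hypothetical helper.

```python
import numpy as np


def split_labelled(num_images, num_labels, seed):
    """Return (labelled_idx, unlabelled_idx) for the Full Labels Partial Dataset mode.

    Hypothetical helper; num_labels == 0 means all images are labelled.
    """
    idx = np.arange(num_images)
    if num_labels == 0:
        return idx, np.array([], dtype=idx.dtype)
    # The same seed always yields the same permutation, hence the same split.
    rng = np.random.default_rng(seed)
    perm = rng.permutation(num_images)
    return np.sort(perm[:num_labels]), np.sort(perm[num_labels:])
```

For example, split_labelled(num_images, 20, seed=0) would pick 20 labelled images and leave the remainder for the unlabelled stream of a semi-supervised method.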

The following four scripts train each mode with supervised or semi-supervised methods, respectively:

python train_sup.py             # Supervised learning with full labels.
python train_semisup.py         # Semi-supervised learning with full labels.
python train_sup_partial.py     # Supervised learning with partial labels.
python train_semisup_partial.py  # Semi-supervised learning with partial labels.

Important Flags

All supervised and semi-supervised methods can be trained with different flags (hyper-parameters) when running each training script. We briefly introduce some important flags for the experiments below.

Flag Name | Usage | Comments
--- | --- | ---
num_labels | number of labelled images in the training set; choose 0 to train with all labelled images | only available in the full label mode
partial | percentage of labelled pixels for each class in the training set; choose p0, p1, p5, p25 to train with 1 pixel, 1%, 5%, 25% labelled pixels respectively | only available in the partial label mode
num_negatives | number of negative keys sampled for each class in each mini-batch | only applied when training with ReCo loss
num_queries | number of queries sampled for each class in each mini-batch | only applied when training with ReCo loss
output_dim | dimensionality of the pixel-level representation | only applied when training with ReCo loss
temp | temperature used in contrastive learning | only applied when training with ReCo loss
apply_aug | data augmentation for semi-supervised methods; choose cutout, cutmix or classmix | only available in the semi-supervised methods; our implementations of CutOut, CutMix and ClassMix
weak_threshold | weak threshold delta_w in active sampling | only applied when training with ReCo loss
strong_threshold | strong threshold delta_s in active sampling | only applied when training with ReCo loss
apply_reco | toggle our proposed ReCo loss on or off |
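The ReCo-specific flags above (num_queries, num_negatives, output_dim, temp, strong_threshold) parameterise a pixel-level contrastive loss. The sketch below is a simplified NumPy version written from our reading of the paper, not the repository's code: each class's positive key is the re-normalised mean of its pixel representations, queries are drawn from that class's low-confidence pixels (assuming strong_threshold selects hard queries as pixels whose confidence falls below it), and negative keys are drawn uniformly from other classes. Uniform negative sampling replaces the paper's active key sampling, weak_threshold is not modelled, and the function names and default values are illustrative assumptions.

```python
import numpy as np


def l2_normalise(x, eps=1e-8):
    """Scale vectors along the last axis to unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def reco_style_loss(rep, label, conf, num_queries=256, num_negatives=256,
                    temp=0.5, strong_threshold=0.97, seed=0):
    """Simplified regional contrastive loss (illustrative sketch).

    rep:   (N, D) pixel representations, D playing the role of output_dim
    label: (N,) class index per pixel (ground truth or pseudo-label)
    conf:  (N,) predicted confidence for each pixel's class
    """
    rng = np.random.default_rng(seed)
    rep = l2_normalise(rep)
    losses = []
    for c in np.unique(label):
        in_class = label == c
        neg_pool = rep[~in_class]
        # Assumption: hard queries are class pixels with confidence below strong_threshold.
        hard = rep[in_class & (conf < strong_threshold)]
        if neg_pool.shape[0] == 0 or hard.shape[0] == 0:
            continue  # need at least one query and one negative candidate
        # Positive key: re-normalised mean representation of class c.
        pos_key = l2_normalise(rep[in_class].mean(axis=0))
        # Sample queries and negative keys uniformly, with replacement.
        q = hard[rng.integers(0, hard.shape[0], size=num_queries)]            # (Q, D)
        neg = neg_pool[rng.integers(0, neg_pool.shape[0],
                                    size=(num_queries, num_negatives))]        # (Q, K, D)
        pos_logit = (q @ pos_key) / temp                                       # (Q,)
        neg_logit = np.einsum("qd,qkd->qk", q, neg) / temp                     # (Q, K)
        logits = np.concatenate([pos_logit[:, None], neg_logit], axis=1)      # (Q, 1 + K)
        # InfoNCE: cross-entropy with the positive key at index 0, via log-sum-exp.
        m = logits.max(axis=1, keepdims=True)
        log_z = m[:, 0] + np.log(np.exp(logits - m).sum(axis=1))
        losses.append(np.mean(log_z - logits[:, 0]))
    # Average over classes that contributed queries; 0 if none did.
    return float(np.mean(losses)) if losses else 0.0
```

Lower temp sharpens the softmax over the positive and negative keys, while num_queries and num_negatives trade memory for a less noisy estimate of the loss.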

Training ReCo + ClassMix with the fewest full labels in each dataset (in this setting, the least frequent class in each dataset appears in 5 training images):

python train_semisup.py --dataset pascal --num_labels 60 --apply_aug classmix --apply_reco
python train_semisup.py --dataset cityscapes --num_labels 20 --apply_aug classmix --apply_reco
python train_semisup.py --dataset sun --num_labels 50 --apply_aug classmix --apply_reco
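The class-occurrence criterion above can be checked on a candidate labelled subset by counting how many label maps contain each class; the minimum of those counts is the number of images in which the least frequent class appears. A sketch, assuming integer label maps with 255 marking ignored pixels (images_per_class is a hypothetical helper, not part of this repository):

```python
import numpy as np


def images_per_class(labels, num_classes, ignore_index=255):
    """Count, for each class, how many label maps contain at least one pixel of it."""
    counts = np.zeros(num_classes, dtype=int)
    for label in labels:
        present = np.unique(label)
        present = present[present != ignore_index]
        counts[present] += 1  # np.unique guarantees no duplicate indices
    return counts
```

Calling images_per_class(...).min() on the labelled subset then gives the occurrence count of the rarest class.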

Training ReCo + ClassMix with the fewest partial label setting in each dataset (each class in each training image only has 1 labelled pixel):

python train_semisup_partial.py --dataset pascal --partial p0 --apply_aug classmix --apply_reco
python train_semisup_partial.py --dataset cityscapes --partial p0 --apply_aug classmix --apply_reco
python train_semisup_partial.py --dataset sun --partial p0 --apply_aug classmix --apply_reco
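Both sets of commands above use --apply_aug classmix. As a rough illustration of the general ClassMix recipe, the sketch below selects half of the classes present in one image's (pseudo-)label map, rounded up, and pastes those pixels and their labels onto a second image. It is not this repository's implementation; the function name and the (H, W, C) image layout are assumptions.

```python
import numpy as np


def classmix(img_a, img_b, lab_a, lab_b, rng):
    """Paste the pixels of roughly half of image A's classes onto image B.

    img_a, img_b: (H, W, C) images; lab_a, lab_b: (H, W) label or pseudo-label maps.
    Returns the mixed image, the mixed label map and the boolean paste mask.
    """
    classes = np.unique(lab_a)
    n_pick = (len(classes) + 1) // 2  # half of the classes, rounded up
    chosen = rng.choice(classes, size=n_pick, replace=False)
    mask = np.isin(lab_a, chosen)                          # True where A is pasted
    mixed_img = np.where(mask[..., None], img_a, img_b)    # broadcast mask over channels
    mixed_lab = np.where(mask, lab_a, lab_b)
    return mixed_img, mixed_lab, mask
```

Because the mask follows class boundaries in the label map, the mixed label stays aligned with the mixed image, which is what distinguishes ClassMix from the rectangular boxes of CutOut and CutMix.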

Training ReCo + Supervised with all labelled data:

python train_sup.py --dataset {DATASET} --num_labels 0 --apply_reco

Training with ReCo is expected to require 12-16 GB of GPU memory on a single GPU. All other baselines can be trained with under 12 GB on a single GPU.

Visualisation on Pre-trained Models

We additionally provide pre-trained models for the baselines and our method on CityScapes with 20 labelled images and Pascal VOC with 60 labelled images, as examples for visualisation. The mIoU of each model is listed in the following table. The pre-trained models reproduce exactly the qualitative results presented in the original paper.

Dataset (mIoU %) | Supervised | ClassMix | ReCo + ClassMix
--- | --- | --- | ---
CityScapes (20 Labels) | 38.10 [link] | 45.13 [link] | 50.14 [link]
Pascal VOC (60 Labels) | 36.06 [link] | 53.71 [link] | 57.12 [link]
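The table reports mIoU. One common way to compute it is to accumulate a confusion matrix over the evaluation set and then average the per-class intersection over union. A sketch, assuming integer label maps with 255 as the ignore index; whether the repository's evaluation skips classes absent from both prediction and ground truth, as done here, is an assumption, and both function names are hypothetical.

```python
import numpy as np


def confusion_matrix(pred, target, num_classes, ignore_index=255):
    """Rows index ground-truth classes, columns index predicted classes."""
    valid = target != ignore_index
    idx = num_classes * target[valid].astype(np.int64) + pred[valid].astype(np.int64)
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)


def mean_iou(cm):
    """Mean IoU over classes that appear in the prediction or the ground truth."""
    inter = np.diag(cm)
    union = cm.sum(axis=0) + cm.sum(axis=1) - inter
    present = union > 0
    return float((inter[present] / union[present]).mean())
```

For a whole dataset, the per-image confusion matrices would be summed before calling mean_iou, rather than averaging per-image scores.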

Download the pre-trained models using the links above, then create a folder named model_weights in this repository and place them in it. Run python visual.py to visualise the results.

Other Notices

  1. We observe that the performance in the full label semi-supervised setting on the CityScapes dataset is not stable across different machines: all methods may drop 2-5% in performance, though their ranking stays the same. Different GPUs in the same machine do not affect the performance. The performance on the other datasets in the full label mode, and on all datasets in the partial label mode, is consistent.
  2. Please use --seed 0, 1, 2 to accurately reproduce or compare against our results with exactly the same labelled and unlabelled splits used in our experiments.

Citation

If you found this code/work to be useful in your own research, please consider citing the following:

@article{liu2021reco,
    title={Bootstrapping Semantic Segmentation with Regional Contrast},
    author={Liu, Shikun and Zhi, Shuaifeng and Johns, Edward and Davison, Andrew J},
    journal={arXiv preprint arXiv:2104.04465},
    year={2021}
}

Contact

If you have any questions, please contact [email protected].
