Code for our method RePRI for Few-Shot Segmentation. Paper at http://arxiv.org/abs/2012.06166

Overview

Region Proportion Regularized Inference (RePRI) for Few-Shot Segmentation

In this repo, we provide the code for our paper "Few-Shot Segmentation Without Meta-Learning: A Good Transductive Inference Is All You Need?", available at https://arxiv.org/abs/2012.06166.

Getting Started

Minimum requirements

  1. Software:
  • torch==1.7.0
  • numpy==1.18.4
  • cv2==4.2.0
  • pyyaml==5.3.1

For both training and testing, metrics are monitored through visdom_logger (https://github.com/luizgh/visdom_logger). To install this package with pip, use the following command:

pip install git+https://github.com/luizgh/visdom_logger.git
  2. Hardware: An 11 GB+ CUDA-enabled GPU

Download data

All pre-processed data from Google Drive

We provide the versions of Pascal-VOC 2012 and MS-COCO 2017 used in this work at https://drive.google.com/file/d/1Lj-oBzBNUsAqA9y65BDrSQxirV8S15Rk/view?usp=sharing. You can download the full .zip and directly extract it at the root of this repo.

If the previous download failed

Here is the structure of the data folder for you to reproduce:

data
├── coco
│   ├── annotations
│   ├── train
│   ├── train2014
│   ├── val
│   └── val2014
└── pascal
    ├── JPEGImages
    └── SegmentationClassAug

Pascal: The JPEG images can be found in the PascalVOC 2012 toolkit, to be downloaded at PascalVOC2012. The pre-processed ground-truth masks can be found at SegmentationClassAug.

Coco: Coco 2014 train and validation images and annotations can be downloaded at Coco. Once this is done, you will need to generate the subfolders coco/train and coco/val (ground-truth masks). Both folders can be generated by executing the python script data/coco/create_masks.py (note that the script uses the package pycocotools, which can be found at https://github.com/cocodataset/cocoapi/tree/master/PythonAPI/pycocotools):

cd data/coco
python create_masks.py
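
For reference, here is a minimal sketch of what such a mask-generation step typically looks like with pycocotools. This is an illustrative approximation, not the actual content of data/coco/create_masks.py; the paths and the category-id encoding are assumptions:

```python
# Illustrative sketch only -- the actual logic lives in data/coco/create_masks.py.
# Assumes the Coco 2014 annotation files have been downloaded and extracted.
import os
import numpy as np
import cv2
from pycocotools.coco import COCO

def create_masks(annotation_file, out_dir):
    coco = COCO(annotation_file)
    os.makedirs(out_dir, exist_ok=True)
    for img_id in coco.getImgIds():
        info = coco.loadImgs(img_id)[0]
        # Start from an all-background mask of the image's size.
        mask = np.zeros((info['height'], info['width']), dtype=np.uint8)
        for ann in coco.loadAnns(coco.getAnnIds(imgIds=img_id)):
            # annToMask returns a binary mask for one instance; paint it
            # with the instance's category id (later instances overwrite).
            mask[coco.annToMask(ann) == 1] = ann['category_id']
        out_name = info['file_name'].replace('.jpg', '.png')
        cv2.imwrite(os.path.join(out_dir, out_name), mask)

if __name__ == '__main__':
    create_masks('annotations/instances_train2014.json', 'train')
    create_masks('annotations/instances_val2014.json', 'val')
```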

About the train/val splits

The train/val splits are directly provided in lists/. How they were obtained is explained at https://github.com/Jia-Research-Lab/PFENet.

Download pre-trained models

Pre-trained backbones

First, you will need to download the ImageNet pre-trained backbones at https://drive.google.com/drive/folders/1Hrz1wOxOZm4nIIS7UMJeL79AQrdvpj6v and put them under initmodel/. These will be used if you decide to train your models from scratch.

Pre-trained models

We directly provide the full pre-trained models at https://drive.google.com/file/d/1iuMAo5cJ27oBdyDkUI0JyGIEH60Ln2zm/view?usp=sharing. You can download the archive and extract it at the root of this repo. It includes models with Resnet50 and Resnet101 backbones trained on Pascal-5i, and with a Resnet50 backbone trained on Coco-20i.

Overview of the repo

Data are located in data/. All the code is provided in src/. Default configuration files can be found in config_files/. Training and testing scripts are located in scripts/. lists/ contains the train/validation splits for each dataset.

Training (optional)

If you want to use the pre-trained models, this step is optional. Otherwise, you can train your own models from scratch with the scripts/train.sh script, as follows.

bash scripts/train.sh {data} {fold} {[gpu_ids]} {layers}

For instance, if you want to train a Resnet50-based model on fold 0 of Pascal-5i on GPU 1, use:

bash scripts/train.sh pascal 0 [1] 50

Note that this code supports distributed training. If you want to train on multiple GPUs, simply replace [1] in the previous example with the list of gpu_ids you want to use.
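
For instance, to train the same model on GPUs 0 and 1 (assuming, as in the single-GPU case, that the bracket takes a comma-separated list of ids):

bash scripts/train.sh pascal 0 [0,1] 50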

Testing

To test your models, use the scripts/test.sh script; the general syntax is:

bash scripts/test.sh {data} {shot} {[gpu_ids]} {layers}

This script will test successively on all folds of the chosen dataset. Specific commands for the different experiments are given below.

Pascal-5i

Results (1 shot / 5 shot):

| Method | Arch | Fold-0 | Fold-1 | Fold-2 | Fold-3 | Mean |
| --- | --- | --- | --- | --- | --- | --- |
| RePRI | Resnet-50 | 59.8 / 64.6 | 68.3 / 71.4 | 62.1 / 71.1 | 48.5 / 59.3 | 59.7 / 66.6 |
| Oracle-RePRI | Resnet-50 | 72.4 / 75.1 | 78.0 / 80.8 | 77.1 / 81.4 | 65.8 / 74.4 | 73.3 / 77.9 |
| RePRI | Resnet-101 | 59.6 / 66.2 | 68.3 / 71.4 | 62.2 / 67.0 | 47.2 / 57.7 | 59.4 / 65.6 |
| Oracle-RePRI | Resnet-101 | 73.9 / 76.8 | 79.7 / 81.7 | 76.1 / 79.5 | 65.1 / 74.5 | 73.7 / 78.1 |

Command:

bash scripts/test.sh pascal 1 [0] 50  # 1-shot
bash scripts/test.sh pascal 5 [0] 50  # 5-shot

Coco-20i

Results (1 shot / 5 shot):

| Method | Arch | Fold-0 | Fold-1 | Fold-2 | Fold-3 | Mean |
| --- | --- | --- | --- | --- | --- | --- |
| RePRI | Resnet-50 | 32.0 / 39.3 | 38.7 / 45.4 | 32.7 / 39.7 | 33.1 / 41.8 | 34.1 / 41.6 |
| Oracle-RePRI | Resnet-50 | 49.3 / 51.5 | 51.4 / 60.8 | 38.2 / 54.7 | 41.6 / 55.2 | 45.1 / 55.5 |

Command:

bash scripts/test.sh coco 1 [0] 50  # 1-shot
bash scripts/test.sh coco 5 [0] 50  # 5-shot

Coco-20i -> Pascal-VOC

The folds used for the cross-domain experiments are presented in the figure below:

[Figure: class folds used for the Coco-20i -> Pascal-VOC cross-domain setting]

Results (1 shot / 5 shot):

| Method | Arch | Fold-0 | Fold-1 | Fold-2 | Fold-3 | Mean |
| --- | --- | --- | --- | --- | --- | --- |
| RePRI | Resnet-50 | 52.8 / 57.7 | 64.0 / 66.1 | 64.1 / 67.6 | 71.5 / 73.1 | 63.1 / 66.2 |
| Oracle-RePRI | Resnet-50 | 69.6 / 73.5 | 71.7 / 74.9 | 77.6 / 82.2 | 86.2 / 88.1 | 76.2 / 79.7 |

Command:

bash scripts/test.sh coco2pascal 1 [0] 50  # 1-shot
bash scripts/test.sh coco2pascal 5 [0] 50  # 5-shot

Monitoring metrics

For both training and testing, you can monitor metrics using visdom_logger (https://github.com/luizgh/visdom_logger). To install this package, simply clone the repo and install it with pip:

git clone https://github.com/luizgh/visdom_logger.git
pip install -e visdom_logger

Then, you need to start a visdom server with:

python -m visdom.server -port 8098

Finally, add the line visdom_port 8098 to the options in scripts/train.sh or scripts/test.sh, and metrics will be displayed at this port. You can then monitor them in your browser (e.g., at http://localhost:8098).

Contact

For further questions or details, please post an issue or directly reach out to Malik Boudiaf ([email protected])

Acknowledgments

We gratefully thank the authors of https://github.com/Jia-Research-Lab/PFENet and https://github.com/hszhao/semseg, from which parts of our code are inspired.
