Open-source code for Generic Grouping Network (GGN, CVPR 2022)

Overview

Open-World Instance Segmentation: Exploiting Pseudo Ground Truth From Learned Pairwise Affinity

PyTorch implementation for "Open-World Instance Segmentation: Exploiting Pseudo Ground Truth From Learned Pairwise Affinity" (CVPR 2022, link TBD) by Weiyao Wang, Matt Feiszli, Heng Wang, Jitendra Malik, and Du Tran. We propose a framework for open-world instance segmentation, Generic Grouping Network (GGN), which exploits a pseudo ground truth training strategy. With the same backbone, GGN improves average recall (AR) over closed-world training on both cross-category generalization (+11% VOC to Non-VOC) and cross-dataset generalization (+5.2% COCO to UVO).

What is it? Open-world instance segmentation requires a model to group pixels into object instances without a pre-defined taxonomy, covering both "seen" categories (those present during training) and "unseen" categories (those absent from training). There is generally a large performance gap between the seen and unseen domains. For example, a baseline Mask R-CNN misses 15 annotated masks in the example below. Without additional training data or annotations, Mask R-CNN trained with the GGN framework correctly produces 9 more segments, bringing it much closer to the ground-truth annotations.

How do we do it? Our approach first learns a pairwise affinity (PA) predictor that predicts whether two pixels belong to the same instance. We demonstrate that this pairwise affinity representation generalizes well to unseen domains. We then use a grouping module (e.g. MCG) to extract and rank segments from the predicted PA. This can be run on any image dataset without using annotations; we take the highest-ranked segments as "pseudo ground truth" candidate masks. The resulting set is large and category-agnostic; we add it to our (much smaller) datasets of curated annotations to train a detector.
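
As a rough illustration of what a pairwise affinity target looks like, the sketch below derives binary affinity maps from an instance-id map with NumPy. The function name, the two neighbour offsets, and the choice to treat background as an ordinary id are assumptions made for this example; they are not taken from the repo.

```python
import numpy as np


def pairwise_affinity_targets(instance_map, offsets=((0, 1), (1, 0))):
    """Return one binary affinity map per (dy, dx) offset.

    instance_map: (H, W) integer array with one id per instance.
    out[k, y, x] is 1 when pixel (y, x) and pixel (y + dy, x + dx) share
    the same id. Pairs that reach outside the image are left at 0.
    Assumption for this sketch: background is treated like any other id.
    """
    h, w = instance_map.shape
    out = np.zeros((len(offsets), h, w), dtype=np.uint8)
    for k, (dy, dx) in enumerate(offsets):
        # Rows/columns of the source pixels whose neighbour is inside the image.
        y0, y1 = max(0, -dy), h - max(0, dy)
        x0, x1 = max(0, -dx), w - max(0, dx)
        src = instance_map[y0:y1, x0:x1]
        dst = instance_map[y0 + dy:y1 + dy, x0 + dx:x1 + dx]
        out[k, y0:y1, x0:x1] = src == dst
    return out
```

With the default offsets, channel 0 compares each pixel with its right neighbour and channel 1 with the pixel below it.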


About the code. This repo is built on mmdetection, with the addition of the OLN backbone (concurrent work). The repo is tested under Python 3.7, PyTorch 1.7.0, CUDA 11.0, and mmcv==1.2.5. We thank the authors of OLN for releasing their work to facilitate research.

Model zoo

Below we release PA predictor models, the pseudo-GT generated by the PA predictors, and GGN models trained with both annotated GT and pseudo-GT. We also release some of the processed annotations from LVIS used for the cross-category generalization experiments.

| Training | Eval | url | Baseline AR | GGN AR | Top-K Pseudo |
|---|---|---|---|---|---|
| Person, COCO | Non-Person, COCO | PA/Pseudo/GGN | 4.9 | 20.9 | 3 |
| VOC, COCO | Non-VOC, COCO | PA/Pseudo/Pseudo-OLN/GGN/GGN-OLN | 19.9 | 28.7 (33.7 with OLN) | 3 |
| COCO, LVIS | Non-COCO, LVIS | PA/Pseudo/GGN | 16.5 | 20.4 | 1 |
| Non-COCO, LVIS | COCO | PA/Pseudo/GGN | 21.7 | 23.6 | 1 |
| COCO | UVO | PA/Pseudo/GGN | 40.1 | 43.4 | 3 |
| COCO, random init | ImageNet | PA/Pseudo/GGN | | | 10 |

We remark that using the large-scale pre-training in the last row as initialization, and then fine-tuning GGN on COCO with pseudo-GT on COCO, gives a further improvement (45.3 on UVO); the corresponding model is also released.

Installation

This repo is built based on mmdetection.

You can use the following commands to create a conda environment with the required dependencies.

conda create -n ggn python=3.7 -y
conda activate ggn
conda install pytorch=1.7.0 torchvision cudatoolkit=11.0 -c pytorch -y
pip install mmcv-full
pip install -r requirements.txt
pip install -v -e .

Please also refer to get_started.md for more details of installation.

Next you will need to build the library for our grouping module:

cd pa_lib/cython_lib
python3 setup.py build_ext --inplace

Data Preparation

Download and extract COCO 2017 train and val images with annotations from http://cocodataset.org. We expect the directory structure to be the following:

path/to/coco/
  annotations/  # annotation json files
  train2017/    # train images
  val2017/      # val images
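
A quick sanity check of this layout can be sketched as follows. The helper name missing_coco_subdirs is hypothetical and not part of the repo; it only checks that the three directories listed above exist.

```python
from pathlib import Path

# Sub-directories expected under path/to/coco/, as listed above.
EXPECTED_SUBDIRS = ("annotations", "train2017", "val2017")


def missing_coco_subdirs(coco_root):
    """Return the expected sub-directories that are missing under coco_root."""
    root = Path(coco_root)
    return [name for name in EXPECTED_SUBDIRS if not (root / name).is_dir()]
```

An empty list means the expected directory structure is in place.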

Our work also uses LVIS, UVO and ADE20K. To use ADE20K, please convert its annotations into COCO-style annotations.
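
For orientation, a minimal sketch of what a COCO-style annotation file contains is given below. It assumes the source annotations have already been decoded into one binary NumPy mask per instance; reading the ADE20K files themselves is not shown. The function names, the single class-agnostic category named "object", and the use of uncompressed RLE for the segmentation field (polygons are also common) are assumptions made for this example.

```python
import numpy as np


def mask_to_uncompressed_rle(mask):
    """Encode a binary (H, W) mask as COCO-style uncompressed RLE.

    Runs are counted in column-major order and always start with the
    number of leading zeros (which may be 0).
    """
    height, width = mask.shape
    flat = (np.asarray(mask) > 0).flatten(order="F")
    counts, previous, run = [], 0, 0
    for value in flat:
        value = int(value)
        if value != previous:
            counts.append(run)
            run, previous = 0, value
        run += 1
    counts.append(run)
    return {"size": [height, width], "counts": counts}


def masks_to_coco(image_id, file_name, height, width, masks):
    """Build a COCO-style dict for one image from binary instance masks."""
    annotations = []
    for mask in masks:
        ys, xs = np.nonzero(mask)
        if ys.size == 0:
            continue  # skip empty masks
        x0, y0 = int(xs.min()), int(ys.min())
        annotations.append({
            "id": len(annotations) + 1,
            "image_id": image_id,
            "category_id": 1,  # assumption: one class-agnostic category
            # COCO boxes are [x, y, width, height].
            "bbox": [x0, y0, int(xs.max()) - x0 + 1, int(ys.max()) - y0 + 1],
            "area": int(ys.size),
            "iscrowd": 0,
            "segmentation": mask_to_uncompressed_rle(mask),
        })
    return {
        "images": [{"id": image_id, "file_name": file_name,
                    "height": height, "width": width}],
        "annotations": annotations,
        "categories": [{"id": 1, "name": "object"}],
    }
```

The returned dict can be written with json.dump; a full dataset would concatenate the images and annotations lists across images while keeping annotation ids unique.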

Training of pairwise affinity predictor

bash tools/dist_train.sh configs/pairwise_affinity/pa_train.py ${NUM_GPUS} --work-dir ${WORK_DIR}

Test PA

We provide a tool tools/test_pa.py to directly evaluate PA performance (e.g. on PA prediction and on grouped masks).

python tools/test_pa.py configs/pairwise_affinity/pa_train.py ${WORK_DIR}/latest.pth --eval pa --eval-proposals --test-partition nonvoc

Extracting pseudo-GT masks

We begin by extracting masks. The example config pa_extract.py extracts pseudo-GT masks from a PA predictor trained on the VOC subset of COCO. The --use-gt-masks flag asks the pipeline to compute the maximum IoU each extracted mask has with the annotated GT. We recommend splitting the dataset into multiple shards to run the extraction. At original image resolution on an Nvidia V100 machine, the full pipeline (compute PA, run grouping, rank, then compute IoU with the annotated GT) takes about 4.8s per image without globalization and the trained ranker, or about 10s per image with globalization and the trained ranker.

python tools/extract_pa_masks.py configs/pairwise_affinity/pa_extract.py ${PA_MODEL_PATH} --out ${OUT_DIR}/masks.json --use-gt-masks 1
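
To make the quantity behind --use-gt-masks concrete, the maximum IoU between one extracted mask and a set of GT masks can be sketched as below. This is an illustrative NumPy version on dense binary masks with hypothetical function names; it is not the code path used by tools/extract_pa_masks.py.

```python
import numpy as np


def mask_iou(a, b):
    """Intersection over union of two binary masks of the same shape."""
    a = np.asarray(a).astype(bool)
    b = np.asarray(b).astype(bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 0.0  # both masks empty
    return float(np.logical_and(a, b).sum() / union)


def max_iou_with_gt(mask, gt_masks):
    """Largest IoU between `mask` and any GT mask (0.0 if there is no GT)."""
    return max((mask_iou(mask, gt) for gt in gt_masks), default=0.0)
```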

The extracted masks will be stored in JSON with the following format:

[
  [segm1, segm2,..., segm20] ## Result of an image
  ...
]
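
The Top-K Pseudo column in the model zoo suggests keeping only a few masks per image. A minimal sketch of that selection over the format above follows; it assumes each per-image list is already ordered best-first, which the format shown here does not state, and the function names are hypothetical.

```python
import json


def top_k_per_image(per_image_segms, k):
    """Keep the first k segments of every image.

    Assumption for this sketch: each per-image list is sorted best-first.
    """
    return [segms[:k] for segms in per_image_segms]


def load_top_k(path, k):
    """Load an extracted-masks JSON file and keep the top k segments per image."""
    with open(path) as f:
        return top_k_per_image(json.load(f), k)
```

For example, load_top_k("masks.json", 3) would keep at most three segments for each image.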

See tools/merge_annotations.py for a reference on formatting the extracted masks as a new COCO-style annotation file. Note that tools/interpolate_extracted_masks.py may be necessary if extraction was not run at the original image resolution.
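
As an illustration of why a resolution mismatch needs handling, the sketch below maps a binary mask back to a target resolution with nearest-neighbour sampling. It is a generic NumPy example with a hypothetical function name and is not a description of what tools/interpolate_extracted_masks.py does.

```python
import numpy as np


def resize_mask_nearest(mask, out_h, out_w):
    """Resize a binary (H, W) mask to (out_h, out_w) by nearest-neighbour sampling."""
    in_h, in_w = mask.shape
    # For every output row/column, pick the input row/column it falls into.
    rows = np.minimum((np.arange(out_h) * in_h) // out_h, in_h - 1)
    cols = np.minimum((np.arange(out_w) * in_w) // out_w, in_w - 1)
    return mask[rows[:, None], cols[None, :]]
```

Nearest-neighbour sampling keeps the mask binary, which is why it is a common choice for label maps.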

Training of GGN

Please set additional_ann_file in class_agn_mask_rcnn_pa.py to the pseudo-GT annotation file produced in the previous step.

bash tools/dist_train.sh configs/mask_rcnn/class_agn_mask_rcnn_pa.py ${NUM_GPUS}

class_agn_mask_rcnn_gn_online.py is used to train on masks extracted from ImageNet. There are too many annotations to store in a single JSON file without running out of memory, so they need to be split into per-image annotation files named "{image_id}.json".
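
Splitting into per-image files can be sketched as follows. The function name is hypothetical, and the assumption that each file holds a plain list of that image's COCO-style annotations is ours; check the online config's dataset loader for the layout it actually expects.

```python
import json
import os
from collections import defaultdict


def split_annotations_per_image(annotations, out_dir):
    """Write one "{image_id}.json" file per image and return the image ids.

    annotations: iterable of COCO-style annotation dicts with an "image_id" key.
    Assumption for this sketch: each file stores a list of annotation dicts.
    """
    per_image = defaultdict(list)
    for ann in annotations:
        per_image[ann["image_id"]].append(ann)
    os.makedirs(out_dir, exist_ok=True)
    for image_id, anns in per_image.items():
        with open(os.path.join(out_dir, f"{image_id}.json"), "w") as f:
            json.dump(anns, f)
    return sorted(per_image)
```

This version still groups everything in memory first; for very large extractions it would be applied shard by shard.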

Testing

python tools/test.py configs/mask_rcnn/class_agn_mask_rcnn.py ${WORK_DIR}/latest.pth --eval segm
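
The AR numbers reported in the model zoo are average recall values. A simplified sketch of the idea is given below: recall of GT masks is computed at IoU thresholds from 0.5 to 0.95 and averaged. It assumes a precomputed IoU matrix and lets every GT pick its best proposal independently, so it omits the one-to-one matching and proposal limits used by the COCO evaluation; the function name is hypothetical.

```python
import numpy as np


def average_recall(iou_matrix, thresholds=None):
    """Simplified average recall over IoU thresholds.

    iou_matrix: (num_gt, num_proposals) array of IoU values.
    A GT counts as recalled at threshold t if its best proposal has IoU >= t.
    """
    if thresholds is None:
        thresholds = np.linspace(0.5, 0.95, 10)
    iou_matrix = np.asarray(iou_matrix, dtype=float)
    if iou_matrix.size == 0:
        return 0.0  # no GT or no proposals
    best = iou_matrix.max(axis=1)
    recalls = [(best >= t).mean() for t in thresholds]
    return float(np.mean(recalls))
```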

To cite this work

@article{wang2022ggn,
  title={Open-World Instance Segmentation: Exploiting Pseudo Ground Truth From Learned Pairwise Affinity},
  author={Wang, Weiyao and Feiszli, Matt and Wang, Heng and Malik, Jitendra and Tran, Du},
  journal={CVPR},
  year={2022}
}

License

This project is under the CC-BY-NC 4.0 license. See LICENSE for details.

Owner
Meta Research