Mosaic of Object-centric Images as Scene-centric Images (MosaicOS) for long-tailed object detection and instance segmentation.

Overview

MosaicOS

Mosaic of Object-centric Images as Scene-centric Images (MosaicOS) for long-tailed object detection and instance segmentation.

Introduction

Many objects do not appear frequently enough in complex scenes (e.g., certain handbags in living rooms) for training an accurate object detector, but are often found frequently by themselves (e.g., in product images). Yet, these object-centric images are not effectively leveraged for improving object detection in scene-centric images.

We propose Mosaic of Object-centric images as Scene-centric images (MosaicOS), a simple and novel framework that is surprisingly effective at tackling the challenges of long-tailed object detection. Keys to our approach are three-fold: (i) pseudo scene-centric image construction from object-centric images for mitigating domain differences, (ii) high-quality bounding box imputation using the object-centric images’ class labels, and (iii) a multistage training procedure. Check our paper for further details:

MosaicOS: A Simple and Effective Use of Object-Centric Images for Long-Tailed Object Detection. In IEEE/CVF International Conference on Computer Vision (ICCV), 2021.

by Cheng Zhang*, Tai-Yu Pan*, Yandong Li, Hexiang Hu, Dong Xuan, Soravit Changpinyo, Boqing Gong, Wei-Lun Chao.

Mosaics

The script mosaic.py generates mosaic images and annotaions by given an annotation file in COCO format (for more information here). The following command will generate 2x2 mosaic images and the annotation file for COCO training dataset in OUTPUT_DIR/images/ and OUTPUT_DIR/annotation.json with 4 processors. --shuffle is to shuffle the order of images to synthesize and --drop-last is to drop the last couple of images if they are not enough for nrow * ncol. --demo 10 plots 10 synthesized images with annotated boxes in OUTPUT_DIR/demo/ for visualization.

 python mosaic.py --coco-file datasets/coco/annotations/instances_train2017.json --img-dir datasets/coco --output-dir output_mosaics --num-proc 4 --nrow 2 --ncol 2 --shuffle --drop-last --demo 10

*Note: In our work, we sythesize mosaics from object-centric images with pseudo bounding box to find-tune the pre-trained detector.

Pre-trained models

Our impelementation is based on Detectron2. All models are trained on LVIS training set with Repeated Factor Sampling (RFS).

LVIS v0.5 validation set

  • Object detection
Backbone Method APb APbr APbc APbf Download
R50-FPN Faster R-CNN 23.4 13.0 22.6 28.4 model
R50-FPN MosaicOS 25.0 20.2 23.9 28.3 model
  • Instance segmentation
Backbone Method AP APr APc APf APb Download
R50-FPN Mask R-CNN 24.4 16.0 24.0 28.3 23.6 model
R50-FPN MosaicOS 26.3 19.7 26.6 28.5 25.8 model

LVIS v1.0 validation set

  • Object detection
Backbone Method APb APbr APbc APbf Download
R50-FPN Faster R-CNN 22.0 10.6 20.1 29.2 model
R50-FPN MosaicOS 23.9 15.5 22.4 29.3 model
  • Instance segmentation
Backbone Method AP APr APc APf APb Download
R50-FPN Mask R-CNN 22.6 12.3 21.3 28.6 23.3 model
R50-FPN MosaicOS 24.5 18.2 23.0 28.8 25.1 model
R101-FPN Mask R-CNN 24.8 15.2 23.7 30.3 25.5 model
R101-FPN MosaicOS 26.7 20.5 25.8 30.5 27.4 model
X101-FPN Mask R-CNN 26.7 17.6 25.6 31.9 27.4 model
X101-FPN MosaicOS 28.3 21.8 27.2 32.4 28.9 model

Citation

Please cite with the following bibtex if you find it useful.

@inproceedings{zhang2021mosaicos,
  title={{MosaicOS}: A Simple and Effective Use of Object-Centric Images for Long-Tailed Object Detection},
  author={Zhang, Cheng and Pan, Tai-Yu and Li, Yandong and Hu, Hexiang and Xuan, Dong and Changpinyo, Soravit and Gong, Boqing and Chao, Wei-Lun},
  booktitle = {ICCV},
  year={2021}
}

Questions

Feel free to email us if you have any questions.

Cheng Zhang ([email protected]), Tai-Yu Pan ([email protected]), Wei-Lun Harry Chao ([email protected])

Owner
Cheng Zhang
Cheng Zhang
Code for "Hierarchical Skills for Efficient Exploration" HSD-3 Algorithm and Baselines

Hierarchical Skills for Efficient Exploration This is the source code release for the paper Hierarchical Skills for Efficient Exploration. It contains

Facebook Research 38 Dec 06, 2022
Auto grind btdb2 exp for tower

Bloons TD Battles 2 EXP Grinder Auto grind btdb2 exp for towers Setup I suggest checking out every screenshot to see what they are supposed to be, so

Vincent 6 Jul 29, 2022
ICCV2021 Oral SA-ConvONet: Sign-Agnostic Optimization of Convolutional Occupancy Networks

Sign-Agnostic Convolutional Occupancy Networks Paper | Supplementary | Video | Teaser Video | Project Page This repository contains the implementation

64 Jan 05, 2023
PyTorch implementation for Stochastic Fine-grained Labeling of Multi-state Sign Glosses for Continuous Sign Language Recognition.

Stochastic CSLR This is the PyTorch implementation for the ECCV 2020 paper: Stochastic Fine-grained Labeling of Multi-state Sign Glosses for Continuou

Zhe Niu 28 Dec 19, 2022
Implementation of the state of the art beat-detection, downbeat-detection and tempo-estimation model

The ISMIR 2020 Beat Detection, Downbeat Detection and Tempo Estimation Model Implementation. This is an implementation in TensorFlow to implement the

Koen van den Brink 1 Nov 12, 2021
Imbalanced Gradients: A Subtle Cause of Overestimated Adversarial Robustness

Imbalanced Gradients: A Subtle Cause of Overestimated Adversarial Robustness Code for Paper "Imbalanced Gradients: A Subtle Cause of Overestimated Adv

Hanxun Huang 11 Nov 30, 2022
Train/evaluate a Keras model, get metrics streamed to a dashboard in your browser.

Hera Train/evaluate a Keras model, get metrics streamed to a dashboard in your browser. Setting up Step 1. Plant the spy Install the package pip

Keplr 495 Dec 10, 2022
LiDAR R-CNN: An Efficient and Universal 3D Object Detector

LiDAR R-CNN: An Efficient and Universal 3D Object Detector Introduction This is the official code of LiDAR R-CNN: An Efficient and Universal 3D Object

TuSimple 295 Jan 05, 2023
Implementation of QuickDraw - an online game developed by Google, combined with AirGesture - a simple gesture recognition application

QuickDraw - AirGesture Introduction Here is my python source code for QuickDraw - an online game developed by google, combined with AirGesture - a sim

Viet Nguyen 89 Dec 18, 2022
Unofficial Tensorflow Implementation of ConvNeXt from A ConvNet for the 2020s

Tensorflow Implementation of "A ConvNet for the 2020s" This is the unofficial Tensorflow Implementation of ConvNeXt from "A ConvNet for the 2020s" pap

DK 11 Oct 12, 2022
This is an unofficial PyTorch implementation of Meta Pseudo Labels

This is an unofficial PyTorch implementation of Meta Pseudo Labels. The official Tensorflow implementation is here.

Jungdae Kim 320 Jan 08, 2023
Many Class Activation Map methods implemented in Pytorch for CNNs and Vision Transformers. Including Grad-CAM, Grad-CAM++, Score-CAM, Ablation-CAM and XGrad-CAM

Class Activation Map methods implemented in Pytorch pip install grad-cam ⭐ Tested on many Common CNN Networks and Vision Transformers. ⭐ Includes smoo

Jacob Gildenblat 6.6k Jan 06, 2023
Official implementation of "A Unified Objective for Novel Class Discovery", ICCV2021 (Oral)

A Unified Objective for Novel Class Discovery This is the official repository for the paper: A Unified Objective for Novel Class Discovery Enrico Fini

Enrico Fini 118 Dec 26, 2022
Serverless proxy for Spark cluster

Hydrosphere Mist Hydrosphere Mist is a serverless proxy for Spark cluster. Mist provides a new functional programming framework and deployment model f

hydrosphere.io 317 Dec 01, 2022
MoCoGAN: Decomposing Motion and Content for Video Generation

MoCoGAN: Decomposing Motion and Content for Video Generation This repository contains an implementation and further details of MoCoGAN: Decomposing Mo

Sergey Tulyakov 514 Dec 18, 2022
This repo is developed for Strong Baseline For Vehicle Re-Identification in Track 2 Ai-City-2021 Challenges

A STRONG BASELINE FOR VEHICLE RE-IDENTIFICATION This paper is accepted to the IEEE Conference on Computer Vision and Pattern Recognition Workshop(CVPR

Cybercore Co. Ltd 78 Dec 29, 2022
Integrated physics-based and ligand-based modeling.

ComBind ComBind integrates data-driven modeling and physics-based docking for improved binding pose prediction and binding affinity prediction. Given

Dror Lab 44 Oct 26, 2022
AI Virtual Calculator: This is a simple virtual calculator based on Artificial intelligence.

AI Virtual Calculator: This is a simple virtual calculator that works with gestures using OpenCV. We will use our hand in the air to click on the calc

Md. Rakibul Islam 1 Jan 13, 2022
使用深度学习框架提取视频硬字幕;docker容器免安装深度学习库,使用本地api接口使得界面和后端识别分离;

extract-video-subtittle 使用深度学习框架提取视频硬字幕; 本地识别无需联网; CPU识别速度可观; 容器提供API接口; 运行环境 本项目运行环境非常好搭建,我做好了docker容器免安装各种深度学习包; 提供windows界面操作; 容器为CPU版本; 视频演示 https

歌者 16 Aug 06, 2022
ColBERT: Contextualized Late Interaction over BERT (SIGIR'20)

Update: if you're looking for ColBERTv2 code, you can find it alongside a new simpler API, in the branch new_api. ColBERT ColBERT is a fast and accura

Stanford Future Data Systems 637 Jan 08, 2023