Leaderboard, taxonomy, and curated list of few-shot object detection papers.

Last update: Jan 07, 2023

Overview

Awesome Few-Shot Object Detection (FSOD)

For an introduction to the few-shot object detection framework, read below, or check our survey on few-shot and self-supervised object detection and its project page for full explanations, a discussion of the pitfalls of the Pascal, COCO, and LVIS benchmarks used below, the main takeaways, and future research directions.

Contributing

If you want to add your paper or report a mistake, please create a pull request with all supporting information. Thanks!

Pascal VOC and MS COCO FSOD Leaderboard

In this table we distinguish Kang's splits (Meta-YOLO) from TFA's splits (Frustratingly Simple FSOD), because Kang's splits have been shown to have high variance and to overestimate performance for low numbers of shots. To see this for yourself, compare the TFA 1-shot and Kang 1-shot columns in the table below.

Name	Type	VOC TFA 1-shot (mAP50)	VOC TFA 3-shot (mAP50)	VOC TFA 10-shot (mAP50)	VOC Kang 1-shot (mAP50)	VOC Kang 3-shot (mAP50)	VOC Kang 10-shot (mAP50)	MS COCO 10-shot (mAP)	MS COCO 30-shot (mAP)
LSTD	finetuning	-	-	-	8.2	12.4	38.5	-	-
RepMet	prototype	-	-	-	26.1	34.4	41.3	-	-
Meta-YOLO	modulation	14.2	29.8	-	14.8	26.7	47.2	5.6	9.1
MetaDet	modulation	-	-	-	18.9	30.2	49.6	7.1	11.3
Meta-RCNN	modulation	-	-	-	19.9	35.0	51.5	8.7	12.4
Faster RCNN+FT	finetuning	9.9	21.6	35.6	15.2	29.0	45.5	9.2	12.5
ACM-MetaRCNN	modulation	-	-	-	31.9	35.9	53.1	9.4	12.8
TFA w/fc	finetuning	22.9	40.4	52.0	36.8	43.6	57.0	10.0	13.4
TFA w/cos	finetuning	25.3	42.1	52.8	39.8	44.7	56.0	10.0	13.7
Retentive RCNN	finetuning	-	-	-	42.0	46.0	56.0	10.5	13.8
MPSR	finetuning	-	-	-	41.7	51.4	61.8	9.8	14.1
Attention-FSOD	modulation	-	-	-	-	-	-	12.0	-
FsDetView	finetuning	24.2	42.2	57.4	-	-	-	12.5	14.7
CME	finetuning	-	-	-	41.5	50.4	60.9	15.1	16.9
TIP	add-on	27.7	43.3	59.6	-	-	-	16.3	18.3
DAnA	modulation	-	-	-	-	-	-	18.6	21.6
DeFRCN	prototype	-	-	-	53.6	61.5	60.8	18.5	22.6
Meta-DETR	modulation	20.4	46.6	57.8	-	-	-	17.8	22.9
DETReg	finetuning	-	-	-	-	-	-	18.0	30.0
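The gap between the two split families can be checked directly from the table. The following is a minimal sketch that copies the VOC 1-shot mAP50 values for the four methods that report both a TFA and a Kang number, and prints the difference; the function name `kang_minus_tfa` is only illustrative.

```python
# VOC 1-shot mAP50 copied from the table above: (TFA split, Kang split).
# Only methods that report both numbers are included.
one_shot = {
    "Meta-YOLO": (14.2, 14.8),
    "Faster RCNN+FT": (9.9, 15.2),
    "TFA w/fc": (22.9, 36.8),
    "TFA w/cos": (25.3, 39.8),
}


def kang_minus_tfa(scores):
    """Return {method: Kang mAP50 - TFA mAP50}, rounded to one decimal."""
    return {name: round(kang - tfa, 1) for name, (tfa, kang) in scores.items()}


gaps = kang_minus_tfa(one_shot)
for name, gap in gaps.items():
    print(f"{name}: {gap:+.1f} mAP50")
```

For every method listed, the Kang 1-shot number is higher than the TFA 1-shot number, which is consistent with the overestimation noted above.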

Few-Shot Object Detection Explained

We explain the few-shot object detection framework as defined by the Meta-YOLO paper (Kang's splits - full details here). FSOD partitions objects into two disjoint sets of categories: base or known/source classes, which are object categories for which we have access to a large number of training examples; and novel or unseen/target classes, for which we have only a few training examples (shots) per class. The FSOD task is formalized into the following steps:

1. Base training.¹ Annotations are given only for the base classes, with a large number of training examples per class (bikes in the example). We train the FSOD method on the base classes.
2. Few-shot finetuning. Annotations are given for the support set, a very small number of training examples from both the base and novel classes (one bike and one human in the example). Most methods finetune the FSOD model on the support set, but some methods might only use the support set for conditioning during evaluation (finetuning-free methods).
3. Few-shot evaluation. We evaluate the FSOD method on jointly detecting base and novel classes in the test set (few-shot refers to the size of the support set). The performance metrics are reported separately for base and novel classes. Common evaluation metrics are variants of the mean average precision: mAP50 for Pascal and COCO-style mAP for COCO. They are often denoted bAP50, bAP75, bAP for the base classes and nAP50, nAP75, nAP for the novel classes, where the number is the IoU threshold in percent.
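To make the metric names concrete, here is a rough sketch of the two ingredients they rely on: the IoU between two boxes, and the separate averaging of per-class AP over base and novel classes. It assumes boxes in `(x1, y1, x2, y2)` format and that per-class AP values have already been computed by a standard detection evaluator; the helper names and the example numbers are hypothetical, not taken from any benchmark.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def split_map(per_class_ap, novel_classes):
    """Average per-class AP separately over base classes (bAP) and novel classes (nAP)."""
    base = [ap for c, ap in per_class_ap.items() if c not in novel_classes]
    novel = [ap for c, ap in per_class_ap.items() if c in novel_classes]

    def mean(xs):
        return sum(xs) / len(xs) if xs else float("nan")

    return {"bAP": mean(base), "nAP": mean(novel)}


# Hypothetical per-class AP50 values, for illustration only.
example = split_map({"bike": 0.8, "car": 0.6, "human": 0.2}, novel_classes={"human"})
print(example)
```

With a threshold of 50, a detection is matched to a ground-truth box only if their IoU reaches 0.5; bAP50 and nAP50 are then the averages of the resulting per-class AP values over each group.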

In pure FSOD, methods are usually compared solely on the basis of novel class performance, whereas in Generalized FSOD, methods are compared on both base and novel class performance [2]. Note that the "training" and "test" sets refer to the splits used in traditional object detection. Base and novel classes are typically present in both the training and test sets. However, the novel class annotations are filtered out of the training set during base training. During few-shot finetuning, the support set is typically taken to be a (fixed) subset of the training set. During few-shot evaluation, the whole test set is used to reduce uncertainty [1].
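The data handling described above can be sketched as follows. This is a simplified illustration, assuming annotations are plain dictionaries with hypothetical `image_id` and `category` keys; real benchmark splits are fixed files and also have to handle images containing several instances, which this sketch ignores.

```python
import random
from collections import defaultdict


def base_training_annotations(annotations, novel_classes):
    """Drop every novel-class annotation, as done for base training."""
    return [a for a in annotations if a["category"] not in novel_classes]


def sample_support_set(annotations, k, seed=0):
    """Pick up to k annotations per class (base and novel) with a fixed seed."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for a in annotations:
        by_class[a["category"]].append(a)
    support = []
    for category in sorted(by_class):  # sorted for a reproducible order
        pool = by_class[category]
        support.extend(rng.sample(pool, min(k, len(pool))))
    return support


# Toy training set: "bike" is a base class, "human" is a novel class.
train = [{"image_id": i, "category": "bike"} for i in range(5)]
train += [{"image_id": 100 + i, "category": "human"} for i in range(3)]

base_train = base_training_annotations(train, novel_classes={"human"})
support = sample_support_set(train, k=1, seed=0)
print(len(base_train), [a["category"] for a in support])
```

Fixing the seed mirrors the idea of a fixed support set: every method is finetuned on the same few examples, so results stay comparable.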

For conditioning-based methods with no finetuning, few-shot finetuning and few-shot evaluation are merged into a single step; the novel examples are used as support examples to condition the model, and predictions are made directly on the test set. In practice, the majority of conditioning-based methods reviewed in this survey do benefit from some form of finetuning.
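As a loose illustration of conditioning without finetuning, the sketch below classifies a query embedding by comparing it with class prototypes averaged from support embeddings. This is a generic nearest-prototype scheme chosen for brevity, not the mechanism of any specific method in the leaderboard; the embeddings are made-up 2-D vectors and all function names are hypothetical.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def build_prototypes(support):
    """support: {class_name: [embedding, ...]} -> {class_name: mean embedding}."""
    return {c: mean_vector(vs) for c, vs in support.items()}


def classify(query, prototypes):
    """Return the class whose prototype is most similar to the query embedding."""
    return max(prototypes, key=lambda c: cosine(query, prototypes[c]))


# Made-up support embeddings: two "bike" shots and one "human" shot.
prototypes = build_prototypes({
    "bike": [[1.0, 0.0], [0.8, 0.2]],
    "human": [[0.0, 1.0]],
})
print(classify([0.9, 0.1], prototypes), classify([0.1, 0.9], prototypes))
```

No weights are updated here: the support examples only change what the model is compared against at prediction time, which is why the finetuning and evaluation steps collapse into one.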

¹ In the context of self-supervised learning, base training may also be referred to as finetuning or training. This should not be confused with base training in the meta-learning framework; rather, it is similar to the meta-training phase [3].

Owner

Gabriel Huang
