Localization Distillation for Object Detection

Overview

This repo is based on MMDetection.

This is the code for our paper, "Localization Distillation for Object Detection" (arXiv:2102.12252).

LD extends knowledge distillation to the localization task: it uses the learned bbox distributions to transfer localization dark knowledge from teacher to student.

LD consistently improves GFocalV1 by about 0.8 AP and 1 AR100 without adding any computational cost!

Introduction

Knowledge distillation (KD) has proven highly effective for learning compact models in deep learning, but it remains limited in distilling localization information for object detection. Existing KD methods for object detection mainly focus on mimicking deep features between the teacher and student models, which is both restricted by specific model architectures and unable to distill localization ambiguity. In this paper, we first propose localization distillation (LD) for object detection. In particular, LD can be formulated as standard KD by adopting the general localization representation of the bounding box. LD is very flexible and can distill localization ambiguity for arbitrary teacher and student architectures. Moreover, we find that Self-LD, i.e., distilling the teacher model itself, can further boost state-of-the-art performance. Second, we suggest a teacher assistant (TA) strategy to fill the possible gap between the teacher and student models, so that distillation remains effective even when the selected teacher model is not optimal. On the benchmark datasets PASCAL VOC and MS COCO, LD consistently improves the performance of student detectors and also notably boosts state-of-the-art detectors.
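
As a rough illustration of how LD can be phrased as standard KD, the sketch below treats each bounding-box edge as a discrete distribution over bins and computes a temperature-softened KL divergence between teacher and student. The function names, the temperature value, and the five-bin example are illustrative assumptions, not the repository's implementation.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Temperature-softened log-softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    log_total = peak + math.log(sum(math.exp(x - peak) for x in scaled))
    return [x - log_total for x in scaled]


def ld_loss_one_edge(student_logits, teacher_logits, temperature=10.0):
    """KL(teacher || student) between softened distributions of one box edge.

    The T^2 factor follows the usual knowledge-distillation convention.
    The default temperature is an assumption chosen for illustration.
    """
    log_pt = log_softmax(teacher_logits, temperature)
    log_ps = log_softmax(student_logits, temperature)
    kl = sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_pt, log_ps))
    return kl * temperature ** 2


def ld_loss_box(student_box, teacher_box, temperature=10.0):
    """Sum the per-edge losses over the four edges of one box."""
    return sum(
        ld_loss_one_edge(s, t, temperature)
        for s, t in zip(student_box, teacher_box)
    )


# Hypothetical logits: four edges, five bins each.
teacher = [[0.0, 2.0, 5.0, 2.0, 0.0] for _ in range(4)]
student = [[0.0, 1.0, 2.0, 4.0, 1.0] for _ in range(4)]
print(ld_loss_box(student, teacher))
```

The loss is zero when the student reproduces the teacher's distribution on every edge and grows as the two distributions diverge. Because it only compares output distributions, it does not depend on the teacher and student sharing a backbone.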

Installation

Please refer to INSTALL.md for installation and dataset preparation.

Get Started

Please see GETTING_STARTED.md for the basic usage of MMDetection.

Train

# Assumes that you are in the root directory of this project,
# that you have activated your virtual environment if needed,
# and that the COCO dataset is in 'data/coco/'.

./tools/dist_train.sh configs/ld/ld_gflv1_r101_r50_fpn_coco_1x.py 8

Learning rate setting

lr=(samples_per_gpu * num_gpu) / 16 * 0.01

For 2 GPUs and mini-batch size 6, the relevant portion of the config file would be:

optimizer = dict(type='SGD', lr=0.00375, momentum=0.9, weight_decay=0.0001)
data = dict(
    samples_per_gpu=3,
    # ... other data settings unchanged
)

For 8 GPUs and mini-batch size 16, the relevant portion of the config file would be:

optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001)
data = dict(
    samples_per_gpu=2,
    # ... other data settings unchanged
)
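
The rule above can be checked against both configurations with a small helper. The function name `scaled_lr` and its keyword arguments are hypothetical and exist only for this example. The constants 16 and 0.01 come from the formula above.

```python
def scaled_lr(samples_per_gpu, num_gpu, base_lr=0.01, base_batch=16):
    """Linear scaling rule: lr is proportional to the total mini-batch size."""
    return samples_per_gpu * num_gpu / base_batch * base_lr


# 2 GPUs, mini-batch size 6 (the first config uses lr=0.00375)
print(scaled_lr(samples_per_gpu=3, num_gpu=2))
# 8 GPUs, mini-batch size 16 (the second config uses lr=0.01)
print(scaled_lr(samples_per_gpu=2, num_gpu=8))
```

When changing the number of GPUs or samples_per_gpu, recompute lr with the same rule and update the optimizer entry accordingly.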

Convert model

After training with LD, the .pth weight file will be large, so it is best to convert the model and save a smaller one. Set the paths in convert_model.py#L38-L40 to your own .pth file and config file, then run

python convert_model.py

Speed Test (FPS)

CUDA_VISIBLE_DEVICES=0 python3 ./tools/benchmark.py configs/ld/ld_gflv1_r101_r50_fpn_coco_1x.py work_dirs/ld_gflv1_r101_r50_fpn_coco_1x/epoch_24.pth

COCO Evaluation

./tools/dist_test.sh configs/ld/ld_gflv1_r101_r50_fpn_coco_1x.py work_dirs/ld_gflv1_r101_r50_fpn_coco_1x/epoch_24.pth 8 --eval bbox

GFocalV1 with LD

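| Teacher   | Student | Training schedule | Mini-batch size | AP (val) | AP50 (val) | AP75 (val) | AP (test-dev) | AP50 (test-dev) | AP75 (test-dev) | AR100 (test-dev) |
|-----------|---------|-------------------|-----------------|----------|------------|------------|---------------|-----------------|-----------------|------------------|
| --        | R-18    | 1x                | 6               | 35.8     | 53.1       | 38.2       | 36.0          | 53.4            | 38.7            | 55.3             |
| R-101     | R-18    | 1x                | 6               | 36.5     | 52.9       | 39.3       | 36.8          | 53.5            | 39.9            | 56.6             |
| --        | R-34    | 1x                | 6               | 38.9     | 56.6       | 42.2       | 39.2          | 56.9            | 42.3            | 58.0             |
| R-101     | R-34    | 1x                | 6               | 39.8     | 56.6       | 43.1       | 40.0          | 57.1            | 43.5            | 59.3             |
| --        | R-50    | 1x                | 6               | 40.1     | 58.2       | 43.1       | 40.5          | 58.8            | 43.9            | 59.0             |
| R-101     | R-50    | 1x                | 6               | 41.1     | 58.7       | 44.9       | 41.2          | 58.8            | 44.7            | 59.8             |
| --        | R-101   | 2x                | 6               | 44.6     | 62.9       | 48.4       | 45.0          | 63.6            | 48.9            | 62.3             |
| R-101-DCN | R-101   | 2x                | 6               | 45.4     | 63.1       | 49.5       | 45.6          | 63.7            | 49.8            | 63.3             |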

GFocalV1 with Self-LD

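| Teacher         | Student         | Training schedule | Mini-batch size | AP (val) | AP50 (val) | AP75 (val) |
|-----------------|-----------------|-------------------|-----------------|----------|------------|------------|
| --              | R-18            | 1x                | 6               | 35.8     | 53.1       | 38.2       |
| R-18            | R-18            | 1x                | 6               | 36.1     | 52.9       | 38.5       |
| --              | R-50            | 1x                | 6               | 40.1     | 58.2       | 43.1       |
| R-50            | R-50            | 1x                | 6               | 40.6     | 58.2       | 43.8       |
| --              | X-101-32x4d-DCN | 1x                | 4               | 46.9     | 65.4       | 51.1       |
| X-101-32x4d-DCN | X-101-32x4d-DCN | 1x                | 4               | 47.5     | 65.8       | 51.8       |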

GFocalV2 with LD

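| Teacher    | Student         | Training schedule | Mini-batch size | AP (test-dev) | AP50 (test-dev) | AP75 (test-dev) | AR100 (test-dev) |
|------------|-----------------|-------------------|-----------------|---------------|-----------------|-----------------|------------------|
| --         | R-50            | 2x                | 16              | 44.4          | 62.3            | 48.5            | 62.4             |
| R-101      | R-50            | 2x                | 16              | 44.8          | 62.4            | 49.0            | 63.1             |
| --         | R-101           | 2x                | 16              | 46.0          | 64.1            | 50.2            | 63.5             |
| R-101-DCN  | R-101           | 2x                | 16              | 46.8          | 64.5            | 51.1            | 64.3             |
| --         | R-101-DCN       | 2x                | 16              | 48.2          | 66.6            | 52.6            | 64.4             |
| R2-101-DCN | R-101-DCN       | 2x                | 16              | 49.1          | 67.1            | 53.7            | 65.6             |
| --         | X-101-32x4d-DCN | 2x                | 16              | 49.0          | 67.6            | 53.4            | 64.7             |
| R2-101-DCN | X-101-32x4d-DCN | 2x                | 16              | 50.2          | 68.3            | 54.9            | 66.3             |
| --         | R2-101-DCN      | 2x                | 16              | 50.5          | 68.9            | 55.1            | 66.2             |
| R2-101-DCN | R2-101-DCN      | 2x                | 16              | 51.0          | 69.1            | 55.9            | 66.8             |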

VOC Evaluation

./tools/dist_test.sh configs/ld/ld_gflv1_r101_r18_fpn_voc.py work_dirs/ld_gflv1_r101_r18_fpn_voc/epoch_4.pth 8 --eval mAP

GFocalV1 with LD

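| Teacher   | Student | Training Epochs | Mini-batch size | AP   | AP50 | AP75 |
|-----------|---------|-----------------|-----------------|------|------|------|
| --        | R-18    | 4               | 6               | 51.8 | 75.8 | 56.3 |
| R-101     | R-18    | 4               | 6               | 53.0 | 75.9 | 57.6 |
| --        | R-50    | 4               | 6               | 55.8 | 79.0 | 60.7 |
| R-101     | R-50    | 4               | 6               | 56.1 | 78.5 | 61.2 |
| --        | R-34    | 4               | 6               | 55.7 | 78.9 | 60.6 |
| R-101-DCN | R-34    | 4               | 6               | 56.7 | 78.4 | 62.1 |
| --        | R-101   | 4               | 6               | 57.6 | 80.4 | 62.7 |
| R-101-DCN | R-101   | 4               | 6               | 58.4 | 80.2 | 63.7 |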

This is an example of evaluation results (R-101→R-18).

+-------------+------+-------+--------+-------+
| class       | gts  | dets  | recall | ap    |
+-------------+------+-------+--------+-------+
| aeroplane   | 285  | 4154  | 0.081  | 0.030 |
| bicycle     | 337  | 7124  | 0.125  | 0.108 |
| bird        | 459  | 5326  | 0.096  | 0.018 |
| boat        | 263  | 8307  | 0.065  | 0.034 |
| bottle      | 469  | 10203 | 0.051  | 0.045 |
| bus         | 213  | 4098  | 0.315  | 0.247 |
| car         | 1201 | 16563 | 0.193  | 0.131 |
| cat         | 358  | 4878  | 0.254  | 0.128 |
| chair       | 756  | 32655 | 0.053  | 0.027 |
| cow         | 244  | 4576  | 0.131  | 0.109 |
| diningtable | 206  | 13542 | 0.150  | 0.117 |
| dog         | 489  | 6446  | 0.196  | 0.076 |
| horse       | 348  | 5855  | 0.144  | 0.036 |
| motorbike   | 325  | 6733  | 0.052  | 0.017 |
| person      | 4528 | 51959 | 0.099  | 0.037 |
| pottedplant | 480  | 12979 | 0.031  | 0.009 |
| sheep       | 242  | 4706  | 0.132  | 0.060 |
| sofa        | 239  | 9640  | 0.192  | 0.060 |
| train       | 282  | 4986  | 0.142  | 0.042 |
| tvmonitor   | 308  | 7922  | 0.078  | 0.045 |
+-------------+------+-------+--------+-------+
| mAP         |      |       |        | 0.069 |
+-------------+------+-------+--------+-------+
AP:  0.530091167986393
['AP50: 0.759393', 'AP55: 0.744544', 'AP60: 0.724239', 'AP65: 0.693551', 'AP70: 0.639848', 'AP75: 0.576284', 'AP80: 0.489098', 'AP85: 0.378586', 'AP90: 0.226534', 'AP95: 0.068834']
{'mAP': 0.7593928575515747}
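
Reading the log above: the reported AP appears to be the mean of the ten per-threshold values (AP50 through AP95), the final mAP entry equals AP50, and the per-class table seems to correspond to the last threshold evaluated, since its mAP of 0.069 matches AP95. This interpretation is inferred from the printed numbers rather than from the evaluation code. A short check using only those numbers:

```python
# Per-threshold APs copied from the evaluation log above (IoU in percent).
ap_per_iou = {
    50: 0.759393, 55: 0.744544, 60: 0.724239, 65: 0.693551, 70: 0.639848,
    75: 0.576284, 80: 0.489098, 85: 0.378586, 90: 0.226534, 95: 0.068834,
}

# Average over the ten IoU thresholds 0.50:0.05:0.95.
mean_ap = sum(ap_per_iou.values()) / len(ap_per_iou)

logged_ap = 0.530091167986393  # the "AP:" line in the log
print(f"mean over thresholds: {mean_ap:.6f}")
print(f"logged AP:            {logged_ap:.6f}")
```

The two values agree to within the rounding of the printed per-threshold numbers, which is also how the AP, AP50 and AP75 columns of the VOC table relate to this output for the R-101→R-18 row.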

Note:

  • For more experimental details, please refer to GFocalV1, GFocalV2 and mmdetection.
  • According to ATSS, there is no gap between box-based regression and point-based regression. Personal conjectures: 1) If the xywh form works with the general distribution (applying uniform subinterval division to xywh), LD can also work in xywh form. 2) If the xywh form with the general distribution does not give better results, the best modification is to first switch from xywh form to tblr form and then apply the general distribution and LD. Consequently, whether or not xywh form + general distribution works, LD benefits all regression-based detectors.

Pretrained weights

| VOC                                            | COCO                                              |
|------------------------------------------------|---------------------------------------------------|
| GFocalV1 teacher R101 pan.baidu pw: ufc8       | GFocalV1 + LD R101_R18_1x pan.baidu pw: hj8d      |
| GFocalV1 teacher R101DCN pan.baidu pw: 5qra    | GFocalV1 + LD R101_R50_1x pan.baidu pw: bvzz      |
| GFocalV1 + LD R101_R18 pan.baidu pw: 1bd3      | GFocalV2 + LD R101_R50_2x pan.baidu pw: 3jtq      |
| GFocalV1 + LD R101DCN_R34 pan.baidu pw: thuw   | GFocalV2 + LD R101DCN_R101_2x pan.baidu pw: zezq  |
| GFocalV1 + LD R101DCN_R101 pan.baidu pw: mp8t  | GFocalV2 + LD R2N_R101DCN_2x pan.baidu pw: fsbm   |
|                                                | GFocalV2 + LD R2N_X101_2x pan.baidu pw: 9vcc      |
|                                                | GFocalV2 + Self-LD R2N_R2N_2x pan.baidu pw: 9azn  |

Any other teacher model can be downloaded from GFocalV1, GFocalV2 and mmdetection.

Score voting Cluster-DIoU-NMS

We provide Score voting Cluster-DIoU-NMS, a faster version of score voting NMS combined with DIoU-NMS. For GFocalV1 and GFocalV2, it brings a 0.1-0.3 AP increase and a 0.2-0.5 AP75 increase, at the cost of a <=0.4 AP50 decrease and a <=1.5 FPS decrease, while being much faster than the score voting NMS in mmdetection. The relevant portion of the config file would be:

# Score voting Cluster-DIoU-NMS
test_cfg = dict(
    nms=dict(type='voting_cluster_diounms', iou_threshold=0.6),
    # ... other test settings unchanged
)

# Original NMS
test_cfg = dict(
    nms=dict(type='nms', iou_threshold=0.6),
    # ... other test settings unchanged
)
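
For readers unfamiliar with the DIoU criterion that this NMS variant builds on, here is a minimal pure-Python sketch of plain greedy DIoU-NMS. It is not the clustered, score-voting implementation shipped in this repo. The function names and the example boxes are assumptions made for illustration, and only the 0.6 threshold is taken from the config above.

```python
def diou(box_a, box_b):
    """Distance-IoU of two (x1, y1, x2, y2) boxes: IoU minus a center-distance penalty."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union if union > 0 else 0.0
    # Squared distance between the two box centers.
    center_dist = ((ax1 + ax2) / 2 - (bx1 + bx2) / 2) ** 2 \
        + ((ay1 + ay2) / 2 - (by1 + by2) / 2) ** 2
    # Squared diagonal of the smallest box enclosing both boxes.
    diag = (max(ax2, bx2) - min(ax1, bx1)) ** 2 \
        + (max(ay2, by2) - min(ay1, by1)) ** 2
    return iou - center_dist / diag if diag > 0 else iou


def diou_nms(boxes, scores, iou_threshold=0.6):
    """Greedy NMS: keep a box unless its DIoU with a kept box exceeds the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(diou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Hypothetical detections: two near-duplicates and one distant box.
boxes = [(0, 0, 10, 10), (1, 0, 11, 10), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(diou_nms(boxes, scores))
```

Compared with IoU-based suppression, the center-distance penalty makes two boxes with the same overlap less likely to suppress each other when their centers are far apart.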

Citation

If you find LD useful in your research, please consider citing:

@Article{zheng2021LD,
  title={Localization Distillation for Object Detection},
  author={Zhaohui Zheng and Rongguang Ye and Ping Wang and Jun Wang and Dongwei Ren and Wangmeng Zuo},
  journal={arXiv:2102.12252},
  year={2021}
}