Unified tracking framework with a single appearance model

Overview


Paper: Do different tracking tasks require different appearance models?

[ArXiv] (coming soon) [Project Page] (coming soon)

UniTrack is a simple and Unified framework for versatile visual Tracking tasks.

As an important problem in computer vision, tracking has been fragmented into a multitude of different experimental setups. As a consequence, the literature has fragmented too, and the novel approaches proposed by the community are now usually specialized to fit only one specific setup. To understand to what extent this specialization is actually necessary, we present UniTrack, a solution that addresses multiple different tracking tasks within the same framework. All tasks share the same universal appearance model. UniTrack enjoys the following advantages,

Tasks & Framework

[Figure: tasks and framework]

Tasks

We classify existing tracking tasks along four axes: (1) single or multiple targets; (2) targets specified by users or by automatic detectors; (3) observation format (bounding box/mask/pose); (4) class-agnostic or class-specific (e.g. human/vehicles). We mainly experiment on five tasks: SOT, VOS, MOT, MOTS, and PoseTrack. Task setups are summarized in the above figure.

Appearance model

The appearance model is the only learnable component in UniTrack. It should provide a universal visual representation, and is usually pre-trained on a large-scale dataset in a supervised or unsupervised manner. Typical examples include ImageNet pre-trained ResNets (supervised) and recent self-supervised models such as MoCo and SimCLR (unsupervised).

Propagation and Association

Propagation and association are the two fundamental algorithmic building blocks in UniTrack. Both take features extracted by the appearance model as input. For propagation, we adopt existing methods such as cross-correlation, DCF, and mask propagation. For association, we employ a simple algorithm and develop a novel similarity metric to make full use of the appearance model.
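To make the two building blocks concrete, here is a minimal sketch in plain Python. It is not the UniTrack implementation: the function names (`cross_correlate`, `peak_location`, `greedy_associate`), the single-channel feature maps, the `min_sim` threshold, and the use of plain cosine similarity with greedy matching are all assumptions made for illustration. In particular, cosine similarity stands in for the novel similarity metric mentioned above, which is not reproduced here.

```python
import math


def cross_correlate(search, template):
    """Slide `template` over `search` (both 2D lists) and return the response map."""
    th, tw = len(template), len(template[0])
    sh, sw = len(search), len(search[0])
    response = []
    for y in range(sh - th + 1):
        row = []
        for x in range(sw - tw + 1):
            # Unnormalized correlation: dot product of template and window.
            score = sum(
                search[y + dy][x + dx] * template[dy][dx]
                for dy in range(th)
                for dx in range(tw)
            )
            row.append(score)
        response.append(row)
    return response


def peak_location(response):
    """Return (row, col) of the highest response: the propagated target position."""
    _, location = max(
        ((value, (y, x)) for y, row in enumerate(response) for x, value in enumerate(row)),
        key=lambda item: item[0],
    )
    return location


def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors (lists of floats)."""
    dot = sum(p * q for p, q in zip(a, b))
    norm = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
    return dot / norm if norm > 0 else 0.0


def greedy_associate(track_feats, det_feats, min_sim=0.5):
    """Match tracks to detections greedily, most similar pairs first.

    Hypothetical stand-in for the association step; returns {track_idx: det_idx}.
    """
    pairs = sorted(
        (
            (cosine_similarity(t, d), ti, di)
            for ti, t in enumerate(track_feats)
            for di, d in enumerate(det_feats)
        ),
        reverse=True,
    )
    matches, used_tracks, used_dets = {}, set(), set()
    for sim, ti, di in pairs:
        if sim < min_sim:
            break  # remaining pairs are even less similar
        if ti in used_tracks or di in used_dets:
            continue
        matches[ti] = di
        used_tracks.add(ti)
        used_dets.add(di)
    return matches
```

For example, `peak_location(cross_correlate(search, template))` gives the offset where the template best matches the search region, and `greedy_associate(track_feats, det_feats)` maps each existing track to at most one new detection. Real feature maps are multi-channel, and unnormalized correlation favors high-magnitude regions, so this only illustrates the data flow.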

Results

Below we show results of UniTrack with a simple ImageNet pre-trained ResNet-18 as the appearance model. More results (other tasks/datasets, more visualizations) can be found in results.md.

Qualitative results

Single Object Tracking (SOT) on OTB-2015

Video Object Segmentation (VOS) on DAVIS-2017 val split

Multiple Object Tracking (MOT) on MOT-16 test set private detector track (Detections from FairMOT)

Multiple Object Tracking and Segmentation (MOTS) on MOTS challenge test set (Detections from COSTA_st)

Pose Tracking on PoseTrack-2018 val split (Detections from LightTrack)

Quantitative results

Single Object Tracking (SOT) on OTB-2015

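| Method | SiamFC | SiamRPN | SiamRPN++ | UDT* | UDT+* | LUDT* | LUDT+* | UniTrack_XCorr* | UniTrack_DCF* |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| AUC | 58.2 | 63.7 | 69.6 | 59.4 | 63.2 | 60.2 | 63.9 | 55.5 | 61.8 |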

* indicates non-supervised methods

Video Object Segmentation (VOS) on DAVIS-2017 val split

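| Method | SiamMask | FeelVOS | STM | Colorization* | TimeCycle* | UVC* | CRW* | VFS* | UniTrack* |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| J-mean | 54.3 | 63.7 | 79.2 | 34.6 | 40.1 | 56.7 | 64.8 | 66.5 | 58.4 |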

* indicates non-supervised methods
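For readers unfamiliar with the metric, J is the Jaccard index (intersection over union) between a predicted mask and the ground-truth mask, and J-mean averages it over the evaluated masks. The sketch below is illustrative only and is not the official DAVIS evaluation code; the function names and the convention of scoring two empty masks as 1.0 are assumptions.

```python
def jaccard(pred, gt):
    """Intersection over union of two binary masks given as 2D lists of 0/1."""
    inter = union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            inter += 1 if (p and g) else 0
            union += 1 if (p or g) else 0
    # Assumed convention: two empty masks count as a perfect match.
    return inter / union if union else 1.0


def j_mean(preds, gts):
    """Average Jaccard index over paired lists of masks."""
    scores = [jaccard(p, g) for p, g in zip(preds, gts)]
    return sum(scores) / len(scores)
```

The table reports this value as a percentage, so a J-mean of 0.584 would appear as 58.4.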

Multiple Object Tracking (MOT) on MOT-16 test set private detector track

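| Method | POI | DeepSORT-2 | JDE | CTrack | TubeTK | TraDes | CSTrack | FairMOT* | UniTrack* |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| IDF-1 | 65.1 | 62.2 | 55.8 | 57.2 | 62.2 | 64.7 | 71.8 | 72.8 | 71.8 |
| IDs | 805 | 781 | 1544 | 1897 | 1236 | 1144 | 1071 | 1074 | 683 |
| MOTA | 66.1 | 61.4 | 64.4 | 67.6 | 66.9 | 70.1 | 70.7 | 74.9 | 74.7 |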

* indicates methods using the same detections
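As a reminder of what these metrics measure, the sketch below computes MOTA and IDF1 from aggregate counts using their standard definitions; IDs in the table is the number of identity switches. This is not the official MOTChallenge evaluation code. The function and parameter names are hypothetical, and the counts would in practice come from matching predictions to ground truth over the whole sequence.

```python
def mota(false_negatives, false_positives, id_switches, num_gt):
    """MOTA = 1 - (FN + FP + IDSW) / GT, with GT the number of ground-truth boxes."""
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt


def idf1(idtp, idfp, idfn):
    """IDF1 = 2 * IDTP / (2 * IDTP + IDFP + IDFN), using identity-level matches."""
    return 2 * idtp / (2 * idtp + idfp + idfn)
```

With made-up counts, `mota(10, 5, 5, 100)` and `idf1(80, 10, 30)` both evaluate to 0.8, which the tables would report as 80.0. MOTA penalizes detection errors and identity switches together, while IDF1 focuses on how consistently identities are preserved.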

Multiple Object Tracking and Segmentation (MOTS) on MOTS challenge test set

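| Method | TrackRCNN | SORTS | PointTrack | GMPHD | COSTA_st* | UniTrack* |
| --- | --- | --- | --- | --- | --- | --- |
| IDF-1 | 42.7 | 57.3 | 42.9 | 65.6 | 70.3 | 67.2 |
| IDs | 567 | 577 | 868 | 566 | 421 | 622 |
| sMOTA | 40.6 | 55.0 | 62.3 | 69.0 | 70.2 | 68.9 |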

* indicates methods using the same detections

Pose Tracking on PoseTrack-2018 val split

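| Method | MDPN | OpenSVAI | Miracle | KeyTrack | LightTrack* | UniTrack* |
| --- | --- | --- | --- | --- | --- | --- |
| IDF-1 | - | - | - | - | 52.2 | 73.2 |
| IDs | - | - | - | - | 3024 | 6760 |
| sMOTA | 50.6 | 62.4 | 64.0 | 66.6 | 64.8 | 63.5 |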

* indicates methods using the same detections

Getting started

Demo

Update log

[2021.6.24]: Start writing docs, please stay tuned!

Acknowledgement

VideoWalk by Allan A. Jabri

SOT code by Zhipeng Zhang

Owner
ZhongdaoWang