[CVPR 2022 Oral] MixFormer: End-to-End Tracking with Iterative Mixed Attention

Overview

MixFormer

The official implementation of the CVPR 2022 paper MixFormer: End-to-End Tracking with Iterative Mixed Attention

[Models and Raw results] (Google Drive) [Models and Raw results] (Baidu Drive: hmuv)

[Figure: MixFormer framework]

News

[Mar 21, 2022]

  • MixFormer is accepted to CVPR2022.
  • We release code, models, and raw results.

[Mar 29, 2022]

  • Our paper is selected for an oral presentation.

Highlights

New transformer tracking framework

MixFormer is composed of a target-search mixed attention (MAM) based backbone and a simple corner head, yielding a compact tracking pipeline without an explicit integration module.
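As a rough illustration of the idea, and not the repository's code, the toy sketch below treats mixed attention as ordinary scaled dot-product attention applied to the concatenation of target and search tokens, so that feature extraction and target-search integration happen in one step. The function names (`softmax`, `attend`, `mixed_attention`) are hypothetical, and the sketch assumes identity query/key/value projections and a single head; the actual backbone has learned projections and further details that are not shown here.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Scaled dot-product attention on plain Python lists of token vectors."""
    scale = math.sqrt(len(queries[0]))
    dim = len(values[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
        )
    return outputs


def mixed_attention(target_tokens, search_tokens):
    """Attend over the concatenation of target and search tokens.

    Assumption: identity projections and a single head, so the sketch
    shows only how the two token sets are mixed in one attention call.
    """
    mixed = target_tokens + search_tokens
    out = attend(mixed, mixed, mixed)
    n_target = len(target_tokens)
    # Split the mixed output back into target and search parts.
    return out[:n_target], out[n_target:]


# Toy input: 2 target tokens and 3 search tokens, each of dimension 2.
target = [[1.0, 0.0], [0.0, 1.0]]
search = [[1.0, 1.0], [0.5, 0.0], [0.0, 0.5]]
new_target, new_search = mixed_attention(target, search)
```

Because every output token is a weighted average over both token sets, the search tokens already carry target information when they leave the layer, which is why no separate integration module is needed in this simplified picture.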

End-to-end, positional-embedding-free, multi-feature-aggregation-free

MixFormer is an end-to-end tracking framework without post-processing. Unlike other transformer trackers, MixFormer does not use positional embeddings, attention masks, or a multi-layer feature aggregation strategy.

Strong performance

Tracker                 VOT2020 (EAO)  LaSOT (NP)  GOT-10K (AO)  TrackingNet (NP)
MixFormer               0.555          79.9        70.7          88.9
ToMP101* (CVPR2022)     -              79.2        -             86.4
SBT-large* (CVPR2022)   0.529          -           70.4          -
SwinTrack* (Arxiv2021)  -              78.6        69.4          88.2
Sim-L/14* (Arxiv2022)   -              79.7        69.8          87.4
STARK (ICCV2021)        0.505          77.0        68.8          86.9
KeepTrack (ICCV2021)    -              77.2        -             -
TransT (CVPR2021)       0.495          73.8        67.1          86.7
TrDiMP (CVPR2021)       -              -           67.1          83.3
Siam R-CNN (CVPR2020)   -              72.2        64.9          85.4
TREG (Arxiv2021)        -              74.1        66.8          83.8

Install the environment

Use Anaconda:

conda create -n mixformer python=3.6
conda activate mixformer
bash install_pytorch17.sh

Data Preparation

Put the tracking datasets in ./data. It should look like:

${MixFormer_ROOT}
 -- data
     -- lasot
         |-- airplane
         |-- basketball
         |-- bear
         ...
     -- got10k
         |-- test
         |-- train
         |-- val
     -- coco
         |-- annotations
         |-- train2017
     -- trackingnet
         |-- TRAIN_0
         |-- TRAIN_1
         ...
         |-- TRAIN_11
         |-- TEST
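Before training or testing, it can help to confirm that the folders match the layout above. The following is a minimal sketch, not part of the repository: `EXPECTED_LAYOUT` and `missing_dataset_dirs` are hypothetical names, and only the directories explicitly listed above are checked (the entries elided with "..." are ignored).

```python
from pathlib import Path

# Directories explicitly listed in the layout above; "..." entries are skipped.
EXPECTED_LAYOUT = {
    "lasot": ["airplane", "basketball", "bear"],
    "got10k": ["test", "train", "val"],
    "coco": ["annotations", "train2017"],
    "trackingnet": ["TRAIN_0", "TRAIN_1", "TRAIN_11", "TEST"],
}


def missing_dataset_dirs(data_dir):
    """Return the expected directories that do not exist under data_dir."""
    root = Path(data_dir)
    missing = []
    for dataset, subdirs in EXPECTED_LAYOUT.items():
        if not (root / dataset).is_dir():
            # Report the dataset once instead of every sub-directory.
            missing.append(dataset)
            continue
        for sub in subdirs:
            if not (root / dataset / sub).is_dir():
                missing.append(f"{dataset}/{sub}")
    return missing


if __name__ == "__main__":
    for path in missing_dataset_dirs("./data"):
        print(f"missing: {path}")
```

Run from ${MixFormer_ROOT}, the script prints one line per missing directory and prints nothing when every listed directory is present.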

Set project paths

Run the following command to set paths for this project

python tracking/create_default_local_file.py --workspace_dir . --data_dir ./data --save_dir .

After running this command, you can also modify paths by editing these two files

lib/train/admin/local.py  # paths about training
lib/test/evaluation/local.py  # paths about testing

Train MixFormer

Training runs on multiple GPUs using DDP (PyTorch DistributedDataParallel). More details on other training settings can be found in tracking/train_mixformer.sh.

# MixFormer
bash tracking/train_mixformer.sh

Test and evaluate MixFormer on benchmarks

  • LaSOT/GOT10k-test/TrackingNet/OTB100/UAV123. More details on the test settings can be found in tracking/test_mixformer.sh.
bash tracking/test_mixformer.sh
  • VOT2020
    Before evaluating "MixFormer+AR" on VOT2020, please install the extra packages described in external/AR/README.md. The VOT toolkit is also required to evaluate our tracker. To download and install the VOT toolkit, you can follow this tutorial. For convenience, you can use our example VOT toolkit workspaces under external/vot20/ after setting trackers.ini.
cd external/vot20/<workspace_dir>
vot evaluate --workspace . MixFormerPython
# generating analysis results
vot analysis --workspace . --nocache

Run MixFormer on your own video

bash tracking/run_video_demo.sh

Compute FLOPs/Params and test speed

bash tracking/profile_mixformer.sh

Visualize attention maps

bash tracking/vis_mixformer_attn.sh

[Figure: attention map visualization]

Model Zoo and raw results

The trained models and the raw tracking results are provided in the [Models and Raw results] (Google Drive) or [Models and Raw results] (Baidu Drive: hmuv).

Contact

Yutao Cui: [email protected]

Cheng Jiang: [email protected]

Acknowledgments

  • Thanks to the PyTracking and STARK libraries, which helped us quickly implement our ideas.
  • We use the CvT implementation from the official CvT repo.
Owner
Multimedia Computing Group, Nanjing University