AdaFocus V2: End-to-End Training of Spatial Dynamic Networks for Video Recognition

Last update: Dec 26, 2022

Overview

AdaFocusV2

This repo contains the official code and pre-trained models for AdaFocusV2, proposed in the paper "AdaFocus V2: End-to-End Training of Spatial Dynamic Networks for Video Recognition".

Introduction

Recent works have shown that the computational efficiency of video recognition can be significantly improved by reducing spatial redundancy. As a representative work, the adaptive focus method (AdaFocus) achieves a favorable trade-off between accuracy and inference speed by dynamically identifying and attending to the informative regions in each video frame. However, AdaFocus requires a complicated three-stage training pipeline (involving reinforcement learning), which leads to slow convergence and is unfriendly to practitioners. This work reformulates the training of AdaFocus as a simple one-stage algorithm by introducing a differentiable, interpolation-based patch selection operation, enabling efficient end-to-end optimization. We further present an improved training scheme to address the issues introduced by the one-stage formulation, namely insufficient supervision, limited input diversity, and unstable training. Moreover, a conditional-exit technique is proposed to perform temporal adaptive computation on top of AdaFocus without additional training. Extensive experiments on six benchmark datasets (ActivityNet, FCVID, Mini-Kinetics, Something-Something V1&V2, and Jester) demonstrate that our model significantly outperforms the original AdaFocus and other competitive baselines, while being considerably simpler and more efficient to train.
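The idea behind a differentiable, interpolation-based patch selection can be illustrated with a minimal pure-Python sketch. It is not the code in this repository: it assumes a single-channel image stored as a list of rows, crops a square patch around a real-valued center (cx, cy) with bilinear interpolation, and uses the patch mean as a stand-in for a real training loss. The function names `bilinear_sample` and `patch_mean_and_grad` are hypothetical.

```python
import math
from typing import List, Tuple

# A single-channel image stored row-major: img[y][x].
Image = List[List[float]]


def bilinear_sample(img: Image, x: float, y: float) -> Tuple[float, float, float]:
    """Bilinearly sample img at the real-valued point (x, y).

    Returns (value, d value / dx, d value / dy). Requires an image of at least
    2x2 pixels and 0 <= x <= width - 1, 0 <= y <= height - 1.
    """
    h, w = len(img), len(img[0])
    if not (0.0 <= x <= w - 1 and 0.0 <= y <= h - 1):
        raise ValueError("sample point lies outside the image")
    # Top-left corner of the 2x2 neighbourhood, kept inside the image at the border.
    x0 = min(int(math.floor(x)), w - 2)
    y0 = min(int(math.floor(y)), h - 2)
    wx, wy = x - x0, y - y0  # fractional offsets in [0, 1]
    a, b = img[y0][x0], img[y0][x0 + 1]
    c, d = img[y0 + 1][x0], img[y0 + 1][x0 + 1]
    value = (1 - wy) * ((1 - wx) * a + wx * b) + wy * ((1 - wx) * c + wx * d)
    # Partial derivatives of the bilinear blend w.r.t. the sampling coordinates.
    d_dx = (1 - wy) * (b - a) + wy * (d - c)
    d_dy = (1 - wx) * (c - a) + wx * (d - b)
    return value, d_dx, d_dy


def patch_mean_and_grad(img: Image, cx: float, cy: float, size: int) -> Tuple[float, float, float]:
    """Crop a size x size patch centred at real-valued (cx, cy).

    Returns (mean of the patch, d mean / d cx, d mean / d cy). The mean is a
    hypothetical stand-in for whatever loss a recognition network would compute.
    """
    half = (size - 1) / 2.0
    total = grad_x = grad_y = 0.0
    for i in range(size):
        for j in range(size):
            # Each patch pixel sits at a fixed offset from the center, so its
            # derivative w.r.t. (cx, cy) equals the sampler's d/dx and d/dy.
            v, dx, dy = bilinear_sample(img, cx - half + j, cy - half + i)
            total += v
            grad_x += dx
            grad_y += dy
    n = size * size
    return total / n, grad_x / n, grad_y / n


if __name__ == "__main__":
    # Toy 8x8 image with a non-linear intensity pattern.
    img = [[float((x * x + 3 * y) % 7) for x in range(8)] for y in range(8)]
    cx, cy, size = 3.3, 4.6, 3
    loss, gx, gy = patch_mean_and_grad(img, cx, cy, size)
    # Compare the analytic gradient with a central finite difference.
    eps = 1e-6
    fd_x = (patch_mean_and_grad(img, cx + eps, cy, size)[0]
            - patch_mean_and_grad(img, cx - eps, cy, size)[0]) / (2 * eps)
    print(f"loss={loss:.4f}  d/dcx analytic={gx:.4f}  finite-diff={fd_x:.4f}")
```

Because every patch pixel is a bilinear blend of its four neighbours, a loss computed on the patch has a well-defined gradient with respect to the patch center. Rounding the center to integer pixel indices would instead give a zero gradient almost everywhere, which is why a non-differentiable crop calls for a separate policy-learning stage.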

Results

Compared with AdaFocusV1

ActivityNet, FCVID and Mini-Kinetics

Something-Something V1&V2 and Jester

Visualization

Get Started

For detailed instructions, see the folders "Experiments on ActivityNet, FCVID and Mini-Kinetics" and "Experiments on Sth-Sth and Jester".

Contact

If you have any questions, feel free to contact the authors or open an issue. Yulin Wang: [email protected].
