NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions (CVPR2021)

Overview

NExT-QA

We reproduce some SOTA VideoQA methods to provide benchmark results for our NExT-QA dataset accepted to CVPR2021 (with 1 'Strong Accept' and 2 'Weak Accept's).

NExT-QA is a VideoQA benchmark targeting the explanation of video contents. It challenges QA models to reason about the causal and temporal actions and understand the rich object interactions in daily activities. We set up both multi-choice and open-ended QA tasks on the dataset. This repo. provides resources for multi-choice QA; open-ended QA is found in NExT-OE. For more details, please refer to our dataset page.

Environment

Anaconda 4.8.4, python 3.6.8, pytorch 1.6 and cuda 10.2. For other libs, please refer to the file requirements.txt.

Install

Please create an env for this project using anaconda (should install anaconda first)

>conda create -n videoqa python=3.6.8
>conda activate videoqa
>git clone https://github.com/doc-doc/NExT-QA.git
>pip install -r requirements.txt #may take some time to install

Data Preparation

Please download the pre-computed features and QA annotations from here. There are 4 zip files:

  • ['vid_feat.zip']: Appearance and motion feature for video representation. (With code provided by HCRN).
  • ['qas_bert.zip']: Finetuned BERT feature for QA-pair representation. (Based on pytorch-pretrained-BERT).
  • ['nextqa.zip']: Annotations of QAs and GloVe Embeddings.
  • ['models.zip']: Learned HGA model.

After downloading the data, please create a folder ['data/feats'] at the same directory as ['NExT-QA'], then unzip the video and QA features into it. You will have directories like ['data/feats/vid_feat/', 'data/feats/qas_bert/' and 'NExT-QA/'] in your workspace. Please unzip the files in ['nextqa.zip'] into ['NExT-QA/dataset/nextqa'] and ['models.zip'] into ['NExT-QA/models/'].

(You are also encouraged to design your own pre-computed video features. In that case, please download the raw videos from VidOR. As NExT-QA's videos are sourced from VidOR, you can easily link the QA annotations with the corresponding videos according to the key 'video' in the ['nextqa/.csv'] files, during which you may need the map file ['nextqa/map_vid_vidorID.json']).

Usage

Once the data is ready, you can easily run the code. First, to test the environment and code, we provide the prediction and model of the SOTA approach (i.e., HGA) on NExT-QA. You can get the results reported in the paper by running:

>python eval_mc.py

The command above will load the prediction file under ['results/'] and evaluate it. You can also obtain the prediction by running:

>./main.sh 0 val #Test the model with GPU id 0

The command above will load the model under ['models/'] and generate the prediction file. If you want to train the model, please run

>./main.sh 0 train # Train the model with GPU id 0

It will train the model and save to ['models']. (The results may be slightly different depending on the environments)

Results

Methods Text Rep. Acc_C Acc_T Acc_D Acc Text Rep. Acc_C Acc_T Acc_D Acc
BlindQA GloVe 26.89 30.83 32.60 30.60 BERT-FT 42.62 45.53 43.89 43.76
EVQA GloVe 28.69 31.27 41.44 31.51 BERT-FT 42.64 46.34 45.82 44.24
STVQA [CVPR17] GloVe 36.25 36.29 55.21 39.21 BERT-FT 44.76 49.26 55.86 47.94
CoMem [CVPR18] GloVe 35.10 37.28 50.45 38.19 BERT-FT 45.22 49.07 55.34 48.04
HME [CVPR19] GloVe 37.97 36.91 51.87 39.79 BERT-FT 46.18 48.20 58.30 48.72
HCRN [CVPR20] GloVe 39.09 40.01 49.16 40.95 BERT-FT 45.91 49.26 53.67 48.20
HGA [AAAI20] GloVe 35.71 38.40 55.60 39.67 BERT-FT 46.26 50.74 59.33 49.74
Human - 87.61 88.56 90.40 88.38 - 87.61 88.56 90.40 88.38

Multi-choice QA vs. Open-ended QA

vis mc_oe

Citation

@article{xiao2021next,
  title={NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions},
  author={Xiao, Junbin and Shang, Xindi and Yao, Angela and Chua, Tat-Seng},
  journal={arXiv preprint arXiv:2105.08276},
  year={2021}
}

Todo

  1. Open evaluation server and release test data.
  2. Release spatial feature.
  3. Release RoI feature.

Acknowledgement

Our reproduction of the methods are based on the respective official repositories, we thank the authors to release their code. If you use the related part, please cite the corresponding paper commented in the code.

Owner
Junbin Xiao
PhD Candidate
Junbin Xiao
Aerial Imagery dataset for fire detection: classification and segmentation (Unmanned Aerial Vehicle (UAV))

Aerial Imagery dataset for fire detection: classification and segmentation using Unmanned Aerial Vehicle (UAV) Title FLAME (Fire Luminosity Airborne-b

79 Jan 06, 2023
Scikit-event-correlation - Event Correlation and Forecasting over High Dimensional Streaming Sensor Data algorithms

scikit-event-correlation Event Correlation and Changing Detection Algorithm Theo

Intellia ICT 5 Oct 30, 2022
Microsoft Cognitive Toolkit (CNTK), an open source deep-learning toolkit

CNTK Chat Windows build status Linux build status The Microsoft Cognitive Toolkit (https://cntk.ai) is a unified deep learning toolkit that describes

Microsoft 17.3k Dec 29, 2022
(CVPR 2022) A minimalistic mapless end-to-end stack for joint perception, prediction, planning and control for self driving.

LAV Learning from All Vehicles Dian Chen, Philipp Krähenbühl CVPR 2022 (also arXiV 2203.11934) This repo contains code for paper Learning from all veh

Dian Chen 300 Dec 15, 2022
learned_optimization: Training and evaluating learned optimizers in JAX

learned_optimization: Training and evaluating learned optimizers in JAX learned_optimization is a research codebase for training learned optimizers. I

Google 533 Dec 30, 2022
Platform-agnostic AI Framework 🔥

🇬🇧 TensorLayerX is a multi-backend AI framework, which can run on almost all operation systems and AI hardwares, and support hybrid-framework progra

TensorLayer Community 171 Jan 06, 2023
An open-source Deep Learning Engine for Healthcare that aims to treat & prevent major diseases

AlphaCare Background AlphaCare is a work-in-progress, open-source Deep Learning Engine for Healthcare that aims to treat and prevent major diseases. T

Siraj Raval 44 Nov 05, 2022
duralava is a neural network which can simulate a lava lamp in an infinite loop.

duralava duralava is a neural network which can simulate a lava lamp in an infinite loop. Example This is not a real lava lamp but a "fake" one genera

Maximilian Bachl 87 Dec 20, 2022
This is code to fit per-pixel environment map with spherical Gaussian lobes, using LBFGS optimization

Spherical Gaussian Optimization This is code to fit per-pixel environment map with spherical Gaussian lobes, using LBFGS optimization. This code has b

41 Dec 14, 2022
Generating retro pixel game characters with Generative Adversarial Networks. Dataset "TinyHero" included.

pixel_character_generator Generating retro pixel game characters with Generative Adversarial Networks. Dataset "TinyHero" included. Dataset TinyHero D

Agnieszka Mikołajczyk 88 Nov 17, 2022
Animatable Neural Radiance Fields for Modeling Dynamic Human Bodies

To make the comparison with Animatable NeRF easier on the Human3.6M dataset, we save the quantitative results at here, which also contains the results of other methods, including Neural Body, D-NeRF,

ZJU3DV 359 Jan 08, 2023
GazeScroller - Using Facial Movements to perform Hands-free Gesture on the system

GazeScroller Using Facial Movements to perform Hands-free Gesture on the system

2 Jan 05, 2022
CBREN: Convolutional Neural Networks for Constant Bit Rate Video Quality Enhancement

CBREN This is the Pytorch implementation for our IEEE TCSVT paper : CBREN: Convolutional Neural Networks for Constant Bit Rate Video Quality Enhanceme

Zhao Hengrun 3 Nov 04, 2022
This repo provides the base code for pytorch-lightning and weight and biases simultaneous integration.

Write your model faster with pytorch-lightning-wadb-code-backbone This repository provides the base code for pytorch-lightning and weight and biases s

9 Mar 29, 2022
Approximate Nearest Neighbors in C++/Python optimized for memory usage and loading/saving to disk

Annoy Annoy (Approximate Nearest Neighbors Oh Yeah) is a C++ library with Python bindings to search for points in space that are close to a given quer

Spotify 10.6k Jan 04, 2023
Pytorch implementation of paper "Efficient Nearest Neighbor Language Models" (EMNLP 2021)

Pytorch implementation of paper "Efficient Nearest Neighbor Language Models" (EMNLP 2021)

Junxian He 57 Jan 01, 2023
Flappy bird automation using Neuroevolution of Augmenting Topologies (NEAT) in Python

FlappyAI Flappy bird automation using Neuroevolution of Augmenting Topologies (NEAT) in Python Everything Used Genetic Algorithm especially NEAT conce

Eryawan Presma Y. 2 Mar 24, 2022
Pytorch implementation of Value Iteration Networks (NIPS 2016 best paper)

VIN: Value Iteration Networks A quick thank you A few others have released amazing related work which helped inspire and improve my own implementation

Kent Sommer 297 Dec 26, 2022
Tackling data scarcity in Speech Translation using zero-shot multilingual Machine Translation techniques

Tackling data scarcity in Speech Translation using zero-shot multilingual Machine Translation techniques This repository is derived from the NMTGMinor

Tu Anh Dinh 1 Sep 07, 2022