Official repository of PanoAVQA: Grounded Audio-Visual Question Answering in 360° Videos (ICCV 2021)

Last update: Dec 23, 2022

Pano-AVQA

[Paper] [Poster] [Video]

Getting Started

This code is based on the following libraries:

python=3.8
pytorch=1.7.0 (with cuda 10.2)

To create a virtual environment with all the necessary libraries:

conda env create -f environment.yml
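After creating and activating the environment, a quick sanity check can confirm that the interpreter matches the versions listed above. The snippet below is a minimal sketch and is not part of the repository; `environment_report` is a hypothetical helper that only reads `sys.version_info`, `torch.__version__` and `torch.version.cuda`, and it reports `None` for PyTorch and CUDA when `torch` cannot be imported.

```python
import sys

# Versions listed in this README (python=3.8, pytorch=1.7.0 with CUDA 10.2).
EXPECTED = {"python": "3.8", "torch": "1.7.0", "cuda": "10.2"}


def environment_report():
    """Return {name: (found, expected, ok)} for Python, PyTorch and its CUDA build."""
    found = {
        "python": f"{sys.version_info.major}.{sys.version_info.minor}",
        "torch": None,
        "cuda": None,
    }
    try:
        import torch

        found["torch"] = torch.__version__
        found["cuda"] = torch.version.cuda  # None for CPU-only builds
    except ImportError:
        pass  # PyTorch is not installed in this interpreter
    report = {}
    for name, expected in EXPECTED.items():
        value = found[name]
        # Ignore local version suffixes such as "+cu102" when comparing.
        ok = value is not None and str(value).split("+")[0] == expected
        report[name] = (value, expected, ok)
    return report


if __name__ == "__main__":
    for name, (value, expected, ok) in environment_report().items():
        status = "OK" if ok else "MISMATCH"
        print(f"{name}: found {value}, expected {expected} [{status}]")
```

A mismatch is not necessarily fatal, but it is the first thing to rule out when results differ from the paper.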

By default, data should be saved under the data/feat/{audio,label,visual} directories, and logs (along with the cache and checkpoints) are saved under the data/{cache,ckpt,log} directories. Using a symbolic link is recommended:

ln -s {path_to_your_data_directory} data
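Before training, it can help to verify that the symlinked data directory has the layout described above. The sketch below uses a hypothetical helper, `missing_dirs`, which is not part of the repository; it only checks the sub-directories named in this README. Whether the training code creates cache/ckpt/log on its own is not stated here, so creating them is left as an opt-in flag.

```python
from pathlib import Path

# Sub-directories named in this README, relative to the data/ root.
FEATURE_DIRS = [Path("feat") / name for name in ("audio", "label", "visual")]
OUTPUT_DIRS = [Path(name) for name in ("cache", "ckpt", "log")]


def missing_dirs(data_root="data", create_outputs=False):
    """Return the expected sub-directories of data_root that do not exist.

    Feature directories must be supplied by the user; the output
    directories (cache, ckpt, log) can optionally be created here.
    Path.is_dir() follows symlinks, so a symlinked data/ works as-is.
    """
    root = Path(data_root)
    if create_outputs:
        for rel in OUTPUT_DIRS:
            (root / rel).mkdir(parents=True, exist_ok=True)
    return [rel.as_posix() for rel in FEATURE_DIRS + OUTPUT_DIRS
            if not (root / rel).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs("data")
    print("layout OK" if not missing else f"missing: {missing}")
```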

We use a single TITAN RTX for training, but GPUs with less memory also work with a smaller batch size (when using the precomputed features).

Dataset

We plan to make the Pano-AVQA dataset public within this year, including Q&A annotations, precomputed features, etc. Please stay tuned!

Model

Training

The default configuration is provided in code/config.py. To run with this configuration:

python cli.py

To run with a custom configuration, either modify code/config.py or execute:

python cli.py with {{flags_at_your_disposal}}
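The `with key=value` form matches the command-line syntax of the Sacred experiment library; that cli.py is built on Sacred is an assumption based only on the shape of the command. The sketch below does not use Sacred. It is a standalone illustration of how flat `key=value` overrides are typically merged into a default configuration, with values parsed as Python literals where possible. The keys in `DEFAULTS` are hypothetical placeholders, not the actual contents of code/config.py.

```python
import ast


def apply_overrides(defaults, tokens):
    """Return a copy of `defaults` updated with "key=value" tokens.

    Values are parsed as Python literals when possible (16 -> int,
    1e-4 -> float, True -> bool) and otherwise kept as plain strings.
    Unknown keys are rejected to catch typos. Only flat keys are
    supported (no nested "a.b=1" paths).
    """
    config = dict(defaults)
    for token in tokens:
        key, sep, raw = token.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {token!r}")
        if key not in config:
            raise KeyError(f"unknown config key: {key}")
        try:
            value = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            value = raw  # e.g. a path such as ../data/log/exp/model.pth
        config[key] = value
    return config


# Hypothetical defaults; the real keys live in code/config.py.
DEFAULTS = {"batch_size": 32, "learning_rate": 1e-4, "ckpt_file": None}

if __name__ == "__main__":
    print(apply_overrides(DEFAULTS, ["batch_size=16", "ckpt_file=../data/log/exp/model.pth"]))
```

Returning a copy keeps the defaults untouched, so several runs with different overrides can be built from the same base configuration.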

Inference

Model weights are saved under the ./data/log directory. To run inference only:

python cli.py eval with ckpt_file=../data/log/{experiment}/{ckpt}.pth
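When an experiment directory contains several checkpoints, the `ckpt_file` argument can be filled in programmatically. The sketch below assumes, without confirmation from this README, that the most recently modified `.pth` file is the one you want to evaluate; `latest_checkpoint` and `eval_command` are hypothetical helpers, not part of the repository. The `../data/log` prefix in the command suggests it is run from the directory containing cli.py.

```python
from pathlib import Path


def latest_checkpoint(log_root, experiment):
    """Return the most recently modified .pth file under log_root/experiment, or None."""
    ckpts = sorted(Path(log_root, experiment).glob("*.pth"),
                   key=lambda p: p.stat().st_mtime)
    return ckpts[-1] if ckpts else None


def eval_command(log_root, experiment):
    """Build a `python cli.py eval with ckpt_file=...` command for the newest checkpoint."""
    ckpt = latest_checkpoint(log_root, experiment)
    if ckpt is None:
        raise FileNotFoundError(f"no .pth files in {Path(log_root, experiment)}")
    return f"python cli.py eval with ckpt_file={ckpt.as_posix()}"


if __name__ == "__main__":
    # "my_experiment" is a hypothetical experiment name.
    print(eval_command("../data/log", "my_experiment"))
```

If the training code saves a dedicated "best" checkpoint, selecting by name would be more reliable than selecting by modification time.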

Citation

If you find our work useful in your research, please consider citing:

@InProceedings{Yun2021PanoAVQA,
    author = {Yun, Heeseung and Yu, Youngjae and Yang, Wonsuk and Lee, Kangil and Kim, Gunhee},
    title = {Pano-AVQA: Grounded Audio-Visual Question Answering on 360$^\circ$ Videos},
    booktitle = {ICCV},
    year = {2021}
}

Contact

If you have any inquiries, please don't hesitate to contact us via heeseung.yun at vision.snu.ac.kr.

Owner

Heeseung Yun