Syntax-Aware Action Targeting for Video Captioning

Last update: Oct 13, 2022

Code for SAAT from "Syntax-Aware Action Targeting for Video Captioning" (Accepted to CVPR 2020). The implementation is based on "Consensus-based Sequence Training for Video Captioning".

Dependencies

Python 3.6
PyTorch 1.1
CUDA 10.0
Microsoft COCO Caption Evaluation
CIDEr

(Check out the coco-caption and cider projects into your working directory)

Data

Data can be downloaded here (1.6GB). This folder contains:

input/msrvtt: annotated captions (note that val_videodatainfo.json is a symbolic link to train_videodatainfo.json)
output/feature: extracted features of IRv2, C3D and Category embeddings
output/metadata: preprocessed annotations
output/model_svo/xe: model file and generated captions on test videos; the reported result (CIDEr 49.1 for XE training) can be reproduced with the model provided in this folder
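
The layout above can be checked before running anything. The following is a minimal sketch, assuming the archive was extracted into a single root directory. The helpers `missing_entries` and `ensure_val_link` are hypothetical and are not part of the repository.

```python
import os

# Sub-directories listed in the Data section, relative to the extracted root.
EXPECTED_DIRS = [
    "input/msrvtt",
    "output/feature",
    "output/metadata",
    "output/model_svo/xe",
]


def missing_entries(root):
    """Return the expected sub-directories that are absent under root."""
    return [d for d in EXPECTED_DIRS if not os.path.isdir(os.path.join(root, d))]


def ensure_val_link(root):
    """Create val_videodatainfo.json as a symlink to train_videodatainfo.json
    if it is not already present (for example, if extraction dropped the link)."""
    ann_dir = os.path.join(root, "input", "msrvtt")
    link = os.path.join(ann_dir, "val_videodatainfo.json")
    if not os.path.lexists(link):
        # Relative target, so the link keeps working if root is moved.
        os.symlink("train_videodatainfo.json", link)
    return link
```

Calling `missing_entries` on the extracted root returns an empty list when all four folders are present. `ensure_val_link` assumes a platform that supports symbolic links.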

Test

make -f SpecifiedMakefile test [options]

Please refer to the Makefile (and opts_svo.py) for the set of available train/test options. For example, to reproduce the reported result:

make -f Makefile_msrvtt_svo test GID=0 EXP_NAME=xe FEATS="irv2 c3d category" BFEATS="roi_feat roi_box" USE_RL=0 CST=0 USE_MIXER=0 SCB_CAPTIONS=0 LOGLEVEL=DEBUG LAMBDA=20

Train

To train the model using XE loss:

make -f Makefile_msrvtt_svo train GID=0 EXP_NAME=xe FEATS="irv2 c3d category" BFEATS="roi_feat roi_box" USE_RL=0 CST=0 USE_MIXER=0 SCB_CAPTIONS=0 LOGLEVEL=DEBUG MAX_EPOCH=100 LAMBDA=20

To change the input features, modify the FEATS variable in the commands above.
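
For scripted runs, the same invocation can be assembled programmatically. This is a sketch only: `build_make_command` and `run` are hypothetical helpers, not part of the repository. The Makefile name and variable names are copied from the commands above; which FEATS subsets the model actually supports is not stated here, so check the Makefile and opts_svo.py before changing them.

```python
import subprocess


def build_make_command(target, makefile="Makefile_msrvtt_svo", **variables):
    """Build the argv list for `make -f <makefile> <target> VAR=value ...`.

    List or tuple values are joined with spaces, mirroring
    FEATS="irv2 c3d category" in the shell commands.
    """
    cmd = ["make", "-f", makefile, target]
    for name, value in variables.items():
        if isinstance(value, (list, tuple)):
            value = " ".join(str(v) for v in value)
        # Each assignment is a single argv element, so no shell quoting is needed.
        cmd.append(f"{name}={value}")
    return cmd


def run(cmd):
    """Run the command from the repository root; raises if make exits non-zero."""
    subprocess.run(cmd, check=True)
```

For example, `build_make_command("train", GID=0, EXP_NAME="xe", FEATS=["irv2", "c3d", "category"], MAX_EPOCH=100)` yields an argument list that can be passed to `run`. Passing a list avoids shell quoting issues with the space-separated FEATS value.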

Citation

@InProceedings{Zheng_2020_CVPR,
author = {Zheng, Qi and Wang, Chaoyue and Tao, Dacheng},
title = {Syntax-Aware Action Targeting for Video Captioning},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}

Acknowledgements

PyTorch implementation of CST
PyTorch implementation of SCST
