Code for the paper "Zero-shot Natural Language Video Localization" (ICCV2021, Oral).

Last update: Dec 27, 2022

Overview

Zero-shot Natural Language Video Localization (ZSNLVL) by Pseudo-Supervised Video Localization (PSVL)

This repository contains the code for Zero-shot Natural Language Video Localization (ICCV 2021, Oral).

We propose a novel task: zero-shot natural language video localization (NLVL). The task setup requires no paired annotations for NLVL; it needs only easily available text corpora, an off-the-shelf object detector, and a collection of videos to localize. To address the task, we propose a Pseudo-Supervised Video Localization method, called PSVL, which generates pseudo-supervision for training an NLVL model. Benchmarked on two widely used NLVL datasets, the proposed method performs on par with or better than models trained with stronger supervision.
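To make the localization task concrete: a model takes a video and a sentence and predicts a (start, end) segment, and such predictions are commonly scored by temporal intersection-over-union against a reference segment. The sketch below is an illustration under that assumption only; the function names `temporal_iou` and `recall_at_iou` are hypothetical and are not taken from this repository's evaluation code.

```python
def temporal_iou(pred, gt):
    """IoU between two (start, end) segments given in seconds."""
    (ps, pe), (gs, ge) = pred, gt
    # Overlap is zero when the segments do not intersect.
    inter = max(0.0, min(pe, ge) - max(ps, gs))
    union = (pe - ps) + (ge - gs) - inter
    return inter / union if union > 0 else 0.0


def recall_at_iou(preds, gts, threshold):
    """Fraction of queries whose predicted segment reaches the IoU threshold."""
    hits = sum(temporal_iou(p, g) >= threshold for p, g in zip(preds, gts))
    return hits / len(gts)


if __name__ == "__main__":
    # Made-up segments, for illustration only.
    preds = [(2.0, 8.0), (10.0, 12.0)]
    gts = [(4.0, 10.0), (20.0, 25.0)]
    print(temporal_iou(preds[0], gts[0]))
    print(recall_at_iou(preds, gts, 0.5))
```

The segments in the example are invented values, not results from the paper.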

Environment

This repository is implemented in PyTorch and uses Anaconda for environment management.
Follow the instructions below or use the Docker image (dcahn/psvl:latest).

Get the code

Clone this repo with git:

git clone https://github.com/gistvision/PSVL.git

Create your own environment (if you use the Docker environment, you only need to clone the code and run it):

conda create --name PSVL --file requirements.txt
conda activate PSVL

Working environment

RTX 2080 Ti (11 GB)
Ubuntu 18.04.5
PyTorch 1.5.1
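To compare a local setup against the versions listed above, a small check like the following can help. It is a minimal sketch, not part of this repository: the helper name `describe_environment` is hypothetical, and the PyTorch fields stay `None` when `torch` is not installed.

```python
import importlib.util
import platform
import sys


def describe_environment():
    """Collect basic version info; PyTorch fields are None if torch is missing."""
    info = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "torch": None,
        "cuda_available": None,
    }
    # Only import torch when it is actually installed.
    if importlib.util.find_spec("torch") is not None:
        import torch

        info["torch"] = torch.__version__
        info["cuda_available"] = torch.cuda.is_available()
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Run it inside the activated conda environment (or the Docker container) so that it reports the interpreter and packages actually used for training.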

Download

Dataset & Pretrained model

This link provides the video features used in the paper.
Note: after downloading the video features, set the data path in a config file.
This link provides the pre-trained model.

Evaluating pre-trained models

To evaluate the pre-trained model, use the command below.

python inference.py --model CrossModalityTwostageAttention --config "YOUR CONFIG PATH" --pre_trained "YOUR MODEL PATH"

Training models from scratch

To train PSVL, run train.py with the command below.

# Training from scratch
python train.py --model CrossModalityTwostageAttention --config "YOUR CONFIG PATH"
# Evaluation
python inference.py --model CrossModalityTwostageAttention --config "YOUR CONFIG PATH" --pre_trained "YOUR MODEL PATH"
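If you launch these commands from a script, the sketch below shows one way to assemble them. It uses only the script names and flags shown above (`--model`, `--config`, `--pre_trained`); the helper names `build_command` and `run` are hypothetical and not part of this repository, and the paths are placeholders you must replace.

```python
import subprocess
import sys
from pathlib import Path

MODEL = "CrossModalityTwostageAttention"


def build_command(script, config_path, pretrained_path=None):
    """Build the argument list for train.py or inference.py."""
    cmd = [sys.executable, script, "--model", MODEL, "--config", str(config_path)]
    if pretrained_path is not None:
        cmd += ["--pre_trained", str(pretrained_path)]
    return cmd


def run(script, config_path, pretrained_path=None):
    """Fail early on a missing config file, then launch the script."""
    if not Path(config_path).is_file():
        raise FileNotFoundError(f"config not found: {config_path}")
    subprocess.run(build_command(script, config_path, pretrained_path), check=True)


if __name__ == "__main__":
    # Placeholders: replace with your own paths before calling run().
    print(build_command("train.py", "YOUR CONFIG PATH"))
    print(build_command("inference.py", "YOUR CONFIG PATH", "YOUR MODEL PATH"))
```

Passing the arguments as a list avoids shell quoting problems when a path contains spaces, and `check=True` makes a failed run raise an error instead of passing silently.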

License

MIT License

Citation

If you use this code, please cite:

@inproceedings{nam2021zero,
  title={Zero-shot Natural Language Video Localization},
  author={Nam, Jinwoo and Ahn, Daechul and Kang, Dongyeop and Ha, Seong Jong and Choi, Jonghyun},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={1470-1479},
  year={2021}
}

Contact

If you have any questions, please e-mail us ([email protected], [email protected]).


Owner

Computer Vision Lab. @ GIST
