Official Code for VideoLT: Large-scale Long-tailed Video Recognition (ICCV 2021)

Overview

Pytorch Code for VideoLT

[Website][Paper]

Updates

  • [10/29/2021] Features uploaded to Google Drive, for access please send us an e-mail: zhangxing18 at fudan.edu.cn
  • [09/28/2021] Features uploaded to Aliyun Drive(deprecated), for access please send us an e-mail: zhangxing18 at fudan.edu.cn
  • [08/23/2021] Checkpoint links uploaded, sorry we are handling campus network bandwidth limitation, dataset will be released in this weeek.
  • [08/15/2021] Code released. Dataset download links and checkpoints links will be updated in a week.
  • [07/29/2021] Dataset released, visit https://videolt.github.io/ for downloading.
  • [07/23/2021] VideoLT is accepted by ICCV2021.

concept

Overview

VideoLT is a large-scale long-tailed video recognition dataset, as a step toward real-world video recognition. We provide VideoLT dataset and long-tailed baselines in this repo including:

Data Preparation

Please visit https://videolt.github.io/ to obtain download links. We provide raw videos and extracted features.

For using extracted features, please modify dataset/dutils.py and set the correct path to features.

Model Zoo

The baseline scripts and checkpoints are provided in MODELZOO.md.

FrameStack

FrameStack is simple yet effective approach for long-tailed video recognition which re-samples training data at the frame level and adopts a dynamic sampling strategy based on knowledge learned by the network. The rationale behind FrameStack is to dynamically sample more frames from videos in tail classes and use fewer frames for those from head classes.

framestack

Usage

Requirement

pip install -r requirements.txt

Prepare Data Path

  1. Modify FEATURE_NAME, PATH_TO_FEATURE and FEATURE_DIM in dataset/dutils.py.

  2. Set ROOT in dataset/dutils.py to labels folder. The directory structure is:

    labels
    |-- count-labels-train.lst
    |-- test.lst
    |-- test_videofolder.txt
    |-- train.lst
    |-- train_videofolder.txt
    |-- val_videofolder.txt
    `-- validate.lst

Train

We provide scripts for training. Please refer to MODELZOO.md.

Example training scripts:

FEATURE_NAME='ResNet101'

export CUDA_VISIBLE_DEVICES='2'
python base_main.py  \
     --augment "mixup" \
     --feature_name $FEATURE_NAME \
     --lr 0.0001 \
     --gd 20 --lr_steps 30 60 --epochs 100 \
     --batch-size 128 -j 16 \
     --eval-freq 5 \
     --print-freq 20 \
     --root_log=$FEATURE_NAME-log \
     --root_model=$FEATURE_NAME'-checkpoints' \
     --store_name=$FEATURE_NAME'_bs128_lr0.0001_lateavg_mixup' \
     --num_class=1004 \
     --model_name=NonlinearClassifier \
     --train_num_frames=60 \
     --val_num_frames=150 \
     --loss_func=BCELoss \

Note: Set args.resample, args.augment and args.loss_func can apply multiple long-tailed stratigies.

Options:

    args.resample: ['None', 'CBS','SRS']
    args.augment : ['None', 'mixup', 'FrameStack']
    args.loss_func: ['BCELoss', 'LDAM', 'EQL', 'CBLoss', 'FocalLoss']

Test

We provide scripts for testing in scripts. Modify CKPT to saved checkpoints.

Example testing scripts:

FEATURE_NAME='ResNet101'
CKPT='VideoLT_checkpoints/ResNet-101/ResNet101_bs128_lr0.0001_lateavg_mixup/ckpt.best.pth.tar'

export CUDA_VISIBLE_DEVICES='1'
python base_test.py \
     --resume $CKPT \
     --feature_name $FEATURE_NAME \
     --batch-size 128 -j 16 \
     --print-freq 20 \
     --num_class=1004 \
     --model_name=NonlinearClassifier \
     --train_num_frames=60 \
     --val_num_frames=150 \
     --loss_func=BCELoss \

Citing

If you find VideoLT helpful for your research, please consider citing:

@misc{zhang2021videolt,
title={VideoLT: Large-scale Long-tailed Video Recognition}, 
author={Xing Zhang and Zuxuan Wu and Zejia Weng and Huazhu Fu and Jingjing Chen and Yu-Gang Jiang and Larry Davis},
year={2021},
eprint={2105.02668},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
Owner
Skye
Soul Programmer & Science Enthusiast
Skye
Deep Learning applied to Integral data analysis

DeepIntegralCompton Deep Learning applied to Integral data analysis Module installation Move to the root directory of the project and execute : pip in

Thomas Vuillaume 1 Dec 10, 2021
Bringing sanity to world of messed-up data

Sanitize sanitize is a Python module for making sure various things (e.g. HTML) are safe to use. It was originally written by Mark Pilgrim and is dist

Alireza Savand 63 Oct 26, 2021
Train Yolov4 using NBX-Jobs

yolov4-trainer-nbox Train Yolov4 using NBX-Jobs. Use the powerfull functionality available in nbox-SDK repo to train a tiny-Yolo v4 model on Pascal VO

Yash Bonde 1 Jan 12, 2022
This repository contains the reference implementation for our proposed Convolutional CRFs.

ConvCRF This repository contains the reference implementation for our proposed Convolutional CRFs in PyTorch (Tensorflow planned). The two main entry-

Marvin Teichmann 553 Dec 07, 2022
AI创造营 :Metaverse启动机之重构现世,结合PaddlePaddle 和 Wechaty 创造自己的聊天机器人

paddle-wechaty-Zodiac AI创造营 :Metaverse启动机之重构现世,结合PaddlePaddle 和 Wechaty 创造自己的聊天机器人 12星座若穿越科幻剧,会拥有什么超能力呢?快来迎接你的专属超能力吧! 现在很多年轻人都喜欢看科幻剧,像是复仇者系列,里面有很多英雄、超

105 Dec 22, 2022
A basic reminder tool written in Python.

A simple Python Reminder Here's a basic reminder tool written in Python that speaks to the user and sends a notification. Run pip3 install pyttsx3 w

Sachit Yadav 4 Feb 05, 2022
MazeRL is an application oriented Deep Reinforcement Learning (RL) framework

MazeRL is an application oriented Deep Reinforcement Learning (RL) framework, addressing real-world decision problems. Our vision is to cover the complete development life cycle of RL applications ra

EnliteAI GmbH 222 Dec 24, 2022
Heterogeneous Deep Graph Infomax

Heterogeneous-Deep-Graph-Infomax Parameter Setting: HDGI-A: Node-level dimension: 16 Attention head: 4 Semantic-level attention vector: 8 learning rat

52 Oct 31, 2022
Multiview 3D object detection on MultiviewC dataset through moft3d.

Multiview Orthographic Feature Transformation for 3D Object Detection Multiview 3D object detection on MultiviewC dataset through moft3d. Introduction

Jiahao Ma 20 Dec 21, 2022
Revisiting, benchmarking, and refining Heterogeneous Graph Neural Networks.

Heterogeneous Graph Benchmark Revisiting, benchmarking, and refining Heterogeneous Graph Neural Networks. Roadmap We organize our repo by task, and on

THUDM 176 Dec 17, 2022
1st Solution For ICDAR 2021 Competition on Mathematical Formula Detection

This project releases our 1st place solution on ICDAR 2021 Competition on Mathematical Formula Detection. We implement our solution based on MMDetection, which is an open source object detection tool

yuxzho 94 Dec 25, 2022
A Context-aware Visual Attention-based training pipeline for Object Detection from a Webpage screenshot!

CoVA: Context-aware Visual Attention for Webpage Information Extraction Abstract Webpage information extraction (WIE) is an important step to create k

Keval Morabia 41 Jan 01, 2023
Misc YOLOL scripts for use in the Starbase space sandbox videogame

starbase-misc Misc YOLOL scripts for use in the Starbase space sandbox videogame. Each directory contains standalone YOLOL scripts. They don't really

4 Oct 17, 2021
A gesture recognition system powered by OpenPose, k-nearest neighbours, and local outlier factor.

OpenHands OpenHands is a gesture recognition system powered by OpenPose, k-nearest neighbours, and local outlier factor. Currently the system can iden

Paul Treanor 12 Jan 10, 2022
Predicting a person's gender based on their weight and height

Logistic Regression Advanced Case Study Gender Classification: Predicting a person's gender based on their weight and height 1. Introduction We turn o

1 Feb 01, 2022
Code and data for ACL2021 paper Cross-Lingual Abstractive Summarization with Limited Parallel Resources.

Multi-Task Framework for Cross-Lingual Abstractive Summarization (MCLAS) The code for ACL2021 paper Cross-Lingual Abstractive Summarization with Limit

Yu Bai 43 Nov 07, 2022
Time should be taken seer-iously

TimeSeers seers - (Noun) plural form of seer - A person who foretells future events by or as if by supernatural means TimeSeers is an hierarchical Bay

279 Dec 26, 2022
DeepMReye: magnetic resonance-based eye tracking using deep neural networks

DeepMReye: magnetic resonance-based eye tracking using deep neural networks

73 Dec 21, 2022
Implementation of "Generalizable Neural Performer: Learning Robust Radiance Fields for Human Novel View Synthesis"

Generalizable Neural Performer: Learning Robust Radiance Fields for Human Novel View Synthesis Abstract: This work targets at using a general deep lea

163 Dec 14, 2022
Towards Fine-Grained Reasoning for Fake News Detection

FinerFact This is the PyTorch implementation for the FinerFact model in the AAAI 2022 paper Towards Fine-Grained Reasoning for Fake News Detection (Ar

Ahren_Jin 15 Dec 15, 2022