Video Contrastive Learning with Global Context

Overview

Video Contrastive Learning with Global Context (VCLR)

This is the official PyTorch implementation of our VCLR paper.

Install dependencies

  • environments
    conda create --name vclr python=3.7
    conda activate vclr
    conda install numpy scipy scikit-learn matplotlib scikit-image
    pip install torch==1.7.1 torchvision==0.8.2
    pip install opencv-python tqdm termcolor gcc7 ffmpeg tensorflow==1.15.2
    pip install mmcv-full==1.2.7

Prepare datasets

Please refer to PREPARE_DATA to prepare the datasets.

Prepare pretrained MoCo weights

In this work, we follow SeCo and use the pretrained weights of MoCov2 as initialization.

cd ~
git clone https://github.com/amazon-research/video-contrastive-learning.git
cd video-contrastive-learning
mkdir pretrain && cd pretrain
wget https://dl.fbaipublicfiles.com/moco/moco_checkpoints/moco_v2_200ep/moco_v2_200ep_pretrain.pth.tar
cd ..

Self-supervised pretraining

bash shell/main_train.sh

Checkpoints will be saved to ./results

Downstream tasks

Linear evaluation

In order to evaluate the effectiveness of self-supervised learning, we conduct a linear evaluation (probing) on Kinetics400 dataset. Basically, we first extract features from the pretrained weight and then train a SVM classifier to see how the learned features perform.

bash shell/eval_svm.sh
  • Results

    Arch Pretrained dataset Epoch Pretrained model Acc. on K400
    ResNet50 Kinetics400 400 Download link 64.1

Video retrieval

bash shell/eval_retrieval.sh

Action recognition & action localization

Here, we use mmaction2 for both tasks. If you are not familiar with mmaction2, you can read the official documentation.

Installation

  • Step1: Install mmaction2

    To make sure the results can be reproduced, please use our forked version of mmaction2 (version: 0.11.0):

    conda activate vclr
    cd ~
    git clone https://github.com/KuangHaofei/mmaction2
    
    cd mmaction2
    pip install -v -e .
  • Step2: Prepare the pretrained weights

    Our pretrained backbone have different format with the backbone of mmaction2, it should be transferred to mmaction2 format. We provide the transferred version of our K400 pretrained weights, TSN and TSM. We also provide the script for transferring weights, you can find it here.

    Moving the pretrained weights to checkpoints directory:

    cd ~/mmaction2
    mkdir checkpoints
    wget https://haofeik-data.s3.amazonaws.com/VCLR/pretrained/vclr_mm.pth
    wget https://haofeik-data.s3.amazonaws.com/VCLR/pretrained/vclr_mm_tsm.pth

Action recognition

Make sure you have prepared the dataset and environments following the previous step. Now suppose you are in the root directory of mmaction2, follow the subsequent steps to fine tune the TSN or TSM models for action recognition.

For each dataset, the train and test setting can be found in the configuration files.

  • UCF101

    • config file: tsn_ucf101.py
    • train command:
      ./tools/dist_train.sh configs/recognition/tsn/vclr/tsn_ucf101.py 8 \
        --validate --seed 0 --deterministic
    • test command:
      python tools/test.py configs/recognition/tsn/vclr/tsn_ucf101.py \
        work_dirs/vclr/ucf101/latest.pth \
        --eval top_k_accuracy mean_class_accuracy --out result.json
  • HMDB51

    • config file: tsn_hmdb51.py
    • train command:
      ./tools/dist_train.sh configs/recognition/tsn/vclr/tsn_hmdb51.py 8 \
        --validate --seed 0 --deterministic
    • test command:
      python tools/test.py configs/recognition/tsn/vclr/tsn_hmdb51.py \
        work_dirs/vclr/hmdb51/latest.pth \
        --eval top_k_accuracy mean_class_accuracy --out result.json
  • SomethingSomethingV2: TSN

    • config file: tsn_sthv2.py
    • train command:
      ./tools/dist_train.sh configs/recognition/tsn/vclr/tsn_sthv2.py 8 \
        --validate --seed 0 --deterministic
    • test command:
      python tools/test.py configs/recognition/tsn/vclr/tsn_sthv2.py \
        work_dirs/vclr/tsn_sthv2/latest.pth \
        --eval top_k_accuracy mean_class_accuracy --out result.json
  • SomethingSomethingV2: TSM

    • config file: tsm_sthv2.py
    • train command:
      ./tools/dist_train.sh configs/recognition/tsm/vclr/tsm_sthv2.py 8 \
        --validate --seed 0 --deterministic
    • test command:
      python tools/test.py configs/recognition/tsm/vclr/tsm_sthv2.py \
        work_dirs/vclr/tsm_sthv2/latest.pth \
        --eval top_k_accuracy mean_class_accuracy --out result.json
  • ActivityNet

    • config file: tsn_activitynet.py
    • train command:
      ./tools/dist_train.sh configs/recognition/tsn/vclr/tsn_activitynet.py 8 \
        --validate --seed 0 --deterministic
    • test command:
      python tools/test.py configs/recognition/tsn/vclr/tsn_activitynet.py \
        work_dirs/vclr/tsn_activitynet/latest.pth \
        --eval top_k_accuracy mean_class_accuracy --out result.json
  • Results

    Arch Dataset Finetuned model Acc.
    TSN UCF101 Download link 85.6
    TSN HMDB51 Download link 54.1
    TSN SomethingSomethingV2 Download link 33.3
    TSM SomethingSomethingV2 Download link 52.0
    TSN ActivityNet Download link 71.9

Action localization

  • Step 1: Follow the previous section, suppose the finetuned model is saved at work_dirs/vclr/tsn_activitynet/latest.pth

  • Step 2: Extract ActivityNet features

    cd ~/mmaction2/tools/data/activitynet/
    
    python tsn_feature_extraction.py --data-prefix /home/ubuntu/data/ActivityNet/rawframes \
      --data-list /home/ubuntu/data/ActivityNet/anet_train_video.txt \
      --output-prefix /home/ubuntu/data/ActivityNet/rgb_feat \
      --modality RGB --ckpt /home/ubuntu/mmaction2/work_dirs/vclr/tsn_activitynet/latest.pth
    
    python tsn_feature_extraction.py --data-prefix /home/ubuntu/data/ActivityNet/rawframes \
      --data-list /home/ubuntu/data/ActivityNet/anet_val_video.txt \
      --output-prefix /home/ubuntu/data/ActivityNet/rgb_feat \
      --modality RGB --ckpt /home/ubuntu/mmaction2/work_dirs/vclr/tsn_activitynet/latest.pth
    
    python activitynet_feature_postprocessing.py \
      --rgb /home/ubuntu/data/ActivityNet/rgb_feat \
      --dest /home/ubuntu/data/ActivityNet/mmaction_feat

    Note, the root directory of ActivityNey is /home/ubuntu/data/ActivityNet/ in our case. Please replace it according to your real directory.

  • Step 3: Train and test the BMN model

    • train
      cd ~/mmaction2
      ./tools/dist_train.sh configs/localization/bmn/bmn_acitivitynet_feature_vclr.py 2 \
        --work-dir work_dirs/vclr/bmn_activitynet --validate --seed 0 --deterministic --bmn
    • test
      python tools/test.py configs/localization/bmn/bmn_acitivitynet_feature_vclr.py \
        work_dirs/vclr/bmn_activitynet/latest.pth \
        --bmn --eval [email protected] --out result.json
  • Results

    Arch Dataset Finetuned model AUC [email protected]
    BMN ActivityNet Download link 65.5 73.8

Feature visualization

We provide our feature visualization code at here.

Security

See CONTRIBUTING for more information.

License

This project is licensed under the Apache-2.0 License.

EfficientNetV2-with-TPU - Cifar-10 case study

EfficientNetV2-with-TPU EfficientNet EfficientNetV2 adalah jenis jaringan saraf convolutional yang memiliki kecepatan pelatihan lebih cepat dan efisie

Sultan syach 1 Dec 28, 2021
Constraint-based geometry sketcher for blender

Constraint-based sketcher addon for Blender that allows to create precise 2d shapes by defining a set of geometric constraints like tangent, distance,

1.7k Dec 31, 2022
PyTorch ,ONNX and TensorRT implementation of YOLOv4

PyTorch ,ONNX and TensorRT implementation of YOLOv4

4.2k Jan 01, 2023
Lane assist for ETS2, built with the ultra-fast-lane-detection model.

Euro-Truck-Simulator-2-Lane-Assist Lane assist for ETS2, built with the ultra-fast-lane-detection model. This project was made possible by the amazing

36 Jan 05, 2023
Selene is a Python library and command line interface for training deep neural networks from biological sequence data such as genomes.

Selene is a Python library and command line interface for training deep neural networks from biological sequence data such as genomes.

Troyanskaya Laboratory 323 Jan 01, 2023
Deep Sketch-guided Cartoon Video Inbetweening

Cartoon Video Inbetweening Paper | DOI | Video The source code of Deep Sketch-guided Cartoon Video Inbetweening by Xiaoyu Li, Bo Zhang, Jing Liao, Ped

Xiaoyu Li 37 Dec 22, 2022
Code for our ICCV 2021 Paper "OadTR: Online Action Detection with Transformers".

Code for our ICCV 2021 Paper "OadTR: Online Action Detection with Transformers".

66 Dec 15, 2022
Wenzhou-Kean University AI-LAB

AI-LAB This is Wenzhou-Kean University AI-LAB. Our research interests are in Computer Vision and Natural Language Processing. Computer Vision Please g

WKU AI-LAB 10 May 05, 2022
PyTorch Implementation of Google Brain's WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis

WaveGrad2 - PyTorch Implementation PyTorch Implementation of Google Brain's WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis. Status (202

Keon Lee 59 Dec 06, 2022
Research code for the paper "How Good is Your Tokenizer? On the Monolingual Performance of Multilingual Language Models"

Introduction This repository contains research code for the ACL 2021 paper "How Good is Your Tokenizer? On the Monolingual Performance of Multilingual

AdapterHub 20 Aug 04, 2022
Implementation of FitVid video prediction model in JAX/Flax.

FitVid Video Prediction Model Implementation of FitVid video prediction model in JAX/Flax. If you find this code useful, please cite it in your paper:

Google Research 62 Nov 25, 2022
The code for "Deep Level Set for Box-supervised Instance Segmentation in Aerial Images".

Deep Levelset for Box-supervised Instance Segmentation in Aerial Images Wentong Li, Yijie Chen, Wenyu Liu, Jianke Zhu* This code is based on MMdetecti

sunshine.lwt 112 Jan 05, 2023
Use AI to generate a optimized stock portfolio

Use AI, Modern Portfolio Theory, and Monte Carlo simulation's to generate a optimized stock portfolio that minimizes risk while maximizing returns. Ho

Greg James 30 Dec 22, 2022
Predictive Maintenance LSTM

Predictive-Maintenance-LSTM - Predictive maintenance study for Complex case study, we've obtained failure causes by operational error and more deeply by design mistakes.

Amir M. Sadafi 1 Dec 31, 2021
Tensorflow implementation of soft-attention mechanism for video caption generation.

SA-tensorflow Tensorflow implementation of soft-attention mechanism for video caption generation. An example of soft-attention mechanism. The attentio

Paul Chen 153 Nov 14, 2022
Instant neural graphics primitives: lightning fast NeRF and more

Instant Neural Graphics Primitives Ever wanted to train a NeRF model of a fox in under 5 seconds? Or fly around a scene captured from photos of a fact

NVIDIA Research Projects 10.6k Jan 01, 2023
NOD: Taking a Closer Look at Detection under Extreme Low-Light Conditions with Night Object Detection Dataset

NOD (Night Object Detection) Dataset NOD: Taking a Closer Look at Detection under Extreme Low-Light Conditions with Night Object Detection Dataset, BM

Igor Morawski 17 Nov 05, 2022
Fairness Metrics: All you need to know

Fairness Metrics: All you need to know Testing machine learning software for ethical bias has become a pressing current concern. Recent research has p

Anonymous2020 1 Jan 17, 2022
Fusion-in-Decoder Distilling Knowledge from Reader to Retriever for Question Answering

This repository contains code for: Fusion-in-Decoder models Distilling Knowledge from Reader to Retriever Dependencies Python 3 PyTorch (currently tes

Meta Research 323 Dec 19, 2022
Transformer model implemented with Pytorch

transformer-pytorch Transformer model implemented with Pytorch Attention is all you need-[Paper] Architecture Self-Attention self_attention.py class

Mingu Kang 12 Sep 03, 2022