Official Pytorch Implementation of Relational Self-Attention: What's Missing in Attention for Video Understanding

Last update: Dec 07, 2022

Related tags

Overview

Relational Self-Attention: What's Missing in Attention for Video Understanding

This repository is the official implementation of "Relational Self-Attention: What's Missing in Attention for Video Understanding" by Manjin Kim*, Heeseung Kwon*, Chunyu Wang, Suha Kwak, and Minsu Cho (*equal contribution).

Requirements

Python: 3.7.9
Pytorch: 1.6.0
TorchVision: 0.2.1
Cuda: 10.1
Conda environment environment.yml

To install requirements:

    conda env create -f environment.yml
    conda activate rsa

Dataset Preparation

Download Something-Something v1 & v2 (SSv1 & SSv2) datasets and extract RGB frames. Download URLs: SSv1, SSv2
Make txt files that define training & validation splits. Each line in txt files is formatted as [video_path] [#frames] [class_label]. Please refer to any txt files in ./data directory.

Training

To train RSANet-R50 on SSv1 or SSv2 datasets in the paper, run this command:

    # For SSv1
    ./scripts/train_Something_v1.sh 
    
    
     
    # example: ./scripts/train_Something_v1.sh RSA_R50_SSV1_16frames 16
    
    # For SSv2
    ./scripts/train_Something_v2.sh 
      
      
       
    # example: ./scripts/train_Something_v2.sh RSA_R50_SSV2_16frames 16

Evaluation

To evaluate RSANet-R50 on SSv2 dataset in the paper, run:

    # For SSv1
    ./scripts/test_Something_v1.sh 
    
     
     
      
    # example: ./scripts/test_Something_v1.sh RSA_R50_SSV1_16frames resnet_rgb_model_best.pth.tar 16
    
    # For SSv2
    ./scripts/test_Something_v2.sh 
       
        
        
          # example: ./scripts/test_Something_v2.sh RSA_R50_SSV2_16frames resnet_rgb_model_best.pth.tar 16

Results

Our model achieves the following performance on Something-Something-V1 and Something-Something-V2:

model	dataset	frames	top-1 / top-5	logs	checkpoints
RSANet-R50	SSV1	16	54.0 % / 81.1 %	[log]	[checkpoint]
RSANet-R50	SSV2	16	66.0 % / 89.9 %	[log]	[checkpoint]

Official Pytorch Implementation of Relational Self-Attention: What's Missing in Attention for Video Understanding

Related tags

Overview

Relational Self-Attention: What's Missing in Attention for Video Understanding

Requirements

Dataset Preparation

Training

Evaluation

Results

Qualitative Results

Owner

mandos

PyTorch implementation of "Transparency by Design: Closing the Gap Between Performance and Interpretability in Visual Reasoning"

Minimal PyTorch implementation of Generative Latent Optimization from the paper "Optimizing the Latent Space of Generative Networks"

A TensorFlow 2.x implementation of Masked Autoencoders Are Scalable Vision Learners

Tensorflow Tutorials using Jupyter Notebook

Official implementation of paper "Query2Label: A Simple Transformer Way to Multi-Label Classification".

diablo2 resurrected loot filter

The Submission for SIMMC 2.0 Challenge 2021

Code of the paper "Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition"

CKD - Collaborative Knowledge Distillation for Heterogeneous Information Network Embedding

Implementation of Deep Deterministic Policy Gradiet Algorithm in Tensorflow

A PyTorch implementation of "Multi-Scale Contrastive Siamese Networks for Self-Supervised Graph Representation Learning", IJCAI-21

Pytorch implementation of "Forward Thinking: Building and Training Neural Networks One Layer at a Time"

This is the winning solution of the Endocv-2021 grand challange.

No-reference Image Quality Assessment(NIQA) Algorithms (BRISQUE, NIQE, PIQE, RankIQA, MetaIQA)

unofficial pytorch implement of "Squareplus: A Softplus-Like Algebraic Rectifier"

The PyTorch implementation of Directed Graph Contrastive Learning (DiGCL), NeurIPS-2021

LoveDA: A Remote Sensing Land-Cover Dataset for Domain Adaptive Semantic Segmentation (NeurIPS2021 Benchmark and Dataset Track)

Official Keras Implementation for UNet++ in IEEE Transactions on Medical Imaging and DLMIA 2018

LaneDetectionAndLaneKeeping - Lane Detection And Lane Keeping

Pretrained models for Jax/Flax: StyleGAN2, GPT2, VGG, ResNet.