A collection of multi-modal transformer architectures, including image transformers, video transformers, image-language transformers, and video-language transformers, along with related datasets.

Last update: Dec 21, 2022

Overview

Reading list in Transformer

We are a team from the KAUST Vision-CAIR group, and we focus on multi-modal representation learning.

This repo aims to collect recent popular Transformer papers, code, and learning resources in the domains of Vision Transformers, NLP, and multi-modal learning.

Recent News

CVPR multi-modal papers are collected here

The code of VisualGPT is open-sourced. It can be found here

The code and paper of LeViT are open-sourced. They can be found here

The paper MLP-Mixer: An all-MLP Architecture for Vision is available here

The code and paper of MDETR are open-sourced. They can be found here

The code and paper of RelTransformer are open-sourced. They can be found here

The code and paper of Twins-SVT are open-sourced. They can be found here

A Vision Transformer for deepfake detection is available. It can be found here

The code of VideoGPT is open-sourced. It can be found here

The code of CoaT is open-sourced. It can be found here

The code of Kaleido-BERT is open-sourced. It can be found here

The code of TimeSformer is open-sourced. It can be found here

The code of SwinTransformer is open-sourced. It can be found here

Topics (paper and code)

Review papers on multi-modal learning

Video-language

Tutorials and workshops

Datasets

Multi-modal Datasets

Blogs

Lil's blogs

Tools

PyTorchVideo: a deep learning library for video understanding research
Horovod: a tool for multi-GPU parallel processing
Accelerate: an easy API for mixed precision and any kind of distributed computing
Optuna: hyperparameter search
AI Conference Deadlines

Owner

Jun Chen
