Transformer-in-Vision

A list of recent Transformer-based computer vision (CV) papers. If you notice a missing paper, please open an issue or a pull request.

**Last updated: 2022/01/20**

Update log

2021/April - updated with all recent Transformer-in-Vision papers.
2021/May - updated with all recent Transformer-in-Vision papers.
2021/June - updated with all recent Transformer-in-Vision papers.
2021/July - updated with all recent Transformer-in-Vision papers.
2021/August - updated with all recent Transformer-in-Vision papers.
2021/September - updated with all recent Transformer-in-Vision papers.
2021/October - updated with all recent Transformer-in-Vision papers.
2021/November - updated with all recent Transformer-in-Vision papers.
2021/December - updated with all recent Transformer-in-Vision papers.

Survey:

(arXiv 2022.01) Video Transformers: A Survey. [Paper]
(arXiv 2021.11) A Survey of Visual Transformers. [Paper]
(arXiv 2021.09) Survey: Transformer based Video-Language Pre-training. [Paper]
(arXiv 2021.03) Multi-modal Motion Prediction with Stacked Transformers. [Paper], [Code]
(arXiv 2021.03) Perspectives and Prospects on Transformer Architecture for Cross-Modal Tasks with Language and Vision. [Paper]
(arXiv 2020.09) Efficient Transformers: A Survey. [Paper]
(arXiv 2021.01) Transformers in Vision: A Survey. [Paper]

Contact & Feedback

If you have any suggestions about this project, feel free to contact me.

[e-mail: yzhangcst[at]gmail.com]

Overview

Transformer-in-Vision
Update log
Survey
Recent Papers
  Action
  Active Learning
  Anomaly Detection
  Assessment
  Captioning
  Classification (Backbone)
  Completion
  Compression
  Crowd Counting
  Depth
  Deepfake Detection
  Dehazing
  Detection
  Face
  Few-shot Learning
  Fusion
  GAN
  Gaze
  HOI
  Hyperspectral
  Incremental Learning
  In-painting
  Instance Segmentation
  Layout
  Matching
  Medical
  Motion
  Multi-task/modal
  Multi-view Stereo
  NAS
  Navigation
  OCR
  Octree
  Panoptic Segmentation
  Point Cloud
  Pose
  Planning
  Pruning & Quantization
  Recognition
  Reconstruction
  Re-identification
  Restoration
  Retrieval
  Salient Object Detection
  Scene
  Self-supervised Learning
  Semantic Segmentation
  Shape
  Super-Resolution
  Synthesis
  Tracking
  Traffic
  Texture
  Transfer learning
  Video
  Visual Grounding
  Visual Reasoning
  Visual Relationship Detection
  Voxel
  Weakly Supervised Learning
  Zero-Shot Learning
  Others
Contact & Feedback
