Implementation of SETR model, Original paper: Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers.

Last update: Dec 16, 2022

Overview

SETR - Pytorch

Since the original paper (Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers.) has no official code,I implemented SETR-Progressive UPsampling(SETR-PUP) using pytorch.

Original paper: Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers.

Vit

The Vit model is also implemented, and you can use it for image classification.

Usage SETR

from SETR.transformer_seg import SETRModel
import torch 

if __name__ == "__main__":
    net = SETRModel(patch_size=(32, 32), 
                    in_channels=3, 
                    out_channels=1, 
                    hidden_size=1024, 
                    num_hidden_layers=8, 
                    num_attention_heads=16, 
                    decode_features=[512, 256, 128, 64])
    t1 = torch.rand(1, 3, 256, 256)
    print("input: " + str(t1.shape))
    
    # print(net)
    print("output: " + str(net(t1).shape))

If the output size is (1, 1, 256, 256), the code runs successfully.

Usage Vit

from SETR.transformer_seg import Vit
import torch 

if __name__ == "__main__":
    model = Vit(patch_size=(7, 7), 
                    in_channels=1, 
                    out_class=10, 
                    hidden_size=1024, 
                    num_hidden_layers=1, 
                    num_attention_heads=16)
    print(model)
    t1 = torch.rand(1, 1, 28, 28)
    print("input: " + str(t1.shape))

    print("output: " + str(model(t1).shape))

The output shape is (1, 10).

current examples

task_mnist: The simplest example, using the Vit model to classify the minst dataset.
task_car_seg: The example is sample segmentation task. data download: https://www.kaggle.com/c/carvana-image-masking-challenge/data

More examples will be updated later.

Implementation of SETR model, Original paper: Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers.

Related tags

Overview

SETR - Pytorch

Vit

Usage SETR

Usage Vit

current examples

more

Owner

zhaohu xing

This repository contains the implementations related to the experiments of a set of publicly available datasets that are used in the time series forecasting research space.

Self-Supervised Vision Transformers Learn Visual Concepts in Histopathology (LMRL Workshop, NeurIPS 2021)

This is the repository for our paper Ditch the Gold Standard: Re-evaluating Conversational Question Answering

Neuron Merging: Compensating for Pruned Neurons (NeurIPS 2020)

Implementation EfficientDet: Scalable and Efficient Object Detection in PyTorch

PyTorch implementation of MulMON

Implementation of Bidirectional Recurrent Independent Mechanisms (Learning to Combine Top-Down and Bottom-Up Signals in Recurrent Neural Networks with Attention over Modules)

Group-Free 3D Object Detection via Transformers

An investigation project for SISR.

Keras-tensorflow implementation of Fully Convolutional Networks for Semantic Segmentation（Unfinished）

Awesome-AI-books - Some awesome AI related books and pdfs for learning and downloading

Code for "Learning the Best Pooling Strategy for Visual Semantic Embedding", CVPR 2021

Optimizaciones incrementales al problema N-Body con el fin de evaluar y comparar las prestaciones de los traductores de Python en el ámbito de HPC.

Backend code to use MCPI's python API to make infinite worlds with custom generation

Vector Neurons: A General Framework for SO(3)-Equivariant Networks

An implementation for Neural Architecture Search with Random Labels (CVPR 2021 poster) on Pytorch.

This repository for project that can Automate Number Plate Recognition (ANPR) in Morocco Licensed Vehicles. 💻 + 🚙 + 🇲🇦 = 🤖 🕵🏻‍♂️

This is Unofficial Repo. Lips Don't Lie: A Generalisable and Robust Approach to Face Forgery Detection (CVPR 2021)

SwinIR: Image Restoration Using Swin Transformer

PyTorch implementation of our ICCV paper DeFRCN: Decoupled Faster R-CNN for Few-Shot Object Detection.