(CVPR 2022) Pytorch implementation of "Self-supervised transformers for unsupervised object discovery using normalized cut"

Overview

(CVPR 2022) TokenCut

Pytorch implementation of Tokencut:

Self-supervised Transformers for Unsupervised Object Discovery using Normalized Cut

Yangtao Wang, Xi Shen, Shell Xu Hu, Yuan Yuan, James L. Crowley, Dominique Vaufreydaz

[Project page] [Paper] Colab demo Hugging Face Spaces

TokenCut teaser

If our project is helpful for your research, please consider citing :

@inproceedings{wang2022tokencut,
          title={Self-supervised Transformers for Unsupervised Object Discovery using Normalized Cut},
          author={Wang, Yangtao and Shen, Xi and Hu, Shell Xu and Yuan, Yuan and Crowley, James L. and Vaufreydaz, Dominique},
          booktitle={Conference on Computer Vision and Pattern Recognition}
          year={2022}
        }

Table of Content

1. Updates

03/10/2022 Creating a 480p Demo using Gradio. Try out the Web Demo: Hugging Face Spaces

Internet image results:

TokenCut visualizations TokenCut visualizations TokenCut visualizations TokenCut visualizations

02/26/2022 Integrated into Huggingface Spaces 🤗 using Gradio. Try out the Web Demo: Hugging Face Spaces

02/26/2022 A simple TokenCut Colab Demo is available.

02/21/2022 Initial commit: Code of TokenCut is released, including evaluation of unsupervised object discovery, unsupervised saliency object detection, weakly supervised object locolization.

2. Installation

2.1 Dependencies

This code was implemented with Python 3.7, PyTorch 1.7.1 and CUDA 11.2. Please refer to the official installation. If CUDA 10.2 has been properly installed :

pip install torch==1.7.1 torchvision==0.8.2

In order to install the additionnal dependencies, please launch the following command:

pip install -r requirements.txt

2.2 Data

We provide quick download commands in DOWNLOAD_DATA.md for VOC2007, VOC2012, COCO, CUB, ImageNet, ECSSD, DUTS and DUT-OMRON as well as DINO checkpoints.

3. Quick Start

3.1 Detecting an object in one image

We provide TokenCut visualization for bounding box prediction and attention map. Using all for all visualization results.

python main_tokencut.py --image_path examples/VOC07_000036.jpg --visualize pred
python main_tokencut.py --image_path examples/VOC07_000036.jpg --visualize attn
python main_tokencut.py --image_path examples/VOC07_000036.jpg --visualize all 

3.2 Segmenting a salient region in one image

We provide TokenCut segmentation results as follows:

cd unsupervised_saliency_detection 
python get_saliency.py --sigma-spatial 16 --sigma-luma 16 --sigma-chroma 8 --vit-arch small --patch-size 16 --img-path ../examples/VOC07_000036.jpg --out-dir ./output

4. Evaluation

Following are the different steps to reproduce the results of TokenCut presented in the paper.

4.1 Unsupervised object discovery

TokenCut visualizations TokenCut visualizations TokenCut visualizations

PASCAL-VOC

In order to apply TokenCut and compute corloc results (VOC07 68.8, VOC12 72.1), please launch:

python main_tokencut.py --dataset VOC07 --set trainval
python main_tokencut.py --dataset VOC12 --set trainval

If you want to extract Dino features, which corresponds to the KEY features in DINO:

mkdir features
python main_lost.py --dataset VOC07 --set trainval --save-feat-dir features/VOC2007

COCO

Results are provided given the 2014 annotations following previous works. The following command line allows you to get results on the subset of 20k images of the COCO dataset (corloc 58.8), following previous litterature. To be noted that the 20k images are a subset of the train set.

python main_tokencut.py --dataset COCO20k --set train

Different models

We have tested the method on different setups of the VIT model, corloc results are presented in the following table (more can be found in the paper).

arch pre-training dataset
VOC07 VOC12 COCO20k
ViT-S/16 DINO 68.8 72.1 58.8
ViT-S/8 DINO 67.3 71.6 60.7
ViT-B/16 DINO 68.8 72.4 59.0

Previous results on the dataset VOC07 can be obtained by launching:

python main_tokencut.py --dataset VOC07 --set trainval #VIT-S/16
python main_tokencut.py --dataset VOC07 --set trainval --patch_size 8 #VIT-S/8
python main_tokencut.py --dataset VOC07 --set trainval --arch vit_base #VIT-B/16

4.2 Unsupervised saliency detection

TokenCut visualizations TokenCut visualizations TokenCut visualizations

To evaluate on ECSSD, DUTS, DUT_OMRON dataset:

python get_saliency.py --out-dir ECSSD --sigma-spatial 16 --sigma-luma 16 --sigma-chroma 8 --nb-vis 1 --vit-arch small --patch-size 16 --dataset ECSSD

python get_saliency.py --out-dir DUTS --sigma-spatial 16 --sigma-luma 16 --sigma-chroma 8 --nb-vis 1 --vit-arch small --patch-size 16 --dataset DUTS

python get_saliency.py --out-dir DUT --sigma-spatial 16 --sigma-luma 16 --sigma-chroma 8 --nb-vis 1 --vit-arch small --patch-size 16 --dataset DUT

This should give:

Method ECSSD DUTS DUT-OMRON
maxF IoU Acc maxF IoU Acc maxF IoU Acc
TokenCut 80.3 71.2 91.8 67.2 57.6 90.3 60.0 53.3 88.0
TokenCut + BS 87.4 77.2 93.4 75.5 62,4 91.4 69.7 61.8 89.7

4.3 Weakly supervised object detection

TokenCut visualizations TokenCut visualizations TokenCut visualizations

Fintune DINO small on CUB

To finetune ViT-S/16 on CUB on a single node with 4 gpus for 1000 epochs run:

python -m torch.distributed.launch --nproc_per_node=4 main.py --data_path /path/to/data --batch_size_per_gpu 256 --dataset cub --weight_decay 0.005 --pretrained_weights ./dino_deitsmall16_pretrain.pth --epoch 1000 --output_dir ./path/to/checkpoin --lr 2e-4 --warmup-epochs 50 --num_labels 200 --num_workers 16 --n_last_blocks 1 --avgpool_patchtokens true --arch vit_small --patch_size 16

Evaluation on CUB

To evaluate a fine-tuned ViT-S/16 on CUB val with a single GPU run:

python eval.py --pretrained_weights ./path/to/checkpoint --dataset cub --data_path ./path/to/data --batch_size_per_gpu 1 --no_center_crop

This should give:

Top1_cls: 79.12, top5_cls94.80, gt_loc: 0.914, top1_loc:0.723

Evaluate on Imagenet

To Evaluate ViT-S/16 finetuned on ImageNet val with a single GPU run:

python eval.py --pretrained_weights /path/to/checkpoint --classifier_weights /path/to/linear_weights--dataset imagenet --data_path ./path/to/data --batch_size_per_gpu 1 --num_labels 1000 --batch_size_per_gpu 1 --no_center_crop --input_size 256 --tau 0.2 --patch_size 16 --arch vit_small

5. Acknowledgement

TokenCut code is built on top of LOST, DINO, Segswap, and Bilateral_Sovlver. We would like to sincerely thanks those authors for their great works.

Owner
YANGTAO WANG
PhD, Computer Vision, Deep Learning
YANGTAO WANG
A large-scale database for graph representation learning

A large-scale database for graph representation learning

Scott Freitas 29 Nov 25, 2022
An implementation of the proximal policy optimization algorithm

PPO Pytorch C++ This is an implementation of the proximal policy optimization algorithm for the C++ API of Pytorch. It uses a simple TestEnvironment t

Martin Huber 59 Dec 09, 2022
TensorFlow2 Classification Model Zoo playing with TensorFlow2 on the CIFAR-10 dataset.

Training CIFAR-10 with TensorFlow2(TF2) TensorFlow2 Classification Model Zoo. I'm playing with TensorFlow2 on the CIFAR-10 dataset. Architectures LeNe

Chia-Hung Yuan 16 Sep 27, 2022
Image De-raining Using a Conditional Generative Adversarial Network

Image De-raining Using a Conditional Generative Adversarial Network [Paper Link] [Project Page] He Zhang, Vishwanath Sindagi, Vishal M. Patel In this

He Zhang 216 Dec 18, 2022
Efficient Training of Audio Transformers with Patchout

PaSST: Efficient Training of Audio Transformers with Patchout This is the implementation for Efficient Training of Audio Transformers with Patchout Pa

165 Dec 26, 2022
[ICML 2021] A fast algorithm for fitting robust decision trees.

GROOT: Growing Robust Trees Growing Robust Trees (GROOT) is an algorithm that fits binary classification decision trees such that they are robust agai

Cyber Analytics Lab 17 Nov 21, 2022
Repo for EchoVPR: Echo State Networks for Visual Place Recognition

EchoVPR Repo for EchoVPR: Echo State Networks for Visual Place Recognition Currently under development Dirs: data: pre-collected hidden representation

Anil Ozdemir 4 Oct 04, 2022
Awesome Artificial Intelligence, Machine Learning and Deep Learning as we learn it

Awesome Artificial Intelligence, Machine Learning and Deep Learning as we learn it. Study notes and a curated list of awesome resources of such topics.

mani 1.2k Jan 07, 2023
Use stochastic processes to generate samples and use them to train a fully-connected neural network based on Keras

Use stochastic processes to generate samples and use them to train a fully-connected neural network based on Keras which will then be used to generate residuals

Federico Lopez 2 Jan 14, 2022
Code implementation from my Medium blog post: [Transformers from Scratch in PyTorch]

transformer-from-scratch Code for my Medium blog post: Transformers from Scratch in PyTorch Note: This Transformer code does not include masked attent

Frank Odom 27 Dec 21, 2022
An NVDA add-on to split screen reader and audio from other programs to different sound channels

An NVDA add-on to split screen reader and audio from other programs to different sound channels (add-on idea credit: Tony Malykh)

Joseph Lee 7 Dec 25, 2022
Telegram chatbot created with deep learning model (LSTM) and telebot library.

Telegram chatbot Telegram chatbot created with deep learning model (LSTM) and telebot library. Description This program will allow you to create very

1 Jan 04, 2022
A Pytorch implement of paper "Anomaly detection in dynamic graphs via transformer" (TADDY).

TADDY: Anomaly detection in dynamic graphs via transformer This repo covers an reference implementation for the paper "Anomaly detection in dynamic gr

Yue Tan 21 Nov 24, 2022
Not Suitable for Work (NSFW) classification using deep neural network Caffe models.

Open nsfw model This repo contains code for running Not Suitable for Work (NSFW) classification deep neural network Caffe models. Please refer our blo

Yahoo 5.6k Jan 05, 2023
Code implementation of Data Efficient Stagewise Knowledge Distillation paper.

Data Efficient Stagewise Knowledge Distillation Table of Contents Data Efficient Stagewise Knowledge Distillation Table of Contents Requirements Image

IvLabs 112 Dec 02, 2022
Weight initialization schemes for PyTorch nn.Modules

nninit Weight initialization schemes for PyTorch nn.Modules. This is a port of the popular nninit for Torch7 by @kaixhin. ##Update This repo has been

Alykhan Tejani 69 Jan 26, 2021
Codes and models for the paper "Learning Unknown from Correlations: Graph Neural Network for Inter-novel-protein Interaction Prediction".

GNN_PPI Codes and models for the paper "Learning Unknown from Correlations: Graph Neural Network for Inter-novel-protein Interaction Prediction". Lear

Ursa Zrimsek 2 Dec 14, 2022
Graph Convolutional Neural Networks with Data-driven Graph Filter (GCNN-DDGF)

Graph Convolutional Gated Recurrent Neural Network (GCGRNN) Improved from Graph Convolutional Neural Networks with Data-driven Graph Filter (GCNN-DDGF

Lei Lin 21 Dec 18, 2022
Code for ICLR 2021 Paper, "Anytime Sampling for Autoregressive Models via Ordered Autoencoding"

Anytime Autoregressive Model Anytime Sampling for Autoregressive Models via Ordered Autoencoding , ICLR 21 Yilun Xu, Yang Song, Sahaj Gara, Linyuan Go

Yilun Xu 22 Sep 08, 2022
codebase for "A Theory of the Inductive Bias and Generalization of Kernel Regression and Wide Neural Networks"

Eigenlearning This repo contains code for replicating the experiments of the paper A Theory of the Inductive Bias and Generalization of Kernel Regress

Jamie Simon 45 Dec 02, 2022