Deep ViT Features as Dense Visual Descriptors

Overview

dino-vit-features

[paper] [project page]

Official implementation of the paper "Deep ViT Features as Dense Visual Descriptors".

We demonstrate the effectiveness of deep features extracted from a self-supervised, pre-trained ViT model (DINO-ViT) as dense patch descriptors via real-world vision tasks: (a-b) co-segmentation & part co-segmentation: given a set of input images (e.g., 4 input images), we automatically co-segment semantically common foreground objects (e.g., animals), and then further partition them into common parts; (c-d) point correspondence: given a pair of input images, we automatically extract a sparse set of corresponding points. We tackle these tasks by applying only lightweight, simple methodologies such as clustering or binning, to deep ViT features.

Setup

Our code is developed in PyTorch and requires the following modules: tqdm, faiss, timm, matplotlib, pydensecrf, opencv, scikit-learn. We use python=3.9, but the code should run on any version above 3.6. We recommend running the code on a CUDA-capable GPU for faster performance. We recommend setting up the environment via Anaconda by running the following commands:

$ conda env create -f env/dino-vit-feats-env.yml
$ conda activate dino-vit-feats-env

Otherwise, run the following commands in your conda environment:

$ conda install pytorch torchvision torchaudio cudatoolkit=11 -c pytorch
$ conda install tqdm
$ conda install -c conda-forge faiss
$ conda install -c conda-forge timm 
$ conda install matplotlib
$ pip install opencv-python
$ pip install git+https://github.com/lucasb-eyer/pydensecrf.git
$ conda install -c anaconda scikit-learn
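
After either installation route, a quick check that the modules can be found may save a failed run later. The snippet below is a minimal sketch, not part of the repository: the helper name find_missing is hypothetical, and the import names are assumptions (opencv-python is imported as cv2 and scikit-learn as sklearn; the rest are assumed to match their package names).

```python
import importlib.util

# Import names are assumptions: opencv-python is imported as cv2 and
# scikit-learn as sklearn; the others are assumed to match their package names.
REQUIRED_MODULES = [
    "torch", "torchvision", "tqdm", "faiss", "timm",
    "matplotlib", "pydensecrf", "cv2", "sklearn",
]


def find_missing(modules):
    """Return the names in `modules` that cannot be located for import."""
    missing = []
    for name in modules:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # find_spec can raise for broken or half-initialised modules.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    print("missing modules:", ", ".join(missing) if missing else "none")
```

The check only locates the modules and does not import them, so it says nothing about whether a CUDA build of PyTorch is usable.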

ViT Extractor

We provide a wrapper class for a ViT model to extract dense visual descriptors in extractor.py. You can extract descriptors to .pt files using the following command:

python extractor.py --image_path <image_path> --output_path <output_path>

You can specify the pretrained model using the --model flag with the following options:

  • dino_vits8, dino_vits16, dino_vitb8, dino_vitb16 from the DINO repo.
  • vit_small_patch8_224, vit_small_patch16_224, vit_base_patch8_224, vit_base_patch16_224 from the timm repo.

You can set the stride of the patch-extraction layer with the --stride flag; a smaller stride yields a higher-resolution descriptor map.
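
To see how the stride affects resolution, it helps to count the patches along one image side. The sketch below uses the standard sliding-window count, 1 + (size - patch) // stride. It assumes the extractor lays out patches this way without padding; the function name patches_per_side and the 224-pixel input are illustrative choices, not taken from extractor.py.

```python
def patches_per_side(image_size: int, patch_size: int, stride: int) -> int:
    """Number of patch positions along one side, with no padding (assumed)."""
    if patch_size <= 0 or stride <= 0:
        raise ValueError("patch_size and stride must be positive")
    if image_size < patch_size:
        raise ValueError("image_size must be at least patch_size")
    return 1 + (image_size - patch_size) // stride


# A patch size of 8 matches the *8 models listed above; 224 is an example size.
for stride in (8, 4, 2):
    n = patches_per_side(224, 8, stride)
    print(f"stride={stride}: {n} x {n} descriptors")
```

Halving the stride roughly doubles the number of descriptors per side, so the descriptor count (and the memory it needs) grows about fourfold.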

Part Co-segmentation

We provide a notebook for running on a single example in part_cosegmentation.ipynb.

To run on several image sets, arrange each set in a directory, inside a data root directory:


   
    
<root_dir>
|
|_ <set1_name>
|  |
|  |_ img1.png
|  |_ img2.png
|
|_ <set2_name>
   |
   |_ img1.png
   |_ img2.png
   |_ img3.png
...

The following command will produce results in the specified <save_dir>:

python part_cosegmentation.py --root_dir <root_dir> --save_dir <save_dir>

Note: The default configuration in part_cosegmentation.ipynb is suited to small sets (e.g., fewer than 10 images). Increase num_crop_augmentations for more stable results, at the cost of longer runtime. The default configuration in part_cosegmentation.py is suited to larger sets (e.g., well over 10 images).
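
Before launching a long run, it can be worth confirming that the data root follows the one-directory-per-set layout described above. The following is a minimal sketch rather than repository code: the helper list_image_sets is hypothetical, and the set of accepted file suffixes is an assumption, since only .png files appear in the example layout.

```python
import tempfile
from pathlib import Path

# Assumption: only .png appears in the example layout; the other suffixes
# are included for convenience and may not be accepted by the scripts.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def list_image_sets(root_dir):
    """Map each sub-directory name of root_dir to its sorted image file names."""
    sets = {}
    for set_dir in sorted(Path(root_dir).iterdir()):
        if set_dir.is_dir():
            sets[set_dir.name] = sorted(
                p.name
                for p in set_dir.iterdir()
                if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
            )
    return sets


# Demo on a throwaway directory tree that mirrors the layout above.
layout = {"set1": ["img1.png", "img2.png"],
          "set2": ["img1.png", "img2.png", "img3.png"]}
with tempfile.TemporaryDirectory() as root:
    for set_name, names in layout.items():
        (Path(root) / set_name).mkdir()
        for name in names:
            (Path(root) / set_name / name).touch()
    demo = list_image_sets(root)

for set_name, images in demo.items():
    print(f"{set_name}: {len(images)} images")
```

The per-set image count also indicates which default configuration is the better starting point, the notebook's or the script's.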

Co-segmentation

We provide a notebook for running on a single example in cosegmentation.ipynb.

To run on several image sets, arrange each set in a directory, inside a data root directory:


   
    
<root_dir>
|
|_ <set1_name>
|  |
|  |_ img1.png
|  |_ img2.png
|
|_ <set2_name>
   |
   |_ img1.png
   |_ img2.png
   |_ img3.png
...

The following command will produce results in the specified <save_dir>:

python cosegmentation.py --root_dir <root_dir> --save_dir <save_dir>

Point Correspondences

We provide a notebook for running on a single example in correspondences.ipynb.
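
For intuition about how a sparse set of corresponding points can be read off two dense descriptor maps, the toy sketch below keeps only mutual nearest neighbours under cosine similarity. This is a common generic technique chosen here for illustration; it is an assumption and not a description of what correspondences.py does. The function names and the 2-D toy descriptors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mutual_nearest_neighbours(desc_a, desc_b):
    """Return (i, j) pairs where desc_a[i] and desc_b[j] are each other's best match."""
    if not desc_a or not desc_b:
        return []
    # Best match in B for every descriptor in A, and vice versa.
    best_ab = [max(range(len(desc_b)), key=lambda j: cosine(a, desc_b[j]))
               for a in desc_a]
    best_ba = [max(range(len(desc_a)), key=lambda i: cosine(desc_a[i], b))
               for b in desc_b]
    # Keep a pair only when the two directions agree.
    return [(i, j) for i, j in enumerate(best_ab) if best_ba[j] == i]


# Toy descriptors: one row per patch (real descriptors are much longer).
desc_a = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.2]]
desc_b = [[0.9, 0.1], [0.1, 0.9]]
pairs = mutual_nearest_neighbours(desc_a, desc_b)
print(pairs)
```

The third descriptor in desc_a also prefers the first descriptor in desc_b, but that preference is not returned, so it is dropped. This mutual check is what makes the resulting set sparse.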

To run on several image pairs, arrange each image pair in a directory, inside a data root directory:


   
    
<root_dir>
|
|_ <pair1_name>
|  |
|  |_ img1.png
|  |_ img2.png
|
|_ <pair2_name>
   |
   |_ img1.png
   |_ img2.png
...

The following command will produce results in the specified <save_dir>:

python correspondences.py --root_dir <root_dir> --save_dir <save_dir>

Citation

If you found this repository useful, please consider starring and citing:

@article{amir2021deep,
    author    = {Shir Amir and Yossi Gandelsman and Shai Bagon and Tali Dekel},
    title     = {Deep ViT Features as Dense Visual Descriptors},
    journal   = {arXiv preprint arXiv:2112.05814},
    year      = {2021}
}

Owner

Shir Amir, Graduate Student @ Weizmann Institute of Science