Patch2Pix: Epipolar-Guided Pixel-Level Correspondences [CVPR2021]

Overview

Patch2Pix for Accurate Image Correspondence Estimation

This repository contains the Pytorch implementation of our paper accepted at CVPR2021: Patch2Pix: Epipolar-Guided Pixel-Level Correspondences. [Paper] [Video].

To use our code, first download the repository:

git clone git@github.com:GrumpyZhou/patch2pix.git

Setup Running Environment

The code has been tested on Ubuntu (16.04 & 18.04) with Python 3.7 + Pytorch 1.7.0 + CUDA 10.2.
We recommend using Anaconda to manage packages and reproduce the paper results. Run the following lines to automatically set up a ready environment for our code.

conda env create -f environment.yml
conda activate patch2pix
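
After activating the environment, a quick sanity check can confirm which versions were actually installed. The snippet below is a minimal sketch and not part of the repository: the helper names `parse_version` and `environment_report` are made up for illustration, and the script only reports versions rather than enforcing the tested ones.

```python
import importlib
import sys

# Versions this README reports as tested; other versions may also work.
TESTED_PYTHON = (3, 7)
TESTED_TORCH = "1.7.0"


def parse_version(text):
    """Turn a string such as '1.7.0+cu102' into (1, 7, 0), dropping the build suffix."""
    core = text.split("+")[0]
    parts = []
    for piece in core.split("."):
        digits = "".join(ch for ch in piece if ch.isdigit())
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def environment_report():
    """Collect Python/PyTorch/CUDA facts without failing if torch is missing."""
    report = {"python": tuple(sys.version_info[:2]), "torch": None, "cuda": None}
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        return report
    report["torch"] = str(torch.__version__)
    report["cuda"] = bool(torch.cuda.is_available())
    return report


if __name__ == "__main__":
    info = environment_report()
    print("Python:", info["python"], "| tested with", TESTED_PYTHON)
    print("PyTorch:", info["torch"], "| tested with", TESTED_TORCH)
    print("CUDA available:", info["cuda"])
    if info["torch"] and parse_version(info["torch"]) != parse_version(TESTED_TORCH):
        print("Note: PyTorch version differs from the tested one.")
```

A mismatch is not necessarily a problem; the check only makes it easier to spot differences from the tested setup when results cannot be reproduced.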

Download Pretrained Models

In order to run our examples, one needs to first download our pretrained Patch2Pix model. To further train a Patch2Pix model, one needs to download the pretrained NCNet. We provide the download links in pretrained/download.sh. To download both, one can run

cd pretrained
bash download.sh

Evaluation

❗️ NOTICE ❗️ : In this repository, we only provide examples to estimate correspondences using our Patch2Pix implementation.

To reproduce our evaluations on the HPatches, Aachen and InLoc benchmarks, we refer you to our toolbox for image matching: image-matching-toolbox. There, you can also find implementations that reproduce the results of the other state-of-the-art methods we compared against in our paper.

Matching Examples

In our notebook examples/visualize_matches.ipynb, we show how to obtain matches for a pair of images using both Patch2Pix (our pretrained model) and NCNet (our adapted version). The example image pairs are borrowed from D2Net; you can easily replace them with your own.

Training

Note that the following steps are necessary only if you want to train a model yourself.

Data preparation

We use the MegaDepth dataset for training. To keep more data for training, we did not split a validation set off from MegaDepth. Instead, we use the validation splits of PhotoTourism. The following steps describe how to prepare the same training and validation data that we used.

Prepare Training Data

  1. We preprocess the MegaDepth dataset following the preprocessing steps proposed by D2Net. For details, please check out the "Downloading and preprocessing the MegaDepth dataset" section in their GitHub documentation.

  2. Then place the processed MegaDepth dataset under the data/ folder and name it MegaDepth_undistort (or create a symbolic link to it).

  3. One can directly download our pre-computed training pairs using our download script.

cd data_pairs
bash download.sh

In case one wants to generate pairs with different settings, we provide notebooks to generate pairs from scratch. Once you finish steps 1 and 2, the training pairs can be generated using our notebook data_pairs/prep_megadepth_training_pairs.ipynb.
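
If the processed MegaDepth data lives elsewhere on disk, the symbolic link mentioned in step 2 can be created as follows. This is a minimal sketch under stated assumptions: `link_megadepth` is a hypothetical helper, not a function of this repository, and the example source path is a placeholder.

```python
import os
from pathlib import Path


def link_megadepth(processed_dir, repo_root="."):
    """Expose processed_dir as <repo_root>/data/MegaDepth_undistort via a symlink."""
    source = Path(processed_dir).resolve()
    if not source.is_dir():
        raise FileNotFoundError(f"Processed dataset not found: {source}")
    target = Path(repo_root) / "data" / "MegaDepth_undistort"
    target.parent.mkdir(parents=True, exist_ok=True)
    if target.is_symlink() or target.exists():
        # Something is already in place; leave it untouched.
        return target
    os.symlink(source, target, target_is_directory=True)
    return target


if __name__ == "__main__":
    # Placeholder path: replace with the location of your processed dataset.
    print(link_megadepth("/path/to/processed/MegaDepth", repo_root="."))
```

The helper never overwrites an existing data/MegaDepth_undistort entry, so it is safe to run more than once.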

Prepare Validation Data

  1. Use our script to download and extract the subset of train and val sequences from the PhotoTourism dataset.

cd data
bash prepare_immatch_val_data.sh

  2. Precompute pairwise image overlaps for fast loading of validation pairs.

# Under the root folder: patch2pix/
python -m data_pairs.precompute_immatch_val_ovs \
    --data_root data/immatch_benchmark/val_dense
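
Before starting a training run, it can help to confirm that the data and weights sit where the commands in this README expect them. The following sketch only checks paths named in this README; `missing_paths` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path

# Paths named in this README, relative to the repository root.
EXPECTED_PATHS = [
    "data/MegaDepth_undistort",
    "data/immatch_benchmark/val_dense",
    "pretrained/ncn_ivd_5ep.pth",
]


def missing_paths(repo_root=".", expected=EXPECTED_PATHS):
    """Return the expected paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_paths(".")
    if missing:
        print("Missing:")
        for rel in missing:
            print("  ", rel)
    else:
        print("All expected paths are present.")
```

Run it from the repository root (patch2pix/); an empty result means the preparation steps above left the expected folders and the NCNet checkpoint in place.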

Training Examples

To train our best model:

python -m train_patch2pix --gpu 0 \
    --epochs 25 --batch 4 \
    --save_step 1 --plot_counts 20 --data_root 'data' \
    --change_stride --panc 8 --ptmax 400 \
    --pretrain 'pretrained/ncn_ivd_5ep.pth' \
    -lr 0.0005 -lrd 'multistep' 0.2 5 \
    --cls_dthres 50 5 --epi_dthres 50 5  \
    -o 'output/patch2pix' 

The above command saves the log file and checkpoints to the output folder specified by -o. Our best model was trained on a 48GB GPU. To train on a smaller GPU, e.g., one with 12GB, one can either set --batch 1 or --ptmax 250, where --ptmax defines the maximum number of match proposals to be refined for each image pair. In our experience, however, those changes might also decrease the training performance. Note that at test time, our network requires only a 12GB GPU.
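
To switch between the full configuration and a reduced-memory variant without retyping the command, the arguments can be assembled programmatically. This is a sketch, not part of the repository: `build_train_command` is a hypothetical helper, and it only re-emits the flags and values from the training command above, with --batch and --ptmax made adjustable.

```python
import shlex


def build_train_command(batch=4, ptmax=400, output="output/patch2pix", gpu=0):
    """Return the training command from this README as an argument list."""
    return [
        "python", "-m", "train_patch2pix", "--gpu", str(gpu),
        "--epochs", "25", "--batch", str(batch),
        "--save_step", "1", "--plot_counts", "20", "--data_root", "data",
        "--change_stride", "--panc", "8", "--ptmax", str(ptmax),
        "--pretrain", "pretrained/ncn_ivd_5ep.pth",
        "-lr", "0.0005", "-lrd", "multistep", "0.2", "5",
        "--cls_dthres", "50", "5", "--epi_dthres", "50", "5",
        "-o", output,
    ]


if __name__ == "__main__":
    # Reduced-memory variant suggested above for a 12GB GPU.
    cmd = build_train_command(batch=1)
    print(" ".join(shlex.quote(arg) for arg in cmd))
    # To launch it directly: import subprocess; subprocess.run(cmd, check=True)
```

Keeping the command as a list avoids shell-quoting issues if it is later passed to subprocess.run.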

Usage of Visdom Server

Our training script is coded to monitor the training process using Visdom. To enable the monitoring, one needs to:

  1. Run a Visdom server on your localhost, for example:

# Feel free to change the port
python -m visdom.server -port 9333 \
-env_path ~/.visdom/patch2pix

  2. Append the options -vh 'localhost' -vp 9333 to the training command above.
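
Before launching training with the Visdom options, one may want to verify that the server is actually reachable. The sketch below uses only a plain TCP connection test from the Python standard library, not the Visdom API; `port_is_open` is a hypothetical helper, and the host and port defaults mirror the example above.

```python
import socket


def port_is_open(host="localhost", port=9333, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if port_is_open():
        print("A server is listening on localhost:9333.")
    else:
        print("Nothing is listening on localhost:9333; start the Visdom server first.")
```

A successful connection only shows that some process is listening on that port; it does not prove that the process is Visdom.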

BibTeX

If you use our method or code in your project, please cite our paper:

@inproceedings{ZhouCVPRpatch2pix,
        author       = "Zhou, Qunjie and Sattler, Torsten and Leal-Taixe, Laura",
        title        = "Patch2Pix: Epipolar-Guided Pixel-Level Correspondences",
        booktitle    = "CVPR",
        year         = 2021,
}