Image-to-image translation with conditional adversarial nets

Overview

pix2pix

Project | Arxiv | PyTorch

Torch implementation for learning a mapping from input images to output images, for example:

Image-to-Image Translation with Conditional Adversarial Networks
Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, Alexei A. Efros
CVPR, 2017.

On some tasks, decent results can be obtained fairly quickly and on small datasets. For example, to learn to generate facades (example shown above), we trained on just 400 images for about 2 hours (on a single Pascal Titan X GPU). However, for harder problems it may be important to train on far larger datasets, and for many hours or even days.

Note: Please check out our PyTorch implementation for pix2pix and CycleGAN. The PyTorch version is under active development and can produce results comparable to or better than this Torch version.

Setup

Prerequisites

  • Linux or OSX
  • NVIDIA GPU + CUDA CuDNN (CPU mode and CUDA without CuDNN may work with minimal modification, but untested)

Getting Started

luarocks install nngraph
luarocks install https://raw.githubusercontent.com/szym/display/master/display-scm-0.rockspec
  • Clone this repo:
git clone [email protected]:phillipi/pix2pix.git
cd pix2pix
bash ./datasets/download_dataset.sh facades
  • Train the model
DATA_ROOT=./datasets/facades name=facades_generation which_direction=BtoA th train.lua
  • (CPU only) The same training command without using a GPU or CUDNN. Setting the environment variables gpu=0 cudnn=0 forces CPU only
DATA_ROOT=./datasets/facades name=facades_generation which_direction=BtoA gpu=0 cudnn=0 batchSize=10 save_epoch_freq=5 th train.lua
  • (Optionally) start the display server to view results as the model trains. ( See Display UI for more details):
th -ldisplay.start 8000 0.0.0.0
  • Finally, test the model:
DATA_ROOT=./datasets/facades name=facades_generation which_direction=BtoA phase=val th test.lua

The test results will be saved to an html file here: ./results/facades_generation/latest_net_G_val/index.html.

Train

DATA_ROOT=/path/to/data/ name=expt_name which_direction=AtoB th train.lua

Switch AtoB to BtoA to train translation in opposite direction.

Models are saved to ./checkpoints/expt_name (can be changed by passing checkpoint_dir=your_dir in train.lua).

See opt in train.lua for additional training options.

Test

DATA_ROOT=/path/to/data/ name=expt_name which_direction=AtoB phase=val th test.lua

This will run the model named expt_name in direction AtoB on all images in /path/to/data/val.

Result images, and a webpage to view them, are saved to ./results/expt_name (can be changed by passing results_dir=your_dir in test.lua).

See opt in test.lua for additional testing options.

Datasets

Download the datasets using the following script. Some of the datasets are collected by other researchers. Please cite their papers if you use the data.

bash ./datasets/download_dataset.sh dataset_name

Models

Download the pre-trained models with the following script. You need to rename the model (e.g., facades_label2image to /checkpoints/facades/latest_net_G.t7) after the download has finished.

bash ./models/download_model.sh model_name
  • facades_label2image (label -> facade): trained on the CMP Facades dataset.
  • cityscapes_label2image (label -> street scene): trained on the Cityscapes dataset.
  • cityscapes_image2label (street scene -> label): trained on the Cityscapes dataset.
  • edges2shoes (edge -> photo): trained on UT Zappos50K dataset.
  • edges2handbags (edge -> photo): trained on Amazon handbags images.
  • day2night (daytime scene -> nighttime scene): trained on around 100 webcams.

Setup Training and Test data

Generating Pairs

We provide a python script to generate training data in the form of pairs of images {A,B}, where A and B are two different depictions of the same underlying scene. For example, these might be pairs {label map, photo} or {bw image, color image}. Then we can learn to translate A to B or B to A:

Create folder /path/to/data with subfolders A and B. A and B should each have their own subfolders train, val, test, etc. In /path/to/data/A/train, put training images in style A. In /path/to/data/B/train, put the corresponding images in style B. Repeat same for other data splits (val, test, etc).

Corresponding images in a pair {A,B} must be the same size and have the same filename, e.g., /path/to/data/A/train/1.jpg is considered to correspond to /path/to/data/B/train/1.jpg.

Once the data is formatted this way, call:

python scripts/combine_A_and_B.py --fold_A /path/to/data/A --fold_B /path/to/data/B --fold_AB /path/to/data

This will combine each pair of images (A,B) into a single image file, ready for training.

Notes on Colorization

No need to run combine_A_and_B.py for colorization. Instead, you need to prepare some natural images and set preprocess=colorization in the script. The program will automatically convert each RGB image into Lab color space, and create L -> ab image pair during the training. Also set input_nc=1 and output_nc=2.

Extracting Edges

We provide python and Matlab scripts to extract coarse edges from photos. Run scripts/edges/batch_hed.py to compute HED edges. Run scripts/edges/PostprocessHED.m to simplify edges with additional post-processing steps. Check the code documentation for more details.

Evaluating Labels2Photos on Cityscapes

We provide scripts for running the evaluation of the Labels2Photos task on the Cityscapes validation set. We assume that you have installed caffe (and pycaffe) in your system. If not, see the official website for installation instructions. Once caffe is successfully installed, download the pre-trained FCN-8s semantic segmentation model (512MB) by running

bash ./scripts/eval_cityscapes/download_fcn8s.sh

Then make sure ./scripts/eval_cityscapes/ is in your system's python path. If not, run the following command to add it

export PYTHONPATH=${PYTHONPATH}:./scripts/eval_cityscapes/

Now you can run the following command to evaluate your predictions:

python ./scripts/eval_cityscapes/evaluate.py --cityscapes_dir /path/to/original/cityscapes/dataset/ --result_dir /path/to/your/predictions/ --output_dir /path/to/output/directory/

Images stored under --result_dir should contain your model predictions on the Cityscapes validation split, and have the original Cityscapes naming convention (e.g., frankfurt_000001_038418_leftImg8bit.png). The script will output a text file under --output_dir containing the metric.

Further notes: Our pre-trained FCN model is not supposed to work on Cityscapes in the original resolution (1024x2048) as it was trained on 256x256 images that are then upsampled to 1024x2048 during training. The purpose of the resizing during training was to 1) keep the label maps in the original high resolution untouched and 2) avoid the need to change the standard FCN training code and the architecture for Cityscapes. During test time, you need to synthesize 256x256 results. Our test code will automatically upsample your results to 1024x2048 before feeding them to the pre-trained FCN model. The output is at 1024x2048 resolution and will be compared to 1024x2048 ground truth labels. You do not need to resize the ground truth labels. The best way to verify whether everything is correct is to reproduce the numbers for real images in the paper first. To achieve it, you need to resize the original/real Cityscapes images (not labels) to 256x256 and feed them to the evaluation code.

Display UI

Optionally, for displaying images during training and test, use the display package.

  • Install it with: luarocks install https://raw.githubusercontent.com/szym/display/master/display-scm-0.rockspec
  • Then start the server with: th -ldisplay.start
  • Open this URL in your browser: http://localhost:8000

By default, the server listens on localhost. Pass 0.0.0.0 to allow external connections on any interface:

th -ldisplay.start 8000 0.0.0.0

Then open http://(hostname):(port)/ in your browser to load the remote desktop.

L1 error is plotted to the display by default. Set the environment variable display_plot to a comma-separated list of values errL1, errG and errD to visualize the L1, generator, and discriminator error respectively. For example, to plot only the generator and discriminator errors to the display instead of the default L1 error, set display_plot="errG,errD".

Citation

If you use this code for your research, please cite our paper Image-to-Image Translation Using Conditional Adversarial Networks:

@article{pix2pix2017,
  title={Image-to-Image Translation with Conditional Adversarial Networks},
  author={Isola, Phillip and Zhu, Jun-Yan and Zhou, Tinghui and Efros, Alexei A},
  journal={CVPR},
  year={2017}
}

Cat Paper Collection

If you love cats, and love reading cool graphics, vision, and learning papers, please check out the Cat Paper Collection:
[Github] [Webpage]

Acknowledgments

Code borrows heavily from DCGAN. The data loader is modified from DCGAN and Context-Encoder.

MACE is a deep learning inference framework optimized for mobile heterogeneous computing platforms.

Documentation | FAQ | Release Notes | Roadmap | MACE Model Zoo | Demo | Join Us | 中文 Mobile AI Compute Engine (or MACE for short) is a deep learning i

Xiaomi 4.7k Dec 29, 2022
Speech Emotion Recognition with Fusion of Acoustic- and Linguistic-Feature-Based Decisions

APSIPA-SER-with-A-and-T This code is the implementation of Speech Emotion Recognition (SER) with acoustic and linguistic features. The network model i

kenro515 3 Jan 04, 2023
Establishing Strong Baselines for TripClick Health Retrieval; ECIR 2022

TripClick Baselines with Improved Training Data Welcome 🙌 to the hub-repo of our paper: Establishing Strong Baselines for TripClick Health Retrieval

Sebastian Hofstätter 3 Nov 03, 2022
Speech Recognition is an important feature in several applications used such as home automation, artificial intelligence

Speech Recognition is an important feature in several applications used such as home automation, artificial intelligence, etc. This article aims to provide an introduction on how to make use of the S

RISHABH MISHRA 1 Feb 13, 2022
High-fidelity 3D Model Compression based on Key Spheres

High-fidelity 3D Model Compression based on Key Spheres This repository contains the implementation of the paper: Yuanzhan Li, Yuqi Liu, Yujie Lu, Siy

5 Oct 11, 2022
Unofficial PyTorch implementation of the Adaptive Convolution architecture for image style transfer

AdaConv Unofficial PyTorch implementation of the Adaptive Convolution architecture for image style transfer from "Adaptive Convolutions for Structure-

65 Dec 22, 2022
Group Fisher Pruning for Practical Network Compression(ICML2021)

Group Fisher Pruning for Practical Network Compression (ICML2021) By Liyang Liu*, Shilong Zhang*, Zhanghui Kuang, Jing-Hao Xue, Aojun Zhou, Xinjiang W

Shilong Zhang 129 Dec 13, 2022
Two types of Recommender System : Content-based Recommender System and Colaborating filtering based recommender system

Recommender-Systems Two types of Recommender System : Content-based Recommender System and Colaborating filtering based recommender system So the data

Yash Kumar 0 Jan 20, 2022
Pytorch implementation of Rosca, Mihaela, et al. "Variational Approaches for Auto-Encoding Generative Adversarial Networks."

alpha-GAN Unofficial pytorch implementation of Rosca, Mihaela, et al. "Variational Approaches for Auto-Encoding Generative Adversarial Networks." arXi

Victor Shepardson 78 Dec 08, 2022
MNIST, but with Bezier curves instead of pixels

bezier-mnist This is a work-in-progress vector version of the MNIST dataset. Samples Here are some samples from the training set. Note that, while the

Alex Nichol 15 Jan 16, 2022
Categorical Depth Distribution Network for Monocular 3D Object Detection

CaDDN CaDDN is a monocular-based 3D object detection method. This repository is based off of [OpenPCDet]. Categorical Depth Distribution Network for M

Toronto Robotics and AI Laboratory 289 Jan 05, 2023
AI drive app that can help user become beautiful.

爱美丽 Beauty 简体中文 Features Beauty is an AI drive app that can help user become beautiful. it contain those functions: face score cheek face beauty repor

Starved Midnight 1 Jan 30, 2022
Official implementation of the paper "Topographic VAEs learn Equivariant Capsules"

Topographic Variational Autoencoder Paper: https://arxiv.org/abs/2109.01394 Getting Started Install requirements with Anaconda: conda env create -f en

T. Andy Keller 69 Dec 12, 2022
This repository contains all code and data for the Inside Out Visual Place Recognition task

Inside Out Visual Place Recognition This repository contains code and instructions to reproduce the results for the Inside Out Visual Place Recognitio

15 May 21, 2022
Implementation of ProteinBERT in Pytorch

ProteinBERT - Pytorch (wip) Implementation of ProteinBERT in Pytorch. Original Repository Install $ pip install protein-bert-pytorch Usage import torc

Phil Wang 92 Dec 25, 2022
The project is an official implementation of our CVPR2019 paper "Deep High-Resolution Representation Learning for Human Pose Estimation"

Deep High-Resolution Representation Learning for Human Pose Estimation (CVPR 2019) News [2020/07/05] A very nice blog from Towards Data Science introd

Leo Xiao 3.9k Jan 05, 2023
Code for EMNLP 2021 paper: "Learning Implicit Sentiment in Aspect-based Sentiment Analysis with Supervised Contrastive Pre-Training"

SCAPT-ABSA Code for EMNLP2021 paper: "Learning Implicit Sentiment in Aspect-based Sentiment Analysis with Supervised Contrastive Pre-Training" Overvie

Zhengyan Li 66 Dec 04, 2022
Code base for reproducing results of I.Schubert, D.Driess, O.Oguz, and M.Toussaint: Learning to Execute: Efficient Learning of Universal Plan-Conditioned Policies in Robotics. NeurIPS (2021)

Learning to Execute (L2E) Official code base for completely reproducing all results reported in I.Schubert, D.Driess, O.Oguz, and M.Toussaint: Learnin

3 May 18, 2022
Simple embedding based text classifier inspired by fastText, implemented in tensorflow

FastText in Tensorflow This project is based on the ideas in Facebook's FastText but implemented in Tensorflow. However, it is not an exact replica of

Alan Patterson 306 Dec 02, 2022
Seeing if I can put together an interactive version of 3b1b's Manim in Streamlit

streamlit-manim Seeing if I can put together an interactive version of 3b1b's Manim in Streamlit Installation I had to install pango with sudo apt-get

Adrien Treuille 6 Aug 03, 2022