ImageBART: Bidirectional Context with Multinomial Diffusion for Autoregressive Image Synthesis

Overview

ImageBART

NeurIPS 2021

teaser
Patrick Esser*, Robin Rombach*, Andreas Blattmann*, Björn Ommer
* equal contribution

arXiv | BibTeX | Poster

Requirements

A suitable conda environment named imagebart can be created and activated with:

conda env create -f environment.yaml
conda activate imagebart

Get the Models

We provide pretrained weights and hyperparameters for models trained on the following datasets:

Download the respective files and extract their contents to a directory ./models/.

Moreover, we provide all the required VQGANs as a .zip at https://ommer-lab.com/files/vqgan.zip, which contents have to be extracted to ./vqgan/.

Get the Data

Running the training configs or the inpainting script requires a dataset available locally. For ImageNet and FFHQ, see this repo's parent directory taming-transformers. The LSUN datasets can be conveniently downloaded via the script available here. We performed a custom split into training and validation images, and provide the corresponding filenames at https://ommer-lab.com/files/lsun.zip. After downloading, extract them to ./data/lsun. The beds/cats/churches subsets should also be placed/symlinked at ./data/lsun/bedrooms/./data/lsun/cats/./data/lsun/churches, respectively.

Inference

Unconditional Sampling

We provide a script for sampling from unconditional models trained on the LSUN-{bedrooms,bedrooms,cats}- and FFHQ-datasets.

FFHQ

On the FFHQ dataset, we provide two distinct pretrained models, one with a chain of length 4 and a geometric noise schedule as proposed by Sohl-Dickstein et al. [1] , and another one with a chain of length 2 and a custom schedule. These models can be started with

CUDA_VISIBLE_DEVICES=<gpu_id> streamlit run scripts/sample_imagebart.py configs/sampling/ffhq/<config>

LSUN

For the models trained on the LSUN-datasets, use

CUDA_VISIBLE_DEVICES=<gpu_id> streamlit run scripts/sample_imagebart.py configs/sampling/lsun/<config>

Class Conditional Sampling on ImageNet

To sample from class-conditional ImageNet models, use

CUDA_VISIBLE_DEVICES=<gpu_id> streamlit run scripts/sample_imagebart.py configs/sampling/imagenet/<config>

Image Editing with Unconditional Models

We also provide a script for image editing with our unconditional models. For our FFHQ-model with geometric schedule this can be started with

CUDA_VISIBLE_DEVICES=<gpu_id> streamlit run scripts/inpaint_imagebart.py configs/sampling/ffhq/ffhq_4scales_geometric.yaml

resulting in samples similar to the following. teaser

Training

In general, there are two options for training the autoregressive transition probabilities of the reverse Markov chain: (i) train them jointly, taking into account a weighting of the individual scale contributions, or (ii) train them independently, which means that each training process optimizes a single transition and the scales must be stacked after training. We conduct most of our experiments using the latter option, but provide configurations for both cases.

Training Scales Independently

For training scales independently, each transition requires a seperate optimization process, which can started via

CUDA_VISIBLE_DEVICES=
   
     python main.py --base configs/
    /
     
      .yaml -t --gpus 0, 

     
   

We provide training configs for a four scale training of FFHQ using a geometric schedule, a four scale geometric training on ImageNet and various three-scale experiments on LSUN. See also the overview of our pretrained models.

Training Scales Jointly

For completeness, we also provide a config to run a joint training with 4 scales on FFHQ. Training can be started by running

CUDA_VISIBLE_DEVICES=
   
     python main.py --base configs/ffhq/ffhq_4_scales_joint-training.yaml -t --gpus 0, 

   

Shout-Outs

Many thanks to all who make their work and implementations publicly available. For this work, these were in particular:

teaser

References

[1] Sohl-Dickstein, J., Weiss, E., Maheswaranathan, N. & Ganguli, S.. (2015). Deep Unsupervised Learning using Nonequilibrium Thermodynamics. Proceedings of the 32nd International Conference on Machine Learning

Bibtex

@article{DBLP:journals/corr/abs-2108-08827,
  author    = {Patrick Esser and
               Robin Rombach and
               Andreas Blattmann and
               Bj{\"{o}}rn Ommer},
  title     = {ImageBART: Bidirectional Context with Multinomial Diffusion for Autoregressive
               Image Synthesis},
  journal   = {CoRR},
  volume    = {abs/2108.08827},
  year      = {2021}
}
Owner
CompVis Heidelberg
Computer Vision research group at the Ruprecht-Karls-University Heidelberg
CompVis Heidelberg
GeoMol: Torsional Geometric Generation of Molecular 3D Conformer Ensembles

GeoMol: Torsional Geometric Generation of Molecular 3D Conformer Ensembles This repository contains a method to generate 3D conformer ensembles direct

127 Dec 20, 2022
Bringing Characters to Life with Computer Brains in Unity

AI4Animation: Deep Learning for Character Control This project explores the opportunities of deep learning for character animation and control as part

Sebastian Starke 5.5k Jan 04, 2023
CAMPARI: Camera-Aware Decomposed Generative Neural Radiance Fields

CAMPARI: Camera-Aware Decomposed Generative Neural Radiance Fields Paper | Supplementary | Video | Poster If you find our code or paper useful, please

26 Nov 29, 2022
A more easy-to-use implementation of KPConv

A more easy-to-use implementation of KPConv This repo contains a more easy-to-use implementation of KPConv based on PyTorch. Introduction KPConv is a

Zheng Qin 35 Dec 14, 2022
Histocartography is a framework bringing together AI and Digital Pathology

Documentation | Paper Welcome to the histocartography repository! histocartography is a python-based library designed to facilitate the development of

155 Nov 23, 2022
Code and models for "Pano3D: A Holistic Benchmark and a Solid Baseline for 360 Depth Estimation", OmniCV Workshop @ CVPR21.

Pano3D A Holistic Benchmark and a Solid Baseline for 360o Depth Estimation Pano3D is a new benchmark for depth estimation from spherical panoramas. We

Visual Computing Lab, Information Technologies Institute, Centre for Reseach and Technology Hellas 50 Dec 29, 2022
Python binding for Khiva library.

Khiva-Python Build Documentation Build Linux and Mac OS Build Windows Code Coverage README This is the Khiva Python binding, it allows the usage of Kh

Shapelets 46 Oct 16, 2022
Consensus Learning from Heterogeneous Objectives for One-Class Collaborative Filtering

Consensus Learning from Heterogeneous Objectives for One-Class Collaborative Filtering This repository provides the source code of "Consensus Learning

SeongKu-Kang 6 Apr 29, 2022
StackRec: Efficient Training of Very Deep Sequential Recommender Models by Iterative Stacking

StackRec: Efficient Training of Very Deep Sequential Recommender Models by Iterative Stacking Datasets You can download datasets that have been pre-pr

25 May 29, 2022
Official implementation of "StyleCariGAN: Caricature Generation via StyleGAN Feature Map Modulation" (SIGGRAPH 2021)

StyleCariGAN in PyTorch Official implementation of StyleCariGAN:Caricature Generation via StyleGAN Feature Map Modulation in PyTorch Requirements PyTo

PeterZhouSZ 49 Oct 31, 2022
General Assembly Capstone: NBA Game Predictor

Project 6: Predicting NBA Games Problem Statement Can I predict the results of NBA games from the back-half of a season from the opening half of the s

Adam Muhammad Klesc 1 Jan 14, 2022
Pytorch-3dunet - 3D U-Net model for volumetric semantic segmentation written in pytorch

pytorch-3dunet PyTorch implementation 3D U-Net and its variants: Standard 3D U-Net based on 3D U-Net: Learning Dense Volumetric Segmentation from Spar

Adrian Wolny 1.3k Dec 28, 2022
This project aims to explore the deployment of Swin-Transformer based on TensorRT, including the test results of FP16 and INT8.

Swin Transformer This project aims to explore the deployment of SwinTransformer based on TensorRT, including the test results of FP16 and INT8. Introd

maggiez 87 Dec 21, 2022
Random Walk Graph Neural Networks

Random Walk Graph Neural Networks This repository is the official implementation of Random Walk Graph Neural Networks. Requirements Code is written in

Giannis Nikolentzos 38 Jan 02, 2023
Data Augmentation with Variational Autoencoders

Documentation Pyraug This library provides a way to perform Data Augmentation using Variational Autoencoders in a reliable way even in challenging con

112 Nov 30, 2022
Repo for the ACMMM20 submission: "Personalized breath based biometric authentication with wearable multimodality".

personalized-breath Repo for the ACMMM20 submission: "Personalized breath based biometric authentication with wearable multimodality". Guideline To ex

Manh-Ha Bui 2 Nov 15, 2021
[ICML 2022] The official implementation of Graph Stochastic Attention (GSAT).

Graph Stochastic Attention (GSAT) The official implementation of GSAT for our paper: Interpretable and Generalizable Graph Learning via Stochastic Att

85 Nov 27, 2022
Instant-nerf-pytorch - NeRF trained SUPER FAST in pytorch

instant-nerf-pytorch This is WORK IN PROGRESS, please feel free to contribute vi

94 Nov 22, 2022
quantize aware training package for NCNN on pytorch

ncnnqat ncnnqat is a quantize aware training package for NCNN on pytorch. Table of Contents ncnnqat Table of Contents Installation Usage Code Examples

62 Nov 23, 2022
The official PyTorch implementation of recent paper - SAINT: Improved Neural Networks for Tabular Data via Row Attention and Contrastive Pre-Training

This repository is the official PyTorch implementation of SAINT. Find the paper on arxiv SAINT: Improved Neural Networks for Tabular Data via Row Atte

Gowthami Somepalli 284 Dec 21, 2022