A new play-and-plug method of controlling an existing generative model with conditioning attributes and their compositions.

Related tags

Deep LearningLACE
Overview

Controllable and Compositional Generation with Latent-Space Energy-Based Models

Python 3.8 pytorch 1.7.1 Torchdiffeq 0.2.1

Teaser image Teaser image

Official PyTorch implementation of the NeurIPS 2021 paper:
Controllable and Compositional Generation with Latent-Space Energy-Based Models
Weili Nie, Arash Vahdat, Anima Anandkumar
https://nvlabs.github.io/LACE

Abstract: Controllable generation is one of the key requirements for successful adoption of deep generative models in real-world applications, but it still remains as a great challenge. In particular, the compositional ability to generate novel concept combinations is out of reach for most current models. In this work, we use energy-based models (EBMs) to handle compositional generation over a set of attributes. To make them scalable to high-resolution image generation, we introduce an EBM in the latent space of a pre-trained generative model such as StyleGAN. We propose a novel EBM formulation representing the joint distribution of data and attributes together, and we show how sampling from it is formulated as solving an ordinary differential equation (ODE). Given a pre-trained generator, all we need for controllable generation is to train an attribute classifier. Sampling with ODEs is done efficiently in the latent space and is robust to hyperparameters. Thus, our method is simple, fast to train, and efficient to sample. Experimental results show that our method outperforms the state-of-the-art in both conditional sampling and sequential editing. In compositional generation, our method excels at zero-shot generation of unseen attribute combinations. Also, by composing energy functions with logical operators, this work is the first to achieve such compositionality in generating photo-realistic images of resolution 1024x1024.

Requirements

  • Linux and Windows are supported, but we recommend Linux for performance and compatibility reasons.
  • 1 high-end NVIDIA GPU with at least 24 GB of memory. We have done all testing and development using a single NVIDIA V100 GPU with memory size 32 GB.
  • 64-bit Python 3.8.
  • CUDA=10.0 and docker must be installed first.
  • Installation of the required library dependencies with Docker:
    docker build -f lace-cuda-10p0.Dockerfile --tag=lace-cuda-10-0:0.0.1 .
    docker run -it -d --gpus 0 --name lace --shm-size 8G -v $(pwd):/workspace -p 5001:6006 lace-cuda-10-0:0.0.1
    docker exec -it lace bash

Experiments on CIFAR-10

The CIFAR10 folder contains the codebase to get the main results on the CIFAR-10 dataset, where the scripts folder contains the necessary bash scripts to run the code.

Data preparation

Before running the code, you have to download the data (i.e., the latent code and label pairs) from here and unzip it to the CIFAR10 folder. Or you can go to the folder CIFAR10/prepare_data and follow the instructions to generate the data.

Training

To train the latent classifier, you can run:

bash scripts/run_clf.sh

In the script run_clf.sh, the variable x can be specified to w or z, representing that the latent classifier is trained in the w-space or z-space of StyleGAN, respectively.

Sampling

To get the conditional sampling results with the ODE or Langevin dynamics (LD) sampler, you can run:

# ODE
bash scripts/run_cond_ode_sample.sh

# LD
bash scripts/run_cond_ld_sample.sh

By default, we set x to w, meaning we use the w-space classifier, because we find our method works the best in w-space. You can change the value of x to z or i to use the classifier in z-space or pixel space, for a comparison.

To compute the conditional accuracy (ACC) and FID scores in conditional sampling with the ODE or LD sampler, you can run:

# ODE
bash scripts/run_cond_ode_score.sh

# LD
bash scripts/run_cond_ld_score.sh

Note that:

  1. For the ACC evaluation, you need a pre-trained image classifier, which can be downloaded as instructed here;

  2. For the FID evaluation, you need to have the FID reference statistics computed beforehand. You can go to the folder CIFAR10/prepare_data and follow the instructions to compute the FID reference statistics with real images sampled from CIFAR-10.

Experiments on FFHQ

The FFHQ folder contains the codebase for getting the main results on the FFHQ dataset, where the scripts folder contains the necessary bash scripts to run the code.

Data preparation

Before running the code, you have to download the data (i.e., 10k pairs of latent variables and labels) from here (originally from StyleFlow) and unzip it to the FFHQ folder.

Training

To train the latent classifier, you can run:

bash scripts/run_clf.sh

Note that each att_name (i.e., glasses) in run_clf.sh corresponds to a separate attribute classifier.

Sampling

First, you have to get the pre-trained StyleGAN2 (config-f) by following the instructions in Convert StyleGAN2 weight from official checkpoints.

Conditional sampling

To get the conditional sampling results with the ODE or LD sampler, you can run:

# ODE
bash scripts/run_cond_ode_sample.sh

# LD
bash scripts/run_cond_ld_sample.sh

To compute the conditional accuracy (ACC) and FID scores in conditional sampling with the ODE or LD sampler, you can run:

# ODE
bash scripts/run_cond_ode_score.sh

# LD
bash scripts/run_cond_ld_score.sh

Note that:

  1. For the ACC evaluation, you need to train an FFHQ image classifier, as instructed here;

  2. For the FID evaluation, you need to have the FID reference statistics computed beforehand. You can go to the folder FFHQ/prepare_models_data and follow the instructions to compute the FID reference statistics with the StyleGAN generated FFHQ images.

Sequential editing

To get the qualitative and quantitative results of sequential editing, you can run:

# User-specified sampling
bash scripts/run_seq_edit_sample.sh

# ACC and FID
bash scripts/run_seq_edit_score.sh

Note that:

  • Similarly, you first need to train an FFHQ image classifier and get the FID reference statics to compute ACC and FID score by following the instructions, respectively.

  • To get the face identity preservation (ID) score, you first need to download the pre-trained ArcFace network, which is publicly available here, to the folder FFHQ/pretrained/metrics.

Compositional Generation

To get the results of zero-shot generation on novel attribute combinations, you can run:

bash scripts/run_zero_shot.sh

To get the results of compositions of energy functions with logical operators, we run:

bash scripts/run_combine_energy.sh

Experiments on MetFaces

The MetFaces folder contains the codebase for getting the main results on the MetFaces dataset, where the scripts folder contains the necessary bash scripts to run the code.

Data preparation

Before running the code, you have to download the data (i.e., 10k pairs of latent variables and labels) from here and unzip it to the MetFaces folder. Or you can go to the folder MetFaces/prepare_data and follow the instructions to generate the data.

Training

To train the latent classifier, you can run:

bash scripts/run_clf.sh

Note that each att_name (i.e., yaw) in run_clf.sh corresponds to a separate attribute classifier.

Sampling

To get the conditional sampling and sequential editing results, you can run:

# conditional sampling
bash scripts/run_cond_sample.sh

# sequential editing
bash scripts/run_seq_edit_sample.sh

Experiments on AFHQ-Cats

The AFHQ folder contains the codebase for getting the main results on the AFHQ-Cats dataset, where the scripts folder contains the necessary bash scripts to run the code.

Data preparation

Before running the code, you have to download the data (i.e., 10k pairs of latent variables and labels) from here and unzip it to the AFHQ folder. Or you can go to the folder AFHQ/prepare_data and follow the instructions to generate the data.

Training

To train the latent classifier, you can run:

bash scripts/run_clf.sh

Note that each att_name (i.e., breeds) in run_clf.sh corresponds to a separate attribute classifier.

Sampling

To get the conditional sampling and sequential editing results, you can run:

# conditional sampling
bash scripts/run_cond_sample.sh

# sequential editing
bash scripts/run_seq_edit_sample.sh

License

Please check the LICENSE file. This work may be used non-commercially, meaning for research or evaluation purposes only. For business inquiries, please contact [email protected].

Citation

Please cite our paper, if you happen to use this codebase:

@inproceedings{nie2021controllable,
  title={Controllable and compositional generation with latent-space energy-based models},
  author={Nie, Weili and Vahdat, Arash and Anandkumar, Anima},
  booktitle={Neural Information Processing Systems (NeurIPS)},
  year={2021}
}
Owner
NVIDIA Research Projects
NVIDIA Research Projects
Code for "Human Pose Regression with Residual Log-likelihood Estimation", ICCV 2021 Oral

Human Pose Regression with Residual Log-likelihood Estimation [Paper] [arXiv] [Project Page] Human Pose Regression with Residual Log-likelihood Estima

JeffLi 347 Dec 24, 2022
YOLOv4-v3 Training Automation API for Linux

This repository allows you to get started with training a state-of-the-art Deep Learning model with little to no configuration needed! You provide your labeled dataset or label your dataset using our

BMW TechOffice MUNICH 626 Dec 31, 2022
Barlow Twins and HSIC

Barlow Twins and HSIC Unofficial Pytorch implementation for Barlow Twins and HSIC_SSL on small datasets (CIFAR10, STL10, and Tiny ImageNet). Correspon

Yao-Hung Hubert Tsai 49 Nov 24, 2022
Fully Connected DenseNet for Image Segmentation

Fully Connected DenseNets for Semantic Segmentation Fully Connected DenseNet for Image Segmentation implementation of the paper The One Hundred Layers

Somshubra Majumdar 84 Oct 31, 2022
Annotate with anyone, anywhere.

h h is the web app that serves most of the https://hypothes.is/ website, including the web annotations API at https://hypothes.is/api/. The Hypothesis

Hypothesis 2.6k Jan 08, 2023
Nicely is a real-time Feedback and Intervention Program Depression is a prevalent issue across all age groups, socioeconomic classes, and cultural identities.

Nicely is a real-time Feedback and Intervention Program Depression is a prevalent issue across all age groups, socioeconomic classes, and cultural identities.

1 Jan 16, 2022
Official PyTorch implementation of the ICRA 2021 paper: Adversarial Differentiable Data Augmentation for Autonomous Systems.

Adversarial Differentiable Data Augmentation This repository provides the official PyTorch implementation of the ICRA 2021 paper: Adversarial Differen

Manli 3 Oct 15, 2022
Politecnico of Turin Thesis: "Implementation and Evaluation of an Educational Chatbot based on NLP Techniques"

THESIS_CAIRONE_FIORENTINO Politecnico of Turin Thesis: "Implementation and Evaluation of an Educational Chatbot based on NLP Techniques" GENERATE TOKE

cairone_fiorentino97 1 Dec 10, 2021
[ICCV 2021] Official Tensorflow Implementation for "Single Image Defocus Deblurring Using Kernel-Sharing Parallel Atrous Convolutions"

KPAC: Kernel-Sharing Parallel Atrous Convolutional block This repository contains the official Tensorflow implementation of the following paper: Singl

Hyeongseok Son 50 Dec 29, 2022
Implementation for our ICCV2021 paper: Internal Video Inpainting by Implicit Long-range Propagation

Implicit Internal Video Inpainting Implementation for our ICCV2021 paper: Internal Video Inpainting by Implicit Long-range Propagation paper | project

202 Dec 30, 2022
Face Recognition Attendance Project

Face-Recognition-Attendance-Project In This Project You will learn how to mark attendance using face recognition, Hello Guys This is Gautam Kumar, Thi

Gautam Kumar 1 Dec 03, 2022
The codebase for our paper "Generative Occupancy Fields for 3D Surface-Aware Image Synthesis" (NeurIPS 2021)

Generative Occupancy Fields for 3D Surface-Aware Image Synthesis (NeurIPS 2021) Project Page | Paper Xudong Xu, Xingang Pan, Dahua Lin and Bo Dai GOF

xuxudong 97 Nov 10, 2022
Flaxformer: transformer architectures in JAX/Flax

Flaxformer is a transformer library for primarily NLP and multimodal research at Google.

Google 116 Jan 05, 2023
i-SpaSP: Structured Neural Pruning via Sparse Signal Recovery

i-SpaSP: Structured Neural Pruning via Sparse Signal Recovery This is a public code repository for the publication: i-SpaSP: Structured Neural Pruning

Cameron Ronald Wolfe 5 Nov 04, 2022
The codes and related files to reproduce the results for Image Similarity Challenge Track 1.

ISC-Track1-Submission The codes and related files to reproduce the results for Image Similarity Challenge Track 1. Required dependencies To begin with

Wenhao Wang 115 Jan 02, 2023
This program writes christmas wish programmatically. It is using turtle as a pen pointer draw christmas trees and stars.

Introduction This is a simple program is written in python and turtle library. The objective of this program is to wish merry Christmas programmatical

Gunarakulan Gunaretnam 1 Dec 25, 2021
Dynamic Multi-scale Filters for Semantic Segmentation (DMNet ICCV'2019)

Dynamic Multi-scale Filters for Semantic Segmentation (DMNet ICCV'2019) Introduction Official implementation of Dynamic Multi-scale Filters for Semant

23 Oct 21, 2022
The source code and data of the paper "Instance-wise Graph-based Framework for Multivariate Time Series Forecasting".

IGMTF The source code and data of the paper "Instance-wise Graph-based Framework for Multivariate Time Series Forecasting". Requirements The framework

Wentao Xu 24 Dec 05, 2022
🗣️ Microsoft Edge TTS for Home Assistant, no need for app_key

Microsoft Edge TTS for Home Assistant This component is based on the TTS service of Microsoft Edge browser, no need to apply for app_key. Install Down

152 Dec 31, 2022
[AAAI 2022] Sparse Structure Learning via Graph Neural Networks for Inductive Document Classification

Sparse Structure Learning via Graph Neural Networks for inductive document classification Make graph dataset create co-occurrence graph for datasets.

16 Dec 22, 2022