Code for the CVPR 2021 paper 'Where and What? Examining Interpretable Disentangled Representations'.

PS-SC GAN

[Figure: trav_animation]

This repository contains the main code for training a PS-SC GAN (a GAN implemented with the Perceptual Simplicity and Spatial Constriction constraints), introduced in the paper Where and What? Examining Interpretable Disentangled Representations. The code for computing the TPL for model checkpoints from disentanglement_lib can be found in this repository.

Abstract

Capturing interpretable variations has long been one of the goals in disentanglement learning. However, unlike the independence assumption, interpretability has rarely been exploited to encourage disentanglement in the unsupervised setting. In this paper, we examine the interpretability of disentangled representations by investigating two questions: where to be interpreted and what to be interpreted? A latent code is easily to be interpreted if it would consistently impact a certain subarea of the resulting generated image. We thus propose to learn a spatial mask to localize the effect of each individual latent dimension. On the other hand, interpretability usually comes from latent dimensions that capture simple and basic variations in data. We thus impose a perturbation on a certain dimension of the latent code, and expect to identify the perturbation along this dimension from the generated images so that the encoding of simple variations can be enforced. Additionally, we develop an unsupervised model selection method, which accumulates perceptual distance scores along axes in the latent space. On various datasets, our models can learn high-quality disentangled representations without supervision, showing the proposed modeling of interpretability is an effective proxy for achieving unsupervised disentanglement.

Requirements

  • Python == 3.7.2
  • Numpy == 1.19.1
  • TensorFlow == 1.15.0
  • This code is based on StyleGAN2, which relies on custom TensorFlow ops that are compiled on the fly using NVCC. To test that your NVCC installation is working correctly, run:
nvcc test_nvcc.cu -o test_nvcc -run
| CPU says hello.
| GPU says hello.
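
The pinned versions above can also be checked programmatically before training. The following is a minimal sketch, not part of the repository: the names REQUIRED, parse_version, find_mismatches and installed_versions are hypothetical helpers, and the check assumes that the exact pins in the list are what matters.

```python
import re
import sys

# Pins copied from the Requirements list above.
REQUIRED = {"python": "3.7.2", "numpy": "1.19.1", "tensorflow": "1.15.0"}


def parse_version(text):
    """Turn '1.15.0' into (1, 15, 0); a suffix such as 'rc1' is ignored."""
    parts = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    return tuple(parts)


def find_mismatches(installed, required=REQUIRED):
    """Return {name: (installed, required)} for every version that differs."""
    mismatches = {}
    for name, wanted in required.items():
        found = installed.get(name)
        if found is None or parse_version(found) != parse_version(wanted):
            mismatches[name] = (found, wanted)
    return mismatches


def installed_versions():
    """Collect versions from the current environment (None if a package is missing)."""
    versions = {"python": ".".join(str(n) for n in sys.version_info[:3])}
    for name in ("numpy", "tensorflow"):
        try:
            versions[name] = __import__(name).__version__
        except ImportError:
            versions[name] = None
    return versions
```

Calling find_mismatches(installed_versions()) returns an empty dictionary when the environment matches the list, and otherwise names each package together with the version found and the version pinned here.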

Preparing datasets

CelebA. To prepare the tfrecord version of the CelebA dataset, first download the original aligned-and-cropped version from http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html, then use the following command to create the tfrecord dataset:

python dataset_tool.py create_celeba /path/to/new_tfr_dir /path/to/downloaded_celeba_dir

For example, the new_tfr_dir can be: datasets/celeba_tfr.

FFHQ. We use the 512x512 version, which can be downloaded directly from the Google Drive link using a browser, or with the official download script from Flickr-Faces-HQ. Put the xxx.tfrecords file into a two-level directory such as datasets/ffhq_tfr/xxx.tfrecords.

Other Datasets. The tfrecords versions of the DSprites and 3DShapes datasets can be produced with

python dataset_tool.py create_subset_from_dsprites_npz /path/to/new_tfr_dir /path/to/dsprites_npz

and

python dataset_tool.py create_subset_from_shape3d /path/to/new_tfr_dir /path/to/shape3d_file

See dataset_tool.py for how other datasets can be produced.

Training

[Figure: architecture]

Pretrained models are shared here. To train a model on CelebA with 2 GPUs, run:

CUDA_VISIBLE_DEVICES=0,1 \
    python run_training_ps_sc.py \
    --result-dir /path/to/results_ps_sc/celeba \
    --data-dir /path/to/datasets \
    --dataset celeba_tfr \
    --metrics fid1k,tpl_small_0.3 \
    --num-gpus 2 \
    --mirror-augment True \
    --model_type ps_sc_gan \
    --C_lambda 0.01 \
    --fmap_decay 1 \
    --epsilon_loss 3 \
    --random_seed 1000 \
    --random_eps True \
    --latent_type normal \
    --batch_size 8 \
    --batch_per_gpu 4 \
    --n_samples_per 7 \
    --return_atts True \
    --I_fmap_base 10 \
    --G_fmap_base 9 \
    --G_nf_scale 6 \
    --D_fmap_base 10 \
    --fmap_min 64 \
    --fmap_max 512 \
    --topk_dims_to_show -1 \
    --module_list '[Const-512, ResConv-up-1, C_spgroup-4-5, ResConv-id-1, Noise-2, ResConv-up-1, C_spgroup-4-5, ResConv-id-1, Noise-2, ResConv-up-1, C_spgroup-4-5, ResConv-id-1, Noise-2, ResConv-up-1, C_spgroup-4-5, ResConv-id-1, Noise-2, ResConv-up-1, C_spgroup-4-5, ResConv-id-1, Noise-2, ResConv-id-2]'

Note that the dataset path must be split across two tags: --data-dir takes the parent directory and --dataset takes the name of the tfrecord folder inside it. The --model_type tag only selects the PS-loss; to specify where the Spatial Constriction modules are inserted in the generator, add C_spgroup-n_squares-n_codes entries to the --module_list tag. The latent traversals and metrics are logged in the result directory. The --C_lambda tag is the hyper-parameter that weights the PS-loss.
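
Both conventions described above, the split dataset path and the C_spgroup-n_squares-n_codes entries, can be handled with a few lines of Python when scripting experiments. This is a sketch under assumptions: split_dataset_path, build_module_list and list_sc_modules are hypothetical helpers, not functions of this repository, and the block layout is simply copied from the CelebA command above.

```python
import posixpath


def split_dataset_path(tfr_dir):
    """Split 'datasets/celeba_tfr' into (--data-dir, --dataset) values."""
    normalized = posixpath.normpath(tfr_dir)  # README paths are POSIX-style
    return posixpath.dirname(normalized), posixpath.basename(normalized)


def build_module_list(n_blocks=5, n_squares=4, n_codes=5):
    """Rebuild the --module_list string used in the CelebA command above."""
    block = ["ResConv-up-1", f"C_spgroup-{n_squares}-{n_codes}",
             "ResConv-id-1", "Noise-2"]
    modules = ["Const-512"] + block * n_blocks + ["ResConv-id-2"]
    return "[" + ", ".join(modules) + "]"


def list_sc_modules(module_list):
    """Return (n_squares, n_codes) for every C_spgroup entry, in order."""
    entries = [m.strip() for m in module_list.strip("[]").split(",")]
    found = []
    for entry in entries:
        if entry.startswith("C_spgroup-"):
            _, n_squares, n_codes = entry.split("-")
            found.append((int(n_squares), int(n_codes)))
    return found
```

With the defaults, build_module_list() reproduces the string passed to --module_list in the command above, and list_sc_modules shows that it contains five Spatial Constriction entries, each written as C_spgroup-4-5.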

Evaluation

To evaluate a trained model, we can use the following code:

CUDA_VISIBLE_DEVICES=0 \
    python run_metrics.py \
    --result-dir /path/to/evaluate_results_dir \
    --network /path/to/xxx.pkl \
    --metrics fid50k,tpl_large_0.3,ppl2_wend \
    --data-dir /path/to/datasets \
    --dataset celeba_tfr \
    --include_I True \
    --mapping_nodup True \
    --num-gpus 1

where --include_I indicates that the model should be loaded with an inference network, and --mapping_nodup indicates that the loaded model does not duplicate the W space the way StyleGAN does.

Generation

We can generate random images, traversals or gifs based on a pretrained model pkl using the following code:

CUDA_VISIBLE_DEVICES=0 \
    python run_generator_ps_sc.py generate-images \
    --network /path/to/xxx.pkl \
    --seeds 0-10 \
    --result-dir /path/to/gen_results_dir

and

CUDA_VISIBLE_DEVICES=0 \
    python run_generator_ps_sc.py generate-traversals \
    --network /path/to/xxx.pkl \
    --seeds 0-10 \
    --result-dir /path/to/traversal_results_dir

and

python run_generator_ps_sc.py \
    generate-gifs \
    --network /path/to/xxx.pkl \
    --exist_imgs_dir git_repo/PS-SC/imgs \
    --result-dir /path/to/results/gif \
    --used_imgs_ls '[sample1.png, sample2.png, sample3.png]' \
    --used_semantics_ls '[azimuth, haircolor, smile, gender, main_fringe, left_fringe, age, light_right, light_left, light_vertical, hair_style, clothes_color, saturation, ambient_color, elevation, neck, right_shoulder, left_shoulder, background_1, background_2, background_3, background_4, right_object, left_object]' \
    --attr2idx_dict '{ambient_color:35, none1:34, light_right:33, saturation:32, light_left:31, background_4:30, background_3:29, gender:28, haircolor:27, background_2: 26, light_vertical:25, clothes_color:24, azimuth:23, right_object:22, main_fringe:21, right_shoulder:20, none4:19, background_1:18, neck:17, hair_style:16, smile:15, none6:14, left_fringe:13, none8:12, none9:11, age:10, shoulder:9, glasses:8, none10:7, left_object: 6, elevation:5, none12:4, none13:3, none14:2, left_shoulder:1, none16:0}' \
    --create_new_G True

A gif generation script is provided in the shared pretrained FFHQ folder. The images referred to in --used_imgs_ls are provided in the imgs folder in this repository.
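
The --seeds 0-10 argument in the commands above uses the range syntax of the upstream StyleGAN2 generator script, where a range is inclusive at both ends. The sketch below mirrors that convention on the assumption that run_generator_ps_sc.py keeps it, which this README does not confirm; parse_num_range is an illustrative name.

```python
import re


def parse_num_range(text):
    """Accept an inclusive range 'a-b' or a comma-separated list 'a,b,c'."""
    match = re.match(r"^(\d+)-(\d+)$", text)
    if match:
        start, end = int(match.group(1)), int(match.group(2))
        return list(range(start, end + 1))
    return [int(value) for value in text.split(",")]
```

Under this assumption, --seeds 0-10 selects eleven seeds (0 through 10), and a comma-separated value such as 1,5,9 selects exactly those three.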

Attributes Editing

We can perform attribute editing with a disentangled model. Currently we only use generated images for this experiment, because projecting real images into disentangled latent codes gives unsatisfactory quality.

[Figure: attr_edit]

First, we need to generate some images and put them into a directory, e.g. /path/to/existing_generated_imgs_dir. Second, we need to assign concepts to meaningful latent dimensions using the --attr2idx_dict tag. For example, if the 23rd dimension represents the azimuth concept, we add the item {azimuth:23} to the dictionary. Third, we need to specify which images provide the source attributes, using the --attr_source_dict tag. Note that multiple dimensions may represent a single concept (e.g. in the following example there are 4 dimensions capturing the background information), so it is preferable for a source image to provide all of these dimensions (attributes) as a whole. A source image can provide multiple attributes. Finally, we need to specify the face-source images with the --face_source_ls tag. All the face-source and attribute-source images should be located in the --exist_imgs_dir. An example command is as follows:

python run_editing_ps_sc.py \
    images-editing \
    --network /path/to/xxx.pkl \
    --result-dir /path/to/editing_results \
    --exist_imgs_dir git_repo/PS-SC/imgs \
    --face_source_ls '[sample1.png, sample2.png, sample3.png]' \
    --attr_source_dict '{sample1.png: [azimuth, smile]; sample2.png: [age,fringe]; sample3.png: [lighting_right,lighting_left,lighting_vertical]}' \
    --attr2idx_dict '{ambient_color:35, none1:34, light_right:33, saturation:32, light_left:31, background_4:30, background_3:29, gender:28, haircolor:27, background_2: 26, light_vertical:25, clothes_color:24, azimuth:23, right_object:22, main_fringe:21, right_shoulder:20, none4:19, background_1:18, neck:17, hair_style:16, smile:15, none6:14, left_fringe:13, none8:12, none9:11, age:10, shoulder:9, glasses:8, none10:7, left_object: 6, elevation:5, none12:4, none13:3, none14:2, left_shoulder:1, none16:0}'
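
Because --attr_source_dict refers to concepts by name and --attr2idx_dict maps names to latent dimensions, it can help to check that the two agree before launching an edit. The following is a rough sketch with hypothetical helpers (parse_attr2idx, parse_attr_source, unknown_attributes); it only imitates the string format shown in the command above and is not the parser used by run_editing_ps_sc.py.

```python
def parse_attr2idx(text):
    """Parse '{azimuth:23, smile:15}' into {'azimuth': 23, 'smile': 15}."""
    mapping = {}
    for item in text.strip().strip("{}").split(","):
        name, index = item.split(":")
        mapping[name.strip()] = int(index)
    return mapping


def parse_attr_source(text):
    """Parse '{a.png: [x, y]; b.png: [z]}' into {'a.png': ['x', 'y'], 'b.png': ['z']}."""
    sources = {}
    for item in text.strip().strip("{}").split(";"):
        image, attrs = item.split(":")
        names = attrs.strip().strip("[]").split(",")
        sources[image.strip()] = [n.strip() for n in names if n.strip()]
    return sources


def unknown_attributes(attr_source, attr2idx):
    """List (image, attribute) pairs whose attribute has no latent index."""
    return [(image, name)
            for image, names in attr_source.items()
            for name in names
            if name not in attr2idx]


# Small illustrative inputs in the same format as the command above.
attr2idx = parse_attr2idx("{azimuth:23, smile:15, background_2: 26}")
attr_source = parse_attr_source("{sample1.png: [azimuth, smile]; sample2.png: [age,fringe]}")
missing = unknown_attributes(attr_source, attr2idx)
```

Applied to the example command above, this check would flag fringe, lighting_right, lighting_left and lighting_vertical, since the example --attr2idx_dict only contains main_fringe, left_fringe, light_right, light_left and light_vertical. Whether the editing script tolerates such names is not stated here, so it may be safer to keep the names identical in both tags.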

Accumulated Perceptual Distance with 2D Rotation

[Figure: fringe_vs_background]

If a disentangled model has been trained, the accumulated perceptual distance figures shown in Section 3.3 (and Section 8 in the Appendix) can be plotted using the model checkpoint with the following code:

# Celeba
# The dimension for concepts: azimuth: 9; haircolor: 19; smile: 5; hair: 4; fringe: 11; elevation: 10; back: 18;
CUDA_VISIBLE_DEVICES=0 \
    python plot_latent_space.py \
    plot-rot-fn \
    --network /path/to/xxx.pkl \
    --seeds 1-10 \
    --latent_pair 19_5 \
    --load_gan True \
    --result-dir /path/to/acc_results/rot_19_5

The 2D latent traversal grid can be generated with:

# Celeba
# The dimension for concepts: azimuth: 9; haircolor: 19; smile: 5; hair: 4; fringe: 11; elevation: 10; back: 18;
CUDA_VISIBLE_DEVICES=0 \
    python plot_latent_space.py \
    generate-grids \
    --network /path/to/xxx.pkl \
    --seeds 1-10 \
    --latent_pair 19_5 \
    --load_gan True \
    --result-dir /path/to/acc_results/grid_19_5
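
To picture what a 2D rotation over a latent pair such as 19_5 means, the sketch below builds traversal directions inside the plane spanned by the two chosen dimensions. It is only one plausible reading of the idea: plot_latent_space.py may define the rotation and the traversal range differently, and the helper names, the latent size of 25, the step count and the extent are all assumptions made for illustration.

```python
import math


def parse_latent_pair(text):
    """Turn '19_5' into (19, 5)."""
    first, second = text.split("_")
    return int(first), int(second)


def rotated_direction(n_dims, pair, angle):
    """Unit vector in the plane of the pair, rotated by `angle` radians
    away from the first dimension towards the second."""
    i, j = pair
    direction = [0.0] * n_dims
    direction[i] = math.cos(angle)
    direction[j] = math.sin(angle)
    return direction


def traverse(base_code, direction, steps, extent):
    """Codes at `steps` (>= 2) evenly spaced offsets in [-extent, extent]."""
    codes = []
    for k in range(steps):
        offset = -extent + 2.0 * extent * k / (steps - 1)
        codes.append([z + offset * d for z, d in zip(base_code, direction)])
    return codes


pair = parse_latent_pair("19_5")
axis_aligned = rotated_direction(25, pair, 0.0)        # moves dimension 19 only
diagonal = rotated_direction(25, pair, math.pi / 4)    # mixes dimensions 19 and 5
codes = traverse([0.0] * 25, axis_aligned, steps=5, extent=2.0)
```

Feeding each list of codes to the generator and summing a perceptual distance between consecutive images would give one accumulated score per angle, so axis-aligned and rotated directions can be compared.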

Citation

@inproceedings{Xinqi_cvpr21,
author={Xinqi Zhu and Chang Xu and Dacheng Tao},
title={Where and What? Examining Interpretable Disentangled Representations},
booktitle={CVPR},
year={2021}
}