VAE animeface
Variational autoencoder for anime face reconstruction
Introduction
This repository is an exploratory example of training a variational autoencoder to extract meaningful feature representations from anime girl face images.
The code architecture is mostly borrowed and modified from Yann Dubois's disentangling-vae repository, which provides a nice summary and comparison of the different VAE models proposed recently.
Dataset
The Anime Face Dataset contains 63,632 anime faces, all rescaled to 64x64 for training.
Model
The model used is the one proposed in the paper Understanding disentangling in β-VAE, which is summarized below:
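For reference, here is a minimal sketch of the encoder layout described in that paper (layer sizes follow Burgess et al.; the exact configuration used in this repository may differ):

```python
import torch.nn as nn

# Sketch of the Burgess et al. encoder (assumed, not copied from this repo):
# 4 conv layers (32 channels, 4x4 kernels, stride 2), 2 FC layers, then mean/log-variance heads.
class Encoder(nn.Module):
    def __init__(self, latent_dim=10):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
            nn.Conv2d(32, 32, 4, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
            nn.Conv2d(32, 32, 4, stride=2, padding=1), nn.ReLU(),  # 16 -> 8
            nn.Conv2d(32, 32, 4, stride=2, padding=1), nn.ReLU(),  # 8 -> 4
        )
        self.fc = nn.Sequential(
            nn.Linear(32 * 4 * 4, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
        )
        self.mu = nn.Linear(256, latent_dim)       # Gaussian means
        self.logvar = nn.Linear(256, latent_dim)   # log Gaussian variances

    def forward(self, x):
        h = self.fc(self.conv(x).flatten(start_dim=1))
        return self.mu(h), self.logvar(h)
```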
I used a Laplace distribution as the target distribution for the reconstruction loss. Yann's code suggests that a Bernoulli distribution is generally a better choice, but it seemed to converge more slowly in my case. (I did not run a fair comparison, so this is not conclusive.)
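As a rough sketch of how the two choices translate into loss terms (the actual implementation in disentangling-vae may scale or name things differently): a Laplace likelihood with fixed scale reduces to an L1 loss, while a Bernoulli likelihood reduces to binary cross-entropy.

```python
import torch.nn.functional as F

def reconstruction_loss(recon, target, distribution="laplace"):
    # Negative log-likelihood of the target image under the chosen output distribution
    # (up to constants). Both recon and target are assumed to lie in [0, 1].
    if distribution == "laplace":
        return F.l1_loss(recon, target, reduction="sum")
    elif distribution == "bernoulli":
        return F.binary_cross_entropy(recon, target, reduction="sum")
    raise ValueError(f"unknown distribution: {distribution}")
```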
The loss function used is β-VAEH from β-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework.
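A minimal sketch of that objective: the reconstruction term plus a β-weighted KL divergence between the approximate posterior and the standard normal prior. The default β value here is an assumption, not taken from this repository.

```python
import torch

def beta_vae_h_loss(recon_loss, mu, logvar, beta=4.0):
    # beta-VAE (Higgins et al.) objective: reconstruction + beta * KL(q(z|x) || N(0, I)),
    # using the closed-form KL between a diagonal Gaussian and the standard normal.
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon_loss + beta * kl
```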
Result
The number of latent features is set to 20 (10 Gaussian means, 10 log Gaussian variances). The VAE model is trained for 100 epochs. All data is used for training; no validation or test split is applied.
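The 10 means and 10 log variances parameterize the approximate posterior from which the 10-dimensional latent code is sampled via the standard reparameterization trick, sketched below:

```python
import torch

def reparameterize(mu, logvar):
    # Sample z ~ N(mu, sigma^2) in a differentiable way so gradients flow through mu and logvar.
    std = torch.exp(0.5 * logvar)
    eps = torch.randn_like(std)
    return mu + eps * std
```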
Face reconstruction
Prior space traversal
Based on the reconstruction results while traversing the latent space, we may speculate that the latent dimensions control the following generative properties (see the traversal sketch after this list):
- Hair shade
- Hair length
- Face orientation
- Hair color
- Face rotation
- Bangs, face color
- Hair glossiness
- Unclear
- Eye size & color
- Bangs
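A traversal like the one above can be produced by decoding points along a single latent dimension while holding the others at the prior mean. The helper below is a sketch with assumed names and ranges, not the exact script used in this repository:

```python
import torch

@torch.no_grad()
def traverse_latent(decoder, dim, steps=7, span=2.0, latent_dim=10):
    # Decode `steps` points spaced along one latent dimension, keeping the others at 0.
    z = torch.zeros(steps, latent_dim)
    z[:, dim] = torch.linspace(-span, span, steps)
    return decoder(z)  # (steps, 3, 64, 64) images for visual inspection
```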
Original faces clustering
Original anime faces are clustered based on latent features (the selected feature is either below the 1st percentile (left 5) or above the 99th percentile (right 5) among all data points, while the remaining latent features are close to each other). Visualization of the original images mostly confirms the speculation above.
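The percentile-based selection can be sketched as below; this simplified version only applies the 1%/99% cutoff and omits the secondary criterion of keeping the remaining latent features close to each other, so it is illustrative rather than the exact procedure used here:

```python
import numpy as np

def extreme_examples(latent_means, dim, k=5):
    # Indices of images whose selected latent value falls below the 1st percentile (left 5)
    # or above the 99th percentile (right 5) over the whole dataset.
    lo, hi = np.percentile(latent_means[:, dim], [1, 99])
    low = np.where(latent_means[:, dim] < lo)[0][:k]
    high = np.where(latent_means[:, dim] > hi)[0][:k]
    return low, high
```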
Latent feature diagnosis
Learned latent features are all close to a standard normal distribution and show minimal correlation.
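This kind of diagnosis can be reproduced with a few lines of NumPy over the encoded latent means; the helper name and inputs below are assumptions for illustration:

```python
import numpy as np

def diagnose_latents(latent_means):
    # Per-dimension mean/std (should be near 0/1 under a standard normal prior)
    # and the pairwise correlation matrix (off-diagonal entries should be near 0).
    stats = {"mean": latent_means.mean(axis=0), "std": latent_means.std(axis=0)}
    corr = np.corrcoef(latent_means, rowvar=False)
    return stats, corr
```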