Implementation of a Variational Autoencoder with a Gaussian likelihood and a standard Gaussian prior over the latent space; the encoder models the approximate posterior as a Gaussian as well (the true posterior is generally intractable). The loss function used to train this model is the weighted ELBO:

$$\mathcal{L}(\theta, \phi; x) = \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big] - \beta \, D_{\mathrm{KL}}\big(q_\phi(z \mid x) \,\|\, p(z)\big),$$

where $q_\phi(z \mid x)$ is the encoder's approximate posterior, $p_\theta(x \mid z)$ is the decoder's Gaussian likelihood, $p(z) = \mathcal{N}(0, I)$ is the prior, and $\beta$ is the weight on the KL term.
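For reference, a minimal sketch of this loss in PyTorch (assuming the model is implemented in PyTorch; the function name and the fixed-variance Gaussian likelihood, which reduces the reconstruction term to a summed squared error, are our assumptions):

```python
import torch
import torch.nn.functional as F

def weighted_elbo_loss(x, x_recon, mu, log_var, beta=1.0):
    """Negative weighted ELBO for a VAE with Gaussian likelihood and N(0, I) prior.

    Assumes a fixed-variance Gaussian likelihood, so the reconstruction term
    reduces (up to constants) to a summed squared error. The KL divergence
    between the diagonal-Gaussian posterior and the standard normal prior
    has a closed form.
    """
    # E_q[log p(x|z)] up to constants: squared error between input and reconstruction.
    recon = F.mse_loss(x_recon, x, reduction="sum")
    # KL( N(mu, sigma^2) || N(0, I) ), summed over batch and latent dimensions.
    kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())
    return recon + beta * kl
```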
To test our implementation, we used the famous CelebA dataset, which contains ~200k images of human faces (it can be found on Kaggle).
We used the following architecture:
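For orientation, here is a hedged sketch of a convolutional VAE of this kind; the layer counts, channel widths, latent dimension, and 64×64 input resolution are illustrative assumptions, not the repo's exact configuration:

```python
import torch
import torch.nn as nn

class ConvVAE(nn.Module):
    """Illustrative convolutional VAE for 64x64 RGB face images."""

    def __init__(self, latent_dim=128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1),    # 64 -> 32
            nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1),   # 32 -> 16
            nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1),  # 16 -> 8
            nn.ReLU(),
            nn.Flatten(),
        )
        self.fc_mu = nn.Linear(128 * 8 * 8, latent_dim)
        self.fc_log_var = nn.Linear(128 * 8 * 8, latent_dim)
        self.fc_dec = nn.Linear(latent_dim, 128 * 8 * 8)
        self.decoder = nn.Sequential(
            nn.Unflatten(1, (128, 8, 8)),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1),  # 8 -> 16
            nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1),   # 16 -> 32
            nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1),    # 32 -> 64
            nn.Sigmoid(),
        )

    def reparameterize(self, mu, log_var):
        # z = mu + sigma * eps: the affine transformation of a standard
        # normal sample that makes sampling differentiable.
        eps = torch.randn_like(mu)
        return mu + torch.exp(0.5 * log_var) * eps

    def forward(self, x):
        h = self.encoder(x)
        mu, log_var = self.fc_mu(h), self.fc_log_var(h)
        z = self.reparameterize(mu, log_var)
        return self.decoder(self.fc_dec(z)), mu, log_var
```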
First, we show some of the reconstructions obtained after training.
The simplest form of sampling is to detach the encoder entirely, take a sample $z \sim \mathcal{N}(0, I)$ from the prior, and pass it through the decoder.
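In code, this amounts to decoding a draw from the prior (a sketch against the illustrative ConvVAE above):

```python
# Decode a single draw from the N(0, I) prior, bypassing the encoder.
model = ConvVAE(latent_dim=128)
model.eval()
with torch.no_grad():
    z = torch.randn(1, 128)                   # sample from the prior p(z)
    image = model.decoder(model.fc_dec(z))    # (1, 3, 64, 64) image tensor
```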
The degree of realism is highly correlated with the region of the latent space the sample lands in. More often than not, the output is a blurry image rather than something that resembles a realistic face.
The flaws of the previous approach originate from the fact that we've detached the encoder entirely, and hence we've lost the reparameterization trick (an affine transformation in the case of Gaussian distributions). To circumvent this, we propose to first take a batch of $n$ images from the dataset and encode them into latent vectors $z_1, \dots, z_n$. Afterwards, we compute the weighted average of the obtained vectors:

$$\bar{z} = \sum_{i=1}^{n} w_i z_i,$$

where the weights $w_i \ge 0$ sum to $1$. Finally, we pass $\bar{z}$ through the decoder to obtain the generated image.
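A sketch of the whole procedure against the illustrative ConvVAE above; the uniform random weights normalized to sum to one are our assumption, as the exact weighting scheme is defined in the repo:

```python
def sample_via_weighted_average(model, batch):
    """Generate one image by decoding a convex combination of encoded latents."""
    model.eval()
    with torch.no_grad():
        # Encode a batch of real images, keeping the reparameterized samples.
        h = model.encoder(batch)
        mu, log_var = model.fc_mu(h), model.fc_log_var(h)
        z = model.reparameterize(mu, log_var)      # (n, latent_dim)
        # Random convex weights: w_i >= 0 with sum(w_i) = 1 (assumed scheme).
        w = torch.rand(z.size(0))
        w = w / w.sum()
        z_bar = (w.unsqueeze(1) * z).sum(dim=0, keepdim=True)
        # Decode the averaged latent vector into an image.
        return model.decoder(model.fc_dec(z_bar))
```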
- Refactor hyperparameter passing to go through config.py.
- A lot of unnecessary commits were created because of issues with math rendering in Markdown. Squash them into one.