https://towardsdatascience.com/variational-autoencoders-as-generative-models-with-keras-e0c79415a7eb

- An autoencoder is a neural network that takes a high-dimensional data point as input, compresses it into a lower-dimensional feature vector (i.e., the latent vector), and then reconstructs the original input sample from the latent representation alone, without losing valuable information.
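The encoder/decoder structure described above can be sketched in Keras. This is a minimal illustrative sketch, not the article's implementation; the dimensions (784-dimensional input, e.g. flattened 28x28 images, a 2-dimensional latent vector, and a 128-unit hidden layer) are assumptions chosen for clarity.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

input_dim, latent_dim = 784, 2  # illustrative sizes, e.g. flattened MNIST

# Encoder: compress the input into a low-dimensional latent vector.
encoder_input = keras.Input(shape=(input_dim,))
h = layers.Dense(128, activation="relu")(encoder_input)
latent = layers.Dense(latent_dim, name="latent_vector")(h)
encoder = keras.Model(encoder_input, latent, name="encoder")

# Decoder: reconstruct the original input from the latent vector alone.
decoder_input = keras.Input(shape=(latent_dim,))
h = layers.Dense(128, activation="relu")(decoder_input)
reconstruction = layers.Dense(input_dim, activation="sigmoid")(h)
decoder = keras.Model(decoder_input, reconstruction, name="decoder")

# Full autoencoder: trained end-to-end to minimize reconstruction error.
autoencoder = keras.Model(encoder_input, decoder(encoder(encoder_input)))
autoencoder.compile(optimizer="adam", loss="binary_crossentropy")
```

Note that nothing here constrains how the latent vectors of different samples relate to each other; that is exactly the limitation discussed next.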

- One issue with ordinary autoencoders is that they encode each input sample independently.

→ This means that samples belonging to the same class (or drawn from the same distribution) might learn very different latent embeddings, i.e., encodings that are distant in the latent space.

→ Ideally, the latent features of samples from the same class should be similar (i.e., closer in the latent space). The problem arises because we are not explicitly forcing the network to learn the distribution of the input dataset. As a result, the network may not be very good at reconstructing related unseen samples (i.e., it generalizes poorly).

- Instead of directly learning latent features from the input samples, a variational autoencoder (VAE) learns the distribution of the latent features.
- The latent features of the input data are assumed to follow a standard normal distribution.

→ This means the learned latent vectors are supposed to be zero-centered, and each can be described by two statistics: mean and variance.

→ VAEs compute the mean and variance of the latent vector for each sample (instead of directly learning the latent features) and force them to follow a standard normal distribution. Since the bottleneck of the network learns a mean and a variance for each sample, we define two separate fully connected (FC) layers to compute them.
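The two-FC-layer bottleneck can be sketched as follows. This is a hedged sketch under assumed layer sizes, not the article's exact code; in practice the second layer usually predicts the log-variance rather than the variance itself, for numerical stability, and a sample `z` is drawn via the reparameterization trick so gradients can flow through the sampling step.

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

latent_dim = 2  # illustrative assumption

inputs = keras.Input(shape=(784,))
h = layers.Dense(128, activation="relu")(inputs)

# Two separate FC layers predict the distribution parameters per sample.
z_mean = layers.Dense(latent_dim, name="z_mean")(h)
z_log_var = layers.Dense(latent_dim, name="z_log_var")(h)

def sample_z(args):
    """Reparameterization trick: z = mean + sigma * epsilon, epsilon ~ N(0, I)."""
    mean, log_var = args
    eps = tf.random.normal(shape=tf.shape(mean))
    return mean + tf.exp(0.5 * log_var) * eps

z = layers.Lambda(sample_z, name="z")([z_mean, z_log_var])
vae_encoder = keras.Model(inputs, [z_mean, z_log_var, z], name="vae_encoder")
```

The encoder therefore outputs three tensors per sample: the mean, the log-variance, and one random draw `z` from the corresponding Gaussian.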
- VAEs ensure that points that are very close to each other in the latent space represent very similar data samples (similar classes of data). We will verify this in this tutorial.

- We enforce a standard normal distribution on the latent features of the input dataset. This can be accomplished using the KL-divergence statistic.
- KL-divergence is a statistical measure of the difference between two probability distributions.
- Thus, we use the KL-divergence value as part of the objective function (along with the reconstruction loss) to ensure that the learned distribution stays close to the true distribution, which we have assumed to be a standard normal distribution.

Objective = Reconstruction Loss + KL-Loss
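For a diagonal Gaussian N(mu, sigma^2) measured against the standard normal N(0, I), the KL term has a well-known closed form: KL = -0.5 * sum(1 + log(sigma^2) - mu^2 - sigma^2) over the latent dimensions. A minimal NumPy sketch of this KL term (the function name and the batch-averaging convention are my assumptions):

```python
import numpy as np

def kl_loss(z_mean, z_log_var):
    """Closed-form KL divergence between N(mean, exp(log_var)) and N(0, I),
    summed over latent dimensions and averaged over the batch."""
    kl_per_sample = -0.5 * np.sum(
        1.0 + z_log_var - np.square(z_mean) - np.exp(z_log_var), axis=-1
    )
    return np.mean(kl_per_sample)

# When the learned distribution already matches N(0, I), the KL term vanishes:
# mean = 0 and log_var = 0 (i.e., variance = 1) give a KL of exactly 0.
print(kl_loss(np.zeros((4, 2)), np.zeros((4, 2))))  # → 0.0
```

The total objective is then simply the reconstruction loss (e.g., binary cross-entropy between input and output) plus this KL term.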

- This further means that the learned distribution is centered at zero and well spread out in the latent space.

https://taeu.github.io/paper/deeplearning-paper-vae/

https://excelsior-cjh.tistory.com/187

https://medium.com/datadriveninvestor/latent-variable-models-and-autoencoders-97c44858caa0