1. Introduction
1-1 Difficulty of Mean-field Approach*
- assumes that the variables of the model (latent variables, parameters) are mutually independent in the approximate posterior (but in general they are not!)
- simplifies the calculation, but can give poor approximations for complex models with dependent variables, and it requires analytical expectations w.r.t. the approximate posterior that are intractable in the general case
*mean-field approach?
👉 a commonly used method in variational Bayes (VB) for choosing the (fully factorized) form of the approximate posterior
1-2 Auto-Encoding Variational Bayes (AEVB)
- in order to overcome this intractability, the paper proposes a new algorithm called AEVB
- enables efficient, differentiable, and unbiased estimation of the variational lower bound via the Stochastic Gradient Variational Bayes (SGVB) estimator
- simplifies posterior inference and model learning, avoiding costly iterative schemes such as MCMC
2. Method
2-1 Problem Scenario
Assumptions
- a value $z^{(i)}$ is generated from the prior distribution $p_{\theta^*}(z)$
- a value $x^{(i)}$ is generated from the conditional distribution $p_{\theta^*}(x \mid z)$ (likelihood)
- the PDFs of the prior and the likelihood are differentiable almost everywhere w.r.t. $\theta$ and $z$
- the true parameters $\theta^*$ and the values of the latent variables $z^{(i)}$ are unknown
- no simplifying assumptions are made about the marginal or posterior probabilities
Main Contributions
- Efficient approximate ML or MAP estimation for the parameters θ
- Efficient approximate posterior inference of the latent variable z given an observed value x for a choice of parameters θ
- Efficient approximate marginal inference of the variable x
2-2 Variational Bound
- The marginal likelihood* of datapoint $i$ can be decomposed as:
$$\log p_\theta(x^{(i)}) = D_{KL}\!\left(q_\phi(z \mid x^{(i)}) \,\|\, p_\theta(z \mid x^{(i)})\right) + \mathcal{L}(\theta, \phi; x^{(i)})$$
- Since the KL divergence is always non-negative, $\mathcal{L}(\theta, \phi; x^{(i)})$ is a lower bound on the marginal likelihood.
- The lower bound on the marginal likelihood of datapoint $i$ can be re-written as:
$$\mathcal{L}(\theta, \phi; x^{(i)}) = -D_{KL}\!\left(q_\phi(z \mid x^{(i)}) \,\|\, p_\theta(z)\right) + \mathbb{E}_{q_\phi(z \mid x^{(i)})}\!\left[\log p_\theta(x^{(i)} \mid z)\right]$$
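For reference, both forms follow from writing the lower bound as the expected log joint minus the expected log of the approximate posterior:
$$\mathcal{L}(\theta, \phi; x^{(i)}) = \mathbb{E}_{q_\phi(z \mid x^{(i)})}\!\left[\log p_\theta(x^{(i)}, z) - \log q_\phi(z \mid x^{(i)})\right]$$
Splitting $\log p_\theta(x^{(i)}, z)$ as $\log p_\theta(z \mid x^{(i)}) + \log p_\theta(x^{(i)})$ gives the KL decomposition of the marginal likelihood above, while splitting it as $\log p_\theta(z) + \log p_\theta(x^{(i)} \mid z)$ gives the regularization-plus-reconstruction form.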
The usual (naive) Monte Carlo gradient estimator of this lower bound w.r.t. $\phi$ exhibits very high variance and is therefore impractical!
==> Importance of the SGVB estimator
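Concretely, the naive estimator referred to here is the score-function form
$$\nabla_\phi \mathbb{E}_{q_\phi(z)}\!\left[f(z)\right] = \mathbb{E}_{q_\phi(z)}\!\left[f(z)\, \nabla_\phi \log q_\phi(z)\right] \simeq \frac{1}{L} \sum_{l=1}^{L} f(z^{(l)})\, \nabla_\phi \log q_\phi(z^{(l)})$$
whose sample variance is typically very high; this is exactly what the reparametrization trick in 2-3 avoids.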
*marginal likelihood (evidence)?
👉 the probability of the observed data under the model, with the latent variables (and parameters) integrated out over their prior
👉 since directly optimizing the marginal likelihood is intractable, we optimize the variational lower bound instead
2-3 Reparametrization Trick
- Let $z$ be a continuous random variable with $z \sim q_\phi(z \mid x)$ for some conditional distribution.
- Then $z$ can be expressed as a deterministic transformation $z = g_\phi(\epsilon, x)$, where $\epsilon$ is an auxiliary random variable following a simple, known distribution $p(\epsilon)$.
- Under this change of variables,
$$\int q_\phi(z \mid x) f(z)\, dz = \int p(\epsilon) f(z)\, d\epsilon = \int p(\epsilon) f(g_\phi(\epsilon, x))\, d\epsilon$$
- $\int q_\phi(z \mid x) f(z)\, dz$ : the expectation of a function $f(z)$ under the distribution $q_\phi(z \mid x)$, i.e. the integral of $f(z)$ times the PDF of $z$ over all possible values of $z$
- $\int p(\epsilon) f(z)\, d\epsilon$ : the variable of integration in the expectation changes from $z$ to $\epsilon$, which follow the distributions $q_\phi(z \mid x)$ and $p(\epsilon)$ respectively
- Since $\epsilon$ does not depend on the parameters $\phi$, it is possible to compute the gradient of the expectation w.r.t. $\phi$
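Below is a minimal PyTorch sketch of this idea for a Gaussian $q_\phi(z \mid x)$ (the variable names and the toy function $f$ are illustrative assumptions, not from the paper): because $\epsilon$ is sampled independently of $\phi$, gradients flow through $g_\phi(\epsilon, x) = \mu + \sigma \odot \epsilon$ back to the variational parameters.

```python
import torch

# Toy variational parameters (in a VAE these would be produced by an encoder network).
mu = torch.zeros(4, requires_grad=True)       # mean of q_phi(z|x)
log_var = torch.zeros(4, requires_grad=True)  # log variance of q_phi(z|x)

def f(z):
    # Any differentiable function of z; stands in for e.g. log p_theta(x|z).
    return (z ** 2).sum()

L = 10
eps = torch.randn(L, 4)                   # eps^(l) ~ p(eps) = N(0, I), independent of phi
z = mu + torch.exp(0.5 * log_var) * eps   # z^(l) = g_phi(eps^(l), x) = mu + sigma * eps^(l)
estimate = f(z) / L                       # (1/L) * sum_l f(z^(l))

estimate.backward()                       # gradients w.r.t. mu and log_var are well defined
print(mu.grad, log_var.grad)
```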
2-4 SGVB estimator and AEVB algorithm
- After applying the reparametrization trick of section 2-3, Monte Carlo estimates of the expectation of some function $f(z)$ w.r.t. $q_\phi(z \mid x)$ can be formed as:
$$\mathbb{E}_{q_\phi(z \mid x^{(i)})}\!\left[f(z)\right] = \mathbb{E}_{p(\epsilon)}\!\left[f(g_\phi(\epsilon, x^{(i)}))\right] \simeq \frac{1}{L} \sum_{l=1}^{L} f\!\left(g_\phi(\epsilon^{(l)}, x^{(i)})\right)$$
- Applying this technique to the lower bound of section 2-2 yields the generic Stochastic Gradient Variational Bayes (SGVB) estimator:
$$\widetilde{\mathcal{L}}^{A}(\theta, \phi; x^{(i)}) = \frac{1}{L} \sum_{l=1}^{L} \left[\log p_\theta(x^{(i)}, z^{(i,l)}) - \log q_\phi(z^{(i,l)} \mid x^{(i)})\right]$$
where $z^{(i,l)} = g_\phi(\epsilon^{(i,l)}, x^{(i)})$ and $\epsilon^{(i,l)} \sim p(\epsilon)$
Below is the AEVB algorithm, which utilizes the above estimator on random minibatches of data.
![](https://velog.velcdn.com/images/flatfish_selfish/post/3dff46a5-f273-442b-bb47-0a7dbbb2fff8/image.png)
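A rough sketch of this estimator as a single function (the helper callables `encode`, `sample_eps`, `g_phi`, `log_p_joint`, and `log_q` are hypothetical placeholders, not the paper's code); the AEVB algorithm then maximizes the average of this quantity over random minibatches with a stochastic gradient optimizer such as SGD or Adagrad.

```python
def sgvb_estimate_A(x, encode, sample_eps, g_phi, log_p_joint, log_q, L=1):
    """Generic SGVB estimator L~^A(theta, phi; x) for one datapoint x.

    Hypothetical placeholders used in this sketch:
      encode(x)          -> variational parameters of q_phi(z|x) (e.g. mu, log_var)
      sample_eps(L)      -> L noise samples eps^(l) ~ p(eps)
      g_phi(eps, params) -> reparametrized samples z^(l) = g_phi(eps^(l), x)
      log_p_joint(x, z)  -> log p_theta(x, z^(l)) per sample
      log_q(z, params)   -> log q_phi(z^(l) | x) per sample
    """
    params = encode(x)
    eps = sample_eps(L)
    z = g_phi(eps, params)
    # (1/L) * sum_l [ log p_theta(x, z^(l)) - log q_phi(z^(l) | x) ]
    return (log_p_joint(x, z) - log_q(z, params)).mean()
```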
3. Example : VAE
3-1 Variational approximate posterior
$$\log q_\phi(z \mid x^{(i)}) = \log \mathcal{N}\!\left(z;\, \mu^{(i)}, \sigma^{2(i)} I\right)$$
- the mean $\mu^{(i)}$ and variance $\sigma^{2(i)}$ of the approximate posterior are outputs of the encoder
- $\phi$ : the variational parameters (the weights and biases of the encoder)
- the main goal is to learn a good approximate posterior $q_\phi(z \mid x^{(i)})$ over the latent variables
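A minimal PyTorch sketch of such an encoder (the layer sizes and tanh nonlinearity are illustrative choices in the spirit of the paper's MLP, not an exact reproduction):

```python
import torch.nn as nn

class Encoder(nn.Module):
    """Recognition model: maps x to the parameters (mu, log sigma^2) of q_phi(z|x)."""
    def __init__(self, x_dim=784, h_dim=400, z_dim=20):
        super().__init__()
        self.hidden = nn.Sequential(nn.Linear(x_dim, h_dim), nn.Tanh())
        self.mu = nn.Linear(h_dim, z_dim)        # mean of the approximate posterior
        self.log_var = nn.Linear(h_dim, z_dim)   # log variance of the approximate posterior

    def forward(self, x):
        h = self.hidden(x)
        return self.mu(h), self.log_var(h)
```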
3-2 Estimator for the VAE and datapoint $x^{(i)}$
$$\mathcal{L}(\theta, \phi; x^{(i)}) \simeq \frac{1}{2} \sum_{j=1}^{J} \left(1 + \log\!\big((\sigma_j^{(i)})^2\big) - (\mu_j^{(i)})^2 - (\sigma_j^{(i)})^2\right) + \frac{1}{L} \sum_{l=1}^{L} \log p_\theta(x^{(i)} \mid z^{(i,l)})$$
where $z^{(i,l)} = \mu^{(i)} + \sigma^{(i)} \odot \epsilon^{(l)}$ and $\epsilon^{(l)} \sim \mathcal{N}(0, I)$
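Putting the two terms together, here is a minimal PyTorch sketch of this single-datapoint estimate, assuming binary data and a hypothetical `decode` network that outputs Bernoulli parameters for $p_\theta(x \mid z)$:

```python
import torch
import torch.nn.functional as F

def vae_lower_bound(x, mu, log_var, decode, L=1):
    """Estimate of L(theta, phi; x) for one datapoint with Gaussian q_phi(z|x).

    mu, log_var : encoder outputs (see 3-1)
    decode      : hypothetical decoder mapping z to Bernoulli parameters of p_theta(x|z)
    """
    # Analytic KL term: (1/2) * sum_j (1 + log sigma_j^2 - mu_j^2 - sigma_j^2)
    neg_kl = 0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())

    # Monte Carlo reconstruction term via the reparametrization trick
    recon = 0.0
    for _ in range(L):
        eps = torch.randn_like(mu)               # eps^(l) ~ N(0, I)
        z = mu + torch.exp(0.5 * log_var) * eps  # z^(l) = mu + sigma * eps^(l)
        # log p_theta(x|z) for Bernoulli outputs = negative binary cross-entropy
        recon = recon - F.binary_cross_entropy(decode(z), x, reduction="sum")
    return neg_kl + recon / L
```

Training then maximizes the average of this bound over minibatches (equivalently, minimizes its negative) w.r.t. both the encoder and decoder parameters.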
3-3 Architecture
Auto-Encoder vs. Variational Auto-Encoder
![](https://velog.velcdn.com/images/flatfish_selfish/post/51a659d5-6fa4-493a-a62a-3658bc8708af/image.png)
Source : https://data-science-blog.com/blog/2022/04/19/variational-autoencoders/
4. Conclusion
- SGVB is a novel estimator of the variational lower bound that resolves the intractability encountered when optimizing the parameters
- Since the SGVB estimator is differentiable and can be optimized with standard stochastic gradient methods, it enables efficient approximate inference with continuous latent variables.
- For i.i.d. datasets with continuous latent variables per datapoint, the paper introduces an efficient algorithm called Auto-Encoding VB (AEVB), which learns an approximate inference model using the SGVB estimator.