We present high quality image synthesis results using diffusion probabilistic models,
a class of latent variable models inspired by considerations from nonequilibrium
thermodynamics.
The core of this paper is to show, both theoretically and empirically, that high-quality image generation is possible with diffusion probabilistic models, which originate from the concept of 'diffusion' in thermodynamics.
Introduction
Deep Generative Models
GANs, autoregressive models, flows, VAEs.
Diffusion Probabilistic Models
Parameterized Markov chain trained using variational inference
Transitions are learned to reverse a diffusion process
In other words, we want to learn the reverse of the diffusion process.
Diffusion Process?
Markov chain that gradually adds noise to the data in the opposite direction of sampling.
Small Gaussian noise in each diffusion step => the sampling (reverse) transitions can also be conditional Gaussians.
Contribution
First demonstration of generating high quality samples using diffusion models.
A certain parameterization of diffusion models reveals an equivalence with denoising score matching over multiple noise levels during training, and with annealed Langevin dynamics during sampling.
Background
Diffusion models
Forward process (diffusion process)
Approximate posterior $q(x_{1:T} \mid x_0)$
fixed Markov chain that gradually adds Gaussian noise
Fixed (for this paper) variance schedule $\beta_1, \dots, \beta_T$
Reverse process
Joint distribution $p_\theta(x_{0:T})$
Markov chain with learned Gaussian transitions starting at $p(x_T) = \mathcal{N}(x_T; 0, I)$
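Writing the two transition kernels out explicitly (as defined in the paper):

$$
q(x_{1:T} \mid x_0) := \prod_{t=1}^{T} q(x_t \mid x_{t-1}), \qquad
q(x_t \mid x_{t-1}) := \mathcal{N}\!\left(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t I\right)
$$

$$
p_\theta(x_{0:T}) := p(x_T)\prod_{t=1}^{T} p_\theta(x_{t-1} \mid x_t), \qquad
p_\theta(x_{t-1} \mid x_t) := \mathcal{N}\!\left(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\right)
$$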
Optimization goal
usual variational bound on negative log likelihood
Related to VAEs (Next topic)
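Written out, this bound on the negative log likelihood is:

$$
\mathbb{E}\!\left[-\log p_\theta(x_0)\right]
\le \mathbb{E}_q\!\left[-\log \frac{p_\theta(x_{0:T})}{q(x_{1:T} \mid x_0)}\right]
= \mathbb{E}_q\!\left[-\log p(x_T) - \sum_{t \ge 1} \log \frac{p_\theta(x_{t-1} \mid x_t)}{q(x_t \mid x_{t-1})}\right] =: L
$$

Exactly as with VAEs, training minimizes this bound rather than the intractable likelihood itself.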
Efficient training
Forward process variances $\beta_t$
Can be learned by reparameterization, but can also be held constant as hyperparameters
If the $\beta_t$ are small, the reverse and forward processes have the same functional form (both are conditional Gaussians).
Forward process's property
$\alpha_t := 1 - \beta_t$, $\bar{\alpha}_t := \prod_{s=1}^{t} \alpha_s$
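With this notation the forward process admits a closed-form marginal at any timestep $t$, which is what makes efficient (single-$t$) training possible:

$$
q(x_t \mid x_0) = \mathcal{N}\!\left(x_t;\ \sqrt{\bar{\alpha}_t}\,x_0,\ (1-\bar{\alpha}_t) I\right),
\qquad\text{i.e.}\qquad
x_t = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon,\quad \epsilon \sim \mathcal{N}(0, I)
$$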
Rewriting $L$
using KL divergences to directly compare $p_\theta(x_{t-1} \mid x_t)$ against forward process posteriors,
which are tractable when conditioned on $x_0$
Can be calculated in a Rao-Blackwellized fashion with closed-form expressions
(since all KL divergences are comparisons between Gaussians)
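The rewritten bound and the tractable forward process posterior it compares against are:

$$
L = \mathbb{E}_q\!\left[\underbrace{D_{\mathrm{KL}}\!\big(q(x_T \mid x_0)\,\|\,p(x_T)\big)}_{L_T}
+ \sum_{t > 1} \underbrace{D_{\mathrm{KL}}\!\big(q(x_{t-1} \mid x_t, x_0)\,\|\,p_\theta(x_{t-1} \mid x_t)\big)}_{L_{t-1}}
\ \underbrace{-\ \log p_\theta(x_0 \mid x_1)}_{L_0}\right]
$$

$$
q(x_{t-1} \mid x_t, x_0) = \mathcal{N}\!\left(x_{t-1};\ \tilde{\mu}_t(x_t, x_0),\ \tilde{\beta}_t I\right),\quad
\tilde{\mu}_t(x_t, x_0) := \frac{\sqrt{\bar{\alpha}_{t-1}}\,\beta_t}{1-\bar{\alpha}_t}\,x_0
+ \frac{\sqrt{\alpha_t}\,(1-\bar{\alpha}_{t-1})}{1-\bar{\alpha}_t}\,x_t,\quad
\tilde{\beta}_t := \frac{1-\bar{\alpha}_{t-1}}{1-\bar{\alpha}_t}\,\beta_t
$$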
Diffusion models and denoising autoencoders
Many degrees of freedom in the design:
$\beta_t$ of the forward process
model architecture
Gaussian distribution parameterization of reverse process
3.1 Forward process and $L_T$
$\beta_t$ fixed
$q$ (approximate posterior) has no learnable parameters
$L_T$ is constant
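Concretely, with $q$ fixed, $L_T$ has no dependence on $\theta$:

$$
L_T = D_{\mathrm{KL}}\!\big(q(x_T \mid x_0)\,\|\,p(x_T)\big)
= D_{\mathrm{KL}}\!\big(\mathcal{N}(\sqrt{\bar{\alpha}_T}\,x_0,\ (1-\bar{\alpha}_T) I)\,\|\,\mathcal{N}(0, I)\big)
$$

so it can be dropped from the training objective.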
3.2 Reverse process and $L_{1:T-1}$
The choices in $p_\theta(x_{t-1} \mid x_t) = \mathcal{N}(x_{t-1}; \mu_\theta(x_t, t), \Sigma_\theta(x_t, t))$:
1. $\Sigma_\theta(x_t, t) = \sigma_t^2 I$
Set to untrained time-dependent constants.
$\sigma_t^2 = \beta_t$ vs. $\sigma_t^2 = \tilde{\beta}_t = \frac{1 - \bar{\alpha}_{t-1}}{1 - \bar{\alpha}_t}\beta_t$
Both give empirically similar results.
The first is optimal for $x_0 \sim \mathcal{N}(0, I)$;
the second is optimal for $x_0$ deterministically set to a single point.
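A minimal sketch comparing the two variance choices numerically, assuming the paper's linear schedule ($\beta_1 = 10^{-4}$ to $\beta_T = 0.02$, $T = 1000$):

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)                 # linear beta schedule from the paper
alphas = 1.0 - betas                                  # alpha_t := 1 - beta_t
alphas_bar = torch.cumprod(alphas, dim=0)             # alpha_bar_t := prod_{s<=t} alpha_s
alphas_bar_prev = torch.cat([torch.ones(1), alphas_bar[:-1]])  # alpha_bar_{t-1}, with alpha_bar_0 := 1

sigma2_a = betas                                                 # choice 1: sigma_t^2 = beta_t
sigma2_b = (1 - alphas_bar_prev) / (1 - alphas_bar) * betas      # choice 2: sigma_t^2 = beta_tilde_t

# beta_tilde_1 = 0, and beta_tilde_t -> beta_t as alpha_bar_t -> 0 (large t),
# so the two choices differ mainly at the first few steps.
print(sigma2_a[:3], sigma2_b[:3], sigma2_a[-1], sigma2_b[-1])
```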
2. $\mu_\theta(x_t, t)$
A specific parameterization motivated by the analysis of $L_t$.
The straightforward option is to have $\mu_\theta$ predict the forward process posterior mean $\tilde{\mu}_t$, but rewriting it with the equations above,
Equation (10) reveals that $\mu_\theta$ must predict $\frac{1}{\sqrt{\alpha_t}}\left(x_t - \frac{\beta_t}{\sqrt{1 - \bar{\alpha}_t}}\,\epsilon\right)$, meaning that we can instead train the model to predict the noise $\epsilon$.
The sampling process then resembles Langevin dynamics;
the authors point out a similarity to the mathematical modeling of the motion of molecular systems.
With this parameterization, Eq. (10) simplifies to
$$
\mathbb{E}_{x_0, \epsilon}\!\left[\frac{\beta_t^2}{2\sigma_t^2\,\alpha_t\,(1-\bar{\alpha}_t)}\,\big\|\epsilon - \epsilon_\theta\!\big(\sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon,\ t\big)\big\|^2\right],
$$
=> resembles denoising score matching over multiple noise scales indexed by $t$
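A minimal end-to-end sketch of what training (predict $\epsilon$) and ancestral sampling (with $\sigma_t^2 = \beta_t$) look like in PyTorch. The tiny MLP `eps_model`, its crude time conditioning, and the flat `data_dim` are stand-ins of mine; the paper uses a time-conditioned U-Net on images:

```python
import torch
import torch.nn as nn

T = 1000
betas = torch.linspace(1e-4, 0.02, T)       # linear beta schedule (paper, Section 4)
alphas = 1.0 - betas
alphas_bar = torch.cumprod(alphas, dim=0)   # alpha_bar_t

# Stand-in epsilon-predictor on flattened data; the paper uses a U-Net.
data_dim = 32 * 32 * 3
eps_model = nn.Sequential(nn.Linear(data_dim + 1, 256), nn.SiLU(), nn.Linear(256, data_dim))
optimizer = torch.optim.Adam(eps_model.parameters(), lr=2e-4)

def predict_eps(x, t):
    """Crude time conditioning for the sketch: append t/T as an extra input feature."""
    t_in = (t.float() / T).reshape(-1, 1)
    return eps_model(torch.cat([x, t_in], dim=-1))

def training_step(x0):
    """One step of the simplified objective: sample t and eps, regress eps from x_t."""
    t = torch.randint(0, T, (x0.shape[0],))                # uniform timestep (0-indexed)
    eps = torch.randn_like(x0)
    a_bar = alphas_bar[t].unsqueeze(-1)
    x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * eps     # closed-form forward sample
    loss = ((eps - predict_eps(x_t, t)) ** 2).mean()       # unweighted ||eps - eps_theta||^2
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()

@torch.no_grad()
def sample(n):
    """Ancestral sampling: start at x_T ~ N(0, I), apply the learned reverse transitions."""
    x = torch.randn(n, data_dim)
    for t in reversed(range(T)):
        t_batch = torch.full((n,), t)
        eps_pred = predict_eps(x, t_batch)
        # epsilon-parameterized posterior mean (the expression mu_theta must predict above)
        mean = (x - betas[t] / (1 - alphas_bar[t]).sqrt() * eps_pred) / alphas[t].sqrt()
        # each step is a mean shift guided by eps_theta plus fresh Gaussian noise,
        # which is why it resembles a Langevin dynamics update
        x = mean + betas[t].sqrt() * torch.randn_like(x) if t > 0 else mean
    return x

# Toy usage with random stand-in data:
loss = training_step(torch.randn(16, data_dim))
imgs = sample(4)
```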