CS236 Lecture 16

JInwoo · February 17, 2025


Diffusion Models

Iterative Noising Process

Noise is added gradually, step by step, from $\mathbf{x}_0$ to $\mathbf{x}_T$. $\mathbf{x}_0$ has the unperturbed density, equal to $p_{data}(\mathbf{x}_0)$, and $\mathbf{x}_T$ has a pure-noise density, equal to $\pi(\mathbf{x}_T)$. In the figure below, $q$ can be thought of as the kernel that adds noise.

Each time step $t$ depends only on the previous step (the process is a Markov chain), with the following transition:

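$q(\mathbf{x}_t\mid\mathbf{x}_{t-1})=\mathcal{N}(\mathbf{x}_t;\sqrt{1-\beta_t}\,\mathbf{x}_{t-1},\beta_tI)$, where $\beta_t\in(0,1)$ is the variance of the noise added at step $t$ (the noise schedule); the factor $\sqrt{1-\beta_t}$ rescales $\mathbf{x}_{t-1}$ so that the overall variance stays bounded.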

Given this transition, the joint distribution is defined as $q(\mathbf{x}_{1:T}\mid\mathbf{x}_0)=\prod_{t=1}^{T}q(\mathbf{x}_t\mid\mathbf{x}_{t-1})$.
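As a minimal sketch of this forward chain, the NumPy code below applies the one-step kernel $q(\mathbf{x}_t\mid\mathbf{x}_{t-1})$ repeatedly, using the reparameterization $\mathbf{x}_t=\sqrt{1-\beta_t}\,\mathbf{x}_{t-1}+\sqrt{\beta_t}\,\boldsymbol{\epsilon}$ with $\boldsymbol{\epsilon}\sim\mathcal{N}(0,I)$. The linear schedule from $10^{-4}$ to $0.02$ over $T=1000$ steps and the helper names `linear_beta_schedule` and `forward_chain` are assumptions for illustration, not something the lecture specifies.

```python
import numpy as np


def linear_beta_schedule(T, beta_start=1e-4, beta_end=0.02):
    """Hypothetical linear noise schedule beta_1..beta_T (an assumption, not from the lecture)."""
    return np.linspace(beta_start, beta_end, T)


def forward_chain(x0, betas, rng):
    """Run the Markov chain x_0 -> x_1 -> ... -> x_T one step at a time.

    Each step samples x_t ~ N(sqrt(1 - beta_t) * x_{t-1}, beta_t * I).
    Returns an array of shape (T + 1, *x0.shape) holding x_0, ..., x_T.
    """
    xs = [x0]
    x = x0
    for beta in betas:
        eps = rng.standard_normal(x.shape)              # fresh Gaussian noise each step
        x = np.sqrt(1.0 - beta) * x + np.sqrt(beta) * eps
        xs.append(x)
    return np.stack(xs)


rng = np.random.default_rng(0)
betas = linear_beta_schedule(1000)
x0 = np.full(2000, 3.0)                  # toy "data": 2000 samples, all equal to 3.0
traj = forward_chain(x0, betas, rng)
print(traj.shape)                        # (1001, 2000)
print(traj[-1].mean(), traj[-1].std())   # approximately 0 and 1
```

Under this schedule the data, which starts far from the origin, ends up distributed close to a standard normal at $t=T$, which is why $\pi(\mathbf{x}_T)$ can be treated as pure noise.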

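Every transition is Gaussian, and each step is a linear function of the previous sample plus independent Gaussian noise, so composing several steps again yields a Gaussian. The distribution at any step $t$ can therefore be written in closed form: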

$q(\mathbf{x}_t\mid\mathbf{x}_0)=\mathcal{N}(\mathbf{x}_t;\sqrt{\bar{\alpha}_t}\,\mathbf{x}_0,(1-\bar{\alpha}_t)I)$, where $\bar{\alpha}_t=\prod_{s=1}^{t}(1-\beta_s)$
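One way to see where this closed form comes from is to unroll two steps of the reparameterized transition and merge the two independent noise terms. The notation $\alpha_t=1-\beta_t$ and the noise symbols $\boldsymbol{\epsilon}$ are introduced here for the derivation:

```latex
\begin{aligned}
\mathbf{x}_t &= \sqrt{\alpha_t}\,\mathbf{x}_{t-1} + \sqrt{1-\alpha_t}\,\boldsymbol{\epsilon}_{t-1} \\
&= \sqrt{\alpha_t}\left(\sqrt{\alpha_{t-1}}\,\mathbf{x}_{t-2} + \sqrt{1-\alpha_{t-1}}\,\boldsymbol{\epsilon}_{t-2}\right) + \sqrt{1-\alpha_t}\,\boldsymbol{\epsilon}_{t-1} \\
&= \sqrt{\alpha_t\alpha_{t-1}}\,\mathbf{x}_{t-2}
   + \underbrace{\sqrt{\alpha_t(1-\alpha_{t-1})}\,\boldsymbol{\epsilon}_{t-2} + \sqrt{1-\alpha_t}\,\boldsymbol{\epsilon}_{t-1}}_{\text{variance } \alpha_t(1-\alpha_{t-1}) + (1-\alpha_t) \,=\, 1-\alpha_t\alpha_{t-1}} \\
&= \sqrt{\alpha_t\alpha_{t-1}}\,\mathbf{x}_{t-2} + \sqrt{1-\alpha_t\alpha_{t-1}}\,\bar{\boldsymbol{\epsilon}} \\
&\;\;\vdots \\
&= \sqrt{\bar{\alpha}_t}\,\mathbf{x}_0 + \sqrt{1-\bar{\alpha}_t}\,\boldsymbol{\epsilon},
   \qquad \boldsymbol{\epsilon}\sim\mathcal{N}(0,I),\quad \bar{\alpha}_t=\prod_{s=1}^{t}\alpha_s
\end{aligned}
```

Reading off the mean $\sqrt{\bar{\alpha}_t}\,\mathbf{x}_0$ and variance $(1-\bar{\alpha}_t)I$ of the last line gives $q(\mathbf{x}_t\mid\mathbf{x}_0)$.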

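So, to obtain the perturbed distribution at a particular step $t$, we do not need to run the whole noising process: $\mathbf{x}_t$ can be sampled directly from $\mathbf{x}_0$ in a single step.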
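A small NumPy sketch of this shortcut: `q_sample` draws $\mathbf{x}_t$ directly from $q(\mathbf{x}_t\mid\mathbf{x}_0)$, and `iterate_to` applies the one-step kernel $t$ times for comparison. Both function names and the linear $\beta$ schedule are assumptions made for illustration; the point is only that the two routes produce samples with the same mean $\sqrt{\bar{\alpha}_t}\,\mathbf{x}_0$ and variance $1-\bar{\alpha}_t$.

```python
import numpy as np


def alpha_bar(betas):
    """alpha_bar_t = prod_{s=1}^t (1 - beta_s); entry t-1 of the result holds alpha_bar_t."""
    return np.cumprod(1.0 - betas)


def q_sample(x0, t, betas, rng):
    """Sample x_t ~ q(x_t | x_0) in one shot (t is 1-indexed, 1 <= t <= T)."""
    ab = alpha_bar(betas)[t - 1]
    eps = rng.standard_normal(np.shape(x0))
    return np.sqrt(ab) * x0 + np.sqrt(1.0 - ab) * eps


def iterate_to(x0, t, betas, rng):
    """Reference route: apply the one-step kernel q(x_t | x_{t-1}) t times."""
    x = x0
    for beta in betas[:t]:
        x = np.sqrt(1.0 - beta) * x + np.sqrt(beta) * rng.standard_normal(np.shape(x))
    return x


rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)   # assumed linear schedule, T = 1000
x0 = np.full(20_000, 2.0)               # toy data: every sample equals 2.0
t = 300
fast = q_sample(x0, t, betas, rng)
slow = iterate_to(x0, t, betas, rng)
ab = alpha_bar(betas)[t - 1]
print(np.sqrt(ab) * 2.0, fast.mean(), slow.mean())  # theoretical vs empirical means
print(1.0 - ab, fast.var(), slow.var())             # theoretical vs empirical variances
```

The one-shot version costs the same for every $t$, while the iterative version needs $t$ kernel applications.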

Iterative Denoising

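The denoising process runs the noising process exactly in reverse: sample $\mathbf{x}_T$ from $\pi(\mathbf{x}_T)$, then repeatedly sample from $q(\mathbf{x}_{t-1}\mid\mathbf{x}_t)$. The problem is that $q(\mathbf{x}_{t-1}\mid\mathbf{x}_t)$ is unknown: by Bayes' rule it depends on the marginal $q(\mathbf{x}_{t-1})$, which in turn depends on the unknown data distribution. To get around this, we use a variational approximation of $q(\mathbf{x}_{t-1}\mid\mathbf{x}_t)$.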

We model $q(\mathbf{x}_{t-1}\mid\mathbf{x}_t)\approx p_\theta(\mathbf{x}_{t-1}\mid\mathbf{x}_t)$, where $p_\theta(\mathbf{x}_{t-1}\mid\mathbf{x}_t)=\mathcal{N}(\mathbf{x}_{t-1};\mu_\theta(\mathbf{x}_t,t),\sigma^2I)$ is a Gaussian with a learned mean, and we sample from $p_\theta$ repeatedly. The joint distribution is defined as $p_\theta(\mathbf{x}_{0:T})=p(\mathbf{x}_T)\prod_{t=1}^{T}p_\theta(\mathbf{x}_{t-1}\mid\mathbf{x}_t)$, with $p(\mathbf{x}_T)=\pi(\mathbf{x}_T)$.
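To illustrate ancestral sampling from $p_\theta$, the sketch below needs some $\mu_\theta$. Since no trained network is available here, it assumes a toy 1-D data distribution $p_{data}=\mathcal{N}(m,s^2)$, for which the true reverse conditionals $q(\mathbf{x}_{t-1}\mid\mathbf{x}_t)$ happen to be Gaussian and computable in closed form. The hypothetical `mu_theta` and `sigma2` simply return those exact values, standing in for a perfectly trained model. Two further simplifications: in this toy case the reverse variance depends on $t$ (rather than a single $\sigma^2$), and $\mathbf{x}_T$ is drawn from the exact marginal $q(\mathbf{x}_T)$ instead of $\pi(\mathbf{x}_T)$. The schedule, $T=200$, $m$ and $s$ are arbitrary choices.

```python
import numpy as np

# Toy setting (assumption): 1-D data p_data = N(m, s^2). Every marginal q(x_t) and
# every reverse conditional q(x_{t-1} | x_t) is then Gaussian with a closed form.
m, s = 2.0, 0.5
T = 200
betas = np.linspace(1e-4, 0.05, T)                             # assumed schedule
alpha_bar = np.concatenate([[1.0], np.cumprod(1.0 - betas)])   # alpha_bar[0] = 1


def marginal(t):
    """Mean and variance of q(x_t) under the toy data distribution."""
    return np.sqrt(alpha_bar[t]) * m, alpha_bar[t] * s**2 + 1.0 - alpha_bar[t]


def mu_theta(x_t, t):
    """Hypothetical stand-in for the learned mean: the exact mean of q(x_{t-1} | x_t),
    obtained by Gaussian conditioning on x_t = a * x_{t-1} + noise."""
    a = np.sqrt(1.0 - betas[t - 1])
    mean_prev, var_prev = marginal(t - 1)
    _, var_t = marginal(t)
    return mean_prev + a * var_prev / var_t * (x_t - a * mean_prev)


def sigma2(t):
    """Exact reverse variance of q(x_{t-1} | x_t); time-dependent in this toy case."""
    _, var_prev = marginal(t - 1)
    _, var_t = marginal(t)
    return var_prev * betas[t - 1] / var_t


def sample(n, rng):
    """Ancestral sampling: draw x_T, then x_{t-1} ~ N(mu_theta(x_t, t), sigma2(t)) for t = T..1."""
    mean_T, var_T = marginal(T)
    x = mean_T + np.sqrt(var_T) * rng.standard_normal(n)
    for t in range(T, 0, -1):
        x = mu_theta(x, t) + np.sqrt(sigma2(t)) * rng.standard_normal(n)
    return x


rng = np.random.default_rng(0)
x0 = sample(50_000, rng)
print(x0.mean(), x0.std())   # close to m = 2.0 and s = 0.5
```

With a real model, `mu_theta` would be a neural network and the variance would be the fixed $\sigma^2$, but the sampling loop would keep the same shape.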

Diffusion Model as a Hierarchical VAE

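A diffusion model can be viewed as a hierarchical VAE, with $\mathbf{x}_1,\dots,\mathbf{x}_T$ as a chain of latent variables. Treating the noising process as the encoder and the denoising process as the decoder gives the VAE structure. Unlike in a standard VAE, however, the encoder is fixed and has no learnable parameters.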

Encoder (fixed): $q(\mathbf{x}_{1:T}\mid\mathbf{x}_0)=\prod_{t=1}^{T}q(\mathbf{x}_t\mid\mathbf{x}_{t-1})$
Decoder (learnable): $p_\theta(\mathbf{x}_{0:T})=p(\mathbf{x}_T)\prod_{t=1}^{T}p_\theta(\mathbf{x}_{t-1}\mid\mathbf{x}_t)$
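To make the two factorizations concrete, the sketch below evaluates $\log q(\mathbf{x}_{1:T}\mid\mathbf{x}_0)$ (the fixed encoder) and $\log p_\theta(\mathbf{x}_{0:T})$ (the decoder) along one trajectory sampled through the encoder. It assumes 1-D toy data $\mathcal{N}(m,s^2)$, an arbitrary linear schedule, and a hypothetical decoder whose conditionals are the exact reverse Gaussians, with $p(\mathbf{x}_T)$ set to the exact marginal $q(\mathbf{x}_T)$. In that idealized case the decoder joint equals $p_{data}(\mathbf{x}_0)\,q(\mathbf{x}_{1:T}\mid\mathbf{x}_0)$, so the log-ratio of the two must equal $\log p_{data}(\mathbf{x}_0)$ for every trajectory, which makes a convenient sanity check.

```python
import numpy as np


def log_normal(x, mean, var):
    """Log density of a 1-D Gaussian N(mean, var) evaluated at x."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)


# Toy setup (assumption): 1-D data N(m, s^2) and an assumed linear schedule.
m, s, T = 2.0, 0.5, 50
betas = np.linspace(1e-3, 0.1, T)
alpha_bar = np.concatenate([[1.0], np.cumprod(1.0 - betas)])   # alpha_bar[0] = 1


def marginal(t):
    """Mean and variance of q(x_t) under the toy data distribution."""
    return np.sqrt(alpha_bar[t]) * m, alpha_bar[t] * s**2 + 1.0 - alpha_bar[t]


def encoder_logprob(xs):
    """Fixed encoder: log q(x_{1:T} | x_0) = sum_t log N(x_t; sqrt(1 - beta_t) x_{t-1}, beta_t)."""
    return sum(log_normal(xs[t], np.sqrt(1.0 - betas[t - 1]) * xs[t - 1], betas[t - 1])
               for t in range(1, T + 1))


def decoder_logprob(xs):
    """Decoder: log p(x_T) + sum_t log p_theta(x_{t-1} | x_t).
    Hypothetical 'perfect' decoder: each conditional is the exact reverse Gaussian."""
    mean_T, var_T = marginal(T)
    total = log_normal(xs[T], mean_T, var_T)
    for t in range(1, T + 1):
        a = np.sqrt(1.0 - betas[t - 1])
        mean_prev, var_prev = marginal(t - 1)
        _, var_t = marginal(t)
        mu = mean_prev + a * var_prev / var_t * (xs[t] - a * mean_prev)
        var = var_prev * betas[t - 1] / var_t
        total += log_normal(xs[t - 1], mu, var)
    return total


# Sample one trajectory: x_0 from the data, then x_1..x_T through the fixed encoder.
rng = np.random.default_rng(1)
xs = [m + s * rng.standard_normal()]
for t in range(1, T + 1):
    xs.append(np.sqrt(1.0 - betas[t - 1]) * xs[-1] + np.sqrt(betas[t - 1]) * rng.standard_normal())

log_ratio = decoder_logprob(xs) - encoder_logprob(xs)
print(log_ratio, log_normal(xs[0], m, s**2))   # equal up to floating-point rounding
```

As in a standard VAE, averaging this log-ratio over encoder samples gives the evidence lower bound on $\log p_\theta(\mathbf{x}_0)$; with an imperfect decoder the ratio would fall below $\log p_\theta(\mathbf{x}_0)$ on average instead of matching it exactly.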