Diffusion ELBO Proof

pyross · Jan 1, 2025

Study


Notes written up while preparing seminar material.

First, two quick factorizations. For the reverse-process joint and the forward process, let

$$p(x_0,x_1,\dots,x_T)=p(x_0|x_1)\,p(x_1|x_2)\cdots p(x_{T-1}|x_T)\,p(x_T)=p(x_T)\prod_{t\ge1}p(x_{t-1}|x_t)$$

$$q(x_1,x_2,\dots,x_T|x_0)=q(x_1|x_0)\,q(x_2|x_1)\cdots q(x_T|x_{T-1})=\prod_{t\ge1}q(x_t|x_{t-1})$$
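As a quick sanity check, here is a toy numerical verification of both factorizations on a small discrete Markov chain. Everything here (the chain size, kernels, and sample path) is made up purely for this check:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_kernel(n):
    """Random row-stochastic transition matrix q(x_t | x_{t-1})."""
    K = rng.random((n, n))
    return K / K.sum(axis=1, keepdims=True)

n, T = 2, 3
p0 = np.array([0.6, 0.4])                    # distribution of x_0
Ks = [random_kernel(n) for _ in range(T)]    # forward kernels

# Joint q(x_0, ..., x_T) built from the forward (Markov) factorization.
joint = np.zeros((n,) * (T + 1))
for idx in np.ndindex(*joint.shape):
    p = p0[idx[0]]
    for t in range(1, T + 1):
        p *= Ks[t - 1][idx[t - 1], idx[t]]
    joint[idx] = p

x = (0, 1, 1, 0)                             # one sample path x_0 .. x_T

# Forward check: q(x_{1:T} | x_0) = prod_t q(x_t | x_{t-1}).
assert np.isclose(joint[x] / p0[x[0]],
                  np.prod([Ks[t - 1][x[t - 1], x[t]] for t in range(1, T + 1)]))

# Reverse check: q(x_{0:T}) = q(x_T) * prod_t q(x_{t-1} | x_t).
m = [p0]                                     # marginals of x_t
for t in range(T):
    m.append(m[-1] @ Ks[t])
rev = m[T][x[T]]
for t in range(T, 0, -1):
    # Bayes: q(x_{t-1} | x_t) = q(x_{t-1}) q(x_t | x_{t-1}) / q(x_t)
    rev *= m[t - 1][x[t - 1]] * Ks[t - 1][x[t - 1], x[t]] / m[t][x[t]]
assert np.isclose(joint[x], rev)
print("both factorizations verified")
```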
The goal is to increase the likelihood indirectly, by minimizing an upper bound on the negative log-likelihood $-\log p_\theta(x_0)$.

$$-\log p_\theta(x_0)=-\log\int p_\theta(x_0,x_1,\dots,x_T)\,dx_{1:T}$$

$$=-\log\int p_\theta(x_{0:T})\,\frac{q(x_{1:T}|x_0)}{q(x_{1:T}|x_0)}\,dx_{1:T}$$

$$=-\log\mathbb{E}_{q(x_{1:T}|x_0)}\!\left[\frac{p_\theta(x_{0:T})}{q(x_{1:T}|x_0)}\right]$$
By Jensen's inequality ($-\log$ is convex, so $-\log\mathbb{E}[X]\le\mathbb{E}[-\log X]$):
$$-\log p_\theta(x_0)\le\mathbb{E}_{q(x_{1:T}|x_0)}\!\left[-\log\frac{p_\theta(x_{0:T})}{q(x_{1:T}|x_0)}\right]$$

$$=\mathbb{E}_{q(x_{1:T}|x_0)}\!\left[-\log\frac{p_\theta(x_T)\prod_{t\ge1}p_\theta(x_{t-1}|x_t)}{\prod_{t\ge1}q(x_t|x_{t-1})}\right]$$

$$=\mathbb{E}_{q(x_{1:T}|x_0)}\!\left[-\log p_\theta(x_T)-\sum_{t\ge1}\log\frac{p_\theta(x_{t-1}|x_t)}{q(x_t|x_{t-1})}\right]$$

Splitting the $t=1$ term off the sum:

$$=\mathbb{E}_{q(x_{1:T}|x_0)}\!\left[-\log p_\theta(x_T)-\sum_{t>1}\log\frac{p_\theta(x_{t-1}|x_t)}{q(x_t|x_{t-1})}-\log\frac{p_\theta(x_0|x_1)}{q(x_1|x_0)}\right]$$

Here, since the forward process is Markov, conditioning additionally on $x_0$ changes nothing, and Bayes' rule gives

$$q(x_t|x_{t-1})=q(x_t|x_{t-1},x_0)=\frac{q(x_{t-1}|x_t,x_0)\,q(x_t|x_0)}{q(x_{t-1}|x_0)}$$
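Spelled out, the second equality is just the definition of conditional probability applied twice, with everything conditioned on $x_0$:

$$q(x_t|x_{t-1},x_0)=\frac{q(x_{t-1},x_t|x_0)}{q(x_{t-1}|x_0)}=\frac{q(x_{t-1}|x_t,x_0)\,q(x_t|x_0)}{q(x_{t-1}|x_0)}$$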

Substituting this into the sum:

$$=\mathbb{E}_{q(x_{1:T}|x_0)}\!\left[-\log p_\theta(x_T)-\sum_{t>1}\log\!\left(\frac{p_\theta(x_{t-1}|x_t)}{q(x_{t-1}|x_t,x_0)}\cdot\frac{q(x_{t-1}|x_0)}{q(x_t|x_0)}\right)-\log\frac{p_\theta(x_0|x_1)}{q(x_1|x_0)}\right]$$
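The extra ratio factors telescope inside the sum:

$$\sum_{t>1}\log\frac{q(x_{t-1}|x_0)}{q(x_t|x_0)}=\log\frac{q(x_1|x_0)}{q(x_T|x_0)}$$

so after distributing the minus sign, the $-\log q(x_1|x_0)$ it produces cancels against the $q(x_1|x_0)$ in the denominator of the last term, while the leftover $+\log q(x_T|x_0)$ combines with $-\log p_\theta(x_T)$. Hence: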

$$=\mathbb{E}_{q(x_{1:T}|x_0)}\!\left[-\log\frac{p_\theta(x_T)}{q(x_T|x_0)}-\sum_{t>1}\log\frac{p_\theta(x_{t-1}|x_t)}{q(x_{t-1}|x_t,x_0)}-\log p_\theta(x_0|x_1)\right]$$

Since $p_\theta(x_T):=p(x_T)$ is a fixed prior with no learnable parameters:

$$=\mathbb{E}_{q(x_{1:T}|x_0)}\!\left[-\log\frac{p(x_T)}{q(x_T|x_0)}-\sum_{t>1}\log\frac{p_\theta(x_{t-1}|x_t)}{q(x_{t-1}|x_t,x_0)}-\log p_\theta(x_0|x_1)\right]$$
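Each log-ratio turns into a KL divergence because the outer expectation can be marginalized down to the coordinates that actually appear; for instance,

$$\mathbb{E}_{q(x_{1:T}|x_0)}\!\left[-\log\frac{p(x_T)}{q(x_T|x_0)}\right]=\mathbb{E}_{q(x_T|x_0)}\!\left[\log\frac{q(x_T|x_0)}{p(x_T)}\right]=D_{\mathrm{KL}}\!\left(q(x_T|x_0)\,\|\,p(x_T)\right)$$

and likewise each summand becomes a KL between $q(x_{t-1}|x_t,x_0)$ and $p_\theta(x_{t-1}|x_t)$, taken inside the remaining expectation over $x_t$.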

Finally, rewriting the log-ratios as KL divergences:
$$-\log p_\theta(x_0)\le\mathbb{E}_{q(x_{1:T}|x_0)}\!\left[D_{\mathrm{KL}}\!\left(q(x_T|x_0)\,\|\,p(x_T)\right)+\sum_{t>1}D_{\mathrm{KL}}\!\left(q(x_{t-1}|x_t,x_0)\,\|\,p_\theta(x_{t-1}|x_t)\right)-\log p_\theta(x_0|x_1)\right]$$

Minimizing the right-hand side (the negative ELBO) pushes down this upper bound on $-\log p_\theta(x_0)$, which increases the likelihood.
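For intuition about what each term looks like in practice, here is a minimal numerical sketch, assuming the standard DDPM Gaussian forward process $q(x_t|x_{t-1})=\mathcal N(\sqrt{1-\beta_t}\,x_{t-1},\beta_t)$ on scalar data. The schedule, the data point, and the stand-in model mean below are illustrative assumptions, not part of the derivation above:

```python
import numpy as np

# Sketch of the bound's three kinds of terms under a DDPM-style Gaussian
# forward process on scalar data (illustrative assumptions throughout).
T = 1000
betas = np.linspace(1e-4, 0.02, T)      # linear beta_1 .. beta_T schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)         # alpha_bar_t = prod_{s <= t} alpha_s

def kl_gauss(mu_q, var_q, mu_p, var_p):
    """KL( N(mu_q, var_q) || N(mu_p, var_p) ) for scalar Gaussians."""
    return 0.5 * (np.log(var_p / var_q) + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0)

x0 = 0.7                                # one scalar data point

# Prior-matching term D_KL( q(x_T | x_0) || p(x_T) ) with p(x_T) = N(0, 1),
# using the closed form q(x_T | x_0) = N( sqrt(alpha_bar_T) x_0, 1 - alpha_bar_T ).
L_T = kl_gauss(np.sqrt(alpha_bars[-1]) * x0, 1.0 - alpha_bars[-1], 0.0, 1.0)
print(f"prior term L_T = {L_T:.6f}")    # near zero for a long enough schedule

# Denoising terms D_KL( q(x_{t-1} | x_t, x_0) || p_theta(x_{t-1} | x_t) ), t > 1.
rng = np.random.default_rng(0)
for t in (2, T // 2, T):                # t in the 1-indexed math notation
    ab_t, ab_prev = alpha_bars[t - 1], alpha_bars[t - 2]
    beta_t, alpha_t = betas[t - 1], alphas[t - 1]
    x_t = np.sqrt(ab_t) * x0 + np.sqrt(1.0 - ab_t) * rng.standard_normal()
    # Posterior q(x_{t-1} | x_t, x_0): Gaussian with the standard mean/variance.
    mu_q = (np.sqrt(alpha_t) * (1 - ab_prev) * x_t
            + np.sqrt(ab_prev) * beta_t * x0) / (1 - ab_t)
    var_q = (1 - ab_prev) / (1 - ab_t) * beta_t
    # Hypothetical model posterior: a slightly perturbed mean as a stand-in.
    print(f"t={t}: KL term = {kl_gauss(mu_q, var_q, mu_q + 0.01, var_q):.6f}")
```

Since $D_{\mathrm{KL}}(q(x_T|x_0)\,\|\,p(x_T))$ is tiny for a long enough noise schedule and has no learnable parameters, only the denoising KL terms and the reconstruction term $-\log p_\theta(x_0|x_1)$ matter for training.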
