Cs236 Lecture5

JInwoo · December 23, 2024


Latent Variable Models

Motivation

Images of human faces vary according to many factors, such as gender, eye color, and hair color. Unless the images are annotated with these factors, it is difficult to use them explicitly. The main idea of latent variable models is to model these factors with latent variables $\mathbf{z}$. Here $\mathbf{z}$ are unobserved variables and are not included in the dataset.

The biggest advantage of latent variable models is that, if the latent variables $\mathbf{z}$ are chosen well, $p(\mathbf{x|z})$ can be simpler to model than $p(\mathbf{x})$. Because specifying $\mathbf{z}$ by hand is very hard, in practice $\mathbf{z}$ is given a simple prior and a DNN is used to model the conditional $p(\mathbf{x|z})$:

$\mathbf{z}\sim\mathcal{N}(0, I)$
$p(\mathbf{x|z})=\mathcal{N}(\mu_{\theta}(\mathbf{z}), \Sigma_{\theta}(\mathbf{z}))$, where $\mu_{\theta},\Sigma_{\theta}$ are DNNs

Once training is finished, we expect $\mathbf{z}$ to capture meaningful latent factors, which can be inferred for a given $\mathbf{x}$ through the posterior $p(\mathbf{z|x})$.
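
As a minimal sketch of this generative model, the NumPy snippet below samples $\mathbf{z}\sim\mathcal{N}(0, I)$ and then $\mathbf{x}\sim\mathcal{N}(\mu_{\theta}(\mathbf{z}), \Sigma_{\theta}(\mathbf{z}))$. Everything specific here is an assumption made for illustration: the layer sizes, the `decoder` and `sample` names, the randomly initialised weights standing in for trained parameters $\theta$, and the choice of a diagonal $\Sigma_{\theta}$.

```python
import numpy as np

rng = np.random.default_rng(0)
latent_dim, hidden_dim, data_dim = 2, 16, 5  # illustrative sizes (assumption)

# Hypothetical random weights standing in for trained parameters theta.
W1 = rng.normal(scale=0.5, size=(latent_dim, hidden_dim))
b1 = np.zeros(hidden_dim)
W_mu = rng.normal(scale=0.5, size=(hidden_dim, data_dim))
W_logvar = rng.normal(scale=0.1, size=(hidden_dim, data_dim))

def decoder(z):
    """Map z to the mean and diagonal variance of p(x | z)."""
    h = np.tanh(z @ W1 + b1)
    mu = h @ W_mu
    var = np.exp(h @ W_logvar)  # exp keeps the diagonal of Sigma positive
    return mu, var

def sample(n):
    """Ancestral sampling: z ~ N(0, I), then x ~ N(mu(z), diag(var(z)))."""
    z = rng.normal(size=(n, latent_dim))
    mu, var = decoder(z)
    x = mu + np.sqrt(var) * rng.normal(size=mu.shape)
    return x, z

x, z = sample(4)
print(x.shape, z.shape)
```

Each $p(\mathbf{x|z})$ is a plain Gaussian, but because its mean and variance change nonlinearly with $\mathbf{z}$, the marginal over many samples need not be Gaussian.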

Mixture of Gaussians

The simplest example of a latent variable model is the mixture of Gaussians.

$\mathbf{z}\sim\mathrm{Categorical}(1,\cdots,K)$
$p(\mathbf{x}|\mathbf{z}=k)=\mathcal{N}(\mu_{k}, \Sigma_{k})$

The generative process proceeds as follows.

1. Sample $\mathbf{z}$ to select one of the Gaussian distributions.
2. Generate the data point from the selected Gaussian.
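
The two steps above can be sketched in NumPy as follows. The mixture weights, means, and covariances are made-up values for illustration, and `sample_mog` is a hypothetical helper name; the post does not specify the categorical probabilities, so non-uniform weights are an assumption.

```python
import numpy as np

def sample_mog(n, weights, means, covs, rng):
    """Draw n samples from a mixture of Gaussians by ancestral sampling."""
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    covs = np.asarray(covs, dtype=float)
    # Step 1: z ~ Categorical(weights) selects a component per sample.
    z = rng.choice(len(weights), size=n, p=weights)
    # Step 2: x ~ N(mu_z, Sigma_z) is drawn from the selected Gaussian.
    x = np.empty((n, means.shape[1]))
    for k in range(len(weights)):
        idx = np.flatnonzero(z == k)
        if idx.size:
            x[idx] = rng.multivariate_normal(means[k], covs[k], size=idx.size)
    return x, z

rng = np.random.default_rng(0)
weights = [0.5, 0.3, 0.2]                        # illustrative values
means = [[-4.0, 0.0], [0.0, 4.0], [5.0, -1.0]]
covs = [np.eye(2) * 0.5, np.eye(2), np.eye(2) * 0.25]
x, z = sample_mog(1000, weights, means, covs, rng)
print(x.shape, np.bincount(z))
```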

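By introducing $\mathbf{z}$, latent variable models make modeling simpler. However, a simple $p(\mathbf{x|z})$ does not imply a simple $p(\mathbf{x})$. For a mixture of Gaussians, each $p(\mathbf{x}|\mathbf{z}=k)$ is a single Gaussian, yet the marginal $p(\mathbf{x})=\sum_{k}p(\mathbf{z}=k)\,\mathcal{N}(\mathbf{x};\mu_{k},\Sigma_{k})$ can be multimodal. In other words, a complex and flexible $p(\mathbf{x})$ can be built from simple conditionals $p(\mathbf{x|z})$ by using latent variables, which is a major advantage of latent variable models.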

Marginal Likelihood

In a latent variable model, the probability of a particular data point $\mathbf{\bar{x}}$ is obtained by marginalizing over $\mathbf{z}$:

$p(\mathbf{X}=\mathbf{\bar{x}};\theta)=\underset{\mathbf{z}}{\int}p(\mathbf{\bar{x}},\mathbf{z};\theta)\,d\mathbf{z}$

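Accordingly, maximum likelihood learning can be written as follows (written here with a sum, as for discrete $\mathbf{z}$; for continuous $\mathbf{z}$ the sum is an integral).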

$\log\underset{\mathbf{x}\in\mathcal{D}}{\prod}p(\mathbf{x};\theta)=\underset{\mathbf{x}\in\mathcal{D}}{\sum}\log\underset{\mathbf{z}}{\sum}p(\mathbf{x},\mathbf{z};\theta)$

In general, evaluating this likelihood is very expensive because every value of $\mathbf{z}$ must be considered, so $\sum_{\mathbf{z}}p(\mathbf{x},\mathbf{z};\theta)$ can be intractable. Learning by maximum likelihood therefore requires an approximation.

The most obvious approximation is naive Monte Carlo, which samples $\mathbf{z}^{(1)},\dots,\mathbf{z}^{(k)}$ from a uniform distribution and approximates the sum with a sample average.

$\underset{\mathbf{z}}{\sum}p_\theta(\mathbf{x},\mathbf{z})\approx|\mathcal{Z}|\frac{1}{k}\underset{j=1}{\overset{k}{\sum}}p_\theta(\mathbf{x},\mathbf{z}^{(j)})$

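Naive Monte Carlo gives a simple approximation, but it is not practical: for most values of $\mathbf{z}$, $p_\theta(\mathbf{x,z})$ is very small, so most uniform samples contribute almost nothing and the estimate has high variance. We therefore need a way of sampling $\mathbf{z}$ other than uniformly.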
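
To make the problem concrete, here is a small sketch on a toy model that is not from the lecture: $\mathbf{z}$ takes `K = 1000` discrete values with a uniform prior, and $p(x|z)$ is a unit-variance Gaussian centred at `mus[z]`. The toy model and the names `joint` and `naive_mc` are assumptions made for illustration. Because $\mathbf{z}$ is small and discrete, the exact sum is available for comparison.

```python
import numpy as np

K = 1000                              # |Z|: number of latent values (toy choice)
mus = np.linspace(-50.0, 50.0, K)     # mean of p(x | z) for each z

def joint(x, z):
    """p(x, z) = p(z) * N(x; mu_z, 1) with a uniform prior p(z) = 1/K."""
    lik = np.exp(-0.5 * (x - mus[z]) ** 2) / np.sqrt(2.0 * np.pi)
    return lik / K

def naive_mc(x, k, rng):
    """|Z| * (1/k) * sum_j p(x, z_j) with z_j drawn uniformly from Z."""
    z = rng.integers(0, K, size=k)
    return K * joint(x, z).mean()

x = 0.0
exact = joint(x, np.arange(K)).sum()  # exact p(x), feasible only in this toy

rng = np.random.default_rng(0)
estimates = np.array([naive_mc(x, k=10, rng=rng) for _ in range(2000)])

# Fraction of runs whose estimate is below 1% of the true value.
frac_near_zero = np.mean(estimates < 0.01 * exact)
print("exact p(x):       ", exact)
print("mean of estimates:", estimates.mean())
print("std of estimates: ", estimates.std())
print("runs near zero:   ", frac_near_zero)
```

The estimator is unbiased, so the average over many runs approaches the exact value, but a single run with few samples often misses every $\mathbf{z}$ that matters and returns a value close to zero.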

An alternative to naive Monte Carlo is importance sampling: instead of sampling $\mathbf{z}$ uniformly, we sample it from a proposal distribution $q(\mathbf{z})$.

$\underset{\mathbf{z}}{\sum}p_\theta(\mathbf{x},\mathbf{z})\approx \frac{1}{k}\underset{j=1}{\overset{k}{\sum}}\frac{p_{\theta}(\mathbf{x},\mathbf{z}^{(j)})}{q(\mathbf{z}^{(j)})}$
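
A minimal sketch of this estimator, on an illustrative toy model that is not from the lecture ($\mathbf{z}$ with `K = 1000` discrete values, a uniform prior, and a unit-variance Gaussian $p(x|z)$ centred at `mus[z]`). The proposal built by `make_proposal`, which concentrates mass on the $\mathbf{z}$ whose mean is near $x$, is likewise an assumption chosen for the example.

```python
import numpy as np

K = 1000                              # number of latent values (toy choice)
mus = np.linspace(-50.0, 50.0, K)     # mean of p(x | z) for each z

def joint(x, z):
    """p(x, z) = p(z) * N(x; mu_z, 1) with a uniform prior p(z) = 1/K."""
    lik = np.exp(-0.5 * (x - mus[z]) ** 2) / np.sqrt(2.0 * np.pi)
    return lik / K

def make_proposal(x, scale=2.0):
    """Hypothetical q(z): discrete Gaussian-shaped weights centred on x."""
    logits = -0.5 * ((mus - x) / scale) ** 2
    q = np.exp(logits - logits.max())
    return q / q.sum()

def importance_estimate(x, q, k, rng):
    """(1/k) * sum_j p(x, z_j) / q(z_j) with z_j ~ q."""
    z = rng.choice(K, size=k, p=q)
    return np.mean(joint(x, z) / q[z])

x = 0.0
exact = joint(x, np.arange(K)).sum()  # exact p(x), feasible only in this toy
q = make_proposal(x)

rng = np.random.default_rng(0)
estimates = np.array([importance_estimate(x, q, 10, rng) for _ in range(2000)])
print("exact p(x):       ", exact)
print("mean of estimates:", estimates.mean())
print("std of estimates: ", estimates.std())
```

With the same number of samples per run, the spread of the estimates is much smaller than with uniform sampling, because nearly every sampled $\mathbf{z}$ has non-negligible $p_\theta(\mathbf{x},\mathbf{z})$.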

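The problem arises when we try to train with maximum log-likelihood. The importance-sampling estimate of $p_\theta(\mathbf{x})$ is unbiased, but as soon as we take its log, it becomes a biased estimate of $\log p_\theta(\mathbf{x})$, because the expectation of a log is not the log of the expectation.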

$E_{\mathbf{z}\sim q(\mathbf{z})}[\log\frac{p_{\theta}(\mathbf{x},\mathbf{z})}{q(\mathbf{z})}]\neq\log(E_{\mathbf{z}\sim q(\mathbf{z})}[\frac{p_{\theta}(\mathbf{x},\mathbf{z})}{q(\mathbf{z})}])$

The log-likelihood $\log p_{\theta}(\mathbf{x})$ cannot be computed directly, but because log is a concave function, Jensen's inequality gives a lower bound on the log-likelihood.

$\log(E_{\mathbf{z}\sim q(\mathbf{z})}[f(\mathbf{z})])=\log(\underset{\mathbf{z}}{\sum}q(\mathbf{z})f(\mathbf{z}))\ge\underset{\mathbf{z}}{\sum}q(\mathbf{z})\log f(\mathbf{z})=E_{\mathbf{z}\sim q(\mathbf{z})}[\log f(\mathbf{z})]$

Substituting $\frac{p_{\theta}(\mathbf{x},\mathbf{z})}{q(\mathbf{z})}$ for $f(\mathbf{z})$ in the inequality above gives the following.

$\log(E_{\mathbf{z}\sim q(\mathbf{z})}[\frac{p_{\theta}(\mathbf{x},\mathbf{z})}{q(\mathbf{z})}])\ge E_{\mathbf{z}\sim q(\mathbf{z})}[\log\frac{p_{\theta}(\mathbf{x},\mathbf{z})}{q(\mathbf{z})}]$

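The right-hand side of this inequality is called the ELBO (Evidence Lower Bound). The left-hand side is exactly $\log p_{\theta}(\mathbf{x})$, so the ELBO is a lower bound on the log-likelihood.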
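
The bound can be checked numerically on a toy model where the sum over $\mathbf{z}$ is exact. The sketch below is an illustration under assumptions that are not from the lecture: $\mathbf{z}$ has `K = 1000` discrete values with a uniform prior, $p(x|z)$ is a unit-variance Gaussian centred at `mus[z]`, and the two choices of $q$ are arbitrary. Everything is computed in log space to avoid underflow.

```python
import numpy as np

def logsumexp(a):
    """Numerically stable log(sum(exp(a)))."""
    m = np.max(a)
    return m + np.log(np.sum(np.exp(a - m)))

K = 1000
mus = np.linspace(-50.0, 50.0, K)
x = 0.0

# log p(x, z) = log p(z) + log N(x; mu_z, 1), uniform prior p(z) = 1/K.
log_joint = -np.log(K) - 0.5 * (x - mus) ** 2 - 0.5 * np.log(2.0 * np.pi)
log_px = logsumexp(log_joint)             # exact log-likelihood

def elbo(log_q):
    """E_{z~q}[log p(x, z) - log q(z)], summed exactly over all z."""
    return np.sum(np.exp(log_q) * (log_joint - log_q))

# q1: uniform over z. q2: centred on x, wider than the true posterior.
log_q_uniform = np.full(K, -np.log(K))
log_q_wide = -0.5 * (mus - x) ** 2 / 4.0
log_q_wide = log_q_wide - logsumexp(log_q_wide)

elbo_uniform = elbo(log_q_uniform)
elbo_wide = elbo(log_q_wide)

# The ELBO is an expectation under q, so it has an unbiased sample estimate.
rng = np.random.default_rng(0)
z = rng.choice(K, size=5000, p=np.exp(log_q_wide))
elbo_mc = np.mean(log_joint[z] - log_q_wide[z])

print("log p(x):        ", log_px)
print("ELBO, uniform q: ", elbo_uniform)
print("ELBO, wide q:    ", elbo_wide)
print("ELBO, wide q, MC:", elbo_mc)
```

Both choices of $q$ give values below $\log p_{\theta}(\mathbf{x})$; the uniform $q$ gives a very loose bound, while the $q$ concentrated near the relevant $\mathbf{z}$ gives a much closer one.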

Tightness of the Bound

The ELBO equals the log-likelihood only when $q(\mathbf{z})=p(\mathbf{z|x};\theta)$. In other words, the closer the distribution $q$ is to the posterior $p(\mathbf{z|x};\theta)$, the tighter the ELBO is as a bound on the log-likelihood.
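
One way to see this is a short derivation from the definitions above. The symbol $D_{\mathrm{KL}}$ for the KL divergence is notation added here, not used elsewhere in this post. Since $p_{\theta}(\mathbf{x},\mathbf{z})=p_{\theta}(\mathbf{z}|\mathbf{x})\,p_{\theta}(\mathbf{x})$ and $\log p_{\theta}(\mathbf{x})$ does not depend on $\mathbf{z}$:

$E_{\mathbf{z}\sim q(\mathbf{z})}[\log\frac{p_{\theta}(\mathbf{x},\mathbf{z})}{q(\mathbf{z})}]=\log p_{\theta}(\mathbf{x})+E_{\mathbf{z}\sim q(\mathbf{z})}[\log\frac{p_{\theta}(\mathbf{z}|\mathbf{x})}{q(\mathbf{z})}]=\log p_{\theta}(\mathbf{x})-D_{\mathrm{KL}}(q(\mathbf{z})\,\|\,p_{\theta}(\mathbf{z}|\mathbf{x}))$

The KL divergence is non-negative and is zero exactly when $q(\mathbf{z})=p_{\theta}(\mathbf{z}|\mathbf{x})$, so the gap between the log-likelihood and the ELBO is the KL divergence from $q$ to the posterior.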

Reference

cs236 Lecture 5
