CS236 Lecture 8

JInwoo · January 18, 2025


A Flow of Transformations

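In "normalizing flow", normalizing refers to the fact that passing a simple density through invertible transformations yields a properly normalized density. Flow refers to the fact that these invertible transformations are composed and applied one after another.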

$\mathbf{z}_M=\mathbf{f}^M_\theta\circ\cdots\circ\mathbf{f}^1_\theta(\mathbf{z}_0)=\mathbf{f}^M_\theta(\mathbf{f}^{M-1}_\theta(\cdots(\mathbf{f}^1_\theta(\mathbf{z}_0))))\triangleq\mathbf{f}_\theta(\mathbf{z}_0)$

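The $\mathbf{f}_\theta$ at the far right denotes the composition of all the layers $\mathbf{f}^m_\theta$. Each $\mathbf{f}^m_\theta$ has its own parameters, and $\theta$ collectively denotes all of them. In a normalizing flow model the last variable in the chain is the data, $\mathbf{z}_M=\mathbf{x}$, and applying the change of variables formula gives: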

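$P_X(\mathbf{x};\theta)=P_Z(\mathbf{f}^{-1}_\theta(\mathbf{x}))\prod_{m=1}^{M}\left|\det\left(\frac{\partial(\mathbf{f}^m_\theta)^{-1}(\mathbf{z}_m)}{\partial\mathbf{z}_m}\right)\right|$

(By the chain rule, the Jacobian of the composed inverse $\mathbf{f}^{-1}_\theta$ is the product of the per-layer inverse Jacobians $J_m$, and the determinant of a product equals the product of the determinants: $\det\prod_m J_m=\prod_m\det J_m$.)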
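As a sanity check of this formula, here is a minimal NumPy sketch under a simplifying assumption: every layer is a hypothetical invertible linear map $\mathbf{f}^m(\mathbf{z})=A_m\mathbf{z}$ and the base distribution is a standard normal. The function walks backwards from $\mathbf{z}_M=\mathbf{x}$ to $\mathbf{z}_0$, undoing the last layer first, and sums the per-layer log-determinants. Because a stack of linear maps is itself linear, $\mathbf{x}\sim\mathcal{N}(0,AA^\top)$ with $A=A_M\cdots A_1$, which gives an independent closed-form value to compare against.

```python
import numpy as np

def std_normal_logpdf(z):
    # log N(z; 0, I) for a 1-D vector z
    return -0.5 * (z @ z) - 0.5 * z.size * np.log(2 * np.pi)

def flow_log_prob(mats, x):
    """log P_X(x) for the toy flow x = A_M ... A_1 z_0 with z_0 ~ N(0, I).

    mats = [A_1, ..., A_M] are hypothetical invertible layer matrices."""
    z = x                                    # start at z_M = x
    log_det_sum = 0.0
    for A in reversed(mats):                 # undo f^M first, f^1 last
        A_inv = np.linalg.inv(A)             # Jacobian of (f^m)^{-1} at z_m
        _, logabsdet = np.linalg.slogdet(A_inv)
        log_det_sum += logabsdet             # log|det| contributed by this layer
        z = A_inv @ z                        # z_{m-1} = (f^m)^{-1}(z_m)
    return std_normal_logpdf(z) + log_det_sum

# Usage: compare with the closed-form Gaussian density of x
rng = np.random.default_rng(1)
n, M = 3, 4
mats = [rng.normal(size=(n, n)) + 3 * np.eye(n) for _ in range(M)]
x = rng.normal(size=n)

A = np.linalg.multi_dot(mats[::-1])          # A_M @ ... @ A_1
cov = A @ A.T
_, logdet_cov = np.linalg.slogdet(cov)
direct = -0.5 * x @ np.linalg.solve(cov, x) - 0.5 * (n * np.log(2 * np.pi) + logdet_cov)
print(flow_log_prob(mats, x), direct)
```

With linear layers each Jacobian is constant, so this only checks the bookkeeping; for nonlinear layers the Jacobian of $(\mathbf{f}^m_\theta)^{-1}$ must be evaluated at the specific $\mathbf{z}_m$, which is why the loop carries $\mathbf{z}$ along.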

Learning and Inference

Applying maximum likelihood to this density over a dataset $\mathcal{D}$ gives:

$\max_{\theta}\log P_X(\mathcal{D};\theta)=\max_{\theta}\sum_{\mathbf{x}\in\mathcal{D}}\left[\log P_Z(\mathbf{f}^{-1}_\theta(\mathbf{x}))+\log\left|\det\left(\frac{\partial\mathbf{f}^{-1}_\theta(\mathbf{x})}{\partial\mathbf{x}}\right)\right|\right]$
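The objective can be illustrated with the simplest possible flow. This is a minimal sketch, assuming a hypothetical one-dimensional elementwise affine flow $x=e^{s}z+b$ with $z\sim\mathcal{N}(0,1)$, trained by plain gradient ascent with hand-derived gradients (a real model would use an autodiff framework and a much richer flow). Both terms of the objective appear explicitly: $\log P_Z$ evaluated at the inverse $z=(x-b)e^{-s}$, plus the log-determinant $\log|\partial z/\partial x|=-s$. Since this flow just turns $z$ into $\mathcal{N}(b,e^{2s})$, the maximum-likelihood solution should match the sample mean and the population standard deviation of the data.

```python
import numpy as np

def affine_flow_nll(s, b, x):
    """Average negative log-likelihood of data x under the toy 1-D flow
    x = exp(s) * z + b, z ~ N(0, 1)."""
    z = (x - b) * np.exp(-s)                   # inverse transform f^{-1}(x)
    log_pz = -0.5 * z**2 - 0.5 * np.log(2 * np.pi)
    log_det = -s                               # log|d f^{-1}/dx| = log exp(-s)
    return -np.mean(log_pz + log_det)

def fit_affine_flow(x, lr=0.1, steps=3000):
    s, b = 0.0, 0.0                            # start from the identity map
    for _ in range(steps):
        z = (x - b) * np.exp(-s)
        # Analytic gradients of the average log-likelihood
        grad_b = np.mean(z) * np.exp(-s)       # = mean(x - b) * exp(-2s)
        grad_s = np.mean(z**2) - 1.0
        b += lr * grad_b                       # gradient *ascent* on log-likelihood
        s += lr * grad_s
    return s, b

# Usage on synthetic data
rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=0.5, size=1000)
s, b = fit_affine_flow(data)
print(b, np.exp(s), data.mean(), data.std())
print(affine_flow_nll(0.0, 0.0, data), affine_flow_nll(s, b, data))
```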

Exact likelihood evaluation is done through the inverse transformation $\mathbf{x}\mapsto\mathbf{z}$, while sampling uses the forward transformation $\mathbf{z}\mapsto\mathbf{x}$.

$\mathbf{x}=\mathbf{f}_\theta(\mathbf{z})$
$\mathbf{z}=\mathbf{f}^{-1}_\theta(\mathbf{x})$
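The two directions can be sketched with a toy two-dimensional flow. The map $\mathbf{f}(\mathbf{z})=\sinh(A\mathbf{z}+\mathbf{c})$ below, with made-up parameters $A$ and $\mathbf{c}$, is a hypothetical example chosen only because both directions have closed forms: sampling draws $\mathbf{z}$ from the base distribution and pushes it forward, while inference applies $\mathbf{f}^{-1}(\mathbf{x})=A^{-1}(\operatorname{arcsinh}(\mathbf{x})-\mathbf{c})$ and recovers the same latent up to floating-point error.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical parameters for a toy 2-D flow f(z) = sinh(A @ z + c)
A = np.array([[1.5, 0.3],
              [0.0, 0.8]])                     # must be invertible
c = np.array([0.5, -1.0])

def f(z):
    # Forward map z -> x (used for sampling); z has shape (N, 2)
    return np.sinh(z @ A.T + c)

def f_inv(x):
    # Inverse map x -> z (used for likelihood evaluation); x has shape (N, 2)
    return np.linalg.solve(A, (np.arcsinh(x) - c).T).T

z = rng.standard_normal((5, 2))                # z ~ N(0, I)
x = f(z)                                       # sampling: z -> x
z_rec = f_inv(x)                               # inference: x -> z
print(np.max(np.abs(z - z_rec)))
```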

For learning and inference to be efficient, the following three operations must be cheap:

Likelihood evaluation: efficient $\mathbf{x\mapsto z}$
Sampling: efficient $\mathbf{z\mapsto x}$
Determinant of Jacobian: efficient evaluation of $\det(\partial\mathbf{f}^{-1}_\theta(\mathbf{x})/\partial\mathbf{x})$

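For an n-dimensional variable, the Jacobian is an $n\times n$ matrix, and computing the determinant of a general $n\times n$ matrix takes $O(n^3)$ operations, which is expensive for high-dimensional data. If the transformation is designed so that its Jacobian is triangular, this cost drops to $O(n)$, because the determinant of a triangular matrix is simply the product of its diagonal entries. Many normalizing flow models use this trick, choosing transformations whose Jacobian is triangular.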
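The triangular-Jacobian trick can be checked numerically. This is a minimal sketch, assuming a hypothetical autoregressive affine map $x_i=z_i e^{\alpha_i}+\mu_i$ in which $\alpha_i$ and $\mu_i$ depend only on $z_{<i}$ (here through strictly lower-triangular weight matrices; this parameterization is only an illustration, not one taken from the lecture). Because $x_i$ never depends on $z_j$ for $j>i$, the Jacobian is lower triangular with diagonal entries $e^{\alpha_i}$, so $\log|\det J|=\sum_i\alpha_i$ costs $O(n)$. The sketch compares that shortcut with a general $O(n^3)$ determinant of a finite-difference Jacobian.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
# Hypothetical strictly lower-triangular weights: output i only sees z_{<i}
W_mu = np.tril(rng.normal(size=(n, n)), k=-1)
W_alpha = np.tril(rng.normal(size=(n, n)), k=-1)

def forward(z):
    alpha = np.tanh(W_alpha @ z)               # per-dimension log-scale
    mu = W_mu @ z                              # per-dimension shift
    return z * np.exp(alpha) + mu

def log_abs_det_fast(z):
    # Triangular Jacobian with diagonal exp(alpha_i): log|det| = sum(alpha), O(n)
    return np.sum(np.tanh(W_alpha @ z))

def jacobian_fd(fn, z, h=1e-6):
    # Full n x n Jacobian by central differences (for checking only)
    J = np.zeros((z.size, z.size))
    for j in range(z.size):
        e = np.zeros(z.size)
        e[j] = h
        J[:, j] = (fn(z + e) - fn(z - e)) / (2 * h)
    return J

z = rng.normal(size=n)
J = jacobian_fd(forward, z)
sign, logdet_full = np.linalg.slogdet(J)       # general-purpose O(n^3) route
print(np.abs(np.triu(J, k=1)).max())           # entries above the diagonal
print(logdet_full, log_abs_det_fast(z))
```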

Reference

cs236 Lecture 8
