Iterative Bregman projection for entropic OT regularization problem.

woozins · February 26, 2025

This post is based on a reading of the first part of Benamou et al. (2015), Iterative Bregman Projections for Regularized Transportation Problems, SIAM Journal on Scientific Computing.

Finding the optimal coupling in an optimal transport problem is computationally expensive, so algorithms built on iterative Bregman projections, such as the Sinkhorn algorithm, are widely used. This post explains iterative Bregman projections for the entropically regularized OT problem.

Entropic OT problems

In OT we usually consider an objective with entropic regularization.

$\arg\min_{\gamma \in \Pi(\mu, \nu)} \langle\gamma, C\rangle_{\mathcal{F}} + \epsilon H(\gamma)$ ,
where $H(\gamma) = \sum_{ij}\gamma_{ij}\log{\gamma_{ij}}$ (the negative entropy of $\gamma$).

This entropic regularization term pushes $\gamma$ toward non-sparse (smooth) solutions.

Beyond that, it is also popular because it brings additional advantages when computing the solution.

First, adding the entropy term makes the loss above strongly convex, so a unique minimizer $\gamma^*$ exists.

If we drop the constraint $\gamma \in \Pi(\mu, \nu)$, the minimizer is $\gamma^* = e^{-\frac{C}{\epsilon}}$ up to a constant factor, with the exponential taken elementwise.
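
To make the unconstrained case concrete: setting the gradient $C + \epsilon(\log\gamma + 1)$ of the objective to zero gives exactly $e^{-\frac{C}{\epsilon} - 1}$, which is $e^{-\frac{C}{\epsilon}}$ times the constant $e^{-1}$. The sketch below checks this numerically. The random cost matrix, the value $\epsilon = 0.5$, and the function name `objective` are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def objective(gamma, C, eps):
    # <gamma, C> + eps * sum_ij gamma_ij log gamma_ij, with no marginal constraint
    return float(np.sum(gamma * C) + eps * np.sum(gamma * np.log(gamma)))

rng = np.random.default_rng(0)
C = rng.random((3, 3))   # arbitrary cost matrix (illustrative)
eps = 0.5                # arbitrary regularization strength (illustrative)

# Candidate minimizer obtained from the first-order condition
gamma_star = np.exp(-C / eps - 1.0)

# Gradient of the objective evaluated at the candidate; should vanish
grad = C + eps * (np.log(gamma_star) + 1.0)

# Random positive perturbations should never achieve a lower objective value
perturbed = [gamma_star * np.exp(0.1 * rng.standard_normal(C.shape)) for _ in range(100)]
is_min = all(objective(g, C, eps) >= objective(gamma_star, C, eps) for g in perturbed)
```

The constant factor is harmless for what follows: on $\Pi(\mu, \nu)$ the total mass of $\gamma$ is fixed, so rescaling the reference point only shifts the KL divergence by a constant.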

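Let us look at the optimization problem above from a slightly different angle.

"The solution of the optimization problem above is the projection, with respect to the KL divergence, of the unconstrained solution $\gamma^*$ onto the convex set $\mathcal{S} = \{\gamma | \gamma \in \Pi(\mu, \nu)\}$."

That is, the problem can be rewritten as follows.

$\arg\min_{\gamma \in \Pi(\mu, \nu)}KL(\gamma|\xi)$ where $\xi = e^{-\frac{C}{\epsilon}}$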

This holds because

$KL(\gamma|\xi) = \sum_{ij}\gamma_{ij}\log\frac{\gamma_{ij}}{e^{-\frac{C_{ij}}{\epsilon}}} =\sum_{ij}\gamma_{ij}\log\gamma_{ij} + \frac{1}{\epsilon}\sum_{ij}\gamma_{ij}C_{ij} =H(\gamma)+\frac{1}{\epsilon}\langle\gamma, C\rangle_{\mathcal{F}}$ ,

which is the original objective divided by $\epsilon$, so both problems have the same minimizer.
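
As a quick sanity check of this identity, the sketch below evaluates both sides on a random positive matrix. The cost matrix, $\epsilon = 0.5$, and the helper names `neg_entropy` and `kl` are illustrative assumptions.

```python
import numpy as np

def neg_entropy(gamma):
    # H(gamma) = sum_ij gamma_ij log gamma_ij (sign convention used in this post)
    return float(np.sum(gamma * np.log(gamma)))

def kl(gamma, xi):
    # KL(gamma | xi) = sum_ij gamma_ij log(gamma_ij / xi_ij)
    return float(np.sum(gamma * np.log(gamma / xi)))

rng = np.random.default_rng(0)
n, eps = 4, 0.5
C = rng.random((n, n))             # arbitrary cost matrix (illustrative)
gamma = rng.random((n, n)) + 0.1   # strictly positive entries
gamma /= gamma.sum()               # normalize to total mass 1
xi = np.exp(-C / eps)              # elementwise Gibbs kernel

lhs = kl(gamma, xi)
rhs = neg_entropy(gamma) + float(np.sum(gamma * C)) / eps
```

Here `lhs` and `rhs` agree up to floating-point error, for any strictly positive `gamma`.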

Now define the following projection.
$P_\mathcal{C}^{KL}(\xi) := \arg\min_{\gamma \in \mathcal{C}}KL(\gamma|\xi)$

Iterative Bregman projection is a method for computing such projections.

Iterative Bregman Projection

Setting

Iterative Bregman projection deals with the following setting.

$\min_{\gamma \in \mathcal{C}}KL(\gamma|\xi)$ where $\xi$ is a given point in $\mathbb{R}_+^{N \times N}$ , and $\mathcal{C}$ is an intersection of closed convex sets
$\mathcal{C} = \bigcap_{l = 1}^L \mathcal{C}_l$ s.t. $\mathcal{C}$ has nonempty intersection with $\mathbb{R}_+^{N \times N}$ .
In addition, to extend the indexing of the sets $\mathcal{C}_l$, we set $\mathcal{C}_{n+L} = \mathcal{C}_n$ for all $n \in \mathbb{N}$, so that the iteration can keep cycling through the sets.

Iterative Bregman Projections

(hereafter IBP)

The solution of the optimization problem can be obtained with the algorithm below.

Here each $\mathcal{C}_l$ is assumed to be an affine subspace.

Let $\gamma^{(0)} = \xi$

define $\forall n > 0, \gamma^{(n)} = P_{\mathcal{C}_n}^{KL}(\gamma^{(n-1)})$

Then, $\gamma^{(n)} \rightarrow P_{\mathcal{C}}^{KL}(\xi)$ as $n \rightarrow \infty$

Dykstra's Algorithm

If the sets $\mathcal{C}_l$ are not affine subspaces, a modified version of the scheme, Dykstra's algorithm, has to be used instead.
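
For orientation, here is a sketch of the KL version of Dykstra's iteration, following my reading of Benamou et al. (2015); treat the exact indexing as an assumption to check against the paper. The auxiliary variable is called $r^{(n)}$ here (a name chosen for this post, to avoid confusion with the marginals), and $\odot$ and the fraction denote elementwise product and division.

$\gamma^{(0)} = \xi$ , $r^{(0)} = r^{(-1)} = \dots = r^{(-L+1)} = \mathbb{1}$

$\gamma^{(n)} = P_{\mathcal{C}_n}^{KL}(\gamma^{(n-1)} \odot r^{(n-L)})$ , $r^{(n)} = r^{(n-L)} \odot \frac{\gamma^{(n-1)}}{\gamma^{(n)}}$

Compared with plain IBP, the only change is the correction term $r^{(n-L)}$, which remembers what the previous projection onto the same set changed.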

Solving entropic OT problem via iterative bregman projection

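The IBP scheme above can be applied as-is to solve the entropic OT problem.

This amounts to projecting $\xi$ onto the intersection of $\mathcal{C}_1 = \{\gamma ; \gamma 1 = p\}$ and $\mathcal{C}_2 = \{\gamma ; \gamma^T 1 = q\}$, where $p$ and $q$ are the marginal weight vectors of $\mu$ and $\nu$. Both $\mathcal{C}_1$ and $\mathcal{C}_2$ are affine subspaces, so IBP applies directly.

For reference, the two projections have closed forms: $P_{\mathcal{C}_1}^{KL}(\bar\gamma) = \operatorname{diag}(\frac{p}{\bar\gamma 1})\bar\gamma$ , $P_{\mathcal{C}_2}^{KL}(\bar\gamma) = \bar\gamma \operatorname{diag}(\frac{q}{\bar\gamma^T 1})$ , where the divisions are elementwise.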
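
Putting the pieces together, the sketch below starts from $\xi = e^{-\frac{C}{\epsilon}}$ and alternates the two closed-form projections, which is the Sinkhorn iteration written in matrix form. It is a minimal illustration rather than a production solver: the function names, the toy 1-D point clouds, $\epsilon = 0.5$, and the fixed iteration count are all assumptions made for this example, and for small $\epsilon$ a log-domain implementation would be needed to avoid underflow.

```python
import numpy as np

def project_rows(gamma, p):
    # KL projection onto {gamma : gamma 1 = p}: rescale each row to sum to p_i
    return gamma * (p / gamma.sum(axis=1))[:, None]

def project_cols(gamma, q):
    # KL projection onto {gamma : gamma^T 1 = q}: rescale each column to sum to q_j
    return gamma * (q / gamma.sum(axis=0))[None, :]

def entropic_ot_ibp(C, p, q, eps, n_iter=1000):
    # gamma^(0) = xi, then cycle through the two constraint sets
    gamma = np.exp(-C / eps)
    for _ in range(n_iter):
        gamma = project_rows(gamma, p)
        gamma = project_cols(gamma, q)
    return gamma

rng = np.random.default_rng(0)
n = 5
x, y = rng.random(n), rng.random(n)   # toy 1-D support points (illustrative)
C = (x[:, None] - y[None, :]) ** 2    # squared-distance cost
p = np.full(n, 1.0 / n)               # uniform source weights
q = rng.random(n)
q /= q.sum()                          # random target weights summing to 1

gamma = entropic_ot_ibp(C, p, q, eps=0.5)
row_sums, col_sums = gamma.sum(axis=1), gamma.sum(axis=0)
```

After the loop, `row_sums` matches `p` and `col_sums` matches `q` up to numerical tolerance, and every entry of `gamma` is strictly positive, which illustrates the non-sparsity induced by the entropic term.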
