SDS (Score Distillation Sampling) Loss

채병주 · April 13, 2024

Terminology


First proposed in DreamFusion (ICLR '23), a Google Research paper.
A diffusion-model-based score loss used in text-to-3D generation models.
The loss is based on probability density distillation, in which a 2D diffusion model acts as the prior for a parametric image generator.
It is the loss function associated with the right-hand part of the DreamFusion architecture.

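Synthesizing 3D data directly would require (1) a large labeled 3D dataset for training and (2) a 3D denoising architecture, and neither exists.
Existing diffusion models: sampling is performed in the same kind of space, with the same dimensionality, as the training data (pixel space).
Goal: create 3D models that look like good images when rendered from random angles.
- The 3D model is a DIP (differentiable image parameterization), i.e. a generator that turns the input parameters $\theta$ into an image.
The parameters $\theta$ are optimized, using the structure of the diffusion model, so that the 3D model's output $x=g(\theta)$ looks like a sample from the diffusion model.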

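Minimizing the loss with respect to a data point $x=g(\theta)$
- $\theta^*=\arg\min_{\theta}\,\mathcal{L}_{diff}(\phi,x=g(\theta))$ ($\mathcal{L}_{diff}$: the diffusion model's training loss)
However, in experiments this did not produce realistic samples.
Gradient of $\mathcal{L}_{diff}$
- The U-Net Jacobian term is expensive to compute (it requires backpropagating through the diffusion U-Net) and is poorly conditioned at small noise levels.
  → Omitting it turns out to be effective for optimizing the DIP.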
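For reference, the gradient discussed above can be written out as follows. This is a sketch that follows the notation of the SDS formula in this post, with constant factors (such as $\partial z_t/\partial x = \alpha_t I$) absorbed into $w(t)$; the term names are the ones used in the DreamFusion paper.

$\mathcal{L}_{diff}(\phi, x) = \mathbb{E}_{t, \epsilon} \left[ w(t) \left\| \hat{\epsilon}_\phi(z_t;y,t)-\epsilon \right\|_2^2 \right], \quad z_t = \alpha_t x + \sigma_t\epsilon$

$\nabla_\theta \mathcal{L}_{diff}(\phi, x=g(\theta)) = \mathbb{E}_{t, \epsilon} \left[ w(t) \underbrace{(\hat{\epsilon}_\phi(z_t;y,t)-\epsilon)}_{\text{noise residual}} \underbrace{\frac{\partial \hat{\epsilon}_\phi(z_t;y,t)}{\partial z_t}}_{\text{U-Net Jacobian}} \underbrace{\frac{\partial x}{\partial\theta}}_{\text{generator Jacobian}} \right]$

Dropping the middle factor leaves only the noise residual and the generator Jacobian.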

$\nabla_\theta \mathcal{L}_{SDS}(\phi, x=g(\theta)) \triangleq \mathbb{E}_{t, \epsilon} \left[ w(t) (\hat{\epsilon}_\phi(z_t;y,t)-\epsilon)\frac{\partial x}{\partial\theta} \right]$
- $x$ : the image rendered by the NeRF (not an input image)
- $g(\theta)$ : a differentiable generator with NeRF parameters $\theta$
- $y$ : the text embedding (for image-to-3D, the input image can be supplied instead) → can be viewed as the condition
- $z_t = \alpha_t g(\theta) + \sigma_t\epsilon$
This can be viewed as the gradient of a weighted probability density distillation loss that uses the score function learned by the diffusion model (Appendix A.4 of the paper).
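The estimator above can be illustrated with a small, self-contained toy. This is only a sketch under loud assumptions, not DreamFusion's implementation: the NeRF is replaced by a hypothetical linear generator `g(theta) = A·theta`, the pretrained denoiser is replaced by the closed-form noise prediction for a Gaussian "image" prior N(mu, s²I), and the cosine schedule, the timestep range, and the weighting $w(t)=\sigma_t^2$ are illustrative choices. All function names are made up for this example.

```python
import math
import random

def alpha_sigma(t):
    """Assumed variance-preserving schedule: alpha_t^2 + sigma_t^2 = 1."""
    return math.cos(0.5 * math.pi * t), math.sin(0.5 * math.pi * t)

def g(theta, A):
    """Toy linear generator x = A @ theta (stand-in for a NeRF renderer), so dx/dtheta = A."""
    return [sum(a * th for a, th in zip(row, theta)) for row in A]

def eps_hat(z_t, t, mu, s):
    """Stand-in for the frozen denoiser: the exact noise prediction
    E[eps | z_t] when the 'image' prior is N(mu, s^2 I)."""
    alpha, sigma = alpha_sigma(t)
    denom = alpha ** 2 * s ** 2 + sigma ** 2
    return [sigma * (z - alpha * m) / denom for z, m in zip(z_t, mu)]

def sds_grad(theta, A, mu, s, rng, n_samples=32):
    """Monte Carlo estimate of E_{t,eps}[ w(t) (eps_hat - eps) dx/dtheta ]."""
    x = g(theta, A)
    grad = [0.0] * len(theta)
    for _ in range(n_samples):
        t = rng.uniform(0.02, 0.98)               # assumed timestep range
        alpha, sigma = alpha_sigma(t)
        eps = [rng.gauss(0.0, 1.0) for _ in x]
        z_t = [alpha * xi + sigma * e for xi, e in zip(x, eps)]   # z_t = alpha_t g(theta) + sigma_t eps
        w = sigma ** 2                            # assumed weighting w(t)
        residual = [eh - e for eh, e in zip(eps_hat(z_t, t, mu, s), eps)]
        # Multiply by the generator Jacobian only (A^T @ residual);
        # the denoiser's own Jacobian is never computed.
        for j in range(len(theta)):
            grad[j] += w * sum(A[i][j] * residual[i] for i in range(len(x))) / n_samples
    return grad

def optimize_theta(steps=400, lr=0.1, seed=0):
    """Plain gradient descent on theta using the SDS gradient estimate."""
    rng = random.Random(seed)
    A = [[1.0, 0.5], [0.0, 1.0], [1.0, -1.0]]     # 2 parameters -> 3 "pixels"
    mu = g([2.0, -1.0], A)                        # prior mode, reachable at theta = [2, -1]
    theta = [0.0, 0.0]
    for _ in range(steps):
        grad = sds_grad(theta, A, mu, s=0.1, rng=rng)
        theta = [th - lr * gr for th, gr in zip(theta, grad)]
    return theta

theta = optimize_theta()
print(theta)
```

In this toy the expected update is proportional to $A^\top(A\theta-\mu)$, so $\theta$ drifts toward the parameters whose rendering sits at the mode of the prior. With a real diffusion model, `eps_hat` would be a frozen text-conditioned U-Net and the multiplication by the generator Jacobian would be handled by automatic differentiation through the renderer.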
