NeuS

김민솔 · November 17, 2024

NeuS is an implicit representation model that performs surface reconstruction with a neural network. Because NeRF renders with volume density, it is weak at surface reconstruction. NeuS combines an SDF with the rendering equation and captures 3D geometry well in scene reconstruction.

Preliminary

Limitations of surface rendering and volume rendering

1️⃣ IDR, the work preceding NeuS, fails to reconstruct surfaces properly at edges and where the depth changes abruptly.
2️⃣ NeRF uses volume rendering: it samples multiple points along the ray and alpha-composites their colors. In the figure above, the near surface (yellow) gets inconsistent colors, and only the far surface is reconstructed accurately.

SDF

An SDF (Signed Distance Field) gives, at each point, the shortest distance to the surface of a given object. In figure (c), the surface has value 0, the inside of the object is negative, and the outside is positive.

Eikonal equation

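f(x) = \begin{cases} -d(x, \partial{\Omega}) & \text{if} \ x \in \Omega \\ d(x, \partial{\Omega}) & \text{if} \ x \in \Omega^c \end{cases}

The SDF is defined as above, where $\Omega$ is the region occupied by the object and $d(x, \partial{\Omega})$ is the distance from $x$ to the boundary $\partial{\Omega}$ (the surface), so the sign convention matches figure (c). In Euclidean space, the gradient of an SDF satisfies the eikonal equation $||\nabla f_{SDF}|| = 1$ wherever it is differentiable.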
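As a quick sanity check of both properties, here is a minimal sketch that assumes a sphere as the object (an illustrative choice, not from the original). The analytic sphere SDF is negative inside and positive outside, and its gradient, estimated by finite differences with the hypothetical helper `numerical_grad`, has unit norm away from the center.

```python
import numpy as np

def sphere_sdf(x, radius=1.0):
    """SDF of a sphere centered at the origin: negative inside, 0 on the surface, positive outside."""
    return np.linalg.norm(x, axis=-1) - radius

def numerical_grad(f, x, eps=1e-5):
    """Central finite differences of a scalar field f at points x with shape (N, 3)."""
    grad = np.zeros_like(x)
    for k in range(x.shape[-1]):
        step = np.zeros(x.shape[-1])
        step[k] = eps
        grad[:, k] = (f(x + step) - f(x - step)) / (2.0 * eps)
    return grad

rng = np.random.default_rng(0)
pts = rng.uniform(-2.0, 2.0, size=(1000, 3))
pts = pts[np.linalg.norm(pts, axis=-1) > 0.1]  # the gradient is undefined at the center
grad_norm = np.linalg.norm(numerical_grad(sphere_sdf, pts), axis=-1)

print(sphere_sdf(np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]])))
print(np.abs(grad_norm - 1.0).max())  # eikonal residual, close to 0
```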

Methods

Rendering Procedure

Scene representation

\mathcal{S} = \{\mathbf{x} \in \mathbb{R}^{3}|f(\mathbf{x}) = 0\}

$\mathcal{S}$ : surface
$\mathbf{x}$ : spatial position (input)
The SDF $f(\mathbf{x})$ takes the value 0 on the surface, i.e., the surface is the zero-level set of $f$.

\phi_{s}(x) = se^{-sx}/(1+e^{-sx})^{2}

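$\phi_{s}(x)$ : logistic density distribution
The S-density (a PDF) is defined as the logistic density distribution. The paper describes it as the familiar unimodal, bell-shaped density centered at 0. Evaluated on the SDF as $\phi_s(f(\mathbf{x}))$, it takes higher values closer to the surface.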

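The S-density is the derivative of the sigmoid function $\Phi_s(x) = (1+e^{-sx})^{-1}$, i.e., $\phi_s(x) = \Phi_s'(x)$.
The spread of $\phi_s(x)$ is controlled by $1/s$, which is a trainable parameter. The paper calls $1/s$ the standard deviation; strictly, the standard deviation of this logistic density is $\pi/(\sqrt{3}s)$, which is proportional to $1/s$. As training converges, $1/s$ approaches 0.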
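The properties above can be checked numerically. This is a minimal sketch with hypothetical helper names `sigmoid` and `logistic_density` and an illustrative $s = 64$. It evaluates $\phi_s$ through the equivalent form $s\,\Phi_s(x)(1-\Phi_s(x))$, then confirms that it integrates to 1, matches the numerical derivative of the sigmoid, and has standard deviation $\pi/(\sqrt{3}s)$.

```python
import numpy as np

def sigmoid(x, s):
    """Phi_s(x) = (1 + exp(-s x))^-1."""
    return 1.0 / (1.0 + np.exp(-s * x))

def logistic_density(x, s):
    """phi_s(x) = s e^{-sx} / (1 + e^{-sx})^2, written as s * Phi_s * (1 - Phi_s)."""
    sig = sigmoid(x, s)
    return s * sig * (1.0 - sig)

s = 64.0                                   # illustrative value of the learned parameter
x = np.linspace(-1.0, 1.0, 200001)
dx = x[1] - x[0]
phi = logistic_density(x, s)

area = np.sum(phi) * dx                    # a pdf integrates to 1
std = np.sqrt(np.sum(x**2 * phi) * dx)     # the mean is 0 by symmetry
dPhi = np.gradient(sigmoid(x, s), x)       # numerical derivative of the sigmoid

print(area, std, np.pi / (np.sqrt(3.0) * s), np.max(np.abs(dPhi - phi)))
```

Doubling $s$ halves `std`, which is why a shrinking $1/s$ concentrates the S-density on the zero-level set.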

Rendering

C(\mathbf{o}, \mathbf{v}) = \int^\infty_{0}w(t)c(\mathbf{p}(t), \mathbf{v})dt

$\mathbf{p}(t) = \mathbf{o} + t\mathbf{v}$ : ray from pixel
$\mathbf{o}$ : camera center
$\mathbf{v}$ : viewing direction
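This is the same rendering equation used in NeRF: the output color of a pixel is the integral of the colors $c(\mathbf{p}(t), \mathbf{v})$ along the ray, weighted by $w(t)$. The weight function must satisfy two requirements.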

Requirements on weight function

(1) Unbiased

For a camera ray $\mathbf{p}(t)$, the weight function $w(t)$ must attain a local maximum at the surface intersection point $t^{*}$ (where $f(\mathbf{p}(t^*))=0$).

(2) Occlusion-aware

For two depths $t_0$ and $t_1$ satisfying $f(\mathbf{p}(t_0))=f(\mathbf{p}(t_1))$, $w(t_{0}) > 0$, $w(t_{1})>0$, and $t_{0} < t_{1}$, the weight function must satisfy $w(t_{0})>w(t_{1})$.
That is, when two points have the same SDF value, the point closer to the viewpoint must contribute more to the output color.

Naive solution (before, NeRF)

w(t) = T(t)\sigma(t)

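$T(t) = \exp(-\int^t_0\sigma(u)du)$
NeRF-style rendering uses the product of the volume density and the accumulated transmittance as the weight. Naively plugging in the S-density as the volume density, $\sigma(t) = \phi_s(f(\mathbf{p}(t)))$, satisfies (2) occlusion-aware, but the weight reaches its local maximum before the ray reaches the surface point, so (1) unbiased is violated.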
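The bias can be seen numerically. This is a minimal sketch under assumed, illustrative values: a ray hitting a single plane head-on ($|\cos(\theta)|=1$) at $t^{*}=2$ with $s=10$. It compares the naive weight $T\sigma$ with $\sigma=\phi_s(f)$ against the NeuS weight $T\rho$, where $\rho=-\frac{d\Phi_s}{dt}/\Phi_s$ is taken by finite differences. The helper names are hypothetical.

```python
import numpy as np

def sigmoid(x, s):
    return 1.0 / (1.0 + np.exp(-s * x))

def logistic_density(x, s):
    sig = sigmoid(x, s)
    return s * sig * (1.0 - sig)

def transmittance(density, dt):
    """T(t_k) = exp(-integral_0^{t_k} density), approximated with a left Riemann sum."""
    accumulated = np.concatenate([[0.0], np.cumsum(density[:-1]) * dt])
    return np.exp(-accumulated)

# Single plane hit head-on at depth t_star (illustrative values).
s, cos_abs, t_star = 10.0, 1.0, 2.0
t = np.linspace(0.0, 4.0, 40001)
dt = t[1] - t[0]
sdf = -cos_abs * (t - t_star)             # f(p(t)): positive in front of the plane

# Naive: the S-density used directly as a NeRF volume density sigma(t).
sigma = logistic_density(sdf, s)
w_naive = transmittance(sigma, dt) * sigma

# NeuS: opaque density rho = -(dPhi_s/dt) / Phi_s, clipped at 0.
Phi = sigmoid(sdf, s)
rho = np.maximum(-np.gradient(Phi, t) / Phi, 0.0)
w_neus = transmittance(rho, dt) * rho

print("naive peak:", t[np.argmax(w_naive)])  # lands before t_star (biased)
print("NeuS peak: ", t[np.argmax(w_neus)])   # lands at t_star (unbiased)
```

For this setup the naive peak sits roughly $0.05$ in front of the surface; the offset shrinks as $s$ grows but never vanishes for finite $s$.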

NeuS solution (after, NeuS)

w(t) = \frac{\phi_{s}(f(\mathbf{p}(t)))}{\int^{\infty}_{0}\phi_{s}(f(\mathbf{p}(u)))du}

w(t) = T(t)\rho(t)

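$T(t) = \exp(-\int^t_0\rho(u)du)$
The normalized S-density in the first formula is unbiased but not occlusion-aware, because it depends only on the SDF value. NeuS therefore keeps the form $w(t)=T(t)\rho(t)$ and, using a first-order approximation of the SDF near the surface, designs an opaque density function $\rho(t)$ (in place of the volume density) so that both (1) and (2) are satisfied.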

Deriving the opaque density

1️⃣ For a single plane intersection, the SDF along the ray is $f(\mathbf{p}(t))=- |\cos(\theta)|\cdot (t-t^{*})$, which is 0 at the surface point $\mathbf{p}(t^{*})$. Because the surface is assumed to be a plane, $|\cos(\theta)|$ is constant.

$\theta$ : angle between the view direction and the surface normal

\begin{aligned} w(t) &= \lim_{t^{*}\rightarrow\infty} \frac{\phi_{s}(f(\mathbf{p}(t)))}{\int^{\infty}_{0}\phi_{s}(f(\mathbf{p}(u)))du} \\ &= \lim_{t^{*}\rightarrow\infty} \frac{\phi_{s}(f(\mathbf{p}(t)))}{\int^{\infty}_{0}\phi_{s}(-|\cos(\theta)|(u-t^{*}))du} \\ &= \lim_{t^{*}\rightarrow\infty} \frac{\phi_{s}(f(\mathbf{p}(t)))}{\int^{\infty}_{-t^{*}}\phi_{s}(-|\cos(\theta)|u^{*})du^{*}} \\ &= \lim_{t^{*}\rightarrow\infty} \frac{\phi_{s}(f(\mathbf{p}(t)))}{|\cos(\theta)|^{-1}\int^{\infty}_{-|\cos(\theta)|t^{*}}\phi_{s}(\hat{u})d\hat{u}} \\ &= |\cos(\theta)|{\phi_{s}(f(\mathbf{p}(t)))} \end{aligned}

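2️⃣ Substituting this SDF into the NeuS weight function defined above and changing variables gives the derivation above. As $t^{*}\rightarrow\infty$, the lower integration bound tends to $-\infty$, so the integral covers the whole real line and converges to the total area of the PDF, 1 (the symmetry $\phi_s(-x)=\phi_s(x)$ lets the sign inside $\phi_s$ be dropped). With the integral gone, we obtain $w(t)=|\cos(\theta)|{\phi_{s}(f(\mathbf{p}(t)))}$.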

3️⃣ Since $w(t) = T(t)\rho(t)$, we can set $T(t)\rho(t) = |\cos(\theta)|\phi_{s}(f(\mathbf{p}(t)))$. Differentiating $T(t) = \exp(-\int^t_0\rho(u)du)$ with the chain rule gives $T(t)\rho(t) = \rho(t)\exp(-\int^t_0\rho(u)du)=-\frac{dT}{dt}(t)$.

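4️⃣ Finally, since $\frac{d}{dt}f(\mathbf{p}(t)) = -|\cos(\theta)|$ and $\Phi_s' = \phi_s$, we have $|\cos(\theta)|\phi_{s}(f(\mathbf{p}(t))) = - \frac{d\Phi_{s}}{dt}(f(\mathbf{p}(t)))$. Therefore $\frac{dT}{dt}(t) = \frac{d\Phi_{s}}{dt}(f(\mathbf{p}(t))) \rightarrow T(t) = \Phi_{s}(f(\mathbf{p}(t)))$, where the integration constant vanishes because both sides equal 1 at $t=0$ in the limit $t^{*}\rightarrow\infty$.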
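Steps 1️⃣ to 4️⃣ can be checked numerically for one plane. This is a minimal sketch with illustrative values ($s=50$, $|\cos(\theta)|=0.5$, $t^{*}=1$, chosen so that $t^{*}$ is large compared with $1/s$, standing in for the limit $t^{*}\rightarrow\infty$). It confirms that the denominator of the normalized S-density is $1/|\cos(\theta)|$, that $T(t)=\Phi_s(f(\mathbf{p}(t)))$, and that $T\rho = |\cos(\theta)|\phi_s(f(\mathbf{p}(t)))$ peaks at $t^{*}$.

```python
import numpy as np

def sigmoid(x, s):
    return 1.0 / (1.0 + np.exp(-s * x))

def logistic_density(x, s):
    sig = sigmoid(x, s)
    return s * sig * (1.0 - sig)

s, cos_abs, t_star = 50.0, 0.5, 1.0       # illustrative; t_star >> 1/s
t = np.linspace(0.0, 3.0, 300001)
dt = t[1] - t[0]
sdf = -cos_abs * (t - t_star)              # single-plane SDF along the ray

# Step 2: integral of phi_s(f(p(u))) du tends to 1/|cos(theta)|.
denominator = np.sum(logistic_density(sdf, s)) * dt

# Opaque density: -(dPhi_s/dt)/Phi_s = |cos| * phi_s(f) / Phi_s(f) for the plane.
rho = cos_abs * logistic_density(sdf, s) / sigmoid(sdf, s)
T = np.exp(-np.concatenate([[0.0], np.cumsum(rho[:-1]) * dt]))

w = T * rho                                  # should equal |cos| * phi_s(f)
w_expected = cos_abs * logistic_density(sdf, s)
print(denominator, np.max(np.abs(T - sigmoid(sdf, s))), t[np.argmax(w)])
```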

Defining the opaque density

\begin{aligned} \int^{t}_{0}\rho(u)du &= -\ln\Phi_{s}(f(\mathbf{p}(t))) \\ \rightarrow \rho(t) &= \frac {-\frac{d\Phi_{s}}{dt}(f(\mathbf{p}(t)))}{\Phi_{s}(f(\mathbf{p}(t)))} \end{aligned}

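Writing $T$ in terms of the integral of the opaque density and differentiating both sides defines $\rho(t)$.
For multiple intersecting surfaces (the general case), $-\frac{d\Phi_{s}}{dt}(f(\mathbf{p}(t)))$ becomes negative on segments where the ray leaves an object and the SDF increases, so $\rho$ is clipped at a minimum of 0: $\rho(t) = \max\left(\frac{-\frac{d\Phi_{s}}{dt}(f(\mathbf{p}(t)))}{\Phi_{s}(f(\mathbf{p}(t)))}, 0\right)$.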

Discretization

Since points must be sampled discretely along the ray, the rendering equation becomes:

\hat{C}=\sum\limits^n_{i=1}T_{i}\alpha_{i}c_{i}

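Here $\alpha_i$ is defined as $1 - \exp\left(-\int^{t_{i+1}}_{t_{i}}\rho(t)dt\right)$, and $T_i=\prod^{i-1}_{j=1}(1-\alpha_j)$ is defined exactly as in NeRF. Using $\int^{t}_{0}\rho(u)du = -\ln\Phi_{s}(f(\mathbf{p}(t)))$ together with the clipping above, the opacity has the closed form $\alpha_i = \max\left(\frac{\Phi_s(f(\mathbf{p}(t_i))) - \Phi_s(f(\mathbf{p}(t_{i+1})))}{\Phi_s(f(\mathbf{p}(t_i)))}, 0\right)$.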
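The discrete rendering can be sketched in a few lines. This is a minimal sketch, not the official implementation: `neus_render` is a hypothetical name, the SDF values at the section endpoints and the per-section colors are assumed inputs (in practice they come from the networks), and the `1e-10` floor in the denominator is an added numerical guard.

```python
import numpy as np

def sigmoid(x, s):
    return 1.0 / (1.0 + np.exp(-s * x))

def neus_render(sdf, colors, s):
    """Discrete NeuS-style compositing along one ray.

    sdf:    (n+1,) SDF values at the section endpoints t_1..t_{n+1}
    colors: (n, 3) color of each section (assumed given)
    s:      inverse spread of the S-density
    """
    Phi = sigmoid(sdf, s)
    # alpha_i = max((Phi(f(p(t_i))) - Phi(f(p(t_{i+1})))) / Phi(f(p(t_i))), 0)
    alpha = np.maximum((Phi[:-1] - Phi[1:]) / np.maximum(Phi[:-1], 1e-10), 0.0)
    # T_i = prod_{j<i} (1 - alpha_j), same as NeRF
    T = np.concatenate([[1.0], np.cumprod(1.0 - alpha)[:-1]])
    weights = T * alpha
    return weights @ colors, weights

# Illustrative ray: a plane hit head-on at t_star, every section colored red.
s, t_star = 64.0, 2.0
t = np.linspace(0.0, 4.0, 129)
sdf = -(t - t_star)
colors = np.tile([1.0, 0.0, 0.0], (len(t) - 1, 1))
rgb, weights = neus_render(sdf, colors, s)
mid = 0.5 * (t[:-1] + t[1:])
print(mid[np.argmax(weights)], weights.sum(), rgb)
```

Because the $\alpha_i$ telescope, the weights sum to $1 - \Phi_s(f_{n+1})/\Phi_s(f_1)$, which is essentially 1 once the ray passes through the surface.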

Visualization

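Two objects lie along the ray shown above.
- The position of each object along the ray matters.
In the SDF, the surfaces have value 0, and the inside and outside have opposite signs.
The weight function attains a local maximum at each surface.
- The surface of the object closer to the camera receives a larger weight than the surface of the farther object.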
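The same behavior can be reproduced in 1D. This is a minimal sketch of a hypothetical scene, not the figure's exact setup: a ray along $+x$ from the origin passes through two spheres of radius 0.5 centered at $x=2$ and $x=4$, whose union SDF is the minimum of the two sphere SDFs. With the clipped NeuS weights ($s=8$, illustrative), each entry surface gives a local maximum, the nearer one is much larger, and the weight is exactly 0 where the ray exits the first sphere.

```python
import numpy as np

def sigmoid(x, s):
    return 1.0 / (1.0 + np.exp(-s * x))

def neus_weights(sdf, s):
    """Discrete NeuS weights T_i * alpha_i for one ray (alpha clipped at 0)."""
    Phi = sigmoid(sdf, s)
    alpha = np.maximum((Phi[:-1] - Phi[1:]) / np.maximum(Phi[:-1], 1e-10), 0.0)
    T = np.concatenate([[1.0], np.cumprod(1.0 - alpha)[:-1]])
    return T * alpha

# Hypothetical scene: spheres of radius 0.5 at x=2 and x=4, ray along +x.
t = np.linspace(0.0, 6.0, 1201)
sdf = np.minimum(np.abs(t - 2.0) - 0.5, np.abs(t - 4.0) - 0.5)  # union of the two spheres
w = neus_weights(sdf, s=8.0)
mid = 0.5 * (t[:-1] + t[1:])

def peak(lo, hi):
    """Location and height of the largest weight with lo <= midpoint < hi."""
    idx = np.where((mid >= lo) & (mid < hi))[0]
    k = idx[np.argmax(w[idx])]
    return mid[k], w[k]

near_t, near_w = peak(1.0, 2.0)   # entering the first sphere (surface at t=1.5)
far_t, far_w = peak(3.0, 4.0)     # entering the second sphere (surface at t=3.5)
print(near_t, near_w, far_t, far_w)
```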

Training

\mathcal{L} = \mathcal{L}_{color}+\alpha\mathcal{L}_{reg}+\beta\mathcal{L}_{mask}

The final loss is given above. Let us go through each term.

Color loss

\mathcal{L}_{color}=\frac{1}{m} \sum\limits_{k}\mathcal{R}(\hat{C}_k,C_k)

As in IDR, $\mathcal{R}$ is the L1 loss, which is robust to outliers and stable during training; $m$ is the batch size (the number of sampled pixels).

Reg loss

\mathcal{L}_{reg} = \frac{1}{nm}\sum\limits_{k,i}(||\nabla f(\hat{\mathbf{p}}_{k,i})||_{2}-1)^{2}

To regularize the SDF, an Eikonal term is added. Following the eikonal equation $||\nabla f_{SDF}|| = 1$, it penalizes the deviation of the gradient norm from 1 at the sampled points $\hat{\mathbf{p}}_{k,i}$ ($n$ samples on each of the $m$ rays).
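Here is a minimal sketch of $\mathcal{L}_{reg}$, assuming NumPy and central finite differences in place of the automatic differentiation a real SDF network would use; `eikonal_loss` and the two test fields are hypothetical. A true SDF gives a loss of about 0, while a field with the same zero set but gradient norm 2 is penalized.

```python
import numpy as np

def eikonal_loss(f, points, eps=1e-4):
    """(1/nm) * sum (||grad f|| - 1)^2 over points of shape (m, n, 3).

    The gradient is estimated with central finite differences here; a real
    implementation would differentiate the SDF network with autodiff.
    """
    grad = np.zeros_like(points)
    for k in range(3):
        step = np.zeros(3)
        step[k] = eps
        grad[..., k] = (f(points + step) - f(points - step)) / (2.0 * eps)
    return np.mean((np.linalg.norm(grad, axis=-1) - 1.0) ** 2)

def sphere(x):
    """A valid SDF (unit sphere)."""
    return np.linalg.norm(x, axis=-1) - 1.0

def scaled(x):
    """Same zero set as the sphere, but ||grad|| = 2, so it is not an SDF."""
    return 2.0 * (np.linalg.norm(x, axis=-1) - 1.0)

rng = np.random.default_rng(0)
pts = rng.uniform(0.5, 2.0, size=(4, 16, 3))   # m=4 rays, n=16 samples (illustrative)
print(eikonal_loss(sphere, pts), eikonal_loss(scaled, pts))
```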

Mask loss (optional)

\mathcal{L}_{mask}=\text{BCE}(M_{k},\hat{O}_{k})

When masks are available, the BCE between the mask $M_{k}$ and the accumulated weight along the ray, $\hat{O}_{k}=\sum\limits^n_{i=1}T_{k,i}\alpha_{k,i}$, is minimized.
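The three terms combine as in the final loss. This is a minimal sketch assuming NumPy arrays of per-ray predictions. The function names are hypothetical. `alpha_reg` and `beta_mask` stand for the loss weights $\alpha$ and $\beta$ (not the opacity $\alpha_i$), and their default of 0.1 is illustrative rather than taken from the paper. The `eps` clip keeps the BCE finite.

```python
import numpy as np

def color_loss(pred_rgb, gt_rgb):
    """L_color with R = L1: per-ray L1 error summed over RGB, averaged over the m rays."""
    return np.mean(np.sum(np.abs(pred_rgb - gt_rgb), axis=-1))

def mask_loss(weights, mask, eps=1e-6):
    """BCE between mask M_k and O_k = sum_i T_{k,i} alpha_{k,i}; weights has shape (m, n)."""
    opacity = np.clip(weights.sum(axis=-1), eps, 1.0 - eps)
    return -np.mean(mask * np.log(opacity) + (1.0 - mask) * np.log(1.0 - opacity))

def total_loss(pred_rgb, gt_rgb, reg, weights=None, mask=None, alpha_reg=0.1, beta_mask=0.1):
    """L = L_color + alpha * L_reg (+ beta * L_mask when masks are available)."""
    loss = color_loss(pred_rgb, gt_rgb) + alpha_reg * reg
    if mask is not None:
        loss += beta_mask * mask_loss(weights, mask)
    return loss

pred = np.array([[0.5, 0.5, 0.5], [1.0, 0.0, 0.0]])
gt = np.array([[0.5, 0.5, 0.5], [0.0, 0.0, 0.0]])
weights = np.array([[0.4, 0.5], [0.05, 0.05]])  # per-sample T * alpha for 2 rays
mask = np.array([1.0, 0.0])                      # ray 0 hits the object, ray 1 does not
loss = total_loss(pred, gt, reg=0.0, weights=weights, mask=mask)
print(loss)
```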

Hierarchical sampling

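Unlike NeRF, which optimizes a coarse network and a fine network simultaneously, NeuS maintains only one network. This works because the sampling probabilities in the coarse stage are computed from the S-density $\phi_s(f(\mathbf{x}))$ with a fixed standard deviation, while those in the fine stage are computed with the learned $s$.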
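One round of this sampling can be sketched with a single SDF. The sketch assumes NumPy and uses NeRF-style inverse-CDF sampling over piecewise-constant weights, a swapped-in choice rather than a confirmed detail of NeuS. Everything else here is hypothetical or illustrative: the function names, the plane SDF, the fixed $s=32$ for the coarse step, the learned $s=64$ for the final step, and the sample counts.

```python
import numpy as np

def sigmoid(x, s):
    return 1.0 / (1.0 + np.exp(-s * x))

def neus_weights(sdf, s):
    """Discrete NeuS weights T_i * alpha_i along one ray for a given s."""
    Phi = sigmoid(sdf, s)
    alpha = np.maximum((Phi[:-1] - Phi[1:]) / np.maximum(Phi[:-1], 1e-10), 0.0)
    T = np.concatenate([[1.0], np.cumprod(1.0 - alpha)[:-1]])
    return T * alpha

def importance_sample(t, weights, n_samples, rng):
    """Inverse-CDF sampling from the piecewise-constant pdf given by weights on [t_i, t_{i+1}]."""
    w = weights + 1e-5                         # small floor: every interval keeps some mass
    cdf = np.concatenate([[0.0], np.cumsum(w) / np.sum(w)])
    cdf[-1] = 1.0
    u = rng.uniform(size=n_samples)
    idx = np.clip(np.searchsorted(cdf, u, side="right") - 1, 0, len(w) - 1)
    frac = (u - cdf[idx]) / (cdf[idx + 1] - cdf[idx])
    return t[idx] + frac * (t[idx + 1] - t[idx])

def plane_sdf(t):
    """Hypothetical SDF along the ray: a plane hit head-on at t = 2."""
    return -(t - 2.0)

rng = np.random.default_rng(0)
t_coarse = np.linspace(0.0, 4.0, 65)

# Coarse stage: probabilities from the same SDF with a fixed s.
w_coarse = neus_weights(plane_sdf(t_coarse), s=32.0)
t_fine = importance_sample(t_coarse, w_coarse, 1000, rng)

# Final stage: merge the samples and weight them with the learned s.
t_all = np.sort(np.concatenate([t_coarse, t_fine]))
w_final = neus_weights(plane_sdf(t_all), s=64.0)
print(np.mean(np.abs(t_fine - 2.0) < 0.2), w_final.sum())
```

Because both stages query the same SDF and only change $s$, no separate coarse network is needed.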