RobustNeRF

김민솔 · October 16, 2024

This is the RobustNeRF review video presented in the NeRF with Real-World group at PseudoLab (가짜연구소).

https://youtu.be/1FwdNqhCDB4?si=G41MQOqZ7e2XZzkZ

1. Introduction

Modeling a scene so that view-dependent effects are told apart from distractors remains an open challenge in neural scene rendering.
RobustNeRF treats distractors as outliers of the NeRF optimization, which lets it render the static scene cleanly.

vs. mip-NeRF 360

mip-NeRF 360 is a model that renders unbounded scenes captured by a camera covering the full 360 degrees of the environment. As the figure shows, mip-NeRF 360 interprets distractors as view-dependent effects and produces floaters as a result. RobustNeRF instead applies a robust loss to single out distractors, which gives a more accurate rendering.

2. Related Work

NeRF

For background on NeRF, please see my earlier blog post.
Link: https://velog.io/@rlaalsthf02/NeRF
This post assumes that you are already familiar with NeRF.

Previous works

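The RobustNeRF paper groups prior work into the following categories.

1) When distractors belong to a known class, remove them with a segmentation model
-> does not generalize
2) Model time to separate the scene into static and dynamic components
-> only applicable to video data
3) Model distractors separately as transient phenomena (NeRF-W)
-> the losses are difficult to tune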

3. Method

3.1 Sensitivity to outliers

In the figure above, the distractors seen by camera 1 and camera 3 affect the reconstruction. In other words, the model cannot tell outliers apart from view-dependent effects. This problem arises whenever a NeRF model is applied to images captured in uncontrolled settings. (cf. NeRF-W)

3.2 Robustness to outliers

There are two broad ways to design a loss that makes the model robust to outliers: (1) applying a segmentation oracle, and (2) applying a robust kernel.

(1) Robust via semantic segmentation

\mathcal{L}^{\mathbf{r},i}_{\text{oracle}} (\theta) = \mathbf{S}_{i}(\mathbf{r}) \cdot ||\mathbf{C}(\mathbf{r};\theta) - \mathbf{C}_{i}(\mathbf{r})||^{2}_{2}

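The oracle $\mathbf{S}_{i}(\mathbf{r})$ indicates whether pixel $\mathbf{r}$ of image $i$ is an outlier, so that outlier pixels drop out of the loss. In practice it is obtained from a pretrained segmentation network $\mathcal{S}$. The drawback is that the oracle would have to recognize arbitrary distractors, so the approach fails for classes the network does not know.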
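As a minimal sketch of the oracle loss for a single ray, the snippet below gates the squared color error with a 0/1 label. The function name `oracle_loss` and the convention that the label is 1 for inliers and 0 for outliers are assumptions made for illustration; the label would come from a segmentation network in practice.

```python
def oracle_loss(pred_rgb, gt_rgb, is_inlier):
    """Squared L2 color error for one ray, gated by the oracle label S_i(r)."""
    sq_err = sum((p - g) ** 2 for p, g in zip(pred_rgb, gt_rgb))
    # Assumed convention: is_inlier is 1.0 for inliers and 0.0 for outliers.
    return is_inlier * sq_err


# The same photometric error counts for an inlier pixel and vanishes for an outlier.
inlier_loss = oracle_loss([0.2, 0.4, 0.6], [0.0, 0.4, 0.6], is_inlier=1.0)
outlier_loss = oracle_loss([0.2, 0.4, 0.6], [0.0, 0.4, 0.6], is_inlier=0.0)
```

The loss is only as good as the label: a distractor that the oracle misses is treated as an ordinary inlier.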

(2) Robust estimators

\mathcal{L}^{\mathbf{r},i}_{\text{robust}}(\theta) = \kappa(||\mathbf{C}(\mathbf{r};\theta) - \mathbf{C}_{i}(\mathbf{r})||_{2})

$\kappa(\cdot)$ : robust kernel
Replacing the L2 loss with a robust loss down-weights photometrically inconsistent observations (= outliers) during training. This is what makes the NeRF model robust.

Family of robust kernels

$\alpha = 2$ : L2 loss
$\alpha = 1$ : Charbonnier loss (mip-NeRF 360)

The top figure plots the robust kernel family for different values of $\alpha$, and the bottom figure shows results obtained by varying $\alpha$ for the Geman-McClure loss. When Geman-McClure is applied strongly ($\alpha = -2$), both the outliers and the high-frequency details are wiped out (right picture). When it is applied weakly ($\alpha = 0$), the outliers are not removed (left picture). In short, the choice of $\alpha$ forces a trade-off.
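To make the role of $\alpha$ concrete, here is a small sketch that assumes the plotted family is Barron's general robust loss, in which $\alpha = -2$ is the Geman-McClure kernel. The function name `robust_kernel` and the scale parameter `c` are illustrative and do not appear in the post.

```python
import math


def robust_kernel(x, alpha, c=1.0):
    """General robust loss of a residual x with shape alpha and scale c."""
    z = (x / c) ** 2
    if alpha == 2:   # L2 (limit of the general formula)
        return 0.5 * z
    if alpha == 0:   # logarithmic limit of the general formula
        return math.log(0.5 * z + 1.0)
    b = abs(alpha - 2)
    return (b / alpha) * ((z / b + 1.0) ** (alpha / 2) - 1.0)


# Small, medium, and large residuals under four shapes of the kernel.
residuals = [0.1, 1.0, 10.0]
losses = {alpha: [robust_kernel(x, alpha) for x in residuals]
          for alpha in (2, 1, 0, -2)}
```

For small residuals all four kernels behave alike, while for the large residual the penalty shrinks as $\alpha$ decreases. A smaller $\alpha$ therefore ignores outliers more aggressively, but it equally ignores large residuals that come from genuine high-frequency detail.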

3.3 Robustness via Trimmed Least Squares

For this reason, rather than picking one member of the robust kernel family, the paper builds its robust loss on the trimmed least squares approach described below.

Iteratively Reweighted Least-Squares

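IRLS is a method for solving weighted least squares problems that is widely used in robust estimation. The weights are updated so as to reduce the influence of outliers. A more detailed explanation is left to the Appendix.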

\mathcal{L}^{\mathbf{r},i}_{\text{robust}}(\theta^{(t)}) = \omega(\epsilon^{(t-1)}(\mathbf{r})) \cdot ||\mathbf{C}(\mathbf{r};\theta^{(t)}) - \mathbf{C}_{i}(\mathbf{r})||_{2}^{2}

\epsilon^{(t-1)}(\mathbf{r}) = ||\mathbf{C}(\mathbf{r};\theta^{(t-1)}) - \mathbf{C}_{i}(\mathbf{r})||_{2}

$\omega(\epsilon)=\epsilon^{-1} \cdot \partial{\kappa(\epsilon)}/\partial{\epsilon}$ : weight function (original IRLS) -> local minima
With the original weight function, the optimization falls into the local minima seen above. The paper therefore proposes the trimmed robust kernel, a refinement of the weight function.
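The reweighting loop can be illustrated on a toy problem that is much simpler than NeRF: estimating a single value from samples that contain one outlier. The sketch below assumes the Geman-McClure kernel $\kappa(\epsilon) = 2\epsilon^{2}/(\epsilon^{2}+4)$, for which the weight formula above gives $\omega(\epsilon) = 16/(\epsilon^{2}+4)^{2}$. The names `gm_weight` and `irls_mean` and the sample values are made up for illustration.

```python
def gm_weight(eps):
    """IRLS weight (1/eps) * d kappa / d eps for kappa(eps) = 2 eps^2 / (eps^2 + 4)."""
    return 16.0 / (eps ** 2 + 4.0) ** 2


def irls_mean(samples, num_iters=20):
    estimate = sum(samples) / len(samples)  # plain least-squares start
    for _ in range(num_iters):
        # Weights are computed from the residuals of the previous iterate (t - 1).
        weights = [gm_weight(abs(s - estimate)) for s in samples]
        # Closed-form minimizer of the weighted least-squares objective at step t.
        estimate = sum(w * s for w, s in zip(weights, samples)) / sum(weights)
    return estimate


samples = [1.0, 1.1, 0.9, 1.0, 10.0]  # the last value plays the role of a distractor
plain_mean = sum(samples) / len(samples)
robust_mean = irls_mean(samples)
```

The plain mean is pulled toward the outlier, whereas the reweighted estimate stays close to the cluster around 1.0 because the outlier's weight becomes tiny after the first iteration.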

Trimmed Robust Kernels

\tilde{\omega}(\mathbf{r}) = \epsilon(\mathbf{r}) \le \mathcal{T}_{\epsilon}

$\mathcal{T}_{\epsilon} = \text{Median}_\mathbf{r}\{\epsilon(\mathbf{r})\}$ : 50th percentile
The residuals are sorted, and those that fall within a given percentile (here the median) are classified as inliers.
The figure above visualizes the weight function applied to a whole image, but during actual training it is applied to sampled patches.
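A minimal sketch of the trimmed kernel, assuming a flat list of per-pixel residuals; the function name `trimmed_inlier_mask` and the sample values are illustrative.

```python
import statistics


def trimmed_inlier_mask(residuals):
    """Label a residual as inlier (1.0) when it does not exceed the median."""
    threshold = statistics.median(residuals)
    return [1.0 if eps <= threshold else 0.0 for eps in residuals]


residuals = [0.10, 0.20, 0.05, 0.90, 0.80]
mask = trimmed_inlier_mask(residuals)
```

Unlike the smooth kernels, the resulting weights are binary: a pixel either contributes its full squared error or is dropped entirely.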

\mathcal{W}(\mathbf{r}) = \tilde{\omega}(\mathbf{r})|(\tilde{\omega}(\mathbf{r}) \times \mathcal{B}_{3\times3}) \ge \mathcal{T}_*

$\mathcal{T}_*$ : 0.5
This step encodes the inductive bias that outliers are spatially smooth within an image. To capture this smoothness, the inlier/outlier labels $\tilde{\omega}$ are diffused with a 3x3 box kernel $\mathcal{B}_{3\times3}$. The equation prevents high-frequency details from being misclassified as outliers.
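The smoothing step can be sketched as follows, reading the `|` in the equation as a logical OR (an assumption about the notation). Averaging only over in-bounds neighbors at the image border is also an assumption, as is the function name `box_smoothed_mask`.

```python
def box_smoothed_mask(mask, threshold=0.5):
    """Keep a pixel as inlier if it already is one OR its 3x3 mean reaches the threshold."""
    height, width = len(mask), len(mask[0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            # 3x3 window clipped at the border (assumed border handling).
            window = [mask[j][i]
                      for j in range(max(0, y - 1), min(height, y + 2))
                      for i in range(max(0, x - 1), min(width, x + 2))]
            has_inlier_neighbors = sum(window) / len(window) >= threshold
            row.append(1.0 if mask[y][x] > 0 or has_inlier_neighbors else 0.0)
        out.append(row)
    return out


# A single pixel flagged as outlier inside an inlier region is relabeled as inlier.
hole = [[1.0] * 5 for _ in range(5)]
hole[2][2] = 0.0
filled = box_smoothed_mask(hole)
```

An isolated high-residual pixel, typical of fine texture, is thus pulled back into the inlier set, while the interior of a larger outlier region stays flagged.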

\omega(\mathcal{R}_{8}(\mathbf{r})) = \mathcal{W}(\mathbf{r})|\mathbb{E}_{\mathbf{s}\sim\mathcal{R}_{16}(\mathbf{r})} [\mathcal{W}(\mathbf{s})] \ge \mathcal{T}_\mathcal{R}

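$\mathcal{T}_\mathcal{R}$ : 0.6
Finally, the outlier detections are aggregated over a 16x16 neighborhood. This step was added to avoid misclassifying high-frequency details early in training, when only the coarse-grained structure has been learned. (High-frequency details could also be recovered by training for many more iterations, but the paper found that imposing a strong inductive bias solves the problem more efficiently.)

An 8x8 inner patch is defined inside each 16x16 patch, and the pixels of the inner patch are classified as inliers or outliers using the surrounding information. Combining the equations above gives the final weight function.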
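The patch-level rule can be sketched like this, again reading `|` as a logical OR and assuming that the 8x8 inner patch is centered inside the 16x16 patch. The function name `inner_patch_weights` and the two example patches are hypothetical.

```python
def inner_patch_weights(patch_mask, threshold=0.6, inner=8):
    """Weights for the inner patch from the inlier fraction of the whole patch."""
    size = len(patch_mask)  # 16 in the setting described above
    inlier_fraction = sum(sum(row) for row in patch_mask) / (size * size)
    patch_is_inlier = inlier_fraction >= threshold
    start = (size - inner) // 2  # assumed: inner patch is centered
    return [[1.0 if patch_mask[y][x] > 0 or patch_is_inlier else 0.0
             for x in range(start, start + inner)]
            for y in range(start, start + inner)]


# 75% of the first patch is inlier, so its whole inner patch is trusted.
mostly_clean = [[1.0] * 16 for _ in range(12)] + [[0.0] * 16 for _ in range(4)]
# Only 50% of the second patch is inlier, so per-pixel labels are kept as they are.
half_occluded = [[1.0] * 16 for _ in range(8)] + [[0.0] * 16 for _ in range(8)]
clean_weights = inner_patch_weights(mostly_clean)
occluded_weights = inner_patch_weights(half_occluded)
```

When most of the surrounding patch is consistent with the model, every pixel in the inner patch receives weight 1 regardless of its own residual, which is what protects detail that has not been learned yet.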

4. Experiments

The dataset is split into natural scenes and synthetic scenes, and each group is evaluated separately. The evaluation compares RobustNeRF with mip-NeRF 360 trained with different losses (L2, L1, Charbonnier) and with D²NeRF.

Evaluation on Natural Scenes

Quantitative

RobustNeRF achieves the best scores on every scene.

Qualitative

As the quantitative results also show, mip-NeRF 360 outperforms RobustNeRF when no distractors are present.
Even so, when distractors are present, RobustNeRF separates them well and renders the scene cleanly.

Evaluation on Synthetic Scenes

On the synthetic scenes, RobustNeRF performed best both quantitatively and qualitatively.

Ablations

Applying the robust loss alone actually lowers performance.
Adding smoothing on top of it recovers fine-grained detail and improves performance.
Adding patch-based sampling yields further gains in LPIPS and PSNR.

5. Conclusion

Contribution

A NeRF optimization that separates distractors more reliably.
Fine-grained control through IRLS-based trimmed least squares combined with inductive biases.
SOTA results on the synthetic dataset and strong results on the other datasets.

Limitation

loss -> statistical inefficiency
- Training requires many iterations.
Poor rendering on clean data
- This happens because RobustNeRF assumes in advance that outliers are present.

Appendix

IRLS(Newton-Raphson)

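IRLS (Newton-Raphson): an optimization method that applies the update formula repeatedly to find a solution of E'(x) = 0.
- Find x2, the point where the tangent line at x1 crosses the axis
- Repeat this step -> reach the optimum

This minimizes the error function (least squares).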
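The tangent-line iteration can be sketched in a few lines. The error function used here, E(x) = (x - 2)^4 + (x - 2)^2, is a made-up example with its minimum at x = 2, and the names `newton_raphson`, `grad`, and `hess` are illustrative.

```python
def newton_raphson(grad, hess, x0, num_iters=50, tol=1e-12):
    """Find a root of E'(x) by repeatedly following the tangent of E' to the axis."""
    x = x0
    for _ in range(num_iters):
        # The tangent of E' at x crosses zero at x - E'(x) / E''(x).
        step = grad(x) / hess(x)
        x -= step
        if abs(step) < tol:
            break
    return x


def grad(x):
    """E'(x) for the example E(x) = (x - 2)^4 + (x - 2)^2."""
    return 4 * (x - 2) ** 3 + 2 * (x - 2)


def hess(x):
    """E''(x) for the same example."""
    return 12 * (x - 2) ** 2 + 2


x_star = newton_raphson(grad, hess, x0=0.0)
```

Because this example is convex, E''(x) is always positive and the iteration converges to the minimizer from any starting point. For non-convex error functions the same update can stop at a local minimum.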

Reference

[1] RobustNeRF, Google Research / University of Toronto / Simon Fraser University, https://robustnerf.github.io/

[2] mip-NeRF 360, Jonathan T. Barron, https://jonbarron.info/mipnerf360/

[3] Explanation of Newton-Raphson optimization, https://darkpgmr.tistory.com/142

[4] Newton-Raphson image, https://big-dream-world.tistory.com/33

[5] Source of the patch sampling figure, NeRF On-the-go, https://arxiv.org/pdf/2405.18715
