Random thoughts (updated 07/15)

우병주 · July 11, 2024

Recent works have shown that neural networks tend to make high confidence predictions even for completely unrecognizable (Nguyen et al., 2015) or irrelevant inputs (Hendrycks & Gimpel, 2017; Szegedy et al., 2014; Moosavi-Dezfooli et al., 2017). It has been well documented (Amodei et al., 2016) that it is important for classifiers to be aware of uncertainty when shown new kinds of inputs, i.e., out-of-distribution examples. Therefore, being able to accurately detect out-of-distribution examples can be practically important for visual recognition tasks (Krizhevsky et al., 2012; Farabet et al., 2013; Ji et al., 2013).

from ODIN

What the Model Cannot Know => model awareness of its limitations (gpt) / unavailability

Would adversarial attacks also be possible in novel view synthesis?

Out-of-Distribution Detection: Methods like Maximum Softmax Probability (MSP), ODIN (Out-of-DIstribution detector for Neural networks), and adversarial examples can help detect when an input is out-of-distribution. / High uncertainty could itself be read as a sign of an OOD situation, but the usual problem is that uncertainty comes out low on OOD inputs. Chicken-and-egg.
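A minimal sketch of MSP-style detection, to make the idea concrete. The `threshold` value and the toy logits are assumptions made up for illustration; in practice the threshold would be tuned on held-out data.

```python
import numpy as np

def softmax(logits):
    # Subtract the row max for numerical stability.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def msp_score(logits):
    """Maximum softmax probability: higher means 'looks in-distribution'."""
    return softmax(logits).max(axis=-1)

def is_ood(logits, threshold=0.5):
    # threshold is a hypothetical value chosen only for this toy example.
    return msp_score(logits) < threshold

logits = np.array([[8.0, 0.5, 0.1],   # peaked prediction
                   [0.3, 0.2, 0.1]])  # flat prediction
flags = is_ood(logits)
```

This only works when the model is actually less confident on OOD inputs, which is exactly the chicken-and-egg problem noted above.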

Advantages of OOD detection

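DeCoOp uses zero-shot CLIP when the input is OOD and prompt tuning when it is ID. As a result, it gets the best combined OOD+ID performance.
In autonomous driving, the system can switch to a safe mode (slow driving, alerting the driver); a medical AI can refer the case to a medical expert.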

Uncertainty on OOD inputs

For OOD or unavailable inputs, people commonly say the ideal case is for the predicted probabilities to be uniform over all classes.
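One way to read the "uniform is ideal" statement: a uniform prediction maximizes predictive entropy, so entropy can serve as the uncertainty score. A small sketch, with the 4-class example values being arbitrary assumptions:

```python
import numpy as np

def predictive_entropy(probs, eps=1e-12):
    """Shannon entropy of a categorical prediction (in nats)."""
    probs = np.asarray(probs, dtype=float)
    # eps guards against log(0) for zero-probability classes.
    return -(probs * np.log(probs + eps)).sum(axis=-1)

num_classes = 4
uniform = np.full(num_classes, 1.0 / num_classes)  # ideal OOD prediction
peaked = np.array([0.97, 0.01, 0.01, 0.01])        # confident prediction

h_uniform = predictive_entropy(uniform)  # equals log(num_classes), the maximum
h_peaked = predictive_entropy(peaked)
```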

In a continual setting, would an image whose information differs a lot from the current GS count as OOD?

unavailability?

Density aware

Motivation: naive approach to ensembling often does not capture the epistemic uncertainty in parts of a scene that were unobserved during training
Observation: in unobserved regions, each NeRF (ensemble member) showed a low termination probability.
That is, if the summed termination probability along ray r is close to 1, the region was seen during training; if it is close to 0, it was not.
This is interpreted as epistemic uncertainty: fundamental lack of knowledge about the scene geometry and appearance along r
-> Does this happen in GS too?

Application: an epistemic uncertainty term is therefore defined from the density, and the final uncertainty is the sum of this term and the ensemble-based RGB uncertainty.
Effect: unlike a naive ensemble, uncertainty comes out high in unobserved regions.
Feels closer to just doing OOD detection.
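A rough sketch of the density-aware idea for a single ray. The squared form `(1 - mean weight sum)^2` for the epistemic term, the function names, and the toy densities are all assumptions for illustration, not necessarily the paper's exact formulation.

```python
import numpy as np

def termination_weights(sigma, delta):
    """Volume-rendering weights w_i = T_i * alpha_i along a ray."""
    alpha = 1.0 - np.exp(-sigma * delta)
    ones = np.ones_like(alpha[..., :1])
    # T_i = prod_{j<i} (1 - alpha_j): transmittance before sample i.
    trans = np.cumprod(np.concatenate([ones, 1.0 - alpha[..., :-1]], axis=-1), axis=-1)
    return trans * alpha

def density_aware_uncertainty(rgb, sigma, delta):
    """rgb: (M, 3) colors from M ensemble members; sigma: (M, S); delta: (S,)."""
    weight_sum = termination_weights(sigma, delta).sum(axis=-1)  # (M,)
    rgb_var = rgb.var(axis=0).mean()               # ensemble RGB disagreement
    epistemic = (1.0 - weight_sum.mean()) ** 2     # assumed form: low density -> high
    return rgb_var + epistemic

delta = np.full(4, 0.5)
rgb = np.tile([0.5, 0.5, 0.5], (3, 1))             # members agree on color
seen_sigma = np.tile([0.0, 0.0, 40.0, 40.0], (3, 1))  # ray hits a surface
unseen_sigma = np.full((3, 4), 1e-3)               # ray through near-empty space

u_seen = density_aware_uncertainty(rgb, seen_sigma, delta)
u_unseen = density_aware_uncertainty(rgb, unseen_sigma, delta)
```

Here the members agree perfectly on color, so a naive RGB-variance ensemble would report zero uncertainty for both rays; only the density term separates the unseen ray.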

3D uncertainty field

However, their simple way(BNN, Deep Ensemble, MC Dropout) of quantifying uncertainty by calculating the variance in the output space is not appropriate for identifying out-of-distribution data, particularly on 3D data.
"However, without the knowledge in the unknown scene regions, all these NeRF-based methods are unable to explicitly recognize them and allocate high uncertainties there"
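'Density aware' was the first to deal with unobserved regions, but it did not handle occlusion.
Note: LLFF is a forward-facing dataset, while NeRF-360 is a dataset in which the camera circles around the scene.
At its core, the method first represents DVGO as a probability distribution.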

Recent Uncertainty Study

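Deep Deterministic Uncertainty, which predicts uncertainty in a single forward pass, is apparently the (relatively) recent (and innovative) trend in uncertainty research.
Compared with Deep Ensemble, MC Dropout, and Bayesian approaches, these methods are characterized by better OOD detection.
Paper list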
- DUQ: Uncertainty Estimation Using a Single Deep Deterministic Neural Network (ICML 2020 / 463 citations)
- SNGP: Simple and Principled Uncertainty Estimation with Deterministic Deep Learning via Distance Awareness (NeurIPS 2020 / 425 citations)
- DDU: Deep Deterministic Uncertainty: A New Simple Baseline (CVPR 2023 / 42 citations / on arXiv since Feb 2021..?)
DUQ and DDU are Yarin Gal papers.
Judging by citation counts alone, it looks a bit grim? They all showed big improvements on datasets like CIFAR-10, but follow-up work seems to extend to other fields rather than go deeper (task-specific domain knowledge does seem to matter). And of course, no GS paper cites any of these.
DUQ:
- While training for classification, DUQ stores a feature vector (centroid) $E$ for each class, updated by EMA.
- The model is designed as an RBF model (1998, LeCun): the distance between the model output (feature vector) and a centroid is defined through an RBF kernel, and training minimizes this distance.
- Uncertainty can then be measured as the distance between the output feature vector and its closest centroid, and this is said to recognize OOD better than existing (softmax) approaches.
- In addition, a gradient penalty (regularization) is applied to calibrate sensitivity to input changes. To counter feature collapse (OOD samples being mapped into the ID feature space), it acts as a normalization that keeps the gradient from becoming too large or too small.
- The uncertainty includes both epistemic and aleatoric parts, and the authors say it is hard to disentangle them formally. Still, if a data point is far from every centroid in feature space it can be seen as epistemic (OOD-like), and if it sits between centroids and is close to several of them (ambiguity) it can be seen as aleatoric.
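The centroid-distance idea above can be sketched as follows. This is a simplification under stated assumptions: the per-class weight matrix applied to features is omitted, and `length_scale`, `gamma`, and the toy centroids are hypothetical values.

```python
import numpy as np

def rbf_kernel(z, centroids, length_scale=0.5):
    """K_c(z) = exp(-mean((z - e_c)^2) / (2 * length_scale^2)) for every class c."""
    sq_dist = ((z[None, :] - centroids) ** 2).mean(axis=-1)
    return np.exp(-sq_dist / (2.0 * length_scale ** 2))

def duq_uncertainty(z, centroids, length_scale=0.5):
    """Return (predicted class, uncertainty = 1 - kernel value of closest centroid)."""
    k = rbf_kernel(z, centroids, length_scale)
    return int(k.argmax()), 1.0 - float(k.max())

def ema_update(centroid_sum, count, feats, gamma=0.999):
    """EMA centroid update for one class from a mini-batch of that class's features."""
    count = gamma * count + (1.0 - gamma) * len(feats)
    centroid_sum = gamma * centroid_sum + (1.0 - gamma) * feats.sum(axis=0)
    return centroid_sum, count, centroid_sum / count

centroids = np.array([[0.0, 0.0], [4.0, 4.0]])      # toy class centroids
cls_near, u_near = duq_uncertainty(np.array([0.1, -0.1]), centroids)
cls_far, u_far = duq_uncertainty(np.array([20.0, -20.0]), centroids)  # far from all
```

A point far from every centroid gets uncertainty close to 1, which matches the "epistemic, OOD-like" reading in the last bullet.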
SNGP:
- A Gaussian Process becomes more uncertain as inputs move away from in-distribution data, whereas MC Dropout / Deep Ensemble do not.
- The last layer of the DNN is replaced with a GP instead of a fully connected layer, and the GP's covariance matrix uses an RBF kernel.
- http://dsba.korea.ac.kr/review/?mod=document&uid=1413 <- reference
- Good explanation of Gaussian Processes: https://aistory4u.tistory.com/entry/%EA%B0%80%EC%9A%B0%EC%8B%9C%EC%95%88-%ED%94%84%EB%A1%9C%EC%84%B8%EC%8A%A4-%ED%9A%8C%EA%B7%80
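To see the distance-awareness property on a toy case, here is an exact 1-D GP with an RBF kernel. This is only an illustration of why a GP output layer helps; SNGP itself approximates the GP layer rather than solving it exactly, and the `noise` and `length_scale` values are assumptions.

```python
import numpy as np

def rbf(a, b, length_scale=1.0):
    """RBF kernel matrix between 1-D input arrays a and b."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return np.exp(-0.5 * d2 / length_scale ** 2)

def gp_predictive_variance(x_train, x_test, noise=1e-4, length_scale=1.0):
    """Posterior variance k(x*, x*) - k*^T (K + noise * I)^{-1} k*."""
    k_tt = rbf(x_train, x_train, length_scale) + noise * np.eye(len(x_train))
    k_ts = rbf(x_train, x_test, length_scale)
    solved = np.linalg.solve(k_tt, k_ts)
    # k(x*, x*) = 1 for the RBF kernel, so the prior variance is 1.
    return 1.0 - (k_ts * solved).sum(axis=0)

x_train = np.array([-1.0, 0.0, 1.0])  # in-distribution inputs
x_test = np.array([0.0, 10.0])        # one seen location, one far away
var_near, var_far = gp_predictive_variance(x_train, x_test)
```

The variance is near zero at the training location and returns to the prior variance far from the data, which is the behavior MC Dropout and Deep Ensemble are said to lack.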
DDU:
https://stopspoon.tistory.com/78

Gaussian Process (GP)
https://www.youtube.com/watch?v=9NeDYW9BfpQ