[2021 ICCV] Deep Metric Learning for Open World Semantic Segmentation

yellofi · May 25, 2022

Anomaly semantic segmentation, Incremental few-shot learning, Open World Semantic Segmentation, semantic segmentation

1. Introduction

Classical close-set semantic segmentation networks are limited in detecting out-of-distribution (OOD) objects, which matters in safety-critical applications such as autonomous driving.

The paper proposes an open world semantic segmentation system with two modules:

1) an open-set semantic segmentation module that detects both in-distribution and OOD objects
2) an incremental few-shot learning module that gradually incorporates the OOD objects into the existing knowledge base

To implement open-set semantic segmentation, a Deep Metric Learning Network (DMLNet) with contrastive clustering is used.

keywords: out-of-distribution, deep metric learning, open-set semantic segmentation, anomaly semantic segmentation, incremental few-shot learning

Close-set methods assume that the test set resembles the training set, but the real world is an open world.

1) Separate known from unknown (OOD detection)
2) Annotate the unknown objects (manual annotation)
3) Extend the classification range with incremental few-shot learning (inputs: image, labels of the unknown class, predictions for the known classes)
4) After few-shot learning, predict more accurately with the unknown objects included (enlarged domain)

2. Related work

Anomaly semantic segmentation

1) uncertainty estimation-based methods

The baseline for uncertainty estimation is maximum softmax probability (MSP). Using the maximum logit (MaxLogit) instead of the softmax probability improves performance. Bayesian networks, which design the neural network from a probabilistic view so that weights and outputs are probability distributions, have also been used.

2) generative model-based methods

Recently, GAN resynthesis methods such as SynthCP and DUIR reached state of the art, but they need two or more networks and are far from lightweight.

This work shows that the contrastive-clustering-based DMLNet needs only a single inference pass and still achieves better anomaly segmentation performance.

Deep metric learning networks

The core idea is to compute the similarity of embedded features in a metric space using Euclidean, Mahalanobis, or Matusita distance.

Convolutional prototype networks and DMLNets have been used together to solve specific problems such as image-level OOD sample detection and few-shot learning in semantic segmentation. This work follows the same combination to build the first DMLNet for open world semantic segmentation.

Open world classification and detection

Open world classification was first introduced in the work that proposed the Nearest Non-Outlier (NNO) algorithm.

Recently, an open world detection system was proposed based on contrastive learning, an unknown-aware proposal network, and energy-based unknown-aware identification criteria.

This work takes a similar route but differs in two ways:

1) Their open-set detection module relies on a class-agnostic Region Proposal Network (RPN), which can also detect unlabeled potential OOD objects, so OOD information can influence training. In this work, only pixels with in-distribution labels are used for training, and OOD samples cannot be added to training.

2) In the incremental learning module, they use all labeled data of the novel class, whereas this work takes the harder few-shot setting: incremental few-shot learning.

3. Open world semantic segmentation

The open-set semantic segmentation module is itself split into two submodules:
1) close-set semantic segmentation submodule
2) anomaly segmentation submodule

The anomalous probability map P_i,j is compared against a threshold lambda to separate in-distribution pixels from out-of-distribution pixels.
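
The thresholding step can be sketched in a few lines of NumPy. This is only an illustration, assuming the close-set prediction and the anomalous probability map already exist as arrays; the function name `open_set_map`, the `ood_label` value, and the toy numbers are made up for the example.

```python
import numpy as np

def open_set_map(close_pred, anomaly_prob, lam, ood_label):
    """Mark a pixel as OOD when its anomalous probability exceeds lam,
    otherwise keep the close-set prediction."""
    close_pred = np.asarray(close_pred)
    anomaly_prob = np.asarray(anomaly_prob)
    return np.where(anomaly_prob > lam, ood_label, close_pred)

# Toy 2x2 image with 3 known classes (0, 1, 2); label 3 stands for "unknown".
close_pred = np.array([[0, 1], [2, 1]])
anomaly_prob = np.array([[0.1, 0.9], [0.4, 0.6]])
result = open_set_map(close_pred, anomaly_prob, lam=0.5, ood_label=3)
print(result)  # [[0 3] [2 3]]
```

A higher lambda keeps more pixels in the known classes; a lower one flags more pixels as OOD.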

4. Approach

Deep Metric Learning Network

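Conventional CNN-based semantic segmentation, which is split into a feature extractor and a classifier, assigns the entire feature space to the known classes, so it cannot account for OOD classes.

In DMLNet, the classifier is replaced by a Euclidean distance representation with a set of prototypes, one per in-distribution class. m_t is the t-th prototype, belonging to class C_in,t.

The feature extractor is designed so that feature vectors have the same dimension as the prototypes in the metric space.

This is expressed as the probability that pixel X belongs to the t-th prototype (Eq 2), a softmax over the negative distances to the prototypes.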
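
A rough NumPy sketch of such a distance-based probability follows. The equation itself is not reproduced in these notes, so the squared Euclidean distance, the function name `prototype_probs`, and the prototype scale 3.0 are assumptions for illustration only.

```python
import numpy as np

def prototype_probs(features, prototypes):
    """Softmax over negative squared Euclidean distances to each prototype.

    features:   (H, W, D) per-pixel feature vectors
    prototypes: (T, D) one prototype per class
    returns:    (H, W, T) probabilities that sum to 1 over T
    """
    diff = features[..., None, :] - prototypes      # (H, W, T, D) by broadcasting
    sq_dist = np.sum(diff ** 2, axis=-1)            # (H, W, T)
    logits = -sq_dist
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=-1, keepdims=True)

# Assumption: scaled one-hot prototypes; the scale 3.0 is arbitrary.
prototypes = 3.0 * np.eye(3)
# One pixel sitting on prototype 0, one pixel equidistant from all prototypes.
features = np.array([[[3.0, 0.0, 0.0], [1.0, 1.0, 1.0]]])
probs = prototype_probs(features, prototypes)
```

The first pixel gets almost all of its probability on prototype 0, while the second, being equally far from every prototype, gets a uniform distribution.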

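The discriminative cross entropy (DCE) loss is then defined as L = SUM(-log(p)).

Y is the label of the input image X. The numerator pulls the feature toward the prototype of its label, and the denominator pushes it away from the unrelated prototypes.

A variance loss (VL) makes the samples that belong to the same prototype compact.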
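
To make the two losses concrete, here is a small NumPy sketch on flattened pixels. It assumes squared Euclidean distances and takes VL to be the summed squared distance between each feature and the prototype of its own label, which is one natural reading of "compact" rather than a formula stated in these notes. The helper names are illustrative.

```python
import numpy as np

def sq_dists(features, prototypes):
    """Squared Euclidean distance from each feature (P, D) to each prototype (T, D)."""
    return np.sum((features[:, None, :] - prototypes[None, :, :]) ** 2, axis=-1)

def dce_loss(features, labels, prototypes):
    """Sum of -log p, where p is the softmax over negative distances at the true label."""
    logits = -sq_dists(features, prototypes)
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].sum()

def variance_loss(features, labels, prototypes):
    """Assumed form: summed squared distance to the prototype of the own label."""
    d = sq_dists(features, prototypes)
    return d[np.arange(len(labels)), labels].sum()

prototypes = 3.0 * np.eye(2)                  # assumed scaled one-hot prototypes
features = np.array([[3.0, 0.0], [0.0, 3.0]])
good = np.array([0, 1])                       # features sit on their own prototypes
bad = np.array([1, 0])                        # labels swapped

dce_good, dce_bad = dce_loss(features, good, prototypes), dce_loss(features, bad, prototypes)
vl_good, vl_bad = variance_loss(features, good, prototypes), variance_loss(features, bad, prototypes)
```

With correct labels both losses are near zero; with swapped labels both grow, which is the pull-and-push behavior described above.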

Open-set Semantic segmentation module

1) close-set semantic segmentation submodule

For an input pixel X_i,j, the prototype t that maximizes the probability in Eq 2 becomes the prediction Y.

2) anomaly segmentation submodule

The anomalous probability is measured with metric-based maximum softmax probability (MMSP) and Euclidean distance sum (EDS).

MMSP is 1 minus the maximum probability of belonging to any prototype t.
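
Both the close-set prediction and MMSP fall out of the same probability tensor. The sketch below assumes the per-pixel prototype probabilities are already computed; the function name and the toy values are illustrative.

```python
import numpy as np

def close_set_and_mmsp(probs):
    """probs: (H, W, T) prototype probabilities.
    Returns the argmax prediction and the MMSP anomalous probability."""
    pred = probs.argmax(axis=-1)          # most likely prototype per pixel
    mmsp = 1.0 - probs.max(axis=-1)       # low confidence -> high anomaly score
    return pred, mmsp

# One confident pixel and one uncertain pixel.
probs = np.array([[[0.9, 0.05, 0.05], [0.4, 0.3, 0.3]]])
pred, mmsp = close_set_and_mmsp(probs)
```

Both pixels are predicted as class 0, but the uncertain one gets a much higher MMSP score (0.6 versus 0.1).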

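EDS is the sum of the Euclidean distances to all prototypes. The idea is that a feature located at the center of the metric space has a smaller EDS.

A large EDS means the pixel probably belongs to some prototype. A pixel that belongs to none has a small EDS and is considered more likely to be anomalous.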
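
A minimal sketch of an EDS-based score, assuming squared Euclidean distances and normalization by the largest distance sum among the pixels so that a small sum maps to a high anomalous probability. The function name and the toy prototypes are illustrative.

```python
import numpy as np

def eds_probability(features, prototypes):
    """features: (P, D), prototypes: (T, D).
    Anomalous probability = 1 - (distance sum / max distance sum over pixels)."""
    sq = np.sum((features[:, None, :] - prototypes[None, :, :]) ** 2, axis=-1)  # (P, T)
    dist_sum = sq.sum(axis=1)                                                   # (P,)
    return 1.0 - dist_sum / dist_sum.max()

prototypes = 3.0 * np.eye(3)          # assumed scaled one-hot prototypes
features = np.array([
    [3.0, 0.0, 0.0],   # on a prototype: distance sum 36
    [1.0, 1.0, 1.0],   # center of the prototypes: distance sum 18
    [0.0, 0.0, 0.0],   # origin: distance sum 27
])
p_eds = eds_probability(features, prototypes)
print(p_eds)  # [0.   0.5  0.25]
```

The pixel at the center of the prototypes gets the highest score, and the pixel sitting exactly on a prototype gets zero.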

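In the figure, areas with a high anomalous score appear in every image, because EDS is a ratio relative to the maximum distance sum among all pixels.

The EDS-based probability also varies among in-distribution pixels. OOD objects clearly concentrate at smaller EDS values and therefore get a large probability.

So EDS is combined with MMSP to suppress the probability on pixels that are actually in-distribution.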
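
The exact combination rule is not written out in these notes. One plausible sketch is a sigmoid gate that trusts EDS only where EDS is already high and falls back to MMSP elsewhere; the gate form and the `beta` and `gamma` values below are assumptions for illustration.

```python
import numpy as np

def combine_eds_mmsp(p_eds, p_mmsp, beta=20.0, gamma=0.8):
    """Blend the two scores with a gate alpha = sigmoid(beta * (p_eds - gamma)).
    Where p_eds is well below gamma, alpha is near 0 and MMSP dominates."""
    p_eds = np.asarray(p_eds, dtype=float)
    p_mmsp = np.asarray(p_mmsp, dtype=float)
    alpha = 1.0 / (1.0 + np.exp(-beta * (p_eds - gamma)))
    return alpha * p_eds + (1.0 - alpha) * p_mmsp

# Pixel 0: both scores high (likely OOD).
# Pixel 1: moderate EDS score but confident classification (likely in-distribution).
p_eds = np.array([0.95, 0.5])
p_mmsp = np.array([0.9, 0.05])
combined = combine_eds_mmsp(p_eds, p_mmsp)
```

In this toy case the second pixel's moderate EDS score is pulled down close to its MMSP value, which is the suppression effect described above.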

Eq 5 gives the close-set semantic segmentation result and Eq 9 gives the anomalous segmentation result. Eq 1 combines them into the final open-set segmentation map.

Incremental few-shot learning module

The open-set segmentation map is passed to a labeler, who annotates the new class under two rules: (1) only the new class is annotated, and (2) fewer than 5 images are annotated.

Two incremental few-shot learning methods are applied:

1) Pseudo Label Method (PLM)

Here DMLNet is split into a backbone and final branch heads. The existing branch heads provide prediction maps for the old classes, and the labeler provides the annotation for the new class. All of these images are used to train the new branch head.

The pseudo label starts from the prediction map of the in-distribution head of the network trained for open-set semantic segmentation. For each later head, the pixels that head predicts as its OOD object are overwritten with the label N+t. At the last head, the labeler's annotation Y_k+1 for the new class is overlaid on top.
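
The overwriting order can be sketched as below. This is only a reading of the description above: it assumes each earlier novel head yields a boolean mask for its class, and that the newest class gets the next free label. All names and the toy arrays are hypothetical.

```python
import numpy as np

def build_pseudo_label(in_dist_pred, novel_head_masks, new_class_mask, num_in_dist):
    """in_dist_pred:     (H, W) labels in [0, num_in_dist) from the in-distribution head
    novel_head_masks: list of (H, W) bool masks, one per previously learned novel head
    new_class_mask:   (H, W) bool mask annotated by the labeler for the newest class
    Labels are 0-based here, so the t-th novel class (1-based) gets num_in_dist + t - 1."""
    label = in_dist_pred.copy()
    for t, mask in enumerate(novel_head_masks):
        label[mask] = num_in_dist + t                      # earlier novel classes
    label[new_class_mask] = num_in_dist + len(novel_head_masks)  # newest class, from annotation
    return label

in_dist_pred = np.array([[0, 1, 1], [2, 2, 0]])            # 3 in-distribution classes
old_novel = np.array([[False, False, True], [False, False, False]])
annotated = np.array([[False, False, False], [True, False, False]])
pseudo = build_pseudo_label(in_dist_pred, [old_novel], annotated, num_in_dist=3)
print(pseudo)  # [[0 1 3] [4 2 0]]
```

Because the annotation is applied last, it wins wherever it overlaps an older prediction.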

2) Novel Prototype Method (NPM)

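Feature vectors of a particular OOD novel class should also cluster together. The prototype m_N+k of the novel class is computed as the mean of all features that belong to that class.

Q is the number of annotated samples, Y is the ground-truth binary mask of the novel class C_out, and F is the DMLNet feature map in R^(NxHxW).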
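
Computing such a prototype amounts to a masked mean over the feature maps of the Q annotated samples. A minimal NumPy sketch, with an illustrative function name and toy shapes:

```python
import numpy as np

def novel_prototype(feature_maps, masks):
    """feature_maps: (Q, N, H, W) features F for Q annotated samples
    masks:        (Q, H, W) boolean masks Y of the novel class
    returns:      (N,) mean feature over all masked pixels"""
    feats = np.moveaxis(feature_maps, 1, -1)   # (Q, H, W, N)
    selected = feats[masks]                    # (num_masked_pixels, N)
    return selected.mean(axis=0)

# Q=1 sample, N=2 feature channels, a 1x3 image; the first two pixels are novel.
feature_maps = np.array([[[[1.0, 3.0, 10.0]],
                          [[2.0, 4.0, 20.0]]]])
masks = np.array([[[True, True, False]]])
proto = novel_prototype(feature_maps, masks)
print(proto)  # [2. 3.]
```

Only a forward pass is needed to obtain the features, so nothing in the network changes when a prototype is added this way.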

The original in-distribution prototypes are all one-hot vectors, whereas the prototypes newly added for novel classes are not, so the enlarged prototype set is no longer evenly distributed.

Because of this, a novel class cannot be classified with Eq 2. Instead, two criteria decide whether a pixel belongs to C_out,k.

The intuition is that a novel pixel must be close enough to its prototype, and that prototype must also be the nearest one among all prototypes.
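
The two criteria translate directly into a distance threshold plus a nearest-prototype check. In the sketch below the threshold `max_dist`, the function name, and the toy prototypes are hypothetical; the notes do not give the actual threshold.

```python
import numpy as np

def is_novel_pixel(features, prototypes, novel_index, max_dist):
    """features: (P, D), prototypes: (T, D) including the novel prototype.
    A pixel is assigned to the novel class only if
      (1) it lies within max_dist of the novel prototype, and
      (2) the novel prototype is its nearest prototype."""
    dists = np.linalg.norm(features[:, None, :] - prototypes[None, :, :], axis=-1)  # (P, T)
    close_enough = dists[:, novel_index] < max_dist
    is_nearest = dists.argmin(axis=1) == novel_index
    return close_enough & is_nearest

prototypes = np.array([[3.0, 0.0], [0.0, 3.0], [1.0, 1.0]])   # last one is the novel prototype
features = np.array([
    [1.1, 0.9],   # near the novel prototype: passes both criteria
    [3.0, 0.0],   # sits on an in-distribution prototype: fails (2)
    [2.0, 2.0],   # nearest to the novel prototype but too far: fails (1)
])
novel = is_novel_pixel(features, prototypes, novel_index=2, max_dist=0.5)
print(novel)  # [ True False False]
```

Pixels that fail either test keep whatever label the close-set or open-set step gave them.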

Moreover, since NPM does not update the model at all, it also handles catastrophic forgetting very well.

5. Experiments

Open-set semantic segmentation

PSPNet is used as the baseline.

Incremental few-shot learning

Both PLM and NPM are evaluated on the Cityscapes dataset, with car, truck, and bus set as the 3 OOD classes (the other 16 classes are in-distribution).
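
One way to set up such a split is to hide the OOD classes from training labels. The sketch below assumes the standard Cityscapes trainId convention (car=13, truck=14, bus=15) and an ignore index of 255; whether the authors prepared their labels this way is not stated in these notes.

```python
import numpy as np

OOD_TRAIN_IDS = (13, 14, 15)   # car, truck, bus in the assumed trainId convention
IGNORE_INDEX = 255             # assumed "do not train on this pixel" label

def hide_ood_classes(label_map):
    """Replace OOD class ids with IGNORE_INDEX so they never contribute to the loss."""
    label_map = np.asarray(label_map)
    return np.where(np.isin(label_map, OOD_TRAIN_IDS), IGNORE_INDEX, label_map)

label_map = np.array([[0, 13], [14, 18]])   # road, car / truck, bicycle
train_labels = hide_ood_classes(label_map)
print(train_labels)  # [[  0 255] [255  18]]
```

At evaluation time the original ids are kept, so the hidden pixels serve as ground truth for the OOD classes.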

Open world semantic segmentation

6. Conclusion

Few-shot learning greatly reduces the annotation cost for novel classes detected as OOD, but annotation will likely remain a bottleneck.

The system assumes a real open world and enlarges the domain (classes) by adding one class at a time, but its performance is expected to fall well short of what real deployment requires.

Still, the paper makes the following contributions:

First work to propose an open world semantic segmentation system that is more robust and practical for real-world use
SOTA on anomaly semantic segmentation with the proposed open-set semantic segmentation module
-> close-set semantic seg. submodule + anomalous seg. submodule (MMSP + EDS)
Proposes methods for the incremental few-shot learning module that almost entirely prevent catastrophic forgetting (PLM, NPM)
Builds an open world semantic segmentation system by combining open-set semantic seg. and incremental few-shot learning