[Paper Review] Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs (DeepLab v1)

PROLCY · March 15, 2024


Today's post is a brief review of Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs (DeepLab v1).

Convolutional neural networks for dense image labeling

Efficient dense sliding window feature extraction with the hole algorithm

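To extract dense features, the authors convert the fully connected (FC) layers of VGG-16 into convolutional layers. That alone still yields score maps that are too coarse, so they also use what is known as the hole algorithm, or atrous algorithm. It inserts zeros between the filter taps so that each filter covers a wider area of the input, without increasing the number of filter weights or the multiplications per output position.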
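To make the idea concrete, here is a minimal 1-D sketch in NumPy. It is not the paper's Caffe implementation: the function names `dilate_filter` and `atrous_conv1d` and the toy input are assumptions made for illustration. It shows that convolving with a zero-filled filter gives the same result as sampling the input with gaps, while the second form only multiplies by the original filter taps.

```python
import numpy as np

def dilate_filter(w, rate):
    """Insert (rate - 1) zeros between neighbouring filter taps."""
    w = np.asarray(w, dtype=float)
    out = np.zeros((len(w) - 1) * rate + 1)
    out[::rate] = w
    return out

def atrous_conv1d(x, w, rate):
    """'Valid' 1-D correlation that reads the input with gaps of `rate`.

    Only len(w) multiplications are done per output position,
    no matter how large `rate` is.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    span = (len(w) - 1) * rate + 1          # width covered by the filter
    n_out = len(x) - span + 1
    return np.array([np.dot(x[i:i + span:rate], w) for i in range(n_out)])

# Toy data (hypothetical values, chosen only for illustration).
x = np.arange(10, dtype=float)
w = np.array([1.0, 2.0, 3.0])

dense = np.correlate(x, dilate_filter(w, 2), mode="valid")  # zero-filled filter
sparse = atrous_conv1d(x, w, 2)                             # skip the zeros

print(dilate_filter(w, 2))         # [1. 0. 2. 0. 3.]
print(np.allclose(dense, sparse))  # True
```

With `rate=2` the three-tap filter spans five input positions, yet each output still costs three multiplications.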

Controlling the receptive field size and accelerating dense computation with convolutional nets

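In addition, the first FC layer, which has a 7x7 spatial kernel once converted to a convolution, is spatially subsampled to 4x4 or 3x3. This shrinks the network's receptive field and reduces computation time at the same time.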
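The subsampling can be sketched as plain decimation of the kernel's spatial grid. This is only an illustration under assumptions: the helper name `decimate_kernel`, the tiny channel counts, and the choice of which taps to keep are hypothetical, and the random weights stand in for the real VGG-16 parameters.

```python
import numpy as np

def decimate_kernel(weights, step):
    """Keep every `step`-th tap along both spatial axes.

    `weights` has shape (out_channels, in_channels, height, width).
    """
    return weights[:, :, ::step, ::step]

rng = np.random.default_rng(0)
# Stand-in for the first FC layer viewed as a 7x7 convolution.
# Channel counts are kept tiny here (hypothetical, for illustration only).
fc6 = rng.standard_normal((8, 4, 7, 7))

k4 = decimate_kernel(fc6, 2)   # taps 0, 2, 4, 6 -> 4x4 kernel
k3 = decimate_kernel(fc6, 3)   # taps 0, 3, 6    -> 3x3 kernel

print(k4.shape)  # (8, 4, 4, 4)
print(k3.shape)  # (8, 4, 3, 3)
print(fc6[0, 0].size, k4[0, 0].size, k3[0, 0].size)  # 49 16 9
```

Going from 49 taps to 16 or 9 per filter is where the saving in this layer comes from; the paper then applies the smaller kernel with holes, as in the atrous algorithm.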

Detailed boundary recovery: fully-connected conditional random fields and multi-scale prediction

Fully-connected conditional random fields for accurate localization

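Conventional conditional random fields (CRFs) were used to smooth noisy segmentation maps. The score maps produced by modern deep CNNs, however, are already fairly smooth compared with earlier methods, so what is needed is not a CRF aimed at smoothing but one that recovers detailed local structure. For this reason, the paper applies a fully connected CRF model to the network output.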
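The pairwise term of a fully connected CRF of this kind combines an appearance kernel (position and color) with a smoothness kernel (position only), and it only penalizes pixel pairs that receive different labels. The sketch below computes that term for a single pair of pixels. The function names and every weight and bandwidth value are assumptions for illustration, and the mean-field inference that actually minimizes the energy is not shown.

```python
import numpy as np

def pairwise_kernel(p_i, p_j, c_i, c_j,
                    w1=4.0, w2=3.0,
                    s_alpha=50.0, s_beta=5.0, s_gamma=3.0):
    """Two Gaussian kernels over pixel positions (p) and RGB colors (c).

    All default weights and bandwidths are hypothetical placeholders.
    """
    d_pos = np.sum((np.asarray(p_i, float) - np.asarray(p_j, float)) ** 2)
    d_col = np.sum((np.asarray(c_i, float) - np.asarray(c_j, float)) ** 2)
    appearance = w1 * np.exp(-d_pos / (2 * s_alpha ** 2)
                             - d_col / (2 * s_beta ** 2))
    smoothness = w2 * np.exp(-d_pos / (2 * s_gamma ** 2))
    return appearance + smoothness

def pairwise_potential(label_i, label_j, kernel_value):
    """Potts-style compatibility: pay the kernel value only if labels differ."""
    return kernel_value if label_i != label_j else 0.0

# Two pixels 10 px apart: once with identical colors, once with very different ones.
same_color = pairwise_kernel((0, 0), (10, 0), (200, 30, 30), (200, 30, 30))
diff_color = pairwise_kernel((0, 0), (10, 0), (200, 30, 30), (30, 30, 200))

print(same_color > diff_color)                  # True
print(pairwise_potential(1, 1, same_color))     # 0.0
print(pairwise_potential(1, 2, same_color) > 0) # True
```

Similar-looking pixels are strongly discouraged from taking different labels, while pixels on opposite sides of a color edge are barely coupled, which is how the CRF can sharpen object boundaries rather than blur them.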

Multi-scale prediction

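A two-layer MLP is attached to the input image and to the output of each of the first four max pooling layers. The resulting feature maps are concatenated with the model's last feature map, and the combined map is fed to the softmax. According to the paper, the improvement is not as dramatic as that of the fully connected CRF.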
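The fusion step amounts to a channel-wise concatenation followed by a per-pixel classifier and softmax. The following is a rough sketch with random arrays standing in for real features. The spatial size, the 32 main channels, the 21 classes, the 128 channels per side branch, and the random 1x1 classifier are all assumptions made for illustration, and the side-branch maps are assumed to have already been brought to the same resolution as the main map.

```python
import numpy as np

def softmax(z, axis=0):
    """Numerically stable softmax along `axis`."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
H, W, n_classes = 4, 4, 21               # hypothetical sizes

main = rng.standard_normal((32, H, W))   # stand-in for the last feature map
# Five side branches: input image + four max pooling outputs,
# each assumed to yield a 128-channel map at the same resolution.
side = [rng.standard_normal((128, H, W)) for _ in range(5)]

fused = np.concatenate([main] + side, axis=0)   # stack along channels

# Hypothetical 1x1 classifier applied independently at every pixel.
classifier = rng.standard_normal((n_classes, fused.shape[0])) * 0.01
scores = np.einsum("kc,chw->khw", classifier, fused)
probs = softmax(scores, axis=0)

print(fused.shape)                        # (672, 4, 4)
print(probs.shape)                        # (21, 4, 4)
print(np.allclose(probs.sum(axis=0), 1))  # True
```

The point of the concatenation is that the classifier at each pixel sees both the coarse, semantic features of the last layer and finer features from earlier layers.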

Experimental evaluation

As the reported results show, the model that combines multi-scale prediction, the fully connected CRF, and the atrous algorithm achieves the highest performance.

Closing thoughts

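This is the paper that kicks off the DeepLab series. I'm looking forward to seeing what methods the later versions use to improve performance. I first came across the concept of CRFs in a math context, so it is fascinating to see them used in actual ML like this.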
