[Object Detection] Neck

HipJaengYiCat · May 4, 2023

DeepLearning


This lecture covers the Neck, which connects the Backbone that extracts features from an image to the Region Proposal Network (RPN). We look at why a Neck is needed and skim the main types of Neck (FPN, PANet, DetectoRS, BiFPN, NASFPN, AugFPN). For each structure, we study the idea it started from, what the structure looks like, and how it is implemented.

Why is a Neck needed?

To detect objects of various sizes better

• Low-level features are semantically weak, so they need to exchange information with high-level features, which are semantically stronger

What kinds of Neck are there?

FPN, PANet, DetectoRS, BiFPN, NASFPN, AugFPN

FPN

Semantic information needs to be passed from the high level to the low level,
so a top-down pathway is added.
• The pyramid structure passes high-level information down to the low levels step by step
• Low level = Early stage = Bottom
• High level = Late stage = Top

[Figure: FPN structure with the bottom-up pathway, the top-down pathway, and lateral connections]

** Upsampling in the top-down pathway: Nearest Neighbor Upsampling

Contribution

• Designed to detect objects at multiple scales
• To achieve this, features of several different sizes have to be used

Summary

• The bottom-up pathway (backbone) extracts feature maps of various sizes
• A top-down pathway is used to exchange semantics between the feature maps of various sizes
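
The top-down pathway with lateral connections can be sketched roughly as below. This is a minimal NumPy sketch, not the reference implementation: the random matrices stand in for learned 1x1 lateral convolutions, the 3x3 convolution that FPN applies after each merge is left out, and the names `lateral_1x1`, `upsample_nearest_2x`, and `fpn_top_down` are made up for illustration.

```python
import numpy as np

def lateral_1x1(x, w):
    """1x1 convolution: mixes channels at each pixel. x: (C_in, H, W), w: (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", w, x)

def upsample_nearest_2x(x):
    """Nearest neighbor upsampling by 2: each pixel is copied into a 2x2 block."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def fpn_top_down(c_feats, lateral_ws):
    """c_feats: backbone feature maps ordered low level -> high level."""
    laterals = [lateral_1x1(c, w) for c, w in zip(c_feats, lateral_ws)]
    p_feats = [laterals[-1]]                     # the top level starts the pyramid
    for lat in reversed(laterals[:-1]):          # walk from top to bottom
        p_feats.append(lat + upsample_nearest_2x(p_feats[-1]))
    return p_feats[::-1]                         # back to low -> high order

# Hypothetical toy backbone outputs: channels grow while resolution halves.
rng = np.random.default_rng(0)
channels, sizes, out_ch = [8, 16, 32], [16, 8, 4], 4
c_feats = [rng.standard_normal((c, s, s)) for c, s in zip(channels, sizes)]
lateral_ws = [rng.standard_normal((out_ch, c)) for c in channels]

p_feats = fpn_top_down(c_feats, lateral_ws)
print([p.shape for p in p_feats])
```

Every output level ends up with the same channel count, and each lower level receives the upsampled information from the level above it.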

PANet

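When the backbone is a ResNet, a low-level feature map has to pass through many layers along the bottom-up path before it reaches the top level, so the low-level features may not be delivered properly.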

Bottom-up Path Augmentation
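
A rough sketch of the idea, assuming the FPN outputs are already available: a second, short bottom-up path is added on top of them so that low-level information reaches the top in only a few steps. In this sketch, plain stride-2 subsampling stands in for the learned stride-2 convolution, and `downsample_2x` and `bottom_up_augmentation` are illustrative names, not PANet's actual code.

```python
import numpy as np

def downsample_2x(x):
    """Stand-in for a stride-2 convolution: keeps every second pixel. x: (C, H, W)."""
    return x[:, ::2, ::2]

def bottom_up_augmentation(p_feats):
    """p_feats: FPN outputs ordered low level -> high level."""
    n_feats = [p_feats[0]]                       # the lowest new level is the lowest FPN level
    for p in p_feats[1:]:
        # Downsample the previous new level and merge it with the FPN level above.
        n_feats.append(downsample_2x(n_feats[-1]) + p)
    return n_feats

# Toy FPN outputs filled with 1.0, 2.0, 3.0 so the accumulation is easy to follow.
p_feats = [np.full((2, s, s), float(i + 1)) for i, s in enumerate([8, 4, 2])]
n_feats = bottom_up_augmentation(p_feats)
print([n.shape for n in n_feats])
print([float(n[0, 0, 0]) for n in n_feats])
```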

DetectoRS

Motivation
• Looking and thinking twice
• Region proposal networks (RPN)
• Cascade R-CNN

Recursive Feature Pyramid (RFP)

ASPP

A method for enlarging the receptive field
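
To see why atrous (dilated) convolution enlarges the receptive field, here is a small pure-Python sketch. The dilation rates are example values picked for illustration, and `effective_kernel_size` and `atrous_conv1d` are hypothetical helper names. The number of weights stays the same while the span they cover grows with the rate.

```python
def effective_kernel_size(kernel_size, dilation):
    """Span covered by a dilated kernel: k + (k - 1) * (dilation - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)

def atrous_conv1d(signal, kernel, dilation):
    """1D atrous convolution without padding: kernel taps are `dilation` samples apart."""
    span = effective_kernel_size(len(kernel), dilation)
    return [
        sum(w * signal[start + i * dilation] for i, w in enumerate(kernel))
        for start in range(len(signal) - span + 1)
    ]

# A 3-tap kernel covers a wider span as the rate grows.
spans = {rate: effective_kernel_size(3, rate) for rate in (1, 3, 6, 12)}
print(spans)

# With dilation 2, three taps cover a span of 5 input samples.
out = atrous_conv1d(list(range(10)), [1, 1, 1], dilation=2)
print(out)
```

ASPP runs several such convolutions with different rates in parallel and combines their outputs.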

Switchable Atrous Convolution (SAC)

BiFPN

Introduced in EfficientDet: Scalable and Efficient Object Detection
Bi-directional Feature Pyramid

Weighted Feature Fusion

• Instead of a plain summation as in FPN, each feature is given its own weight before the summation
• The model size barely increases
• The per-feature weights emphasize the important features, which improves performance
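
The weighted fusion can be sketched as follows, using the fast normalized fusion form from the EfficientDet paper (non-negative weights divided by their sum plus a small epsilon). The weight values here are made up; in the model they are learned scalars, one per input feature, which is why the model size barely grows.

```python
import numpy as np

def fast_normalized_fusion(features, weights, eps=1e-4):
    """Weighted sum of same-shaped feature maps with normalized, non-negative weights."""
    w = np.maximum(np.asarray(weights, dtype=float), 0.0)   # ReLU keeps every weight >= 0
    w = w / (w.sum() + eps)                                  # normalize without a softmax
    return sum(wi * f for wi, f in zip(w, features))

a = np.full((1, 2, 2), 1.0)
b = np.full((1, 2, 2), 3.0)

plain_sum = a + b                                            # FPN-style summation
fused = fast_normalized_fusion([a, b], weights=[1.0, 3.0])   # b counts three times as much as a
print(plain_sum[0, 0, 0], fused[0, 0, 0])
```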

NASFPN

Proposed in NAS-FPN: Learning Scalable Feature Pyramid Architecture for Object Detection
• Existing FPN and PANet: top-down or bottom-up pathway
• Is there a better way than a simple one-directional (top->bottom or bottom->top) summation?
• If so, let's find the FPN architecture through NAS (neural architecture search)!

Summary

• The architecture was searched on the COCO dataset with a ResNet backbone, so it is not general-purpose
• Requires many parameters
• High search cost
• Finding the best-performing architecture for a different dataset or backbone incurs a new search cost

AugFPN

Proposed in AugFPN: Improving Multi-scale Feature Learning for Object Detection
• Problems in FPN
  • Semantic gap between features at different levels
  • Information loss in the highest-level feature map
  • RoI features are generated from only one feature map

• Main components
  • Consistent Supervision
  • Residual Feature Augmentation
  • Soft RoI Selection

Residual Feature Augmentation

• Ratio-invariant Adaptive Pooling
• Generates feature maps at various scales
• 256 channels

Upsample them to the same size
Adaptive Spatial Fusion: a method that weights N features and sums them
Concatenate the 3 feature maps and compute values of shape N x (1 x h x w)
Here, N x (1 x h x w) represents the spatial weights
Multiply N x (1 x h x w) with each of the N features to take the weighted summation
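
The shapes in Adaptive Spatial Fusion can be traced with a small NumPy sketch. It assumes the N feature maps have already been upsampled to the same size. The random matrix `proj` is a hypothetical stand-in for the learned layers that produce the weights, and squashing the weights with a sigmoid is an assumption of this sketch.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def adaptive_spatial_fusion(features, proj):
    """features: list of N arrays (C, h, w); proj: (N, N*C) stand-in for learned weights."""
    stacked = np.stack(features)                      # (N, C, h, w)
    concat = np.concatenate(features, axis=0)         # (N*C, h, w): the concatenated input
    logits = np.einsum("nk,khw->nhw", proj, concat)   # one score map per feature: (N, h, w)
    weights = sigmoid(logits)[:, None]                # spatial weights: (N, 1, h, w)
    fused = (weights * stacked).sum(axis=0)           # weighted summation: (C, h, w)
    return fused, weights

rng = np.random.default_rng(0)
n, c, h, w = 3, 4, 5, 6
features = [rng.standard_normal((c, h, w)) for _ in range(n)]
proj = rng.standard_normal((n, n * c))

fused, weights = adaptive_spatial_fusion(features, proj)
print(weights.shape, fused.shape)
```

Each of the N features gets its own 1 x h x w weight map, so the mix can differ from pixel to pixel.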

Soft RoI Selection

• Computing RoI features from a single feature map, as FPN does, is sub-optimal
• PANet addressed this by using all feature maps, but it fuses them with max pooling, which can lose information
• Soft RoI Selection was designed to address this
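
The difference between the two fusion styles can be illustrated with a toy example. It assumes the RoI features have already been pooled from every pyramid level into the same shape. The `logits` are hypothetical placeholders for learned weight scores, and normalizing them with a softmax across levels is an assumption of this sketch rather than the paper's exact formulation.

```python
import numpy as np

def max_fusion(roi_feats):
    """Max-style fusion: keep only the largest response across levels."""
    return roi_feats.max(axis=0)

def soft_selection(roi_feats, logits):
    """Weighted fusion: every level contributes according to its weight."""
    e = np.exp(logits - logits.max(axis=0, keepdims=True))
    weights = e / e.sum(axis=0, keepdims=True)        # weights sum to 1 across levels
    return (weights * roi_feats).sum(axis=0)

# RoI features from 3 levels, shape (levels, C, h, w), filled with 1, 2, 4.
roi_feats = np.stack([np.full((1, 2, 2), v) for v in (1.0, 2.0, 4.0)])
logits = np.zeros_like(roi_feats)                     # equal scores -> equal weights

hard = max_fusion(roi_feats)
soft = soft_selection(roi_feats, logits)
print(hard[0, 0, 0], soft[0, 0, 0])
```

With max fusion the two lower responses are discarded entirely, while the weighted version keeps a contribution from every level.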