[CV] Feature Pyramid Networks for Object Detection(FPN) review

๊ฐ•๋™์—ฐยท2022๋…„ 1์›” 18์ผ

🎈 This review was written with reference to the FPN paper and an existing review of it.

Key words

🎈 Feature Pyramid Networks
🎈 Bottom-up, Top-down Pathways
🎈 High/Low Resolution & Low/High-level Features
🎈 FPN with Faster R-CNN

Introduction

✔ Feature Pyramid Networks (FPN) propose a new paradigm that differs from earlier feature pyramids. FPN is not a standalone object detection model, so the paper presents it as part of Faster R-CNN.

โœ” ๊ธฐ์กด์˜ feature pyramids๋Š” Fig 1 (a)์™€ ๊ฐ™์€ ํ˜•ํƒœ๋ฅผ ๊ฐ€์ง€๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค. ์ด๋Ÿฌํ•œ feature pyramids๋Š” "scale-invariant" ๋ผ๋Š” ํŠน์ง•์„ ๊ฐ€์ง€๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค. "scale-invariant" ๋ผ๋Š” ๊ฒƒ์€ ๊ฐ์ฒด์˜ scale์ด ๋ณ€ํ•จ์—๋„ ๊ฐ์ฒด์˜ ํŠน์ง•์ด ๋ณ€ํ•˜์ง€ ์•Š๋Š”๋‹ค๋Š” ํŠน์ง•์ž…๋‹ˆ๋‹ค.

โœ” ๊ธฐ์กด์˜ feature pyramids๋Š” ๋„ˆ๋ฌด ๋ฌด๊ฒ๊ธฐ์— ์‚ฌ์šฉํ•˜๊ธฐ ์–ด๋ ต์Šต๋‹ˆ๋‹ค. ๊ทธ๊ฒƒ์— ๋Œ€์•ˆ์œผ๋กœ Fig 1 (b)์˜ ํ˜•ํƒœ์ธ ConvNets๊ฐ€ ์ œ์‹œ๋˜์—ˆ์Šต๋‹ˆ๋‹ค. ConvNets๋„ ์—ญ์‹œ ๋งค์šฐ robustํ•˜๋‹ค๋Š” ํŠน์ง•์„ ๊ฐ€์ง€๊ณ  ์žˆ์ง€๋งŒ, ๊ธฐ์กด์˜ pyramids๊ฐ€ ๋” ์ข‹์€ ์ •ํ™•๋„๋ฅผ ์˜ˆ์ธกํ•ฉ๋‹ˆ๋‹ค.

โœ” ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ConvNets ์—ญ์‹œ ๋‚ด์žฌ๋œ mutli-scale. ์ฆ‰, ํ”ผ๋ผ๋ฏธ๋“œ ํ˜•ํƒœ๋ผ๊ณ  ๋งํ•ฉ๋‹ˆ๋‹ค. ํ•˜์ง€๋งŒ High-resolution maps(=Low-level Feature)๊ณผ low-resolution maps(=High-level Feature)๋กœ ์ธํ•ด large sematic gap ์ฐจ์ด๊ฐ€ ๋‚ฉ๋‹ˆ๋‹ค. ๋ณธ ๋…ผ๋ฌธ์˜ ๋ชฉํ‘œ๋Š” ๋ฐฉ๊ธˆ ์–ธ๊ธ‰ํ•œ large sematic gap์„ ์ค„์—ฌ ๋ชจ๋“  scale์—์„œ strong semantic feature pyramid๋ฅผ ๋งŒ๋“œ๋Š” ๊ฒƒ์ด ๋ชฉํ‘œ์ž…๋‹ˆ๋‹ค. Fig 1 (d)๊ฐ€ ๋ชจ๋“  level์—์„œ rich semantic์„ ์ด๋ฃจ๊ฒŒ ํ•ด์ฃผ๋Š” ๋„คํŠธ์›Œํฌ์ž…๋‹ˆ๋‹ค.

โœ” ๋ณธ ํ”ผ๋ผ๋ฏธ๋“œ ๊ตฌ์กฐ๋Š” ๋ชจ๋“  scale์—์„œ end-to-end ๊ตฌ์กฐ๊ฐ€ ๊ฐ€๋Šฅํ•˜๊ณ , memory-infeasible ํ•ฉ๋‹ˆ๋‹ค.

๐ŸŽˆ ์œ„์˜ ์‚ฌ์ง„์€ High-resolution maps(=Low-level Feature)๊ณผ low-resolution maps(=High-level Feature)์— ๋Œ€ํ•ด ๋ณด์—ฌ์ฃผ๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค. Low-level Feature์—์„œ๋Š” ์„ ๊ณผ edge์™€ ๊ฐ™์€ ์ผ๋ถ€๋งŒ ํŒŒ์•…ํ•œ๋‹ค๋ฉด, High-level์—์„œ๋Š” ๋ฐ”ํ€ด์™€ ๊ฐ™์€ ๊ตฌ์ฒด์ ์ธ ๊ฐ์ฒด์— ๋Œ€ํ•ด ๋ถ„์„ํ•˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค.

Feature Pyramid Networks

โœ” FPN์€ single-scale์˜ ์ž„์˜ ์ด๋ฏธ์ง€๋ฅผ ์ž…๋ ฅ์œผ๋กœ ๋ฐ›๊ณ , ์‚ฌ์ด์ฆˆ์— ๋น„๋ก€ํ•œ multiple level์˜ feature map์„ ๋ฐ˜ํ™˜ํ•ฉ๋‹ˆ๋‹ค. ์ด ๊ณผ์ •์€ backbond Conv ๊ตฌ์กฐ์™€๋Š” ๋…๋ฆฝ์ ์ธ ๊ณผ์ •์ž…๋‹ˆ๋‹ค. ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ResNet์„ ์‚ฌ์šฉํ–ˆ์Šต๋‹ˆ๋‹ค. ๋ณธ ํ”ผ๋ผ๋ฏธ๋“œ ๊ตฌ์กฐ๋Š” bottom-up pathway, top-down pathway and lateral connection 3๊ฐ€์ง€ ๋ฐฉ๋ฒ•์œผ๋กœ ๊ตฌ์„ฑ๋˜์–ด์žˆ์Šต๋‹ˆ๋‹ค.

✔ The bottom-up pathway is the feed-forward computation of the backbone ConvNet. The paper calls layers that output maps of the same size one network stage, and takes the output of the last layer of each stage as the reference set of feature maps. This is a natural choice because the deepest layer of each stage has the strongest features.
✔ As shown in the figure below, the bottom-up pathway uses the output of the last residual block of each stage, denoted {C2,C3,C4,C5}.
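
To make the stage outputs concrete, here is a minimal sketch of the spatial size of each bottom-up map. It assumes the ResNet strides of 4, 8, 16 and 32 pixels for C2 to C5 and rounds sizes up; the helper name `bottom_up_sizes` is hypothetical and only for illustration.

```python
import math

# Assumed strides of the last residual block of each ResNet stage
# relative to the input image.
STRIDES = {"C2": 4, "C3": 8, "C4": 16, "C5": 32}


def bottom_up_sizes(height, width):
    """Return the (height, width) of each bottom-up feature map."""
    return {
        name: (math.ceil(height / stride), math.ceil(width / stride))
        for name, stride in STRIDES.items()
    }


if __name__ == "__main__":
    for name, size in bottom_up_sizes(224, 224).items():
        print(name, size)
```

Each stage halves the resolution of the previous one, which is why a 2x upsampling is enough to align neighboring levels in the top-down pathway.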

✔ The top-down pathway starts from the high-level features and works its way down through upsampling and lateral connections. A 1x1 conv layer first brings each bottom-up map to 256 channels. Upsampling the feature map of a pyramid level by a factor of 2 gives it the same size as the map one level below; nearest neighbor upsampling is used. The upsampled map and the 1x1-convolved map, which now agree in size and channels, are merged by element-wise addition. This is the lateral connection. Finally, a 3x3 conv layer is applied to each merged map to reduce the aliasing effect of upsampling, and the outputs are called {p2, p3, p4, p5}.

✔ Because FPN takes a single-scale input, it is more efficient than earlier pyramid models, and because it outputs multi-scale feature maps, it achieves high detection performance. High-resolution maps carry only low-level features, but their resolution preserves accurate location information. Element-wise addition combines this information with the semantically strong top-down maps, which narrows the semantic gap mentioned earlier.
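
The merge step of the top-down pathway can be sketched in plain Python. This is a simplified illustration, not the paper's implementation: each feature map is a single-channel 2D list, the inputs are assumed to have already passed through the 1x1 conv, and the final 3x3 smoothing conv is left out. The function names are hypothetical.

```python
def upsample_nearest_2x(fmap):
    """Nearest neighbor 2x upsampling: each value becomes a 2x2 block."""
    out = []
    for row in fmap:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def add_maps(a, b):
    """Element-wise addition of two maps of identical size."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def top_down_merge(laterals):
    """Merge lateral maps ordered from coarsest (C5) to finest (C2).

    Assumes each map is exactly twice the size of the previous one.
    Returns the merged maps in the same coarse-to-fine order.
    """
    merged = [laterals[0]]
    for lateral in laterals[1:]:
        merged.append(add_maps(upsample_nearest_2x(merged[-1]), lateral))
    return merged


if __name__ == "__main__":
    c5 = [[10]]                # coarsest map, 1x1
    c4 = [[1, 2], [3, 4]]      # one level below, 2x2
    p5, p4 = top_down_merge([c5, c4])
    print(p4)                  # [[11, 12], [13, 14]]
```

In the toy example every position of the finer map receives the same coarse value, which shows how semantic information from the top is broadcast down while the finer map keeps its own spatial detail.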

Applications

Feature Pyramid Networks for RPN

✔ The RPN is a small network that slides over 3x3 windows of the feature map to extract region proposals; see the Faster R-CNN paper for details. Its object/non-object classification and bounding box regression are implemented as a 3x3 conv layer followed by two sibling 1x1 conv layers, one per task. The paper calls this the "head".

✔ First, FPN is run on the backbone network to produce {p2,p3,p4,p5,p6}, whose anchors have areas of {32^2, 64^2, 128^2, 256^2, 512^2} pixels respectively. The original RPN takes a single-scale feature map as input, whereas with FPN the head is applied to feature maps at multiple scales. With 5 scales and 3 aspect ratios {1:2, 1:1, 2:1}, there are 15 anchors in total across the pyramid.
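
The anchor count can be checked with a short sketch. It assumes one anchor area per pyramid level, as listed above, and reads each aspect ratio as height divided by width, which is a convention chosen here for illustration. The names `ANCHOR_AREAS` and `anchor_shapes` are hypothetical.

```python
import math

# One anchor area (in pixels) per pyramid level, as listed above.
ANCHOR_AREAS = {"p2": 32**2, "p3": 64**2, "p4": 128**2, "p5": 256**2, "p6": 512**2}

# Aspect ratios 1:2, 1:1, 2:1, read here as height / width (an assumption).
ASPECT_RATIOS = (0.5, 1.0, 2.0)


def anchor_shapes():
    """Return {level: [(width, height), ...]} with one shape per aspect ratio."""
    shapes = {}
    for level, area in ANCHOR_AREAS.items():
        shapes[level] = []
        for ratio in ASPECT_RATIOS:
            width = math.sqrt(area / ratio)   # keeps width * height == area
            height = width * ratio
            shapes[level].append((width, height))
    return shapes


if __name__ == "__main__":
    shapes = anchor_shapes()
    print(sum(len(v) for v in shapes.values()))  # 15
    print(shapes["p2"])
```

Because each level handles a single scale, only the 3 aspect ratios vary per location on a given level, and the 15 anchors arise from the 5 levels together.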

โœ” ์ดํ›„ ์•ž์—์„œ ์–ธ๊ธ‰ํ•œ "head"์˜ ๊ณผ์ •์„ ๊ฑฐ์น˜๊ณ , NMS ๊ณผ์ •๊นŒ์ง€ ๊ฑฐ์ฒ˜ top 1000 region proposals์„ ๋ฐ˜ํ™˜ํ•ฉ๋‹ˆ๋‹ค.

Feature Pyramid Networks for Fast R-CNN

✔ RoI pooling is performed on the 1000 region proposals. Unlike the original Fast R-CNN, which pools from a single feature map, Faster R-CNN with FPN uses multi-scale feature maps, so each region proposal has to be matched to one of them.

โœ” ์œ„์˜ ์‹์— ๋”ฐ๋ผ์„œ K๋ฒˆ์งธ feature map๊ณผ region proposals์„ ๋งค์นญํ•ฉ๋‹ˆ๋‹ค. w,h๋Š” region proposal์˜ width, height์— ํ•ด๋‹นํ•˜๋ฉด , k๋Š” ํ”ผ๋ผ๋ฏธ๋“œ์˜ level์˜ index, k0๋Š” target level์„ ์˜๋ฏธํ•ฉ๋‹ˆ๋‹ค. ๋…ผ๋ฌธ์—์„œ ๋งํ•˜๊ธธ "์ง๊ด€์ ์œผ๋กœ region proposal์˜ scale์ด ์ž‘์•„์งˆ ์ˆ˜๋ก High-resolution feature map์— ํ• ๋‹นํ•œ๋‹ค(๋‚ฎ์€ ํ”ผ๋ผ๋ฏธ๋“œ level)"๊ณ  ๋งํ•ฉ๋‹ˆ๋‹ค.

✔ The fixed-size feature maps obtained by RoI pooling are fed to the detection head for training, and NMS is applied to produce the final output. According to the paper, with a ResNet backbone, FPN improves the Average Recall (AR) of the region proposals by 8.0 points and the COCO-style AP of Faster R-CNN by 2.3 points over the single-scale baseline without FPN.

