[CV] Faster R-CNN review

강동연 · January 14, 2022

🎈 This review was written with reference to the Faster R-CNN paper and several other reviews of it. The references I consulted are listed at the end of the post.

Figure: Structure of Faster R-CNN

Key Words

๐ŸŽˆ RPN
๐ŸŽˆ Anchor box
🎈 Shared features
🎈 Time/cost efficiency
๐ŸŽˆ Fast R-CNN
๐ŸŽˆ Loss Function

Introduction

✔ Region-based CNNs (R-CNN) have recently been successful in object detection. Fast R-CNN went further than R-CNN and achieved near real-time rates, but only when the time spent on region proposals is excluded. So the real question is how to remove the region-proposal bottleneck. The biggest problem with region proposals is that methods such as Selective Search run on the CPU. The fix then seems simple: if region proposals could be computed on the GPU, the bottleneck would go away.

✔ The solution proposed in the paper is the RPN (Region Proposal Network), which is the core algorithm of the paper. Because the RPN shares its conv layers with Fast R-CNN, computing proposals adds very little time.

✔ The paper describes Faster R-CNN as RPN + Fast R-CNN, and states that the proposed method improves both the accuracy and the cost efficiency of object detection.

Faster R-CNN

  • Two components of Faster R-CNN
    ✔ A deep fully convolutional network that proposes regions
    ✔ A Fast R-CNN detector that uses the proposed regions

Region Proposal Networks

** herbwood's velog: a post that explains the RPN very clearly. I recommend it if anything here is unclear.

  • Input: an image (of any size)
  • Output: a set of rectangular object proposals, each with an objectness score
  • Because the RPN has to share computation with Fast R-CNN, it is built as a fully convolutional network.

Anchors

✔ Each position of the CNN feature map looks at the image through a window of fixed size, so on its own it cannot handle objects of many different sizes and shapes. To address this, the RPN generates, as in the figure above, k anchor boxes at every position with 3 different scales and 3 aspect ratios. The paper uses k = 9 anchor boxes (box areas of 128², 256² and 512² pixels with aspect ratios 1:1, 1:2 and 2:1). This gives W × H × k anchors in total (W × H = feature map size).
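A minimal anchor-generation sketch in plain Python, assuming the paper's default areas (128², 256², 512²), aspect ratios expressed as height/width of 0.5, 1 and 2, and a feature stride of 16 (VGG-16). Placing each anchor center at `(x + 0.5) * stride` is an assumption, since implementations differ in this offset, and the function names are hypothetical.

```python
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in image pixels


def base_anchor_shapes(scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)) -> List[Tuple[float, float]]:
    """Return k = len(scales) * len(ratios) (width, height) pairs.

    ratio = height / width, and every anchor keeps the area scale ** 2.
    """
    shapes = []
    for s in scales:
        for r in ratios:
            w = s / math.sqrt(r)
            h = s * math.sqrt(r)
            shapes.append((w, h))
    return shapes


def generate_anchors(feat_w: int, feat_h: int, stride: int = 16,
                     scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)) -> List[Box]:
    """Place k anchors at every feature-map position, giving feat_w * feat_h * k boxes."""
    shapes = base_anchor_shapes(scales, ratios)
    anchors: List[Box] = []
    for y in range(feat_h):
        for x in range(feat_w):
            # Assumed convention: anchor center = center of the stride x stride cell.
            cx, cy = (x + 0.5) * stride, (y + 0.5) * stride
            for w, h in shapes:
                anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


anchors = generate_anchors(4, 3)
print(len(anchors))  # 4 * 3 * 9 = 108
```

Keeping the area fixed while varying the ratio is what lets one scale cover both tall and wide objects at a single position.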

✔ Because of the anchors, at each sliding-window position the reg layer outputs 4k values (box coordinates for each anchor) and the cls layer outputs 2k scores (object / not-object for each anchor).
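The sketch below shows how the 2k and 4k numbers per position map back to individual anchors, and how large the outputs get over a whole feature map. The channel layout (anchor `a` owning `cls_out[2a:2a+2]` and `reg_out[4a:4a+4]`) is an assumption for illustration; actual implementations may order the channels differently. Function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def split_location_outputs(cls_out: Sequence[float], reg_out: Sequence[float],
                           k: int = 9) -> List[Tuple[Tuple[float, ...], Tuple[float, ...]]]:
    """Split one position's 2k cls scores and 4k reg values into k per-anchor pairs.

    Assumed layout: anchor a owns cls_out[2a:2a+2] (object / not-object)
    and reg_out[4a:4a+4] (tx, ty, tw, th).
    """
    if len(cls_out) != 2 * k or len(reg_out) != 4 * k:
        raise ValueError("expected 2k cls scores and 4k reg values")
    return [(tuple(cls_out[2 * a:2 * a + 2]), tuple(reg_out[4 * a:4 * a + 4]))
            for a in range(k)]


def total_output_sizes(feat_w: int, feat_h: int, k: int = 9) -> dict:
    """Anchor count and total cls/reg output sizes over a whole W x H feature map."""
    n = feat_w * feat_h
    return {"anchors": n * k, "cls": n * 2 * k, "reg": n * 4 * k}


# Roughly the paper's example: a ~1000 x 600 image gives a ~60 x 40 feature map.
print(total_output_sizes(60, 40))  # {'anchors': 21600, 'cls': 43200, 'reg': 86400}
```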

✔ Each anchor is assigned one of two labels.

  • Positive: an anchor whose IoU with any ground-truth box is higher than 0.7, or, for each ground-truth box, the anchor with the highest IoU.
  • Negative: an anchor whose IoU is lower than 0.3 for every ground-truth box (and that is not positive).
  • All remaining anchors are ignored during training.
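A minimal plain-Python sketch of this labeling rule, assuming boxes in (x1, y1, x2, y2) format; `iou` and `label_anchors` are hypothetical names. When several anchors tie for the highest IoU with a ground-truth box, the paper marks all of them positive; this sketch keeps only one for brevity.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def label_anchors(anchors: Sequence[Box], gt_boxes: Sequence[Box],
                  pos_thresh: float = 0.7, neg_thresh: float = 0.3) -> List[int]:
    """Return 1 (positive), 0 (negative) or -1 (ignored) for each anchor."""
    if not anchors:
        return []
    labels = [-1] * len(anchors)
    ious = [[iou(a, g) for g in gt_boxes] for a in anchors]
    # Threshold rule: best IoU over all GT boxes decides positive / negative.
    for i, row in enumerate(ious):
        best = max(row, default=0.0)
        if best > pos_thresh:
            labels[i] = 1
        elif best < neg_thresh:
            labels[i] = 0
    # Per-GT rule: the anchor with the highest IoU for each GT box is positive,
    # even if that IoU is below pos_thresh.
    for j in range(len(gt_boxes)):
        best_i = max(range(len(anchors)), key=lambda i: ious[i][j])
        labels[best_i] = 1
    return labels


print(label_anchors([(0, 0, 10, 10), (0, 0, 10, 20), (100, 100, 110, 110), (200, 200, 250, 250)],
                    [(0, 0, 10, 10), (200, 200, 260, 260)]))  # [1, -1, 0, 1]
```

In the usage example the last anchor has an IoU of only about 0.69, yet it becomes positive through the per-GT rule, which is why the paper adds that rule: some ground-truth boxes have no anchor above 0.7.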

RPN Process

(1). Take the feature map (h × w × 512) produced by the pre-trained VGG backbone and apply a 3 × 3 conv layer. Padding is added so that the feature map keeps its spatial size.

(2). Add a separate 1 × 1 conv for each of the two sibling layers, bbox regression and classification. The number of output channels is set to 2 × 9 = 18 for the classification layer and 4 × 9 = 36 for the bbox regression layer.

(3). As a result, W × H × k region proposals are produced (W × H = feature map size). The predicted proposals then go through non-maximum suppression (NMS) and RoI sampling before being passed to Fast R-CNN.
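The NMS step in (3) can be sketched as greedy suppression over the proposals sorted by objectness score. The paper uses an IoU threshold of 0.7 for RPN proposals; the optional `top_n` cap and the function names are assumptions for illustration, and RoI sampling is not shown.

```python
from typing import List, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float],
        iou_thresh: float = 0.7, top_n: Optional[int] = None) -> List[int]:
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping it too much, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep box i only if it does not overlap any already-kept box above the threshold.
        if all(iou(boxes[i], boxes[j]) <= iou_thresh for j in keep):
            keep.append(i)
            if top_n is not None and len(keep) >= top_n:
                break
    return keep


boxes = [(0, 0, 10, 10), (0, 0, 10, 11), (50, 50, 60, 60)]
print(nms(boxes, [0.9, 0.8, 0.7]))  # [0, 2]: box 1 overlaps box 0 with IoU ~0.91
```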

Multi-task Loss Function

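$i$: index of an anchor in the mini-batch
$p_i$: predicted probability that anchor $i$ contains an object
$p^*_i$: ground-truth label, 1 if the anchor is positive and 0 if it is negative
$t_i$: the 4 parameterized coordinates of the predicted bounding box
$t^*_i$: the parameterized coordinates of the ground-truth box associated with a positive anchor
$L_{cls}$: classification loss (log loss over two classes, object vs. not object)
$L_{reg}$: regression loss, $L_{reg}(t_i, t^*_i) = R(t_i - t^*_i)$, where $R$ is the smooth L1 loss
$N_{cls}$: normalization by the mini-batch size (256 in the paper)
$N_{reg}$: normalization by the number of anchor locations (about 2,400 in the paper)
$\lambda$: balance parameter between the two terms (10 by default in the paper)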
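Putting these symbols together, the RPN loss from the paper reads as follows; the factor $p^*_i$ turns the regression term on only for positive anchors. In the box parameterization, $x, y, w, h$ are a box's center and size, subscript $a$ marks the anchor and superscript $*$ the ground truth; the smooth L1 function is the one defined in the Fast R-CNN paper.

```latex
L(\{p_i\}, \{t_i\}) = \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p^*_i)
  + \lambda \frac{1}{N_{reg}} \sum_i p^*_i \, L_{reg}(t_i, t^*_i)

t_x = \frac{x - x_a}{w_a}, \quad t_y = \frac{y - y_a}{h_a}, \quad
t_w = \log\frac{w}{w_a}, \quad t_h = \log\frac{h}{h_a}

t^*_x = \frac{x^* - x_a}{w_a}, \quad t^*_y = \frac{y^* - y_a}{h_a}, \quad
t^*_w = \log\frac{w^*}{w_a}, \quad t^*_h = \log\frac{h^*}{h_a}

L_{reg}(t_i, t^*_i) = \sum_{j \in \{x, y, w, h\}} \mathrm{smooth}_{L_1}(t_{i,j} - t^*_{i,j}), \qquad
\mathrm{smooth}_{L_1}(z) = \begin{cases} 0.5\, z^2 & \text{if } |z| < 1 \\ |z| - 0.5 & \text{otherwise} \end{cases}
```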

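✔ The classification term only decides whether an anchor contains an object or not, while the bbox regression term learns to move each positive anchor toward its ground-truth box; because of the $p^*_i$ factor, negative anchors contribute nothing to the regression loss. (See the Fast R-CNN paper for details.)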
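The following plain-Python sketch computes the box parameterization and the multi-task loss for a tiny mini-batch. It assumes boxes in (x1, y1, x2, y2) format and writes the two-class log loss with a single object probability `p`; the small `eps` guarding `log(0)` and all function names are assumptions for illustration.

```python
import math
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]          # (x1, y1, x2, y2)
Delta = Tuple[float, float, float, float]        # (tx, ty, tw, th)


def _center_size(b: Box) -> Tuple[float, float, float, float]:
    w, h = b[2] - b[0], b[3] - b[1]
    return b[0] + w / 2, b[1] + h / 2, w, h


def encode(box: Box, anchor: Box) -> Delta:
    """Parameterize a box relative to an anchor: (tx, ty, tw, th)."""
    x, y, w, h = _center_size(box)
    xa, ya, wa, ha = _center_size(anchor)
    return ((x - xa) / wa, (y - ya) / ha, math.log(w / wa), math.log(h / ha))


def smooth_l1(z: float) -> float:
    """0.5 z^2 if |z| < 1, else |z| - 0.5."""
    return 0.5 * z * z if abs(z) < 1 else abs(z) - 0.5


def rpn_loss(p: Sequence[float], p_star: Sequence[int], t: Sequence[Delta],
             t_star: Sequence[Delta], n_reg: float, lam: float = 10.0) -> float:
    """Multi-task loss: mean log loss + lam * (positive-only smooth L1) / n_reg."""
    eps = 1e-12  # assumption: avoids log(0) for saturated probabilities
    n_cls = len(p)
    l_cls = sum(-(ps * math.log(pi + eps) + (1 - ps) * math.log(1 - pi + eps))
                for pi, ps in zip(p, p_star)) / n_cls
    # p_star acts as a switch: only positive anchors contribute to regression.
    l_reg = sum(ps * sum(smooth_l1(a - b) for a, b in zip(ti, tsi))
                for ps, ti, tsi in zip(p_star, t, t_star)) / n_reg
    return l_cls + lam * l_reg


print(encode((10, 0, 20, 10), (0, 0, 10, 10)))  # (1.0, 0.0, 0.0, 0.0)
```

With `n_reg` around 2,400 and `lam = 10`, the paper notes that the two terms end up roughly equally weighted, which is the purpose of the $\lambda$ balance parameter.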

Training RPNs

** herbwood's velog: a post that explains the training process very clearly. I recommend it if anything here is unclear.

✔ The RPN can be trained end-to-end with backpropagation and stochastic gradient descent (SGD).

✔ Each mini-batch consists of 256 anchors randomly sampled from a single image, with positives and negatives in a ratio of up to 1:1. If an image has fewer than 128 positive anchors, the mini-batch is padded with negatives.
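A minimal sketch of this sampling step, assuming labels in the 1 / 0 / -1 (positive / negative / ignored) form used above; `sample_minibatch` is a hypothetical name and the seeded `random.Random` is only there to make runs reproducible.

```python
import random
from typing import List, Optional, Sequence, Tuple


def sample_minibatch(labels: Sequence[int], batch_size: int = 256,
                     pos_fraction: float = 0.5,
                     rng: Optional[random.Random] = None) -> Tuple[List[int], List[int]]:
    """Pick anchor indices for one image: up to batch_size * pos_fraction positives,
    then fill the rest of the batch with negatives. Ignored anchors (-1) are never used."""
    rng = rng or random.Random()
    pos = [i for i, lab in enumerate(labels) if lab == 1]
    neg = [i for i, lab in enumerate(labels) if lab == 0]
    n_pos = min(len(pos), int(batch_size * pos_fraction))
    n_neg = min(len(neg), batch_size - n_pos)  # pad with negatives if positives are scarce
    return rng.sample(pos, n_pos), rng.sample(neg, n_neg)


pos, neg = sample_minibatch([1] * 10 + [0] * 1000 + [-1] * 50, rng=random.Random(0))
print(len(pos), len(neg))  # 10 246
```

Capping the positive share matters because negatives vastly outnumber positives; sampling all anchors would bias the classifier toward "not object".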

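✔ New layers are initialized with weights drawn from a zero-mean Gaussian with standard deviation 0.01 (the other layers come from the ImageNet-pre-trained model); momentum is set to 0.9 and weight decay to 0.0005.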
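A plain-Python sketch of these settings: Gaussian initialization for a new layer and one SGD step with momentum and weight decay. The update form (weight decay added to the gradient, then a momentum velocity update) is an assumed Caffe-style formulation, and the function names are hypothetical.

```python
import random
from typing import List


def init_new_layer(n: int, std: float = 0.01, seed: int = 0) -> List[float]:
    """Draw n weights from a zero-mean Gaussian with the given standard deviation."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, std) for _ in range(n)]


def sgd_momentum_step(w: List[float], grad: List[float], velocity: List[float],
                      lr: float, momentum: float = 0.9,
                      weight_decay: float = 0.0005) -> None:
    """One in-place update (assumed form):
    v <- momentum * v - lr * (grad + weight_decay * w);  w <- w + v
    """
    for i in range(len(w)):
        g = grad[i] + weight_decay * w[i]  # L2 weight decay folded into the gradient
        velocity[i] = momentum * velocity[i] - lr * g
        w[i] += velocity[i]


w, v = init_new_layer(4), [0.0] * 4
sgd_momentum_step(w, [0.1] * 4, v, lr=0.001)  # 0.001: the paper's initial rate on PASCAL VOC
```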

Sharing Features for RPN and Fast R-CNN

✔ If the RPN and Fast R-CNN are trained independently, each modifies its conv layers in a different way, so a procedure is needed that lets the two networks share their conv layers. It is carried out as follows.

4-Step Alternating Training

(1). Train the RPN, initialized from an ImageNet-pre-trained model.

(2). Train the detection network, Fast R-CNN, using the proposals generated by the RPN from step 1. Fast R-CNN is also initialized from an ImageNet-pre-trained model; at this point the two networks do not yet share conv layers.

(3). Initialize RPN training with the detection network from step 2. The shared conv layers are kept fixed, and only the layers unique to the RPN are fine-tuned. From here on, the two networks share their conv layers.

(4). With the shared conv layers kept fixed, fine-tune only the layers unique to Fast R-CNN. The two networks now form a single unified network.
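The four steps can be summarized as a schedule of which parameter groups are updated or frozen at each stage. This is a hypothetical bookkeeping sketch: the group names `conv_rpn`, `conv_det`, `rpn_head` and `det_head` are illustrative only, with `conv_det` becoming the shared conv layers from step 3 onward.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Stage:
    step: int
    trains: str                 # network being trained in this step
    init_from: str              # where its weights come from
    frozen: FrozenSet[str]      # parameter groups kept fixed
    updated: FrozenSet[str]     # parameter groups that receive gradients


def alternating_training_schedule() -> List[Stage]:
    """Hypothetical summary of the 4-step alternating training."""
    return [
        Stage(1, "RPN", "ImageNet pre-trained model",
              frozenset(), frozenset({"conv_rpn", "rpn_head"})),
        Stage(2, "Fast R-CNN", "ImageNet pre-trained model + step-1 proposals",
              frozenset(), frozenset({"conv_det", "det_head"})),
        # From here on the detector's conv layers are shared and frozen.
        Stage(3, "RPN", "step-2 detector",
              frozenset({"conv_det"}), frozenset({"rpn_head"})),
        Stage(4, "Fast R-CNN", "shared conv from step 3",
              frozenset({"conv_det"}), frozenset({"det_head"})),
    ]


for s in alternating_training_schedule():
    print(s.step, s.trains, "updates:", sorted(s.updated), "frozen:", sorted(s.frozen))
```

Freezing `conv_det` in steps 3 and 4 is what guarantees both heads end up reading the same features, so a single forward pass through the shared convs serves both at test time.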


Reference
