[CV] Feature Pyramid Networks for Object Detection(RetinaNet) review

강동연 · January 29, 2022

[Paper review]


🎈 This review was written with reference to the RetinaNet paper and an existing review of it.

Key Words

🎈 Focal Loss
🎈 One-Stage

Introduction

👨‍🏫 RetinaNet is a one-stage detector built around Focal Loss. Its key idea is to use focal loss to shrink the loss contribution of "easy negatives" so that "hard negatives" contribute relatively more, which is what gives the network its strong performance.

✔ Recent SOTA detectors are built on a two-stage design, with the R-CNN family as the representative example. The first stage generates candidate object locations (e.g. RPN, Selective Search), and the second stage classifies each candidate location as one of the foreground classes or as background.

✔ The paper introduces a one-stage detector whose COCO AP is on par with two-stage detectors such as FPN and Mask R-CNN. To get there, the authors identify the class imbalance that arises while training one-stage detectors as the main obstacle and address it with a new loss function.

✔ Two-stage detectors deal with class imbalance through a two-stage cascade and sampling heuristics. A one-stage detector, on the other hand, has to process a large set of candidate locations (~100k), and this is what causes the class imbalance. Concretely, only a few of those candidates contain an actual object in the image while most of them cover background, so the training signal is heavily skewed toward background.

✔ The paper proposes a new loss function called Focal Loss, which lets a one-stage detector train to higher accuracy. Intuitively, it down-weights easy examples during training so that the remaining hard examples matter more.

✔ To demonstrate the effectiveness of focal loss, the paper presents a simple one-stage object detector called RetinaNet. RetinaNet is efficient and accurate: with a ResNet-101-FPN backbone it reaches 39.1 AP on COCO test-dev while running at 5 fps.

Focal Loss

✔ Focal loss addresses the foreground/background class imbalance in one-stage detectors.

$y \in \{\pm 1\}$: ground-truth class
$p \in [0, 1]$: predicted probability of the class $y = 1$

✔ Before discussing focal loss, we start from the cross entropy (CE) loss for binary classification: $\mathrm{CE}(p, y) = -\log(p)$ if $y = 1$ and $-\log(1 - p)$ otherwise, with $y$ and $p$ as defined above. The paper notes that extending it to the multi-class case is straightforward.

✔ With $p_t = p$ if $y = 1$ and $p_t = 1 - p$ otherwise, CE can be rewritten as $\mathrm{CE}(p, y) = \mathrm{CE}(p_t) = -\log(p_t)$.

✔ In the loss plot from the paper, the blue curve is $\gamma = 0$, i.e. plain CE. As $\gamma$ changes, the shape of the curve changes with it.

Balanced Cross Entropy

✔ A common way to handle class imbalance is to introduce a weighting factor $\alpha \in [0, 1]$ for class 1 and $1 - \alpha$ for class -1, giving the $\alpha$-balanced loss $\mathrm{CE}(p_t) = -\alpha_t \log(p_t)$, where $\alpha_t = \alpha$ for $y = 1$ and $1 - \alpha$ otherwise. This balances the importance of positive and negative examples, but it does not distinguish between easy and hard examples.

Focal Loss Definition

✔ Because of the class imbalance, easily classified negatives make up most of the loss and dominate the gradient. Since balanced cross entropy cannot separate easy from hard examples, the paper adds a modulating factor $(1 - p_t)^\gamma$ to the cross entropy. Focal loss is defined as $\mathrm{FL}(p_t) = -(1 - p_t)^\gamma \log(p_t)$.

✔ For example, when an example is misclassified and $p_t$ is small, the modulating factor is close to 1 and the loss is essentially unaffected. Conversely, as $p_t$ approaches 1, the factor goes to 0 and the loss of well-classified examples is down-weighted.

✔ As another example, with $\gamma = 2$ an example with $p_t = 0.9$ has a loss 100 times lower than under CE, and one with $p_t \approx 0.968$ has a loss about 1000 times lower.
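The two ratios quoted above can be checked with a few lines of arithmetic. This is a minimal sketch; the function name `modulating_factor` is a hypothetical helper, not something from the paper.

```python
def modulating_factor(p_t: float, gamma: float = 2.0) -> float:
    """Return (1 - p_t) ** gamma, the factor that scales CE in focal loss."""
    return (1.0 - p_t) ** gamma


# Compare the focal-loss scaling for the two p_t values discussed in the text.
for p_t in (0.9, 0.968):
    factor = modulating_factor(p_t)
    print(f"p_t={p_t}: factor={factor:.6f}, CE is scaled down ~{1 / factor:.0f}x")
```

With $\gamma = 2$ the factor is 0.01 for $p_t = 0.9$ and 0.001024 for $p_t = 0.968$, which is where the roughly 100x and 1000x reductions come from.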

✔ In practice the paper uses an $\alpha$-balanced variant of focal loss, $\mathrm{FL}(p_t) = -\alpha_t (1 - p_t)^\gamma \log(p_t)$. The authors report that this form gives slightly better results than the non-$\alpha$-balanced form.
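To make the definition concrete, here is a minimal pure-Python sketch of the $\alpha$-balanced focal loss for a single binary prediction. The function names and the example probabilities are illustrative assumptions. The default $\alpha = 0.25$ is the value the paper pairs with $\gamma = 2$; it is not stated in this review. The sketch assumes $p$ lies strictly between 0 and 1.

```python
import math


def cross_entropy(p: float, y: int) -> float:
    """Binary cross entropy, CE(p_t) = -log(p_t), for a label y in {+1, -1}."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(p_t)


def focal_loss(p: float, y: int, gamma: float = 2.0, alpha: float = 0.25) -> float:
    """Alpha-balanced focal loss, FL(p_t) = -alpha_t * (1 - p_t)**gamma * log(p_t)."""
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# Hypothetical predictions: p is the predicted foreground probability.
examples = {
    "easy negative": (0.01, -1),
    "hard negative": (0.90, -1),
    "hard positive": (0.10, 1),
}
for name, (p, y) in examples.items():
    print(f"{name}: CE={cross_entropy(p, y):.5f}  FL={focal_loss(p, y):.7f}")
```

The easy negative keeps only a tiny fraction of its CE loss, while the hard examples keep most of theirs, which is the re-weighting the text describes.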

Class Imbalance and Model Initialization

✔ By default, binary classification models are initialized so that the outputs y = -1 and y = 1 are equally likely. Under class imbalance, the loss from the frequent class can then dominate the total loss and make early training unstable. To prevent this, the paper introduces a prior (denoted $\pi$) for the value of p that the model estimates for the rare class (foreground) at the start of training, and sets it so that this estimate is low, e.g. 0.01.

Class Imbalance and Two-stage Detectors

✔ Two-stage detectors are usually trained with CE and handle the imbalance through two other mechanisms instead.
(1) a two-stage cascade and (2) biased minibatch sampling. The cascade cuts the number of proposals down to roughly one or two thousand. Importantly, the surviving proposals are not chosen at random but are likely to correspond to true object locations, which removes most of the easy negatives.

RetinaNet Detector

✔ RetinaNet is a one-stage detector made up of a backbone and two task-specific subnetworks. One subnet performs object classification and the other performs bounding box regression.

Feature Pyramid Network Backbone

✔ FPN uses a top-down pathway with lateral connections to build a multi-scale feature pyramid from a single-resolution input image.

✔ The pyramid consists of levels $P_3$ to $P_7$, each with 256 channels. Please refer to the FPN paper for the details.

Anchors

✔ The anchors have sizes from 32 to 512 across pyramid levels $P_3$ to $P_7$ (areas of $32^2$ to $512^2$). In addition, each pyramid level uses 3 aspect ratios and 3 scales, for a total of 9 anchors per level.
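The anchor layout can be sketched as below. The concrete aspect ratios (1:2, 1:1, 2:1) and scales ($2^0$, $2^{1/3}$, $2^{2/3}$) are the ones the paper lists; this review does not spell them out. The dictionary, the function name, and the choice to read a ratio as height divided by width are assumptions made for illustration.

```python
import math

# Base anchor size per pyramid level (side length of the square anchor).
BASE_SIZES = {"P3": 32, "P4": 64, "P5": 128, "P6": 256, "P7": 512}
ASPECT_RATIOS = (0.5, 1.0, 2.0)  # assumed to mean height / width
SCALES = (2 ** 0, 2 ** (1 / 3), 2 ** (2 / 3))


def anchor_shapes(base_size: float) -> list:
    """Return the 9 (width, height) anchor shapes for one pyramid level."""
    shapes = []
    for scale in SCALES:
        area = (base_size * scale) ** 2
        for ratio in ASPECT_RATIOS:
            # Keep the area fixed while changing the aspect ratio.
            width = math.sqrt(area / ratio)
            height = width * ratio
            shapes.append((width, height))
    return shapes


for level, size in BASE_SIZES.items():
    shapes = anchor_shapes(size)
    print(level, len(shapes), [(round(w), round(h)) for w, h in shapes[:3]])
```

Each level ends up with 3 x 3 = 9 shapes, and these are tiled over every spatial position of that level's feature map.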

✔ Each anchor is assigned a one-hot vector of length K (the number of classes) as its classification target and a vector of 4 values as its box regression target. An anchor is assigned to a ground-truth box when their IoU is 0.5 or higher and to background when its IoU is in [0, 0.4). The remaining anchors, with IoU in [0.4, 0.5), are ignored during training.
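The assignment rule can be sketched as follows. The box format `(x1, y1, x2, y2)`, the function names, and the string labels are assumptions for illustration. A real implementation would vectorize this over all anchors.

```python
def iou(box_a, box_b) -> float:
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def assign_anchor(anchor, gt_boxes, pos_thresh=0.5, neg_thresh=0.4):
    """Return (label, gt_index) where label is 'positive', 'background' or 'ignore'."""
    best_iou, best_idx = 0.0, None
    for idx, gt in enumerate(gt_boxes):
        overlap = iou(anchor, gt)
        if overlap > best_iou:
            best_iou, best_idx = overlap, idx
    if best_iou >= pos_thresh:
        return "positive", best_idx
    if best_iou < neg_thresh:
        return "background", None
    return "ignore", None  # IoU in [0.4, 0.5): excluded from training


gt_boxes = [(0, 0, 10, 10)]
for anchor in [(0, 0, 10, 6), (0, 0, 10, 4.5), (20, 20, 30, 30)]:
    print(anchor, assign_anchor(anchor, gt_boxes))
```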

Classification & Box Regression Subnet

✔ The classification subnet predicts, at each spatial position, the probability of object presence for each of the A anchors and K object classes. It is a small FCN attached to each FPN level.

✔ The classification subnet applies four 3x3 conv layers with 256 filters, each followed by ReLU, and then a 3x3 conv layer with KA filters followed by sigmoid activations. In contrast to RPN, it is deeper, uses only 3x3 convs, and does not share parameters with the box regression subnet.

✔ The box regression subnet runs in parallel with the classification subnet and is another small FCN attached to each pyramid level. It has the same design as the classification subnet except that its final layer outputs 4A values per spatial location. The paper uses a class-agnostic bounding box regressor.

Inference and Training

✔ RetinaNet is a single FCN made up of a ResNet-FPN backbone and the two subnets. To speed up inference, box predictions are decoded only from the top-scoring predictions of each FPN level, at most 1000 per level, after thresholding the detector confidence at 0.05. The predictions from all levels are then merged, and NMS with a threshold of 0.5 produces the final detections.
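The inference-time filtering can be sketched in plain Python as below. This is a simplified single-class sketch: the `(score, box)` tuple format and all function names are assumptions, and a real detector would run NMS per class on tensors.

```python
def iou(box_a, box_b) -> float:
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def select_level(detections, score_thresh=0.05, top_k=1000):
    """Drop low-confidence detections, then keep at most top_k by score."""
    kept = [d for d in detections if d[0] > score_thresh]
    kept.sort(key=lambda d: d[0], reverse=True)
    return kept[:top_k]


def nms(detections, iou_thresh=0.5):
    """Greedy non-maximum suppression over (score, box) pairs."""
    kept = []
    for score, box in sorted(detections, key=lambda d: d[0], reverse=True):
        # Keep a box only if it does not overlap a higher-scoring kept box too much.
        if all(iou(box, kept_box) <= iou_thresh for _, kept_box in kept):
            kept.append((score, box))
    return kept


def postprocess(levels):
    """Filter each FPN level separately, merge the survivors, then run NMS."""
    merged = []
    for detections in levels:
        merged.extend(select_level(detections))
    return nms(merged)


levels = [
    [(0.9, (0, 0, 10, 10)), (0.8, (1, 1, 10, 10)), (0.03, (50, 50, 60, 60))],
    [(0.6, (100, 100, 120, 120))],
]
print(postprocess(levels))
```

In this toy input the 0.03 detection is removed by the confidence threshold and the 0.8 box is suppressed by the overlapping 0.9 box.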

✔ Focal loss, as introduced in the paper, is used as the loss on the output of the classification subnet, with $\gamma = 2$. In RetinaNet, focal loss is applied to all ~100k anchors of each image, in contrast to RPN or OHEM, which select a small set of anchors for each minibatch.

✔ The total focal loss of an image is the sum of the focal loss over all ~100k anchors, normalized by the number of anchors assigned to a ground-truth box. The settings explored for $\gamma$ and $\alpha$ are listed in the paper's ablation table.
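The normalization step can be sketched as follows, assuming the per-anchor focal loss values have already been computed. The label encoding (1 for positive, -1 for background, 0 for ignored) and the guard against zero positives are assumptions added for illustration.

```python
def normalized_image_loss(anchor_losses, labels) -> float:
    """Sum per-anchor losses and divide by the number of positive anchors.

    labels: 1 = assigned to a ground-truth box, -1 = background, 0 = ignored.
    """
    total = sum(loss for loss, label in zip(anchor_losses, labels) if label != 0)
    num_pos = sum(1 for label in labels if label == 1)
    # Assumption: avoid dividing by zero on images without positive anchors.
    return total / max(num_pos, 1)


# Hypothetical per-anchor focal loss values for a tiny image with 4 anchors.
losses = [0.5, 0.01, 0.02, 0.3]
labels = [1, -1, -1, 0]
print(normalized_image_loss(losses, labels))
```

The paper normalizes by the number of assigned anchors rather than by all anchors because most anchors are easy negatives whose focal loss is negligible.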

✔ The backbone is a ResNet, and the layers newly added on top of it need their own initialization. In the RetinaNet subnets, all new conv layers except the final one are initialized with bias $b = 0$ and Gaussian weights with $\sigma = 0.01$. The bias of the final conv layer of the classification subnet is initialized to $b = -\log((1 - \pi)/\pi)$, so that at the start of training every anchor is predicted as foreground with a confidence of about $\pi$. The paper uses $\pi = 0.01$ in all experiments.
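A quick numerical check shows why this bias value produces the desired prior: passing $b$ through a sigmoid returns $\pi$. The function names below are illustrative assumptions.

```python
import math


def prior_bias(pi: float = 0.01) -> float:
    """Bias b = -log((1 - pi) / pi) for the final classification conv layer."""
    return -math.log((1.0 - pi) / pi)


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


b = prior_bias(0.01)
# With zero-mean, near-zero weights, the initial output is roughly sigmoid(b).
print(f"b = {b:.4f}, initial foreground probability = {sigmoid(b):.4f}")
```

With $\pi = 0.01$ the bias is about -4.595, so every anchor starts out as a confident background prediction and the many negatives no longer swamp the loss in the first iterations.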

✔ Optimization uses SGD. RetinaNet is trained with synchronized SGD over 8 GPUs with a total of 16 images per minibatch, for 90k iterations. The learning rate starts at 0.01 and is lowered to 0.001 at 60k iterations and to 0.0001 at 80k. Weight decay is 0.0001 and momentum is 0.9. The training loss is the sum of the focal loss (classification) and the standard smooth L1 loss (box regression).
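The step schedule described above can be written as a small function. The function name and the choice to apply each drop exactly at the listed iteration are assumptions.

```python
def learning_rate(iteration: int, base_lr: float = 0.01,
                  steps=(60_000, 80_000), factor: float = 0.1) -> float:
    """Step schedule: multiply the learning rate by `factor` at each step."""
    lr = base_lr
    for step in steps:
        if iteration >= step:
            lr *= factor
    return lr


for it in (0, 59_999, 60_000, 80_000, 89_999):
    print(it, learning_rate(it))
```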

👨‍🏫 The rest of the paper is mostly about experimental results. If you are curious, please refer to the paper.

