[CV] EfficientDet Scalable and Efficient Object Detection review

๊ฐ•๋™์—ฐยท2022๋…„ 2์›” 10์ผ

🎈 This review was written with reference to the EfficientDet paper and other reviews of it.

Keywords

🎈 EfficientNet
🎈 BiFPN network
🎈 Weighted Feature Fusion
🎈 Compound Scaling

Introduction

โœ” ์˜ค๋Š˜์€ EfficientDet์— ๋Œ€ํ•ด ๋ฆฌ๋ทฐํ•  ์˜ˆ์ •์ž…๋‹ˆ๋‹ค. EfficientDet์€ EfficientNet๋ฅผ backbone์œผ๋กœ ๋‘๊ณ  ์‚ฌ์šฉํ•˜๊ธฐ ๋•Œ๋ฌธ์—, EfficientNet ๋…ผ๋ฌธ ํ˜น์€ ์ •๋ฆฌ๋œ ๊ธ€์„ ๋จผ์ € ์ฝ์œผ์‹œ๊ธธ ์ถ”์ฒœ๋“œ๋ฆฝ๋‹ˆ๋‹ค.

✔ In recent years, many detection models with good accuracy have appeared. However, SOTA detection models are becoming increasingly expensive. Such heavy models are hard to deploy in autonomous driving and other real-world tasks. This has motivated a variety of one-stage detection architectures. However, these architectures also tend to focus only on a specific or small range of resource requirements, and do not consider the wide variety of real-world tasks.

✔ The paper introduces EfficientDet as a scalable detection architecture with both higher accuracy and better efficiency across a wide spectrum of resource constraints. EfficientDet is built on the one-stage detector paradigm. The paper focuses on two challenges:

1. efficient multi-scale feature fusion
2. model scaling

โœ” EfficientDet์€ FPN๊ธฐ๋ฐ˜์œผ๋กœ multi-scale feature fusion์„ ์ง„ํ–‰ํ•ฉ๋‹ˆ๋‹ค. ์ด ๊ณผ์ •์—์„œ ๊ธฐ์กด์˜ ๋ฐฉ๋ฒ•๋“ค๊ณผ๋Š” ๋‹ค๋ฅธ BiFPN network์„ ์†Œ๊ฐœํ•ฉ๋‹ˆ๋‹ค. BiFPN๋Š” ๊ธฐ์กด์˜ ๋‹จ์ˆœํ•˜๊ฒŒ ์—ฐ์‚ฐ๋œ feature fusion์—์„œ ๊ฐ ์ด๋ฏธ์ง€์˜ ๋‹ค๋ฅธ ํ•ด์ƒ๋„์˜ ํŠน์ง•์„ ๋ฐ˜์˜ํ•˜์ง€ ๋ชปํ–ˆ๊ธฐ ๋•Œ๋ฌธ์—, ๊ทธ ๋ถ€๋ถ„์„ ๋ฐ˜์˜ํ•œ ๋„คํŠธ์›Œํฌ๋ผ๊ณ  ํ•ฉ๋‹ˆ๋‹ค.

โœ” ๋˜ํ•œ Compound scaling ๋ฐฉ๋ฒ•์„ ํ†ตํ•ด backbone, feature network, box/class predictions network์˜ ํ•ด์ƒ๋„/๊นŠ์ด/์ฑ„๋„ ์ˆ˜ scaling ํ•œ๋‹ค๊ณ  ํ•ฉ๋‹ˆ๋‹ค.

โœ” ์œ„์˜ ๊ทธ๋ž˜ํ”„์—์„œ ํ™•์ธํ•  ์ˆ˜ ์žˆ๋“ฏ์ด, ์ด์ „์˜ ๋ฐฉ๋ฒ•๋“ค๋ณด๋‹ค ํšจ์œจ์ ์ด๋ฉด์„œ ๋†’์€ ์ •ํ™•๋„๋ฅผ ๊ธฐ๋กํ•ฉ๋‹ˆ๋‹ค.

โœ” ๊ฒฐ๊ณผ์ ์œผ๋กœ EfficientDet = EfficientNet(backbone) + BiFPN + compound scaling ์ด๋ผ๊ณ  ๋งํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

BiFPN

✔ Multi-scale feature fusion has kept evolving since FPN. Formally, the input is a list of multi-scale features $\vec{P}^{in} = (P^{in}_{l_1}, P^{in}_{l_2}, \ldots)$, where $P^{in}_{l_i}$ denotes the feature at level $l_i$. The goal is to find a function $f$ that aggregates these features effectively and outputs a new list of features: $\vec{P}^{out} = f(\vec{P}^{in})$.

โœ” ๊ธฐ์กด์˜ FPN์˜ ๊ฒฝ์šฐ์—๋Š” ์œ„์˜ ์ˆ˜์‹์— ์˜ํ•ด์„œ ์ •์˜๋ฉ๋‹ˆ๋‹ค. ์•ž์„œ ๋ง์”€๋“œ๋ฆฐ ๊ฒƒ๊ณผ ๊ฐ™์ด ์œ„์™€๊ฐ™์ด feature fusion์€ ๋‹จ์ˆœํžˆ resize๋ฅผ ํ†ตํ•ด sumํ•œ ๊ฒƒ์ด๋ฏ€๋กœ ์ด๋ฏธ์ง€ ํ•ด์ƒ๋„์˜ ํŠน์ง•์„ ๊ณ ๋ คํ•˜์ง€ ์•Š์•˜์Šต๋‹ˆ๋‹ค.

Cross-Scale Connections

โœ” ๊ธฐ์กด์˜ FPN์€ one-way ์ฆ‰, ํ•œ ๋ฐฉํ–ฅ์œผ๋กœ๋งŒ ์ •๋ณด๊ฐ€ ํ๋ฆ…๋‹ˆ๋‹ค. ๋ฐ˜๋ฉด PANet(b)์˜ ๊ฒฝ์šฐ bottom-up path๋ฅผ ์ถ”๊ฐ€ํ•ฉ๋‹ˆ๋‹ค. ๋˜ํ•œ ์ดํ›„์—๋Š” NAS-FPN์ด๋ผ๋Š” Auto-ML์„ ์‚ฌ์šฉํ•œ (c) ๋ฐฉ๋ฒ•์ด ์ œ์‹œ๋˜์—ˆ์Šต๋‹ˆ๋‹ค. ์ด ์ค‘ ๋™์ผํ•œ ์กฐ๊ฑด์—์„œ PANet์ด ๊ฐ€์žฅ ์ข‹์€ ๊ฒฐ๊ณผ๋ฅผ ๋„์ถœํ•˜์˜€๊ณ , ์ด๋ฅผ ๋ฐ”ํƒ•์œผ๋กœ BiFPN์„ ์ œ์‹œํ•ฉ๋‹ˆ๋‹ค.

✔ First, as figure (d) shows, the first and last nodes are removed. Intuitively, these nodes have only one input edge, so no feature fusion happens at them and they contribute little. Removing them makes the model simpler.

โœ” ์ถ”๊ฐ€์ ์œผ๋กœ ์›๋ž˜ input์„ output์— ์ถ”๊ฐ€ํ•จ์œผ๋กœ์จ ์ถ”๊ฐ€์ ์ธ feature fuse๊ฐ€ ์ผ์–ด๋‚ฉ๋‹ˆ๋‹ค. ๋งŽ์€ cost๊ฐ€ ๋“ค์–ด๊ฐ€์ง€ ์•Š๋Š”๋‹ค๊ณ  ํ•ฉ๋‹ˆ๋‹ค.

✔ Finally, each bidirectional (top-down and bottom-up) path is treated as a single feature network layer, and this layer is repeated multiple times as shown in the figure above. The repetition is said to yield better high-level feature fusion.

Weighted Feature Fusion

✔ As mentioned earlier, previous feature fusion did not reflect the characteristics of each resolution. BiFPN adds a learnable weight to each input. Three variants are considered, and the fastest one, fast normalized fusion, is the one actually used.
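
The formula images are missing from this post, so the three variants are written out below as I understand them from the paper. $I_i$ is the $i$-th input feature, $w_i$ its learnable weight, and $O$ the fused output. Treat this as a reconstruction rather than a verbatim quote.

```latex
\begin{aligned}
\text{Unbounded fusion:} \quad & O = \sum_i w_i \cdot I_i \\
\text{Softmax-based fusion:} \quad & O = \sum_i \frac{e^{w_i}}{\sum_j e^{w_j}} \cdot I_i \\
\text{Fast normalized fusion:} \quad & O = \sum_i \frac{w_i}{\epsilon + \sum_j w_j} \cdot I_i
\end{aligned}
```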

✔ First, unbounded fusion places no bound on the weights, so training can become unstable. The weight $w_i$ can be a scalar (per feature), a vector (per channel), or a tensor (per pixel), but a scalar is used because it costs the least.

✔ Softmax-based fusion normalizes the weights with a softmax, but the softmax noticeably slows down the GPU.

โœ” ๋”ฐ๋ผ์„œ ์œ„์˜ Fast normalizaed fusion์„ ์‚ฌ์šฉํ•œ๋‹ค๊ณ  ํ•ฉ๋‹ˆ๋‹ค. ์ด๋Š” exp์—ฐ์‚ฐ์ด ๋“ค์–ด๊ฐ€์ง€ ์•Š์œผ๋ฉฐ, Softmax ๋ฐฉ๋ฒ•๊ณผ ๋น„์Šทํ•˜์ง€๋งŒ 30% ๋” ๋น ๋ฅด๋‹ค๊ณ  ํ•ฉ๋‹ˆ๋‹ค. ๋˜ํ•œ wiw_i๋Š” 0๋ณด๋‹ค ํฌ๊ฑฐ๋‚˜ ์ž‘์œผ๋ฉฐ, ์ด๋Š” ์ด์ „์— ReLU ํ•จ์ˆ˜๋ฅผ ํ†ต๊ณผํ•ด ์„ฑ๋ฆฝ๋  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ฯต\epsilon = 0.0001์ด๋ฉฐ ์ด๋Š” ์ž‘์€ ๊ฐ’์˜ ์—ฐ์‚ฐ์„ ๋ฐฉ์ง€ํ•˜๊ธฐ ์œ„ํ•œ ๊ฐ’์ž…๋‹ˆ๋‹ค.

โœ” BiFPN์— ๋Œ€ํ•œ ์ˆ˜์‹์„ ๋ณด์—ฌ์ฃผ๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค. Level 6์— ๋Œ€ํ•œ ์ˆ˜์‹์ด๋ฉฐ ๊ทธ๋ž˜ํ”„ (d)์™€ ๋น„๊ตํ•œ๋‹ค๋ฉด ์‰ฝ๊ฒŒ ์ดํ•ดํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

โœ” ๋˜ํ•œ ํšจ์œจ์„ ๋†’์ด๊ธฐ ์œ„ํ•ด, feature fusion์—์„œ depthwise separable convolution์™€ batch normalization ๊ทธ๋ฆฌ๊ณ  activation์„ ๊ฐ๊ฐ์˜ conv ๋’ค์— ์ถ”๊ฐ€ํ–ˆ๋‹ค๊ณ  ํ•ฉ๋‹ˆ๋‹ค.

EfficientDet

โœ” ์ด๋ฒˆ ์ฑ•ํ„ฐ์—์„œ๋Š” network architecture์™€ new compound scaling์— ๋Œ€ํ•ด ์†Œ๊ฐœํ•ฉ๋‹ˆ๋‹ค. (new compound scaling๋ž€ detection task ๋งž์ถค compound scaling ๋ผ๊ณ  ์ƒ๊ฐํ•˜๋ฉด ๋ฉ๋‹ˆ๋‹ค.)

EfficientNet

✔ Before going into this chapter, let me briefly explain EfficientNet.

โœ” EfficientNet์—์„œ ์ œ์•ˆํ•˜๋Š” Compound Scaling์€ ๊ฐ„๋‹จํ•˜๊ฒŒ resoultion /depth/width(channel) ๋ชจ๋‘ ์ ๋‹นํ•œ ๋น„์œจ๋กœ ํ‚ค์šฐ๋ฉด ์ข‹์€ ๊ฒฐ๊ณผ๋ฅผ ์–ป์„ ์ˆ˜ ์žˆ๋‹ค๊ณ  ์ œ์•ˆํ•ฉ๋‹ˆ๋‹ค.

โœ” ๊ฐ๊ฐ์˜ ๊ฐ’์„ ฮฑโ‹…ฮฒ2โ‹…ฮณ2\alpha \cdot \beta^2 \cdot \gamma^2์ด 2์— ๊ฐ€๊น๊ฒŒ ์ง€์ •ํ•˜๊ณ , ฯ•\phi์˜ ๊ฐ’์„ ์กฐ์ •ํ•œ๋‹ค๊ณ  ํ•ฉ๋‹ˆ๋‹ค. ๊ฐ๊ฐ์˜ ฯ•\phi ์กฐ์ •ํ•จ์œผ๋กœ์จ B0 ~ B7๊นŒ์ง€ ์‹คํ—˜ํ–ˆ๋‹ค๊ณ  ํ•ฉ๋‹ˆ๋‹ค.

โœ” ์œ„์™€ ๊ฐ™์€ ๊ณผ์ •์œผ๋กœ ์ง„ํ–‰ํ•œ๋‹ค๊ณ  ํ•ฉ๋‹ˆ๋‹ค.

โœ” ๋˜ํ•œ EfficientNet-B0 baseline network๋Š” ์œ„์™€ ๊ฐ™์ด ๊ตฌ์„ฑ๋˜์–ด์žˆ์œผ๋ฉฐ
MBConv์„ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค.

โœ” MBConv์€ MobileNet์—์„œ ์ œ์‹œ๋œ ๋ฐฉ๋ฒ•์ด๋ฉฐ, Depthwise Conv์™€ Squeeze & excitation(SE)๋ฅผ ์‚ฌ์šฉํ•œ ๋„คํŠธ์›Œํฌ ์ž…๋‹ˆ๋‹ค.

EfficientDet Architecture

โœ” EfficientDet์€ one-stage detector์ด๋ฉฐ, backbone์œผ๋กœ๋Š” ImageNet์œผ๋กœ pretrained๋œ EfficientNet์„ ์‚ฌ์šฉํ–ˆ์Šต๋‹ˆ๋‹ค.

โœ” BiFPN์€ {P3P_3, P4P_4, P5P_5, P6P_6, P7P_7 } feature network๋ฅผ ๋ฐ˜๋ณตํ•ด์„œ ์‚ฌ์šฉํ–ˆ๋‹ค๊ณ  ํ•ฉ๋‹ˆ๋‹ค. ์ดํ›„ class and box network๋ฅผ ํ•™์Šตํ•ฉ๋‹ˆ๋‹ค.

โœ” ์ „์ฒด์ ์ธ network ๊ณผ์ •์€ ์œ„์˜ ์‚ฌ์ง„์œผ๋กœ ์‰ฝ๊ฒŒ ํ™•์ธํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

Compound Scaling

✔ One of the key challenges in EfficientDet is how to scale the model up. A detector has more scaling dimensions than EfficientNet, so a new compound scaling method is proposed. It uses a single compound coefficient $\phi$ to jointly scale up all dimensions of the backbone, BiFPN, class/box network, and input resolution.

🎈 Backbone network

โœ” Backbone์—์„œ๋Š” EfficientNet์˜ ๋ฐฉ๋ฒ•์„ ๊ทธ๋ž˜๋„ ์‚ฌ์šฉํ•œ๋‹ค๊ณ  ํ•ฉ๋‹ˆ๋‹ค. ๋‹จ, width/depth์— ๋Œ€ํ•ด์„œ๋งŒ ๊ฐ™์€ ๊ณ„์ˆ˜๋ฅผ ์ ์šฉํ•ฉ๋‹ˆ๋‹ค.

🎈 BiFPN network

โœ” BiFPN์˜ DbifpnD_{bifpn}(#layers) ์ ์ฐจ์ ์œผ๋กœ ์ฆ๊ฐ€์‹œํ‚ค๋ฉฐ, ์ž‘์€ ์ •์ˆ˜์ด์–ด์•ผ ํ•ฉ๋‹ˆ๋‹ค. WbifpnW_{bifpn}(#channels)๋Š” ๊ธฐํ•˜ ๊ธ‰์ˆ˜์ ์œผ๋กœ ์ฆ๊ฐ€ํ•˜๋ฉฐ ์ด๋Š” EfficientNet ๊ณผ ๋น„์Šทํ•ฉ๋‹ˆ๋‹ค. ๊ทธ๋ฆฌ๊ณ  grid search๋ฅผ ์‚ฌ์šฉํ•ด width์˜ ์ตœ์ ์˜ ๊ณ„์ˆ˜๋Š” 1.35๋ฅผ ์ฐพ์•˜๋‹ค๊ณ  ํ•ฉ๋‹ˆ๋‹ค.

🎈 Box/class prediction network

✔ This is almost the same as BiFPN: the width is kept identical to the BiFPN width, and only the depth is scaled with the compound coefficient.

🎈 Input image resolution

โœ” ์ด๋ฏธ์ง€ ํ•ด์ƒ๋„๋Š” ๊ธฐ์กด์˜ Backbone network ๋ฐฉ๋ฒ•๊ณผ ์กฐ๊ธˆ ๋‹ค๋ฅด๊ฒŒ ์ง„ํ–‰๋ฉ๋‹ˆ๋‹ค. BiFPN์—์„œ fusion feature(level 3-7) ๋˜๊ธฐ ๋•Œ๋ฌธ์— ๋ฌด์กฐ๊ฑด 27=1282^7 = 128์œผ๋กœ ๋‚˜๋ˆ ์ค˜์•ผ ํ•ฉ๋‹ˆ๋‹ค.

Experiments

EfficientDet for Object Detection

โœ” ์ด๋ฒˆ ์ณ…ํ„ฐ์—์„ ๋Š detection task์— ๋Œ€ํ•œ ์‹คํ—˜๊ฒฐ๊ณผ๋ฅผ ์†Œ๊ฐœํ•ด ๋“œ๋ฆฌ๋„๋ก ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค.

โœ” ๊ธฐ๋ณธ์ ์œผ๋กœ D0 ~ D7๊นŒ์ง€๋Š” ์œ„์˜ ํ‘œ๋ฅผ ํ™•์ธํ•˜์‹œ๋ฉด ๋˜๊ฒ ์Šต๋‹ˆ๋‹ค.

โœ” ๊ฐ๊ฐ์˜ D0 ~ D7๊นŒ์ง€ ๋น„์Šทํ•œ ๋ชจ๋ธ๋“ค๊ณผ ๋น„๊ตํ•˜๋Š” ๊ฒƒ์„ ๋ณผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๋Œ€๋ถ€๋ถ„์˜ ๊ฒฐ๊ณผ์—์„œ EfficientDet์ด ์ข‹์€ ์ •ํ™•๋„์™€ ํšจ์œจ์„ฑ ๋ฐ ์—ฐ์‚ฐ์†๋„ ๋“ฑ๋“ฑ ๋ณด์—ฌ์ฃผ๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค.

โœ” EfficientDet์˜ inference๋ฅผ ๋‹ค๋ฅธ ๋ชจ๋ธ๋“ค๊ณผ ๋น„๊ตํ•˜๋Š” ๊ทธ๋ž˜ํ”„์ž…๋‹ˆ๋‹ค. ๊ทธ๋ž˜ํ”„์—์„œ ๋ณผ ์ˆ˜ ์žˆ๋“ฏ์ด ๊ธฐ์กด์˜ ๋‹ค๋ฅธ ๋ชจ๋ธ๋ณด๋‹ค ๋”์šฑ ์ •ํ™•ํ•˜๊ณ  ๋†’์€ ํšจ์œจ์„ฑ์„ ๋ณด์—ฌ์ค๋‹ˆ๋‹ค.

๐ŸŽˆ ์˜ค๋Š˜์€ EfficientDet์„ ๋ฆฌ๋ทฐํ•ด๋ดค์Šต๋‹ˆ๋‹ค. ๋ณธ ๋…ผ๋ฌธ์—์„œ ์ œ์‹œํ•œ ๋ฐฉ๋ฒ•๋“ค์€ ๋งค์šฐ ์ข‹์€ ์„ฑ๋Šฅ๊ณผ ํšจ์œจ์„ฑ์„ ๋ณด์—ฌ์คฌ์Šต๋‹ˆ๋‹ค.

