[CV] YOLOv4: Optimal Speed and Accuracy of Object Detection review

강동연 · February 22, 2022

[Paper review]


🎈 This review was written with reference to the YOLOv4 paper and other reviews of it.

Keywords

🎈 Bag of Freebies and Bag of Specials
🎈 CSPDarknet53 + SPP + PAN + YOLOv3
🎈 Optimized for a single GPU

🎈 Before the review itself, let me first describe how YOLOv4 is put together. YOLOv4 does not propose a new methodology. Instead, it combines existing methods and optimizes them for a single-GPU environment.

🎈 The YOLOv4 paper covers so many methods that understanding all of them is practically impossible (each one is a paper of its own...). Since the bottom line is that the methods listed above are the ones used, I think it is enough to know the proposed components plus the BoF and BoS that were adopted.

Introduction

🎈 The paper's Introduction points out that most recent accurate networks either cannot run in real time or require many powerful GPUs. It then introduces YOLOv4 as an answer to this problem.

🎈 As Figure 1 of the paper shows, YOLOv4 runs about twice as fast as EfficientDet with comparable accuracy. It also improves YOLOv3's AP and FPS by 10% and 12%, respectively.

🎈 The paper's list of contributions summarizes YOLOv4 in three points. First, it is an efficient and powerful detector that anyone can train on a single 1080 Ti or 2080 Ti GPU. Second, the paper verifies the influence of state-of-the-art BoF and BoS methods. Third, it modifies existing methods (CBN, PAN, SAM) so they are better suited to single-GPU training. As mentioned earlier, the work optimizes existing methods rather than proposing new ones.

Related work

🎈 The paper's overview figure shows the various methods used in object detectors.

Object detection models

🎈 The paper divides the general structure of an object detector into Backbone, Neck, and Head. I think it neatly organizes the many methods proposed so far. If you have read object detection papers starting from R-CNN, you will recognize many of these methods, if not all of them. (In my case, I had read about half.)

Bag of Freebies

🎈 BoF (Bag of Freebies) can be described as "methods that change only the training strategy and increase only the training cost, improving accuracy without adding inference cost."

👨‍🏫 BoF (Bag of Freebies)

✅ Data Augmentation
    - Random erase
    - CutOut
    - MixUp
    - CutMix 📌
    - Style transfer GAN
    - (new) Mosaic data augmentation 📌
    - (new) Self-Adversarial Training 📌

✅ Regularization
    - Dropout
    - DropPath
    - Spatial Dropout
    - DropBlock 📌

✅ Loss Function
    - MSE
    - IoU
    - GIoU
    - CIoU 📌
    - DIoU

+ Class label smoothing 📌
+ Cosine annealing scheduler 📌

🎈 The authors run ablation studies over the various BoF listed above and adopt the ones that perform best (marked 📌). Honestly, many of these methods were new to me, and no one can know them all. Studying only the ones applied in YOLOv4 should be enough.
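🎈 Two of the adopted BoF items, class label smoothing and the cosine annealing scheduler, are simple enough to write down directly. Below is a minimal Python sketch that assumes a K-class target and a learning rate computed per step. The smoothing factor `eps=0.1` and the function names are illustrative assumptions, not settings taken from the YOLOv4 paper.

```python
import math


def smooth_labels(num_classes, target, eps=0.1):
    """Label smoothing: the true class gets 1 - eps + eps / K and every other
    class gets eps / K, so the smoothed target still sums to 1."""
    off = eps / num_classes
    return [1.0 - eps + off if k == target else off for k in range(num_classes)]


def cosine_annealing_lr(step, total_steps, lr_max, lr_min=0.0):
    """Cosine annealing: decays smoothly from lr_max at step 0 to lr_min at total_steps."""
    cos = math.cos(math.pi * step / total_steps)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + cos)


if __name__ == "__main__":
    print(smooth_labels(4, target=2, eps=0.1))
    for s in (0, 25, 50, 75, 100):
        print(s, cosine_annealing_lr(s, 100, lr_max=0.01))
```

Label smoothing keeps the classifier from becoming over-confident because the target never demands a probability of exactly 1. Cosine annealing lowers the learning rate smoothly instead of in discrete steps.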

Bag of Specials

🎈 BoS (Bag of Specials) refers to "plugin modules and post-processing methods that increase the inference cost only slightly but improve detection accuracy."

👨‍🏫 BoS (Bag of Specials)

✅ Enhancement of receptive field
    - SPP 📌
    - ASPP
    - Receptive Field Block (RFB)

✅ Feature Integration
    - Skip-connection 📌
    - FPN
    - SFAM
    - ASFF
    - PAN 📌
    - BiFPN

✅ Activation Function
    - ReLU, Leaky ReLU, Parametric ReLU
    - ReLU6
    - Swish
    - Mish 📌

✅ Attention Module
    - Squeeze-and-Excitation (SE)
    - Spatial Attention Module (SAM)

✅ Normalization
    - Batch Norm (BN)
    - Cross-GPU Batch Norm (CGBN or SyncBN)
    - Filter Response Normalization (FRN)
    - Cross-Iteration Batch Norm (CBN)
    - (new) Cross mini-Batch Normalization (CmBN) 📌

✅ Post Processing
    - NMS
    - Soft NMS
    - DIoU NMS 📌

+ Cross-stage partial connection (CSP) 📌
+ SAM 📌

🎈 Here too, only the methods that show the best results are used.
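🎈 DIoU-NMS, the adopted post-processing step, differs from plain NMS only in its suppression test. A lower-scoring box is removed when its IoU minus a center-distance penalty reaches the threshold. The penalty is the squared distance between box centers divided by the squared diagonal of the smallest enclosing box, so overlapping boxes whose centers are far apart are more likely to survive. The sketch below is minimal and assumes boxes given as `(x1, y1, x2, y2)` tuples. The threshold of 0.5 and the helper names are hypothetical.

```python
def iou_and_center_penalty(a, b):
    """Return (IoU, rho^2 / c^2) for two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    iou = inter / union if union > 0 else 0.0
    # Squared distance between the two box centers.
    rho2 = ((a[0] + a[2] - b[0] - b[2]) ** 2 + (a[1] + a[3] - b[1] - b[3]) ** 2) / 4.0
    # Squared diagonal of the smallest box enclosing both boxes.
    cw = max(a[2], b[2]) - min(a[0], b[0])
    ch = max(a[3], b[3]) - min(a[1], b[1])
    c2 = cw ** 2 + ch ** 2
    return iou, (rho2 / c2 if c2 > 0 else 0.0)


def diou_nms(boxes, scores, threshold=0.5):
    """Greedy NMS that suppresses a box when IoU - rho^2 / c^2 >= threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)  # highest remaining score
        keep.append(best)
        survivors = []
        for i in order:
            iou, penalty = iou_and_center_penalty(boxes[best], boxes[i])
            if iou - penalty < threshold:
                survivors.append(i)
        order = survivors
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (0, 3.3, 10, 13.3), (20, 20, 30, 30)]
    scores = [0.9, 0.8, 0.7, 0.6]
    print(diou_nms(boxes, scores))  # [0, 2, 3]
```

In this example, box 2 overlaps box 0 with IoU slightly above 0.5, so plain NMS would drop it. Its center is shifted enough that the DIoU test keeps it.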

Methodology

🎈 First, the backbone comparison table in the paper shows why CSPDarknet53 is used as the backbone network: it achieves the highest FPS among the candidates. According to the paper, CSPDarknet53 is not the best model for classification (CSPResNeXt50 does better on ImageNet), even though it is the better choice for detection on MS COCO.

🎈 The paper lists three requirements for a good detector:
    - A "higher input network size" to detect multiple small objects.
    - "More layers" for a larger receptive field that covers the increased input size.
    - "More parameters" so the model has the capacity to detect multiple objects of different sizes in a single image.

🎈 However, more is not always better. As the comparison in the paper shows, "more layers" or "more parameters" do not by themselves guarantee better results, so one always has to search for the best trade-off.

🎈 The paper also summarizes the influence of receptive fields of different sizes as follows:
    - Up to the object size: allows viewing the entire object.
    - Up to the network size: allows viewing the context around the object.
    - Exceeding the network size: increases the number of connections between the image point and the final activation.
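🎈 The receptive field of a stack of convolution and pooling layers can be computed with a standard recurrence. Each layer widens the field by `(kernel - 1)` steps, where one step is the input-pixel distance between neighbouring units of the previous layer. The sketch below uses hypothetical layer stacks, not the real CSPDarknet53 layout, and ignores dilation and padding.

```python
def receptive_field(layers):
    """Receptive field (in input pixels) of one output unit of a layer stack.

    layers: list of (kernel_size, stride) pairs, ordered from input to output.
    Dilation and padding are ignored in this sketch.
    """
    rf, jump = 1, 1  # one input pixel sees itself; neighbouring pixels are 1 apart
    for kernel, stride in layers:
        rf += (kernel - 1) * jump  # the kernel spans (kernel - 1) more steps of size `jump`
        jump *= stride             # input-pixel distance between adjacent outputs
    return rf


if __name__ == "__main__":
    # Hypothetical stacks for illustration only.
    print(receptive_field([(3, 1)] * 3))  # three 3x3 convs, stride 1 -> 7
    print(receptive_field([(3, 2)] * 3))  # three 3x3 convs, stride 2 -> 15
```

Stacking more layers, and especially strided ones, grows the receptive field quickly. This is why the paper pairs a larger input size with more layers.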

Additional improvements

🎈 To make the detector suitable for training on a single GPU, YOLOv4 proposes a few additional methods.

👨‍🏫 Mosaic data augmentation

  • CutMix mixes two input images, while Mosaic mixes four training images into one. Four different contexts appear together, so objects can be detected outside their usual context. Batch normalization then computes activation statistics from four different images on each layer. According to the paper, this significantly reduces the need for a large mini-batch size.
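🎈 The core of Mosaic is placing four images around a center point and moving their bounding boxes into the new coordinates. The sketch below is minimal and uses grayscale images stored as 2-D lists with a fixed quadrant layout. Each image's corner nearest the center is aligned with that center, which is my assumption. Real implementations also randomly rescale the images and randomly pick the center, and this sketch does neither.

```python
def mosaic(images, boxes_per_image, size, center, fill=0):
    """Combine four images around `center` on a size x size canvas.

    images: four 2-D lists (H x W grayscale), ordered top-left, top-right,
            bottom-left, bottom-right.
    boxes_per_image: four lists of (x1, y1, x2, y2, label) in image coordinates.
    Returns the canvas and the boxes shifted and clipped to canvas coordinates.
    """
    cx, cy = center
    canvas = [[fill] * size for _ in range(size)]
    out_boxes = []
    for idx, (img, boxes) in enumerate(zip(images, boxes_per_image)):
        h, w = len(img), len(img[0])
        left, top = idx in (0, 2), idx in (0, 1)
        # Place the image so that its corner nearest the center touches (cx, cy).
        ox = cx - w if left else cx
        oy = cy - h if top else cy
        # The quadrant of the canvas this image may fill.
        qx1, qx2 = (0, cx) if left else (cx, size)
        qy1, qy2 = (0, cy) if top else (cy, size)
        for y in range(max(qy1, oy), min(qy2, oy + h)):
            for x in range(max(qx1, ox), min(qx2, ox + w)):
                canvas[y][x] = img[y - oy][x - ox]
        for x1, y1, x2, y2, label in boxes:
            nx1, ny1 = max(qx1, x1 + ox), max(qy1, y1 + oy)
            nx2, ny2 = min(qx2, x2 + ox), min(qy2, y2 + oy)
            if nx2 > nx1 and ny2 > ny1:  # drop boxes that were cropped away
                out_boxes.append((nx1, ny1, nx2, ny2, label))
    return canvas, out_boxes


if __name__ == "__main__":
    imgs = [[[v] * 4 for _ in range(4)] for v in (1, 2, 3, 4)]
    boxes = [[(0, 0, 2, 2, "a")], [(0, 0, 1, 1, "b")], [], [(0, 0, 4, 4, "d")]]
    canvas, out = mosaic(imgs, boxes, size=6, center=(3, 3))
    for row in canvas:
        print(row)
    print(out)  # [(0, 0, 1, 1, 'a'), (3, 3, 6, 6, 'd')]
```

Box "b" sits in the part of the top-right image that is cropped off, so it is dropped. Boxes "a" and "d" are clipped to their quadrants.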

👨‍🏫 Self-Adversarial Training (SAT)

  • In the first stage, Self-Adversarial Training (SAT) alters the original image instead of the network weights. The network thereby performs an adversarial attack on itself, modifying the image so that the desired object appears to be absent. In the second stage, the network is trained in the normal way to detect the object on this modified image. (That is the entire explanation in the paper, so I am not sure exactly how it affects training.)
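🎈 The paper does not say how the image is altered in the first stage. The toy sketch below shows the two-stage idea on a logistic "objectness" model, assuming an FGSM-style sign step on the input. The model, the step sizes `eps` and `lr`, and all function names are illustrative assumptions, not YOLOv4's implementation.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, b, x):
    """Toy 'detector': logistic objectness score for an input vector x."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def sign(g):
    return (g > 0) - (g < 0)


def sat_step(w, b, x, y, eps=0.1, lr=0.5):
    """One SAT iteration on a toy logistic model with binary cross-entropy."""
    # Stage 1: weights are frozen; push the input in the direction that
    # increases the loss (assumed FGSM-style step), i.e. "hide" the object.
    p = predict(w, b, x)
    grad_x = [(p - y) * wi for wi in w]  # dBCE/dx for a logistic model
    x_adv = [xi + eps * sign(g) for xi, g in zip(x, grad_x)]
    # Stage 2: an ordinary gradient step on the weights using the modified input.
    err = predict(w, b, x_adv) - y  # dBCE/dz
    w_new = [wi - lr * err * xi for wi, xi in zip(w, x_adv)]
    b_new = b - lr * err
    return w_new, b_new, x_adv


if __name__ == "__main__":
    w, b, x = [1.0, -1.0], 0.0, [0.5, 0.2]
    print("score before attack:", predict(w, b, x))
    w2, b2, x_adv = sat_step(w, b, x, y=1.0)
    print("score on attacked input:", predict(w, b, x_adv))
    print("score after training on it:", predict(w2, b2, x_adv))
```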

👨‍🏫 Modified SAM and Modified PAN

🎈 YOLOv4 modifies the SAM method proposed in CBAM and the PAN method proposed in PANet.

✅ Spatial attention module (SAM)

  • The spatial attention module helps the network focus on where the important information is. It applies max pooling and average pooling along the channel axis and concatenates the two resulting maps. A 7x7 convolution followed by a sigmoid then produces the spatial attention map. (SAM reference)

  • The modified SAM appears to generate the attention map with a convolution instead of pooling, as shown in the paper's figure. The paper describes this as changing SAM from spatial-wise attention to point-wise attention.
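🎈 The difference between the two variants can be sketched on small nested-list feature maps of shape C x H x W. The CBAM version produces one attention value per spatial position and shares it across all channels. The point-wise version gates every element separately. The 1x1 convolution in `pointwise_sam` is my assumption about the kernel size, and the convolution bias is omitted in `cbam_sam`. Both functions are illustrative sketches, not the Darknet layers.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def cbam_sam(feat, kernel):
    """CBAM-style spatial attention.

    feat: C x H x W nested lists. kernel: 2 x k x k weights (k odd, 7 in CBAM)
    applied to the stacked [channel-max, channel-mean] maps with zero padding.
    """
    C, H, W = len(feat), len(feat[0]), len(feat[0][0])
    mx = [[max(feat[c][i][j] for c in range(C)) for j in range(W)] for i in range(H)]
    av = [[sum(feat[c][i][j] for c in range(C)) / C for j in range(W)] for i in range(H)]
    pooled = (mx, av)
    k = len(kernel[0])
    pad = k // 2
    att = [[0.0] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            s = 0.0
            for m in range(2):
                for di in range(k):
                    for dj in range(k):
                        y, x = i + di - pad, j + dj - pad
                        if 0 <= y < H and 0 <= x < W:
                            s += kernel[m][di][dj] * pooled[m][y][x]
            att[i][j] = sigmoid(s)
    # One attention value per position, shared by every channel.
    return [[[feat[c][i][j] * att[i][j] for j in range(W)] for i in range(H)] for c in range(C)]


def pointwise_sam(feat, weights, bias):
    """Modified (point-wise) SAM sketch: an assumed 1x1 conv maps C -> C channels
    and its sigmoid gates every element of the input individually."""
    C, H, W = len(feat), len(feat[0]), len(feat[0][0])
    return [[[feat[co][i][j]
              * sigmoid(bias[co] + sum(weights[co][c] * feat[c][i][j] for c in range(C)))
              for j in range(W)] for i in range(H)] for co in range(C)]


if __name__ == "__main__":
    feat = [[[1.0, -2.0], [0.5, 3.0]], [[2.0, 0.0], [-1.0, 1.0]]]
    kernel = [[[0.1] * 3 for _ in range(3)] for _ in range(2)]
    print(cbam_sam(feat, kernel))
    print(pointwise_sam(feat, [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]))
```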

✅ PAN

  • PANet is a neck structure that adds a bottom-up path on top of FPN's top-down pathway.

  • The modified PAN replaces the shortcut connection (addition) with concatenation.
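🎈 The sketch below shows PAN's extra bottom-up path and the two fusion choices. Addition requires matching channel counts. Concatenation only needs matching spatial sizes and stacks the channels. `downsample2` is a hypothetical stand-in for the stride-2 convolution. The convolutions that follow each fusion in a real PAN, which control the channel count, are omitted, so channels keep growing in the concatenation case.

```python
def fuse_add(a, b):
    """PANet-style shortcut: element-wise addition (same C x H x W required)."""
    if len(a) != len(b):
        raise ValueError("addition needs the same number of channels")
    return [[[x + y for x, y in zip(ra, rb)] for ra, rb in zip(ca, cb)]
            for ca, cb in zip(a, b)]


def fuse_concat(a, b):
    """Modified-PAN-style shortcut: stack along the channel axis (only H x W must match)."""
    return a + b


def downsample2(feat):
    """Hypothetical stand-in for a stride-2 conv: keep every other row and column."""
    return [[row[::2] for row in ch[::2]] for ch in feat]


def bottom_up_path(levels, fuse):
    """PAN's bottom-up path: N1 = P1, N(i+1) = fuse(downsample(Ni), P(i+1)).

    levels: feature maps P1 (largest) ... Pn (smallest), each C x H x W.
    """
    outs = [levels[0]]
    for p in levels[1:]:
        outs.append(fuse(downsample2(outs[-1]), p))
    return outs


if __name__ == "__main__":
    P1 = [[[float(4 * i + j) for j in range(4)] for i in range(4)]]  # 1 x 4 x 4
    P2 = [[[1.0, 1.0], [1.0, 1.0]]]                                  # 1 x 2 x 2
    P3 = [[[10.0]]]                                                  # 1 x 1 x 1
    for name, fuse in (("add", fuse_add), ("concat", fuse_concat)):
        outs = bottom_up_path([P1, P2, P3], fuse)
        print(name, [len(n) for n in outs])  # channels per level
```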

🎈 In addition, YOLOv4 applies CmBN (Cross mini-Batch Normalization) and uses genetic algorithms to search for optimal hyperparameters.
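🎈 My simplified reading of CmBN is that normalization statistics are accumulated across the mini-batches inside a single batch and reset at the start of the next batch. Because the weights are updated only once per batch, the earlier statistics stay valid. The sketch below contrasts this with plain per-mini-batch BN on 1-D activation lists. It ignores the learnable scale and shift, and it is an illustration under these assumptions, not the Darknet implementation.

```python
import math


def _stats(values):
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var


def _normalize(values, mean, var, eps):
    return [(v - mean) / math.sqrt(var + eps) for v in values]


def bn_per_minibatch(minibatches, eps=1e-5):
    """Plain BN: each mini-batch is normalized with its own statistics."""
    return [_normalize(mb, *_stats(mb), eps) for mb in minibatches]


def cmbn(minibatches, eps=1e-5):
    """Simplified CmBN for one batch: mini-batch t is normalized with statistics
    accumulated over mini-batches 0..t. Call again (fresh state) for the next batch."""
    seen, out = [], []
    for mb in minibatches:
        seen.extend(mb)
        out.append(_normalize(mb, *_stats(seen), eps))
    return out


if __name__ == "__main__":
    batch = [[1.0, 3.0], [5.0, 7.0]]  # one batch split into two mini-batches
    print("BN  :", bn_per_minibatch(batch))
    print("CmBN:", cmbn(batch))
```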

YOLO v4

🎈 In the end, YOLOv4 is composed of the various methods above.

🎈 The paper has no figure showing the overall architecture, so I found one elsewhere. In short, YOLOv4 can be thought of as YOLOv4 = CSPDarknet53 + SPP + PAN + YOLOv3 (backbone: CSPDarknet53; neck: SPP and PAN; head: YOLOv3). Below I briefly explain several of the methods used in YOLOv4. (I looked these up while reading the paper, so I recommend checking the details yourself.)

✅ CSP (Cross Stage Partial)

  • CSP splits the feature map of the base layer into two parts. Only one part goes through the stage's blocks, and the two parts are merged again afterwards. According to CSPNet, without this split the backbone produces duplicate gradient information during network optimization, because of the skip (dense) connections. Splitting the feature map into parts reduces this duplication. The approach reportedly cuts the computation of the original model while improving accuracy.
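🎈 The split-and-merge structure can be sketched in a few lines. Here `block` stands in for the stage's dense or residual blocks, and `double_block` is a hypothetical example. The transition (1x1 conv) layers of real CSP stages are omitted. Only the second part passes through the block, which is where the computation savings come from. The first part's gradient also never flows through the block.

```python
def csp_stage(feat, block, split_ratio=0.5):
    """Simplified Cross Stage Partial stage.

    feat: list of channels (each an H x W nested list).
    block: callable applied to the second part only (stands in for the
           dense/residual blocks of the stage).
    Transition layers of real CSP stages are omitted.
    """
    k = int(len(feat) * split_ratio)
    part1, part2 = feat[:k], feat[k:]  # part1 bypasses the block entirely
    return part1 + block(part2)        # concatenate along the channel axis


def double_block(channels):
    """Hypothetical block: doubles every activation."""
    return [[[2 * v for v in row] for row in ch] for ch in channels]


if __name__ == "__main__":
    feat = [[[float(c)]] for c in range(4)]  # 4 channels of size 1 x 1
    print(csp_stage(feat, double_block))    # [[[0.0]], [[1.0]], [[4.0]], [[6.0]]]
```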

✅ SPP (Spatial Pyramid Pooling)

  • SPP (Spatial Pyramid Pooling) is one of the methods that removed the fixed input image size required by earlier CNN pipelines (e.g., R-CNN). It pools the feature map extracted by the CNN into grids of fixed size (4x4, 2x2, 1x1), producing a fixed-length vector regardless of the input size. YOLOv4 uses the SPP variant from YOLOv3 instead. That variant concatenates stride-1 max-pooling outputs with kernel sizes 1, 5, 9, and 13, which keeps the spatial size while enlarging the receptive field.
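🎈 Both SPP flavours can be sketched on a single-channel feature map stored as a 2-D list. `spp_vector` follows the original idea: pool into 4x4, 2x2 and 1x1 grids, giving 21 values per channel for any input size. `yolo_spp_block` follows the YOLOv3/YOLOv4 idea of stride-1 max pools with kernels 5, 9 and 13 plus the identity branch. The bin boundaries and border handling are my assumptions.

```python
import math


def adaptive_max_pool(fmap, n):
    """Max-pool an H x W map into an n x n grid of bins (SPP-net style)."""
    H, W = len(fmap), len(fmap[0])
    out = []
    for i in range(n):
        y0, y1 = (i * H) // n, math.ceil((i + 1) * H / n)
        for j in range(n):
            x0, x1 = (j * W) // n, math.ceil((j + 1) * W / n)
            out.append(max(fmap[y][x] for y in range(y0, y1) for x in range(x0, x1)))
    return out


def spp_vector(fmap, levels=(4, 2, 1)):
    """Original SPP: concatenate the 4x4, 2x2 and 1x1 grids -> 21 values per channel."""
    vec = []
    for n in levels:
        vec.extend(adaptive_max_pool(fmap, n))
    return vec


def maxpool_same(fmap, k):
    """Stride-1 k x k max pooling; windows are clipped at the borders so H x W is kept."""
    H, W, r = len(fmap), len(fmap[0]), k // 2
    return [[max(fmap[y][x]
                 for y in range(max(0, i - r), min(H, i + r + 1))
                 for x in range(max(0, j - r), min(W, j + r + 1)))
             for j in range(W)] for i in range(H)]


def yolo_spp_block(fmap, kernels=(5, 9, 13)):
    """YOLOv3/YOLOv4-style SPP: the input (k = 1) plus stride-1 max pools, concatenated."""
    return [fmap] + [maxpool_same(fmap, k) for k in kernels]


if __name__ == "__main__":
    for h, w in [(5, 7), (8, 8), (13, 20)]:
        fmap = [[(i * w + j) % 11 for j in range(w)] for i in range(h)]
        print((h, w), len(spp_vector(fmap)), len(yolo_spp_block(fmap)))
```

The original SPP output length is constant for every input size, while the YOLO-style block keeps the spatial size and outputs four maps to be stacked along the channel axis.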

✅ Mish

  • Mish is an activation function that compensates for the weaknesses of ReLU. It is reportedly based on Swish (an activation found by automated search), with additional human design.

  • It has the following properties:

        - Its output range is roughly [-0.31, ∞).
        - ReLU maps every negative value to 0, which loses information; Mish allows small negative values to mitigate this.
        - Mish is smooth (continuously differentiable everywhere), so it avoids singularities such as ReLU's kink at 0.
        - It tends to yield a smoother loss landscape.
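🎈 Mish is defined as Mish(x) = x · tanh(softplus(x)), where softplus(x) = ln(1 + e^x). The short sketch below compares it with ReLU and scans a grid to find its minimum, which matches the lower bound of about -0.31 mentioned above. The grid range and step are arbitrary choices.

```python
import math


def softplus(x):
    """Numerically stable log(1 + e^x)."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x):
    """Mish(x) = x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))


def relu(x):
    return max(0.0, x)


if __name__ == "__main__":
    for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
        print(f"x={x:+.1f}  relu={relu(x):.4f}  mish={mish(x):+.4f}")
    # Scan a grid to locate the minimum of Mish.
    xs = [i / 1000 for i in range(-5000, 5001)]
    x_min = min(xs, key=mish)
    print(f"min mish ~ {mish(x_min):.4f} at x ~ {x_min:.3f}")
```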

🎈 YOLOv4 also uses many other methods, such as DropBlock, DIoU, and CIoU. If you are curious, I recommend looking them up.
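🎈 The IoU-family box losses differ only in the penalty terms added to 1 - IoU. GIoU penalizes the empty area of the smallest enclosing box C. DIoU penalizes the normalized center distance ρ²/c², where c is the diagonal of C. CIoU adds αv on top of DIoU, with v = (4/π²)(arctan(w_gt/h_gt) - arctan(w/h))² measuring aspect-ratio mismatch and α = v / ((1 - IoU) + v). The sketch below is minimal and assumes `(x1, y1, x2, y2)` boxes with positive width and height. The function name and return format are my own choices.

```python
import math


def box_losses(pred, gt):
    """IoU, GIoU, DIoU and CIoU losses for boxes given as (x1, y1, x2, y2)."""
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt
    pw, ph = px2 - px1, py2 - py1
    gw, gh = gx2 - gx1, gy2 - gy1
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    union = pw * ph + gw * gh - inter
    iou = inter / union
    # Smallest box enclosing both boxes.
    cw = max(px2, gx2) - min(px1, gx1)
    ch = max(py2, gy2) - min(py1, gy1)
    giou = iou - (cw * ch - union) / (cw * ch)
    # Squared center distance over squared enclosing-box diagonal.
    rho2 = ((px1 + px2 - gx1 - gx2) ** 2 + (py1 + py2 - gy1 - gy2) ** 2) / 4.0
    diou = iou - rho2 / (cw ** 2 + ch ** 2)
    # Aspect-ratio consistency term used by CIoU.
    v = (4 / math.pi ** 2) * (math.atan(gw / gh) - math.atan(pw / ph)) ** 2
    alpha = v / ((1 - iou) + v) if v > 0 else 0.0
    ciou = diou - alpha * v
    return {"iou": 1 - iou, "giou": 1 - giou, "diou": 1 - diou, "ciou": 1 - ciou}


if __name__ == "__main__":
    gt = (0, 0, 1, 1)
    for pred in [(0, 0, 1, 1), (2, 0, 3, 1), (5, 0, 6, 1), (0, 0, 2, 1)]:
        print(pred, {k: round(v, 4) for k, v in box_losses(pred, gt).items()})
```

For the two non-overlapping predictions, the plain IoU loss is 1 in both cases and gives no signal. The DIoU and CIoU losses still grow with the distance between centers.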

Experiments

Influence of different features on Classifier training

🎈 These are the results of the BoF ablation study for the backbone (classifier) network. With CSPDarknet53, the combination of "CutMix", "Mosaic", "Label Smoothing", and "Mish" gives the best result.

Influence of different features on Detector training

🎈 For the detector, an ablation study was run over the BoF listed in the paper's table. The best result came from combining five of them:
    - Eliminating grid sensitivity.
    - Mosaic.
    - The IoU threshold (assigning multiple anchors to a single ground truth whenever IoU(truth, anchor) exceeds a threshold).
    - Genetic algorithms.
    - Optimized anchors.
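🎈 "Eliminating grid sensitivity" addresses a problem in YOLOv3's box decoding, b_x = σ(t_x) + c_x. Reaching a cell border exactly requires t_x to go to ±∞. The paper fixes this by multiplying the sigmoid by a factor greater than 1.0. The sketch below assumes the commonly used form scale · σ(t_x) - (scale - 1)/2 + c_x. The factor 1.2 is a hypothetical value chosen for illustration.

```python
import math


def decode_center(tx, cx, scale=1.0):
    """Box center x in grid units. scale = 1 gives YOLOv3's sigmoid(tx) + cx;
    scale > 1 stretches the reachable range beyond the cell borders."""
    sig = 1.0 / (1.0 + math.exp(-tx))
    return scale * sig - (scale - 1.0) / 2.0 + cx


def tx_needed(offset, scale):
    """Raw output tx that puts the center at cx + offset (inverse of decode_center)."""
    s = (offset + (scale - 1.0) / 2.0) / scale  # required sigmoid value
    if not 0.0 < s < 1.0:
        return math.inf  # not reachable with any finite tx
    return math.log(s / (1.0 - s))


if __name__ == "__main__":
    for scale in (1.0, 1.2):  # 1.2 is a hypothetical factor
        print(scale, tx_needed(1.0, scale))  # raw output needed to reach the right cell border
```

With scale 1.0 the right border of the cell is unreachable. With the hypothetical 1.2, a moderate t_x of about ln 11 is enough.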

👨‍🏫 In my view, YOLOv4 is not a new methodology. It is a one-stage network that optimizes the many methods published so far so they can be used on a single GPU.

