Recycling waste object detection (2nd 🥈)

Batwan · November 2, 2024

This project was about object detection for recyclable waste, and I learned a great deal while working on it. Because I concentrated on YOLO models in particular, I feel I could now apply what I learned to other detection tasks as well. Detection turned out to be an appealing technology that I want to study further, so I tried many different things. This post records those attempts and what I found.

1. Project overview

We live in an era of mass production and mass consumption. This culture, however, has produced social problems such as waste crises and landfill shortages. Separating recyclables is one way to reduce this environmental burden: waste that is sorted correctly is recognized as a resource and recycled, whereas waste that is sorted incorrectly is classified as refuse and sent to landfill or incineration. We therefore set out to address the problem by building a model that detects waste in photographs.

  • Topic: Object detection for classifying recyclable items
  • Goal: Improve the detection performance of an object detection model on 10 categories of waste

2. Data

| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| General trash | Paper | Paper pack | Metal | Glass | Plastic | Styrofoam | Plastic bag | Battery | Clothing |

  • Total number of images: 9,754
  • Image size: (1024, 1024)
├── dataset
    ├── train.json
    ├── test.json
    ├── train
    └── test
  • train: 4,883 training images
  • test: 4,871 test images
  • train.json: annotation file for the training images (COCO format)
  • test.json: annotation file for the test images (COCO format)

3. Project procedure and methods

[Project timeline]

[Project performance graph]

To detect waste in photographs, we used a range of object detection models through MMDetection, an open-source object detection library. By tuning several detectors for the recyclable-item task and combining them, we recorded an mAP50 of 0.7482 and took 2nd place in the "Object Detection for Recyclable Item Classification" leaderboard project hosted by Naver Boostcamp. Our efforts to raise detection performance can be summarized as follows.

  • Tune the hyperparameters of each object detection model for the recyclable-item task and maximize its performance.
  • Ensemble a variety of object detection models to improve performance further.

1. Data processing and EDA

1-1 Data split

To evaluate performance during training, we split the data into a training set and a validation set. To compensate for the class imbalance in the training data, we used stratified group k-fold so that every fold has a similar class distribution.

We ran our experiments and measured performance on Fold 1, then applied the Fold 1 findings to all folds and produced the final result with a 5-fold ensemble.
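As a rough illustration of the split described above, the sketch below assigns whole images (the groups) to folds while keeping the per-class annotation counts balanced. It is a simplified greedy stand-in written for illustration, not necessarily the procedure used in the project; scikit-learn's `StratifiedGroupKFold` is a maintained implementation of the same idea. The function name and its arguments are assumptions.

```python
import random
from collections import Counter, defaultdict

def stratified_group_kfold(labels, groups, k=5, seed=0):
    """Greedy sketch: map each group (image id) to a fold index 0..k-1."""
    # Count the annotations of each class inside every image.
    per_group = defaultdict(Counter)
    for label, group in zip(labels, groups):
        per_group[group][label] += 1
    classes = sorted(set(labels))

    # Shuffle for variety, then place annotation-heavy images first.
    order = list(per_group)
    random.Random(seed).shuffle(order)
    order.sort(key=lambda g: -sum(per_group[g].values()))

    fold_counts = [Counter() for _ in range(k)]
    assignment = {}
    for g in order:
        best_fold, best_score = 0, None
        for f in range(k):
            # Sum of squared class counts if g joined fold f (lower = more balanced).
            score = 0
            for c in classes:
                for i in range(k):
                    n = fold_counts[i][c] + (per_group[g][c] if i == f else 0)
                    score += n * n
            if best_score is None or score < best_score:
                best_fold, best_score = f, score
        assignment[g] = best_fold
        fold_counts[best_fold].update(per_group[g])
    return assignment
```

Here `labels` would be the `category_id` of every annotation and `groups` the matching `image_id`, so an image never ends up in both the training and validation side of a fold. Fold f then validates on the images mapped to f and trains on the rest.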

1-2 Anchor box optimization

  • Bounding box aspect ratio by class

| Class | Mean | Q1, Q3 range |
|---|---|---|
| Battery | 1.4332 | [0.7054, 1.6608] |
| Clothing | 1.3916 | [0.7451, 1.6229] |
| General trash | 1.2963 | [0.8391, 1.4873] |
| Glass | 1.2395 | [0.6954, 1.4824] |
| Metal | 1.0440 | [0.4725, 1.3875] |
| Paper | 1.2881 | [0.7065, 1.5731] |
| Paper pack | 1.4741 | [0.7928, 1.6312] |
| Plastic | 1.1068 | [0.6247, 1.3174] |
| Plastic bag | 1.4283 | [0.6307, 2.1209] |
| Styrofoam | 1.2283 | [0.6537, 1.5352] |
| All classes | | [0.4725, 2.1209] |

To find suitable anchor box ratios for the given data, we visualized and analyzed the bounding box aspect ratios of each class in the training data.

Across all classes, the Q1 to Q3 ranges of the aspect ratio fall within [0.4725, 2.1209]. Based on this distribution, we set anchor box ratios tailored to the dataset.
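Per-class statistics like the ones above could be computed from the COCO annotations roughly as follows. This is a minimal sketch: it assumes the aspect ratio is width divided by height (the post does not say which way round it was defined), and the function name is hypothetical.

```python
from collections import defaultdict
from statistics import mean, quantiles

def aspect_ratio_stats(coco):
    """Per-class mean and quartiles of bbox width / height from a COCO-style dict."""
    names = {c["id"]: c["name"] for c in coco["categories"]}
    ratios = defaultdict(list)
    for ann in coco["annotations"]:
        x, y, w, h = ann["bbox"]            # COCO boxes are [x, y, width, height]
        if h > 0:
            ratios[names[ann["category_id"]]].append(w / h)  # assumption: ratio = w / h
    stats = {}
    for name, values in ratios.items():
        q1, _, q3 = quantiles(values, n=4)  # needs at least two boxes per class
        stats[name] = {"mean": mean(values), "q1": q1, "q3": q3}
    return stats
```

Calling it on `json.load(open("train.json"))` would give one entry per class; the lowest Q1 and highest Q3 over all classes then bound the ratios worth covering with anchors.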

1-3 Visualizing predicted bounding boxes and PR curves

To evaluate the models under experiment, we visualized the predicted bounding boxes, the ground-truth bounding boxes, and the PR curves on the validation set. This let us inspect model performance by eye and identify weaknesses. [Figure 3] shows that the model fails to localize small objects (General trash) and fails to classify overlapping objects (Plastic).

2. MMDetection

2-1 Initial baseline experiments

In the initial baseline experiments, we compared several object detection models using the MMDetection library, running all experiments on the Fold 1 dataset. [Table 1] shows the initial performance of Faster R-CNN, Cascade R-CNN, ATSS, UniverseNet, RetinaNet, and VFNet.

| Model | Backbone | Neck | Optimizer | lr | Epoch | Test mAP50 |
|---|---|---|---|---|---|---|
| faster RCNN | resnet50 | fpn | SGD | 0.02 | 12 | 0.3734 |
| cascade RCNN | SwinL | fpn | SGD | 0.02 | 20 | 0.5161 |
| ATSS | SwinL | fpn | SGD | 0.02 | 20 | 0.5015 |
| universenet | SwinL | fpn | AdamW | 0.0001 | 20 | 0.5545 |
| retinanet | SwinL | fpn | AdamW | 0.0001 | 20 | 0.5438 |
| vfnet | SwinL | fpn | AdamW | 0.0001 | 20 | 0.5623 |

[Table 1] Initial performance of the MMDetection baseline models

We then ran tuning experiments on the backbone, neck, optimizer, learning rate, and augmentation techniques.

2-2 Backbone

| Model | Backbone | Test mAP50 |
|---|---|---|
| Cascade RCNN | resnet50 | 0.3613 |
| Cascade RCNN | SwinS | 0.4628 |
| Cascade RCNN | SwinL | 0.5161 |

[Table 2] Cascade RCNN performance by backbone model

We compared ResNet50, Swin Transformer Small, and Swin Transformer Large as backbones for Cascade R-CNN. ResNet50, a conventional CNN-based model, was less accurate than Swin Transformer. Swin Transformer is a Transformer-based backbone that processes the image in patches while still extracting global features.

As a result, switching the backbone to Swin Transformer improved performance.
For Swin Transformer, we took weights pre-trained on ImageNet-22k at 384x384 resolution and fine-tuned them.

2-3 Neck

To make the model more robust to varied inputs, we trained it on images at two scales, 1024x1024 and 1024x720. To match this setup and capture objects of diverse sizes and shapes, we used DyHead, which unifies scale-aware, spatial-aware, and task-aware attention in a single head. Attaching DyHead to the head of the ATSS model gave better performance than the original ATSS model.

2-4 Hyperparameter tuning

a. IoU threshold and mAP50

Running inference with the Cascade RCNN model on an arbitrary image at different IoU threshold values gave the results shown above. With the higher IoU threshold the low-score boxes are removed, so the detections look cleaner to the eye, but mAP50 actually comes out lower.

By the nature of the mAP50 metric, the score gained by hitting a ground truth outweighs the penalty for a wrong bounding box. We therefore lowered the IoU threshold so that more bounding boxes are generated.
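The asymmetry described above can be checked with a small worked example. The sketch below computes a simplified all-point interpolated AP for one class; it is not the exact evaluator used by the leaderboard, and it assumes each detection flagged `True` has already been matched to an unused ground truth at IoU 0.5 or higher. The function name is hypothetical.

```python
def average_precision(detections, num_gt):
    """AP as the area under the interpolated precision-recall curve.

    detections: list of (score, is_true_positive) pairs for one class.
    Assumption: is_true_positive already encodes the IoU >= 0.5 match.
    """
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    tp = fp = 0
    points = []                       # (recall, precision) after each detection
    for _, is_tp in ranked:
        tp += is_tp
        fp += not is_tp
        points.append((tp / num_gt, tp / (tp + fp)))
    ap, prev_recall = 0.0, 0.0
    for i, (recall, _) in enumerate(points):
        # Interpolated precision: the best precision at this rank or later.
        best = max(p for _, p in points[i:])
        ap += (recall - prev_recall) * best
        prev_recall = recall
    return ap
```

With 4 ground truths, three confident detections of which two are correct give an AP of 0.5. Appending only low-score false positives leaves it at 0.5, because they rank last and add no recall, while appending one more low-score hit raises it to 0.6875. Extra low-confidence boxes therefore cost little under this metric and can only help when some of them are right.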

b. Learning rate and scheduler

Validation performance rose each time the learning rate was reduced, and the reductions helped training converge stably in the end. Based on these results, we used a StepLR scheduler to stabilize the training of each model.
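For reference, a step schedule multiplies the learning rate by a fixed factor every few epochs. The sketch below shows that rule in isolation; the base learning rate matches the SGD runs in [Table 1], but the step size of 8 and the decay factor of 0.1 are illustrative assumptions, not values taken from our configs.

```python
def step_lr(base_lr, epoch, step_size, gamma=0.1):
    """Learning rate at `epoch` when it is multiplied by gamma every step_size epochs."""
    return base_lr * gamma ** (epoch // step_size)

# Hypothetical 20-epoch run: 0.02 for epochs 0-7, then roughly 0.002, then roughly 0.0002.
schedule = [step_lr(0.02, epoch, step_size=8) for epoch in range(20)]
```

Each drop gives the optimizer a smaller step, which is consistent with the jumps in validation performance described above.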

2-5 Augmentation techniques

a. Image size adjustment and multi-scale training

To address the poor prediction performance on small objects, we tried training with larger images so that small objects are captured better. [Figure 7] shows that performance improves as the input image size grows.
However, training at the original 1024x1024 resolution forced us to reduce the batch size because of GPU limits. With the smaller batch size, the validation loss and mAP50 fluctuated more, training became unstable, and test performance dropped.
So, to keep the original batch size while minimizing the loss of information about the objects in each image, we reduced the image size with random crop augmentation. This keeps the objects and bounding boxes at their original size while cutting memory use, and it improved performance.

Going further, during training we fed the model not only 1024x1024 images but also 1024x720 images, so that it learns from more varied views of the images. At test time we also used MultiScaleFlipAug to apply both scales together, which improved performance. The table below summarizes these experiments.

| Model | Backbone | Neck | Image size | Batch size | Random crop | MultiScaleFlipAug | Test mAP50 |
|---|---|---|---|---|---|---|---|
| ATSS | SwinL | dyhead | 512x512 | 4 | x | x | 0.5815 |
| ATSS | SwinL | dyhead | 640x640 | 4 | x | x | 0.6081 |
| ATSS | SwinL | dyhead | 720x720 | 4 | x | x | 0.6470 |
| ATSS | SwinL | dyhead | 1024x1024 | 2 | x | x | 0.5902 |
| ATSS | SwinL | dyhead | 1024x1024 | 4 | o | x | 0.6680 |
| ATSS | SwinL | dyhead | 1024x1024, 1024x720 | 4 | o | o | 0.6752 |

[Table 3] ATSS performance by model structure and hyperparameters (test mAP50)
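The two-scale setup could be expressed in an MMDetection 2.x-style config roughly as below. This is a sketch of the relevant pipeline fragments only, not our actual config: the flip setting and the normalization statistics (standard ImageNet values) are assumptions, and the variable names are hypothetical.

```python
# MMDetection 2.x-style pipeline fragments, written as plain Python dicts.
img_scales = [(1024, 1024), (1024, 720)]   # the two scales named in the text

train_resize = dict(
    type='Resize',
    img_scale=img_scales,
    multiscale_mode='value',   # pick one of the listed scales for each image
    keep_ratio=True,
)

test_time_aug = dict(
    type='MultiScaleFlipAug',
    img_scale=img_scales,      # run inference at both scales
    flip=True,                 # assumption: horizontal flips are also merged
    transforms=[
        dict(type='Resize', keep_ratio=True),
        dict(type='RandomFlip'),
        dict(type='Normalize',  # assumption: standard ImageNet statistics
             mean=[123.675, 116.28, 103.53],
             std=[58.395, 57.12, 57.375],
             to_rgb=True),
        dict(type='Pad', size_divisor=32),
        dict(type='ImageToTensor', keys=['img']),
        dict(type='Collect', keys=['img']),
    ],
)
```

`train_resize` would sit in the training pipeline after the loading steps, and `test_time_aug` would follow `LoadImageFromFile` in the test pipeline.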

b. Mosaic

[Figure 9] Data augmentation technique: Mosaic

We ran an experiment that trains with Mosaic augmentation, which combines objects of various sizes and ratios as in [Figure 9] so that the model sees more diverse data.

We first obtained a result with the Cascade RCNN model using the augmentation and training techniques that had worked well, then trained for 3 more epochs with Mosaic augmentation. Performance dropped.

| Model | Backbone | Random crop + MultiScaleFlipAug | Mosaic fine-tuning | Test mAP50 |
|---|---|---|---|---|
| Cascade RCNN | SwinL | x | x | 0.5161 |
| Cascade RCNN | SwinL | o | x | 0.6373 |
| Cascade RCNN | SwinL | o | o | 0.5210 |

[Table 4] Cascade RCNN performance by model structure and augmentation technique (test mAP50)

2-6 MMDetection conclusions

We improved performance by varying the detection model, the backbone and neck structure, the hyperparameters, and the augmentation and training techniques. Our analysis of how structure and technique affected each model can be summarized as follows.

  • Combining a strong backbone such as Swin Transformer with multi-scale techniques gives good performance.
  • ATSS performed best with a SwinL backbone, FPN, and multi-scale augmentation (mAP50: 0.6752).
  • Cascade RCNN and UniverseNet also performed well with multi-scale augmentation.
  • Mosaic augmentation performed poorly.

3. YOLO

3-1 Initial baseline experiments

| Model | Image size | FLOPs | Validation mAP50 |
|---|---|---|---|
| YOLOv5x6 | 1280 | 209.8 | 55.0 |
| YOLO11x | 640 | 194.9 | 54.7 |

[Table 5] Baseline comparison of YOLO v5 and v11 (validation mAP50)

We wanted to pick either YOLOv5 or YOLO11 as our YOLO model. YOLO11x is the newer model, and it has the advantages of being fast and requiring fewer FLOPs. Speed and model size did not matter in this competition, however, so we prioritized accuracy and chose YOLOv5x6. YOLOv5x6 has about 2.5 times as many parameters as YOLO11x, which makes it slower but more accurate. We also judged YOLOv5x6 to be the better fit for our 1024x1024 images.

| Model | Image size | Epoch | Test mAP50 |
|---|---|---|---|
| YOLOv5 | 640 | 20 | 0.3620 |
| YOLOv5x | 640 | 20 | 0.4226 |
| YOLOv5x6 | 1280 | 20 | 0.4770 |
| YOLOv11 | 640 | 20 | 0.3675 |
| YOLOv11x | 640 | 20 | 0.4415 |

[Table 6] YOLO performance by model and image size (test mAP50)

[Table 6] shows that YOLOv5x6 performed best. We chose YOLOv5x6 as the YOLO baseline and ran various hyperparameter tuning experiments to improve it.

3-2 Hyperparameter tuning

| Model | Image size | lr | momentum | decay | Step size | Anchor box | Test mAP50 |
|---|---|---|---|---|---|---|---|
| YOLOv5x6 | 1280 | 0.1 | 0.937 | 0.005 | 3 | original | 0.4770 |
| YOLOv5x6 | 1280 | 0.01 | 0.937 | 0.0005 | 3 | original | 0.5015 |
| YOLOv5x6 | 1280 | 0.01 | 0.937 | 0.0005 | 3 | anchor box tuning | 0.5303 |

[Table 7] YOLOv5x6 performance by hyperparameters, augmentation, and training techniques (test mAP50)

To improve performance, we experimented with layer freezing, hyperparameters, augmentation, and anchor box optimization. We initially froze layers because of the risk of overfitting, but after seeing performance drop we switched to fine-tuning without freezing any layers.

Hyperparameter tuning improved the base model, and finally applying anchor box optimization to find the best anchor boxes and training with them gave the highest performance.
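One common way to fit anchors to a dataset is to cluster the ground-truth box sizes with k-means, using 1 - IoU as the distance. The sketch below shows that idea in its simplest form. It is a swapped-in plain k-means for illustration, not the anchor routine of any particular YOLO implementation and not necessarily what we ran; the function names and the deterministic initialization are assumptions.

```python
def wh_iou(a, b):
    """IoU of two boxes given only (width, height), both anchored at the origin."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)

def kmeans_anchors(sizes, k, iters=50):
    """Cluster (w, h) pairs so that each box has a high IoU with its anchor."""
    # Deterministic start: take evenly spaced boxes from the area-sorted list.
    ordered = sorted(sizes, key=lambda s: s[0] * s[1])
    n = len(ordered)
    anchors = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(iters):
        members = [[] for _ in range(k)]
        for wh in sizes:
            best = max(range(k), key=lambda i: wh_iou(wh, anchors[i]))
            members[best].append(wh)
        updated = []
        for old, group in zip(anchors, members):
            if group:   # move the anchor to the mean width and height
                updated.append((sum(w for w, _ in group) / len(group),
                                sum(h for _, h in group) / len(group)))
            else:       # keep an anchor that attracted no boxes
                updated.append(old)
        if updated == anchors:
            break
        anchors = updated
    return sorted(anchors, key=lambda a: a[0] * a[1])
```

Feeding it the (width, height) of every training box, scaled to the training resolution, yields k anchor sizes ordered from small to large.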

4. Co-DINO

4-1 Model search and selection

To find the best model, we consulted Papers with Code, shortlisted SOTA models as candidates, and evaluated them.

| Model | Backbone | Pre-training dataset | Fine-tuning dataset split | mAP50 (validation / test) |
|---|---|---|---|---|
| Co-Deformable-DETR | R50 | COCO | Train set | 0.3341 |
| Co-DINO | Swin-T | COCO | Train-validation split | 0.4220 |
| Co-DINO | Swin-L | COCO | Train-validation split | 0.7170 / 0.7071 |
| Co-DINO | Swin-L | COCO | Train set | 0.7190 |
| Co-DINO | Swin-L | COCO | 5-fold CV | 0.7283 |

[Table 8] Performance of DETR-based models by structure and training data (test mAP50)

In our experiments, Co-DINO performed best among the models. Having analyzed its characteristics, we judge that it uses contrastive denoising to extract anchor boxes efficiently, which improves detection performance.

For the backbone, the Transformer-based Swin-L performed best, as the earlier experiments had already shown. We used model weights pre-trained on COCO and compared performance across different ways of splitting the training data.

Splitting the dataset with k-fold cross-validation, training on each fold, and ensembling the results gave the highest performance. We therefore adopted the approach of splitting into folds and ensembling, and chose Co-DINO as the baseline model.

4-2 Hyperparameter tuning

Being a competition, the project was time-constrained, and the Co-DINO Swin-L model takes about 36 hours to train for 12 epochs, so we took the best hyperparameters reported in the original paper as our starting point.

Our baseline settings were the AdamW optimizer, a learning rate of 0.0002, and an input image size of 1280×1280.

| Model | Backbone | Fine-tuning dataset | Input image size | mAP50 (validation / test) |
|---|---|---|---|---|
| Co-DINO | Swin-T | Train-validation split | (1024, 1024) | 0.4160 |
| Co-DINO | Swin-T | Train-validation split | (1280, 1280) | 0.4220 |
| Co-DINO | Swin-T | Train-validation split | (1536, 1536) | 0.0720 |
| Co-DINO | Swin-L | Train set | (512, 512) | 0.6686 |
| Co-DINO | Swin-L | Train set | (1280, 1280) | 0.7190 |

[Table 9] Co-DINO performance by hyperparameters

Earlier experiments had shown that larger input images, with their higher resolution, allow a wider variety of objects to be detected. To see whether this also holds for Co-DINO, we measured its performance at several input image sizes.

[Table 9] shows that the smaller the image is relative to the original, the more information is lost and the lower the performance. When the image size was increased beyond 1280x1280, however, performance fell instead. We suspect that above a certain size the backbone's window size and the structure of the model architecture prevent it from extracting feature information properly.

4-3 Data augmentation

For data augmentation we used the LSJ augmentation provided in the SOTA paper. We then tried super resolution and center crop to improve the model's generalization and its detection of small boxes.

a. Large Scale Jittering

Scale jittering augmentation resizes and crops the image. The two typical variants are SSJ (standard scale jittering) and LSJ (large scale jittering): SSJ rescales the image by a factor of 0.8 to 1.25 and LSJ by a factor of 0.1 to 2.0, after which the image is randomly cropped and randomly flipped horizontally. This increases the diversity of the data so that the model performs well in a wider range of situations.

For our model we applied LSJ, which is known to give the better performance of the two.
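To make the three LSJ steps concrete, the sketch below applies one random draw (resize, crop, flip) to a single bounding box and reports where the box lands on the output canvas. It only tracks the geometry, not the pixels, and it is a simplified illustration rather than the pipeline from the paper; the function name, the 1024 output size, and the 0.5 flip probability are assumptions.

```python
import random

def large_scale_jitter(box, img_w, img_h, out_size=1024,
                       scale_range=(0.1, 2.0), rng=random):
    """Apply one random LSJ draw to an [x1, y1, x2, y2] box.

    Returns the transformed box, or None if the crop removed it.
    """
    s = rng.uniform(*scale_range)                  # 1) random resize factor
    new_w, new_h = img_w * s, img_h * s
    # 2) random crop window when the resized image exceeds the output canvas;
    #    a smaller image keeps offset 0 and would be padded up to out_size.
    off_x = rng.uniform(0, max(new_w - out_size, 0))
    off_y = rng.uniform(0, max(new_h - out_size, 0))
    vis_w, vis_h = min(new_w, out_size), min(new_h, out_size)
    x1, y1, x2, y2 = (v * s for v in box)
    x1, x2 = x1 - off_x, x2 - off_x
    y1, y2 = y1 - off_y, y2 - off_y
    # 3) random horizontal flip of the visible region (assumed probability 0.5)
    if rng.random() < 0.5:
        x1, x2 = vis_w - x2, vis_w - x1
    # Clip to the visible region and drop boxes that ended up outside it.
    x1, y1 = max(x1, 0.0), max(y1, 0.0)
    x2, y2 = min(x2, vis_w), min(y2, vis_h)
    if x2 <= x1 or y2 <= y1:
        return None
    return [x1, y1, x2, y2]
```

Because the scale factor ranges from 0.1 to 2.0, the same object can appear at a twentieth of the canvas width in one epoch and cropped at double size in the next, which is where the added diversity comes from.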

b. Super Resolution & Center Crop

[Figure 12] Example of the EDSR technique (Lim, Bee, et al. "Enhanced deep residual networks for single image super-resolution." Proceedings of the IEEE conference on computer vision and pattern recognition workshops. 2017.)

Since higher input resolution should make smaller objects detectable, we used EDSR to upscale the original training images from (1024, 1024) to (2048, 2048) and trained on them to compare performance. To raise the resolution while keeping the input size at (1024, 1024), we applied center crop.
This produced 3,909 additional high-resolution, center-cropped images, expanding the dataset to 7,818 images for training.
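When an image is upscaled and then center-cropped, its annotations have to follow. The sketch below shows the bookkeeping for COCO-style [x, y, w, h] boxes under the sizes described above (2x upscaling, 1024 crop); it is an illustration with a hypothetical function name, not the code we used.

```python
def upscale_and_center_crop_boxes(boxes, scale=2, src_size=1024, crop_size=1024):
    """Map [x, y, w, h] boxes onto the center crop of an image upscaled by `scale`."""
    offset = (src_size * scale - crop_size) / 2    # 512 px for 1024 -> 2048 -> 1024
    kept = []
    for x, y, w, h in boxes:
        x1, y1 = x * scale - offset, y * scale - offset
        x2, y2 = x1 + w * scale, y1 + h * scale
        # Clip to the crop window and drop boxes that fall completely outside it.
        x1, y1 = max(x1, 0), max(y1, 0)
        x2, y2 = min(x2, crop_size), min(y2, crop_size)
        if x2 > x1 and y2 > y1:
            kept.append([x1, y1, x2 - x1, y2 - y1])
    return kept
```

A 100x100 box at (400, 400) becomes a 200x200 box at (288, 288), while a box in the top-left corner of the original disappears because only the central region survives the crop.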

[Figure 13] shows that, because the training set changed so much, performance actually dropped when training for only a few epochs. It seemed likely that performance would recover with further training or finer hyperparameter tuning. But training time doubled, so given the constraints of a competition we decided to stop this experiment.

4-4 Co-DINO conclusions

| Model | Backbone | Pre-training dataset | Fine-tuning dataset | Input image size | mAP50 (validation / test) |
|---|---|---|---|---|---|
| Co-Deformable-DETR | R50 | COCO | Train set | (400~800) (multi-size) | 0.3341 |
| Co-DINO | Swin-T | COCO | Train-validation split | (1024, 1024) | 0.4160 |
| Co-DINO | Swin-T | COCO | Train-validation split | (1280, 1280) | 0.4220 |
| Co-DINO | Swin-T | COCO | Train-validation split | (1536, 1536) | 0.0720 |
| Co-DINO | Swin-L | COCO | Train-validation split | (1280, 1280) | 0.7170 / 0.7071 |
| Co-DINO | Swin-L | COCO | Train set | (512, 512) | 0.6686 |
| Co-DINO | Swin-L | COCO | Train set | (1280, 1280) | 0.7190 |
| Co-DINO | Swin-L | COCO | 5-fold CV | (1280, 1280) | 0.7283 |

[Table 10] Final comparison of DETR-based models (mAP50)

In the end, we took Co-DINO weights pre-trained on COCO and fine-tuned them on our training data. Splitting the training data into 5 folds and ensembling the fold models gave the best result.

While selecting the final model, we tried to improve performance by enlarging the images to raise the resolution, but performance dropped at resolutions above 1280x1280. Lastly, we tried to improve performance by enlarging the training set with SR, but training took too long, so we trained with only the original data and LSJ augmentation.

5. Ensemble

Four ensemble techniques are commonly used in object detection: NMS, Soft-NMS, NMW, and WBF. We evaluated each of them on a simple base model and then produced the final result with the techniques that worked best.

  • 5-fold combination ensemble
    • An ensemble that merges the per-fold models when a model is trained with 5-fold cross-validation
  • Different-model ensemble
    • An ensemble that merges the results of different models

5-1 Exploring techniques for the 5-fold combination ensemble

| Model | 5-fold combination ensemble | Test mAP50 |
|---|---|---|
| ATSS | 1 fold | 0.6716 |
| ATSS | NMS | 0.6869 |
| ATSS | Soft-NMS | 0.6779 |
| ATSS | NMW | 0.6895 |
| ATSS | WBF | 0.6978 |

[Table 11] ATSS performance by ensemble technique in the 5-fold combination ensemble (test mAP50)

We compared the 5-fold combination techniques using ATSS, one of the models that performed well on a single fold. We decided to use the best and second-best techniques, WBF and NMW, both for the 5-fold combination of the remaining models and for the ensemble of different models.
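To show what WBF does differently from NMS, here is a simplified single-class sketch: instead of discarding overlapping boxes, it averages their coordinates weighted by confidence and then scales the score by how many models agreed. It is an illustration of the idea, not the implementation we ran (the `ensemble_boxes` package provides maintained versions of NMS, Soft-NMS, NMW, and WBF); the function names and the 0.55 IoU threshold are assumptions.

```python
def iou(a, b):
    """IoU of two [x1, y1, x2, y2] boxes."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union

def fuse_boxes_wbf(predictions, num_models, iou_thr=0.55):
    """Simplified WBF. predictions: (box, score) pairs pooled from all models, one class."""
    clusters = []   # boxes grouped by the fused box they matched
    fused = []      # current (fused_box, mean_score) for each cluster
    for box, score in sorted(predictions, key=lambda p: p[1], reverse=True):
        best, best_iou = None, iou_thr
        for i, (fbox, _) in enumerate(fused):
            overlap = iou(box, fbox)
            if overlap > best_iou:
                best, best_iou = i, overlap
        if best is None:                      # no match: start a new cluster
            clusters.append([(box, score)])
            fused.append((list(box), score))
        else:                                 # match: re-average the cluster
            clusters[best].append((box, score))
            members = clusters[best]
            total = sum(s for _, s in members)
            fbox = [sum(b[k] * s for b, s in members) / total for k in range(4)]
            fused[best] = (fbox, total / len(members))
    # Down-weight boxes that only some of the models predicted.
    return [(fbox, conf * min(len(m), num_models) / num_models)
            for (fbox, conf), m in zip(fused, clusters)]
```

For two models that both find an object at roughly [0, 0, 10, 10] with scores 0.9 and 0.6, the fused box is pulled slightly toward the more confident one and keeps the mean score of 0.75, whereas a box that only one model predicts with score 0.8 is halved to 0.4.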

The models produced with the 5-fold combination ensemble, which serve as the inputs to the different-model ensemble, are summarized below.

| Model | 5-fold combination ensemble | Test mAP50 |
|---|---|---|
| ATSS | WBF | 0.6978 |
| Cascade RCNN | WBF | 0.6276 |
| UniverseNet | WBF | 0.6266 |
| Co-DINO | NMW | 0.7283 |
| YOLOv5x6 | x | 0.5013 |

[Table 12] 5-fold combination ensemble results for the models trained with 5-fold cross-validation

5-2 Different-model ensemble

We ensembled the models in [Table 12], chosen for their high scores or good bounding box detection, in several ways and compared the results.

| Ensemble technique | Models combined (from ATSS, Cascade RCNN, UniverseNet, Co-DINO, YOLOv5x6) | Test mAP50 |
|---|---|---|
| WBF | 2 | 0.7055 |
| NMW | 2 | 0.7118 |
| NMW | 3 | 0.6948 |
| WBF | 2 | 0.7198 |
| NMS | 2 | 0.7217 |
| NMW | 3 | 0.7327 |
| NMW | 4 (ATSS, Cascade RCNN, Co-DINO, YOLOv5x6) | 0.7553 |

In the end, ensembling ATSS, Cascade RCNN, Co-DINO, and YOLOv5x6 with NMW gave the best performance, and we finished the competition in 2nd place. Notably, YOLOv5x6 scored far lower than the other models on its own, yet including it in the ensemble raised performance. We think this is because YOLOv5x6 tends to draw bounding boxes only for objects it can predict accurately.

(For YOLOv5x6 we used a model trained without splitting the data into folds.)

Personal retrospective

Model improvement
The DETR and DINO models outperformed YOLO by more than 38%, so YOLO had been a low priority. But YOLO plays a useful role in ensembles and is widely used, so I wanted to see it through to the end. The competition leaderboard matters, but since YOLO is so widely used for object detection I also wanted to take this chance to learn it. As a result, I now understand YOLO models well enough to tune them, which was my goal for this competition. The ensemble that included YOLO also improved accuracy by 2%, which let us finish 2nd in the competition.
With YOLO I ran experiments on anchor box optimization, hyperparameters, augmentation, and layer freezing, and found the best values. I judged that not freezing layers was the better choice and decided against freezing.

Lessons learned
I have often heard that people who keep working steadily without giving up are rewarded, and this project made me feel it once again. I had judged that YOLO performed worse than the other models and thought it better to concentrate on those instead. But there were two things I had missed. The first is that predictions need to be sorted by confidence score; had I taken this into account, I expect the test accuracy would have been higher. The second is that YOLO helps when it is used in an ensemble. Its accuracy was low on its own, but adding it to other models improved their accuracy. Using YOLO is the reason we moved from 7th to 2nd on the leaderboard. Had I known, I would have worked harder on improving YOLO. It was an opportunity to learn the value of persisting to the end.

Areas for improvement
I think the most important thing is not to give up until the end. We were given enough time, yet we made the final submission in a rush, and I need to reflect on that. When something seems odd, I should take the time to analyze it. Ensemble code normally finishes within 10 minutes, but ours took 2 hours 30 minutes. I found it puzzling but assumed it was because the CSV files were large; in the end it was a problem in the code after all. When something differs from what I expected, I should not just accept it but analyze it until I understand it.
