๐ŸŽฒ[AI] Regularization

manduยท2025๋…„ 5์›” 19์ผ


ํ•ด๋‹น ๊ธ€์€ FastCampus - '[skill-up] ์ฒ˜์Œ๋ถ€ํ„ฐ ์‹œ์ž‘ํ•˜๋Š” ๋”ฅ๋Ÿฌ๋‹ ์œ ์น˜์› ๊ฐ•์˜๋ฅผ ๋“ฃ๊ณ ,
์ถ”๊ฐ€ ํ•™์Šตํ•œ ๋‚ด์šฉ์„ ๋ง๋ถ™์—ฌ ์ž‘์„ฑํ•˜์˜€์Šต๋‹ˆ๋‹ค.


1. Regularization์ด๋ž€?

  • Overfitting(๊ณผ์ ํ•ฉ)์„ ๋ฐฉ์ง€ํ•˜๊ธฐ ์œ„ํ•ด Generalization Error๋ฅผ ์ค„์ด๋Š” ๋‹ค์–‘ํ•œ ๊ธฐ๋ฒ•

  • ์ผ๋ฐ˜์ ์œผ๋กœ training error๋ฅผ ์ตœ์†Œํ™”ํ•˜๋Š” ๊ฒƒ์„ ์ผ๋ถ€๋Ÿฌ ๋ฐฉํ•ดํ•˜๋Š” ํ˜•ํƒœ
    โ†’ ์ด ๊ณผ์ •์—์„œ training error๊ฐ€ ๋†’์•„์งˆ ์ˆ˜ ์žˆ์Œ

  • loss๊ฐ€ ์ตœ์†Œํ™” ๋ ์ˆ˜๋ก ์ตœ๋Œ€ํ™” ๋˜๋Š” term์„ ์ถ”๊ฐ€ํ•˜๋Š” ์›๋ฆฌ
    โ†’ ์ตœ์†Œํ™” term๊ณผ ์ตœ๋Œ€ํ™” term์˜ ๊ท ํ˜•์„ ์ฐพ๋„๋ก ํ•จ

  • ๋ชจ๋ธ์ด noise์— ๊ฐ•ํ•˜๊ณ  unseen ๋ฐ์ดํ„ฐ์—๋„ ์ž˜ ์ž‘๋™ํ•˜๋„๋ก ์œ ๋„

Overfitting
The phenomenon where training error becomes markedly lower than generalization error


2. ์ฃผ์š” Regularization ๊ธฐ๋ฒ• ๋ถ„๋ฅ˜

์ ์šฉ ์˜์—ญ๊ธฐ๋ฒ•
๋ฐ์ดํ„ฐData Augmentation, Noise Injection
์†์‹ค ํ•จ์ˆ˜Weight Decay (L1, L2)
Layer ์ถ”๊ฐ€Dropout, Batch Normalization
ํ•™์Šต ๋ฐฉ์‹Early Stopping, Bagging & Ensemble

Ensemble Learning
A technique that combines multiple models into one stronger predictive model

Bagging (Bootstrap Aggregating)
A kind of ensemble: train each model on a different sample of the data, then combine their predictions by averaging or voting
e.g. Random Forest

Boosting
Chain weak learners sequentially, gradually improving performance
Each new model is trained to correct what the previous models got wrong

Stacking
Combine the outputs of several different kinds of models to make the final prediction
Typically a meta-model takes the first-stage models' outputs as input and produces the final prediction


3. Weight Decay

  • A weight parameter represents the relationship between two nodes
    → the larger its magnitude, the stronger the connection
  • Weights also tend to grow in magnitude as training progresses
  • Add a penalty term on the weight magnitudes to the loss function
    → limits the overall connection strength so that an output node does not learn too heavily from its many input nodes
    → prevents runaway weight growth, and thus overfitting

L2 Regularization (๊ฐ€์žฅ ์ผ๋ฐ˜์ )

L(ฮธ)=Loriginal(ฮธ)+ฮฑโˆ—โˆฃโˆฃWโˆฃโˆฃยฒL(ฮธ) = L_{original}(ฮธ) + ฮฑ * ||W||ยฒ

L1 Regularization

L(ฮธ)=Loriginal(ฮธ)+ฮฑโˆ—โˆฃโˆฃWโˆฃโˆฃL(ฮธ) = L_{original}(ฮธ) + ฮฑ * ||W||

ํ•ญ๋ชฉL1 NormL2 Norm
์ˆ˜์‹Loss+ฮปโˆ‘abs(wi)Loss + \lambda\sum abs(w_i)Loss+ฮปโˆ‘wi2Loss + \lambda\sum w_i^2
๋ณ„๋ช…LassoRidge
๊ฐ€์ค‘์น˜ ์ฒ˜๋ฆฌ์ผ๋ถ€ ๊ฐ€์ค‘์น˜๋ฅผ ์™„์ „ํžˆ 0์œผ๋กœ ๋งŒ๋“ฆ๋ชจ๋“  ๊ฐ€์ค‘์น˜๋ฅผ ์ž‘๊ฒŒ ์œ ์ง€
ํšจ๊ณผFeature Selection ๊ฐ€๋ŠฅRegularization ์ค‘์‹ฌ
๊ฒฐ๊ณผํฌ์†Œํ•œ ๋ชจ๋ธ ์ƒ์„ฑ๋ถ€๋“œ๋Ÿฌ์šด ๋ชจ๋ธ ์ƒ์„ฑ
  • ์ผ๋ฐ˜์ ์œผ๋กœ bias๋Š” regularization ๋Œ€์ƒ์—์„œ ์ œ์™ธ
  • Hyper-parameter ฮฑ๋ฅผ ํ†ตํ•ด ๋‘ term ์‚ฌ์ด์˜ ๊ท ํ˜•์„ ์กฐ์ ˆ
  • PyTorch์—์„œ๋Š” optimizer ํŒŒ๋ผ๋ฏธํ„ฐ๋กœ weight_decay ์‚ฌ์šฉ
  • ๋„ˆ๋ฌด ๊ฐ•ํ•˜๊ฒŒ regularization์„ ๊ฑธ๊ฒŒ ๋˜๋Š”๊ฑฐ๋ผ, ์‹ค์ œ๋กœ ๋งŽ์ด ์“ฐ๋Š” ๋ฐฉ๋ฒ•์€ ์•„๋‹˜

4. Data Augmentation

  • ๊ธฐ์กด ๋ฐ์ดํ„ฐ์— ๋…ธ์ด์ฆˆ๋‚˜ ๋ณ€ํ˜•์„ ์ฃผ์–ด ํ•™์Šต ๋ฐ์ดํ„ฐ๋ฅผ ํ™•์žฅ
  • ํŠนํžˆ, ๋ฐ์ดํ„ฐ๊ฐ€ ์ ์€ ์ƒํƒœ์—์„œ ์ฃผ๋กœ ์‚ฌ์šฉํ•˜๋Š” ๋ฐฉ๋ฒ•
    โ†’ ๋ฐ์ดํ„ฐ๊ฐ€ ์ ์œผ๋ฉด bias๊ฐ€ ๋˜๊ฒŒ ์‹ฌํ• ๊ฒƒ์ด๋‹ˆ๊นŒ!
  • ํ•ต์‹ฌ ํŠน์ง•(feature)์„ ์œ ์ง€ํ•œ ์ฑ„, ์ž…๋ ฅ ๋ถ„ํฌ๋ฅผ ๋‹ค์–‘ํ™”ํ•˜์—ฌ ๊ณผ์ ํ•ฉ ๋ฐฉ์ง€
  • ๋ณดํ†ต์€ ํ•ต์‹ฌ ํŠน์ง•์„ ๋ณด์กดํ•˜๊ธฐ ์œ„ํ•ด ํœด๋ฆฌ์Šคํ‹ฑํ•œ ๋ฐฉ๋ฒ•์„ ์‚ฌ์šฉ
  • ๊ทœ์น™์„ ํ†ตํ•ด ์ฆ๊ฐ•(augment)ํ•˜๋Š” ๊ฒƒ์€ ์˜ณ์ง€ ์•Š์Œ
    โ†’ ๋ชจ๋ธ์ด ๊ทธ ๊ทœ์น™์„ ๋ฐฐ์›Œ๋ฒ„๋ฆผ
    โ†’ Ramdomness๊ฐ€ ํ•„์š”ํ•จ!

์ด๋ฏธ์ง€ ๋ถ„์•ผ ์˜ˆ์‹œ

  • Salt & pepper noise: add noise that partially degrades image quality
    (random white dots ("salt") and black dots ("pepper") scattered over the image)
  • Adding RGB noise
  • Rotation
  • Horizontal flipping
  • Shifting

ํ…์ŠคํŠธ ๋ถ„์•ผ ์˜ˆ์‹œ

  • ๋‹จ์–ด ์ƒ๋žต (Dropping): ๋ฌธ์žฅ์— ์ž„์˜๋กœ ๋‹จ์–ด ๋นต๊พธ ๋šซ๊ธฐ
  • ๋‹จ์–ด ์œ„์น˜ ๋ฐ”๊พธ๊ธฐ (Exchange): ์ž„์˜๋กœ ๋Œ€์ƒ ๋‹จ์–ด๋ฅผ ์ฃผ๋ณ€ ๋‹จ์–ด์™€ ์œ„์น˜ ๊ตํ™˜ (ํ•œ๊ตญ์–ด๋Š” ๋งค์šฐ ํšจ๊ณผ์ )

์ƒ์„ฑ ๋ชจ๋ธ ํ™œ์šฉ

  • Generate new samples with an AutoEncoder (AE), a GAN, etc.
  • Drawback: learning genuinely new concepts is difficult (the generative model itself was trained only on certain data)
  • Benefit: helps optimization and improves robustness to noise

5. Dropout

  • ๊ณผ์ ํ•ฉ(overfitting)์„ ๋ง‰๊ธฐ ์œ„ํ•œ Regularization ๊ธฐ๋ฒ• ์ค‘ ํ•˜๋‚˜๋กœ, ํ•™์Šต ์‹œ ์ž„์˜๋กœ ๋‰ด๋Ÿฐ์„ ๋น„ํ™œ์„ฑํ™”ํ•˜์—ฌ ํŠน์ • ๋…ธ๋“œ์— ์˜์กดํ•˜์ง€ ์•Š๋„๋ก ํ•จ

ํšจ๊ณผ

  • ๊ณต๋™ ์ ํ•ฉ(Co-adaptation) ๋ฐฉ์ง€

    ๊ณต๋™ ์ ํ•ฉ (Co-adaptation)
    ์—ฌ๋Ÿฌ ๋‰ด๋Ÿฐ๋“ค์ด ๊ฐ™์€ ๋ฐฉ์‹์œผ๋กœ ํ•ญ์ƒ ํ•จ๊ป˜ ์ž‘๋™ํ•˜๋ฉด ํ•™์Šต์ด ํŽธ์ค‘๋  ์ˆ˜ ์žˆ์Œ
    e.g. ๋‰ด๋Ÿฐ A๊ฐ€ ํ•ญ์ƒ B์™€ ๊ฐ™์ด ์ž‘๋™ํ•ด์„œ ๊ฒฐ๊ณผ๋ฅผ ๋‚ด๋Š” ๋ฐฉ์‹์ด๋ฉด, A๋‚˜ B๊ฐ€ ์—†์–ด์ง€๋ฉด ์„ฑ๋Šฅ์ด ํ™• ๋–จ์–ด์ง
    โ†’ ํ•™์Šต ์ค‘ ๋žœ๋ค์œผ๋กœ ๋‰ด๋Ÿฐ์„ ์ œ๊ฑฐํ•ด์„œ, ๊ฐ ๋‰ด๋Ÿฐ์ด ๋…๋ฆฝ์ ์œผ๋กœ๋„ ์ž˜ ์ž‘๋™ํ•˜๋„๋ก ๊ฐ•์ œ

  • ํ•™์Šต ์‹œ๋งˆ๋‹ค ๋‹ค๋ฅธ ๋‰ด๋Ÿฐ ์กฐํ•ฉ์œผ๋กœ ๋„คํŠธ์›Œํฌ๊ฐ€ ๋งŒ๋“ค์–ด์ง
    โ†’ ์ฆ‰, ๋งค๋ฒˆ ๋‹ค๋ฅธ ์ž‘์€ ๋ชจ๋ธ๋“ค์ด ํ•™์Šต๋˜๊ณ , ์ถ”๋ก ํ•  ๋•Œ๋Š” ๊ทธ๊ฒƒ๋“ค์„ ํ‰๊ท ๋‚ธ ํšจ๊ณผ๊ฐ€ ๋ฐœ์ƒ
    โ†’ ์•™์ƒ๋ธ” ํšจ๊ณผ (๊ณผ์ ํ•ฉ ๋ฐฉ์ง€, ์ผ๋ฐ˜ํ™” ์„ฑ๋Šฅ โ†‘)

๋™์ž‘ ๋ฐฉ์‹

  • ํ•™์Šต(Training) ์‹œ
    • model.train()
    • ํ™•๋ฅ  p๋กœ ๋‰ด๋Ÿฐ์„ ๋žœ๋คํ•˜๊ฒŒ drop (turn-off)
  • ์ถ”๋ก (Validataion, Test) ์‹œ
    • model.eval()
    • drop out ๊ผญ ๊บผ์ค˜์•ผ ํ•จ!
    • ๋ชจ๋“  ๋‰ด๋Ÿฐ ์‚ฌ์šฉ, ๋Œ€์‹  weight์— p๋ฅผ ๊ณฑํ•ด ๋ณด์ •
    • ํ•˜์ง€๋งŒ ํ•™์Šต ๋•Œ๋ณด๋‹ค ํ‰๊ท ์ ์œผ๋กœ 1p1\over p๋ฐฐ ๋” ํฐ ์ž…๋ ฅ์„ ๋ฐ›๊ฒŒ ๋  ๊ฒƒ
    • ์ฆ‰, ๋” ์ ์€ ๋‰ด๋Ÿฐ์„ ์‚ฌ์šฉํ•ด์„œ ํ•™์Šต๋˜์—ˆ์œผ๋‹ˆ, ์ถ”๋ก  ์‹œ์—๋„ ๋™์ผํ•œ ์Šค์ผ€์ผ๋กœ ๋ณด์ • ํ•„์š”!

์ ์šฉ ์œ„์น˜

  • ์ผ๋ฐ˜์ ์œผ๋กœ: Linear Layer โ†’ Activation โ†’ Dropout โ†’ ๋‹ค์Œ Layer
    • ํ•˜์ดํผ ํŒŒ๋ผ๋ฏธํ„ฐ p ํ•„์š”

์žฅ๋‹จ์ 

  • ์žฅ์ :
    • Generalization error ๊ฐ์†Œ
  • ๋‹จ์ :
    • ํ•™์Šต ์†๋„ ์ €ํ•˜
    • ํ•˜์ดํผํŒŒ๋ผ๋ฏธํ„ฐ p ์ถ”๊ฐ€

6. Batch Normalization

๊ฐœ์š”

  • ํ•™์Šต ์†๋„๋ฅผ ๋†’์ด๊ณ  ์ผ๋ฐ˜ํ™” ์„ฑ๋Šฅ ํ–ฅ์ƒ
  • ๊ฐ ์ธต์˜ ์ž…๋ ฅ์„ ์ •๊ทœํ™”ํ•˜์—ฌ internal covariance shift ๋ฌธ์ œ ์™„ํ™”
  • ์ž…๋ ฅ์„ ์ •๊ทœํ™”(Standardization)ํ›„ scaling(ฮณ), shifting(ฮฒ) ํŒŒ๋ผ๋ฏธํ„ฐ๋กœ ๋ณด์ •
  • Hyper-Parameter์˜ ์ถ”๊ฐ€ ์—†์ด ๋น ๋ฅธ ํ•™์Šต๊ณผ ๋†’์€ ์„ฑ๋Šฅ ๋ชจ๋‘ ๋ณด์žฅ!
  • RNN ๋นผ๊ณ  ๋ชจ๋‘ ์‚ฌ์šฉ๊ฐ€๋Šฅ (๋Œ€์‹  Layer Normalization ์‚ฌ์šฉ ๊ฐ€๋Šฅ)

๋‚ด๋ถ€ ๊ณต๋ณ€๋Ÿ‰ ๋ณ€ํ™”(Internal Covariance Shift)

  • ํ•™์Šต ์ค‘ ๋„คํŠธ์›Œํฌ ๊ฐ ์ธต์— ๋“ค์–ด์˜ค๋Š” ์ž…๋ ฅ๊ฐ’์˜ ๋ถ„ํฌ๊ฐ€ ๊ณ„์† ๋ฐ”๋€Œ๋Š” ํ˜„์ƒ
  • ์ฆ‰, Layer๋งˆ๋‹ค Input data์˜ ๋ถ„ํฌ๊ฐ€ ๋‹ฌ๋ผ์ง€๊ฒŒ ๋˜์–ด, ๋ชจ๋ธ์€ ๊ฒฐ๊ตญ ์ผ๊ด€์ ์ธ ํ•™์Šต์ด ์–ด๋ ค์›Œ์ง
    e.g.
์–ด๋–ค ์ธต์˜ ์ถœ๋ ฅ์ด ์›๋ž˜ ํ‰๊ท  0, ํ‘œ์ค€ํŽธ์ฐจ๊ฐ€ 1
๊ทธ๋Ÿฐ๋ฐ ํ•™์Šต์ด ๋˜๋ฉด์„œ ํ‰๊ท ์ด 10, ํ‘œ์ค€ํŽธ์ฐจ๊ฐ€ 5๋กœ ๋ฐ”๋€๋‹ค๋ฉด?
โ†’ ๋‹ค์Œ ์ธต์€ ์™„์ „ํžˆ ๋‹ค๋ฅธ ๋ถ„ํฌ์˜ ์ž…๋ ฅ์„ ๋ฐ›๊ฒŒ ๋˜๊ณ , ๋‹ค์‹œ ์ ์‘ํ•ด์•ผ ํ•จ

๊ณต๋ณ€๋Ÿ‰ ๋ณ€ํ™”(covariate shift)
ํ•™์Šต ๋ฐ์ดํ„ฐ์˜ ๋ถ„ํฌ๊ฐ€ ํ…Œ์ŠคํŠธ ๋ฐ์ดํ„ฐ์˜ ๋ถ„ํฌ์™€ ๋‹ค๋ฅธ ์ƒํ™ฉ์„ ์˜๋ฏธ

์ˆ˜์‹

y=ฮณโˆ—(xโˆ’ฮผ)(ฯƒ+ฮต)+ฮฒy = ฮณ * {(x - ฮผ) \over (ฯƒ + ฮต)} + ฮฒ

  • x: ์ž…๋ ฅ, ฮผ: ํ‰๊ท , ฯƒ: ํ‘œ์ค€ํŽธ์ฐจ, ฮณ, ฮฒ: ํ•™์Šต ๊ฐ€๋Šฅํ•œ ํŒŒ๋ผ๋ฏธํ„ฐ, ฮต: ๋งค์šฐ ์ž‘์€ ๊ฐ’
  • |ฮผ| = |ฯƒ| = (vector_size, )
  • ์ •๊ทœํ™”ํ•˜๊ณ  ํ‘œํ˜„๋ ฅ์„ ์žƒ์ง€ ์•Š๋„๋ก ๋‹ค์‹œ ์Šค์ผ€์ผ ์กฐ์ • ๋ฐ ์ด๋™

ํ•™์Šต vs ์ถ”๋ก 

  • Drop out๊ณผ ๋งˆ์ฐฌ๊ฐ€์ง€๋กœ ํ•™์Šต๊ณผ ์ถ”๋ก  ๋‹จ๊ณ„์—์„œ ๋ชจ๋“œ๊ฐ€ ๋‹ฌ๋ผ์ ธ์•ผ ํ•จ
  • ํ•™์Šต(Training) ์‹œ
    • model.train()
    • mini-batch ๊ธฐ์ค€ ํ‰๊ท /ํ‘œ์ค€ํŽธ์ฐจ ๊ณ„์‚ฐ
  • ์ถ”๋ก (Validataion, Test) ์‹œ
    • model.eval()
    • moving average ํ†ต๊ณ„ ์‚ฌ์šฉ (๋ฏธ๋ž˜ ๋ฐ์ดํ„ฐ ๋ณด๋Š” ๊ฒƒ์€ cheating)
    • ์ง€๊ธˆ๊นŒ์ง€์˜ ์ž…๋ ฅ์„ ํ†ตํ•ด ํ‰๊ท /ํ‘œ์ค€ํŽธ์ฐจ ๊ณ„์‚ฐ

์ ์šฉ ์œ„์น˜

  • ๋ณดํ†ต ๋‘ ๊ฐ€์ง€ ๋ฐฉ์‹:
    1) Linear โ†’ Activation โ†’ BN
    2) Linear โ†’ BN โ†’ Activation
  • Dropout ๋Œ€์ฒด ๊ฐ€๋Šฅ

7. Dropout vs Batch Normalization

ํ•ญ๋ชฉDropoutBatch Normalization
๋ชฉ์ ๊ณผ์ ํ•ฉ ๋ฐฉ์ง€ํ•™์Šต ์•ˆ์ •ํ™” ๋ฐ ์ผ๋ฐ˜ํ™”
๋ฐฉ์‹๋‰ด๋Ÿฐ ๋ฌด์ž‘์œ„ ๋น„ํ™œ์„ฑํ™”์ž…๋ ฅ ์ •๊ทœํ™” + ์žฌ์Šค์ผ€์ผ๋ง
์žฅ์ Generalization ํ–ฅ์ƒ๋น ๋ฅธ ์ˆ˜๋ ด, ์„ฑ๋Šฅ ํ–ฅ์ƒ
๋‹จ์ ํ•™์Šต ๋А๋ฆผ, p ์„ค์ • ํ•„์š”์ถ”๊ฐ€ ๊ณ„์‚ฐ ํ•„์š”
Mode ์ „ํ™˜ํ•„์š” (train/eval)ํ•„์š” (train/eval)

model.eval()์„ ํ˜ธ์ถœํ•˜๋ฉด Dropout๊ณผ Batch Normalization์ด ์ž๋™์œผ๋กœ "inference ๋ชจ๋“œ"๋กœ ์ „ํ™˜
