🎲[AI] Hyper-parameter

mandu · May 6, 2025


This post is based on the FastCampus lecture '[skill-up] 처음부터 시작하는 딥러닝 유치원' (Deep Learning Kindergarten, Starting from Scratch),
with additional material I studied on my own.


1. What Is a Hyper-parameter?

  • ๋ฐ์ดํ„ฐ๋กœ๋ถ€ํ„ฐ ์ž๋™์œผ๋กœ ํ•™์Šต๋˜์ง€ ์•Š์ง€๋งŒ ๋ชจ๋ธ ์„ฑ๋Šฅ์— ์˜ํ–ฅ์„ ์ฃผ๋Š” ์„ค์ •๊ฐ’์œผ๋กœ,
    ์‚ฌ์šฉ์ž๊ฐ€ ์ง์ ‘ ์„ค์ •ํ•˜๋ฉฐ, ์ฃผ๋กœ ์‹คํ—˜์  ํŠœ๋‹์ด ํ•„์š”

    ↔ Model Parameter (== Weight Parameter): a value inside the model that is updated by training on the data
  • e.g.
    • Learning rate
    • Network depth and width
    • Activation function type (ReLU, Leaky ReLU, the negative slope of Leaky ReLU, ...)
    • Initialization method (which random distribution to draw the initial weights from?)
    • Number of epochs, mini-batch size
    • etc.
  • Good hyper-parameter values have to be searched for with methods such as Grid Search or Bayesian Optimization
  • However, many hyper-parameters often change performance only slightly
  • So it is important to know which hyper-parameters are critical and to focus tuning on those
  • Unavoidably, this is an area where hands-on know-how matters
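To make the distinction concrete, here is a minimal sketch, assuming a toy 1-D linear regression trained with plain gradient descent: `lr` and `n_epochs` are hyper-parameters fixed from outside, while `w` and `b` are model parameters learned from the data. A simple grid search then tries every combination and keeps the one with the lowest loss. The data, grid values, and function names are all hypothetical; in a real project you would compare configurations on a validation set rather than on the training loss used here.

```python
import itertools

# Hypothetical toy data generated from y = 2x + 1 (noise-free, for illustration only)
XS = [0.0, 1.0, 2.0, 3.0, 4.0]
YS = [2 * x + 1 for x in XS]


def train(lr: float, n_epochs: int) -> tuple[float, float, float]:
    """Fit y = w*x + b with full-batch gradient descent on mean squared error.

    lr and n_epochs are hyper-parameters (chosen by us);
    w and b are model parameters (learned from the data).
    """
    w, b = 0.0, 0.0
    n = len(XS)
    for _ in range(n_epochs):
        # Gradients of the MSE with respect to w and b
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(XS, YS)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(XS, YS)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    loss = sum((w * x + b - y) ** 2 for x, y in zip(XS, YS)) / n
    return w, b, loss


def grid_search(lrs, epoch_options):
    """Try every (lr, n_epochs) pair and keep the one with the lowest loss."""
    best = None
    for lr, n_epochs in itertools.product(lrs, epoch_options):
        w, b, loss = train(lr, n_epochs)
        if best is None or loss < best["loss"]:
            best = {"lr": lr, "n_epochs": n_epochs, "w": w, "b": b, "loss": loss}
    return best


best = grid_search(lrs=[0.001, 0.01, 0.05], epoch_options=[10, 100, 1000])
print(best)
```

Grid search cost grows multiplicatively with each added hyper-parameter (3 × 3 = 9 training runs here), which is one reason to tune only the critical ones.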

Bayesian Optimization

  • Bayesian Optimization is a method for optimizing an expensive function f(x) with as few evaluations as possible
    (in hyper-parameter tuning, x is a hyper-parameter configuration and f(x) is, e.g., the validation loss after training with it)
  • How it works
    1. Build a surrogate model
      We cannot know the true function f(x) we want to optimize directly,
      so we approximate it with a probabilistic model (e.g., a Gaussian Process).
      For each x, this model predicts the value of f(x) together with its uncertainty.
    2. Compute the acquisition function
      A function that decides where to evaluate next.
      e.g., Expected Improvement (EI), Upper Confidence Bound (UCB)
      Principle: balance points that are likely to be good (exploitation) against points that are still highly uncertain (exploration).
    3. Evaluate & update
      Compute the true f(x) at the chosen x and update the surrogate model with the new result.
      Repeat → the search gradually approaches the optimum.
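The three steps can be sketched in pure Python for a one-dimensional search space. This is a minimal illustration under several assumptions, not a production implementation: the "expensive" function is a toy quadratic standing in for, e.g., validation loss versus one hyper-parameter; the surrogate is a Gaussian Process with an RBF kernel whose length scale (0.2) is fixed by hand instead of being fitted; the acquisition function is Expected Improvement for minimization; and the search space is a grid of 101 candidate points in [0, 1]. All function names are hypothetical.

```python
import math

# Toy stand-in for an expensive black-box function (e.g. validation loss
# as a function of one hyper-parameter). Its minimum is at x = 0.3.
def objective(x: float) -> float:
    return (x - 0.3) ** 2

def rbf(a: float, b: float, length: float) -> float:
    """Squared-exponential (RBF) kernel with unit prior variance."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def cholesky(a):
    """Lower-triangular L with a = L L^T (a must be symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L

def solve_lower(L, b):
    """Solve L y = b by forward substitution."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def solve_upper_t(L, y):
    """Solve L^T x = y by back substitution."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def fit_gp(xs, ys, length=0.2, jitter=1e-6):
    """Step 1: fit the GP surrogate; returns predict(x) -> (mean, std)."""
    n = len(xs)
    K = [[rbf(xs[i], xs[j], length) + (jitter if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    L = cholesky(K)
    y_mean = sum(ys) / n  # constant prior mean = mean of the observations
    alpha = solve_upper_t(L, solve_lower(L, [y - y_mean for y in ys]))

    def predict(x):
        k_star = [rbf(x, xi, length) for xi in xs]
        mu = y_mean + sum(k * a for k, a in zip(k_star, alpha))
        v = solve_lower(L, k_star)
        var = max(1.0 - sum(vi * vi for vi in v), 1e-12)  # posterior variance
        return mu, math.sqrt(var)

    return predict

def expected_improvement(mu, sigma, best_y):
    """Step 2: EI for minimization: (best_y - mu) * Phi(z) + sigma * phi(z)."""
    imp = best_y - mu
    z = imp / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return imp * cdf + sigma * pdf

def bayes_opt(f, init_xs, n_iter=10):
    candidates = [i / 100 for i in range(101)]  # search space: [0, 1], step 0.01
    xs = list(init_xs)
    ys = [f(x) for x in xs]
    for _ in range(n_iter):
        predict = fit_gp(xs, ys)
        best_y = min(ys)
        next_x, best_ei = None, -1.0
        for c in candidates:
            if c in xs:
                continue  # skip points that were already evaluated
            ei = expected_improvement(*predict(c), best_y)
            if ei > best_ei:
                next_x, best_ei = c, ei
        # Step 3: evaluate the true (expensive) function and add the result
        xs.append(next_x)
        ys.append(f(next_x))
    i = ys.index(min(ys))
    return xs[i], ys[i]

best_x, best_y = bayes_opt(objective, init_xs=[0.0, 1.0], n_iter=10)
print(f"best x = {best_x:.2f}, f(x) = {best_y:.4f}")
```

Only 12 evaluations of `objective` are made in total, which is the point of the method when each evaluation means a full training run. Because the kernel's prior variance (1) is large compared with the objective's values here, this setup tends to favor exploration.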

2. TIP: How to Experiment Efficiently

  • Start from a baseline: begin with the simplest possible model!
  • From there, grow the model step by step while systematically tracking and organizing the many experiments and their results
  • e.g., put the hyper-parameters in the model file name:
    model.n_layers-10.n_epochs-100.act-leaky_relu.loss-xxx.accuracy-xx.pth
  • Naturally, organizing the results in a table makes them even easier to compare
  • Recommended experiment-management tools:
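The file-naming tip above can be sketched as a few helpers: one writes hyper-parameters and results into a `key-value` file name, one parses the name back, and one prints the parsed runs as a plain-text table. The helper names and the numbers in the usage lines are hypothetical, and the parser assumes keys consist only of letters and underscores. It uses a regex with a lookahead so that float values such as `0.0123`, whose dots would otherwise clash with the `.` separator, are read back intact.

```python
import re


def make_filename(hparams, metrics, prefix="model", ext="pth"):
    """Join 'key-value' fields with dots (hypothetical helper)."""
    fields = [f"{k}-{v}" for k, v in {**hparams, **metrics}.items()]
    return ".".join([prefix, *fields, ext])


def parse_filename(name, ext="pth"):
    """Read the 'key-value' fields back as strings (hypothetical helper).

    A value runs until the next '.key-' or the final '.ext', so floats such as
    0.0123 survive even though they contain the '.' separator.
    Assumes keys consist only of letters and underscores.
    """
    pattern = rf"([A-Za-z_]+)-(.+?)(?=\.[A-Za-z_]+-|\.{re.escape(ext)}$)"
    return dict(re.findall(pattern, name))


def print_table(names, ext="pth"):
    """Print one row per run and one column per hyper-parameter or metric."""
    rows = [parse_filename(n, ext) for n in names]
    cols = list(rows[0])
    widths = [max(len(c), *(len(r.get(c, "")) for r in rows)) for c in cols]
    print(" | ".join(c.ljust(w) for c, w in zip(cols, widths)))
    print("-+-".join("-" * w for w in widths))
    for r in rows:
        print(" | ".join(r.get(c, "").ljust(w) for c, w in zip(cols, widths)))


# Usage with made-up numbers (not real results)
name = make_filename(
    {"n_layers": 10, "n_epochs": 100, "act": "leaky_relu"},
    {"loss": 0.0123, "accuracy": 0.9876},
)
print(name)  # model.n_layers-10.n_epochs-100.act-leaky_relu.loss-0.0123.accuracy-0.9876.pth
print(parse_filename(name))

other = make_filename(
    {"n_layers": 5, "n_epochs": 100, "act": "relu"},
    {"loss": 0.0456, "accuracy": 0.9512},
)
print_table([name, other])
```

Parsed values come back as strings; convert them with `int` or `float` before sorting runs numerically.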
