๐ŸŽฒ[AI] Back-propagation

manduยท2025๋…„ 5์›” 5์ผ


ํ•ด๋‹น ๊ธ€์€ FastCampus - '[skill-up] ์ฒ˜์Œ๋ถ€ํ„ฐ ์‹œ์ž‘ํ•˜๋Š” ๋”ฅ๋Ÿฌ๋‹ ์œ ์น˜์› ๊ฐ•์˜๋ฅผ ๋“ฃ๊ณ ,
์ถ”๊ฐ€ ํ•™์Šตํ•œ ๋‚ด์šฉ์„ ๋ง๋ถ™์—ฌ ์ž‘์„ฑํ•˜์˜€์Šต๋‹ˆ๋‹ค.


1. ๋ฌธ์ œ์ : Layer๊ฐ€ ๋งŽ์•„์กŒ์„ ๋•Œ๋Š”?

$$
Loss = \frac{1}{N}\sum_{i=1}^N (y_i - \hat{y}_i)^2 \\
\hat{y} = f_3 \circ f_2 \circ f_1 \\
h_1 = f_1(x) = \sigma_1(x \cdot W_1 + b_1) \\
h_2 = f_2(h_1) = \sigma_2(h_1 \cdot W_2 + b_2) \\
\hat{y} = f_3(h_2) = \sigma_3(h_2 \cdot W_3 + b_3)
$$
  • ๋”ฅ๋Ÿฌ๋‹ ๋ชจ๋ธ์€ ์—ฌ๋Ÿฌ ์ธต(layer)์œผ๋กœ ๊ตฌ์„ฑ๋˜๋ฉฐ, ๊ฐ ์ธต์˜ ํŒŒ๋ผ๋ฏธํ„ฐ๊ฐ€ ๋งŽ์Œ
  • ๋ชจ๋“  ํŒŒ๋ผ๋ฏธํ„ฐ์— ๋Œ€ํ•ด ๋งค๋ฒˆ Loss๋ฅผ ์ง์ ‘ ๋ฏธ๋ถ„ํ•˜๋ฉด ๋งค์šฐ ๋ณต์žกํ•˜๋ฏ€๋กœ ๋น„ํšจ์œจ์ 
  • ์ค‘๋ณต๋œ ์—ฐ์‚ฐ์ด ๋งŽ์•„ ๊ณ„์‚ฐ ๋น„์šฉ์ด ํผ

2. ํ•ด๊ฒฐ์ฑ…: Back-propagation

2.1 ๊ฐœ๋…

  • ์‹ฌ์ธต ์‹ ๊ฒฝ๋ง์€ ํ•ฉ์„ฑํ•จ์ˆ˜๋กœ ํ‘œํ˜„๋จ
  • Chain Rule(์—ฐ์‡„ ๋ฒ•์น™)์„ ํ™œ์šฉํ•˜์—ฌ ๋ณต์žกํ•œ ํ•จ์ˆ˜์˜ ๋ฏธ๋ถ„์„ ๊ฐ ๊ตฌ์„ฑ ํ•จ์ˆ˜๋ณ„๋กœ ๋‚˜๋ˆ ์„œ ์ฒ˜๋ฆฌ
  • ์ฆ‰, ๊ธฐ์กด์— ๊ณ„์‚ฐํ•œ ๊ฐ ํ•จ์ˆ˜๋ณ„ ๋ฏธ๋ถ„๊ฐ’๋“ค์„ ์žฌํ™œ์šฉํ•˜์—ฌ ๋ฏธ๋ถ„ ๊ณผ์ •์„ ํšจ์œจ์ ์œผ๋กœ ๋งŒ๋“œ๋Š” ๊ฒƒ
  • ๋’ค์ชฝ์œผ๋กœ ํผ์ ธ๋‚˜๊ฐ€๋Š” ๊ฒƒ์ฒ˜๋Ÿผ ๋ณด์ด๊ธฐ ๋•Œ๋ฌธ์— back-propagation์ด๋ผ๊ณ  ํ•จ

2.2 ์›๋ฆฌ

  • ์‹ ๊ฒฝ๋ง์€ ํ•ฉ์„ฑ ํ•จ์ˆ˜์˜ ๊ตฌ์กฐ:
    f(x) = f3(f2(f1(x)))

  • Chain Rule์„ ์ ์šฉํ•˜๋ฉด ๋‹ค์Œ๊ณผ ๊ฐ™์ด ๋ถ„ํ•ด:
    โˆ‚L/โˆ‚x = โˆ‚L/โˆ‚f3 * โˆ‚f3/โˆ‚f2 * โˆ‚f2/โˆ‚f1 *โˆ‚f1/โˆ‚x

2.3 ๊ณ„์‚ฐ ์ ˆ์ฐจ

  1. Compute the output and the Loss with a feed-forward pass
  2. Use back-propagation to compute each layer's gradient, moving backward from the output layer (differentiation)
  3. Update the parameters with Gradient Descent
  • Of course, in PyTorch the backward() function takes care of all of this..!
