DNN (Deep Neural Networks)

์ฐฝ์Šˆ · April 15, 2025

Deep Learning

๋ชฉ๋ก ๋ณด๊ธฐ
15/16
post-thumbnail

์‹ฌ์ธต ์‹ ๊ฒฝ๋ง (Deep Neural Networks)

์‹ฌ์ธต ์‹ ๊ฒฝ๋ง(DNN)์€ ๋‹ค์ธต ํผ์…‰ํŠธ๋ก (MLP: Multi-Layer Perceptron)์˜ ์€๋‹‰์ธต์„ ์—ฌ๋Ÿฌ ๊ฐœ ์‚ฌ์šฉํ•œ ํ˜•ํƒœ์ด๋‹ค. ์ฆ‰, ๋‹จ์ผ ์€๋‹‰์ธต์„ ๋„˜์–ด์„œ ๊นŠ์€ ๊ตฌ์กฐ๋ฅผ ๊ฐ€์ง„ ์‹ ๊ฒฝ๋ง์„ ์˜๋ฏธํ•œ๋‹ค.

  • Structure: input layer → multiple hidden layers → output layer (a minimal sketch follows this list)
  • An MLP has a single hidden layer, whereas a DNN has multiple hidden layers.
  • MLP and DNN share the same basic learning algorithms (e.g., backpropagation, gradient descent).
  • Recently, deep learning has been applied to computer vision, speech recognition, natural language processing, social-network filtering, machine translation, and more, producing results that rival human experts.
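
A minimal Keras sketch of this input → hidden → output stacking (the layer widths, input size, and binary output here are illustrative assumptions, not from the original):

import tensorflow as tf

# input (4 features) → two hidden layers → one output node;
# stacking additional Dense hidden layers is what turns an MLP into a DNN
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation='relu', input_shape=(4,)),  # hidden layer 1
    tf.keras.layers.Dense(16, activation='relu'),                    # hidden layer 2
    tf.keras.layers.Dense(1, activation='sigmoid')                   # output layer
])
model.compile(optimizer='adam', loss='binary_crossentropy')  # same backprop + gradient descent as an MLP
model.summary()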


🤨 Fixing MLP's Problems

  • ์€๋‹‰์ธต์ด ๊นŠ์–ด์งˆ์ˆ˜๋ก ๊ทธ๋ž˜๋””์–ธํŠธ ์†Œ์‹ค ๋ฌธ์ œ๊ฐ€ ๋ฐœ์ƒํ•œ๋‹ค.
    โžก๏ธ ReLU์™€ BatchNorm์œผ๋กœ ํ•ด๊ฒฐ

  • ํ›ˆ๋ จ ๋ฐ์ดํ„ฐ๊ฐ€ ๋ถ€์กฑํ•˜๋ฉด ๊ณผ์ ํ•ฉ์ด ๋ฐœ์ƒํ•˜์˜€๋‹ค.
    โžก๏ธ ์ •๊ทœํ™”์™€ Dropout ๋“ฑ์œผ๋กœ ํ•ด๊ฒฐ

  • 2012๋…„ AlexNet์ด ImageNet ๋Œ€ํšŒ์—์„œ ์šฐ์Šนํ•˜๋ฉฐ, ๋”ฅ๋Ÿฌ๋‹ ํ˜๋ช…์˜ ์‹œ์ž‘์ ์ด ๋˜์—ˆ๋‹ค.

🚑 Help from GPUs

  • DNN์˜ ํ•™์Šต ์†๋„๋Š” ์ƒ๋‹นํžˆ ๋А๋ฆฌ๊ณ  ๊ณ„์‚ฐ ์ง‘์•ฝ์ ์ด๊ธฐ ๋•Œ๋ฌธ์— ํ•™์Šต์— ์‹œ๊ฐ„๊ณผ ์ž์›์ด ๋งŽ์ด ์†Œ๋ชจ๋˜์—ˆ๋‹ค.

  • ๊ฒŒ์ด๋จธ๋“ค์˜ ์˜ํ–ฅ์œผ๋กœ GPU ๊ธฐ์ˆ ์ด ๋ฐœ์ „ํ•˜๋ฉด์„œ GPU์˜ ๋ฐ์ดํ„ฐ ์ฒ˜๋ฆฌ ๊ธฐ๋Šฅ์„ ๋”ฅ๋Ÿฌ๋‹์ด ์‚ฌ์šฉํ•˜๊ฒŒ ๋˜๋Š” ์˜ํ–ฅ์ด ํฌ๊ฒŒ ์ž‘์šฉํ–ˆ๋‹ค.


๐Ÿ”— ์€๋‹‰์ธต์˜ ์—ญํ• 

  • ์—ฌ๋Ÿฌ ๊ฐœ์˜ ์€๋‹‰์ธต ์ค‘์—์„œ ์•ž๋‹จ์€ ๊ฒฝ๊ณ„์„ (์—์ง€)๊ณผ ๊ฐ™์€ ์ €๊ธ‰ ํŠน์ง•๋“ค์„ ์ถ”์ถœํ•˜๊ณ  ๋’ท๋‹จ์€ ์ฝ”๋„ˆ์™€ ๊ฐ™์€ ๊ณ ๊ธ‰ ํŠน์ง•๋“ค์„ ์ถ”์ถœํ•œ๋‹ค.


๐Ÿ”— MLP vs DNN

๊ธฐ์กด ์‹ ๊ฒฝ๋ง์˜ ๋ฌธ์ œ

  • Gradient vanishing problem
  • Loss function selection problem
  • Weight initialization problem
  • Categorical data problem
  • Data normalization problem
  • Overfitting problem


๊ธฐ์กด ์‹ ๊ฒฝ๋ง์˜ ๋ฌธ์ œ๋“ค

✅ Gradient vanishing problem

  • ์‹ฌ์ธต ์‹ ๊ฒฝ๋ง์—์„œ ๊ทธ๋ž˜๋””์–ธํŠธ๊ฐ€ ์ „๋‹ฌ๋˜๋‹ค๊ฐ€ ์ ์  0์— ๊ฐ€๊นŒ์›Œ์ง€๋Š” ํ˜„์ƒ์œผ๋กœ, ์ถœ๋ ฅ์ธต์—์„œ ๋ฉ€๋ฆฌ ๋–จ์–ด์ง„ ๊ฐ€์ค‘์น˜๋“ค์€ ํ•™์Šต์ด ๋˜์ง€ ์•Š๋Š”๋‹ค.
  • ์‹ ๊ฒฝ๋ง์ด ๋„ˆ๋ฌด ๊นŠ๊ธฐ ๋•Œ๋ฌธ์— ๋‚จ์•„์žˆ๋Š” ๊ทธ๋ž˜๋””์–ธํŠธ๊ฐ€ ๊ฑฐ์˜ ์—†๋Š” ํ˜„์ƒ์ด ๋ฐœ์ƒํ•œ๋‹ค.

Cause

์‹œ๊ทธ๋ชจ์ด๋“œ(sigmoid) ํ™œ์„ฑํ™” ํ•จ์ˆ˜๊ฐ€ ๊ทธ ์›์ธ์ด ๋œ๋‹ค.

โžก๏ธ ๊ทธ๋ž˜๋””์–ธํŠธ๋Š” ์ ‘์„ ์˜ ๊ธฐ์šธ๊ธฐ, ์•ฝ๊ฐ„ ํฐ ์–‘์ˆ˜๋‚˜ ์Œ์ˆ˜๊ฐ€ ์‹œ๊ทธ๋ชจ์ด๋“œ ํ•จ์ˆ˜์— ๋“ค์–ด์˜ค๋ฉด ๊ธฐ์šธ๊ธฐ๊ฐ€ ๊ฑฐ์˜ 0์ด ๋จ
โžก๏ธ ์‹œ๊ทธ๋ชจ์ด๋“œ ํ•จ์ˆ˜์˜ ๋ฏธ๋ถ„๊ฐ’์€ ํ•ญ์ƒ 0์—์„œ 1์‚ฌ์ด, 1๋ณด๋‹ค ์ž‘์€ ๊ฐ’์ด ์—ฌ๋Ÿฌ๋ฒˆ ๊ณฑํ•ด์ง€๋ฉด ๊ฒฐ๊ตญ 0์œผ๋กœ ์ˆ˜๋ ดํ•จ

Solution

  • The ReLU function is widely used as the activation function instead of the sigmoid.
  • ReLU does not squash its input into the 0–1 range; its derivative is either 0 or 1, so the output-layer error is backpropagated without being attenuated.
  • It is not differentiable at 0, but this causes no problems in practice, and it has the advantage of being fast (a quick demonstration follows).
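
A quick numpy illustration of the contrast (the depth of 20 layers and the probe input x = 1.0 are assumptions for demonstration only): the sigmoid derivative is at most 0.25, so its product across many layers vanishes, while ReLU's derivative of 1 in the positive region leaves the gradient intact.

import numpy as np

def sigmoid_grad(x):
    s = 1 / (1 + np.exp(-x))
    return s * (1 - s)  # peaks at 0.25 when x = 0

x = 1.0
print(np.prod([sigmoid_grad(x)] * 20))  # ≈ 7.5e-15: the gradient has vanished
print(np.prod([1.0] * 20))              # ReLU derivative for x > 0: stays 1.0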


โœ… ์†์‹คํ•จ์ˆ˜ ์„ ํƒ ๋ฌธ์ œ (Loss function selection problem)

  • ๊ธฐ์กด ์†์‹ค ํ•จ์ˆ˜๋กœ๋Š” MSE(ํ‰๊ท ์ œ๊ณฑ์˜ค์ฐจ)๋ฅผ ์‚ฌ์šฉํ•˜์˜€๋‹ค.
    E=1mโˆ‘(yiโˆ’y^i)2E = \frac{1}{m} \sum \left( y_i - \hat{y}_i \right)^2 where mm: ์ถœ๋ ฅ ๋…ธ๋“œ์˜ ๊ฐœ์ˆ˜

  • Since MSE grows as the gap between the target and the prediction grows, it is a perfectly workable loss function, but for classification problems the better-performing cross-entropy loss is used far more often.

  • In that case, the softmax function is used as the output-layer activation.

Softmax activation function

  • Max ํ•จ์ˆ˜์˜ ์†Œํ”„ํŠธํ•œ ๋ฒ„์ „
  • Max ํ•จ์ˆ˜์˜ ์ถœ๋ ฅ์€ ์ „์ ์œผ๋กœ ์ตœ๋Œ€ ์ž…๋ ฅ ๊ฐ’์— ์˜ํ•˜์—ฌ ๊ฒฐ์ •๋จ
  • ํ•ฉ์ด 1 ์ด๋ฏ€๋กœ ํ™•๋ฅ ๊ฐ’์œผ๋กœ ์‚ฌ์šฉ ๊ฐ€๋Šฅ


๊ต์ฐจ ์—”ํŠธ๋กœํ”ผ ์†์‹ค ํ•จ์ˆ˜ (Cross Entropy Loss function)

2๊ฐœ์˜ ํ™•๋ฅ  ๋ถ„ํฌ p,qp, q์— ๋Œ€ํ•˜์—ฌ ๋‹ค์Œ๊ณผ ๊ฐ™์ด ์ •์˜๋œ๋‹ค.

H(p, q) = - \sum_x p(x) \log q(x)

โžก๏ธ ๋ชฉํ‘œ ์ถœ๋ ฅ ํ™•๋ฅ ๋ถ„ํฌ pp์™€ ์‹ค์ œ ์ถœ๋ ฅ ํ™•๋ฅ ๋ถ„ํฌ qq ๊ฐ„์˜ ๊ฑฐ๋ฆฌ๋ฅผ ์ธก์ •

`์ผ๋ฐ˜์ ์ธ ๊ณ„์‚ฐ ์‹`:
H(p,q)=โˆ’โˆ‘xp(x)logโกnq(x)=โˆ’(1.0โ‹…logโก0.7+0.0โ‹…logโก0.2+0.0โ‹…logโก0.1)=0.154901H(p, q) = -\sum_x p(x) \log_n q(x) \\ = -(1.0 \cdot \log 0.7 + 0.0 \cdot \log 0.2 + 0.0 \cdot \log 0.1) \\ = 0.154901

์™„๋ฒฝํ•˜๊ฒŒ ์ผ์น˜ํ•œ๋‹ค๋ฉด:

H(p,q)=โˆ’โˆ‘xp(x)logโกnq(x)=โˆ’(1.0โ‹…logโก1.0+0.0โ‹…logโก0.0+0.0โ‹…logโก0.0)=0H(p, q) = -\sum_x p(x) \log_n q(x) \\ = -(1.0 \cdot \log 1.0 + 0.0 \cdot \log 0.0 + 0.0 \cdot \log 0.0) \\ = 0

๊ต์ฐจ ์—”ํŠธ๋กœํ”ผ ๊ณ„์‚ฐ

์ถœ๋ ฅ ๋…ธ๋“œ๊ฐ€ 3๊ฐœ ์žˆ๋Š” ์‹ ๊ฒฝ๋ง์—์„œ ๊ต์ฐจ ์—”ํŠธ๋กœํ”ผ ๊ณ„์‚ฐ (๊ฐ ์ƒ˜ํ”Œ ๋ณ„ ๊ณ„์‚ฐ)

  • First sample

    H(p, q) = -\left( \log(0.1) \cdot 0 + \log(0.3) \cdot 0 + \log(0.6) \cdot 1 \right) = -\log(0.6) \approx 0.51
  • Second sample

    H(p, q) = -\left( \log(0.2) \cdot 0 + \log(0.6) \cdot 1 + \log(0.2) \cdot 0 \right) = -\log(0.6) \approx 0.51
  • Third sample

    H(p, q) = -\left( \log(0.3) \cdot 1 + \log(0.4) \cdot 0 + \log(0.3) \cdot 0 \right) = -\log(0.3) \approx 1.20
  • Average cross-entropy over the 3 samples (numpy check below): \frac{0.51 + 0.51 + 1.20}{3} \approx 0.74


Mean squared error vs cross-entropy

์ถœ๋ ฅ ์œ ๋‹›์ด ํ•˜๋‚˜์ธ ๊ฒฝ์šฐ

  • Mean squared error (MSE)

    E = (y - \hat{y})^2
  • Binary cross-entropy (BCE)

    E = -\left[ y \log (\hat{y}) + (1 - y) \log (1 - \hat{y}) \right]
  • MSE

    • When the target is 0: E = \hat{y}^2

    • When the target is 1: E = (1 - \hat{y})^2

  • BCE

    • When the target is 0: E = -\log (1 - \hat{y})

    • When the target is 1: E = -\log (\hat{y})

๊ต์ฐจ ์—”ํŠธ๋กœํ”ผ์˜ ์˜ค์ฐจ๊ฐ€ MSE์˜ ์˜ค์ฐจ๋ณด๋‹ค ํผ์„ ์•Œ์ˆ˜ ์žˆ๋‹ค.
๋ฏธ๋ถ„ ๊ฐ’์€ ๋” ์ฐจ์ด๊ฐ€ ๋‚˜๋ฉฐ ์ด ๋ฏธ๋ถ„๊ฐ’์ด ๊ณฑํ•ด์ ธ์„œ ๊ฐ€์ค‘์น˜๊ฐ€ ๋ณ€๊ฒฝ๋จ์œผ๋กœ ์ด์ง„ ๋ถ„๋ฅ˜๋ฌธ์ œ์—์„œ MSE๋ณด๋‹ค ๊ต์ฐจ ์—”ํŠธ๋กœํ”ผ๋ฅผ ์‚ฌ์šฉํ•˜๋Š” ๊ฒƒ์ด ํ›จ์”ฌ ์œ ๋ฆฌ


Keras์—์„œ์˜ ์†์‹คํ•จ์ˆ˜

BinaryCrossentropy (BCE)

➡️ Used for binary classification problems (e.g., classifying an image as "dog" vs "not a dog")

\mathrm{BCE} = -\frac{1}{n} \sum_{i=1}^{n} \left( y_i \log (\hat{y}_i) + (1 - y_i) \log (1 - \hat{y}_i) \right)

\begin{aligned} \text{Sample 1:} \quad \mathrm{BCE}_1 &= -\left( 1 \cdot \log(0.8) + (1 - 1) \cdot \log(1 - 0.8) \right) \\ \text{Sample 2:} \quad \mathrm{BCE}_2 &= -\left( 0 \cdot \log(0.3) + (1 - 0) \cdot \log(1 - 0.3) \right) \\ \text{Sample 3:} \quad \mathrm{BCE}_3 &= -\left( 0 \cdot \log(0.5) + (1 - 0) \cdot \log(1 - 0.5) \right) \\ \text{Sample 4:} \quad \mathrm{BCE}_4 &= -\left( 1 \cdot \log(0.9) + (1 - 1) \cdot \log(1 - 0.9) \right) \end{aligned}

\frac{\mathrm{BCE}_1 + \mathrm{BCE}_2 + \mathrm{BCE}_3 + \mathrm{BCE}_4}{4} \approx 0.345

import numpy as np
import tensorflow as tf

y_true = [[1], [0], [0], [1]]
y_pred = [[0.8], [0.3], [0.5], [0.9]]  # predicted probabilities for the 4 samples above
bce = tf.keras.losses.BinaryCrossentropy()
print(bce(y_true, y_pred).numpy())
# 0.34458154

CategoricalCrossentropy (CCE)

➡️ Used for multi-class classification problems (e.g., classifying an image as "dog" vs "cat" vs "tiger")
➡️ The targets are given as one-hot encodings.

y_true = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]  # cat, tiger, dog
y_pred = [[0.6, 0.3, 0.1], [0.3, 0.6, 0.1], [0.1, 0.7, 0.2]]
cce = tf.keras.losses.CategoricalCrossentropy()
print(cce(y_true, y_pred).numpy())
# 1.936381

SparseCategoricalCrossentropy (SCCE)

➡️ Used when the target labels are given as integers rather than one-hot encodings (e.g., classifying an image as "0 (dog)" vs "1 (cat)" vs "2 (tiger)")

y_true = np.array([1, 2, 0])  # cat, tiger, dog
y_pred = np.array([[0.6, 0.3, 0.1], [0.3, 0.6, 0.1], [0.1, 0.7, 0.2]])
scce = tf.keras.losses.SparseCategoricalCrossentropy()
print(scce(y_true, y_pred).numpy())
# 1.936381

MeanSquaredError

โžก๏ธ ํšŒ๊ท€๋ฌธ์ œ์—์„œ ์˜ˆ์ธก๊ฐ’๊ณผ ์‹ค์ œ๊ฐ’ ์‚ฌ์ด์˜ ํ‰๊ท ์ œ๊ณฑ์˜ค์ฐจ๋ฅผ ๊ณ„์‚ฐํ•  ๋•Œ ์‚ฌ์šฉํ•œ๋‹ค.

y_true = [12, 20, 29, 60]
y_pred = [14, 18, 27, 55]
mse = tf.keras.losses.MeanSquaredError()
print(mse(y_true, y_pred).numpy())
# 9.25

์‚ฌ์šฉ์ž ์ง€์ • ์†์‹คํ•จ์ˆ˜ ๋งŒ๋“ค๊ธฐ

โžก๏ธ ์‹ค์ œ๊ฐ’๊ณผ ์˜ˆ์ธก๊ฐ’์„ ๋งค๊ฐœ๋ณ€์ˆ˜๋กœ ์‚ฌ์šฉํ•˜๋Š” ํ•จ์ˆ˜๋ฅผ ์ •์˜ํ•˜์—ฌ ๋งŒ๋“ค ์ˆ˜ ์žˆ๋‹ค.
โžก๏ธ ๋ชจ๋ธ์˜ ์ปดํŒŒ์ผ ๋‹จ๊ณ„์—์„œ ํ•จ์ˆ˜๋ฅผ ์ „๋‹ฌํ•˜์—ฌ ์ž‘์„ฑํ•œ๋‹ค.

def custom_loss_function(y_true, y_pred):
	squared_difference = tf.square(y_true - y_pred)
	return tf.reduce_mean(squared_difference, axis=-1)

model.compile(optimizer='adam', loss=custom_loss_function)

✅ Weight initialization problem

  • If the weights are initialized to 0, backpropagation does not work properly.

  • If all the weights are equal, the nodes all end up playing the same role, so the symmetry must be broken (a demonstration follows the formula below).

    Problems when the weights are identical

  • For this reason, the weights must be initialized with random numbers.

  • Initial weights that are too large trigger exploding gradients, and training diverges.

\delta_j = \begin{cases} (out_j - t_j) f'(net_j), & \text{when } j \text{ is an output-layer node} \\ \left( \sum_k w_{jk} \delta_k \right) f'(net_j), & \text{when } j \text{ is a hidden-layer node} \end{cases}

์œ„ ์‹์—์„œ ๊ฐ€์ค‘์น˜๊ฐ€ 0์ด๋ผ๋ฉด ๋ธํƒ€๊ฐ€ ์ „๋‹ฌ๋˜์ง€ ์•Š๋Š”๋‹ค.

Weight initialization methods

➡️ Xavier's method

  • Proposed drawing the random numbers from a normal distribution with variance var(w_i) = \frac{1}{N_{in}}.
  • Here N_{in} is the number of connections coming into a unit and N_{out} is the number of connections leaving it.

โžก๏ธ He์˜ ๋ฐฉ๋ฒ•

  • ๋ถ„์‚ฐ var(wi)=2Nin+Noutvar(w_i)=\frac{2}{N_{in}+N_{out}} ์„ ๊ฐ€์ง€๋Š” ์ •๊ทœ๋ถ„ํฌ์—์„œ ๋‚œ์ˆ˜๋ฅผ ์ถ”์ถœํ•˜๋Š” ๊ฒƒ์„ ์ œ์•ˆํ•˜์˜€๋‹ค.

W = np.random.randn(N_in, N_out)*np.sqrt(1/N_in)  # Xavier's method
W = np.random.randn(N_in, N_out)*np.sqrt(2/N_in)  # He's method (N_in/N_out: the layer's fan-in/fan-out)

์ผ€๋ผ์Šค์—์„œ์˜ ๊ฐ€์ค‘์น˜ ์ดˆ๊ธฐํ™” ๋ฐฉ๋ฒ•

# import the layers and initializers modules
from tensorflow.keras import layers
from tensorflow.keras import initializers

# initialize the weights using initializers
layer = layers.Dense(units=64, kernel_initializer=initializers.RandomNormal(stddev=0.01),
bias_initializer=initializers.Zeros())

# pass them via string identifiers
layer = layers.Dense(units=64, kernel_initializer='random_normal', bias_initializer='zeros')

# RandomNormal class: an initializer that generates tensors from a normal distribution
initializer = tf.keras.initializers.RandomNormal(mean=0, stddev=1.)
layer = tf.keras.layers.Dense(3, kernel_initializer=initializer)

# RandomUniform class: an initializer that generates tensors from a uniform distribution
initializer = tf.keras.initializers.RandomUniform(minval=0, maxval=1.)
layer = tf.keras.layers.Dense(3, kernel_initializer=initializer)

✅ Categorical data problem

Input data very often contains categorical values such as "male" and "female".
➡️ They must be converted to numbers (a vectorized alternative follows the loop below)

for ix in train.index:
	if train.loc[ix, 'Sex']=="male":
		train.loc[ix, 'Sex']=1
	else:
		train.loc[ix, 'Sex']=0
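
The same conversion is usually written without an explicit loop; a vectorized sketch (assuming the same pandas DataFrame `train` with a 'Sex' column):

# vectorized alternative: the boolean comparison becomes 1/0 in one step
train['Sex'] = (train['Sex'] == 'male').astype(int)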

๐Ÿ” ๋ฒ”์ฃผํ˜• ๋ณ€์ˆ˜๋ฅผ ์ธ์ฝ”๋”ฉํ•˜๋Š” 3๊ฐ€์ง€ ๋ฐฉ๋ฒ•

  • ์ •์ˆ˜ ์ธ์ฝ”๋”ฉ(Integer Encoding): ๊ฐ ๋ ˆ์ด๋ธ”์ด ์ •์ˆ˜๋กœ ๋งคํ•‘๋˜๋Š” ๊ฒฝ์šฐ
  • ์›-ํ•ซ ์ธ์ฝ”๋”ฉ(One-Hot Encoding): ๊ฐ ๋ ˆ์ด๋ธ”์ด ์ด์ง„ ๋ฒกํ„ฐ์— ๋งคํ•‘๋˜๋Š” ๊ฒฝ์šฐ
  • ์ž„๋ฒ ๋”ฉ(Embedding): ๋ฒ”์ฃผ์˜ ๋ถ„์‚ฐ๋œ ํ‘œํ˜„์ด ํ•™์Šต๋˜๋Š” ๊ฒฝ์šฐ โ†’ ์ถ”ํ›„ ํฌ์ŠคํŒ…์— ๋‹ค๋ฃธ

์ •์ˆ˜ ์ธ์ฝ”๋”ฉ

  • sklearn ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๊ฐ€ ์ œ๊ณตํ•˜๋Š” Label Encoder ํด๋ž˜์Šค๋ฅผ ์‚ฌ์šฉ
import numpy as np
X = np.array([['Korea', 44, 7200], ['Japan', 27, 4800], ['China', 30, 6100]])

from sklearn.preprocessing import LabelEncoder
labelencoder = LabelEncoder()
X[:, 0] = labelencoder.fit_transform(X[:, 0])
print(X)
[['2' '44' '7200']
['1' '27' '4800']
['0' '30' '6100']]

onehot ์ธ์ฝ”๋”ฉ(sklearn ์‚ฌ์šฉ)

import numpy as np
X = np.array([['Korea', 38, 7200], ['Japan', 27, 4800], ['China', 30, 3100]])

from sklearn.preprocessing import OneHotEncoder
onehotencoder = OneHotEncoder()

# ์›ํ•˜๋Š” ์—ด์„ ๋ฝ‘์•„์„œ 2์ฐจ์› ๋ฐฐ์—ด๋กœ ๋งŒ๋“ค์–ด์„œ ์ „๋‹ฌํ•˜์—ฌ์•ผ ํ•œ๋‹ค.
XX = onehotencoder.fit_transform(X[:,0].reshape(-1,1)).toarray()
print(XX)

X = np.delete(X, [0], axis=1)  # delete column 0
X = np.concatenate((XX, X), axis=1)  # concatenate XX and X
print(X)
[[0. 0. 1.]
 [0. 1. 0.]
 [1. 0. 0.]]
[['0.0' '0.0' '1.0' '38' '7200']
 ['0.0' '1.0' '0.0' '27' '4800']
 ['1.0' '0.0' '0.0' '30' '3100']]

onehot ์ธ์ฝ”๋”ฉ(Keras ์‚ฌ์šฉ)

  • to_categorical()์„ ํ˜ธ์ถœํ•˜์—ฌ ๊ตฌํ˜„
class_vector =[2, 6, 6, 1]

from tensorflow.keras.utils import to_categorical
output = to_categorical(class_vector, num_classes = 7)
print(output)
[[0 0 1 0 0 0 0]
[0 0 0 0 0 0 1]
[0 0 0 0 0 0 1]
[0 1 0 0 0 0 0]]

โœ… ๋ฐ์ดํ„ฐ ์ •๊ทœํ™” ๋ฌธ์ œ (Data normalization problem)

  • ์‹ ๊ฒฝ๋ง์€ ์ž…๋ ฅ๋งˆ๋‹ค ๋‹ค๋ฅธ ๋ฒ”์œ„์˜ ๋งค๊ฐœ๋ณ€์ˆ˜๋ฅผ ํ•™์Šตํ•˜๋ฏ€๋กœ, ์ž…๋ ฅ ๊ฐ’์˜ ๋ฒ”์œ„๊ฐ€ ์ค‘์š”ํ•˜๋‹ค.

  • ๋ถ€๋™์†Œ์ˆ˜์  ์ •๋ฐ€๋„ ๋ฌธ์ œ๋ฅผ ํ”ผํ•˜๋ ค๋ฉด ์ž…๋ ฅ์„ ๋Œ€๋žต -1.0 ~ 1.0 ๋ฒ”์œ„๋กœ ๋งž์ถ”๋Š” ๊ฒƒ์ด ์ข‹๋‹ค.

  • ์ผ๋ฐ˜์ ์œผ๋กœ ํ‰๊ท ์ด 0์ด ๋˜๋„๋ก ์ •๊ทœํ™”ํ•˜๊ณ , ๋ฒ”์œ„๋ฅผ ์ผ์ •ํ•˜๊ฒŒ ์กฐ์ •ํ•œ๋‹ค.

xjโ€ฒ=xjโˆ’ฮผjฯƒjx_j' = \frac{x_j - \mu_j}{\sigma_j}

Normalization example

➡️ Suppose we want to build a neural network that predicts which type of car a person prefers (sedan or SUV) from their age, gender, and annual income.

| # | Age | Gender | Income | Car |
|---|-----|--------|--------|-----|
| 0 | 30 | male | 3800 | SUV |
| 1 | 36 | female | 4200 | SEDAN |
| 2 | 52 | male | 4000 | SUV |
| 3 | 42 | female | 4400 | SEDAN |

Age and income need normalization, gender is categorical data, and the car type gets one-hot encoded:

| # | Age | Gender | Income | Car (one-hot) |
|---|-----|--------|--------|---------------|
| 0 | -1.23 | -1.0 | -1.34 | (1.0 0.0) |
| 1 | -0.49 | 1.0 | 0.45 | (0.0 1.0) |
| 2 | 1.48 | -1.0 | -0.45 | (1.0 0.0) |
| 3 | 0.25 | 1.0 | 1.34 | (0.0 1.0) |
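
The age and income columns above can be reproduced with sklearn's StandardScaler, which implements exactly the z-score formula shown earlier (a sketch using the numbers from the table):

import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[30, 3800], [36, 4200], [52, 4000], [42, 4400]], dtype=float)
scaler = StandardScaler()              # (x - mean) / std, column by column
print(scaler.fit_transform(X).round(2))
# [[-1.23 -1.34]
#  [-0.49  0.45]
#  [ 1.48 -0.45]
#  [ 0.25  1.34]]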

๋ฐ์ดํ„ฐ ์ •๊ทœํ™” ๋ฐฉ๋ฒ• (sklearn ์‚ฌ์šฉ)

sklearn์˜ MinmaxScaler ํด๋ž˜์Šค๋Š” ๋‹ค์Œ์˜ Numpy ์ˆ˜์‹์„ ์‚ฌ์šฉํ•˜์—ฌ ์ •๊ทœํ™”ํ•œ๋‹ค.

from sklearn.preprocessing import MinMaxScaler
data = [[-1, 2], [-0.5, 6], [0, 10], [1, 18]]

scaler = MinMaxScaler()
scaler.fit(data) # ์ตœ๋Œ€๊ฐ’๊ณผ ์ตœ์†Œ๊ฐ’์„ ์•Œ์•„๋‚ธ๋‹ค.
print(scaler.transform(data)) # ๋ฐ์ดํ„ฐ๋ฅผ ๋ณ€ํ™˜ํ•œ๋‹ค.
[[0.   0.  ]
 [0.25 0.25]
 [0.5  0.5 ]
 [1.   1.  ]]

๋ฐ์ดํ„ฐ ์ •๊ทœํ™” ๋ฐฉ๋ฒ• (Keras ์‚ฌ์šฉ)

๋ฐ์ดํ„ฐ ์ •๊ทœํ™”๊ฐ€ ํ•„์š”ํ•˜๋ฉด ์ผ€๋ผ์Šค์˜ Normalization ๋ ˆ์ด์–ด๋ฅผ ์ค‘๊ฐ„์— ๋„ฃ์œผ๋ฉด ๋œ๋‹ค.

  • 1๋‹จ๊ณ„: 0~1 ๋ฒ”์œ„๋กœ ํ‘œ์ค€ํ™”
Xstd=Xโˆ’Xminโก(axis=0)Xmaxโก(axis=0)โˆ’Xminโก(axis=0)X_{\text{std}} = \frac{X - X_{\min(axis=0)}}{X_{\max(axis=0)} - X_{\min(axis=0)}}
  • 2๋‹จ๊ณ„: ์›ํ•˜๋Š” ๋ฒ”์œ„ [min, max]๋กœ ์Šค์ผ€์ผ๋ง
Xscaled=Xstdโ‹…(maxโˆ’min)+minX_{\text{scaled}} = X_{\text{std}} \cdot (\text{max} - \text{min}) + \text{min}
tf.keras.preprocessing.Normalization(
	axis=-1, dtype=None, mean=None, variance=None, **kwargs
)
# axis: ์œ ์ง€ํ•ด์•ผ ํ•˜๋Š” ์ถ•, mean: ์ •๊ทœํ™” ์ค‘ ์‚ฌ์šฉํ•  ํ‰๊ท ๊ฐ’, variance=์ •๊ทœํ™” ์ค‘ ์‚ฌ์šฉํ•  ๋ถ„์‚ฐ๊ฐ’
# ์ด ๋ ˆ์ด์–ด๋Š” ์ž…๋ ฅ์„ ํ‰๊ท ์ด 0์ด๊ณ  ํ‘œ์ค€ํŽธ์ฐจ๊ฐ€ 1์ธ ๋ถ„ํฌ๋กœ ์ •๊ทœํ™”์‹œํ‚ด
import tensorflow as tf
import numpy as np

adapt_data = np.array([[1.], [2.], [3.], [4.], [5.]], dtype=np.float32)
input_data = np.array([[1.], [2.], [3.]], np.float32)
layer = tf.keras.layers.Normalization()
layer.adapt(adapt_data)
layer(input_data)

#<tf.Tensor: shape=(3, 1), dtype=float32, numpy=
#array([[-1.4142135 ],
# [-0.70710677],
# [ 0. ]], dtype=float32)>

โœ… ๊ณผ์ž‰ ์ ํ•ฉ ๋ฌธ์ œ (Overfitting problem)

  • ๊ณผ์ž‰ ์ ํ•ฉ(overfitting)์€ ์ง€๋‚˜์น˜๊ฒŒ ํ›ˆ๋ จ ๋ฐ์ดํ„ฐ์— ํŠนํ™”๋ผ ์‹ค์ œ ์ ์šฉ ์‹œ ์ข‹์ง€ ๋ชปํ•œ ๊ฒฐ๊ณผ๊ฐ€ ๋‚˜์˜ค๋Š” ๊ฒƒ์„
    ๋งํ•œ๋‹ค.
  • ๊ณผ์ž‰ ์ ํ•ฉ์€ ์‹ ๊ฒฝ๋ง์˜ ๋งค๊ฐœ๋ณ€์ˆ˜๊ฐ€ ๋งŽ์„ ๋•Œ ๋ฐœ์ƒํ•œ๋‹ค.
  • ํ›ˆ๋ จ ๋ฐ์ดํ„ฐ ์™ธ์— ๊ฒ€์ฆ๋ฐ์ดํ„ฐ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์†์‹ค(MSE)๋ฅผ ๊ณ„์‚ฐํ•ด ๋ณด๋ฉด U shape์˜ ํ•™์Šต ์ปค๋ธŒ๊ฐ€ ๋‚˜์˜จ๋‹ค.

โžก๏ธ ์ผ๋ฐ˜ํ™”์— ์‹คํŒจ

โžก๏ธ Variance-bias trade off โžก๏ธ `ํ›ˆ๋ จ ๋ฐ์ดํ„ฐ`์˜ ์†์‹ค๊ฐ’์€ ๊ณ„์† `๊ฐ์†Œ`ํ•˜์ง€๋งŒ `๊ฒ€์ฆ๋ฐ์ดํ„ฐ`์˜ ์†์‹ค๊ฐ’์€ `์ฆ๊ฐ€`ํ•˜๊ณ  ์žˆ์Œ

๊ณผ์ž‰ ์ ํ•ฉ์˜ ์˜ˆ

IMDB ์˜ํ™” ๋ฆฌ๋ทฐ ๋ฐ์ดํ„ฐ

import numpy as numpy
import tensorflow as tf
import matplotlib.pyplot as plt

# ๋ฐ์ดํ„ฐ ๋‹ค์šด๋กœ๋“œ (์ƒ์œ„ 1000๊ฐœ ๋‹จ์–ด๋ฅผ ์„ ํƒ)
(train_data, train_labels), (test_data, test_labels) = \
	tf.keras.datasets.imdb.load_data(num_words=1000)

# ์›-ํ•ซ ์ธ์ฝ”๋”ฉ์œผ๋กœ ๋ณ€ํ™˜ํ•˜๋Š” ํ•จ์ˆ˜
def one_hot_sequences(sequences, dimension=1000):
	results = numpy.zeros((len(sequences), dimension))
	for i, word_index in enumerate(sequences):
		results[i, word_index] = 1.
	return results

train_data = one_hot_sequences(train_data)
test_data = one_hot_sequences(test_data)
# build the network model
model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(16, activation='relu', input_shape=(1000,)))
model.add(tf.keras.layers.Dense(16, activation='relu'))
model.add(tf.keras.layers.Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# ์‹ ๊ฒฝ๋ง ํ›ˆ๋ จ, ๊ฒ€์ฆ ๋ฐ์ดํ„ฐ ์ „๋‹ฌ
history = model.fit(train_data,
					train_labels,
					epochs=20,
					batch_size=512,
					validation_data=(test_data, test_labels),
					verbose=2)
# plot the training loss and the validation loss
history_dict = history.history
loss_values = history_dict['loss'] # training-data loss
val_loss_values = history_dict['val_loss'] # validation-data loss
acc = history_dict['accuracy'] # accuracy
epochs = range(1, len(acc) + 1) # number of epochs

plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('Loss Plot')
plt.ylabel('loss')
plt.xlabel('epochs')
plt.legend(['train error', 'val error'], loc='upper left')
plt.show()

๊ณผ์ž‰ ์ ํ•ฉ ๋ฐฉ์ง€ ์ „๋žต

๊ฐ€์ค‘์น˜(weight)์˜ ๊ฐœ์ˆ˜๋ฅผ ์ค„์ด๊ฑฐ๋‚˜ ์ œํ•œํ•˜๊ณ  ํ›ˆ๋ จ ๋ฐ์ดํ„ฐ(traing data)์˜ ์–‘์„ ๋Š˜๋ฆฌ๋ฉด ๋œ๋‹ค.

  • ์กฐ๊ธฐ ์ข…๋ฃŒ(early stopping): ๊ฒ€์ฆ ์†์‹ค์ด ์ฆ๊ฐ€ํ•˜๋ฉด ํ›ˆ๋ จ์„ ์กฐ๊ธฐ์— ์ข…๋ฃŒํ•œ๋‹ค.
  • ๊ฐ€์ค‘์น˜ ๊ทœ์ œ ๋ฐฉ๋ฒ•(weight regularization): ๊ฐ€์ค‘์น˜์˜ ์ ˆ๋Œ€๊ฐ’์„ ์ œํ•œํ•œ๋‹ค.
  • ๋“œ๋กญ์•„์›ƒ ๋ฐฉ๋ฒ•(dropout): ๋ช‡ ๊ฐœ์˜ ๋‰ด๋Ÿฐ์„ ์‰ฌ๊ฒŒ ํ•œ๋‹ค.
  • ๋ฐ์ดํ„ฐ ์ฆ๊ฐ• ๋ฐฉ๋ฒ•(data augmentation): ๋ฐ์ดํ„ฐ๋ฅผ ๋งŽ์ด ๋งŒ๋“ ๋‹ค.

Early stopping

➡️ If the model learns the noise too eagerly, the validation loss can rise during training.
➡️ Training can be stopped whenever the validation loss appears to stop decreasing (Keras sketch below).

Weight regularization

➡️ It was discovered that when the weights are too large, the decision boundary becomes convoluted and overfitting occurs.

  • L1 regularization: \text{Loss} = \text{Cost} + \lambda \sum |W|
  • L2 regularization: \text{Loss} = \text{Cost} + \lambda \sum W^2

➡️ L1 regularization has the drawback of driving weights to exactly 0, so L2 regularization is used more often.

# build the network model
model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(16, kernel_regularizer=tf.keras.regularizers.l2(0.001), activation='relu', input_shape=(1000,)))
model.add(tf.keras.layers.Dense(16, kernel_regularizer=tf.keras.regularizers.l2(0.001), activation='relu'))
model.add(tf.keras.layers.Dense(1, activation='sigmoid'))

Dropout

➡️ Dropout randomly excludes a few nodes during the training process.
➡️ A rate between 0.2 and 0.5 is typically used.

# build the network model
model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(16, activation='relu', input_shape=(10000,)))
model.add(tf.keras.layers.Dropout(0.5))
model.add(tf.keras.layers.Dense(16, activation='relu'))
model.add(tf.keras.layers.Dropout(0.5))
model.add(tf.keras.layers.Dense(1, activation='sigmoid'))

๋ฐ์ดํ„ฐ ์ฆ๊ฐ• ๋ฐฉ๋ฒ• (data augmentation)

โžก๏ธ ์†Œ๋Ÿ‰์˜ ํ›ˆ๋ จ ๋ฐ์ดํ„ฐ์—์„œ ๋งŽ์€ ํ›ˆ๋ จ ๋ฐ์ดํ„ฐ๋ฅผ ๋ฝ‘์•„๋‚ด๋Š” ๋ฐฉ๋ฒ•์ด๋‹ค.
โžก๏ธ ์ด๋ฏธ์ง€๋ฅผ ์ขŒ์šฐ๋กœ ํ™•๋Œ€ํ•œ๋‹ค๊ฑฐ๋‚˜ ํšŒ์ „์‹œ์ผœ์„œ ๋ณ€ํ˜•๋œ ์ด๋ฏธ์ง€๋ฅผ ์ƒ์„ฑํ•˜์—ฌ ์ด๊ฒƒ์„ ์ƒˆ๋กœ์šด ํ›ˆ๋ จ ๋ฐ์ดํ„ฐ๋กœ ์‚ฌ์šฉํ•˜๋Š” ๋ฐฉ๋ฒ•์ด๋‹ค.

์•™์ƒ๋ธ” (ensemble)

โžก๏ธ ์—ฌ๋Ÿฌ ์ „๋ฌธ๊ฐ€๋ฅผ ๋™์‹œ์— ํ›ˆ๋ จ์‹œํ‚ค๋Š” ๊ฒƒ๊ณผ ๊ฐ™๋‹ค. ์ด ๋ฐฉ๋ฒ•์€ ๋™์ผํ•œ ๋”ฅ๋Ÿฌ๋‹ ์‹ ๊ฒฝ๋ง์„ N๊ฐœ๋ฅผ ๋งŒ๋“œ๋Š” ๊ฒƒ์ด๋‹ค.

  • ์•ฝ 2~5%์ •๋„์˜ ์„ฑ๋Šฅ ํ–ฅ์ƒ์„ ๊ธฐ๋Œ€ํ•  ์ˆ˜ ์žˆ๋‹ค.

โžก๏ธ ๊ฐ ์‹ ๊ฒฝ๋ง์„ ๋…๋ฆฝ์ ์œผ๋กœ ํ•™์Šต์‹œํ‚จ ํ›„์— ๋งˆ์ง€๋ง‰์— ํ•ฉ์น˜๋Š” ๊ฒƒ์ด๋‹ค.

0๊ฐœ์˜ ๋Œ“๊ธ€