MLP_1 (Multilayer Perceptron)

창슈 · April 10, 2025

Deep Learning

λͺ©λ‘ 보기
11/16
post-thumbnail

λ‹€μΈ΅ νΌμ…‰νŠΈλ‘  (MLP)

λ‹€μΈ΅ νΌμ…‰νŠΈλ‘ μ€ μž…λ ₯μΈ΅(input layer)κ³Ό 좜λ ₯μΈ΅(output layer) 사이에 은닉측(hidden layer)을 ν¬ν•¨ν•œ 인곡신경망 ꡬ쑰이닀.
각 측은 λ‰΄λŸ°μœΌλ‘œ κ΅¬μ„±λ˜λ©°, μΈ΅ κ°„μ—λŠ” κ°€μ€‘μΉ˜(weight)κ°€ μ—°κ²°λ˜μ–΄ μžˆλ‹€.

  • Forward Pass
    Given an input, the signal is propagated forward through each layer of the network to compute the output. Through this process the network produces a prediction for the current input.

  • Error Estimation
    The difference between the network's output and the label (the correct answer) is computed to obtain the error. This error is the key piece of information used during training.

  • Backward Pass
    Based on the computed error, the network's weights and biases are adjusted in the direction that reduces the error; the backpropagation algorithm is typically used for this. (A toy sketch of these three steps follows below.)

πŸ“– How to train an MLP that has hidden layers

  • The backpropagation algorithm was developed in the 1980s and remains the foundation of deep learning to this day.

  • Backpropagation propagates the error, i.e. the difference between the network's output and the correct answer, backwards through the network while changing the weights and biases in the direction that reduces that error.

  • The key mathematical algorithm in this process is gradient descent.


πŸ’‘ Activation Function

An activation function takes the weighted sum of the inputs and computes the output value.
βœ”οΈ Step function in the perceptron: the original perceptron used a step function.
βœ”οΈ Nonlinear functions in the MLP: an MLP uses various nonlinear functions as activation functions, which lets it learn complex patterns.

  • In general, nonlinear functions are used as activation functions.

  • This is because it can be shown mathematically that stacking several linear layers is equivalent to a single linear layer; see the quick check below.
    (Nonlinear activations are what allow the model to learn complex functions, high-dimensional patterns, and nonlinear relationships.)
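
A quick numerical check of this point, using arbitrary random matrices as the two "layers" (bias terms are omitted; they would fold into a single bias in the same way):

import numpy as np

np.random.seed(0)
W1 = np.random.randn(3, 4)   # first linear layer (no activation)
W2 = np.random.randn(4, 2)   # second linear layer (no activation)
x = np.random.randn(1, 3)    # an arbitrary input

two_layers = (x @ W1) @ W2   # passing through two stacked linear layers
one_layer = x @ (W1 @ W2)    # a single equivalent linear layer
print(np.allclose(two_layers, one_layer))  # True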


βœ”οΈ Step Function (계단 ν•¨μˆ˜)

  • μž…λ ₯ μ‹ ν˜Έμ˜ 총합이 0을 λ„˜μœΌλ©΄ 1을 좜λ ₯ν•˜κ³ , κ·Έλ ‡μ§€ μ•ŠμœΌλ©΄ 0을 좜λ ₯
f(x)={1,ifΒ x>00,ifΒ x≀0f(x) = \begin{cases} 1, & \text{if } x > 0 \\ 0, & \text{if } x \leq 0 \end{cases}

πŸ“¦ Code

# 계단 ν•¨μˆ˜ μ •μ˜ (방법 1: 쑰건문 μ‚¬μš©)
def step(x):
    if x > 0.000001:  # 뢀동 μ†Œμˆ˜μ  였차 λ°©μ§€
        return 1
    else:
        return 0

# 계단 ν•¨μˆ˜ μ •μ˜ (방법 2: λ„˜νŒŒμ΄ 배열을 λ°›κΈ° μœ„ν•˜μ—¬ λ³€κ²½)
def step(x):
    result = x > 0.000001  # True λ˜λŠ” False
    return result.astype(np.int32)  # μ •μˆ˜λ‘œ λ°˜ν™˜ (1 λ˜λŠ” 0)
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(-10.0, 10.0, 0.1)
y = step(x)
plt.plot(x, y); plt.show()

βœ”οΈ Sigmoid Function (μ‹œκ·Έλͺ¨μ΄λ“œ ν•¨μˆ˜)

  • Has an S-shaped curve.

  • A traditional activation function in use since the 1980s. The step function changes abruptly at $x = 0$ and is not differentiable there, whereas the sigmoid changes smoothly and is therefore differentiable everywhere, which is a major advantage.

  • This makes it possible to apply the optimization technique called gradient descent.

$$f(x) = \frac{1}{1 + e^{-x}}$$
$$f'(x) = \frac{-(-e^{-x})}{(1 + e^{-x})^2} = \frac{1}{1 + e^{-x}} \times \frac{e^{-x}}{1 + e^{-x}} = f(x) \times (1 - f(x))$$

πŸ“¦ Code

import numpy as np
import matplotlib.pyplot as plt

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.arange(-10.0, 10.0, 0.1)
y = sigmoid(x)

plt.plot(x, y)
plt.show()
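
As a quick sanity check of the derivative formula above, the closed form f(x)(1 - f(x)) can be compared with a numerical central-difference derivative, reusing the sigmoid and x defined just above:

# Numerical check that sigmoid'(x) == sigmoid(x) * (1 - sigmoid(x))
h = 1e-5
analytic = sigmoid(x) * (1 - sigmoid(x))               # f(x) * (1 - f(x))
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)  # central difference
print(np.allclose(analytic, numeric))                  # True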

βœ”οΈ ReLU Function (Rectified Linear Unit ν•¨μˆ˜)

  • A widely used activation function in recent years.

  • Outputs the input unchanged if it is greater than 0, and outputs 0 if the input is less than or equal to 0.

  • Its derivative is simple, and it avoids the vanishing-gradient problem that appears in deep networks, so it is used very often.

$$f(x) = \begin{cases} x, & \text{if } x > 0 \\ 0, & \text{if } x \leq 0 \end{cases}$$

πŸ“¦ Code

import numpy as np
import matplotlib.pyplot as plt

def relu(x):
    return np.maximum(x, 0)

x = np.arange(-10.0, 10.0, 0.1)
y = relu(x)

plt.plot(x, y)
plt.show()

βœ”οΈ tanh Function (Hyperbolic tangent ν•¨μˆ˜)

  • numpyμ—μ„œ μ œκ³΅ν•˜κ³  있기 λ•Œλ¬Έμ—, λ³„λ„μ˜ ν•¨μˆ˜ μž‘μ„±μ΄ ν•„μš”ν•˜μ§€ μ•Šλ‹€.

  • μ‹œκ·Έλͺ¨μ΄λ“œ ν•¨μˆ˜μ™€ μ•„μ£Ό λΉ„μŠ·ν•˜μ§€λ§Œ 좜λ ₯값이 -1μ—μ„œ 1κΉŒμ§€ 이닀.

  • RNNμ—μ„œ 많이 μ‚¬μš©λœλ‹€.

$$f(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}} = \frac{2}{1 + e^{-2x}} - 1$$

πŸ“¦ Code

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-np.pi, np.pi, 60)
y = np.tanh(x)

plt.plot(x, y)
plt.show()
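
The second form of the equation above says that tanh is simply a sigmoid rescaled to the range (-1, 1); a quick check, reusing x and numpy from the block above:

# tanh(x) equals 2*sigmoid(2x) - 1, i.e. a sigmoid stretched to the range (-1, 1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
print(np.allclose(np.tanh(x), 2 * sigmoid(2 * x) - 1))  # True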

Forward Pass

  • The forward pass is the process in which the input signal starts at the input layer, passes through the hidden layer(s), and propagates to the output layer.

  • The computation for the first unit of the hidden layer is:

    $$h_1 = f(w_{11}x_1 + w_{21}x_2 + \cdots + w_{n1}x_n + b_1)$$
    • $w_{ij}$: weight from input $x_i$ to hidden unit $h_j$
    • $b_1$: bias term
    • $f(\cdot)$: activation function (e.g., sigmoid)


πŸ”— MLP Forward Pass (default)

Computation flow through the hidden and output layers for the first XOR training sample $(x_1, x_2, y) = (0, 0, 0)$.

βœ… Overall structure

  • Input layer: $x_1 = 0.0,\ x_2 = 0.0$
  • Hidden layer: $h_1, h_2$ (sigmoid activation)
  • Output layer: $y$
    Every node is connected through weights $w$ and biases $b$.

πŸ“Œ Computing hidden unit $h_1$

  • Linear combination:

    $$z_1 = w_1 \cdot x_1 + w_3 \cdot x_2 + b_1 = 0.1 \cdot 0 + 0.3 \cdot 0 + 0.1 = 0.1$$
  • Apply the activation function (sigmoid):

    $$a_1 = \frac{1}{1 + e^{-z_1}} = \frac{1}{1 + e^{-0.1}} \approx 0.524979$$
  • That is, the output of $h_1$ is approximately 0.524979.

πŸ“Œ Computing hidden unit $h_2$

  • Linear combination:

    $$z_2 = w_2 \cdot x_1 + w_4 \cdot x_2 + b_2 = 0.2 \cdot 0 + 0.4 \cdot 0 + 0.2 = 0.2$$
  • Apply the activation function (sigmoid):

    $$a_2 = \frac{1}{1 + e^{-z_2}} = \frac{1}{1 + e^{-0.2}} \approx 0.549834$$
  • That is, the output of $h_2$ is approximately 0.549834.

πŸ“Œ Computing the output layer

  • $a_1 = 0.524979$
  • $a_2 = 0.549834$

  • Linear combination entering the output unit:

    $$z_y = w_5 \cdot a_1 + w_6 \cdot a_2 + b_3 = 0.5 \cdot 0.524979 + 0.6 \cdot 0.549834 + 0.3 = 0.892389$$
  • Apply the activation function (sigmoid):

    $$a_y = \frac{1}{1 + e^{-z_y}} = \frac{1}{1 + e^{-0.892389}} \approx 0.709383$$
  • That is, the output is approximately 0.71.

πŸ“Œ Checking the error

  • The label (correct answer) is 0.

  • The network's output is 0.709383.

➑️ The error is quite large β†’ the weights must be adjusted through backpropagation. (A quick numpy reproduction of this hand calculation follows below.)
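
The hand calculation above can be reproduced in a few lines of numpy, using the same weights and biases as the example:

import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

x1, x2 = 0.0, 0.0                          # first XOR training sample (0, 0)
a1 = sigmoid(0.1 * x1 + 0.3 * x2 + 0.1)    # hidden unit h1
a2 = sigmoid(0.2 * x1 + 0.4 * x2 + 0.2)    # hidden unit h2
a_y = sigmoid(0.5 * a1 + 0.6 * a2 + 0.3)   # output unit
print(a1, a2, a_y)                         # approx. 0.524979 0.549834 0.709383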


πŸ”— MLP Forward Pass (matrix-based)

πŸ“Œ Input β†’ hidden layer (first layer)

  • Scalar computation:

    $$z_1 = w_1 \cdot x_1 + w_3 \cdot x_2 + b_1 \\ z_2 = w_2 \cdot x_1 + w_4 \cdot x_2 + b_2$$
  • Matrix form:

    $$Z_1 = X W_1 + B_1$$
    • $X = [x_1, x_2]$
    • $W_1 = \begin{bmatrix} w_1 & w_2 \\ w_3 & w_4 \end{bmatrix}$
    • $B_1 = [b_1, b_2]$
  • Apply the activation function (sigmoid):

$$Z_1 = [z_1 \ z_2] = X W_1 + B_1 = \begin{bmatrix} x_1 & x_2 \end{bmatrix} \begin{bmatrix} w_1 & w_2 \\ w_3 & w_4 \end{bmatrix} + \begin{bmatrix} b_1 & b_2 \end{bmatrix}$$
$$A_1 = f(Z_1) = [f(z_1), f(z_2)] = [a_1, a_2]$$

πŸ“Œ Hidden layer β†’ output layer (second layer)

  • Scalar computation:

    $$z_y = w_5 \cdot a_1 + w_6 \cdot a_2 + b_3$$
  • Matrix form:

    $$Z_2 = A_1 W_2 + B_2 = [a_1, a_2] \begin{bmatrix} w_5 \\ w_6 \end{bmatrix} + [b_3]$$
  • Apply the activation function (sigmoid):

    $$A_2 = f(Z_2) = [y]$$

➑️ Improves computational efficiency
➑️ Makes Python / TensorFlow code easier to write
➑️ Keeps the derivatives (backpropagation) tidy as well


πŸ“¦ Code

import numpy as np

# μ‹œκ·Έλͺ¨μ΄λ“œ ν•¨μˆ˜
def actf(x):
    return 1 / (1 + np.exp(-x))

# μ‹œκ·Έλͺ¨μ΄λ“œ ν•¨μˆ˜μ˜ λ―ΈλΆ„
def actf_deriv(x):
    return x * (1 - x)

# Network structure
inputs = 2      # number of input units
hiddens = 2     # number of hidden units
outputs = 1     # number of output units

# Learning rate
learning_rate = 0.2

# XOR training data (inputs and labels)
X = np.array([
    [0, 0],
    [0, 1],
    [1, 0],
    [1, 1]
])

T = np.array([
    [0],
    [1],
    [1],
    [0]
])

# Initialize weights and biases
W1 = np.array([[0.10, 0.20],
               [0.30, 0.40]])

W2 = np.array([[0.50],
               [0.60]])

B1 = np.array([0.1, 0.2])
B2 = np.array([0.3])

# Forward propagation
def predict(x):
    layer0 = x                      # input layer
    Z1 = np.dot(layer0, W1) + B1    # hidden layer linear combination
    layer1 = actf(Z1)               # hidden layer activation
    Z2 = np.dot(layer1, W2) + B2    # output layer linear combination
    layer2 = actf(Z2)               # output layer activation (prediction)
    return layer0, layer1, layer2

# Run the forward pass for each sample in order
def test():
    for x, y in zip(X, T):
        x = np.reshape(x, (1, -1))  # reshape input to 2-D (1 row, n columns)
        layer0, layer1, layer2 = predict(x)
        print(x, y, layer2)

# Run the test
test()
[[0 0]] [0] [[0.70938314]]
[[0 1]] [1] [[0.72844306]]
[[1 0]] [1] [[0.73598705]]
[[1 1]] [0] [[0.71791234]]
# No training has been done yet, so the outputs are essentially arbitrary values that do not match the labels.

Gradient Descent

πŸ“‰ Loss Function

  • When training a neural network, we use the error between the actual output and the desired output.

  • In neural networks, the metric that indicates how well training is going is called the loss function.


πŸ“Œ ν‰κ· μ œκ³± 였차 (MSE)

μ˜ˆμΈ‘κ°’κ³Ό μ •λ‹΅ κ°„μ˜ 평균 제곱 였차

E(w)=12βˆ‘i(yiβˆ’ti)2E(w) = \frac{1}{2} \sum_i (y_i - t_i)^2
  • yiy_i: 좜λ ₯μΈ΅ μœ λ‹› ii의 μ˜ˆμΈ‘κ°’
  • tit_i: ν•΄λ‹Ή μœ λ‹›μ˜ μ •λ‹΅κ°’
  • E(w)E(w): ν˜„μž¬ κ°€μ€‘μΉ˜ ww에 λŒ€ν•œ 였차 κ°’

⚠️ μ•žμ˜ 1/21/2 λŠ” 미뢄을 κ°„λ‹¨ν•˜κ²Œ λ§Œλ“€κΈ° μœ„ν•΄ λΆ™λŠ” μƒμˆ˜

import numpy as np

# Predictions and labels
y = np.array([0.0, 0.0, 0.8, 0.1, 0.0, 0.0, 0.0, 0.1, 0.0, 0.0])
target = np.array([0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])

# Mean squared error
def MSE(target, y):
    return 0.5 * np.sum((y - target) ** 2)

# Prediction close to the label
print(MSE(target, y))  # 0.0299999...

# Prediction far from the label
y = np.array([0.9, 0.0, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
print(MSE(target, y))  # 0.81

πŸ“Œ Gradient Descent

  • The backpropagation algorithm treats neural network training as an optimization problem.

  • It is an optimization algorithm that uses the gradient (first derivative) of the loss function to adjust the weights in the direction that reduces the error.

    $$W^* = \arg\min_W E(W)$$
    • $W$: the network's weight parameters
    • $E(W)$: the loss as a function of the weights
    • $\arg\min$: find the $W$ that minimizes the loss

πŸ’‘ In other words, train the weights so that the error between the network and the labels becomes as small as possible!

βœ”οΈ 손싀 ν•¨μˆ˜μ˜ κ·Έλž˜λ””μ–ΈνŠΈ

βˆ‡E=(βˆ‚Eβˆ‚w1,…,βˆ‚Eβˆ‚wn)\nabla E = \left( \frac{\partial E}{\partial w_1}, \dots, \frac{\partial E}{\partial w_n} \right)

둜 λ―ΈλΆ„ν•œ 값이며 wwμ—μ„œ μ ‘μ„ μ˜ 기울기λ₯Ό λœ»ν•œλ‹€.

$\frac{\partial E}{\partial w_i} > 0$:
β†’ increasing the weight increases the error
β†’ therefore update the weight in the decreasing direction

$\frac{\partial E}{\partial w_i} < 0$:
β†’ increasing the weight decreases the error
β†’ therefore update the weight in the increasing direction

➑️ To minimize the error, always adjust the weights in the direction opposite to the gradient!
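
Written as an update rule (with learning rate $\eta$; the code below uses $\eta = 0.2$), each weight takes a small step against its own partial derivative:

$$w_i \leftarrow w_i - \eta \frac{\partial E}{\partial w_i}$$
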
Loss function: $y = (x - 3)^2 + 10$
Gradient: $y' = 2(x - 3) = 2x - 6$

Starting at $x = 10$: loss $y = (10 - 3)^2 + 10 = 49 + 10 = 59$ and gradient $y' = 2(10 - 3) = 14$,
so the first update gives $x = 10 - 0.2 \cdot 14 = 7.2$, and the second gives $x = 7.2 - 0.2 \cdot 8.4 = 5.52$.

πŸ“¦ Code

x = 10
learning_rate = 0.2
precision = 0.00001
max_iterations = 100

# Define the loss function as a lambda expression.
loss_func = lambda x: (x - 3) ** 2 + 10

# Define the gradient as a lambda expression.
# It is the first derivative of the loss function.
gradient = lambda x: 2 * x - 6

# Gradient descent loop
for i in range(max_iterations):
    x = x - learning_rate * gradient(x)
    print("loss(", x, ") =", loss_func(x))

print("minimum =", x)
μ†μ‹€ν•¨μˆ˜κ°’( 7.199999999999999 )= 27.639999999999993
μ†μ‹€ν•¨μˆ˜κ°’( 5.52 )= 16.350399999999997
μ†μ‹€ν•¨μˆ˜κ°’( 4.512 )= 12.286143999999998
μ†μ‹€ν•¨μˆ˜κ°’( 3.9071999999999996 )= 10.82301184
μ†μ‹€ν•¨μˆ˜κ°’( 3.54432 )= 10.2962842624
...
μ†μ‹€ν•¨μˆ˜κ°’( 3.0000000000000004 )= 10.0
μ΅œμ†Œκ°’ = 3.0000000000000004

from mpl_toolkits.mplot3d import axes3d
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(-5, 5, 0.5)
y = np.arange(-5, 5, 0.5)
X, Y = np.meshgrid(x, y)  # 2-D coordinate grid
Z = X**2 + Y**2           # element-wise numpy operation

fig = plt.figure(figsize=(6, 6))
ax = fig.add_subplot(111, projection='3d')

# Draw the 3-D surface.
ax.plot_surface(X, Y, Z)

plt.show()

import matplotlib.pyplot as plt
import numpy as np

# x, y coordinates
x = np.arange(-5, 5, 0.5)
y = np.arange(-5, 5, 0.5)

# 2-D grid
X, Y = np.meshgrid(x, y)

# Negative gradient of f(x, y) = x^2 + y^2 (the descent direction)
U = -2 * X
V = -2 * Y

# Visualize the vector field
plt.figure()
Q = plt.quiver(X, Y, U, V, units='width')  # draw the arrows
plt.title("Gradient Descent Directions")
plt.grid(True)
plt.show()
$$f(x, y) = x^2 + y^2, \qquad \nabla f = \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right) = (2x, 2y)$$
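
To connect the vector field above with the descent procedure itself, here is a short sketch that runs gradient descent directly on f(x, y) = xΒ² + yΒ² (the starting point and learning rate are chosen arbitrarily):

import numpy as np

point = np.array([4.0, -3.0])   # arbitrary starting point
learning_rate = 0.2

for i in range(30):
    grad = 2 * point                        # gradient of x^2 + y^2 is (2x, 2y)
    point = point - learning_rate * grad    # step against the gradient
print(point)                                # approaches (0, 0), the minimum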


🏷️ λ‹€μŒλ‚΄μš© "μ—­μ „νŒŒ"

λ‹€μŒ ν¬μŠ€νŒ…μ— μ—­μ „νŒŒ Backpropagation에 λŒ€ν•œ λ‚΄μš”μ„ 닀루겠닀.
