Neural Network / Backpropagation

임정민 · June 1, 2024

*Note: this post is based on the YouTube lecture [핵심 머신러닝] 뉴럴네트워크모델 2 (Backpropagation 알고리즘) from the channel '김성범[ 교수 / 산업경영공학부 ]'.

Neural Network

1. Setting up the full neural network

  • Differentiating the entire network in one shot is too complicated
  • So we split the derivation into two parts: output layer / hidden layer and hidden layer / input layer

2. Between the output layer and the hidden layer

  • Forward pass (a small NumPy sketch follows the equations below)

$net_k = h_1 w_{k1} + h_2 w_{k2} + \cdots + h_j w_{kj} + \cdots + h_p w_{kp} + w_{k0}$
$o_k = \text{sigmoid}(net_k) = \frac{1}{1 + \exp(-net_k)}$
$E_n(w) = \frac{1}{2} \sum_{k=1}^{m} (t_k - o_k)^2$
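To make this forward step concrete, here is a minimal NumPy sketch; the array sizes, random values, and names such as `W_out` and `b_out` are illustrative assumptions, not code from the lecture.

```python
import numpy as np

def sigmoid(x):
    # o = 1 / (1 + exp(-x))
    return 1.0 / (1.0 + np.exp(-x))

# Illustrative sizes and values: p = 4 hidden units, m = 3 output units
rng = np.random.default_rng(0)
h = rng.random(4)                  # hidden activations h_1..h_p
W_out = rng.normal(size=(3, 4))    # weights w_kj
b_out = np.zeros(3)                # biases w_k0
t = rng.random(3)                  # targets t_k

net_k = W_out @ h + b_out              # net_k = sum_j h_j w_kj + w_k0
o_k = sigmoid(net_k)                   # o_k = sigmoid(net_k)
E_n = 0.5 * np.sum((t - o_k) ** 2)     # E_n = 1/2 sum_k (t_k - o_k)^2
print(o_k, E_n)
```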
  • To update $w_{kj}$ with backpropagation, compute $\Delta w_{kj} = -\alpha \frac{\partial E_n}{\partial w_{kj}}$

$\frac{\partial E_n}{\partial w_{kj}} = \frac{\partial E_n}{\partial net_k}\,\frac{\partial net_k}{\partial w_{kj}} = \frac{\partial E_n}{\partial o_k}\,\frac{\partial o_k}{\partial net_k}\,\frac{\partial net_k}{\partial w_{kj}} = \frac{\partial E_n}{\partial o_k}\,\frac{\partial o_k}{\partial net_k}\,h_j$

  • Expanding the above in detail (for a single $k$); the sigmoid-derivative shortcut is worked out right after this list
1. $\frac{\partial E_n}{\partial o_k} = \frac{\partial}{\partial o_k}\,\frac{1}{2}\sum_{k=1}^{m}(t_k - o_k)^2 = \frac{\partial}{\partial o_k}\,\frac{1}{2}(t_k - o_k)^2 = -(t_k - o_k)$
2. $\frac{\partial o_k}{\partial net_k} = \frac{\partial}{\partial net_k}\left(\frac{1}{1 + \exp(-net_k)}\right) = \frac{\exp(-net_k)}{(1 + \exp(-net_k))^2}$
$\frac{\partial o_k}{\partial net_k} = o_k (1 - o_k)$
(*a property of the sigmoid derivative*)
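For completeness, here is the one algebra step the lecture skips, rewriting the quotient in 2. in terms of $o_k$:

$\frac{\exp(-net_k)}{(1 + \exp(-net_k))^2} = \frac{1}{1 + \exp(-net_k)} \cdot \frac{\exp(-net_k)}{1 + \exp(-net_k)} = o_k \left(1 - \frac{1}{1 + \exp(-net_k)}\right) = o_k (1 - o_k)$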
  • Finally, the weight update becomes (a small sketch follows the formula)

    $\Delta w_{kj} = -\alpha \frac{\partial E_n}{\partial w_{kj}} = \alpha (t_k - o_k)\, o_k (1 - o_k)\, h_j$
($\alpha$: learning rate, $t_k$: target value, $o_k$: predicted output, $h_j$: hidden-layer activation)
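A hedged NumPy sketch of this delta-rule update for one training example; all concrete numbers and the variable names (`alpha`, `W_out`, ...) are assumptions for illustration:

```python
import numpy as np

alpha = 0.1                           # learning rate
h = np.array([0.2, 0.7, 0.5])         # hidden activations h_j (p = 3)
o = np.array([0.6, 0.3])              # outputs o_k (m = 2)
t = np.array([1.0, 0.0])              # targets t_k
W_out = np.zeros((2, 3))              # weights w_kj, shape (m, p)

delta_k = (t - o) * o * (1 - o)       # (t_k - o_k) o_k (1 - o_k)
W_out += alpha * np.outer(delta_k, h) # Delta w_kj = alpha * delta_k * h_j
```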

3. Between the hidden layer and the input layer

  • Forward pass
    $net_j = x_1 w_{j1} + x_2 w_{j2} + \cdots + x_i w_{ji} + \cdots + x_d w_{jd} + w_{j0}$
$h_j = \text{sigmoid}(net_j) = \frac{1}{1 + \exp(-net_j)}$
$net_k = h_1 w_{k1} + h_2 w_{k2} + \cdots + h_j w_{kj} + \cdots + h_p w_{kp} + w_{k0}$
$o_k = \text{sigmoid}(net_k) = \frac{1}{1 + \exp(-net_k)}$
$E_n(w) = \frac{1}{2} \sum_{k=1}^{m} (t_k - o_k)^2$
  • To update $w_{ji}$ with backpropagation, compute $\Delta w_{ji} = -\alpha \frac{\partial E_n}{\partial w_{ji}}$
$\frac{\partial E_n}{\partial w_{ji}} = \frac{\partial E_n}{\partial net_j}\,\frac{\partial net_j}{\partial w_{ji}} = \frac{\partial E_n}{\partial net_j}\, x_i$
  • Expanding this in detail
$\frac{\partial E_n}{\partial w_{ji}} = \frac{\partial E_n}{\partial net_j}\, x_i$
$\frac{\partial E_n}{\partial net_j} = \frac{\partial}{\partial net_j}\left(\frac{1}{2}\sum_{k=1}^{m}(t_k - o_k)^2\right) = \frac{1}{2}\sum_{k=1}^{m}\frac{\partial (t_k - o_k)^2}{\partial net_j}$
$= \frac{1}{2}\sum_{k=1}^{m}\frac{\partial (t_k - o_k)^2}{\partial o_k}\,\frac{\partial o_k}{\partial net_k}\,\frac{\partial net_k}{\partial h_j}\,\frac{\partial h_j}{\partial net_j}$
$= \sum_{k=1}^{m}(t_k - o_k)\left(-\frac{\partial o_k}{\partial net_k}\,\frac{\partial net_k}{\partial h_j}\,\frac{\partial h_j}{\partial net_j}\right)$
  • Writing out the remaining factors one by one
    1. $\frac{\partial h_j}{\partial net_j} = h_j (1 - h_j)$
(*a property of the sigmoid derivative*)
2. $\frac{\partial net_k}{\partial h_j} = w_{kj}$
3. $\frac{\partial o_k}{\partial net_k} = o_k (1 - o_k)$
(*a property of the sigmoid derivative*)
  • Putting these together (a numerical check of the result follows below)
$\frac{\partial E_n}{\partial net_j} = -h_j (1 - h_j) \sum_{k=1}^{m} w_{kj}\, o_k (1 - o_k)\, (t_k - o_k)$
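As a sanity check on this closed-form expression, the sketch below compares it with a central finite-difference approximation of $\partial E_n / \partial net_j$; the sizes, values, and the omission of bias terms are assumptions made only for this check.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Illustrative values: p = 2 hidden units, m = 2 output units, biases omitted
net_j = np.array([0.3, -0.8])              # hidden net inputs
W_kj = np.array([[0.5, -0.2],
                 [0.1,  0.7]])             # hidden-to-output weights, shape (m, p)
t = np.array([1.0, 0.0])                   # targets t_k

def loss(net_j):
    h = sigmoid(net_j)
    o = sigmoid(W_kj @ h)
    return 0.5 * np.sum((t - o) ** 2)

# Closed-form gradient from the derivation above
h = sigmoid(net_j)
o = sigmoid(W_kj @ h)
analytic = -h * (1 - h) * (W_kj.T @ ((t - o) * o * (1 - o)))

# Central finite-difference approximation
eps = 1e-6
numeric = np.array([
    (loss(net_j + eps * np.eye(2)[j]) - loss(net_j - eps * np.eye(2)[j])) / (2 * eps)
    for j in range(2)
])
print(np.allclose(analytic, numeric, atol=1e-8))  # expected: True
```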
  • Finally, the weight update becomes (a small sketch follows the formula)
$\Delta w_{ji} = -\alpha \frac{\partial E_n}{\partial w_{ji}} = -\alpha \left(\frac{\partial E_n}{\partial net_j}\, x_i\right)$
$\Delta w_{ji} = \alpha\, x_i\, h_j (1 - h_j) \sum_{k=1}^{m} w_{kj}\, o_k (1 - o_k)\, (t_k - o_k)$
($\alpha$: learning rate, $x_i$: input value, $h_j$: hidden-layer activation, $w_{kj}$: hidden-to-output weight, $o_k$: predicted output, $t_k$: target value)
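The same update written as a NumPy sketch for one example; shapes, numbers, and the omission of bias terms are illustrative assumptions:

```python
import numpy as np

alpha = 0.1                              # learning rate
x = np.array([0.5, -1.0])                # inputs x_i (d = 2)
h = np.array([0.57, 0.31])               # hidden activations h_j (p = 2)
o = np.array([0.62, 0.45])               # outputs o_k (m = 2)
t = np.array([1.0, 0.0])                 # targets t_k
W_kj = np.array([[0.5, -0.2],
                 [0.1,  0.7]])           # hidden-to-output weights (m, p)
W_ji = np.zeros((2, 2))                  # input-to-hidden weights (p, d)

delta_k = (t - o) * o * (1 - o)              # output-layer error term
delta_j = h * (1 - h) * (W_kj.T @ delta_k)   # h_j (1 - h_j) sum_k w_kj delta_k
W_ji += alpha * np.outer(delta_j, x)         # Delta w_ji = alpha * delta_j * x_i
```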

4. Summary of $\Delta w$: output layer and hidden layer / hidden layer and input layer

$\Delta w_{kj} = -\alpha \frac{\partial E_n}{\partial w_{kj}} = \alpha (t_k - o_k)\, o_k (1 - o_k)\, h_j$

$\Delta w_{ji} = -\alpha \frac{\partial E_n}{\partial w_{ji}} = \alpha\, x_i\, h_j (1 - h_j) \sum_{k=1}^{m} w_{kj}\, o_k (1 - o_k)\, (t_k - o_k)$
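Putting both rules together, here is a self-contained sketch of one full forward / backward / update step for a two-layer sigmoid network; the sizes, initial values, and variable names are assumptions for illustration rather than the lecture's code. With a reasonably small learning rate, the error on this single example should typically decrease after the step.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
d, p, m = 2, 3, 1                      # input, hidden, output sizes
alpha = 0.5                            # learning rate

x = np.array([1.0, 0.0])               # one training input
t = np.array([1.0])                    # its target
W_ji = rng.normal(scale=0.5, size=(p, d)); b_j = np.zeros(p)
W_kj = rng.normal(scale=0.5, size=(m, p)); b_k = np.zeros(m)

def forward(x):
    net_j = W_ji @ x + b_j
    h = sigmoid(net_j)
    net_k = W_kj @ h + b_k
    o = sigmoid(net_k)
    return h, o

h, o = forward(x)
print("E_n before:", 0.5 * np.sum((t - o) ** 2))

# Backpropagation: output layer first, then hidden layer
delta_k = (t - o) * o * (1 - o)               # (t_k - o_k) o_k (1 - o_k)
delta_j = h * (1 - h) * (W_kj.T @ delta_k)    # h_j (1 - h_j) sum_k w_kj delta_k

W_kj += alpha * np.outer(delta_k, h); b_k += alpha * delta_k   # Delta w_kj
W_ji += alpha * np.outer(delta_j, x); b_j += alpha * delta_j   # Delta w_ji

h, o = forward(x)
print("E_n after: ", 0.5 * np.sum((t - o) ** 2))
```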
