Computing Backpropagation in an FC Neural Network

milkbuttercheese · April 19, 2023
  • Consider a neural network made of $n$ layers, where $\boldsymbol{a}^{(0)}$ is the network input.
  • Each layer $l$ follows the two formulas below:
    - $\boldsymbol{z}^{(l)}=\boldsymbol{w}^{(l)}\boldsymbol{a}^{(l-1)}+\boldsymbol{b}^{(l)}$
    - $\boldsymbol{a}^{(l)}=\sigma(\boldsymbol{z}^{(l)})$
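
The two formulas above are the forward pass. A minimal sketch in plain Python follows, assuming $\sigma$ is the logistic sigmoid and representing vectors as lists and matrices as lists of rows; the names `sigmoid`, `matvec` and `forward` are hypothetical, not taken from the post.

```python
import math


def sigmoid(x):
    # Assumed choice of activation: the logistic sigmoid
    return 1.0 / (1.0 + math.exp(-x))


def matvec(w, a):
    # w is a list of rows; returns the column vector w a as a list
    return [sum(w_ij * a_j for w_ij, a_j in zip(row, a)) for row in w]


def forward(weights, biases, x):
    """Run the forward pass.

    weights[l-1] and biases[l-1] hold w^(l) and b^(l) for l = 1..n,
    and x is a^(0). Returns ([z^(1), ..., z^(n)], [a^(0), ..., a^(n)]).
    """
    zs, acts = [], [x]
    for w, b in zip(weights, biases):
        # z^(l) = w^(l) a^(l-1) + b^(l)
        z = [wa + b_i for wa, b_i in zip(matvec(w, acts[-1]), b)]
        # a^(l) = sigma(z^(l)), applied element-wise
        a = [sigmoid(z_i) for z_i in z]
        zs.append(z)
        acts.append(a)
    return zs, acts
```

For example, with `weights = [[[1.0, -1.0], [0.5, 0.5]], [[1.0, 1.0]]]`, zero biases and `x = [1.0, 1.0]`, the first layer gives $\boldsymbol{z}^{(1)}=[0, 1]$ and $\boldsymbol{a}^{(1)}=[0.5, \sigma(1)]$.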

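  • By the chain rule, the gradient of the cost $J$ with respect to the weights of layer $l$ is
    - $\cfrac{\partial J}{\partial \boldsymbol{w}^{(l)}}=\cfrac{\partial J}{\partial \boldsymbol{a}^{(n)}}\cdot\left(\cfrac{\partial \boldsymbol{a}^{(n)}}{\partial \boldsymbol{z}^{(n)}}\cdot\cfrac{\partial \boldsymbol{z}^{(n)}}{\partial \boldsymbol{a}^{(n-1)}}\right)\cdot\left(\cfrac{\partial \boldsymbol{a}^{(n-1)}}{\partial \boldsymbol{z}^{(n-1)}}\cdot\cfrac{\partial \boldsymbol{z}^{(n-1)}}{\partial \boldsymbol{a}^{(n-2)}}\right)\cdots\cfrac{\partial \boldsymbol{a}^{(l)}}{\partial \boldsymbol{z}^{(l)}}\cdot\cfrac{\partial \boldsymbol{z}^{(l)}}{\partial \boldsymbol{w}^{(l)}}$
    - Since $\cfrac{\partial \boldsymbol{z}^{(k)}}{\partial \boldsymbol{a}^{(k-1)}}=\boldsymbol{w}^{(k)}$, $\cfrac{\partial \boldsymbol{a}^{(k)}}{\partial \boldsymbol{z}^{(k)}}=\sigma'(\boldsymbol{z}^{(k)})$ and $\cfrac{\partial \boldsymbol{z}^{(l)}}{\partial \boldsymbol{w}^{(l)}}=\boldsymbol{a}^{(l-1)}$, this becomes
    - $=\cfrac{\partial J}{\partial \boldsymbol{z}^{(n)}}\cdot\left(\boldsymbol{w}^{(n)}\cdot\sigma'(\boldsymbol{z}^{(n-1)})\right)\cdot\left(\boldsymbol{w}^{(n-1)}\cdot\sigma'(\boldsymbol{z}^{(n-2)})\right)\cdots\left(\boldsymbol{w}^{(l+1)}\cdot\sigma'(\boldsymbol{z}^{(l)})\right)\cdot\boldsymbol{a}^{(l-1)}$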
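  • Calling $\delta^{(l)}=\cfrac{\partial J}{\partial \boldsymbol{z}^{(l)}}$ the error signal, the same computation can be written recursively, starting at the output layer from $\delta^{(n)}=\cfrac{\partial J}{\partial \boldsymbol{a}^{(n)}}\cdot\sigma'(\boldsymbol{z}^{(n)})$: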
    - $\delta^{(l)}=\cfrac{\partial J}{\partial \boldsymbol{z}^{(l)}}=\cfrac{\partial J}{\partial \boldsymbol{z}^{(l+1)}}\cdot\cfrac{\partial \boldsymbol{z}^{(l+1)}}{\partial \boldsymbol{a}^{(l)}}\cdot\cfrac{\partial \boldsymbol{a}^{(l)}}{\partial \boldsymbol{z}^{(l)}}$
    - $=\delta^{(l+1)}\cdot\boldsymbol{w}^{(l+1)}\cdot\sigma'(\boldsymbol{z}^{(l)})$
    - $\cfrac{\partial J}{\partial \boldsymbol{w}^{(l)}}=\cfrac{\partial J}{\partial \boldsymbol{z}^{(l)}}\cdot\cfrac{\partial \boldsymbol{z}^{(l)}}{\partial \boldsymbol{w}^{(l)}}$
    - $=\delta^{(l)}\boldsymbol{a}^{(l-1)}$, i.e. component-wise $\cfrac{\partial J}{\partial w^{(l)}_{ij}}=\delta^{(l)}_{i}\,a^{(l-1)}_{j}$ (an outer product of $\delta^{(l)}$ and $\boldsymbol{a}^{(l-1)}$)
  • Appendix
    - $\delta^{(l)}=\begin{bmatrix}\cfrac{\partial J}{\partial z^{(l+1)}_{1}} & \cfrac{\partial J}{\partial z^{(l+1)}_{2}} & \cdots\end{bmatrix}\begin{bmatrix}\cfrac{\partial z^{(l+1)}_{1}}{\partial a^{(l)}_{1}} & \cfrac{\partial z^{(l+1)}_{1}}{\partial a^{(l)}_{2}} & \cdots\\ \cfrac{\partial z^{(l+1)}_{2}}{\partial a^{(l)}_{1}} & \cfrac{\partial z^{(l+1)}_{2}}{\partial a^{(l)}_{2}} & \cdots\\ \vdots & \vdots & \ddots\end{bmatrix}\begin{bmatrix}\cfrac{\partial a^{(l)}_{1}}{\partial z^{(l)}_{1}} & \cfrac{\partial a^{(l)}_{1}}{\partial z^{(l)}_{2}} & \cdots\\ \cfrac{\partial a^{(l)}_{2}}{\partial z^{(l)}_{1}} & \cfrac{\partial a^{(l)}_{2}}{\partial z^{(l)}_{2}} & \cdots\\ \vdots & \vdots & \ddots\end{bmatrix}$
    - The middle matrix has entries $\cfrac{\partial z^{(l+1)}_{i}}{\partial a^{(l)}_{j}}=w^{(l+1)}_{ij}$, so it is $\boldsymbol{w}^{(l+1)}$ itself; because $\sigma$ acts element-wise, the last matrix is diagonal, $\operatorname{diag}(\sigma'(\boldsymbol{z}^{(l)}))$. Hence
    - $=\delta^{(l+1)}\cdot\boldsymbol{w}^{(l+1)}\cdot\sigma'(\boldsymbol{z}^{(l)})$
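
The recursion and the appendix's matrix form can be checked numerically with a minimal sketch in plain Python. It assumes a logistic-sigmoid $\sigma$ and the cost $J=\frac{1}{2}\lVert\boldsymbol{a}^{(n)}-\boldsymbol{y}\rVert^2$ (so $\partial J/\partial\boldsymbol{a}^{(n)}=\boldsymbol{a}^{(n)}-\boldsymbol{y}$); neither choice comes from the post, and every function name here is hypothetical. `previous_delta` computes $\delta^{(l)}=\delta^{(l+1)}\cdot\boldsymbol{w}^{(l+1)}\cdot\sigma'(\boldsymbol{z}^{(l)})$ element-wise, `delta_via_jacobians` evaluates the appendix's three-matrix product literally, and `numerical_grads` is a central finite-difference check.

```python
import math


def sigmoid(x):
    # Assumed activation: logistic sigmoid
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_prime(x):
    # sigma'(x) = sigma(x) * (1 - sigma(x))
    s = sigmoid(x)
    return s * (1.0 - s)


def forward(weights, biases, x):
    # weights[L-1], biases[L-1] hold w^(L), b^(L); returns z^(1..n) and a^(0..n)
    zs, acts = [], [x]
    for w, b in zip(weights, biases):
        z = [sum(wij * aj for wij, aj in zip(row, acts[-1])) + bi
             for row, bi in zip(w, b)]
        zs.append(z)
        acts.append([sigmoid(zi) for zi in z])
    return zs, acts


def cost(weights, biases, x, y):
    # Assumed cost: J = 1/2 * ||a^(n) - y||^2
    _, acts = forward(weights, biases, x)
    return 0.5 * sum((ai - yi) ** 2 for ai, yi in zip(acts[-1], y))


def previous_delta(delta, w, z_prev):
    # delta^(l) = (delta^(l+1) w^(l+1)) * sigma'(z^(l)), computed element-wise
    return [sum(delta[i] * w[i][j] for i in range(len(delta)))
            * sigmoid_prime(z_prev[j])
            for j in range(len(z_prev))]


def backprop(weights, biases, x, y):
    # Returns dJ/dw^(L) for L = 1..n, stored at index L-1
    zs, acts = forward(weights, biases, x)
    n = len(weights)
    # delta^(n) = dJ/da^(n) * sigma'(z^(n)), where dJ/da^(n) = a^(n) - y
    delta = [(ai - yi) * sigmoid_prime(zi)
             for ai, yi, zi in zip(acts[n], y, zs[n - 1])]
    grads = [None] * n
    for L in range(n, 0, -1):
        # dJ/dw_ij^(L) = delta_i^(L) * a_j^(L-1), an outer product
        grads[L - 1] = [[d * aj for aj in acts[L - 1]] for d in delta]
        if L > 1:
            delta = previous_delta(delta, weights[L - 1], zs[L - 2])
    return grads


def delta_via_jacobians(delta_next, w_next, z):
    # Appendix form: (row vector dJ/dz^(l+1)) x (dz^(l+1)/da^(l)) x (da^(l)/dz^(l))
    def row_times(row, m):
        return [sum(row[k] * m[k][j] for k in range(len(m)))
                for j in range(len(m[0]))]
    # da^(l)/dz^(l) is diagonal because sigma acts element-wise
    jac_a_z = [[sigmoid_prime(z[i]) if i == j else 0.0 for j in range(len(z))]
               for i in range(len(z))]
    # dz^(l+1)/da^(l) is just the weight matrix w^(l+1)
    return row_times(row_times(delta_next, w_next), jac_a_z)


def numerical_grads(weights, biases, x, y, eps=1e-6):
    # Central finite differences on every weight, to check backprop
    grads = []
    for w in weights:
        g = []
        for row in w:
            g_row = []
            for j, original in enumerate(row):
                row[j] = original + eps
                j_plus = cost(weights, biases, x, y)
                row[j] = original - eps
                j_minus = cost(weights, biases, x, y)
                row[j] = original  # restore the weight
                g_row.append((j_plus - j_minus) / (2 * eps))
            g.append(g_row)
        grads.append(g)
    return grads
```

To use it, build nested lists for `weights` and `biases`, call `backprop(weights, biases, x, y)`, and compare the result with `numerical_grads(weights, biases, x, y)`; the two should agree up to small finite-difference error. Keeping the recursion element-wise avoids ever building the diagonal matrix $\operatorname{diag}(\sigma'(\boldsymbol{z}^{(l)}))$, while `delta_via_jacobians` gives the same numbers by multiplying the full matrices.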