[DL] Implementing a 3-Layered NN (Tutorial)

조현호_ChoHyeonHo · January 1, 2025

Before we start...

The reason we use a non-linear activation function is...

The problem with a linear function is that no matter how deep you stack the layers, a network with no hidden layers can do exactly the same job. Here is a simple (and somewhat intuitive) example. Imagine a 3-layer network that uses the linear function h(x) = cx as its activation function. Written as an equation, this is y(x) = h(h(h(x))). The computation multiplies by c three times, y(x) = c * c * c * x, but this is exactly the same as y(x) = ax, where a = c^3. In other words, it can be expressed by a network with no hidden layers. As this example shows, stacking multiple layers gives no benefit when the activation function is linear. So if we want to gain anything from stacking layers, the activation function must be non-linear.
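A minimal NumPy sketch of this collapse (the constant c = 2.0 and the input values are arbitrary, chosen only for illustration): composing the linear "activation" three times gives exactly the same result as a single multiplication by c^3.

import numpy as np

c = 2.0                      # illustrative constant for the linear "activation"
x = np.array([1.0, 0.5])     # arbitrary input values

def h(x):
    return c * x             # linear function h(x) = c*x

y_three_layers = h(h(h(x)))  # y(x) = h(h(h(x)))
y_one_layer = (c ** 3) * x   # y(x) = a*x with a = c^3

print(y_three_layers)        # [8. 4.]
print(y_one_layer)           # [8. 4.]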

ReLU: Rectified Linear Unit

Figure: ReLU graph

rectify: to correct something wrong (i.e., a "corrected" linear unit).

Equation: ReLU

h(x) = \begin{cases} x & (x > 0) \\ 0 & (x \leq 0) \end{cases}

Code: ReLU in python

import numpy as np

def relu(x):
    return np.maximum(0, x)

np.maximum compares its two arguments element-wise and returns the larger value of each pair (here, 0 or each element of x), so relu() works on whole arrays, unlike the built-in max().
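Continuing with the relu() defined above, a quick check with an illustrative input array shows the element-wise behaviour:

print(relu(np.array([-1.0, 0.0, 2.0])))  # [0. 0. 2.]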

Matrix Multiplication in a Neural Network

Figure: Matrix Multiplication

Code

>>> X = np.array([1,2])
>>> X.shape
(2,)
>>> W = np.array([[1, 3, 5], [2, 4, 6]])
>>> print(W)
[[1 3 5]
 [2 4 6]]
>>> W.shape
(2, 3)
>>> Y = np.dot(X, W)
>>> print(Y)
[ 5 11 17]
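The only requirement is that the inner dimensions of np.dot match: X has shape (2,) and W has shape (2, 3), so the result has shape (3,). A small sketch of the rule (the mismatched matrix below is made up purely for illustration):

import numpy as np

X = np.array([1, 2])                     # shape (2,)
W_ok = np.array([[1, 3, 5], [2, 4, 6]])  # shape (2, 3): inner dimensions 2 and 2 match
W_bad = np.array([[1, 3, 5]])            # shape (1, 3): 1 does not match 2

print(np.dot(X, W_ok))                   # [ 5 11 17]
try:
    np.dot(X, W_bad)
except ValueError as err:
    print("shape mismatch:", err)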

Building a 3-Layered NN

Figure: 3 layered NN

Simple Expression Rule

Figure: Specified NN

Let's calculate the network

1. Defining Matrices

How do we get a_1^{(1)}? It is the weighted sum of the input values plus a bias:

a_1^{(1)} = w_{11}^{(1)} x_1 + w_{12}^{(1)} x_2 + b_1^{(1)}

This can be simplified using matrix multiplication

A^{(1)} = XW^{(1)} + B^{(1)}

Each term in this equation is defined as below.

A^{(1)} = (a_1^{(1)}\ a_2^{(1)}\ a_3^{(1)}),\ X = (x_1\ x_2),\ B^{(1)} = (b_1^{(1)}\ b_2^{(1)}\ b_3^{(1)})
W^{(1)} = \begin{pmatrix} w_{11}^{(1)} & w_{21}^{(1)} & w_{31}^{(1)} \\ w_{12}^{(1)} & w_{22}^{(1)} & w_{32}^{(1)} \end{pmatrix} = \text{all weights of the first layer}

These can be defined in Python as below.

X = np.array([1.0, 0.5])
W1 = np.array([[0.1, 0.3, 0.5], [0.2, 0.4, 0.6]])  # row 0: weights applied to X[0]; row 1: weights applied to X[1]
B1 = np.array([0.1, 0.2, 0.3])
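A quick shape check (using the arrays just defined) confirms they line up with the equation above:

print(X.shape, W1.shape, B1.shape)  # (2,) (2, 3) (3,)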

2. First Layer

A^{(1)} = XW^{(1)} + B^{(1)}

In Python, this equation is computed as follows.

A1 = np.dot(X, W1)  + B1
print(A1)

Result:

[0.3 0.7 1.1]

Now we have calculated a_1^{(1)}. This value is passed to the activation function h(x), and the result is denoted z_1^{(1)}. Since we use the sigmoid function as the activation function here, we pass A1 to sigmoid().

Z1 = sigmoid(A1)

This results in the floating-point values [0.57444252 0.66818777 0.75026011].
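For reference, sigmoid() here is the standard logistic function, the same one defined in the full code at the end:

h(x) = \frac{1}{1 + e^{-x}}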

3. Second Layer

Same procedure as the first layer.

W2 = np.array([[0.1, 0.4], [0.2, 0.5], [0.3, 0.6]])
B2 = np.array([0.1, 0.2])
A2 = np.dot(Z1, W2)  + B2
Z2 = sigmoid(A2)
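The shapes flow the same way as before: Z1 has shape (3,), W2 has shape (3, 2), so A2 and Z2 both have shape (2,). A quick check with the arrays above:

print(Z1.shape, W2.shape, A2.shape, Z2.shape)  # (3,) (3, 2) (2,) (2,)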

4. Output Layer

Only the activation function changes (depending on the purpose); the rest is the same.

def identity_function(x):
    return x

We use identity_function() here.

W3 = np.array([[0.1, 0.3], [0.2, 0.4]])
B3 = np.array([0.1, 0.2])

A3 = np.dot(Z2, W3) + B3
Y = identity_function(A3)

5. Entire code

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def identity_function(x):
    return x

def init_network():
    network = {}
    network['W1'] = np.array([[0.1, 0.3, 0.5], [0.2, 0.4, 0.6]])  # row 0: weights applied to x[0]; row 1: weights applied to x[1]
    network['b1'] = np.array([0.1, 0.2, 0.3])
    network['W2'] = np.array([[0.1, 0.4], [0.2, 0.5], [0.3, 0.6]])
    network['b2'] = np.array([0.1, 0.2])
    network['W3'] = np.array([[0.1, 0.3], [0.2, 0.4]])
    network['b3'] = np.array([0.1, 0.2])
    return network

def forward(network, x):
    W1, W2, W3 = network['W1'], network['W2'], network['W3']
    b1, b2, b3 = network['b1'], network['b2'], network['b3']
    
    a1 = np.dot(x, W1) + b1
    z1 = sigmoid(a1)
    a2 = np.dot(z1, W2) + b2
    z2 = sigmoid(a2)
    a3 = np.dot(z2, W3) + b3
    y = identity_function(a3)
    
    return y

network = init_network()
x = np.array([1.0, 0.5])
y = forward(network, x)
print(y)
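Running this script prints the network's output for the input x = [1.0, 0.5]; with the weights above it should be approximately [0.31682708 0.69627909].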