[DL] XOR Problem

융 · June 26, 2023

[Machine Learning & Deep Learning]


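Perceptron

The perceptron is the simplest form of artificial neural network (ANN), an algorithm developed to solve binary classification problems.
Put simply, a perceptron computes a linear combination of its inputs and weights, then applies an activation function to the result to produce a binary output.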
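
As a minimal sketch of that computation, the function below takes a weighted sum of the inputs and passes it through a step activation. The name `perceptron` and the AND-gate weights are illustrative assumptions, not values from this post.

```python
def perceptron(inputs, weights, bias):
    """Return 1 if the weighted sum of the inputs plus the bias is positive, else 0."""
    # Linear combination of inputs and weights
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    # Step activation turns the sum into a binary output
    return 1 if z > 0 else 0


# Hypothetical weights that make the perceptron behave like an AND gate
and_weights, and_bias = [1.0, 1.0], -1.5
points = [(0, 0), (1, 0), (0, 1), (1, 1)]
and_outputs = [perceptron(p, and_weights, and_bias) for p in points]
print(and_outputs)  # [0, 0, 0, 1]
```

Only the input (1, 1) pushes the weighted sum above zero, so a single straight decision boundary is enough for AND.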

Image by Deep neural networks,or Perceptron vs dogs and cats

XOR Problem

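The perceptron was originally proposed as a single-layer structure, which cannot handle problems that are not linearly separable. The XOR problem is the best-known example of a nonlinear classification problem that a single-layer perceptron cannot solve.

XOR stands for exclusive OR: a logical operation that outputs 1 when its two inputs differ and 0 when they are the same.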
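
To make the limitation concrete, the sketch below builds the XOR truth table and then brute-forces a small grid of weights for a single step-activated unit. The grid range and the helper names (`step_unit`, `count_solutions`) are assumptions chosen for illustration; the search is a demonstration, not a proof.

```python
from itertools import product

points = [(0, 0), (1, 0), (0, 1), (1, 1)]
xor_targets = [a ^ b for a, b in points]  # [0, 1, 1, 0]
and_targets = [a & b for a, b in points]  # [0, 0, 0, 1]


def step_unit(x, w1, w2, b):
    """Single perceptron with a step activation."""
    return 1 if w1 * x[0] + w2 * x[1] + b > 0 else 0


def count_solutions(targets, grid):
    """Count (w1, w2, b) triples on the grid that classify all four points correctly."""
    return sum(
        all(step_unit(x, w1, w2, b) == t for x, t in zip(points, targets))
        for w1, w2, b in product(grid, repeat=3)
    )


grid = [i * 0.5 for i in range(-8, 9)]  # -4.0 ... 4.0 in steps of 0.5
xor_solutions = count_solutions(xor_targets, grid)
and_solutions = count_solutions(and_targets, grid)
print(xor_solutions, and_solutions > 0)  # 0 True
```

The search finds weights for AND but none for XOR. That matches the algebra: XOR would need b <= 0, w1 + b > 0, w2 + b > 0 and w1 + w2 + b <= 0 at the same time, which is contradictory.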

Image by Neural Network Multilayer Perceptron

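Multi-Layer Perceptron (MLP)

The multi-layer perceptron (MLP) was introduced to make nonlinear classification problems solvable.
An MLP consists of an input layer, one or more hidden layers, and an output layer. Unlike a single perceptron, a hidden layer is made up of multiple nodes, and its nonlinear activation is what adds nonlinearity to the model.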
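
One way to see why a hidden layer helps is to wire an MLP by hand. The sketch below uses two hidden nodes that behave like OR and NAND, and an output node that ANDs them together. The weights are hand-picked for illustration and are an assumption, not the values a trained network would necessarily find.

```python
def step(z):
    """Step activation: 1 for positive input, 0 otherwise."""
    return 1 if z > 0 else 0


def xor_mlp(x1, x2):
    # Hidden layer: two nodes that look at the same two inputs
    h1 = step(1.0 * x1 + 1.0 * x2 - 0.5)   # behaves like OR
    h2 = step(-1.0 * x1 - 1.0 * x2 + 1.5)  # behaves like NAND
    # Output layer: AND of the two hidden nodes
    return step(1.0 * h1 + 1.0 * h2 - 1.5)


xor_outputs = [xor_mlp(a, b) for a, b in [(0, 0), (1, 0), (0, 1), (1, 1)]]
print(xor_outputs)  # [0, 1, 1, 0]
```

Each hidden node draws one straight line; the output node combines the two half-planes into the band that contains only (1, 0) and (0, 1).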

Image by Introduction to how an Multilayer Perceptron works but without complicated math

Solving the XOR Problem

Generating the XOR data

import numpy as np
import tensorflow as tf

# Inputs: all four combinations of two binary values
X = np.array([
    [0,0],
    [1,0],
    [0,1],
    [1,1]
])

y = np.array([
    [0],
    [1],
    [1],
    [0]
])

Building the model

The first Dense layer takes the input and acts as the hidden layer, and the second layer acts as the output layer.

model = tf.keras.Sequential([
    tf.keras.layers.Dense(2, activation='sigmoid', input_shape=(2,)),  # 2 input features, 2 hidden units
    tf.keras.layers.Dense(1, activation='sigmoid')  # input_shape omitted: inferred from the previous layer's output shape
])
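
As a quick sanity check on the size of this model, the sketch below counts trainable parameters by hand. The helper `dense_param_count` is a hypothetical name; it assumes a fully connected layer holds one weight per input-unit pair plus one bias per unit, which agrees with the kernel and bias shapes printed at the end of this post.

```python
def dense_param_count(n_inputs, n_units):
    """Weights (n_inputs x n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units


hidden_params = dense_param_count(2, 2)  # kernel (2, 2) + bias (2,)
output_params = dense_param_count(2, 1)  # kernel (2, 1) + bias (1,)
total_params = hidden_params + output_params
print(hidden_params, output_params, total_params)  # 6 3 9
```

Nine numbers are all the network has to learn in order to represent XOR.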

Setting the optimizer and loss function

model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.1), loss='mse')
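
For intuition about what these two settings mean, here is a plain-Python sketch of mean squared error and of a single SGD update without momentum. The helpers `mse` and `sgd_step`, and the sample numbers, are assumptions for illustration; Keras computes gradients and applies the updates internally.

```python
def mse(y_true, y_pred):
    """Mean of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def sgd_step(weight, gradient, learning_rate=0.1):
    """Move the weight a small step against its gradient."""
    return weight - learning_rate * gradient


y_true = [0, 1, 1, 0]
y_pred = [0.5, 0.5, 0.5, 0.5]  # hypothetical outputs of an untrained model
loss = mse(y_true, y_pred)
print(loss)  # 0.25

updated_weight = sgd_step(1.0, 0.2)  # hypothetical weight and gradient
print(updated_weight)
```

A model that always answers 0.5 has a loss of 0.25 on XOR, so training should drive the loss well below that value.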

Training the model

hist = model.fit(X, y, epochs=5000,  # epochs: number of passes over the training data
                 batch_size=64)  # batch_size: number of samples per gradient update
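
Because the dataset has only four samples, a batch size of 64 means every epoch is a single full-batch update. The arithmetic can be sketched as follows, assuming the usual rule that the last, smaller batch is still processed:

```python
import math

n_samples = 4
batch_size = 64
epochs = 5000

# Batches needed to cover the dataset once
steps_per_epoch = math.ceil(n_samples / batch_size)
total_updates = steps_per_epoch * epochs
print(steps_per_epoch, total_updates)  # 1 5000
```

Any batch size of 4 or more would give the same training schedule here.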

model.predict(X)

array([[0.0930924 ],
       [0.9271625 ],
       [0.9132939 ],
       [0.08152722]], dtype=float32)
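
The sigmoid outputs are probabilities rather than hard labels. A minimal sketch of turning them into classes, assuming a 0.5 decision threshold and copying the predictions printed above:

```python
predictions = [0.0930924, 0.9271625, 0.9132939, 0.08152722]

# Threshold at 0.5 to turn probabilities into class labels
labels = [1 if p >= 0.5 else 0 for p in predictions]
print(labels)  # [0, 1, 1, 0]
```

The thresholded labels match the XOR targets for all four inputs.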

Layer weights

-----------------------------------------------------
<tf.Variable 'dense_2/kernel:0' shape=(2, 2) dtype=float32, numpy=
array([[ 5.015614 ,  4.6440105],
       [-4.8828483, -4.8464456]], dtype=float32)>
-----------------------------------------------------
<tf.Variable 'dense_2/bias:0' shape=(2,) dtype=float32, numpy=array([ 2.504902 , -2.5688493], dtype=float32)>
-----------------------------------------------------
<tf.Variable 'dense_3/kernel:0' shape=(2, 1) dtype=float32, numpy=
array([[-6.4844756],
       [ 6.8972507]], dtype=float32)>
-----------------------------------------------------
<tf.Variable 'dense_3/bias:0' shape=(1,) dtype=float32, numpy=array([2.9653125], dtype=float32)>
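
To check that these numbers really encode XOR, the forward pass can be replayed by hand. The sketch below copies the printed kernels and biases and assumes the standard dense-layer rule, output = sigmoid(x · kernel + bias), with kernel rows indexed by input and columns by unit. The results may not match the earlier `model.predict` output digit for digit, since the printed weights may come from a different training run.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Values copied from the printed variables
W1 = [[5.015614, 4.6440105],     # weights from input 0 to hidden units 0 and 1
      [-4.8828483, -4.8464456]]  # weights from input 1 to hidden units 0 and 1
b1 = [2.504902, -2.5688493]
W2 = [-6.4844756, 6.8972507]     # weights from hidden units 0 and 1 to the output
b2 = 2.9653125


def forward(x):
    """Hidden layer followed by output layer, both with sigmoid activations."""
    hidden = [sigmoid(x[0] * W1[0][j] + x[1] * W1[1][j] + b1[j]) for j in range(2)]
    return sigmoid(hidden[0] * W2[0] + hidden[1] * W2[1] + b2)


outputs = [forward(x) for x in [(0, 0), (1, 0), (0, 1), (1, 1)]]
rounded = [round(o) for o in outputs]
print(rounded)  # [0, 1, 1, 0]
```

Rounding the four outputs recovers the XOR truth table, so the nine learned parameters do separate the two classes.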