[Deep Learning] MNIST Categorical Classification

김희진 · March 31, 2021

DeepLearning

MNIST is a dataset of handwritten digits. Each image is grayscale, 28 x 28 pixels in size, and belongs to one of 10 classes.

🎠 Loading the Data

from keras.datasets import mnist

(X_train, y_train), (X_test, y_test) = mnist.load_data()

import matplotlib.pyplot as plt

digit = X_train[1]
plt.imshow(digit, cmap = 'gray')
plt.show()

The code above displays one sample from the dataset as an image.

🎠 Data Preprocessing

X_train = X_train.reshape((60000, 28 * 28))
X_test = X_test.reshape((10000, 28 * 28))

Each 28 x 28 image is flattened into a vector of 784 values so that it can be fed into a Dense layer.
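As a small illustration of this reshaping step, the sketch below flattens a stand-in array with NumPy. The array `fake_images` is a hypothetical placeholder for `X_train` (5 images instead of 60000). Passing -1 lets NumPy infer the number of samples, which avoids hard-coding 60000 and 10000.

```python
import numpy as np

# Hypothetical stand-in for X_train: 5 grayscale images of 28 x 28 pixels.
fake_images = np.zeros((5, 28, 28), dtype=np.uint8)

# Flatten each image into one row of 784 values.
# -1 tells NumPy to infer the first dimension (the number of samples).
flat_images = fake_images.reshape((-1, 28 * 28))

print(flat_images.shape)  # (5, 784)
```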

X_train = X_train.astype(float) / 255
X_test = X_test.astype(float) / 255

The pixel values range from 0 to 255, so we normalize them to the range 0 to 1 by dividing by 255.
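The effect of this scaling can be checked on a few values. This is a minimal sketch: `raw_pixels` is a hypothetical stand-in for real image data, and `float32` is used here as an assumption (the code above uses Python's `float`, which NumPy treats as 64-bit).

```python
import numpy as np

# Hypothetical stand-in for a few raw pixel values (uint8, 0 to 255).
raw_pixels = np.array([0, 51, 255], dtype=np.uint8)

# Convert to float first, then divide so the result keeps its fractional part.
scaled_pixels = raw_pixels.astype("float32") / 255

print(scaled_pixels)  # every value now lies between 0.0 and 1.0
```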

from keras.utils import to_categorical

y_train = to_categorical(y_train)
y_test = to_categorical(y_test)

The labels in y are integers from 0 to 9, so we convert them to one-hot vectors with One-Hot Encoding.
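To see what the encoded labels look like, the same transformation can be sketched in plain NumPy. This is only an illustration of the idea, not how `to_categorical` is implemented internally, and `labels` is a hypothetical stand-in for `y_train`.

```python
import numpy as np

# Hypothetical stand-in for y_train: three integer labels.
labels = np.array([0, 3, 9])

# Row i of the 10 x 10 identity matrix is the one-hot vector for class i,
# so indexing the identity matrix by the labels encodes all of them at once.
one_hot = np.eye(10)[labels]

print(one_hot[1])  # [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
```

Each row has a single 1 at the position of its class, which matches the 10 outputs of the softmax layer used later.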

🎠 Modeling

from keras import models
from keras import layers

mnist = models.Sequential()
mnist.add(layers.Dense(512, activation = 'relu', input_shape = (28 * 28,)))
mnist.add(layers.Dense(256, activation = 'relu'))
mnist.add(layers.Dense(10, activation = 'softmax'))

mnist.compile(loss = 'categorical_crossentropy', optimizer = 'rmsprop', metrics = ['accuracy'])

Because this is a multi-class classification problem, the loss is set to categorical_crossentropy.
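For a single sample with one-hot target y and predicted probabilities p, this loss is the negative sum of y_i * log(p_i), so only the probability assigned to the true class matters. The sketch below works through that calculation in NumPy. The function names, the `eps` clipping value, and the 3-class example are assumptions chosen for illustration; Keras computes the loss over batches with its own implementation.

```python
import numpy as np

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    shifted = logits - np.max(logits)
    exps = np.exp(shifted)
    return exps / np.sum(exps)

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # Clip to avoid log(0); eps is an assumed small constant.
    y_pred = np.clip(y_pred, eps, 1.0)
    # y_true is one-hot, so only the true class's log-probability contributes.
    return -np.sum(y_true * np.log(y_pred))

# Hypothetical 3-class example (the MNIST model above has 10 classes).
logits = np.array([2.0, 1.0, 0.1])
y_true = np.array([1.0, 0.0, 0.0])

probs = softmax(logits)
loss = categorical_crossentropy(y_true, probs)
print(probs, loss)
```

The more probability the model puts on the correct class, the closer the loss gets to 0.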

hist = mnist.fit(X_train, y_train,
                 epochs = 100,
                 batch_size = 128,
                 validation_split = 0.2)

No separate validation data was provided, so validation_split = 0.2 holds out 20% of the training data for validation.
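According to the Keras documentation, the validation samples are taken from the end of the arrays passed to fit, before any shuffling. With 60000 training images and a split of 0.2, that leaves 48000 images for training and 12000 for validation. The sketch below imitates that behavior with NumPy slicing. It is a simplified illustration on a hypothetical 10-sample array, not the actual Keras code.

```python
import numpy as np

# Hypothetical stand-in for the training set: 10 samples with 1 feature each.
X = np.arange(10).reshape((10, 1))
validation_split = 0.2

# Number of samples held out for validation (a simplified calculation).
n_val = int(len(X) * validation_split)

# The first part is used for training, the last part for validation.
X_fit, X_val = X[:-n_val], X[-n_val:]

print(len(X_fit), len(X_val))  # 8 2
```

Because the split is taken from the end, data that is sorted by label should be shuffled before calling fit, otherwise the validation set may not contain every class.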

🎠 Results

Plotting loss and val_loss for this model shows that the training loss keeps decreasing as the number of epochs grows, while the validation loss instead goes up. This tells us that the model has overfit the training data.
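One simple way to act on this observation is to find the epoch at which the validation loss was lowest. The helper below is a minimal sketch: `best_epoch` is a hypothetical function name, and the loss values are made-up numbers for illustration only, not results from the model above. For the real run, the list would come from `hist.history['val_loss']`.

```python
def best_epoch(val_losses):
    """Return the 1-based epoch number with the lowest validation loss."""
    lowest = min(val_losses)
    return val_losses.index(lowest) + 1

# Made-up validation losses for illustration only (not measured results).
example_val_loss = [0.30, 0.12, 0.09, 0.10, 0.14, 0.19]

print(best_epoch(example_val_loss))  # 3
```

Training well beyond that epoch is where the gap between the two curves opens up. Stopping near it, for example with the Keras EarlyStopping callback monitoring val_loss, is a common remedy.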