This post is based on the FastCampus '[skill-up] Deep Learning from Scratch' course, with additional notes from my own further study.

Binary Classification
- The sigmoid's output lies between 0 and 1, so it can be interpreted as a probability P(y|x).
- If we define the neural network as outputting the probability of the True (positive) class, classification can be recast as a probability-estimation problem (see the sketch below).
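A minimal sketch of this idea, with hypothetical tensors and layer sizes that are not part of the course code: the sigmoid output is read as P(y=1|x) and trained with Binary Cross Entropy (BCELoss).

```python
import torch
import torch.nn as nn

# Tiny binary classifier: a linear layer followed by a sigmoid,
# so the output lies in (0, 1) and can be read as P(y=1 | x).
clf = nn.Sequential(
    nn.Linear(4, 1),
    nn.Sigmoid(),
)

x = torch.randn(8, 4)                     # 8 hypothetical samples, 4 features each
y = torch.randint(0, 2, (8, 1)).float()   # binary labels (0. or 1.)

p = clf(x)                                # predicted probabilities P(y=1 | x)
loss = nn.BCELoss()(p, y)                 # Binary Cross Entropy between p and y
print(p.min().item(), p.max().item(), loss.item())
```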
One-hot encoding
- Represents each category as a vector in which exactly one element is 1 and the rest are 0.
- ex)
  - Dog [1, 0, 0]
  - Cat [0, 1, 0]
  - Bird [0, 0, 1]
- As the number of categories grows, the vectors become high-dimensional and sparse, which wastes memory.
- Embeddings are the common alternative, especially in deep learning (see the comparison table and the sketch below).
| Aspect | One-hot Encoding | Embedding |
|---|---|---|
| Representation | Sparse vector of 0s and 1s | Continuous real-valued vector |
| Dimension | Equal to the number of classes (usually very large) | Low dimension set as a hyperparameter (e.g., 5) |
| Information | No inherent meaning (no order or similarity) | Similarity and meaning are reflected in the vector space |
| Memory efficiency | Very inefficient (sparse) | Efficient (dense) |
| Example | [1, 0, 0], [0, 1, 0], [0, 0, 1] | [0.12, 0.3, ..., -0.4], [0.51, ..., 0.06], [0.02, ..., -0.11] |
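A short sketch to illustrate the contrast in the table; the class count, label mapping, and embedding dimension here are illustrative assumptions, not values from the course.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

num_classes = 3      # Dog, Cat, Bird
embedding_dim = 5    # low-dimensional, chosen as a hyperparameter

labels = torch.tensor([0, 1, 2])  # Dog=0, Cat=1, Bird=2

# One-hot: sparse vectors whose dimension equals the number of classes.
one_hot = F.one_hot(labels, num_classes=num_classes).float()
print(one_hot)        # [[1,0,0], [0,1,0], [0,0,1]]

# Embedding: dense, trainable real-valued vectors of a fixed low dimension.
emb = nn.Embedding(num_classes, embedding_dim)
print(emb(labels))    # shape (3, 5); values are learned during training
```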
Cross Entropy Loss
- The loss function used for multi-class classification; a generalized form of Binary Cross Entropy.
- Used together with Softmax, which makes it well suited to classification problems.
- Trained so that the probability assigned to the true (correct) class becomes higher.
- Equivalent to taking the negative log-likelihood (NLL) of the log-softmax outputs for each sample, which is why the model below ends with LogSoftmax and is trained with NLLLoss (see the sketch below).
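A quick sanity check of this relationship, using random logits purely for illustration: nn.CrossEntropyLoss on raw scores gives the same value as nn.LogSoftmax followed by nn.NLLLoss, which is exactly the combination used in the MNIST model below.

```python
import torch
import torch.nn as nn

logits = torch.randn(4, 10)           # 4 samples, 10 classes (raw scores)
targets = torch.tensor([3, 7, 0, 9])  # correct class indices

# Option 1: CrossEntropyLoss applied directly to the raw logits.
ce = nn.CrossEntropyLoss()(logits, targets)

# Option 2: log-softmax first, then negative log-likelihood of the true class.
log_probs = nn.LogSoftmax(dim=-1)(logits)
nll = nn.NLLLoss()(log_probs, targets)

print(ce.item(), nll.item())  # the two values match
```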

| Task | Output | Activation | Loss Function |
|---|---|---|---|
| Regression | Real Value | Linear (None) | MSE Loss |
| Binary Classification | 0 or 1 | Sigmoid | Binary Cross Entropy |
| Multi-class Classification | Class | Softmax | Cross Entropy Loss |
# Classification with Deep Neural Networks
## Load MNIST Dataset
```python
import numpy as np
import matplotlib.pyplot as plt

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

import pandas as pd
from sklearn.metrics import confusion_matrix

from torchvision import datasets, transforms
```
```python
train = datasets.MNIST(
    '../data', train=True, download=True,
    transform=transforms.Compose([
        transforms.ToTensor(),
    ]),
)
test = datasets.MNIST(
    '../data', train=False,
    transform=transforms.Compose([
        transforms.ToTensor(),
    ]),
)
```
```python
def plot(x):
    img = (np.array(x.detach().cpu(), dtype='float')).reshape(28, 28)

    plt.imshow(img, cmap='gray')
    plt.show()

plot(train.data[0])
```
```python
x = train.data.float() / 255.
y = train.targets

print(x.shape, y.shape)

x = x.view(x.size(0), -1)
print(x.shape, y.shape)

input_size = x.size(-1)
output_size = int(max(y)) + 1

print('input_size: %d, output_size: %d' % (input_size, output_size))
```
```python
# Train / Valid ratio
ratios = [.8, .2]

train_cnt = int(x.size(0) * ratios[0])
valid_cnt = int(x.size(0) * ratios[1])
test_cnt = len(test.data)
cnts = [train_cnt, valid_cnt]

print("Train %d / Valid %d / Test %d samples." % (train_cnt, valid_cnt, test_cnt))

indices = torch.randperm(x.size(0))
x = torch.index_select(x, dim=0, index=indices)
y = torch.index_select(y, dim=0, index=indices)

x = list(x.split(cnts, dim=0))
y = list(y.split(cnts, dim=0))
x += [(test.data.float() / 255.).view(test_cnt, -1)]
y += [test.targets]

for x_i, y_i in zip(x, y):
    print(x_i.size(), y_i.size())
```
## Build Model & Optimizer
```python
model = nn.Sequential(
    nn.Linear(input_size, 500),
    nn.LeakyReLU(),
    nn.Linear(500, 400),
    nn.LeakyReLU(),
    nn.Linear(400, 300),
    nn.LeakyReLU(),
    nn.Linear(300, 200),
    nn.LeakyReLU(),
    nn.Linear(200, 100),
    nn.LeakyReLU(),
    nn.Linear(100, 50),
    nn.LeakyReLU(),
    nn.Linear(50, output_size),
    # LogSoftmax at the output pairs with NLLLoss below;
    # together they are equivalent to CrossEntropyLoss on raw logits.
    nn.LogSoftmax(dim=-1),
)

model

crit = nn.NLLLoss()
optimizer = optim.Adam(model.parameters())
```
## Move to GPU if it is available
```python
device = torch.device('cpu')
if torch.cuda.is_available():
    device = torch.device('cuda')

model = model.to(device)
x = [x_i.to(device) for x_i in x]
y = [y_i.to(device) for y_i in y]
```
## Train
```python
n_epochs = 1000
batch_size = 256
print_interval = 10

from copy import deepcopy

lowest_loss = np.inf
best_model = None

early_stop = 50
lowest_epoch = np.inf

train_history, valid_history = [], []

for i in range(n_epochs):
    # Shuffle the training set before splitting it into mini-batches.
    indices = torch.randperm(x[0].size(0))
    x_ = torch.index_select(x[0], dim=0, index=indices)
    y_ = torch.index_select(y[0], dim=0, index=indices)

    x_ = x_.split(batch_size, dim=0)
    y_ = y_.split(batch_size, dim=0)

    train_loss, valid_loss = 0, 0
    y_hat = []

    for x_i, y_i in zip(x_, y_):
        y_hat_i = model(x_i)
        loss = crit(y_hat_i, y_i.squeeze())

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        train_loss += float(loss)  # This is very important to prevent memory leak.

    train_loss = train_loss / len(x_)

    # Evaluate on the validation set without tracking gradients.
    with torch.no_grad():
        x_ = x[1].split(batch_size, dim=0)
        y_ = y[1].split(batch_size, dim=0)
        valid_loss = 0

        for x_i, y_i in zip(x_, y_):
            y_hat_i = model(x_i)
            loss = crit(y_hat_i, y_i.squeeze())

            valid_loss += float(loss)

            y_hat += [y_hat_i]

    valid_loss = valid_loss / len(x_)

    train_history += [train_loss]
    valid_history += [valid_loss]

    if (i + 1) % print_interval == 0:
        print('Epoch %d: train loss=%.4e  valid_loss=%.4e  lowest_loss=%.4e' % (
            i + 1,
            train_loss,
            valid_loss,
            lowest_loss,
        ))

    # Keep a snapshot of the best model so far; stop early if there is
    # no improvement for `early_stop` consecutive epochs.
    if valid_loss <= lowest_loss:
        lowest_loss = valid_loss
        lowest_epoch = i

        best_model = deepcopy(model.state_dict())
    else:
        if early_stop > 0 and lowest_epoch + early_stop < i + 1:
            print("There is no improvement during last %d epochs." % early_stop)
            break

print("The best validation loss from epoch %d: %.4e" % (lowest_epoch + 1, lowest_loss))

model.load_state_dict(best_model)
```
## Loss History
```python
plot_from = 0

plt.figure(figsize=(20, 10))
plt.grid(True)
plt.title("Train / Valid Loss History")
plt.plot(
    range(plot_from, len(train_history)), train_history[plot_from:],
    range(plot_from, len(valid_history)), valid_history[plot_from:],
)
plt.yscale('log')
plt.show()
```
## Let's see the result!
```python
test_loss = 0
y_hat = []

with torch.no_grad():
    x_ = x[-1].split(batch_size, dim=0)
    y_ = y[-1].split(batch_size, dim=0)

    for x_i, y_i in zip(x_, y_):
        y_hat_i = model(x_i)
        loss = crit(y_hat_i, y_i.squeeze())

        test_loss += loss  # Gradient is already detached.

        y_hat += [y_hat_i]

test_loss = test_loss / len(x_)
y_hat = torch.cat(y_hat, dim=0)

print("Test loss: %.4e" % test_loss)

correct_cnt = (y[-1].squeeze() == torch.argmax(y_hat, dim=-1)).sum()
total_cnt = float(y[-1].size(0))

print('Accuracy: %.4f' % (correct_cnt / total_cnt))

# Move tensors back to CPU before handing them to scikit-learn.
pd.DataFrame(confusion_matrix(y[-1].cpu(), torch.argmax(y_hat, dim=-1).cpu()),
             index=['true_%d' % i for i in range(10)],
             columns=['pred_%d' % i for i in range(10)])
```
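As a small, optional follow-up sketch (it assumes the tensors and the plot() helper defined above are still in scope; which misclassified sample gets shown is arbitrary), one can inspect a single test digit the model got wrong:

```python
# Indices of misclassified test samples; inspect the first one, if any.
pred = torch.argmax(y_hat, dim=-1)
wrong = (pred != y[-1]).nonzero(as_tuple=True)[0]

if len(wrong) > 0:
    idx = int(wrong[0])
    print('true: %d, predicted: %d' % (int(y[-1][idx]), int(pred[idx])))
    plot(x[-1][idx])  # x[-1] rows are flattened 28x28 images; plot() reshapes them
```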
