๐Ÿน ๊ฒฝ์‚ฌ ํ•˜๊ฐ•๋ฒ•

๋ฏผ๋‹ฌํŒฝ์ด์šฐ์œ ยท2024๋…„ 9์›” 18์ผ

๐Ÿน ๋”ฅ๋Ÿฌ๋‹ ๊ธฐ์ดˆ

๋ชฉ๋ก ๋ณด๊ธฐ
4/4
post-thumbnail

💡 1. Types of Gradient Descent

1-1. Batch Gradient Descent

  • The most basic form of gradient descent (vanilla gradient descent)
  • Computes the loss function over the entire dataset
  • Performs only a single parameter update per epoch
  • Because every update considers the whole dataset, training demands a lot of time and memory

1-2. Stochastic Gradient Descent

  • Stochastic Gradient Descent (SGD) was proposed to address batch gradient descent's heavy time and memory requirements during training
  • Updates the parameters with a batch size of 1, so it trains much faster and with far less memory than batch gradient descent
  • The size of each parameter update is unstable, so accuracy can end up lower

1-3. Mini-Batch Gradient Descent

  • Mini-batch gradient descent updates the parameters using a batch size you choose
  • Faster than batch gradient descent and more stable than stochastic gradient descent
  • The most widely used gradient descent variant in deep learning
  • By convention the batch size is a power of two, such as 4, 8, 16, 32, 64, or 128
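In PyTorch, all three variants come down to the `batch_size` you pass to a `DataLoader`. A small sketch with stand-in random tensors (not the wine data):

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Stand-in data: 100 samples with 13 features, 3 classes.
x = torch.randn(100, 13)
y = torch.randint(0, 3, (100,))
dataset = TensorDataset(x, y)

# batch_size selects the variant:
#   len(dataset) -> batch gradient descent       (1 update per epoch)
#   1            -> stochastic gradient descent  (100 updates per epoch)
#   32           -> mini-batch gradient descent  (4 updates per epoch)
loader = DataLoader(dataset, batch_size=32, shuffle=True)

print(len(loader))  # parameter updates per epoch: ceil(100 / 32) = 4
```

Iterating over `loader` inside the training loop and calling `optimizer.step()` per batch gives mini-batch training; the rest of the loop is unchanged.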

💡 2. Gradient Descent Algorithms

2-1. SGD (Stochastic Gradient Descent)

  • When adjusting the parameters, computes the gradient from a single randomly selected sample rather than the entire dataset

2-2. Momentum

  • An algorithm introduced to compensate for the shortcomings of plain gradient descent
  • Applies the physical notion of inertia (momentum)
  • Adds a fixed fraction of the previous step's gradient to the current gradient
  • Uses a moving average of past gradients when updating
  • The resulting acceleration reaches the minimum faster than plain gradient descent

2-3. Adagrad

  • Born from the idea that applying the same learning rate (lr) to every parameter is inefficient
  • Learns with large steps at first, then gradually smaller ones
  • Applies a tailored learning rate to each parameter
  • Advantageous on sparse data
  • The learning rate keeps shrinking over time, which can eventually stall training

2-4. Adam

  • Momentum + Adagrad
  • Applies an adaptive learning rate to each parameter, using past gradient information to adjust the current step size
  • AdamW: a variant of Adam that handles L2 regularization (weight decay) separately, giving better generalization; this fixes the problem that L2 regularization mixed into the adaptive learning-rate update can destabilize training
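In PyTorch, AdamW is a drop-in replacement for Adam. A minimal sketch (the `weight_decay` value is an arbitrary illustration, not a recommendation):

```python
import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(13, 3)

# Adam folds weight_decay into the gradient, so the decay gets rescaled
# by the adaptive step; AdamW decays the weights directly (decoupled).
adam = optim.Adam(model.parameters(), lr=0.01, weight_decay=0.01)
adamw = optim.AdamW(model.parameters(), lr=0.01, weight_decay=0.01)

# One update step with AdamW looks the same as with any optimizer:
x = torch.randn(8, 13)
loss = model(x).pow(2).mean()
adamw.zero_grad()
loss.backward()
adamw.step()
```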

💡 3. Predicting Wine Cultivars

  • sklearn.datasets.load_wine: a dataset of chemical analyses of wines made from three different cultivars grown in the same region of Italy
  • Build a model that tells the cultivars apart from the 13 measured components
  • Shuffle the data, then use an 80% train / 20% test split
  • Use Adam
    • optimizer = optim.Adam(model.parameters(), lr=0.01)
  • Find which cultivar index 0 of the test data is, and print the accuracy
import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split

x_data, y_data = load_wine(return_X_y=True, as_frame=True)

x_data = torch.FloatTensor(x_data.values)  # 13 chemical features per wine
y_data = torch.LongTensor(y_data.values)   # cultivar labels 0, 1, 2

print(x_data.shape)
print(y_data.shape)
> torch.Size([178, 13])
> torch.Size([178])
x_train, x_test, y_train, y_test = train_test_split(x_data, y_data, test_size=0.2, random_state=2024)

print(x_train.shape, y_train.shape)
print(x_test.shape, y_test.shape)
> torch.Size([142, 13]) torch.Size([142])
> torch.Size([36, 13]) torch.Size([36])
model = nn.Sequential(
    nn.Linear(13, 3)  # 13 features in, 3 cultivar logits out
)

optimizer = optim.Adam(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()  # takes raw logits and integer class labels

epochs = 1000

for epoch in range(epochs + 1):
  y_pred = model(x_train)
  loss = criterion(y_pred, y_train)
  optimizer.zero_grad()
  loss.backward()
  optimizer.step()

  if epoch % 100 == 0:
    y_prob = nn.Softmax(dim=1)(y_pred)
    y_pred_index = torch.argmax(y_prob, dim=1)
    accuracy = (y_pred_index == y_train).float().sum() / len(y_train) * 100
    print(f'Epoch {epoch:4d}/{epochs} Loss:{loss: .6f} Accuracy: {accuracy: .2f}%')
> Epoch    0/1000 Loss: 100.665215 Accuracy:  25.35%
> Epoch  100/1000 Loss: 0.302390 Accuracy:  89.44%
> Epoch  200/1000 Loss: 0.197123 Accuracy:  92.25%
> Epoch  300/1000 Loss: 0.158662 Accuracy:  94.37%
> Epoch  400/1000 Loss: 0.137645 Accuracy:  95.77%
> Epoch  500/1000 Loss: 0.123086 Accuracy:  97.18%
> Epoch  600/1000 Loss: 0.111958 Accuracy:  98.59%
> Epoch  700/1000 Loss: 0.102981 Accuracy:  98.59%
> Epoch  800/1000 Loss: 0.095471 Accuracy:  98.59%
> Epoch  900/1000 Loss: 0.089016 Accuracy:  98.59%
> Epoch 1000/1000 Loss: 0.083348 Accuracy:  98.59%
y_pred = model(x_test)
y_pred[:5]
> tensor([[-28.7244, -30.6228, -22.7633],
        [-51.6261, -58.9351, -60.1704],
        [-17.1980, -12.4382, -12.0121],
        [-54.0891, -59.6118, -59.6391],
        [-30.0164, -31.9313, -35.4247]], grad_fn=<SliceBackward0>)
y_prob = nn.Softmax(1)(y_pred)
y_prob[:5]
> tensor([[2.5695e-03, 3.8493e-04, 9.9705e-01],
        [9.9914e-01, 6.6892e-04, 1.9448e-04],
        [3.3733e-03, 3.9373e-01, 6.0290e-01],
        [9.9218e-01, 3.9641e-03, 3.8571e-03],
        [8.6818e-01, 1.2793e-01, 3.8885e-03]], grad_fn=<SliceBackward0>)
print(f'Probability of cultivar 0: {y_prob[0][0]:.2f}')
print(f'Probability of cultivar 1: {y_prob[0][1]:.2f}')
print(f'Probability of cultivar 2: {y_prob[0][2]:.2f}')
> Probability of cultivar 0: 0.00
> Probability of cultivar 1: 0.00
> Probability of cultivar 2: 1.00
y_pred_index = torch.argmax(y_prob, dim=1)
accuracy = (y_test == y_pred_index).float().sum() / len(y_test) * 100
print(f'Test accuracy: {accuracy: .2f}%')
> Test accuracy:  94.44%