๐Ÿน ํŒŒ์ดํ† ์น˜๋กœ ๊ตฌํ˜„ํ•œ ์„ ํ˜•ํšŒ๊ท€

๋ฏผ๋‹ฌํŒฝ์ด์šฐ์œ  · Aug 6, 2024

๐Ÿน ๋”ฅ๋Ÿฌ๋‹ ๊ธฐ์ดˆ


💡 1. Simple Linear Regression

ํ•œ ๊ฐœ์˜ ์ž…๋ ฅ์ด ๋“ค์–ด๊ฐ€์„œ ํ•œ ๊ฐœ์˜ ์ถœ๋ ฅ์ด ๋‚˜์˜ค๋Š” ๊ตฌ์กฐ

import torch
import torch.nn as nn
import torch.optim as optim
import matplotlib.pyplot as plt
torch.manual_seed(2024)  # fix the random seed for reproducibility
> <torch._C.Generator at 0x7f09a27705f0>
x_train = torch.FloatTensor([[1], [2], [3]])
y_train = torch.FloatTensor([[2], [4], [6]])
print(x_train, x_train.shape)
print(y_train, y_train.shape)
> tensor([[1.],
        [2.],
        [3.]]) torch.Size([3, 1])
> tensor([[2.],
        [4.],
        [6.]]) torch.Size([3, 1])
plt.figure(figsize=(6, 4))
plt.scatter(x_train, y_train)  # call plt.show() if running outside a notebook

# y = Wx + b
model = nn.Linear(1, 1) # one input feature in, one output out
model
> Linear(in_features=1, out_features=1, bias=True)
y_pred = model(x_train) # the model is not trained yet
y_pred
> tensor([[0.7260],
        [0.7894],
        [0.8528]], grad_fn=<AddmmBackward0>)
list(model.parameters()) # W: 0.0634, b: 0.6625
# y = Wx + b
# x=1, 0.0634*1 + 0.6625 = 0.7259
# x=2, 0.0634*2 + 0.6625 = 0.7893
> [Parameter containing:
 tensor([[0.0634]], requires_grad=True),
 Parameter containing:
 tensor([0.6625], requires_grad=True)]
((y_pred - y_train)**2).mean() # compute the error (MSE) by hand
> tensor(12.8082, grad_fn=<MeanBackward0>)
loss = nn.MSELoss()(y_pred, y_train) # compute the error with the built-in loss function
loss
> tensor(12.8082, grad_fn=<MseLossBackward0>)
mse = nn.MSELoss() # create a reusable loss object
mse(y_pred, y_train)
> tensor(12.8082, grad_fn=<MseLossBackward0>)

💡 2. Optimization

  • The process of finding the minimum of the model's loss function
  • Feeding training data through the model's parameters yields predictions -> the loss function compares those predictions against the actual answers, and optimization is the process of finding the parameters that minimize the difference between the predicted and actual values

2-1. Gradient Descent

  • One of the optimization methods used when training deep learning algorithms
  • The process of finding the parameters (the W and b above) with an optimization algorithm is called 'training' -- a minimal hand-rolled version is sketched below

2-2. Learning Rate

  • ํ•œ ๋ฒˆ์˜ W๋ฅผ ์›€์ง์ด๋Š” ๊ฑฐ๋ฆฌ(increment step)
  • 0~1 ์‚ฌ์ด์˜ ์‹ค์ˆ˜
  • ํ•™์Šต๋ฅ ์ด ๋„ˆ๋ฌด ํฌ๋ฉด ํ•œ ์ง€์ ์œผ๋กœ ์ˆ˜๋ ดํ•˜๋Š” ๊ฒƒ์ด ์•„๋‹ˆ๋ผ ๋ฐœ์‚ฐํ•  ๊ฐ€๋Šฅ์„ฑ์ด ์กด์žฌ
  • ํ•™์Šต๋ฅ ์ด ๋„ˆ๋ฌด ์ž‘์œผ๋ฉด ์ˆ˜๋ ด์ด ๋Šฆ์–ด์ง€๊ณ , ์‹œ์ž‘์ ์„ ์–ด๋””๋กœ ์žก๋А๋ƒ์— ๋”ฐ๋ผ ์ˆ˜๋ ด ์ง€์ ์ด ๋‹ฌ๋ผ์ง

2-3. Limitations of Gradient Descent

  • ๋งŽ์€ ์—ฐ์‚ฐ๋Ÿ‰๊ณผ ์ปดํ“จํ„ฐ ์ž์›์„ ์†Œ๋ชจ
  • ๋ฐ์ดํ„ฐ(์ž…๋ ฅ๊ฐ’) ํ•˜๋‚˜๊ฐ€ ๋ชจ๋ธ์„ ์ง€๋‚  ๋•Œ๋งˆ๋‹ค ๋ชจ๋“  ๊ฐ€์ค‘์น˜๋ฅผ ํ•œ ๋ฒˆ์”ฉ ์—…๋ฐ์ดํŠธ ํ•จ
  • ๊ฐ€์ค‘์น˜๊ฐ€ ์ ์€ ๋ชจ๋ธ์˜ ๊ฒฝ์šฐ ๋ฌธ์ œ๊ฐ€ ์—†์œผ๋‚˜, ๋ชจ๋ธ์˜ ๊ฐ€์ค‘์น˜๊ฐ€ ๋งค์šฐ ๋งŽ๋‹ค๋ฉด ๋ชจ๋“  ๊ฐ€์ค‘์น˜์— ๋Œ€ํ•ด ์—ฐ์‚ฐ์„ ์ ์šฉํ•˜๊ธฐ ๋•Œ๋ฌธ์— ๋งŽ์€ ์—ฐ์‚ฐ๋Ÿ‰์„ ์š”๊ตฌ
  • Global Minimum์€ ๋ชฉํ‘œ ํ•จ์ˆ˜ ๊ทธ๋ž˜ํ”„ ์ „์ฒด๋ฅผ ๊ณ ๋ คํ–ˆ์„ ๋•Œ ์ตœ์†Ÿ๊ฐ’์„ ์˜๋ฏธํ•˜๊ณ , Local Minimum์€ ๊ทธ๋ž˜ํ”„ ๋‚ด ์ผ๋ถ€๋งŒ ๊ณ ๋ คํ–ˆ์„ ๋•Œ ์ตœ์†Ÿ๊ฐ’์„ ์˜๋ฏธ -> ๊ฒฝ์‚ฌ ํ•˜๊ฐ•๋ฒ•์œผ๋กœ ์ตœ์ ์˜ ๊ฐ’์ธ ์ค„ ์•Œ์•˜๋˜ ๊ฐ’์ด Local Minimum์œผ๋กœ ๊ฒฐ๊ณผ๊ฐ€ ๋‚˜์˜ฌ ์ˆ˜ ์žˆ์Œ
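
To see the Local Minimum problem concretely, here is a toy example (a function we made up for this point, not from the original post) where gradient descent settles into a local minimum:

# f(x) = x^4 - 3x^2 + x has a local minimum near x=1.13 and a global one near x=-1.30
x = torch.tensor([2.0], requires_grad=True)  # start on the right-hand slope
opt = optim.SGD([x], lr=0.01)
for _ in range(500):
    f = x**4 - 3*x**2 + x
    opt.zero_grad()
    f.backward()
    opt.step()
print(x.item())  # settles near 1.13 (the local minimum), missing the global one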
# SGD (Stochastic Gradient Descent)
# randomly draws one data sample at a time and computes the loss on it
# feeds the sample in, draws the next one, and repeats
# decides the descent direction quickly
optimizer = optim.SGD(model.parameters(), lr=0.01) # optimizer object
loss = nn.MSELoss()(y_pred, y_train)
# reset the optimizer's gradients
# by default, each call to loss.backward() adds to (accumulates) the stored gradients
# for the training loop to learn as intended, the gradients must be reset to 0 once a step is complete
optimizer.zero_grad()
# ์—ญ์ „ํŒŒ: ๋น„์šฉ ํ•จ์ˆ˜๋ฅผ ๋ฏธ๋ถ„ํ•˜์—ฌ gradient ๊ณ„์‚ฐ
loss.backward()
# ๊ฐ€์ค‘์น˜ ์—…๋ฐ์ดํŠธ: ๊ณ„์‚ฐ๋œ gradient๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ํŒŒ๋ผ๋ฏธํ„ฐ๋ฅผ ์—…๋ฐ์ดํŠธ
optimizer.step()
# list(model.parameters()) # before the step: W: 0.0634, b: 0.6625
list(model.parameters()) # after one step: W: 0.2177, b: 0.7267
> [Parameter containing:
 tensor([[0.2177]], requires_grad=True),
 Parameter containing:
 tensor([0.7267], requires_grad=True)]
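
For intuition, optimizer.step() with plain SGD (no momentum) amounts to the hand-written update below -- a sketch of the idea, not the library's actual source:

# What optimizer.step() effectively does for vanilla SGD (a sketch; don't run it in addition to step())
with torch.no_grad():
    for p in model.parameters():
        p -= 0.01 * p.grad  # p := p - lr * gradient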
# repeated training keeps correcting the off-target W and b, steadily shrinking the error
# epochs: the number of training iterations (epochs)
epochs = 1000

for epoch in range(epochs + 1):
    y_pred = model(x_train)
    loss = nn.MSELoss()(y_pred, y_train)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if epoch % 100 == 0:
        print(f'Epoch: {epoch}/{epochs} Loss: {loss:.6f}')
> Epoch: 0/1000 Loss: 10.171454
> Epoch: 100/1000 Loss: 0.142044
> Epoch: 200/1000 Loss: 0.087774
> Epoch: 300/1000 Loss: 0.054239
> Epoch: 400/1000 Loss: 0.033517
> Epoch: 500/1000 Loss: 0.020711
> Epoch: 600/1000 Loss: 0.012798
> Epoch: 700/1000 Loss: 0.007909
> Epoch: 800/1000 Loss: 0.004887
> Epoch: 900/1000 Loss: 0.003020
> Epoch: 1000/1000 Loss: 0.001866
print(list(model.parameters())) # W: 1.9499, b: 0.1138
> [Parameter containing:
tensor([[1.9499]], requires_grad=True), Parameter containing:
tensor([0.1138], requires_grad=True)]
x_test = torch.FloatTensor([[5]])
y_pred = model(x_test)
y_pred
> tensor([[9.8635]], grad_fn=<AddmmBackward0>)
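
One aside (our addition, but standard PyTorch practice): for pure prediction it is common to disable gradient tracking so no autograd graph is built:

# Inference without tracking gradients
with torch.no_grad():
    y_pred = model(x_test)
print(y_pred.item())  # a plain Python float, ~9.86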

💡 3. Multiple Linear Regression

  • A structure where several inputs go in and one output comes out
X_train = torch.FloatTensor([[73, 80, 75],
                             [93, 88, 93],
                             [89, 91, 90],
                             [96, 98, 100],
                             [73, 66, 70]])
y_train = torch.FloatTensor([[220], [270], [265], [290], [200]])
print(X_train, X_train.shape)
print(y_train, y_train.shape)
> tensor([[ 73.,  80.,  75.],
        [ 93.,  88.,  93.],
        [ 89.,  91.,  90.],
        [ 96.,  98., 100.],
        [ 73.,  66.,  70.]]) torch.Size([5, 3])
> tensor([[220.],
        [270.],
        [265.],
        [290.],
        [200.]]) torch.Size([5, 1])
# y = w1x1 + w2x2 + w3x3 + b
model = nn.Linear(3, 1) # 3 inputs, 1 output
model
> Linear(in_features=3, out_features=1, bias=True)
optimizer = optim.SGD(model.parameters(), lr=0.00001)
epochs = 20000

for epoch in range(epochs + 1):
    y_pred = model(X_train)
    loss = nn.MSELoss()(y_pred, y_train)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if epoch % 100 == 0:
        print(f'Epoch: {epoch}/{epochs} Loss: {loss:.6f}')
> Epoch: 0/20000 Loss: 75967.109375
> Epoch: 100/20000 Loss: 15.825180
> Epoch: 200/20000 Loss: 15.499573
...
list(model.parameters()) # W: 0.6814, 0.8616, 1.3889, b: -0.2950
> [Parameter containing:
 tensor([[0.6814, 0.8616, 1.3889]], requires_grad=True),
 Parameter containing:
 tensor([-0.2950], requires_grad=True)]
x_test = torch.FloatTensor([[100, 100, 100]])
y_pred = model(x_test)
y_pred
> tensor([[292.8991]], grad_fn=<AddmmBackward0>)
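
As a sanity check (our own verification), the prediction matches computing y = w1x1 + w2x2 + w3x3 + b directly from the learned parameters:

# Recompute the prediction by hand from the learned W and b
W, b = model.parameters()
manual = (W * x_test).sum() + b  # 0.6814*100 + 0.8616*100 + 1.3889*100 - 0.2950
print(manual.item())  # ~292.90, matching model(x_test)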