🤖 Machine Learning Day 1 - SGD and the Coefficient of Determination

JItzel · December 10, 2025

๐Ÿก Machine_learning

๋ชฉ๋ก ๋ณด๊ธฐ
4/14

SGD (Gradient Descent) and Model Evaluation (the Coefficient of Determination, $R^2$)

1. What is SGD (Stochastic Gradient Descent)?

Gradient Descent is an algorithm that computes the gradient of a loss function and keeps moving the parameters in the downhill direction to find a (local) minimum, i.e. the optimal parameters.
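
As a rough illustration of "moving downhill along the gradient" (not from the original post; the data, learning rate, and iteration count below are made up for the demo), plain gradient descent for a 1-D linear model y = w·x + b fits in a few lines of NumPy:

```python
import numpy as np

# synthetic data: y ≈ 3x + 2 with a little noise (invented numbers)
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 100)
y = 3.0 * x + 2.0 + rng.normal(0, 1, 100)

w, b, lr = 0.0, 0.0, 0.01  # start at zero, small learning rate
for _ in range(2000):
    err = (w * x + b) - y          # prediction error
    grad_w = 2 * np.mean(err * x)  # dMSE/dw
    grad_b = 2 * np.mean(err)      # dMSE/db
    w -= lr * grad_w               # step against the gradient
    b -= lr * grad_b

print(w, b)  # should end up near w ≈ 3, b ≈ 2
```

Each step subtracts the gradient of the mean squared error, which is exactly the "keep moving downhill" idea described above.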

  • LinearRegression: solves for the parameters in one shot via the normal equations (OLS) (good when the data is small)
  • SGDRegressor: approaches the solution step by step along the gradient (good when the data is very large, or when incremental learning is needed)
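
A quick way to see that the two estimators land in roughly the same place (a sketch on synthetic data; all names and numbers here are invented for the demo):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, SGDRegressor

# synthetic data: y ≈ 3x + 5 with noise
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(200, 1))
y = 3.0 * X.ravel() + 5.0 + rng.normal(0, 1, 200)

ols = LinearRegression().fit(X, y)             # closed-form (normal equations)
sgd = SGDRegressor(max_iter=10000, tol=1e-6,
                   random_state=42).fit(X, y)  # iterative gradient steps

print(ols.coef_, ols.intercept_)  # near [3.], 5.
print(sgd.coef_, sgd.intercept_)  # close to OLS, but not identical
```

OLS gives the exact least-squares solution; SGD only approximates it, which is the trade-off for being able to handle data that does not fit in memory.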

Example: predicting car braking distances with SGD

  • Train the model with SGDRegressor instead of the usual LinearRegression

1) Preparing the data

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib
from sklearn.linear_model import SGDRegressor # import the SGD model

# matplotlib font settings
matplotlib.rcParams['font.family'] = 'Malgun Gothic'
matplotlib.rcParams['axes.unicode_minus'] = False

# load the data
carDF = pd.read_csv('data/cars.csv', index_col='Unnamed: 0')

# independent variable (X) and dependent variable (y)
x = carDF.iloc[:, :-1]
y = carDF.iloc[:, [-1]]
# Tip: SGDRegressor prefers y as a 1-D array (a Series, or .values.ravel()),
# but a single-column DataFrame also works. (it may trigger a warning)

2) Training the model (using the verbose option)

Since SGD trains iteratively, it is important to watch how the training progresses.

  • verbose=1: prints the training progress (log).
  • n_iter_no_change=100: stop early (early stopping) if there is no improvement for 100 consecutive iterations.

model = SGDRegressor(verbose=1, n_iter_no_change=100)
model.fit(x, y.values.ravel())  # flatten y to 1-D to avoid a shape warning

# --- training log (example) ---
# -- Epoch 1
# Norm: 3.82, NNZs: 1, Bias: 0.389145, T: 50, Avg. loss: 622.827902
# Total training time: 0.00 seconds.
# ...
# -- Epoch 170
# Norm: 3.85, NNZs: 1, Bias: -12.900648, T: 8500, Avg. loss: 156.947071
# Convergence after 170 epochs took 0.00 seconds

→ You can see that Avg. loss (the average error) steadily decreases.
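
Incidentally, the "incremental learning" advantage mentioned earlier comes from `partial_fit`, which updates the model one mini-batch at a time. A sketch with made-up synthetic batches, streamed as if the data did not fit in memory:

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(0)
model = SGDRegressor(random_state=0)

# feed the model in mini-batches instead of all at once
for _ in range(200):
    X_batch = rng.uniform(0, 10, size=(50, 1))
    y_batch = 2.0 * X_batch.ravel() + 1.0 + rng.normal(0, 0.5, 50)
    model.partial_fit(X_batch, y_batch)  # one gradient pass over this batch

print(model.coef_, model.intercept_)  # drifts toward w ≈ 2, b ≈ 1
```

Each `partial_fit` call continues training from the current parameters, so the full dataset never has to be in memory at once.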

3) Checking the results and visualizing

# learned slope and intercept
print("slope (w):", model.coef_)
print("intercept (b):", model.intercept_)
# slope (w): [3.84672995]
# intercept (b): [-12.90064768]

# visualization
pred = model.predict(x)

plt.scatter(x, y, label='actual')
plt.plot(x, pred, 'r--', label='SGD prediction')
plt.legend()
plt.show()


2. The Coefficient of Determination ($R^2$ Score, R-squared)

  • ๋ชจ๋ธ์„ ๋งŒ๋“ค์—ˆ๋‹ค๋ฉด "์ด ๋ชจ๋ธ์ด ์–ผ๋งˆ๋‚˜ ์ •ํ™•ํ•œ๊ฐ€?" ๋ฅผ ํ‰๊ฐ€ํ•ด์•ผ ํ•œ๋‹ค. ํšŒ๊ท€ ๋ถ„์„์—์„œ ๊ฐ€์žฅ ๋Œ€ํ‘œ์ ์ธ ํ‰๊ฐ€์ง€ํ‘œ๊ฐ€ ๋ฐ”๋กœ ๊ฒฐ์ •๊ณ„์ˆ˜(R2R^2)์ด๋‹ค.

Concept
Expresses how well the regression model explains the actual data, usually as a number between 0 and 1.

  • Closer to 1: the model explains the data almost perfectly (good)
  • Closer to 0: the model explains little more than the plain mean would (bad)
  • As a rough, domain-dependent rule of thumb, 0.5 or higher is often considered meaningful, and 0.7 to 0.8 or higher fairly strong.

Understanding the formula (SSE, SST, SSR)

  • To understand the coefficient of determination, you need to know how the total variation decomposes.

$SST = SSR + SSE$

1. SST (Total Sum of Squares):

  • How spread out are the actual values ($y$) around their mean ($\bar{y}$)? (the variation in the data itself)
  • $\sum (y - \bar{y})^2$

2. SSE (Sum of Squared Errors):

  • The gap between the actual values ($y$) and the predictions ($\hat{y}$), i.e. the error the model fails to capture.
  • $\sum (y - \hat{y})^2$
  • The smaller the residuals, the closer this gets to 0.

3. SSR (Sum of Squares due to Regression):

  • How far are the predictions ($\hat{y}$) from the mean ($\bar{y}$)?
  • $\sum (\hat{y} - \bar{y})^2$
  • Represents how much better than the plain mean the model explains the data.

Final formula

$R^2 = 1 - \frac{SSE}{SST} = \frac{SSR}{SST}$
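
This identity, and the decomposition $SST = SSR + SSE$, can be checked numerically on an ordinary least-squares fit (the decomposition holds exactly for least squares with an intercept; the toy numbers below are invented):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

X = np.array([[1], [2], [3], [4], [5]], dtype=float)
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

pred = LinearRegression().fit(X, y).predict(X)

sst = np.sum((y - y.mean()) ** 2)     # total variation
sse = np.sum((y - pred) ** 2)         # unexplained error
ssr = np.sum((pred - y.mean()) ** 2)  # explained by the regression

print(np.isclose(sst, ssr + sse))  # True: SST = SSR + SSE
print(1 - sse / sst, ssr / sst)    # both equal R^2
print(r2_score(y, pred))           # same value from sklearn
```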

Example: predicting house prices

  • ๋ชจ๋ธ์ด ์—‰๋ง์ผ ๋•Œ: ์ง‘๊ฐ’๊ณผ ์ƒ๊ด€์—†๋Š” ํ‰์ˆ˜๋กœ ์˜ˆ์ธก์„ ํ•œ๋‹ค๋ฉด, ๊ทธ๋ƒฅ ํ‰๊ท ๊ฐ’์œผ๋กœ ์ฐ๋Š” ๊ฒƒ์ด๋‚˜ ๋‹ค๋ฆ„์—†๋‹ค. ์ด๋•Œ ์˜ค์ฐจ(SSE)๋Š” ์ „์ฒด ๋ถ„์‚ฐ(SST)๊ณผ ๊ฐ™์•„์ ธ์„œ 1โˆ’1=01 - 1 = 0์ด ๋œ๋‹ค.
  • ๋ชจ๋ธ์ด ์™„๋ฒฝํ•  ๋•Œ: ์‹ค์ œ๊ฐ’๊ณผ ์˜ˆ์ธก๊ฐ’์ด ๋˜‘๊ฐ™๋‹ค๋ฉด ์˜ค์ฐจ(SSE)๊ฐ€ 0์ด ๋œ๋‹ค. ์ด๋•Œ 1โˆ’0=11 - 0 = 1์ด ๋œ๋‹ค.

3. Computing $R^2$ in Scikit-learn

1) Using the r2_score function

  • Use the sklearn.metrics module. Pass the arguments in the order (actual values, predicted values).
from sklearn.metrics import r2_score

# r2_score(y_true, y_pred)
# Note: passing y as a 1-D array is safest; a 2-D y may trigger a warning.
score = r2_score(y, pred)
print(f"R^2 score: {score}")

2) Using the model.score() method (recommended)

  • ๋ชจ๋ธ ๊ฐ์ฒด ์ž์ฒด์— ๋‚ด์žฅ๋œ ํ•จ์ˆ˜. (์ž…๋ ฅ๊ฐ’ X, ์‹ค์ œ๊ฐ’ y)๋ฅผ ๋„ฃ์œผ๋ฉด ์•Œ์•„์„œ ์˜ˆ์ธก ํ›„ ์ ์ˆ˜๋ฅผ ๊ณ„์‚ฐ.
# model.score(X, y)
model_score = model.score(x, y)
print(f"model score: {model_score}")

# example output
# 0.6334328883976291

→ Interpretation: the model explains about 63.3% of the variation in braking distance in terms of car speed.


Summary

  • SGD (stochastic gradient descent): finds the optimum by repeatedly nudging the parameters in the direction that reduces the error. (watch the process with verbose=1)
  • Coefficient of determination ($R^2$): measures a regression model's explanatory power. (closer to 1 is better)
  • Formula: $R^2 = 1 - \frac{\text{unexplained error (SSE)}}{\text{total variation (SST)}}$