[AI] Linear Regression

mandu · May 4, 2025


This post was written after taking the FastCampus course '[skill-up] Deep Learning Kindergarten from Scratch', with additional material I studied on my own added on top.

1. Motivation

  • Our goal is to approximate a function that returns an output from given data
  • Many kinds of real-world data have roughly linear relationships
    • e.g., height vs. weight, age vs. salary, weight vs. price, model year vs. price

2. Problem Definition

  • Given an input x, learn a function that can predict the corresponding output y
  • For a dataset where a linear relationship can be assumed, build a prediction model

3. Linear Regression Model

  • ๋ชจ๋ธ ์‹: y = xW + b

    • x: ์ž…๋ ฅ ๋ฒกํ„ฐ (feature)
    • W: ๊ฐ€์ค‘์น˜ ํ–‰๋ ฌ
    • b: ํŽธํ–ฅ ๋ฒกํ„ฐ
    • y: ์˜ˆ์ธก ๊ฒฐ๊ณผ
  • ๋ชฉ์ : ์‹ค์ œ y์™€ ์˜ˆ์ธก๊ฐ’ ลท ์‚ฌ์ด์˜ ์˜ค์ฐจ๋ฅผ ์ตœ์†Œํ™”ํ•˜๋Š” W, b๋ฅผ ์ฐพ๋Š” ๊ฒƒ


4. ํŒŒ๋ผ๋ฏธํ„ฐ ์ตœ์ ํ™” ๋ฐฉ๋ฒ• (Gradient Descent ๊ธฐ๋ฐ˜)

  • ํŒŒ๋ผ๋ฏธํ„ฐ: ฮธ = {W, b}
  • ฮธ๋ฅผ ํ†ตํ•ด ์ •์˜๋œ ์˜ˆ์ธก ํ•จ์ˆ˜: fฮธ(xi)f_ฮธ(x_i)
  • Mean Squared Error (MSE)๋ฅผ ํ™œ์šฉํ•œ ์†์‹ค ํ•จ์ˆ˜(Loss Function): 1Nโˆ—ฮฃ(yiโˆ’y^i)2{1 \over N} * ฮฃ (y_i - ลท_i)^2
  • Loss function์„ ฮธ์— ๋Œ€ํ•ด ๋ฏธ๋ถ„ํ•˜์—ฌ ์–ป์€ gradient descent๋กœ ฮธ๋ฅผ ๋ฐ˜๋ณต์ ์œผ๋กœ ์—…๋ฐ์ดํŠธ
  • ฮธ = {W, b}์ด๊ณ , |W| = (n, m), |b| = (m,)
  • Loss Function์„ ฮธ์— ๋Œ€ํ•ด ๋ฏธ๋ถ„ํ•œ๋‹ค๋Š” ๊ฒƒ์€, ๊ฐ๊ฐ์˜ Wk,iW_{k,i} element์™€ bjb_j element์— ๋Œ€ํ•ด ํŽธ๋ฏธ๋ถ„ํ•˜์—ฌ gradient descent๋ฅผ ๊ตฌํ•˜๋Š” ๊ฒƒ
  • ์ „์ฒด ํŒŒ๋ผ๋ฏธํ„ฐ ฮธ๋Š” W์™€ b์˜ ๋ชจ๋“  ์›์†Œ๋ฅผ ํฌํ•จํ•œ ํ•˜๋‚˜์˜ ๋ฒกํ„ฐ๋กœ ์ƒ๊ฐ
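
Below is a minimal sketch of the update rule θ ← θ − η∇_θ L(θ) on a toy one-feature dataset; the numbers and learning rate are made up, and PyTorch autograd stands in for the hand-derived partial derivatives.

import torch

# Toy data that roughly follows y = 2x + 1 (made-up values).
x = torch.tensor([[1.], [2.], [3.], [4.]])
y = torch.tensor([[3.], [5.], [7.], [9.]])

W = torch.zeros(1, 1, requires_grad=True)   # weight
b = torch.zeros(1, requires_grad=True)      # bias
eta = 0.01                                  # learning rate

for _ in range(1000):
    y_hat = x.matmul(W) + b                 # forward pass: y = xW + b
    loss = ((y - y_hat) ** 2).mean()        # MSE loss L(theta)

    loss.backward()                         # partial derivatives dL/dW, dL/db
    with torch.no_grad():
        W -= eta * W.grad                   # theta <- theta - eta * gradient
        b -= eta * b.grad
        W.grad.zero_()                      # clear gradients so they do not accumulate
        b.grad.zero_()

print(W.item(), b.item())                   # approaches 2 and 1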

5. PyTorch Practice Code

ฮธโ†ฮธโˆ’ฮทโ–ฝฮธL(ฮธ)ฮธ โ† ฮธ - ฮทโ–ฝ_ฮธL(ฮธ) x 1,000 iterations
y=f(x)y = f(x)
xโˆˆR506X5x โˆˆ R^{506 X 5}
yโˆˆR506X1y โˆˆ R^{506 X 1}

D=(xi,yi)i=1N=506D = {(x_i, y_i)}^{N=506}_{i=1}
y^=fฮธ(x)=xโˆ—W+bลท = f_ฮธ(x) = x*W + b
ฮธ={W,b}ฮธ = \{W, b\}
WโˆˆR5X1W โˆˆ R^{5X1}
bโˆˆR1X1b โˆˆ R^{1X1}
L(ฮธ)=1Nโˆ—ฮฃ(yiโˆ’y^i)2L(ฮธ) = {1 \over N} * ฮฃ (y_i - ลท_i)^2

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

from sklearn.datasets import fetch_openml

# Load the Boston housing dataset from OpenML as a DataFrame.
boston = fetch_openml(name='boston', version=1, as_frame=True)
df = pd.DataFrame(boston.data, columns=boston.feature_names)
df["TARGET"] = boston.target

# df.tail()
# sns.pairplot(df)
# plt.show()

cols = ["TARGET", "INDUS", "RM", "LSTAT", "NOX", "DIS"]

# df[cols].describe()
# sns.pairplot(df[cols])
# plt.show()


data = torch.from_numpy(df[cols].values).float()

print(data.shape)

x = data[:, 1:]  # input features: INDUS, RM, LSTAT, NOX, DIS
y = data[:, :1]  # target: TARGET column

print(x.shape, y.shape)

n_epochs = 2000 # number of training iterations
learning_rate = 1e-3
print_interval = 100

# x.shape: (batch_size, input features) 
# y.shape: (batch_size, output targets) 
model = nn.Linear(x.size(-1), y.size(-1)) # args: (input features, output features)

optimizer = optim.SGD(model.parameters(), # Stochastic Gradient Descent
                      lr=learning_rate)

for i in range(n_epochs):
    y_hat = model(x)
    loss = F.mse_loss(y_hat, y)
    
    optimizer.zero_grad() # reset gradients; otherwise they accumulate across iterations
    loss.backward() # compute gradients of the loss w.r.t. the parameters
    optimizer.step() # let the optimizer update the parameters using the computed gradients
    
    if (i + 1) % print_interval == 0:
        print('Epoch %d: loss=%.4e' % (i + 1, float(loss)))


# Put targets and predictions side by side for comparison.
df = pd.DataFrame(torch.cat([y, y_hat], dim=1).detach().numpy(),
                  columns=["y", "y_hat"])

sns.pairplot(df, height=3)
plt.show()
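
To connect the trained model back to θ = {W, b}, the learned parameters can be read directly from the nn.Linear layer (a small follow-up sketch, assuming the training cell above has already been run). Note that nn.Linear stores its weight as (output features, input features), i.e. transposed relative to the W in y = xW + b.

print(model.weight)   # learned weights, shape (1, 5)
print(model.bias)     # learned bias, shape (1,)

# Equivalently, iterate over all parameters theta = {W, b}.
for p in model.parameters():
    print(p.shape)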
