๐ŸŽฒ[AI] Logistic Regression

mandu · May 4, 2025


This post is based on the FastCampus course '[skill-up] 처음부터 시작하는 딥러닝 유치원',
supplemented with material from further study.


1. Logistic Regression

  • Focuses on binary classification problems
    How? → Linear Regression + Sigmoid Function
  • Logistic Regression: a model that passes the linear regression output through a sigmoid function
    to convert it into a probability, then performs classification based on that probability
  • The sigmoid output can be treated as the probability value P(y|x)
  • Logistic == logistic function == sigmoid function
  • Despite the name "Regression," it actually solves binary classification problems
y_hat = σ(xW + b)
True if y_hat >= 0.5 else False
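A minimal sketch of this hypothesis in PyTorch (W and b are random stand-ins here, just to illustrate the decision rule):

import torch

x = torch.randn(4, 3)  # 4 samples with 3 features (toy data)
W = torch.randn(3, 1)  # hypothetical weight matrix
b = torch.zeros(1)     # hypothetical bias

y_hat = torch.sigmoid(x @ W + b)  # probabilities in (0, 1)
pred = y_hat >= 0.5               # True / False at the 0.5 threshold
print(y_hat.squeeze(), pred.squeeze())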

Regression vs Classification

Item          | Regression         | Classification
------------- | ------------------ | -----------------------------
Output        | Real-valued vector | Categorical value
Loss function | MSE Loss           | BCE / Cross Entropy
Final layer   | Linear             | Sigmoid / Softmax
Example       | Salary prediction  | Infection (yes/no) prediction

2. Sigmoid & Hyperbolic Tangent Function

  • $\sigma(x) = \frac{1}{1 + e^{-x}}$ → squashes the output to a value between 0 and 1
    $\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$ → squashes the output to a value between -1 and 1
  • Logistic regression uses Sigmoid as its activation function
  • The resulting value can be interpreted as the probability P(y|x)

3. Loss function: Binary Cross Entropy (BCE)

  • A loss function that drives the predicted probability ŷ toward 1 when the true label is 1,
    and toward 0 when the true label is 0

  • $\mathrm{BCE}(y_{1:N}, \hat{y}_{1:N}) = -\frac{1}{N} \sum_{i=1}^{N} \left( y_i^T \log(\hat{y}_i) + (1 - y_i)^T \log(1 - \hat{y}_i) \right)$
    where $x_i \in \mathbb{R}^n$, $y_i \in \{0, 1\}^m$

  • The first term of the formula is active only for True (y=1) samples, and the second term only for False (y=0) samples
    → because of the leading minus sign, the whole expression becomes a quantity to minimize

  • Since the sigmoid output is a probability between 0 and 1, the BCE loss is used (MSE can also be made to work, but it optimizes poorly)

  • BCELoss is closely tied to probability/statistics and information theory (minimizing it is equivalent to maximizing the Bernoulli log-likelihood)
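As a sanity check (my addition, with made-up labels and probabilities), the BCE formula above can be computed by hand and compared against PyTorch's nn.BCELoss:

import torch
import torch.nn as nn

y = torch.tensor([[1.], [0.], [1.], [0.]])      # true labels
y_hat = torch.tensor([[.9], [.2], [.7], [.4]])  # predicted probabilities

# Manual BCE: -1/N * sum(y * log(y_hat) + (1 - y) * log(1 - y_hat))
manual = -(y * y_hat.log() + (1 - y) * (1 - y_hat).log()).mean()
print(manual, nn.BCELoss()(y_hat, y))  # the two values agree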


4. Parameter Optimization

  • As with linear regression, compute gradients by differentiating the loss function with respect to W and b
  • Loss function detail: for the sigmoid + BCE combination, the gradient simplifies to ∂L/∂W = xᵀ(ŷ - y) / N (see the sketch below)
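A short sketch (my addition, on toy data) verifying autograd against this analytic gradient:

import torch
import torch.nn.functional as F

x = torch.randn(8, 3)                    # toy inputs
y = torch.randint(0, 2, (8, 1)).float()  # toy binary labels
W = torch.randn(3, 1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)

y_hat = torch.sigmoid(x @ W + b)
loss = F.binary_cross_entropy(y_hat, y)
loss.backward()

analytic = x.t() @ (y_hat.detach() - y) / x.size(0)  # ∂L/∂W = xᵀ(ŷ - y) / N
print(torch.allclose(W.grad, analytic, atol=1e-6))   # True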

5. PyTorch Practice Code

$\theta \leftarrow \theta - \eta \nabla_\theta L(\theta)$, repeated for 200,000 iterations
$y = f(x)$
$x \in \mathbb{R}^{569 \times 10}$
$y \in \mathbb{R}^{569 \times 1}$

$D = \{(x_i, y_i)\}_{i=1}^{N=569}$
$\hat{y} = f_\theta(x) = \sigma(xW + b)$
$\theta = \{W, b\}$
$W \in \mathbb{R}^{10 \times 1}$
$b \in \mathbb{R}^{1 \times 1}$
$L(\theta) = -\frac{1}{N} \sum_{i=1}^{N} \left( y_i^T \log(\hat{y}_i) + (1 - y_i)^T \log(1 - \hat{y}_i) \right)$

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from sklearn.datasets import load_breast_cancer
cancer = load_breast_cancer()

# print(cancer.DESCR)

df = pd.DataFrame(cancer.data, columns=cancer.feature_names)
df['class'] = cancer.target

# df.tail()

### Pair plot with mean features
# sns.pairplot(df[['class'] + list(df.columns[:10])])
# plt.show()

### Pair plot with std features
# sns.pairplot(df[['class'] + list(df.columns[10:20])])
# plt.show()

### Pair plot with worst features
# sns.pairplot(df[['class'] + list(df.columns[20:30])])
# plt.show()

### Select features

cols = ["mean radius", "mean texture",
        "mean smoothness", "mean compactness", "mean concave points",
        "worst radius", "worst texture",
        "worst smoothness", "worst compactness", "worst concave points",
        "class"]

for c in cols[:-1]:
    sns.histplot(df, x=c, hue=cols[-1], bins=50, stat='probability')
    plt.show()

## Train Model with PyTorch

data = torch.from_numpy(df[cols].values).float()

print(data.shape)

x = data[:, :-1]
y = data[:, -1:]

print(x.shape, y.shape)

n_epochs = 200000
learning_rate = 1e-2
print_interval = 10000

class MyModel(nn.Module): # Inherit from nn.Module to define a custom model

    def __init__(self, input_dim, output_dim):
        super().__init__()

        self.input_dim = input_dim
        self.output_dim = output_dim

        self.linear = nn.Linear(input_dim, output_dim)
        self.act = nn.Sigmoid()
        
    def forward(self, x):
        # |x| = (batch_size, input_dim)
        # |y| = (batch_size, output_dim)
        y = self.act(self.linear(x))
        
        return y

model = MyModel(input_dim=x.size(-1),
                output_dim=y.size(-1))
crit = nn.BCELoss() # Define BCELoss instead of MSELoss.

optimizer = optim.SGD(model.parameters(), # Register the parameters whose gradients will be computed
                      lr=learning_rate)

for i in range(n_epochs):
    y_hat = model(x)
    loss = crit(y_hat, y)
    
    optimizer.zero_grad()
    loss.backward() # Autograd computes the gradient (∂loss/∂parameter) for every tensor in model.parameters() (W, b, ...)
    
    optimizer.step()
    
    if (i + 1) % print_interval == 0:
        print('Epoch %d: loss=%.4e' % (i + 1, loss))

correct_cnt = (y == (y_hat > .5).float()).sum() # Threshold the probabilities at 0.5 and count matches with the labels
total_cnt = float(y.size(0))

print('Accuracy: %.4f' % (correct_cnt / total_cnt))

df = pd.DataFrame(torch.cat([y, y_hat], dim=1).detach().numpy(),
                  columns=["y", "y_hat"])

sns.histplot(df, x='y_hat', hue='y', bins=50, stat='probability')
plt.show()
