🎲[AI] Precision & Recall

mandu · May 12, 2025


This post is based on FastCampus's '[skill-up] Deep Learning Kindergarten, Starting from Scratch' course, with material from my own additional study added.


1. Relationship between the threshold and the evaluation metrics

  • The sigmoid output is interpreted as a probability, and binary classification is usually done with 0.5 as the cutoff
  • Depending on the situation, however, the threshold can be adjusted

Effect of adjusting the threshold

  • Threshold ↑: the model predicts True more conservatively (Precision tends to ↑, Recall ↓)
  • Threshold ↓: the model predicts True more liberally (Recall ↑, Precision tends to ↓)
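To make the threshold effect concrete, here is a minimal pure-Python sketch. The scores stand in for sigmoid outputs and, like the labels, are made-up toy values. The helper name `precision_recall_at` is hypothetical. It predicts positive when a score is strictly greater than the threshold, which matches the `y_hat > 0.5` rule in the practice code below.

```python
# Toy sigmoid-like scores and their true labels (made up for illustration).
scores = [0.95, 0.85, 0.75, 0.65, 0.55, 0.45, 0.35, 0.25, 0.15, 0.05]
labels = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0]


def precision_recall_at(scores, labels, threshold):
    """Return (precision, recall) when predicting positive for score > threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s > threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s > threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s <= threshold and y == 1)
    # 0.0 when a denominator is empty is just a convention chosen for this sketch.
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    return precision, recall


for t in (0.3, 0.5, 0.8):
    p, r = precision_recall_at(scores, labels, t)
    print(f"threshold={t:.1f}  precision={p:.2f}  recall={r:.2f}")
```

On this toy data, precision goes 0.71 → 0.80 → 1.00 and recall goes 1.00 → 0.80 → 0.40 as the threshold rises. On other data precision can dip temporarily, because dropping a single true positive lowers it. Recall, by contrast, can never increase when the threshold goes up.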

2. Precision & Recall

์ •์˜

| Actual \ Predicted | Predicted Positive | Predicted Negative |
|---|---|---|
| Positive | True Positive (TP) | False Negative (FN) |
| Negative | False Positive (FP) | True Negative (TN) |
  • $\text{Accuracy} = \frac{TP + TN}{TP + FP + FN + TN}$
  • $\text{Precision} = \frac{TP}{TP + FP}$
    → The proportion of predicted Positives that are actually Positive

  • $\text{Recall} = \frac{TP}{TP + FN}$
    → The proportion of actual Positives that the model predicted correctly
    (Tip: remember it as the "retrieval rate": how much of the actual Positives the model managed to retrieve)

  • Note, however, that all of these values depend on the threshold!
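As a sanity check on the formulas, the sketch below counts the four confusion-matrix cells from toy 0/1 label lists (made up for illustration) and computes the metrics above. `confusion_counts` is a hypothetical helper, not a library function.

```python
# Toy ground truth and predictions (made up for illustration).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]


def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN) for 0/1 labels, with 1 as the Positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / (tp + fp + fn + tn)  # all correct predictions / all samples
precision = tp / (tp + fp)                  # only the predicted-Positive column
recall = tp / (tp + fn)                     # only the actual-Positive row

print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```

Here TP=3, FP=1, FN=1, TN=5, so accuracy is 8/10 = 0.80 and precision and recall are both 3/4 = 0.75. TN never enters precision or recall.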

Use cases

  • Nuclear power plant leak detection: Recall matters (missing a leak is dangerous)
  • Stock purchase decisions: Precision matters (a wrong "buy" prediction can cause losses)

3. F1 Score

  • The harmonic mean of Precision and Recall

  • A single metric that takes the trade-off between Precision and Recall into account

  • $F1\ \text{Score} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$
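A short worked check of why the harmonic mean is used: it is pulled toward the smaller of the two values, so a model cannot get a high F1 by maximizing only one of them. `f1_from_pr` is a hypothetical helper name. The guard for Precision + Recall = 0 is a convention added in this sketch.

```python
def f1_from_pr(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are 0, by convention here)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


print(f1_from_pr(0.75, 0.75))  # balanced: equals both inputs
print(f1_from_pr(1.0, 0.1))    # lopsided: dragged toward the smaller value
print((1.0 + 0.1) / 2)         # arithmetic mean of the same pair, for comparison
```

With precision 1.0 and recall 0.1, F1 is about 0.18, while the arithmetic mean would report 0.55.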


4. AUROC (Area Under Receiver Operating Characteristic Curve)

  • ๋ถ„๋ฅ˜๊ธฐ์˜ True Positive Rate (TPR) vs False Positive Rate (FPR) ๊ณก์„  ์•„๋ž˜ ๋ฉด์ 

  • $TPR\ (\text{True Positive Rate}) = \frac{TP}{TP + FN}$ (identical to Recall)

  • $FPR\ (\text{False Positive Rate}) = \frac{FP}{FP + TN}$

  • ๋ชจ๋ธ์˜ robustness๋ฅผ ํ‰๊ฐ€ํ•˜๋Š” ์ง€ํ‘œ

  • 0~1 ์‚ฌ์ด ๋ชจ๋“  Threshold ๊ฒฝ์šฐ์— ๋Œ€ํ•ด Confusion Matrix๋ฅผ ๋งŒ๋“ค๊ณ , ๊ทธ ์ ๋“ค์„ ์‹น ๋‹ค ์ฐ์–ด๋ฒ„๋ฆฌ๋ฉด ๋จ.

  • ํด๋ž˜์Šค ๊ฐ„์˜ ๋ถ„ํฌ ๋ถ„๋ฆฌ๊ฐ€ ์ž˜ ๋ ์ˆ˜๋ก AUROC๋Š” 1์— ๊ฐ€๊นŒ์›€

  • AUROC = 0.5์ด๋ฉด ๋žœ๋ค ์ถ”์ธก๊ณผ ์œ ์‚ฌํ•œ ์„ฑ๋Šฅ

  • ๋‘ ๋ถ„ํฌ๊ฐ€ ๋ฉ€๋ฆฌ ์žˆ์„์ˆ˜๋ก Curve ๋„“์ด๊ฐ€ ์ปค์ง
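The procedure above can be sketched in pure Python. Instead of literally trying every value between 0 and 1, it uses each distinct score as a threshold, because the (FPR, TPR) point only changes at those values. It then takes the area with the trapezoidal rule. As a cross-check, it also computes AUROC through its well-known equivalent: the probability that a randomly chosen positive gets a higher score than a randomly chosen negative, with ties counting as half. The function names and toy data are hypothetical.

```python
# Toy scores (standing in for sigmoid outputs) and labels; made up for illustration.
scores = [0.95, 0.85, 0.75, 0.65, 0.55, 0.45, 0.35, 0.25, 0.15, 0.05]
labels = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0]


def roc_points(scores, labels):
    """Sweep each distinct score as a threshold (predict positive if score >= t)
    and return the (FPR, TPR) points, starting from (0, 0)."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points  # the lowest threshold predicts everything positive: (1, 1)


def auc_trapezoid(points):
    """Area under the piecewise-linear curve through the points (trapezoidal rule)."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def auc_pairwise(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


print(f"AUROC (trapezoid): {auc_trapezoid(roc_points(scores, labels)):.2f}")
print(f"AUROC (pairwise):  {auc_pairwise(scores, labels):.2f}")
```

Both give 0.88 here, because 22 of the 25 positive/negative pairs are ranked correctly. When every score is identical, the curve is the straight diagonal from (0, 0) to (1, 1), and the area is 0.5. That is the random-guess case.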


5. PyTorch practice code

  • Apply early stopping
  • Print the confusion matrix and visualize the predicted-probability distribution of each class
  • Compute Precision, Recall, F1 score, and AUROC
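
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from copy import deepcopy
from IPython.display import display  # display() is available in Jupyter/IPython

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

# Wisconsin breast cancer dataset: binary target (0 = malignant, 1 = benign)
cancer = load_breast_cancer()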

df = pd.DataFrame(cancer.data, columns=cancer.feature_names)
df['class'] = cancer.target

display(df.tail())
print(df.describe())

data = torch.from_numpy(df.values).float()
print(data.shape)

x = data[:, :-1]
y = data[:, -1:]

print(x.shape, y.shape)

# Train / Valid / Test ratio
ratios = [.6, .2, .2]

train_cnt = int(data.size(0) * ratios[0])
valid_cnt = int(data.size(0) * ratios[1])
test_cnt = data.size(0) - train_cnt - valid_cnt
cnts = [train_cnt, valid_cnt, test_cnt]

print("Train %d / Valid %d / Test %d samples." % (train_cnt, valid_cnt, test_cnt))

indices = torch.randperm(data.size(0))

x = torch.index_select(x, dim=0, index=indices)
y = torch.index_select(y, dim=0, index=indices)

x = x.split(cnts, dim=0)
y = y.split(cnts, dim=0)

for x_i, y_i in zip(x, y):
    print(x_i.size(), y_i.size())

## Preprocessing

scaler = StandardScaler()
scaler.fit(x[0].numpy())

x = [torch.from_numpy(scaler.transform(x[0].numpy())).float(),
     torch.from_numpy(scaler.transform(x[1].numpy())).float(),
     torch.from_numpy(scaler.transform(x[2].numpy())).float()]

df = pd.DataFrame(x[0].numpy(), columns=cancer.feature_names)
df.tail()

## Build Model & Optimizer

model = nn.Sequential(
    nn.Linear(x[0].size(-1), 25),
    nn.LeakyReLU(),
    nn.Linear(25, 20),
    nn.LeakyReLU(),
    nn.Linear(20, 15),
    nn.LeakyReLU(),
    nn.Linear(15, 10),
    nn.LeakyReLU(),
    nn.Linear(10, 5),
    nn.LeakyReLU(),
    nn.Linear(5, y[0].size(-1)),
    nn.Sigmoid(),
)

print(model)

optimizer = optim.Adam(model.parameters())

## Train

n_epochs = 10000
batch_size = 32
print_interval = 100
early_stop = 1000


lowest_loss = np.inf
best_model = None

lowest_epoch = np.inf

train_history, valid_history = [], []

for i in range(n_epochs):
    indices = torch.randperm(x[0].size(0))
    x_ = torch.index_select(x[0], dim=0, index=indices)
    y_ = torch.index_select(y[0], dim=0, index=indices)
    
    x_ = x_.split(batch_size, dim=0)
    y_ = y_.split(batch_size, dim=0)
    
    train_loss, valid_loss = 0, 0
    y_hat = []
    
    for x_i, y_i in zip(x_, y_):
        y_hat_i = model(x_i)
        loss = F.binary_cross_entropy(y_hat_i, y_i)

        optimizer.zero_grad()
        loss.backward()

        optimizer.step()        
        train_loss += float(loss) # This is very important to prevent memory leak.

    train_loss = train_loss / len(x_)
        
    with torch.no_grad():
        x_ = x[1].split(batch_size, dim=0)
        y_ = y[1].split(batch_size, dim=0)
        
        valid_loss = 0
        
        for x_i, y_i in zip(x_, y_):
            y_hat_i = model(x_i)
            loss = F.binary_cross_entropy(y_hat_i, y_i)
            
            valid_loss += float(loss)
            
            y_hat += [y_hat_i]
            
    valid_loss = valid_loss / len(x_)
    
    train_history += [train_loss]
    valid_history += [valid_loss]
        
    if (i + 1) % print_interval == 0:
        print('Epoch %d: train loss=%.4e  valid_loss=%.4e  lowest_loss=%.4e' % (
            i + 1,
            train_loss,
            valid_loss,
            lowest_loss,
        ))
        
    if valid_loss <= lowest_loss:
        lowest_loss = valid_loss
        lowest_epoch = i
        
        best_model = deepcopy(model.state_dict())
    else:
        if early_stop > 0 and lowest_epoch + early_stop < i + 1:
            print("There is no improvement during last %d epochs." % early_stop)
            break

print("The best validation loss from epoch %d: %.4e" % (lowest_epoch + 1, lowest_loss))
model.load_state_dict(best_model)

## Loss History

plot_from = 2

plt.figure(figsize=(20, 10))
plt.grid(True)
plt.title("Train / Valid Loss History")
plt.plot(
    range(plot_from, len(train_history)), train_history[plot_from:],
    range(plot_from, len(valid_history)), valid_history[plot_from:],
)
plt.yscale('log')
plt.show()

## Let's see the result!

test_loss = 0
y_hat = []

with torch.no_grad():
    x_ = x[2].split(batch_size, dim=0)
    y_ = y[2].split(batch_size, dim=0)

    for x_i, y_i in zip(x_, y_):
        y_hat_i = model(x_i)
        loss = F.binary_cross_entropy(y_hat_i, y_i)

        test_loss += float(loss)  # Gradient is already detached; float() keeps a plain number.

        y_hat += [y_hat_i]

test_loss = test_loss / len(x_)
y_hat = torch.cat(y_hat, dim=0)

print("Test loss: %.4e" % test_loss)

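# sklearn expects 1-D label arrays, so flatten the (N, 1) tensors to 0/1 ints
y_act = y[2].squeeze(-1).numpy().astype(int)
y_pred = (y_hat > 0.5).squeeze(-1).numpy().astype(int)  # threshold = 0.5

# Confusion Matrix: sklearn sorts the labels (0, 1), so the layout is
# [[TN, FP], [FN, TP]], the reverse of the table in section 2
cm = confusion_matrix(y_act, y_pred)
print("Confusion Matrix:")
print(cm)

# Accuracy
acc = accuracy_score(y_act, y_pred)
print(f"Accuracy: {acc:.4f}")

# Precision (binary default: computed for the positive class 1)
prec = precision_score(y_act, y_pred)
print(f"Precision: {prec:.4f}")

# Recall (binary default: computed for the positive class 1)
rec = recall_score(y_act, y_pred)
print(f"Recall: {rec:.4f}")

# F1 Score
f1 = f1_score(y_act, y_pred)
print(f"F1 Score: {f1:.4f}")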

df = pd.DataFrame(torch.cat([y[2], y_hat], dim=1).detach().numpy(),
                  columns=["y", "y_hat"])

sns.histplot(df, x='y_hat', hue='y', bins=50, stat='probability')
plt.show()

from sklearn.metrics import roc_auc_score

# AUROC uses the raw probabilities, not the thresholded predictions
auroc = roc_auc_score(df["y"], df["y_hat"])
print(f"AUROC: {auroc:.4f}")
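The early-stopping logic in the training loop above keeps the lowest validation loss and stops once `early_stop` epochs pass without improvement. It can be factored into a small helper. The class below is a hypothetical illustration, not part of PyTorch, and the loss sequence is made up. It uses the same stopping condition as the loop: `epoch - best_epoch >= patience` is equivalent to `lowest_epoch + early_stop < i + 1` for integer epochs.

```python
class EarlyStopping:
    """Hypothetical helper: remembers the best validation loss and signals a stop
    after `patience` epochs without improvement (patience <= 0 disables stopping)."""

    def __init__(self, patience):
        self.patience = patience
        self.best_loss = float("inf")
        self.best_epoch = -1

    def step(self, epoch, valid_loss):
        """Record this epoch's validation loss; return True if training should stop."""
        if valid_loss <= self.best_loss:
            # Improvement (ties count, as in the original loop): remember it.
            # In the real loop, this is where deepcopy(model.state_dict()) is saved.
            self.best_loss = valid_loss
            self.best_epoch = epoch
            return False
        return self.patience > 0 and epoch - self.best_epoch >= self.patience


# Made-up validation losses: the last improvement happens at epoch 2.
valid_losses = [1.0, 0.8, 0.6, 0.65, 0.7, 0.62, 0.9, 0.95]

stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, loss in enumerate(valid_losses):
    if stopper.step(epoch, loss):
        stopped_at = epoch
        break

print(f"stopped at epoch {stopped_at}; best epoch {stopper.best_epoch}, "
      f"best loss {stopper.best_loss}")
```

With patience 3 and the last improvement at epoch 2, the loop stops at epoch 5. Afterwards you would reload the weights saved at the best epoch, just as the practice code does with `model.load_state_dict(best_model)`.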