Metric

seongyong·2021년 4월 14일

confusion matrix f-beta metric precision recall

Machine Learning

목록 보기

7/12

학습내용

Metric

precision
$precision = T_p / T_p + F_p$

negative를 positive로 잘못 판단하면 큰 손실이 있는 경우(ex 스팸메일)
-> 높은 threshold

recall
$recall = T_p / T_p + F_N$

positive를 negative로 잘못 판단하면 큰 손실이 있는 경우에 사용. ex 암
-> 낮은 threshold

F-beta score
F_beta = $(1 + \beta^2) \cdot \frac{\mathrm{precision} \cdot \mathrm{recall}}{(\beta^2 \cdot \mathrm{precision}) + \mathrm{recall}}.$
ROC curve
FPR(False Positive Rate)이 변할 때 TPR(True Positive Rate)이 어떻게 변하는지를 나타내는 곡선

Confusion matrix

from sklearn.metrics import classification_report, confusion_matrix

pcm = plot_confusion_matrix(
    pipe,
    X_val,
    y_val,
    cmap = 'Blues'
)
plt.title('Confusion matrix, Threshold : 0.5')
plt.show()

#threshold에 따른 confusion matrix 및 모델 평가

threshold = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]

for thr in threshold:
    print('threshold : ', thr)

    y_pred_proba = pipe.predict_proba(X_val)[:,1]
    y_pred = (y_pred_proba > thr)

    con = confusion_matrix(
        y_val,
        y_pred
    )
    print('confusion matrix : \n',con)
    print()
    print(classification_report(y_val, y_pred))
    print()

threshold : 0.1
confusion matrix :
[[3225 3182][ 137 1873]]

          precision    recall  f1-score   support

       0       0.96      0.50      0.66      6407
       1       0.37      0.93      0.53      2010

accuracy                           0.61      8417

macro avg 0.66 0.72 0.60 8417
weighted avg 0.82 0.61 0.63 8417

threshold : 0.2
confusion matrix :
[[4880 1527][ 411 1599]]

          precision    recall  f1-score   support

       0       0.92      0.76      0.83      6407
       1       0.51      0.80      0.62      2010

accuracy                           0.77      8417

macro avg 0.72 0.78 0.73 8417
weighted avg 0.82 0.77 0.78 8417

threshold : 0.3
confusion matrix :
[[5606 801][ 604 1406]]

          precision    recall  f1-score   support

       0       0.90      0.87      0.89      6407
       1       0.64      0.70      0.67      2010

accuracy                           0.83      8417

macro avg 0.77 0.79 0.78 8417
weighted avg 0.84 0.83 0.84 8417

threshold : 0.4
confusion matrix :
[[5955 452][ 725 1285]]

          precision    recall  f1-score   support

       0       0.89      0.93      0.91      6407
       1       0.74      0.64      0.69      2010

accuracy                           0.86      8417

macro avg 0.82 0.78 0.80 8417
weighted avg 0.86 0.86 0.86 8417

threshold : 0.5
confusion matrix :
[[6207 200][ 913 1097]]

          precision    recall  f1-score   support

       0       0.87      0.97      0.92      6407
       1       0.85      0.55      0.66      2010

accuracy                           0.87      8417

macro avg 0.86 0.76 0.79 8417
weighted avg 0.87 0.87 0.86 8417

threshold : 0.6
confusion matrix :
[[6312 95][1128 882]]

          precision    recall  f1-score   support

       0       0.85      0.99      0.91      6407
       1       0.90      0.44      0.59      2010

accuracy                           0.85      8417

macro avg 0.88 0.71 0.75 8417
weighted avg 0.86 0.85 0.83 8417

threshold : 0.7
confusion matrix :
[[6363 44][1309 701]]

          precision    recall  f1-score   support

       0       0.83      0.99      0.90      6407
       1       0.94      0.35      0.51      2010

accuracy                           0.84      8417

macro avg 0.89 0.67 0.71 8417
weighted avg 0.86 0.84 0.81 8417

threshold : 0.8
confusion matrix :
[[6399 8][1556 454]]

          precision    recall  f1-score   support

       0       0.80      1.00      0.89      6407
       1       0.98      0.23      0.37      2010

accuracy                           0.81      8417

macro avg 0.89 0.61 0.63 8417
weighted avg 0.85 0.81 0.77 8417

threshold : 0.9
confusion matrix :
[[6407 0][1877 133]]

          precision    recall  f1-score   support

       0       0.77      1.00      0.87      6407
       1       1.00      0.07      0.12      2010

accuracy                           0.78      8417

macro avg 0.89 0.53 0.50 8417
weighted avg 0.83 0.78 0.69 8417

seongyong

이전 포스트

Logistic Regression

다음 포스트

Metric

Machine Learning

학습내용

Metric

Confusion matrix

Logistic Regression

Model Selection

0개의 댓글