[Deep Learning] Evaluation Metric

zzwon1212 · January 4, 2024

1. Confusion Matrix

	Predicted Positive	Predicted Negative
Actual Positive	TP	FN
Actual Negative	FP	TN

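The first letter (T/F) says whether the prediction was correct (True) or incorrect (False); the second letter (P/N) is the class the model predicted.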
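As a rough illustration of how the four cells are filled, the sketch below counts them from two label lists. The function name `confusion_counts`, the 1 = positive / 0 = negative encoding, and the sample labels are assumptions made for this example, not part of the original post.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FN, FP, TN) for binary labels, assuming 1 = positive and 0 = negative."""
    tp = fn = fp = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1  # correct prediction of the positive class
        elif actual == 1 and predicted == 0:
            fn += 1  # positive sample missed by the model
        elif actual == 0 and predicted == 1:
            fp += 1  # negative sample wrongly flagged as positive
        else:
            tn += 1  # correct prediction of the negative class
    return tp, fn, fp, tn


# Made-up labels for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
tp, fn, fp, tn = confusion_counts(y_true, y_pred)
print(tp, fn, fp, tn)  # 3 1 1 3
```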

2. Accuracy

\mathrm{Accuracy} = {TP + TN \over TP + FP + TN + FN}
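The formula translates directly into code. This is a minimal sketch; the function name `accuracy` and the example counts are assumptions chosen for illustration.

```python
def accuracy(tp, fp, tn, fn):
    """(TP + TN) / (TP + FP + TN + FN): fraction of all samples classified correctly."""
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("accuracy is undefined when there are no samples")
    return (tp + tn) / total


# Made-up counts: 6 correct predictions out of 8 samples.
print(accuracy(tp=3, fp=1, tn=3, fn=1))  # 0.75
```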

3. Precision

\mathrm{Precision} = {TP \over TP + FP}

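The fraction of samples the model predicted as Positive that are actually Positive.
Focuses on reducing False Positives.
High precision matters when you must avoid misclassifying actually negative data as positive (e.g. spam filtering, where a legitimate email should not end up flagged as spam).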
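A small sketch of the precision formula applied to the spam example. The function name `precision`, the numbers, and the choice to return 0.0 when nothing is predicted positive are all assumptions for illustration; other conventions for the empty case exist.

```python
def precision(tp, fp):
    """TP / (TP + FP): of everything predicted positive, the share that is truly positive."""
    predicted_positive = tp + fp
    if predicted_positive == 0:
        return 0.0  # assumed convention when the model predicts no positives
    return tp / predicted_positive


# Hypothetical spam filter: 40 emails flagged as spam, 36 of them really are spam,
# so 4 legitimate emails were wrongly flagged (false positives).
spam_precision = precision(tp=36, fp=4)
print(spam_precision)  # 0.9
```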

4. Recall

\mathrm{Recall} = {TP \over TP + FN}

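The fraction of actually Positive samples that the model predicted as Positive.
Focuses on avoiding False Negatives.
High recall matters when you must avoid misclassifying actually positive data as negative (e.g. cancer diagnosis, where a missed case is costly).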
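The same idea for recall, using the diagnosis example. The function name `recall`, the patient counts, and the 0.0 fallback for an empty positive class are assumptions made for this sketch.

```python
def recall(tp, fn):
    """TP / (TP + FN): of everything actually positive, the share the model found."""
    actual_positive = tp + fn
    if actual_positive == 0:
        return 0.0  # assumed convention when the data contains no positives
    return tp / actual_positive


# Hypothetical screening test: 50 patients truly have the disease,
# the model detects 45 and misses 5 (false negatives).
screening_recall = recall(tp=45, fn=5)
print(screening_recall)  # 0.9
```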

5. F1 Score

\mathrm{F1 \, Score} = 2 \times {\mathrm{Precision} \times \mathrm{Recall} \over \mathrm{Precision} + \mathrm{Recall}}

The harmonic mean of Precision and Recall, so it is high only when both are high.
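To show how the harmonic mean behaves compared with a plain average, here is a minimal sketch. The function name `f1_score` and the input values are assumptions for illustration, as is returning 0.0 when both inputs are zero.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # assumed convention when both metrics are zero
    return 2 * precision * recall / (precision + recall)


balanced = f1_score(0.8, 0.8)   # about 0.8: equal inputs give that same value
lopsided = f1_score(1.0, 0.1)   # about 0.18, far below the arithmetic mean of 0.55
print(round(balanced, 3), round(lopsided, 3))
```

The lopsided case is the reason for using the harmonic mean: a model cannot score well on F1 by maximizing one of the two metrics while neglecting the other.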

6. PR Curve

neptune.ai - F1 Score vs ROC AUC vs Accuracy vs PR AUC: Which Evaluation Metric Should You Choose?

The higher your curve sits on the y-axis (precision) across the x-axis (recall), the better your model performs.

Knowing at which recall your precision starts to fall fast can help you choose the threshold and deliver a better model.

ROC AUC looks at TPR (recall) and FPR, while PR AUC looks at PPV (precision) and TPR. Because of that, if you care more about the positive class (especially when the positive class is small), PR AUC is the better choice, since it is more sensitive to improvements on the positive class.
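A PR curve is obtained by sweeping the decision threshold over the model's scores and recording precision and recall at each step. The sketch below does this in plain Python; the function name `pr_curve`, the "score >= threshold means positive" rule, and the sample scores are assumptions made for illustration, and library implementations may order or pad the points differently.

```python
def pr_curve(y_true, scores):
    """Return (threshold, precision, recall) tuples, one per distinct score, highest threshold first."""
    total_positive = sum(y_true)  # assumes labels are 1 (positive) or 0 (negative)
    points = []
    for threshold in sorted(set(scores), reverse=True):
        # Everything scoring at or above the threshold is predicted positive.
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        precision = tp / (tp + fp)  # tp + fp >= 1 because the threshold is one of the scores
        recall = tp / total_positive if total_positive else 0.0
        points.append((threshold, precision, recall))
    return points


# Made-up labels and scores for illustration only.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
for threshold, precision, recall in pr_curve(y_true, scores):
    print(f"threshold={threshold:.1f}  precision={precision:.2f}  recall={recall:.2f}")
```

In this toy data, precision stays at 1.00 until recall reaches about 0.67 and then drops, which is the kind of turning point the quoted advice suggests looking for when picking a threshold.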