Evaluation Metrics_Recall/Precision

김혜인 · May 17, 2023


Relationship between recall and precision

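When recall matters more
- The costly mistake is predicting a positive as negative
- FN ▼
- Cancer diagnosis: telling a patient they are healthy >> focus on lowering FN >> recall matters
When precision matters more
- The costly mistake is predicting a negative as positive
- FP ▼
- Spam filtering: flagging a work email as spam >> focus on lowering FP >> precision matters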
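To make FN and FP concrete, here is a minimal sketch that counts them from a confusion matrix and computes recall and precision by hand, then checks the result against scikit-learn's `recall_score` and `precision_score`. The labels are hypothetical toy values, not data from this post.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hypothetical labels: 1 = positive (e.g. cancer), 0 = negative
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_pred = np.array([1, 1, 1, 0, 1, 1, 0, 0, 0, 0])

# For binary labels [0, 1], ravel() yields TN, FP, FN, TP in that order
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

recall = tp / (tp + fn)      # share of actual positives that were caught
precision = tp / (tp + fp)   # share of predicted positives that were correct

print(tn, fp, fn, tp)
print(recall, recall_score(y_true, y_pred))
print(precision, precision_score(y_true, y_pred))
```

Here one positive is missed (FN = 1) and two negatives are flagged (FP = 2), so recall is 3/4 = 0.75 and precision is 3/5 = 0.6.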

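predict_proba: available only for classification models; returns the probability of each class
np.where(condition, 1, 0): put the threshold into the condition to turn probabilities into 0/1 predictions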

tree.predict_proba(X_train[:3])

>> output
array([[0.99173554, 0.00826446],
       [0.96610169, 0.03389831],
       [0.98695652, 0.01304348]])

# probability of positive (1)
rfc.predict_proba(X_train[:3])[:,1]

>> output
array([0.06722102, 0.42116418, 0.05364828])

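# Adjust the threshold (pp: positive-class probabilities, as above)
pp = rfc.predict_proba(X_train[:3])[:, 1]
np.where(pp > 0.5, 1, 0)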
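The snippets above assume an already-trained `rfc` and existing `X_train`. A self-contained sketch of the same steps, assuming a synthetic imbalanced dataset from `make_classification` in place of the post's (unshown) data:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced binary dataset standing in for the post's data
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

rfc = RandomForestClassifier(random_state=0).fit(X_train, y_train)

proba = rfc.predict_proba(X_test)   # shape (n_samples, 2): [P(class 0), P(class 1)]
pp = proba[:, 1]                    # probability of the positive class (1)

pred_05 = np.where(pp > 0.5, 1, 0)  # default-style cutoff
pred_03 = np.where(pp > 0.3, 1, 0)  # lower cutoff -> at least as many positives

print(pred_05.sum(), pred_03.sum())
```

With `> 0.5` the result matches `rfc.predict(X_test)`, since the forest predicts the class with the higher averaged probability; lowering the cutoff can only add positive predictions.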

Changing the threshold on the positive (1) probability >> changes recall/precision

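Threshold: the probability cutoff
At or above the threshold: positive; below it: negative
Default threshold: 0.5

Threshold ↓: more positive predictions >> recall ↑ / precision ↓
Threshold ↑: fewer positive predictions >> recall ↓ / precision ↑
Recall and precision trade off: raising one tends to lower the other
cf) Recall and FPR (false positive rate) move in the same direction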
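A minimal sketch of the trade-off, using hypothetical hand-picked probabilities and the rule above (at or above the threshold means positive):

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

# Hypothetical ground truth and positive-class probabilities (sorted for readability)
y_true = np.array([0, 0, 0, 0, 0, 1, 0, 1, 1, 1])
pos_proba = np.array([0.05, 0.1, 0.2, 0.3, 0.45, 0.5, 0.6, 0.7, 0.8, 0.9])

thresholds = [0.75, 0.55, 0.4, 0.15]   # from high to low
recalls, precisions = [], []
for thresh in thresholds:
    # at or above the threshold -> positive (1), below -> negative (0)
    y_pred = np.where(pos_proba >= thresh, 1, 0)
    recalls.append(recall_score(y_true, y_pred))
    precisions.append(precision_score(y_true, y_pred))
    print(f"threshold={thresh}: recall={recalls[-1]:.2f}, precision={precisions[-1]:.2f}")
```

For these values, recall goes 0.5 → 0.75 → 1.0 → 1.0 as the threshold drops, while precision goes 1.0 → 0.75 → 0.67 → 0.5.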

Checking recall/precision as the threshold changes

precision_recall_curve(y_true, positive_probability)
- Returns: tuple (precision array, recall array, threshold array)

import numpy as np
from sklearn.metrics import precision_recall_curve

# Positive-class probabilities inferred by the model (DecisionTree)
pos_test_tree = tree.predict_proba(X_test)[:, 1]

# precision_recall_curve(ground truth, positive probability)
precision_list, recall_list, thresh_list = precision_recall_curve(y_test, pos_test_tree)

# Change the threshold to 0.6
pred_test_06 = np.where(pos_test_tree > 0.6, 1, 0)
np.unique(pred_test_06, return_counts=True)
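To see how the three arrays returned by `precision_recall_curve` line up, here is a sketch on hypothetical toy scores: each `(precision[i], recall[i])` pair is what you get by predicting positive when the probability is at or above `thresholds[i]`, and the arrays end with one extra point (precision = 1, recall = 0) that has no threshold.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve, precision_score, recall_score

# Hypothetical labels and positive-class probabilities
y_true = np.array([0, 0, 1, 1])
pos_proba = np.array([0.1, 0.4, 0.35, 0.8])

precision, recall, thresholds = precision_recall_curve(y_true, pos_proba)

# precision/recall have one more entry than thresholds: the final (1, 0) point
print(len(precision), len(recall), len(thresholds))

# Recompute each point by thresholding manually (zip stops at the last threshold)
for p, r, t in zip(precision, recall, thresholds):
    y_pred = np.where(pos_proba >= t, 1, 0)
    print(t, p, precision_score(y_true, y_pred), r, recall_score(y_true, y_pred))
```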

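PR Curve (Precision-Recall Curve) and AP Score (Average Precision Score)

An evaluation metric for binary classification
X-axis: recall / Y-axis: precision
Plots how precision and recall change as the threshold moves from 1 to 0; the curve comes out step-shaped
AP Score: summarizes the PR curve as a single number
- The larger the area under the curve, the better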
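scikit-learn's documentation defines AP as a weighted mean of precisions, $\text{AP} = \sum_n (R_n - R_{n-1}) P_n$, rather than a trapezoidal area. A sketch that rebuilds it from `precision_recall_curve` on hypothetical data and compares it with `average_precision_score`:

```python
import numpy as np
from sklearn.metrics import average_precision_score, precision_recall_curve

# Hypothetical labels and positive-class probabilities
y_true = np.array([0, 1, 0, 1, 1, 0, 0, 1])
pos_proba = np.array([0.1, 0.9, 0.35, 0.6, 0.4, 0.7, 0.2, 0.8])

precision, recall, _ = precision_recall_curve(y_true, pos_proba)

# AP = sum of (R_n - R_{n-1}) * P_n over the curve's points.
# recall is returned in decreasing order, so np.diff(recall) <= 0; negate the sum.
ap_manual = -np.sum(np.diff(recall) * precision[:-1])
ap_sklearn = average_precision_score(y_true, pos_proba)

print(ap_manual, ap_sklearn)
```

For these scores both come out to 0.8875: the four positives sit at ranks 1, 2, 4 and 5, where precision is 1, 1, 0.75 and 0.8.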

import matplotlib.pyplot as plt
from sklearn.metrics import precision_recall_curve, PrecisionRecallDisplay, average_precision_score

# Validate on the test set
# Positive-class probabilities estimated by each model
pos_test_tree = tree.predict_proba(X_test)[:, 1]
pos_test_rfc = rfc.predict_proba(X_test)[:, 1]

# Precision/recall at every threshold for each model
precision_list1, recall_list1, _ = precision_recall_curve(y_test, pos_test_tree)
precision_list2, recall_list2, _ = precision_recall_curve(y_test, pos_test_rfc)

# AP score: average_precision_score(ground truth, positive probability)
ap_score = average_precision_score(y_test, pos_test_tree)
ap_score_rfc = average_precision_score(y_test, pos_test_rfc)

# Draw both curves on one plot
ax = plt.gca()
disp_tree = PrecisionRecallDisplay(precision_list1,
                                   recall_list1,
                                   average_precision=ap_score,  # shows the AP score in the legend
                                   estimator_name="DecisionTree")
disp_rfc = PrecisionRecallDisplay(precision_list2,
                                  recall_list2,
                                  average_precision=ap_score_rfc,  # shows the AP score in the legend
                                  estimator_name="RandomForest")
disp_tree.plot(ax=ax)
disp_rfc.plot(ax=ax)
plt.title("Precision Recall Curve")
plt.show()

ROC curve (Receiver Operating Characteristic Curve) and AUC (Area Under the Curve) score

FPR (False Positive Rate)

Proportion of actual negatives that were wrongly predicted as positive
FPR = $\frac{FP}{TN+FP}$
Low: good at negatives / High: poor at negatives

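TPR (True Positive Rate, also called recall or sensitivity)

Same as recall
Proportion of actual positives that were correctly predicted as positive
TPR = $\frac{TP}{FN+TP}$
Low: poor at positives / High: good at positives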

cf) Changing the positive threshold moves FPR and TPR (recall) in the same direction
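A minimal sketch of that relationship, applying the FPR and TPR formulas above to hypothetical labels and probabilities (the helper `fpr_tpr` is illustrative, not a library function):

```python
import numpy as np

# Hypothetical labels and positive-class probabilities
y_true = np.array([0, 0, 0, 0, 1, 0, 1, 1])
pos_proba = np.array([0.1, 0.2, 0.3, 0.55, 0.6, 0.7, 0.8, 0.9])


def fpr_tpr(y_true, pos_proba, thresh):
    """Return (FPR, TPR) when scores at or above thresh are predicted positive."""
    y_pred = pos_proba >= thresh
    tp = np.sum(y_pred & (y_true == 1))
    fp = np.sum(y_pred & (y_true == 0))
    fn = np.sum(~y_pred & (y_true == 1))
    tn = np.sum(~y_pred & (y_true == 0))
    return fp / (tn + fp), tp / (fn + tp)


results = []
for thresh in [0.85, 0.65, 0.5, 0.15]:   # from high to low
    fpr, tpr = fpr_tpr(y_true, pos_proba, thresh)
    results.append((fpr, tpr))
    print(f"threshold={thresh}: FPR={fpr:.2f}, TPR={tpr:.2f}")
```

Going from threshold 0.85 down to 0.15, (FPR, TPR) moves (0, 0.33) → (0.2, 0.67) → (0.4, 1.0) → (0.8, 1.0): both rise together.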

ROC Curve

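An evaluation metric for binary classification
X-axis: FPR / Y-axis: TPR
Plots how FPR and TPR change as the threshold moves from 1 to 0
Evaluates the model on both positives and negatives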

AUC Score

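Turns the ROC curve into a single score
- Computes the area under the ROC curve
Range: 0 to 1 (0.5 corresponds to random guessing)
For a high AUC, TPR should already be high while FPR is still low (the curve hugs the top-left corner). Low FPR: negatives are classified correctly / High TPR: positives are classified correctly
Low FPR and high TPR: good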
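One useful way to read the AUC: it equals the probability that a randomly chosen positive gets a higher score than a randomly chosen negative, with ties counted as half. A sketch checking this against `roc_auc_score` on hypothetical scores:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical labels and positive-class probabilities (note the tie at 0.8)
y_true = np.array([0, 0, 1, 0, 1, 1, 0, 1])
pos_proba = np.array([0.1, 0.3, 0.35, 0.4, 0.6, 0.8, 0.8, 0.9])

pos = pos_proba[y_true == 1]
neg = pos_proba[y_true == 0]

# Compare every (positive, negative) pair: win = 1, tie = 0.5, loss = 0
diff = pos[:, None] - neg[None, :]
auc_pairwise = np.mean((diff > 0) + 0.5 * (diff == 0))

print(auc_pairwise, roc_auc_score(y_true, pos_proba))
```

Here 12.5 of the 16 positive/negative pairs are ordered correctly, so both values are 0.78125.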

Checking ROC and AUC scores

roc_curve(y_true, positive probability): returns FPR, TPR, thresholds
roc_auc_score(y_true, positive probability): returns the AUC score
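A short sketch of both calls on hypothetical data. It also checks that the curve runs from (0, 0) to (1, 1) and that integrating it with `sklearn.metrics.auc` (trapezoidal rule) gives the same number as `roc_auc_score`:

```python
import numpy as np
from sklearn.metrics import auc, roc_auc_score, roc_curve

# Hypothetical labels and positive-class probabilities
y_true = np.array([0, 1, 0, 1, 0, 1, 1, 0])
pos_proba = np.array([0.2, 0.7, 0.4, 0.9, 0.6, 0.3, 0.8, 0.1])

fpr, tpr, thresholds = roc_curve(y_true, pos_proba)  # one threshold per (FPR, TPR) point
auc_score = roc_auc_score(y_true, pos_proba)

print(fpr[0], tpr[0], fpr[-1], tpr[-1])  # endpoints of the curve
print(auc(fpr, tpr), auc_score)          # area under the curve, computed two ways
```

For these scores 14 of the 16 positive/negative pairs are ranked correctly, so the AUC is 0.875.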

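ROC Curve vs. Precision-Recall Curve

ROC Curve / ROC-AUC score
- Binary classification where positives and negatives matter equally (e.g. dog vs. cat)
Precision-Recall Curve / AP Score
- Positives matter more than negatives (e.g. diagnosing cancer patients)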
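One common reason to prefer the PR curve when positives matter more is class imbalance: with many negatives, ROC-AUC can look strong while precision is poor. A sketch with a hypothetical, hand-built ranking (5 positives, 100 negatives, and 5 of the negatives ranked above every positive):

```python
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score

# Hypothetical imbalanced test set, listed in the order the model ranks it:
# 5 negatives first, then the 5 positives, then the remaining 95 negatives.
y_true = np.array([0] * 5 + [1] * 5 + [0] * 95)
pos_proba = np.linspace(1.0, 0.0, num=105)  # strictly decreasing scores, no ties

auc = roc_auc_score(y_true, pos_proba)
ap = average_precision_score(y_true, pos_proba)

print(round(auc, 3), round(ap, 3))
```

ROC-AUC is 0.95 because each positive still outranks 95 of the 100 negatives, while AP is only about 0.354, since precision never exceeds 0.5 at any positive.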

import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, RocCurveDisplay, roc_auc_score

fpr_list1, tpr_list1, thresh_list1 = roc_curve(y_test, pos_test_tree)  # (ground truth, positive probability)

auc = roc_auc_score(y_test, pos_test_tree)

disp_tree = RocCurveDisplay(fpr=fpr_list1, tpr=tpr_list1,
                            roc_auc=auc,
                            estimator_name='DecisionTree')
disp_tree.plot()
plt.show()

cf) Use _ for a return value you don't need (here the thresholds are not used)
fpr1, tpr1, _ = roc_curve(y_test, pos_test_tree)
