Optimal threshold

seongyong · April 23, 2021


What I learned

Optimal threshold

$tpr = \frac{TP}{TP + FN}$
$fpr = \frac{FP}{FP + TN}$
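To make the two formulas concrete, here is a small sketch that counts TP, FN, FP and TN with scikit-learn's `confusion_matrix` and plugs them in. The labels and predictions are made-up values for illustration only.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical labels and hard predictions (1 = positive class)
y_true = np.array([1, 1, 1, 0, 0, 0, 0, 1])
y_pred = np.array([1, 0, 1, 0, 1, 0, 0, 1])

# With labels=[0, 1], ravel() returns the counts in the order TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()

tpr = tp / (tp + fn)  # share of actual positives that were caught
fpr = fp / (fp + tn)  # share of actual negatives wrongly flagged as positive
print(tpr, fpr)  # 0.75 0.25
```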

ROC-AUC
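On the ROC curve, the optimal threshold is the one that maximizes tpr - fpr (Youden's J statistic).
Intuitively, a higher tpr is better and a lower fpr is better, so the threshold with the largest gap between them gives the best trade-off.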
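The tpr - fpr rule can be sketched with scikit-learn's `roc_curve`, which returns the fpr, tpr and threshold for every point on the curve. The helper name `youden_threshold` and the toy scores are hypothetical; `y_score` is assumed to be the positive-class probability (e.g. the `predict_proba(...)[:, 1]` column).

```python
import numpy as np
from sklearn.metrics import roc_curve

def youden_threshold(y_true, y_score):
    """Return the ROC threshold that maximizes tpr - fpr (Youden's J)."""
    fpr, tpr, thresholds = roc_curve(y_true, y_score)
    best = np.argmax(tpr - fpr)  # index of the largest tpr - fpr gap
    return thresholds[best]

# Hypothetical labels and positive-class probabilities
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
y_score = np.array([0.10, 0.40, 0.35, 0.80, 0.20, 0.70, 0.55, 0.90])

threshold = youden_threshold(y_true, y_score)
# roc_curve counts a sample as positive when score >= threshold
y_pred = (y_score >= threshold).astype(int)
```

On this toy data the chosen threshold is 0.70, where tpr = 0.75 and fpr = 0.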
f1-score
Another option is to choose the threshold that maximizes the F1-score.
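A minimal sketch of the F1 approach, assuming scikit-learn: `precision_recall_curve` gives precision and recall per threshold, and F1 is computed from them. The helper `f1_threshold` and the data are hypothetical (same toy scores as the ROC sketch).

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

def f1_threshold(y_true, y_score):
    """Return (threshold, f1) for the threshold with the highest F1-score."""
    precision, recall, thresholds = precision_recall_curve(y_true, y_score)
    # precision/recall have one extra final point (recall = 0) with no threshold
    precision, recall = precision[:-1], recall[:-1]
    denom = precision + recall
    # F1 = 2PR / (P + R); define it as 0 where P + R == 0
    f1 = np.divide(2 * precision * recall, denom,
                   out=np.zeros_like(denom), where=denom > 0)
    best = np.argmax(f1)
    return thresholds[best], f1[best]

# Hypothetical labels and positive-class probabilities
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
y_score = np.array([0.10, 0.40, 0.35, 0.80, 0.20, 0.70, 0.55, 0.90])

thr, best_f1 = f1_threshold(y_true, y_score)
```

Here both criteria happen to agree on 0.70 (F1 = 6/7); on real data they often differ, which is why it helps to look at both.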

It is a good idea to consider both criteria when choosing the threshold.

Imbalanced classes

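In classification, it matters quite a bit which class you treat as positive (it is easy to mix up when writing code).
If the classes are imbalanced, it may be worth setting scale_pos_weight on the classifier (a parameter of gradient boosting libraries such as XGBoost and LightGBM).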
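XGBoost's parameter documentation suggests sum(negative instances) / sum(positive instances) as a typical value for `scale_pos_weight`. The sketch below only computes that ratio with NumPy; the helper name `suggested_scale_pos_weight` and the 9:1 label array are hypothetical.

```python
import numpy as np

def suggested_scale_pos_weight(y):
    """Ratio of negative (0) to positive (1) samples, a common starting value."""
    y = np.asarray(y)
    n_pos = np.sum(y == 1)
    n_neg = np.sum(y == 0)
    if n_pos == 0:
        raise ValueError("no positive samples in y")
    return n_neg / n_pos

# Hypothetical training labels with a 9:1 imbalance
y_train = np.array([0] * 90 + [1] * 10)
ratio = suggested_scale_pos_weight(y_train)
```

Assuming xgboost is installed, the value would then be passed as `XGBClassifier(scale_pos_weight=ratio)`. Treat it as a starting point to tune, not a fixed rule.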

Code where the positive class must be chosen carefully

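```python
import shap

# 1. Predicted class probabilities
# Column 1 holds the probability of model.classes_[1] (class 1 for 0/1 labels)
y_proba = model.predict_proba(X_test)[:, 1]

# 2. SHAP values for a classification model
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)
# When a per-class list is returned (depends on the model type and shap version),
# pick the entry for the class you treat as positive
shap_values[1]

# index 0: class 0 treated as positive
# index 1: class 1 treated as positive
```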
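Which column is "positive" in `predict_proba` follows the model's `classes_` attribute (sorted labels in scikit-learn). A short sketch, using hypothetical string labels so the order is not obvious, that looks up the positive column instead of hard-coding `[:, 1]`:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data with string labels
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array(["churn", "churn", "churn", "stay", "stay", "stay"])

model = LogisticRegression().fit(X, y)
# predict_proba columns follow model.classes_ (sorted: "churn" first, then "stay")

# Look up the column of the class we want to treat as positive
positive_label = "churn"
pos_idx = list(model.classes_).index(positive_label)
p_positive = model.predict_proba(X)[:, pos_idx]
```

Here `[:, 1]` would silently give P("stay"), so explicitly indexing by label avoids that mix-up.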

cross validation

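For classifiers, scikit-learn's cross-validation utilities (e.g. cross_val_score) and GridSearchCV automatically use stratified k-fold when cv is an integer or left at its default.
For regressors, they use plain KFold.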
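scikit-learn's `check_cv` helper is what turns an integer `cv` into a splitter, and it can be called directly to see the choice being made. The targets below are hypothetical.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold, check_cv

y_class = np.array([0, 1] * 10)    # binary target -> classification
y_reg = np.linspace(0.0, 1.0, 20)  # continuous target -> regression

# classifier=True with a binary/multiclass y gives StratifiedKFold; otherwise KFold
cv_clf = check_cv(5, y_class, classifier=True)
cv_reg = check_cv(5, y_reg, classifier=False)
print(type(cv_clf).__name__, type(cv_reg).__name__)  # StratifiedKFold KFold
```

Note that neither splitter shuffles by default; pass e.g. `StratifiedKFold(n_splits=5, shuffle=True, random_state=0)` as `cv` if shuffling is wanted.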
