ROC 곡선
만약 완벽하게 분류했다면
적당히 잘했다면
분류 성능이 나쁘다면
방금 전 예제에서의 ROC 곡선
AUC
import pandas as pd
red_url = 'https://raw.githubusercontent.com/PinkWink/ML_tutorial/master/dataset/winequality-red.csv'
white_url = 'https://raw.githubusercontent.com/PinkWink/ML_tutorial/master/dataset/winequality-white.csv'
red_wine = pd.read_csv(red_url, sep=';')
white_wine = pd.read_csv(white_url, sep=';')
red_wine['color'] = 1.
white_wine['color'] = 0.
wine = pd.concat([red_wine, white_wine])
wine['taste'] = [1. if grade > 5 else 0. for grade in wine['quality']]
X = wine.drop(['taste', 'quality'], axis=1)
y = wine['taste']
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=13)
wine_tree = DecisionTreeClassifier(max_depth=2, random_state=13)
wine_tree.fit(X_train, y_train)
y_pred_tr = wine_tree.predict(X_train)
y_pred_test = wine_tree.predict(X_test)
print('Tran Acc : ', accuracy_score(y_train, y_pred_tr))
print('Test Acc : ', accuracy_score(y_test, y_pred_test))
from sklearn.metrics import accuracy_score, precision_score
from sklearn.metrics import recall_score, f1_score
from sklearn.metrics import roc_auc_score, roc_curve
print('Accuracy :', accuracy_score(y_test, y_pred_test))
print('Recall :', recall_score(y_test, y_pred_test))
print('Precision :', precision_score(y_test, y_pred_test))
print('AUC score :', roc_auc_score(y_test, y_pred_test))
print('F1 score :', f1_score(y_test, y_pred_test))
import matplotlib.pyplot as plt
pred_prob = wine_tree.predict_proba(X_test)[:,1]
fpr, tpr, thresholds = roc_curve(y_test, pred_prob)
plt.figure(figsize=(10,8))
plt.plot([0,1],[0,1],'r',ls='dashed')
plt.plot(fpr,tpr)
plt.grid()
plt.show()
유익한 글 잘 봤습니다, 감사합니다.