Evaluating the Performance of Classification Models

Yerin · March 30, 2024

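In the binary classification setting covered here, both the actual values and the predicted values are 0 or 1, so the task is to predict whether each sample is 0 or 1.

The more predictions that exactly match the actual values, the better the model.

Model performance is therefore evaluated by the proportion of correct predictions.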

1) Confusion Matrix

# Import the module
from sklearn.metrics import confusion_matrix

# Evaluate performance (rows: actual classes, columns: predicted classes)
print(confusion_matrix(y_test, y_pred))

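# Visualize the confusion matrix
import matplotlib.pyplot as plt
import seaborn as sns
plt.figure(figsize=(5, 2))
sns.heatmap(confusion_matrix(y_test, y_pred),
            annot=True,
            fmt='d',  # show the counts as plain integers
            square=True)
plt.show()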
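
To make the layout of the matrix concrete, here is a minimal sketch on toy labels. The `y_test` and `y_pred` lists below are made up for illustration and are not the data used in this post. For binary 0/1 labels, `ravel()` flattens the 2x2 matrix in the order TN, FP, FN, TP.

```python
from sklearn.metrics import confusion_matrix

# Toy labels (hypothetical, for illustration only)
y_test = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

cm = confusion_matrix(y_test, y_pred)

# Rows are actual classes, columns are predicted classes (label order: 0, 1),
# so flattening the matrix yields TN, FP, FN, TP
tn, fp, fn, tp = cm.ravel()

print(cm)
print('TN:', tn, 'FP:', fp, 'FN:', fn, 'TP:', tp)
```

These four counts are the building blocks of every metric that follows.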

2) Accuracy

$\large Accuracy = \frac{TP+TN}{TP+TN+FP+FN}$

# Import the module
from sklearn.metrics import accuracy_score

# Evaluate performance
print('Accuracy: ', accuracy_score(y_test, y_pred))

# For scikit-learn classifiers, score() returns the same accuracy value
model.score(x_test, y_test)
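
As a quick check of the formula, the sketch below computes accuracy by hand from the confusion matrix counts and compares it with `accuracy_score`. The labels are toy values assumed for illustration only.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Toy labels (hypothetical, for illustration only)
y_test = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()

# Accuracy = (TP + TN) / (TP + TN + FP + FN)
manual_accuracy = (tp + tn) / (tp + tn + fp + fn)
sklearn_accuracy = accuracy_score(y_test, y_pred)

print('Accuracy by hand:    ', manual_accuracy)
print('Accuracy by sklearn: ', sklearn_accuracy)
```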

3) Precision

$\large Precision = \frac{TP}{TP+FP}$

# Import the module
from sklearn.metrics import precision_score

# Evaluate performance
print("Precision: ", precision_score(y_test, y_pred)) # Precision for class 1
print("Precision: ", precision_score(y_test, y_pred, average='binary')) # Precision for class 1 ('binary' is the default, so this matches the call above)
print("Precision: ", precision_score(y_test, y_pred, average=None)) # Precision for class 0 and for class 1, returned together
print("Precision: ", precision_score(y_test, y_pred, average='macro')) # Unweighted mean of the per-class precisions
print("Precision: ", precision_score(y_test, y_pred, average='weighted')) # Mean of the per-class precisions, weighted by the number of actual samples in each class
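
The `average` options are easier to compare when the averages are reproduced by hand. The following is a minimal sketch on toy labels (assumed for illustration, not the data from this post): it takes the per-class precisions from `average=None` and rebuilds the `'macro'` and `'weighted'` values with NumPy.

```python
import numpy as np
from sklearn.metrics import precision_score

# Toy labels (hypothetical, for illustration only)
y_test = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

# Per-class precision: [precision for class 0, precision for class 1]
per_class = precision_score(y_test, y_pred, average=None)

# Number of actual samples in each class (the "support")
support = np.bincount(y_test)

manual_macro = per_class.mean()                            # plain mean
manual_weighted = np.average(per_class, weights=support)   # support-weighted mean

print('Per class:', per_class)
print('Macro:    ', manual_macro, precision_score(y_test, y_pred, average='macro'))
print('Weighted: ', manual_weighted, precision_score(y_test, y_pred, average='weighted'))
```

When the classes are imbalanced, the macro and weighted averages diverge, because the weighted average leans toward the class with more samples.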

4) Recall

$\large Recall = \frac{TP}{TP+FN}$

# Import the module
from sklearn.metrics import recall_score

# Evaluate performance (average=None returns the recall of each class)
print('Recall: ', recall_score(y_test, y_pred, average=None))
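
Per-class recall can also be read straight off the confusion matrix: each row holds one actual class, so the diagonal entry divided by the row sum is TP / (TP + FN) for that class. A minimal sketch on toy labels (assumed for illustration only):

```python
import numpy as np
from sklearn.metrics import confusion_matrix, recall_score

# Toy labels (hypothetical, for illustration only)
y_test = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

cm = confusion_matrix(y_test, y_pred)

# Diagonal = correctly predicted samples per class,
# row sum = all actual samples of that class
manual_recall = cm.diagonal() / cm.sum(axis=1)
sklearn_recall = recall_score(y_test, y_pred, average=None)

print('Recall by hand:    ', manual_recall)
print('Recall by sklearn: ', sklearn_recall)
print('Match:', np.allclose(manual_recall, sklearn_recall))
```

Dividing by the column sums (`cm.sum(axis=0)`) instead gives the per-class precision.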

5) F1-Score

$\large F1 = \frac{2\times Precision\times Recall}{Precision+Recall}$

# Import the module
from sklearn.metrics import f1_score

# Evaluate performance (average=None returns the F1-score of each class)
print('F1-score: ', f1_score(y_test, y_pred, average=None))
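
The formula can be checked by combining precision and recall directly. This sketch uses toy labels (assumed for illustration only) and the default `average='binary'`, so every value refers to class 1.

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Toy labels (hypothetical, for illustration only)
y_test = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

precision = precision_score(y_test, y_pred)  # TP / (TP + FP) = 3 / 5
recall = recall_score(y_test, y_pred)        # TP / (TP + FN) = 3 / 4

# F1 is the harmonic mean of precision and recall
manual_f1 = 2 * precision * recall / (precision + recall)
sklearn_f1 = f1_score(y_test, y_pred)

print('F1 by hand:    ', manual_f1)
print('F1 by sklearn: ', sklearn_f1)
```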

6) Classification Report

# Import the module
from sklearn.metrics import classification_report

# Evaluate performance (precision, recall, F1-score, and support per class, plus accuracy)
print(classification_report(y_test, y_pred))
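
If the numbers are needed in code rather than as printed text, `classification_report` can return a dictionary through `output_dict=True`. A minimal sketch on toy labels (assumed for illustration only); with integer labels, the class keys are the label values as strings.

```python
from sklearn.metrics import classification_report

# Toy labels (hypothetical, for illustration only)
y_test = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

# output_dict=True returns nested dictionaries instead of a formatted string
report = classification_report(y_test, y_pred, output_dict=True)

print('Precision (class 1):', report['1']['precision'])
print('Recall (class 1):   ', report['1']['recall'])
print('Accuracy:           ', report['accuracy'])
print('Macro avg F1:       ', report['macro avg']['f1-score'])
```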
