(6-4) Machine Learning Basics - Linear Classification (2) + Practice

Yongjoo Lee · January 14, 2021

linear classification linear model machine learning

Programmers AI Dev Course


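Linear Classification

Probabilistic discriminative models

Let us compute the gradient of the error function with respect to $\bold w$. With $y_n=\sigma(a_n)$ and $a_n=\bold w^T\phi_n$, define the error for a single sample as

E_n(\bold w)=-\{t_n\ln y_n+(1-t_n)\ln(1-y_n)\}

Then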

\nabla E({\bf w}) = \sum_{n=1}^N \nabla E_n({\bf w})

\begin{aligned}\nabla E_n({\bf w}) &= \frac{\partial E_n({\bf w})}{\partial y_n}\frac{\partial y_n}{\partial a_n}\triangledown a_n\\&= \left\{ \frac{1-t_n}{1-y_n} - \frac{t_n}{y_n}\right\} y_n(1-y_n)\phi_n\\&= (y_n - t_n)\phi_n\end{aligned}

\nabla E({\bf w}) = \sum_{n=1}^N (y_n - t_n)\phi_n=\varPhi^T(y-t)
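The closed form $\varPhi^T(y-t)$ can be checked numerically. The following is a minimal sketch, assuming toy random data and helper names (`cross_entropy`, `analytic_gradient`, `numerical_gradient`) that are made up for illustration; it compares the formula against central finite differences.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def cross_entropy(Phi, t, w):
    """E(w) = -sum_n { t_n ln y_n + (1 - t_n) ln(1 - y_n) }."""
    y = sigmoid(Phi @ w)
    return -np.sum(t * np.log(y) + (1 - t) * np.log(1 - y))

def analytic_gradient(Phi, t, w):
    """Closed-form gradient Phi^T (y - t)."""
    return Phi.T @ (sigmoid(Phi @ w) - t)

def numerical_gradient(Phi, t, w, h=1e-6):
    """Central finite differences, one coordinate of w at a time."""
    grad = np.zeros_like(w)
    for i in range(w.size):
        step = np.zeros_like(w)
        step[i] = h
        grad[i] = (cross_entropy(Phi, t, w + step)
                   - cross_entropy(Phi, t, w - step)) / (2 * h)
    return grad

# Toy data (an assumption for this check, not the data used later in the post)
rng = np.random.default_rng(0)
Phi = rng.normal(size=(20, 3))                    # 20 samples, 3 features
t = rng.integers(0, 2, size=20).astype(float)     # binary targets
w = rng.normal(size=3)

max_diff = np.max(np.abs(analytic_gradient(Phi, t, w)
                         - numerical_gradient(Phi, t, w)))
print(max_diff)  # should be tiny if the derivation is right
```

If the two gradients disagreed by more than rounding error, it would point to a mistake in the chain-rule steps above.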

Multiclass logistic regression

p(\mathcal{C}_k|\phi) = y_k(\phi) = \frac{\exp(a_k)}{\sum_j \exp(a_j)}

a_k = {\bf w}_k^T \phi

Likelihood function

The target vector $\bold t_n$ for the feature vector $\phi_n$ uses 1-of-K encoding: the element corresponding to the class is 1 and all other elements are 0.

p({\bf T}|{\bf w}_1,...{\bf w}_K) = \prod_{n=1}^{N}\prod_{k=1}^{K} p(\mathcal{C}_k|\phi_n)^{t_{nk}} = \prod_{n=1}^{N}\prod_{k=1}^{K}y_{nk}^{t_{nk}}

Here $y_{nk}=y_k(\phi_n)$, and $\bold T$ is the $N\times K$ matrix whose elements are $t_{nk}$.

(Example)

T=\begin{bmatrix}1&0&0\\0&0&1\end{bmatrix}\\\;\\p(\bold T|\bold{w_1,w_2,w_3})=(y_{11}^1\;y_{12}^0\;y_{13}^0)\times(y_{21}^0\;y_{22}^0\;y_{23}^1)\\\;\\p(C_1|\phi_1)=y_{11}\hspace{2em}p(C_3|\phi_2)=y_{23}

Therefore

y_{nk}=p(C_k|\phi_n)
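The example can be reproduced in a few lines. This is a minimal sketch, assuming hypothetical activation values `A` (they are not from the post) and a row-wise `softmax` helper; it shows that the double product over $n$ and $k$ only keeps the probability of the correct class for each sample.

```python
import numpy as np

def softmax(A):
    """Row-wise softmax; subtracting the row max avoids overflow in exp."""
    A = A - A.max(axis=1, keepdims=True)
    expA = np.exp(A)
    return expA / expA.sum(axis=1, keepdims=True)

# Hypothetical activations a_{nk} for N=2 samples and K=3 classes
A = np.array([[2.0, 0.5, -1.0],
              [0.0, 1.0, 3.0]])
# Same target matrix as in the example: sample 1 is class 1, sample 2 is class 3
T = np.array([[1, 0, 0],
              [0, 0, 1]])

Y = softmax(A)
likelihood = np.prod(Y ** T)     # prod_n prod_k y_nk^{t_nk}
picked = Y[0, 0] * Y[1, 2]       # y_11 * y_23
print(likelihood, picked)        # the two values agree
```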

Negative log-likelihood

E({\bf w}_1, ..., {\bf w}_K) = -\ln p({\bf T}|{\bf w}_1, ...,{\bf w}_K) = - \sum_{n=1}^{N} \sum_{k=1}^{K} t_{nk}\ln(y_{nk})

We now compute the gradient with respect to ${\bf w}_j$. First, define the error for a single sample $\phi_n$ as

E_n({\bf w}_1,\ldots,{\bf w}_K) = -\sum_{k=1}^{K} t_{nk}\ln(y_{nk})

so that

\nabla_{ {\bf w}_j }E({\bf w}_1, ...,{\bf w}_K) = \sum_{n=1}^{N}\nabla_{ {\bf w}_j }E_n({\bf w}_1, ...,{\bf w}_K)

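Note the following relationships between the functions.

$E_n$ depends on ${\bf w}_j$ only through $a_{nj}$ (for $k\neq j$, $a_{nk}$ is not a function of ${\bf w}_j$).
$E_n$ is a function of $y_{n1},\ldots,y_{nK}$.
$y_{nk}$ is a function of $a_{n1},\ldots,a_{nK}$.

In the derivation, $I_{kj}$ is 1 if $k=j$ and 0 otherwise, and $\sum_{k} t_{nk}=1$ because of the 1-of-K encoding.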

\begin{aligned}\nabla_{ {\bf w}_j }E_n &= \frac{\partial E_n}{\partial a_{nj}} \frac{\partial a_{nj}}{\partial {\bf w}_j}\\&= \frac{\partial E_n}{\partial a_{nj}}\phi_n\\&= \sum_{k=1}^K \left( \frac{\partial E_n}{\partial y_{nk}} \frac{\partial y_{nk}}{\partial a_{nj}} \right)\phi_n\\&= \phi_n \sum_{k=1}^K \left\{ -\frac{t_{nk}}{y_{nk}}y_{nk}(I_{kj}-y_{nj}) \right\}\\&= \phi_n \sum_{k=1}^K t_{nk}(y_{nj} - I_{kj})\\&= \phi_n \left( y_{nj}\sum_{k=1}^K t_{nk} - \sum_{k=1}^K t_{nk}I_{kj} \right)\\&= \phi_n (y_{nj} - t_{nj})\end{aligned}

Therefore

\nabla_{ {\bf w}_j }E({\bf w}_1, ...,{\bf w}_K) = \sum_{n=1}^{N} (y_{nj}-t_{nj})\phi_n

$\phi_n$ : the $n$-th input sample
$(y_{nj}-t_{nj})$ : the difference between the predicted value and the target value
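Stacking the $K$ gradients as columns gives the matrix form $\varPhi^T(Y-T)$. The sketch below checks this against finite differences; the toy data, the shapes `N, M, K`, and the helper names `error` and `gradient` are assumptions made for illustration.

```python
import numpy as np

def softmax(A):
    A = A - A.max(axis=1, keepdims=True)   # for numerical stability
    expA = np.exp(A)
    return expA / expA.sum(axis=1, keepdims=True)

def error(Phi, T, W):
    """E = -sum_n sum_k t_nk ln y_nk, with column j of W playing the role of w_j."""
    return -np.sum(T * np.log(softmax(Phi @ W)))

def gradient(Phi, T, W):
    """Column j is sum_n (y_nj - t_nj) phi_n."""
    return Phi.T @ (softmax(Phi @ W) - T)

rng = np.random.default_rng(1)
N, M, K = 15, 4, 3
Phi = rng.normal(size=(N, M))
T = np.eye(K)[rng.integers(0, K, size=N)]   # random 1-of-K targets
W = rng.normal(size=(M, K))

# Central finite differences over every entry of W
h = 1e-6
numeric = np.zeros_like(W)
for m in range(M):
    for j in range(K):
        step = np.zeros_like(W)
        step[m, j] = h
        numeric[m, j] = (error(Phi, T, W + step) - error(Phi, T, W - step)) / (2 * h)

max_diff = np.max(np.abs(gradient(Phi, T, W) - numeric))
print(max_diff)
```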

(Practice)

Gradient Descent (batch)

*batch: every update uses the entire dataset

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
import seaborn as sns

X, t = make_classification(n_samples=500, n_features=2, n_redundant=0, n_informative=1,
                             n_clusters_per_class=1, random_state=14)

t = t[:,np.newaxis]

sns.set_style('white')
sns.scatterplot(x=X[:,0], y=X[:,1], hue=t.reshape(-1));

![](https://velog.velcdn.com/images%2Fleeyongjoo%2Fpost%2F92531599-e8db-4d2a-bdd0-b1da75b1ff4e%2Fimage.png)

Given data that falls into two groups like this, we will train a logistic regression model and see how well it separates them.

📌 Define the sigmoid function

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

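📌 Define the compute_cost function

This function computes the cost (error function) for a given w, input X, and target t.

E_n(\bold w)=-\{t_n\ln y_n+(1-t_n)\ln(1-y_n)\}

The code sums $E_n$ over all samples and multiplies by 1/N, i.e. it returns the mean error.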

def compute_cost(X, t, w):
    N = len(t)
    h = sigmoid(X @ w)  # predicted probabilities y_n
    epsilon = 1e-5      # keeps the arguments of log() away from 0
    cost = (1/N)*(((-t).T @ np.log(h + epsilon))-((1-t).T @ np.log(1-h + epsilon)))
    return cost
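As an aside, the same mean cross-entropy can be written directly in terms of the logits $a = Xw$, which avoids the epsilon. This is a sketch of an alternative, not the function used in the rest of this post; the name `compute_cost_stable` and the small arrays are made up for illustration. It relies on the identity $-[t\ln\sigma(a)+(1-t)\ln(1-\sigma(a))]=\ln(1+e^{a})-ta$.

```python
import numpy as np

def compute_cost_stable(X, t, w):
    """Mean cross-entropy from the logits a = X @ w, using log(1 + e^a) - t*a."""
    a = X @ w
    return float(np.mean(np.logaddexp(0.0, a) - t * a))

# Small made-up example: bias column plus one feature
X = np.array([[1.0, 0.5], [1.0, -2.0], [1.0, 3.0]])
t = np.array([[1.0], [0.0], [1.0]])
w = np.array([[0.1], [0.8]])

# Reference value from the textbook formula (no epsilon)
y = 1 / (1 + np.exp(-(X @ w)))
reference = float(np.mean(-(t * np.log(y) + (1 - t) * np.log(1 - y))))
stable = compute_cost_stable(X, t, w)

# With very large weights the logit form still returns a finite number
w_big = np.array([[0.0], [1000.0]])
stable_big = compute_cost_stable(X, t, w_big)
print(stable, reference, stable_big)
```

`np.logaddexp(0, a)` evaluates $\ln(e^0+e^a)$ without overflowing, which is why no epsilon is needed here.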

📌 Define the gradient_descent function

learning_rate : step size of each update
iterations : number of updates to perform

\nabla E({\bf w}) = \sum_{n=1}^N (y_n - t_n)\phi_n=\varPhi^T(y-t)

def gradient_descent(X, t, w, learning_rate, iterations):
    N = len(t)
    cost_history = np.zeros((iterations,1))

    for i in range(iterations):
        w = w - (learning_rate/N) * (X.T @ (sigmoid(X @ w) - t))
        cost_history[i] = compute_cost(X, t, w)

    return (cost_history, w)

📌 Define the predict function

Given input data and parameters w, it returns the predicted label y (the sigmoid output rounded to 0 or 1).

def predict(X, w):
    return np.round(sigmoid(X @ w))

N = len(t)

X = np.hstack((np.ones((N,1)),X))
M = np.size(X,1)
w = np.zeros((M,1))

iterations = 1000
learning_rate = 0.01

initial_cost = compute_cost(X, t, w)

print("Initial Cost is: {} \n".format(initial_cost))
# Initial Cost is: [[0.69312718]]

(cost_history, w_optimal) = gradient_descent(X, t, w, learning_rate, iterations)

print("Optimal Parameters are: \n", w_optimal, "\n")
# Optimal Parameters are: 
#  [[-0.07024012]
#  [ 1.9275589 ]
#  [ 0.02285894]]

plt.figure()
sns.set_style('white')
plt.plot(range(len(cost_history)), cost_history, 'r')
plt.title("Convergence Graph of Cost Function")
plt.xlabel("Number of Iterations")
plt.ylabel("Cost")
plt.show()

![](https://velog.velcdn.com/images%2Fleeyongjoo%2Fpost%2F18f377ef-fafc-4028-b07c-8ff8688bd853%2Fimage.png)

👉 The cost decreases as the updates proceed

Computing accuracy

Measure how often the predictions match the targets.

## Accuracy

y_pred = predict(X, w_optimal)
score = float(np.mean(y_pred == t))  # fraction of predictions equal to the target

print(score)
# 0.954

👉 About 95% of the predictions match the targets

# Decision boundary: w0 + w1*x1 + w2*x2 = 0  ->  x2 = -(w1/w2)*x1 - w0/w2
slope = -(w_optimal[1] / w_optimal[2])
intercept = -(w_optimal[0] / w_optimal[2])

sns.set_style('white')
sns.scatterplot(x=X[:,1], y=X[:,2], hue=t.reshape(-1))

ax = plt.gca()
ax.autoscale(False)
x_vals = np.array(ax.get_xlim())
y_vals = intercept + (slope * x_vals)
plt.plot(x_vals, y_vals, c="k");

![](https://velog.velcdn.com/images%2Fleeyongjoo%2Fpost%2Fc574ead2-e2ed-4547-947e-244656a53629%2Fimage.png)
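The line drawn in this plot is the set of points where the model outputs a probability of exactly 0.5. As a short derivation, assuming the columns of X are ordered as bias, $x_1$, $x_2$ (as built with `np.hstack` earlier) and that $w_2 \neq 0$:

```latex
\begin{aligned}
\sigma(a) = \tfrac{1}{2}
  &\iff a = w_0 + w_1 x_1 + w_2 x_2 = 0 \\
  &\iff x_2 = -\frac{w_1}{w_2}\,x_1 - \frac{w_0}{w_2}
\end{aligned}
```

So the slope is $-w_1/w_2$ and the intercept is $-w_0/w_2$, where all three weights are the learned ones.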

Stochastic Gradient Descent

📌 Define the sgd (Stochastic Gradient Descent) function

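def sgd(X, t, w, learning_rate, iterations):
    N = len(t)
    cost_history = np.zeros((iterations,1))

    for i in range(iterations):
        n = i % N  # index of the sample used in this update (cycles through the data)
        w = w - learning_rate * (X[n, np.newaxis].T * (sigmoid(X[n] @ w) - t[n]))
        # store by iteration i, not by sample n, so every entry of cost_history is filled
        cost_history[i] = compute_cost(X[n], t[n], w)

    return (cost_history, w)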

X, t = make_classification(n_samples=500, n_features=2, n_redundant=0, n_informative=1,
                             n_clusters_per_class=1, random_state=14)

t = t[:,np.newaxis]

N = len(t)

X = np.hstack((np.ones((N,1)),X))
M = np.size(X,1)
w = np.zeros((M,1))

iterations = 2000
learning_rate = 0.01

initial_cost = compute_cost(X, t, w)

print("Initial Cost is: {} \n".format(initial_cost))
# Initial Cost is: [[0.69312718]]

(cost_history, w_optimal) = sgd(X, t, w, learning_rate, iterations)

print("Optimal Parameters are: \n", w_optimal, "\n")
# Optimal Parameters are: 
#  [[-0.19304782]
#  [ 2.5431236 ]
#  [ 0.01130098]]

plt.figure()
sns.set_style('white')
plt.plot(range(len(cost_history)), cost_history, 'r')
plt.title("Convergence Graph of Cost Function")
plt.xlabel("Number of Iterations")
plt.ylabel("Cost")
plt.show()

![](https://velog.velcdn.com/images%2Fleeyongjoo%2Fpost%2Fb4c8dd09-ab06-4137-8624-f5c296001110%2Fimage.png)

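👉 With SGD the cost fluctuates a lot, especially early on

Because the parameters are updated after every single sample, an update that helps one sample can make the cost worse for other samples.
Each recorded value is also the cost on one sample only, so the curve is noisy. Even so, the cost tends to become small by the end of training.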
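To see the overall trend more clearly, one option is to log the cost on the full dataset once per epoch instead of the cost on a single sample. The following is a sketch of such a variant, not the `sgd` function used in this post: the name `sgd_epochs`, the per-epoch reshuffling, and the synthetic two-cluster data are all assumptions made for illustration.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def full_cost(X, t, w):
    """Mean cross-entropy over all samples, computed from the logits."""
    a = X @ w
    return float(np.mean(np.logaddexp(0.0, a) - t * a))

def sgd_epochs(X, t, w, learning_rate, epochs, seed=0):
    """SGD variant: reshuffle the sample order every epoch, log full-data cost per epoch."""
    rng = np.random.default_rng(seed)
    N = len(t)
    cost_history = []
    for _ in range(epochs):
        for n in rng.permutation(N):
            x_n = X[n:n + 1]                                   # shape (1, M)
            w = w - learning_rate * x_n.T @ (sigmoid(x_n @ w) - t[n:n + 1])
        cost_history.append(full_cost(X, t, w))
    return np.array(cost_history), w

# Synthetic data: two Gaussian clusters plus a bias column
rng = np.random.default_rng(42)
x0 = rng.normal(loc=-1.5, size=(100, 2))
x1 = rng.normal(loc=+1.5, size=(100, 2))
X = np.hstack([np.ones((200, 1)), np.vstack([x0, x1])])
t = np.vstack([np.zeros((100, 1)), np.ones((100, 1))])

cost_history, w_sgd = sgd_epochs(X, t, np.zeros((3, 1)), learning_rate=0.01, epochs=5)
print(cost_history)
```

Each update is still a single-sample step, so the parameters follow the same kind of noisy path; only what gets logged changes.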

Computing accuracy

## Accuracy

y_pred = predict(X, w_optimal)
score = float(np.mean(y_pred == t))  # fraction of predictions equal to the target

print(score)
# 0.96

Mini-batch Gradient Descent

📌 Define the mini-batch gradient descent function

The batch size is passed in as a parameter.

def batch_gd(X, t, w, learning_rate, iterations, batch_size):
    N = len(t)
    cost_history = np.zeros((iterations,1))
    shuffled_indices = np.random.permutation(N)
    X_shuffled = X[shuffled_indices]
    t_shuffled = t[shuffled_indices]

    for i in range(iterations):
        # first row of this batch; kept separate from i so cost_history[i] is always filled
        # (start advances by one row per iteration, so consecutive batches overlap)
        start = i % N
        X_batch = X_shuffled[start:start+batch_size]
        t_batch = t_shuffled[start:start+batch_size]
        # if the batch runs past the end of the data, fill it up from the beginning
        if X_batch.shape[0] < batch_size:
            X_batch = np.vstack((X_batch, X_shuffled[:(batch_size - X_batch.shape[0])]))
            t_batch = np.vstack((t_batch, t_shuffled[:(batch_size - t_batch.shape[0])]))
        w = w - (learning_rate/batch_size) * (X_batch.T @ (sigmoid(X_batch @ w) - t_batch))
        cost_history[i] = compute_cost(X_batch, t_batch, w)

    return (cost_history, w)

X, t = make_classification(n_samples=500, n_features=2, n_redundant=0, n_informative=1,
                             n_clusters_per_class=1, random_state=14)

t = t[:,np.newaxis]

N = len(t)

X = np.hstack((np.ones((N,1)),X))
M = np.size(X,1)
w = np.zeros((M,1))

iterations = 1000
learning_rate = 0.01

initial_cost = compute_cost(X, t, w)

print("Initial Cost is: {} \n".format(initial_cost))
# Initial Cost is: [[0.69312718]]

(cost_history, w_optimal) = batch_gd(X, t, w, learning_rate, iterations, 32)

print("Optimal Parameters are: \n", w_optimal, "\n")
# Optimal Parameters are: 
#  [[-0.06983134]
#  [ 1.92943764]
#  [ 0.01000487]]

plt.figure()
sns.set_style('white')
plt.plot(range(len(cost_history)), cost_history, 'r')
plt.title("Convergence Graph of Cost Function")
plt.xlabel("Number of Iterations")
plt.ylabel("Cost")
plt.show()

![](https://velog.velcdn.com/images%2Fleeyongjoo%2Fpost%2F85b04817-a077-494f-b862-26bc0392e7d7%2Fimage.png)

Computing accuracy

## Accuracy

y_pred = predict(X, w_optimal)
score = float(np.mean(y_pred == t))  # fraction of predictions equal to the target

print(score)
# 0.954

👉 The score is the same as for the first Gradient Descent (batch) run

🔥

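In deep learning, the most widely used method is Mini-batch Gradient Descent
- It neither looks at a single sample nor at the entire dataset (most datasets are large)
- A suitable number of samples is grouped into one batch for each training step
  - When the model is complex, a large batch size makes each step slow, which is inefficient!
  - In that case there is no choice but to use a smaller batch size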
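A common way to form mini-batches is to shuffle once per epoch and then cut the data into non-overlapping chunks. The sketch below illustrates that idea under stated assumptions: the generator name `iterate_minibatches` and the tiny arrays are hypothetical, and this is a different batching scheme from the sliding window used in `batch_gd`.

```python
import numpy as np

def iterate_minibatches(X, t, batch_size, rng):
    """Yield disjoint mini-batches that together cover one shuffled epoch."""
    order = rng.permutation(len(t))
    for start in range(0, len(t), batch_size):
        idx = order[start:start + batch_size]
        yield X[idx], t[idx]

rng = np.random.default_rng(0)
X = np.arange(20, dtype=float).reshape(10, 2)   # 10 samples, 2 features
t = np.arange(10).reshape(10, 1)                # use the row index as a label

batch_sizes = [len(t_b) for _, t_b in iterate_minibatches(X, t, batch_size=4, rng=rng)]
seen = np.sort(np.concatenate(
    [t_b.ravel() for _, t_b in iterate_minibatches(X, t, batch_size=4, rng=rng)]))

print(batch_sizes)  # [4, 4, 2]  (the last batch is smaller)
print(seen)         # every sample appears exactly once per epoch
```

The last batch is simply shorter here; padding it or dropping it are other common choices.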

(Practice) Classifying MNIST data

Setup

# Python ≥3.5 is required
import sys
assert sys.version_info >= (3, 5)

# Scikit-Learn ≥0.20 is required
import sklearn
assert sklearn.__version__ >= "0.20"

# Common imports
import numpy as np
import os

# to make this notebook's output stable across runs
np.random.seed(42)

# To plot pretty figures
%matplotlib inline
import matplotlib as mpl
import matplotlib.pyplot as plt
mpl.rc('axes', labelsize=14)
mpl.rc('xtick', labelsize=12)
mpl.rc('ytick', labelsize=12)

# Where to save the figures
PROJECT_ROOT_DIR = "."
CHAPTER_ID = "classification"
IMAGES_PATH = os.path.join(PROJECT_ROOT_DIR, "images", CHAPTER_ID)
os.makedirs(IMAGES_PATH, exist_ok=True)

def save_fig(fig_id, tight_layout=True, fig_extension="png", resolution=300):
    path = os.path.join(IMAGES_PATH, fig_id + "." + fig_extension)
    print("Saving figure", fig_id)
    if tight_layout:
        plt.tight_layout()
    plt.savefig(path, format=fig_extension, dpi=resolution)

MNIST data

from sklearn.datasets import fetch_openml
mnist = fetch_openml('mnist_784', version=1, cache=True)

mnist.keys()
# dict_keys(['data', 'target', 'frame', 'categories', 'feature_names', 'target_names', 'DESCR', 'details', 'url'])

X, y = mnist["data"], mnist["target"]
X.shape
# (70000, 784)

X.values
# array([[0., 0., 0., ..., 0., 0., 0.],
#        [0., 0., 0., ..., 0., 0., 0.],
#        [0., 0., 0., ..., 0., 0., 0.],
#        ...,
#        [0., 0., 0., ..., 0., 0., 0.],
#        [0., 0., 0., ..., 0., 0., 0.],
#        [0., 0., 0., ..., 0., 0., 0.]])

Looking at the third sample in X

import matplotlib as mpl
import matplotlib.pyplot as plt

some_digit = X.loc[2].values
some_digit_image = some_digit.reshape(28, 28)
plt.imshow(some_digit_image, cmap=mpl.cm.binary)
plt.axis("off")

save_fig("some_digit_plot")
plt.show()

![](https://velog.velcdn.com/images%2Fleeyongjoo%2Fpost%2Fba0bfdb7-e4dc-4f98-916c-d33f236d7b06%2Fimage.png)

some_digit

![](https://velog.velcdn.com/images%2Fleeyongjoo%2Fpost%2Fe35efbf0-c0c8-4e79-bbcb-d9fa65ff8592%2Fimage.png)

👉 0 corresponds to white pixels, and nonzero values to dark pixels

Converting y (category with string labels → uint8)

y
# 0        5
# 1        0
# 2        4
# 3        1
# 4        9
#         ..
# 69995    2
# 69996    3
# 69997    4
# 69998    5
# 69999    6
# Name: class, Length: 70000, dtype: category
# Categories (10, object): ['0', '1', '2', '3', ..., '6', '7', '8', '9']

y = y.astype(np.uint8)
y
# 0        5
# 1        0
# 2        4
# 3        1
# 4        9
#         ..
# 69995    2
# 69996    3
# 69997    4
# 69998    5
# 69999    6
# Name: class, Length: 70000, dtype: uint8

Plotting several samples at once

def plot_digit(data):
    image = data.reshape(28, 28)
    plt.imshow(image, cmap = mpl.cm.binary,
               interpolation="nearest")
    plt.axis("off")

def plot_digits(instances, images_per_row=10, **options):
    size = 28
    images_per_row = min(len(instances), images_per_row)
    images = [instance.reshape(size,size) for instance in instances]
    n_rows = (len(instances) - 1) // images_per_row + 1
    row_images = []
    n_empty = n_rows * images_per_row - len(instances)
    images.append(np.zeros((size, size * n_empty)))
    for row in range(n_rows):
        rimages = images[row * images_per_row : (row + 1) * images_per_row]
        row_images.append(np.concatenate(rimages, axis=1))
    image = np.concatenate(row_images, axis=0)
    plt.imshow(image, cmap = mpl.cm.binary, **options)
    plt.axis("off")

plt.figure(figsize=(9,9))
example_images = X[:100].values
plot_digits(example_images, images_per_row=10)
save_fig("more_digits_plot")
plt.show()

![](https://velog.velcdn.com/images%2Fleeyongjoo%2Fpost%2Fc7fa93d1-7a8e-41a4-b0a1-ec65efefc768%2Fimage.png)

y[0]
# 5

👉 The first sample is a 5.

Splitting into training and test data

X_train, X_test, y_train, y_test = X[:60000], X[60000:], y[:60000], y[60000:]

👉 The first 60,000 samples are training data and the remaining 10,000 are test data

Binary classifier

To simplify the problem, let us only identify the digit 5.

y_train_5 = (y_train == 5)
y_test_5 = (y_test == 5)

y_train_5
# 0         True
# 1        False
# 2        False
# 3        False
# 4        False
#          ...  
# 59995    False
# 59996    False
# 59997     True
# 59998    False
# 59999    False
# Name: class, Length: 60000, dtype: bool

😀 Using a logistic regression model

from sklearn.linear_model import LogisticRegression
log_clf = LogisticRegression(random_state=0).fit(X_train, y_train_5)

log_clf.predict(X.iloc[:3])  # predictions for the first three samples
# array([ True, False, False])

Evaluating with cross-validation

from sklearn.model_selection import cross_val_score
cross_val_score(log_clf, X_train, y_train_5, cv=3, scoring="accuracy")
# array([0.97525, 0.9732 , 0.9732 ])

👉 Accuracy is above 97% on every cross-validation fold.

But is the model really that good?

Define Never5Classifier, a classifier that always predicts "not 5" (False), and evaluate it the same way

from sklearn.base import BaseEstimator

class Never5Classifier(BaseEstimator):
    def fit(self, X, y=None):
        return self  # nothing to learn; by scikit-learn convention fit returns the estimator
    def predict(self, X):
        return np.zeros(len(X), dtype=bool)

never_5_clf = Never5Classifier()
cross_val_score(never_5_clf, X_train, y_train_5, cv=3, scoring="accuracy")
# array([0.91125, 0.90855, 0.90915])

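👉 Its accuracy is above 90%, which does not look very different

The digit 5 accounts for roughly 10% of the data.
So a classifier that always answers "not 5" is right exactly as often as a sample is not a 5, which is about 90% of the time.

never_5_clf.predict(X)
# array([False, False, False, ..., False, False, False])

👉 Because only about 10% of the images are 5s, always predicting "not 5" gives about 90% accuracy.

💡 When the target classes are imbalanced, accuracy is not a good metric!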
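The effect is easy to reproduce without MNIST. Here is a minimal sketch on synthetic labels (the 10% positive rate and the variable names are assumptions chosen to mirror the situation above): a predictor that always says "negative" scores high on accuracy while finding none of the positives.

```python
import numpy as np

rng = np.random.default_rng(0)
y_true = rng.random(10_000) < 0.1      # roughly 10% positives
y_never = np.zeros_like(y_true)        # always predict negative (False)

accuracy = np.mean(y_never == y_true)
# fraction of the actual positives that were found (recall)
found = np.sum(y_never & y_true) / np.sum(y_true)

print(accuracy)  # close to 0.9
print(found)     # 0.0
```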

Confusion matrix

Generating predictions

from sklearn.model_selection import cross_val_predict

y_train_pred = cross_val_predict(log_clf, X_train, y_train_5, cv=3)

y_train_pred.shape
# (60000,)

from sklearn.metrics import confusion_matrix

confusion_matrix(y_train_5, y_train_pred)
# array([[54038,   541],
#        [ 1026,  4395]], dtype=int64)

👉

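Rows are the target values
- The first row covers every case that is not a 5
- The second row covers the cases that are a 5
Columns are the model's predictions
- The first column covers the cases predicted as not 5
- The second column covers the cases predicted as 5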
![](https://velog.velcdn.com/images%2Fleeyongjoo%2Fpost%2F0b12cf95-3d62-4ff4-bfb8-fb340744656e%2F%EA%B7%B8%EB%A6%BC5.png)

$\text{precision} = \frac{TP}{TP+FP}$

$\text{recall} = \frac{TP}{TP+FN}$
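As a quick worked example, both quantities can be computed by hand from the confusion matrix shown earlier. This sketch only assumes scikit-learn's layout for a binary confusion matrix (rows are true classes, columns are predicted classes, so the flattened order is TN, FP, FN, TP).

```python
import numpy as np

# Confusion matrix of the logistic regression model from above
cm = np.array([[54038, 541],
               [1026, 4395]])
tn, fp, fn, tp = cm.ravel()

precision = tp / (tp + fp)   # of everything predicted as 5, how much really is 5
recall = tp / (tp + fn)      # of all real 5s, how many were found

print(round(precision, 4), round(recall, 4))  # 0.8904 0.8107
```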

confusion_matrix(y_train_5, y_train_pred)
# array([[54038,   541],
#        [ 1026,  4395]], dtype=int64)

from sklearn.metrics import precision_score, recall_score

precision_score(y_train_5, y_train_pred) # 4395/(4395+541)
# 0.8903970826580226

recall_score(y_train_5, y_train_pred) # 4395/(4395+1026)
# 0.8107360265633647

confusion_matrix(y_train_5, never_5_clf.predict(X)[:60000])
# array([[54579,     0],
#        [ 5421,     0]], dtype=int64)

precision_score(y_train_5, never_5_clf.predict(X)[:60000])
# 0.0

recall_score(y_train_5, never_5_clf.predict(X)[:60000])
# 0.0

👉

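For never_5_clf there are no positive predictions (and therefore no true positives), so both precision and recall are 0.
The two classifiers looked similar in terms of accuracy, but they differ greatly in precision and recall.

🔥 Relying on accuracy alone can give a misleading evaluation, so in most cases it is better to look at the confusion matrix

If a single number is required, use the F1 score,
or, when the target distribution is well balanced (around 50:50), accuracy is also acceptable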
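The F1 score is the harmonic mean of precision and recall. A small sketch using the counts from the logistic regression confusion matrix above (scikit-learn also provides `f1_score` in `sklearn.metrics`; here the value is computed by hand):

```python
# Counts taken from the confusion matrix of the logistic regression model
tp, fp, fn = 4395, 541, 1026

precision = tp / (tp + fp)
recall = tp / (tp + fn)

# Harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)
# Equivalent form written directly in terms of the counts
f1_counts = 2 * tp / (2 * tp + fp + fn)

print(f1, f1_counts)  # the two expressions agree
```

Because it is a harmonic mean, F1 stays low unless precision and recall are both reasonably high.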

Investigating error cases

errors = (y_train_pred != y_train_5)
errors
# 0        False
# 1        False
# 2        False
# 3        False
# 4        False
#          ...  
# 59995    False
# 59996    False
# 59997    False
# 59998    False
# 59999    False
# Name: class, Length: 60000, dtype: bool

Plot the error cases

For a model that decides whether a digit is a 5, an error is either
- a 5 that was classified as not 5, or
- a non-5 that was classified as 5

plt.figure(figsize=(9,9))
plot_digits(X_train[errors][:100].values, images_per_row=10)

save_fig("error_digits_plot")  # separate name so the earlier more_digits_plot image is not overwritten
plt.show()

![](https://velog.velcdn.com/images%2Fleeyongjoo%2Fpost%2F3a3ca9d0-61be-44e1-b8d8-082b5373a2a4%2Fimage.png)
