๐Ÿฆพ ๋จธ์‹ ๋Ÿฌ๋‹ ๐Ÿฆฟ

parkeuยท2022๋…„ 9์›” 20์ผ

ABC๋ถ€ํŠธ์บ ํ”„

๋ชฉ๋ก ๋ณด๊ธฐ
24/55

๐Ÿผ ์ง€๋„ํ•™์Šต

๋ถ„๋ฅ˜ : ์ด์ง„๋ถ„๋ฅ˜, ๋‹ค์ค‘๋ถ„๋ฅ˜ -> y, class, ๋ถ„๋ฅ˜, Type

์ด์ง„๋ถ„๋ฅ˜ : ์งˆ๋ฌธ์˜ ๋‹ต์ด ์˜ˆ/์•„๋‹ˆ์˜ค๋กœ๋งŒ ์–‘์„ฑํด๋ž˜์Šค/์Œ์„ฑํด๋ž˜์Šค
๋‹ค์ค‘๋ถ„๋ฅ˜ : ์…‹ ์ด์ƒ์˜ ํด๋ž˜์Šค๋กœ ๋ถ„๋ฅ˜
์ •๋‹ต์ง€ - y, class, label, ์ข…์†๋ณ€์ˆ˜, Target
์‹œํ—˜์ง€ - x, ๋…๋ฆฝ๋ณ€์ˆ˜, feature, Data

ํšŒ๊ท€ : ์—ฐ์†์ ์ธ ์ˆซ์ž -> ์ฃผ์‹, ์ง‘๊ฐ’, ๊ด€๊ฐ์ˆ˜ ...

๐Ÿ’ ์‹ค์Šต : Iris ํ’ˆ์ข…๋ถ„๋ฅ˜

  • ์„ธ๊ฐœ ์ค‘ ํ•˜๋‚˜ ์˜ˆ์ธก : ๋‹ค์ค‘๋ถ„๋ฅ˜
  • ๋ฌธ์ œ ์ •์˜ :
    ๋ถ“๊ฝƒ์˜ ํ’ˆ์ข…์„ ๋ถ„๋ฅ˜ -> 3๊ฐœ ํ’ˆ์ข… ์ค‘ ํ•˜๋‚˜ ์˜ˆ์ธกํ•˜๋Š” ๋‹ค์ค‘ ๋ถ„๋ฅ˜ ๋ฌธ์ œ๋กœ ์ •์˜
  • ๋ฌธ์ œ์ง‘ -> ๋ฐ์ดํ„ฐ, ํŠน์„ฑ(feature), ๋…๋ฆฝ๋ณ€์ˆ˜(x) : ๊ฝƒ์žŽ, ๊ฝƒ๋ฐ›์นจ์˜ ๊ธธ์ด(cm) 4๊ฐ€์ง€
  • ์ •๋‹ต -> ํด๋ž˜์Šค(class), ๋ ˆ์ด๋ธ”(label), ํƒ€๊นƒ(target), ์ข…์†๋ณ€์ˆ˜(y) : ๋ถ“๊ฝƒ์˜ ํ’ˆ์ข… (setosa, versicolor, vriginica)
  • sepal๊ณผ petal(๊ทธ๋ƒฅ๋‚ด๊ฐ€๋ชจ๋ฅด๊ฒ ใ…‡ใ…“์„œ...^^)
# ๋ฐ์ดํ„ฐ ์ค€๋น„ํ•˜๊ธฐ
from sklearn.datasets import load_iris
iris_dataset = load_iris()

# ๋ฐ์ดํ„ฐ ํ™•์ธํ•˜๊ธฐ
iris_dataset['target'] # ์ •๋‹ต, label
iris_dataset['data'].shape # -> (150,4) 4๊ฐœ์˜ feature

# ๊ทธ๋ฆฌ๊ธฐ
import matplotlib.pyplot as plt
import pandas as pd

# ๋ฐ์ดํ„ฐํ”„๋ ˆ์ž„์„ ์‚ฌ์šฉํ•˜์—ฌ ๋ฐ์ดํ„ฐ ๋ถ„์„ -> feature์™€ label์˜ ์—ฐ๊ด€์„ฑ์„ ํ™•์ธ
# NumPy ๋ฐฐ์—ด์„ pandas์˜ DataFrame์œผ๋กœ ๋ณ€๊ฒฝ
iris_df = pd.DataFrame(iris_dataset['data'], columns=iris_dataset.feature_names)
# y_train์— ๋”ฐ๋ผ ์ƒ‰์œผ๋กœ ๊ตฌ๋ถ„๋œ ์‚ฐ์ ๋„ ํ–‰๋ ฌ์„ ๋งŒ๋“ฆ
pd.plotting.scatter_matrix(iris_df, c=iris_dataset['target'], figsize=(15,15), marker='o', hist_kwds={'bins': 20}, s = 60, alpha=.8)
plt.show()

import numpy as np
plt.imshow([np.unique(iris_dataset['target'])])
# use '_' when a function's return value must be assigned but is not needed
_ = plt.xticks(ticks=np.unique(iris_dataset['target']),
               labels=iris_dataset['target_names'])

iris_df2 = iris_df[['petal length (cm)', 'petal width (cm)']]

# Scatter matrix of the two petal features (2 x 2)
pd.plotting.scatter_matrix(iris_df2, c=iris_dataset['target'], figsize=(10,10), marker='o', hist_kwds={'bins': 20}, s=60, alpha=.8)
plt.show()

# ํ›ˆ๋ จ๋ฐ์ดํ„ฐ์™€ ํ…Œ์ŠคํŠธ๋ฐ์ดํ„ฐ ๋ถ„๋ฆฌ
# ํ›ˆ๋ จ๋ฐ์ดํ„ฐ, ํ…Œ์ŠคํŠธ ๋ฐ์ดํ„ฐ -> 70:30 or 75:25 or 80:20 or 90:10
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(iris_dataset['data'], iris_dataset['target'], test_size=0.25, random_state=777)

# ํ›ˆ๋ จ๋ฐ์ดํ„ฐ ํ™•์ธํ•˜๊ธฐ 150 -> 75% -> 112
X_train.shape
# ํ›ˆ๋ จ๋ฐ์ดํ„ฐ ํ™•์ธํ•˜๊ธฐ 150 -> 25% -> 38
X_test.shape

# ๋จธ์‹ ๋Ÿฌ๋‹ ๋ชจ๋ธ ์„ค์ • -> k-NN
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors=1) # ์ด์›ƒ์˜ ๊ฐœ์ˆ˜๋ฅผ 1๊ฐœ๋กœ ์ง€์ •

# ํ•™์Šตํ•˜๊ธฐ
knn.fit(X_train, y_train)
# ์˜ˆ์ธกํ•˜๊ธฐ
y_pred = knn.predict(X_test)

# ๋ชจ๋ธํ‰๊ฐ€ํ•˜๊ธฐ
# ์ •ํ™•๋„ ํ™•์ธ
# 1) mean() ํ•จ์ˆ˜๋ฅผ ์‚ฌ์šฉํ•ด์„œ ์ •ํ™•๋„ ํ™•์ธ
np.mean(y_pred==y_test)
# 2) score() ํ•จ์ˆ˜๋ฅผ ์‚ฌ์šฉํ•ด์„œ ์ •ํ™•๋„ ํ™•์ธ -> ํ…Œ์ŠคํŠธ ์…‹์œผ๋กœ ์˜ˆ์ธกํ•œ ํ›„ ์ •ํ™•๋„ ์ถœ๋ ฅ
knn.score(X_test, y_test)
# 3) ํ‰๊ฐ€ ์ง€ํ‘œ ๊ณ„์‚ฐ
from sklearn import metrics
knn_report = metrics.classification_report(y_test, y_pred)
print(knn_report)


๐Ÿ” ์ผ๋ฐ˜ํ™”, ๊ณผ๋Œ€์ ํ•ฉ, ๊ณผ์†Œ์ ํ•ฉ

  • ์ผ๋ฐ˜ํ™” : ๋ชจ๋ธ์ด ์ฒ˜์Œ ๋ณด๋Š” ๋ฐ์ดํ„ฐ์— ๋Œ€ํ•ด ์ •ํ™•ํ•˜๊ฒŒ ์˜ˆ์ธกํ•  ์ˆ˜ ์žˆ์Œ

  • ์ผ๋ฐ˜ํ™” ์„ฑ๋Šฅ์ด ์ตœ๋Œ€๊ฐ€ ๋˜๋Š” ๋ชจ๋ธ์ด ์ตœ์ 

  • ๊ณผ๋Œ€์ ํ•ฉ : ๋ชจ๋ธ์ด ํ›ˆ๋ จ ์„ธํŠธ์˜ ๊ฐ ์ƒ˜ํ”Œ์— ๋„ˆ๋ฌด ๊ฐ€๊น๊ฒŒ ๋งž์ถฐ์ ธ์„œ ์ƒˆ๋กœ์šด ๋ฐ์ดํ„ฐ์— ์ผ๋ฐ˜ํ™” ๋˜๊ธฐ ์–ด๋ ค์šธ ๋•Œ ๋‚˜ํƒ€๋‚จ

  • ๊ณผ์†Œ์ ํ•ฉ : ๋„ˆ๋ฌด ๊ฐ„๋‹จํ•œ ๋ชจ๋ธ์ด ์„ ํƒ๋จ

โœ’๏ธ ๋ชจ๋ธ ๋ณต์žก๋„์™€ ๋ฐ์ดํ„ฐ์…‹ ํฌ๊ธฐ์˜ ๊ด€๊ณ„

  • ์šฐ๋ฆฌ๊ฐ€ ์ฐพ์œผ๋ ค๋Š” ๋ชจ๋ธ : ์ผ๋ฐ˜ํ™” ์„ฑ๋Šฅ์ด ์ตœ์ ์ ์— ์žˆ๋Š” ๋ชจ๋ธ
  • ๋ฐ์ดํ„ฐ ํฌ์ธํŠธ๋ฅผ ๋” ๋งŽ์ด ๋ชจ์œผ๋Š” ๊ฒƒ์ด ๋‹ค์–‘์„ฑ์„ ํ‚ค์›Œ์ฃผ๋ฏ€๋กœ ํฐ ๋ฐ์ดํ„ฐ์…‹์€ ๋” ๋ณต์žกํ•œ ๋ชจ๋ธ์„ ๋งŒ๋“ค ์ˆ˜ ์žˆ๊ฒŒ ํ•ด์คŒ -> ๊ทธ๋Ÿฌ๋‚˜ ๊ฐ™์€ ๋ฐ์ดํ„ฐ ํฌ์ธํŠธ๋ฅผ ์ค‘๋ณตํ•˜๊ฑฐ๋‚˜ ๋งค์šฐ ๋น„์Šทํ•œ ๋ฐ์ดํ„ฐ๋ฅผ ๋ชจ์œผ๋Š” ๊ฒƒ์€ ๋„์›€ X

๐Ÿ” ์ง€๋„ํ•™์Šต ์•Œ๊ณ ๋ฆฌ์ฆ˜

๐Ÿผ ์ค€๋น„

# ํ•œ๊ธ€ ๊นจ์ง ๋ฐฉ์ง€
import matplotlib as mpl
import matplotlib.pyplot as plt

%config InlineBackend.figure_format = 'retina'

!apt -qq -y install fonts-nanum

import matplotlib.font_manager as fm
fontpath = '/usr/share/fonts/truetype/nanum/NanumBarunGothic.ttf'
font = fm.FontProperties(fname=fontpath, size=9)
plt.rc('font', family='NanumBarunGothic') 
mpl.font_manager._rebuild()

pip install mglearn

โœ’๏ธ ์ด์ง„๋ถ„๋ฅ˜ ๋ฐ์ดํ„ฐ์…‹ ํ™•์ธํ•˜๊ธฐ

import mglearn
import matplotlib.pyplot as plt

import warnings
warnings.filterwarnings('ignore')

# ๋ฐ์ดํ„ฐ์…‹ ๋‹ค์šด๋กœ๋“œ
X , y = mglearn.datasets.make_forge()

# ๋ฐ์ดํ„ฐ ํ™•์ธํ•˜๊ธฐ
print('X.shape : ', X.shape) # -> (26,2)
print('y.shape : ', y.shape) # -> (26,)

plt.figure(dpi=100)
plt.rc('font', family='NanumBarunGothic')

# ์‚ฐ์ ๋„ ๊ทธ๋ฆฌ๊ธฐ
mglearn.discrete_scatter(X[:,0], X[:,1], y) # ์ฒซ๋ฒˆ์งธ feature, ๋‘๋ฒˆ์งธ feature ์ค‘๊ฐ„์ง€์ ์ด scatter์ฐจํŠธ๋กœ ์ฐํž˜
plt.legend(['ํด๋ž˜์Šค 0', 'ํด๋ž˜์Šค 1'], loc=4)
plt.xlabel("์ฒซ๋ฒˆ์งธ ํŠน์„ฑ")
plt.ylabel("๋‘๋ฒˆ์งธ ํŠน์„ฑ")
plt.show()

-> ๋‘๊ฐœ์˜ ํŠน์„ฑ ์ค‘ ๋‘๋ฒˆ์งธ ํŠน์„ฑ์„ ์„ ํƒํ•˜๋Š”๊ฒŒ ์ข‹์Œ


โœ’๏ธ ํšŒ๊ท€ ๋ฐ์ดํ„ฐ์…‹ ํ™•์ธํ•˜๊ธฐ

X, y = mglearn.datasets.make_wave(n_samples=40)

# ๋ฐ์ดํ„ฐ ํ™•์ธ
print('X.shape : ', X.shape) # -> (40, 1)
print('y.shape : ', y.shape) # -> (40,)

# ์‚ฐ์ ๋„ X, y
plt.figure(dpi = 100)
plt.rc('font', family='NanumBarunGothic')
plt.rcParams['axes.unicode_minus'] = False

plt.plot(X, y, 'o')
plt.ylim(-3, 3)

plt.xlabel("Feature")
plt.ylabel("Target")
plt.show()

-> feature๊ฐ’์ด ์ปค์ง€๋ฉด y๊ฐ’๋„ ์ปค์ง


๐Ÿ” ๋ถ„๋ฅ˜๋ฌธ์ œ ์ •์˜

โœ’๏ธ ์œ„์Šค์ฝ˜์‹  ์œ ๋ฐฉ์•” ๋ฐ์ดํ„ฐ์…‹

https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+%28Diagnostic%29

from sklearn.datasets import load_breast_cancer
cancer = load_breast_cancer()

# ๋ฐ์ดํ„ฐ ํ™•์ธํ•˜๊ธฐ
cancer['target'] # 0: ์•…์„ฑ, 1: ์–‘์„ฑ
cancer['data'].shape # -> (569, 30) 569๊ฑด์˜ ๋ฐ์ดํ„ฐ ์ˆ˜์™€ 30๊ฐœ์˜ feature์œผ๋กœ ์ด๋ฃจ์–ด์ง„ dataset 

๐Ÿ‘€ ์ด์ง„๋ถ„๋ฅ˜ ํ•  ๋•Œ ๋‚ด๊ฐ€ ์ฐพ์„ ๋ฐ์ดํ„ฐ์…‹์„ 1๋กœ ์„ค์ •ํ•˜๋Š”๊ฒŒ ์œ ๋ฆฌ


โœ’๏ธ 1970๋…„๋Œ€ ๋ณด์Šคํ„ด ์ฃผ๋ณ€์˜ ์ฃผํƒ ํ‰๊ท  ๊ฐ€๊ฒฉ

from sklearn.datasets import load_boston
boston = load_boston()

# ๋ฐ์ดํ„ฐ ํ™•์ธํ•˜๊ธฐ
boston.data.shape # -> (506,13)

boston.feature_names
array(['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD',
       'TAX', 'PTRATIO', 'B', 'LSTAT'], dtype='<U7')
'CRIM' : crime rate by town
'ZN' : proportion of residential land zoned for lots over 25,000 sq. ft.
'INDUS' : proportion of non-retail business land
'CHAS' : Charles River dummy variable (1 if near the river, 0 otherwise)
'NOX' : nitric oxide concentration
'RM' : number of rooms per dwelling
'AGE' : proportion of owner-occupied homes built before 1940
'DIS' : weighted distance to five major employment centers
'RAD' : index of accessibility to highways
'TAX' : property tax rate per $10,000
'PTRATIO' : pupil-teacher ratio by town
'B' : proportion of Black residents by town
'LSTAT' : percentage of lower-status population

๐Ÿ” ํšŒ๊ท€๋ฌธ์ œ ์ •์˜

โœ’๏ธ k-Nearest Neighbors(k-NN)

  • k-NN ์•Œ๊ณ ๋ฆฌ์ฆ˜์€ ๊ฐ€์žฅ ๊ฐ„๋‹จํ•œ ๋จธ์‹ ๋Ÿฌ๋‹ ์•Œ๊ณ ๋ฆฌ์ฆ˜
  • ๊ฐ€์žฅ ๊ฐ€๊นŒ์šด ํ›ˆ๋ จ ๋ฐ์ดํ„ฐ ํฌ์ธํŠธ ํ•˜๋‚˜๋ฅผ ์ตœ๊ทผ์ ‘ ์ด์›ƒ์œผ๋กœ ์ฐพ์•„ ์˜ˆ์ธก์— ์‚ฌ์šฉ
  • KNeighborsClassifier
    ์•Œ๊ณ ๋ฆฌ์ฆ˜์ด ํด๋ž˜์Šค 0๊ณผ 1๋กœ ๋‚˜๋‰˜๋Š” ๊ฒฐ์ • ๊ฒฝ๊ณ„ ํ™•์ธ ๊ฐ€๋Šฅ
    ์ด์›ƒ์„ ์ ๊ฒŒ ์‚ฌ์šฉํ•˜๋ฉด ๋ชจ๋ธ ๋ณต์žก๋„ โ†‘

๐Ÿ” ์ง€๋„ํ•™์Šต ์•Œ๊ณ ๋ฆฌ์ฆ˜ k-NN

  • ๋ถ„๋ฅ˜, ํšŒ๊ท€๋ชจ๋ธ ๋ชจ๋‘ ์ œ๊ณต

๐Ÿผ ์ค€๋น„

pip install mglearn

import matplotlib as mpl
import matplotlib.pyplot as plt

%config InlineBackend.figure_format = 'retina'

!apt -qq -y install fonts-nanum

import matplotlib.font_manager as fm
fontpath = '/usr/share/fonts/truetype/nanum/NanumBarunGothic.ttf'
font = fm.FontProperties(fname=fontpath, size=9)
plt.rc('font', family='NanumBarunGothic') 
mpl.font_manager._rebuild()

โœ’๏ธ forge๋ฐ์ดํ„ฐ์…‹ ๋ถ„๋ฅ˜

import mglearn
import matplotlib.pyplot as plt

import warnings
warnings.filterwarnings('ignore')

plt.figure(dpi=100)
# n_neighbors: how many neighboring points to consider
mglearn.plots.plot_knn_classification(n_neighbors=1)
  • when n_neighbors = 1
  • when n_neighbors = 3

โœ’๏ธ ๋ถ„๋ฅ˜ ๋ฌธ์ œ ์ •์˜ forge ๋ฐ์ดํ„ฐ์…‹์„ ์‚ฌ์šฉํ•œ ์ด์ง„๋ถ„๋ฅ˜ ๋ฌธ์ œ๋กœ ์ •์˜

# ๋ฐ์ดํ„ฐ ์ค€๋น„
X , y = mglearn.datasets.make_forge() # X: ๋ฐ์ดํ„ฐ(feature), y: ๋ ˆ์ด๋ธ”(label, ์ •๋‹ต)

# ๋ฐ์ดํ„ฐ ๋ถ„๋ฆฌ -> ํ›ˆ๋ จ์…‹, ํ…Œ์ŠคํŠธ์…‹
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=7) # 75:25(default)

# ๋ฐ์ดํ„ฐ ํ™•์ธ
X_train.shape # 26 ์ค‘ 19
X_test.shape # 26 ์ค‘ 7

๐Ÿ–Š๏ธ k-NN๋ถ„๋ฅ˜๋ชจ๋ธ ์„ค์ •

from sklearn.neighbors import KNeighborsClassifier

clf = KNeighborsClassifier(n_neighbors=3)

# ๋ชจ๋ธ ํ•™์Šต
clf.fit(X_train, y_train)

# scoreํ•จ์ˆ˜ ์‚ฌ์šฉํ•˜์—ฌ ์˜ˆ์ธก ์ •ํ™•๋„ ํ™•์ธ
clf.score(X_train, y_train) # ์ •ํ™•๋„ 0.9473684210526315
clf.score(X_test, y_test) # ์ •ํ™•๋„ 0.8571428571428571 ํ›ˆ๋ จ ๋ฐ์ดํ„ฐ์…‹๊ณผ ์ฐจ์ด๋งŽ์ด๋‚˜๋Š” overfitting ์ƒํ™ฉ -> ์ตœ์ ์˜ ๋ชจ๋ธ์ด๋ผ๊ณ  ๋ณผ ์ˆ˜ ์—†๋‹ค.

KNeighborsClassifier ์ด์›ƒ์˜ ์ˆ˜์— ๋”ฐ๋ฅธ ์„ฑ๋Šฅํ‰๊ฐ€

  1. ์ด์›ƒ์˜ ์ˆ˜๋ฅผ 1~10๊นŒ์ง€ ์ฆ๊ฐ€์‹œ์ผœ ํ•™์Šต ์ง„ํ–‰
  2. score() ํ•จ์ˆ˜ ์ด์šฉํ•˜์—ฌ ์˜ˆ์ธก ์ •ํ™•๋„ ์ €์žฅ
  3. ์ฐจํŠธ๋กœ ์ตœ์ ์  ์ฐพ๊ธฐ
# ์ด์›ƒ์˜ ์ˆ˜์— ๋”ฐ๋ฅธ ์ •ํ™•๋„๋ฅผ ์ €์žฅํ•  ๋ฆฌ์ŠคํŠธ ๋ณ€์ˆ˜
train_scores = []
test_scores = []

n_neighbors_settings = range(1,15)
# 1 ~ 10๊นŒ์ง€ n_neighbors์˜ ์ˆ˜๋ฅผ ์ฆ๊ฐ€์‹œ์ผœ์„œ ํ•™์Šต ํ›„ ์ •ํ™•๋„ ์ €์žฅ
for n_neighbor in n_neighbors_settings:
  # ๋ชจ๋ธ ์ƒ์„ฑ ๋ฐ ํ•™์Šต
  clf = KNeighborsClassifier(n_neighbors=n_neighbor)
  clf.fit(X_train, y_train)
  # ํ›ˆ๋ จ ์„ธํŠธ ์ •ํ™•๋„ ์ €์žฅ
  train_scores.append(clf.score(X_train, y_train))
  # ํ…Œ์ŠคํŠธ ์„ธํŠธ ์ •ํ™•๋„ ์ €์žฅ
  test_scores.append(clf.score(X_test, y_test))

# ์˜ˆ์ธก ์ •ํ™•๋„ ๋น„๊ต ๊ทธ๋ž˜ํ”„ ๊ทธ๋ฆฌ๊ธฐ
plt.figure(dpi=100)
plt.plot(n_neighbors_settings, train_scores, label='ํ›ˆ๋ จ์ •ํ™•๋„')
plt.plot(n_neighbors_settings, test_scores, label='ํ…Œ์ŠคํŠธ์ •ํ™•๋„')
plt.ylabel('์ •ํ™•๋„')
plt.xlabel('์ด์›ƒ์˜ ์ˆ˜')
plt.legend()
plt.show()

์œ ๋ฐฉ์•” ๋ฐ์ดํ„ฐ์…‹์„ ์‚ฌ์šฉํ•˜์—ฌ ์ด์›ƒ์˜ ์ˆ˜(๊ฒฐ์ •๊ฒฝ๊ณ„)์— ๋”ฐ๋ฅธ ์„ฑ๋Šฅํ‰๊ฐ€

# ๋ฐ์ดํ„ฐ์ค€๋น„
from sklearn.datasets import load_breast_cancer
cancer = load_breast_cancer()

# ๋ฐ์ดํ„ฐ๋ถ„๋ฆฌ
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(cancer.data, cancer.target, random_state=7)

# 569 -> 75% -> 426 ํ•™์Šต
X_train.shape

# 569 -> 25% -> 143 ํ•™์Šต
X_test.shape

# ์ด์›ƒ์˜ ์ˆ˜์— ๋”ฐ๋ฅธ ์ •ํ™•๋„๋ฅผ ์ €์žฅํ•  ๋ฆฌ์ŠคํŠธ ๋ณ€์ˆ˜
train_scores = []
test_scores = []

n_neighbors_settings = range(1,21)
# 1 ~ 10๊นŒ์ง€ n_neighbors์˜ ์ˆ˜๋ฅผ ์ฆ๊ฐ€์‹œ์ผœ์„œ ํ•™์Šต ํ›„ ์ •ํ™•๋„ ์ €์žฅ
for n_neighbor in n_neighbors_settings:
  # ๋ชจ๋ธ ์ƒ์„ฑ ๋ฐ ํ•™์Šต
  clf = KNeighborsClassifier(n_neighbors=n_neighbor)
  clf.fit(X_train, y_train)
  # ํ›ˆ๋ จ ์„ธํŠธ ์ •ํ™•๋„ ์ €์žฅ
  train_scores.append(clf.score(X_train, y_train))
  # ํ…Œ์ŠคํŠธ ์„ธํŠธ ์ •ํ™•๋„ ์ €์žฅ
  test_scores.append(clf.score(X_test, y_test))

# ์˜ˆ์ธก ์ •ํ™•๋„ ๋น„๊ต ๊ทธ๋ž˜ํ”„ ๊ทธ๋ฆฌ๊ธฐ
plt.figure(dpi=100)
plt.plot(n_neighbors_settings, train_scores, label='ํ›ˆ๋ จ์ •ํ™•๋„')
plt.plot(n_neighbors_settings, test_scores, label='ํ…Œ์ŠคํŠธ์ •ํ™•๋„')
plt.ylabel('์ •ํ™•๋„')
plt.xlabel('์ด์›ƒ์˜ ์ˆ˜')
plt.legend()
plt.show()
