(220923) Building Decision Tree and Ensemble Models

์ด์€๊ฒฝยท2022๋…„ 9์›” 26์ผ

3. Predicting Breast Cancer with a Decision Tree Classifier

0) Problem Definition

  • Use DecisionTreeClassifier to predict whether a breast tumor is benign (2) or malignant (4)

1) Importing Libraries

from sklearn import preprocessing # preprocessing, including scaling
from sklearn.model_selection import train_test_split # train/test split
from sklearn import tree # decision tree models
from sklearn import metrics # performance evaluation

import pandas as pd
import numpy as np

2) Preparing the Data

uci_path = 'https://archive.ics.uci.edu/ml/machine-learning-databases/\
breast-cancer-wisconsin/breast-cancer-wisconsin.data'

df = pd.read_csv(uci_path, header = None) 
df.head()
  • Store the URL in a variable and read the dataset from it.
  • 'header = None' means the file has no column names; pandas assigns integer column labels starting from 0.
  • Dataset source: https://archive.ics.uci.edu/ml/machine-learning-databases//breast-cancer-wisconsin/
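As a quick illustration of the `header = None` behavior, reading two made-up rows (not the real dataset) with no header line:

```python
import io
import pandas as pd

# Two hypothetical rows in the same shape as the UCI file, with no header line
raw = io.StringIO('1000025,5,1\n1002945,5,4')
demo_df = pd.read_csv(raw, header=None)

# pandas assigns integer column labels 0, 1, 2, ...
print(demo_df.columns.tolist())  # [0, 1, 2]
```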
# Specify column names
df.columns = ['id', 'clump', 'cell_size', 'cell_shape', 'adhesion', 'epithlial',
              'bare_nuclei', 'chromatin', 'normal_nucleoli', 'mitoses', 'class']
df.head()
  • Check the column names on the dataset's source page and assign them here.

3) Inspecting the Data

df.info() shows 699 rows in total. Every column is an integer (numeric) type except 'bare_nuclei', which has object type. Machine learning models can only process numbers, so this column needs preprocessing.

# 1) Check the unique values of 'bare_nuclei'
df['bare_nuclei'].unique()
  • Output: array(['1', '10', '2', '4', '3', '9', '7', '?', '5', '8', '6'], dtype=object)
  • The '?' entries are what made the column object-typed.
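As an aside, the replace-then-convert steps can also be collapsed into one call with `pd.to_numeric(errors='coerce')`, which turns any non-numeric token such as '?' into NaN (sketch on a toy Series, not the actual column):

```python
import pandas as pd

# Toy stand-in for df['bare_nuclei']
s = pd.Series(['1', '10', '?', '4'])

# errors='coerce' parses the numeric strings and maps '?' to NaN in one step
cleaned = pd.to_numeric(s, errors='coerce')
print(cleaned.isna().sum())  # 1
```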
# 2) Replace '?' with np.nan and count the missing values
df['bare_nuclei'].replace('?', np.nan, inplace = True)
df['bare_nuclei'].isna().sum()
  • The result is 16, i.e. there are 16 NaN values in total.
# 3) Drop the NaN rows and check the total count
df.dropna(subset=['bare_nuclei'], axis =0, inplace = True)
df.info()
  • Dropping every row that has NaN in this column leaves 683 rows, confirming that all 16 NaN rows were removed.
# 4) Convert the dtype of the bare_nuclei column
df['bare_nuclei'] = df['bare_nuclei'].astype('int64')
df.info()

  • Every column is now an integer type.

4) Splitting the Data

X = df[['clump', 'cell_size', 'cell_shape', 'adhesion', 'epithlial',
        'bare_nuclei', 'chromatin', 'normal_nucleoli', 'mitoses']]
y = df['class']
  • 'id' is a unique identifier, not a characteristic of the tumor, so it is excluded from the X features.
X

  • Printing X shows that the features span wide value ranges, which makes them hard for the model to learn from. Standardization (scaling) is therefore applied first.
# Split the data after scaling
X = preprocessing.StandardScaler().fit(X).transform(X)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state = 7)
print('train shape', X_train.shape)
print('test shape', X_test.shape)
  • Output:
    train shape (478, 9)
    test shape (205, 9)

5) Configuring the Decision Tree Classifier

tree_model = tree.DecisionTreeClassifier(criterion= 'entropy', max_depth=5)
  • Create the model object (criterion='entropy' makes the tree pick, at each split, the attribute that most reduces entropy) -> choosing an appropriate depth is important
  • The maximum depth is capped at 5 -> strong pre-pruning
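To make the criterion='entropy' setting concrete, here is a minimal sketch of the Shannon entropy the tree minimizes at each split (the helper function is illustrative, not part of scikit-learn's API):

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (bits) of a label array: sum of p * log2(1/p)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float((p * np.log2(1 / p)).sum())

# A pure node (all benign, class 2) has entropy 0; a 50/50 node has entropy 1,
# so splits that produce purer child nodes reduce entropy the most.
print(entropy([2, 2, 2, 2]))  # 0.0
print(entropy([2, 2, 4, 4]))  # 1.0
```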

6) Training the Model and Predicting

tree_model.fit(X_train, y_train)
y_pred = tree_model.predict(X_test)

7) Evaluating the Model

print('Training set accuracy: {:.2f}%'.format(tree_model.score(X_train, y_train)*100))
print('Test set accuracy: {:.2f}%'.format(tree_model.score(X_test, y_test)*100))
  • Accuracy is reported to two decimal places.
  • Output:
    Training set accuracy: 98.33%
    Test set accuracy: 94.63%

8) Computing Evaluation Metrics

tree_report = metrics.classification_report(y_test, y_pred)
print(tree_report)

  • For the malignant class, precision is 90% and recall is 96%.
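For reference, precision and recall can be spelled out by hand on a toy label set (the numbers below are illustrative and unrelated to the notebook's actual predictions):

```python
# Toy labels: 2 = benign, 4 = malignant (not the notebook's real output)
y_true = [2, 2, 2, 2, 4, 4, 4, 4, 4, 4]
y_hat  = [2, 2, 2, 4, 4, 4, 4, 4, 4, 4]

tp = sum(t == 4 and p == 4 for t, p in zip(y_true, y_hat))  # malignant correctly flagged
fp = sum(t == 2 and p == 4 for t, p in zip(y_true, y_hat))  # benign wrongly flagged
fn = sum(t == 4 and p == 2 for t, p in zip(y_true, y_hat))  # malignant missed

precision = tp / (tp + fp)  # of the cases flagged malignant, the fraction that really are
recall = tp / (tp + fn)     # of the truly malignant cases, the fraction that were caught
print(f'precision={precision:.2f}, recall={recall:.2f}')  # precision=0.86, recall=1.00
```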

9) Drawing the Decision Tree Graph

from sklearn.tree import export_graphviz # graph of the tree model
# class_names must follow the sorted class order: 2 = benign, 4 = malignant
export_graphviz(tree_model, out_file='tree.dot', class_names=['benign', 'malignant'],
                feature_names=df.columns[1:10], impurity=False, filled=True)
                
import graphviz
with open('tree.dot') as f:
  dot_graph = f.read()
display(graphviz.Source(dot_graph))                

  • Even with only two, or at most three, levels of splits drawn, the tree already separates most of the malignant cases.
  • Decision trees tend to fit the training data very literally, so they overfit easily (a known weakness of decision trees).

10) Finding the Optimal Depth with a Graph

train_scores = []
test_scores = []

max_depth = np.arange(1, 10, 1) # candidate depths 1 through 9

for n in max_depth:
  # create and train the model
  tree_model = tree.DecisionTreeClassifier(criterion= 'entropy', max_depth=n).fit(X_train, y_train)
  # store the training set accuracy
  train_scores.append(tree_model.score(X_train, y_train))
  # store the test set accuracy
  test_scores.append(tree_model.score(X_test, y_test))

# Plot the accuracy comparison graph

import matplotlib.pyplot as plt # not imported above; needed for plotting

plt.figure(dpi=150)

plt.plot(max_depth, train_scores, label='train accuracy')
plt.plot(max_depth, test_scores, label='test accuracy')

plt.ylabel('accuracy')
plt.xlabel('max depth')
plt.legend()
plt.show()

  • The graph shows that a depth of 3 is about right.

4. Predicting Heart Disease with Ensemble Classifiers

0) Problem Definition

  • Defined as a binary classification problem: predict from patient data whether a person has heart disease (normal: 0, diagnosed: 1)
  • Dataset source: https://archive.ics.uci.edu/ml/datasets/heart+disease

1) Importing Libraries

import numpy as np # data handling
import pandas as pd

import matplotlib.pyplot as plt # visualization
import plotly.express as px
import seaborn as sns

from sklearn.model_selection import train_test_split # train/test split
from sklearn.preprocessing import StandardScaler # data standardization

from sklearn.linear_model import LogisticRegression # logistic regression model
from sklearn.tree import DecisionTreeClassifier # decision tree classifier
from sklearn.ensemble import RandomForestClassifier # random forest classifier (ensemble)
from sklearn.ensemble import GradientBoostingClassifier # gradient boosting classifier (ensemble)

from sklearn import metrics # performance evaluation

df = pd.read_csv('/content/heart.csv')
df.head()

  • Reading the csv file and inspecting it shows that two kinds of preprocessing are needed
  • Wide-range numeric columns need standardization; categorical columns need encoding

2) Data Preprocessing

df['sex'].unique()
  • Values are only 0 or 1, so the column is categorical -> one-hot encode it
df['cp'].unique()
  • Values 0-3 encode the type of chest pain -> one-hot encode it
df['fbs'].unique()
  • Values are only 0 or 1, so the column is categorical -> one-hot encode it
categorical_var = ['sex', 'cp', 'fbs', 'restecg', 'exng', 'slp', 'caa', 'thall']
df[categorical_var] = df[categorical_var].astype('category')
  • 1) Categorical columns -> change the dtype -> one-hot encode (e.g. the sex and fbs columns)
  • The categorical column names go into a list, and those columns are converted from numeric to category dtype
numberic_var = [i for i in df.columns if i not in categorical_var][:-1]
  • 2) Numeric (continuous) columns -> need standardization
  • Every df column not in the categorical list goes into the numeric list; the trailing [:-1] drops the last column, 'output', which is the label
X = df.iloc[:, :-1] # all rows, every column except the last
y = df['output']
  • The data is first split into features (X) and labels (y) so that encoding and scaling can be handled in one pass.
  • 'output', the last column of df and the label, is excluded from the X features.
# Preprocessing the categorical columns

temp_X = pd.get_dummies(X[categorical_var])

X_modified = pd.concat([X, temp_X], axis =1)

X_modified.drop(categorical_var, axis=1, inplace =True)
X_modified
  • 1) One-hot encode the categorical columns
  • 2) Concatenate the one-hot columns onto the original frame along the column axis
  • 3) Drop the original categorical columns
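The three steps above (encode, concatenate, drop) can also be collapsed into a single call by passing columns= to pd.get_dummies; a minimal sketch on a hypothetical mini-frame:

```python
import pandas as pd

# Hypothetical mini-frame standing in for X; 'sex' and 'fbs' as in the post
demo_X = pd.DataFrame({'age': [63, 37, 41], 'sex': [1, 0, 1], 'fbs': [1, 0, 0]})

# columns= one-hot encodes just those columns and drops the originals in one pass
demo_modified = pd.get_dummies(demo_X, columns=['sex', 'fbs'])
print(demo_modified.columns.tolist())  # ['age', 'sex_0', 'sex_1', 'fbs_0', 'fbs_1']
```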

# Preprocessing the numeric columns

X_modified[numberic_var] = StandardScaler().fit(X_modified[numberic_var]).transform(X_modified[numberic_var])
  • The wide-range columns are rescaled to mean 0 and standard deviation 1 (StandardScaler standardizes; it does not squeeze values into the 0-1 range).
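What StandardScaler computes per column is the z-score z = (x - mean) / std; a quick sketch on made-up values:

```python
import numpy as np

# Hypothetical wide-range column (e.g. resting-blood-pressure-like values)
x = np.array([120.0, 130.0, 140.0, 150.0])

# The same transform StandardScaler applies column-wise
z = (x - x.mean()) / x.std()

# After standardization the column has mean 0 and standard deviation 1
print(round(z.mean(), 10), round(z.std(), 10))
```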

3) Splitting the Data

X_train, X_test, y_train, y_test = train_test_split(X_modified, y, test_size=0.2, random_state = 7)
print('train shape', X_train.shape)
print('test shape', X_test.shape)
  • Output:
    train shape (188, 30)
    test shape (48, 30)

4) Setting Up and Training the Models

# 1. LogisticRegression
logreg = LogisticRegression().fit(X_train, y_train)
                                  
print('Training set accuracy: {:.5f}%'.format(logreg.score(X_train, y_train)*100))
print('Test set accuracy: {:.5f}%'.format(logreg.score(X_test, y_test)*100))
  • Output:
    Training set accuracy: 90.42553%
    Test set accuracy: 83.33333%
# 2. DecisionTree
tree = DecisionTreeClassifier(max_depth=5, min_samples_leaf=20, min_samples_split=40).fit(X_train, y_train)

print('Training set accuracy: {:.5f}%'.format(tree.score(X_train, y_train)*100))
print('Test set accuracy: {:.5f}%'.format(tree.score(X_test, y_test)*100))
  • Output:
    Training set accuracy: 81.91489%
    Test set accuracy: 81.25000%
# 3. RandomForest
# Unlike a single decision tree, each tree in the forest starts its splits from a random subset of features.
random = RandomForestClassifier(n_estimators=300, random_state=7).fit(X_train, y_train) 

print('Training set accuracy: {:.8f}%'.format(random.score(X_train, y_train)*100))
print('Test set accuracy: {:.8f}%'.format(random.score(X_test, y_test)*100))
  • Output:
    Training set accuracy: 100.00000000%
    Test set accuracy: 85.41666667%
  • A training accuracy of 100% means the model is overfitting
  • Tunable parameter: n_estimators
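n_estimators can be swept the same way max_depth was swept in section 3; a sketch on a synthetic stand-in dataset (the real notebook would reuse its own X_train/X_test splits instead):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data, not heart.csv
Xs, ys = make_classification(n_samples=300, n_features=10, random_state=7)
Xs_tr, Xs_te, ys_tr, ys_te = train_test_split(Xs, ys, test_size=0.2, random_state=7)

# Record test accuracy for each candidate number of trees
grid = [10, 50, 100, 200]
scores = [RandomForestClassifier(n_estimators=n, random_state=7)
          .fit(Xs_tr, ys_tr).score(Xs_te, ys_te) for n in grid]
print(dict(zip(grid, scores)))
```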
# 4. GradientBoosting
boost = GradientBoostingClassifier(max_depth=3, learning_rate=0.05).fit(X_train, y_train)

print('Training set accuracy: {:.8f}%'.format(boost.score(X_train, y_train)*100))
# generalizing across patterns lowers accuracy on the training data
print('Test set accuracy: {:.8f}%'.format(boost.score(X_test, y_test)*100))
  • Output:
    Training set accuracy: 98.40425532%
    Test set accuracy: 87.50000000%
  • The training accuracy still indicates overfitting, so strong pre-pruning is needed
  • Tunable parameters: max_depth, learning_rate

Note. Data analysis: to be continued...
