🌲 Decision Tree

아따맘마 · January 16, 2021


Among supervised learning methods, the decision tree is the model most commonly used as the base algorithm of an ensemble. An ensemble combines algorithms whose individual predictive performance is weak, and improves accuracy by letting them complement one another probabilistically and by repeatedly updating the weights on the samples that were misclassified. This is why the decision tree is a good fit for the weak learner.

What is a decision tree?

Among ML algorithms, this is one of the easiest to understand intuitively. Just looking at the figure above (the simplest possible example) shows at a glance how the algorithm proceeds. As in the figure, a decision tree is a model that classifies labels by following a set of rules. The criterion used to build those rules determines how efficient the classification is, and therefore how well the model performs.
If the rules multiply and the model becomes complex (that is, the tree becomes deep), overfitting occurs easily.

Structure

  • Root node : the topmost node.
  • Rule node : a node holding a rule used to classify labels. To reach high prediction accuracy with as few leaf nodes as possible, each rule must be chosen so that a split puts as many samples as possible into the same class.
  • Leaf node : a terminal node. The decided class value lives here.
  • Branch / sub-tree : each new rule condition creates a sub-tree rooted at a rule node.

Information uniformity

  • Information gain : based on entropy. As in thermodynamics, entropy measures disorder. If a set contains a mix of different values its entropy is high; if the values are all the same it is low. The information gain index is often summarized as 1 - entropy; more precisely, the information gain of a split is the entropy of the parent node minus the sample-weighted average entropy of its child nodes.
    A decision tree splits on the attribute with the highest information gain.
  • Gini coefficient : a coefficient used in economics to express inequality. In machine learning, a lower Gini value is read as higher data uniformity (0 means the node contains a single class). scikit-learn's DecisionTreeClassifier splits the dataset using the Gini criterion by default.
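To make the two uniformity measures concrete, here is a minimal sketch in plain Python. The helper names (`entropy`, `gini`, `information_gain`) and the toy labels are made up for illustration and are not scikit-learn functions; information gain is computed as parent entropy minus the size-weighted entropy of the children.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = len(parent)
    weighted = sum(len(ch) / n * entropy(ch) for ch in children)
    return entropy(parent) - weighted

parent = ["a", "a", "b", "b"]            # hypothetical toy labels
pure_split = [["a", "a"], ["b", "b"]]    # each child holds a single class
mixed_split = [["a", "b"], ["a", "b"]]   # each child is as mixed as the parent

print(entropy(parent), gini(parent))            # 1.0 0.5
print(information_gain(parent, pure_split))     # 1.0
print(information_gain(parent, mixed_split))    # 0.0
```

The split that produces pure children recovers all of the parent's entropy, while the split that leaves the children just as mixed gains nothing, which is why the tree prefers the former.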

Decision tree process

Decision tree characteristics

  • Because it is based on information uniformity, the algorithm is simple and intuitive.
  • The rules are explicit, and the whole decision process can be visualized.
  • Since only information uniformity matters, preprocessing such as feature scaling or normalization is unnecessary except in special cases. Categorical features probably still need to be encoded, though.
  • Overfitting can sharply reduce accuracy on new data. The more features there are and the more varied their uniformity, the deeper and more complex the tree becomes.

Hyperparameters

min_samples_split

  • The minimum number of samples a node must hold before it may be split; used to control overfitting.
  • The default is 2. The smaller the value, the more nodes get split, which raises the risk of overfitting.

min_samples_leaf

  • The minimum number of samples required at a leaf node.
  • Used to control overfitting. With imbalanced data, a particular class may have very few samples, so set this to a small value in that case.

max_features

  • The maximum number of features to consider when looking for the best split. The default is None, which uses every feature in the dataset.
  • An int gives the number of features to consider; a float gives the fraction of all features to consider.
  • 'sqrt' uses the square root of the total number of features.
  • 'auto' meant the same as 'sqrt' in older scikit-learn releases; it has since been deprecated and removed, so prefer 'sqrt'.
  • 'log2' uses log2() of the total number of features.
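The settings above can be checked directly: after fitting, the classifier exposes the resolved number of features in its `max_features_` attribute. The following is a small sketch on hypothetical toy data with 64 features (a count picked only so that 'sqrt' and 'log2' give different answers).

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data with 64 features, chosen so sqrt and log2 differ.
X, y = make_classification(n_samples=200, n_features=64, random_state=0)

resolved = {}
for setting in (None, 10, 0.25, "sqrt", "log2"):
    clf = DecisionTreeClassifier(max_features=setting, random_state=0).fit(X, y)
    # max_features_ is the fitted attribute holding the resolved feature count.
    resolved[setting] = clf.max_features_
    print(f"max_features={setting!r:>6} -> {clf.max_features_} features per split")
```

With 64 features this resolves to 64 for None, 10 for the int, 16 for the fraction 0.25, 8 for 'sqrt', and 6 for 'log2'.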

max_depth

  • Sets the maximum depth of the tree.
  • The default is None. The tree then keeps growing until every leaf holds a single class, or keeps splitting until a node holds fewer samples than min_samples_split.
  • A deep tree splits as far as min_samples_split allows and can overfit, so cap the depth at a reasonable value.

max_leaf_nodes

  • The maximum number of leaf nodes.

Decision tree visualization

We use Graphviz to visualize the decision tree. Besides decision trees, this application can also render workflows discovered through process mining as directed networks.
Installing it on Windows is a bit involved, but on a MacBook the following is enough:

pip install graphviz
brew install graphviz

The first command installs the Python bindings and the second installs the Graphviz binaries themselves.
Now that installation is done, let's visualize a tree on the iris data!

import warnings

import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

warnings.filterwarnings('ignore')

# Create the classifier; a fixed random_state keeps the result reproducible
dt_clf = DecisionTreeClassifier(random_state=156)

# Load the iris data and hold out 20% of it as a test set
iris_data = load_iris()
x_train, x_test, y_train, y_test = train_test_split(iris_data.data, iris_data.target, test_size=0.2, random_state=11)

# Train on the training set only
dt_clf.fit(x_train, y_train)

First we load the iris data, create a DecisionTreeClassifier object, split the dataset into train and test sets, and fit the classifier on the training set.
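The held-out test set is not used in the visualization that follows, so as a quick sanity check it may help to score the fitted tree on it. This is a self-contained sketch that repeats the same split and seeds; `accuracy_score` is an addition that does not appear in the original code.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=11)

clf = DecisionTreeClassifier(random_state=156).fit(x_train, y_train)

# Compare accuracy on the data the tree has seen with accuracy on unseen data.
train_acc = accuracy_score(y_train, clf.predict(x_train))
test_acc = accuracy_score(y_test, clf.predict(x_test))
print(f"train accuracy: {train_acc:.4f}, test accuracy: {test_acc:.4f}")
```

A large gap between the two numbers is the usual symptom of the overfitting discussed later in this post.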

from sklearn.tree import export_graphviz

export_graphviz(dt_clf, out_file="tree.dot", class_names=iris_data.target_names,
                feature_names=iris_data.feature_names, impurity=True, filled=True)

export_graphviz writes an output file called tree.dot. Now all that is left is to load that file and render it!

import graphviz

with open("tree.dot") as f:
    dot_graph = f.read()
    
graphviz.Source(dot_graph)


(The rendered tree was too small for my Mac screen and got cut off...)
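When the rendered image is hard to read, or Graphviz is not installed, the same rules can be dumped as plain text. The sketch below uses `sklearn.tree.export_text`, which is not part of the original post; `max_depth=2` is chosen here only to keep the printout short.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=156).fit(iris.data, iris.target)

# One line per node: internal nodes show the split rule, leaves show the class.
rules = export_text(clf, feature_names=list(iris.feature_names))
print(rules)
```

Each level of indentation in the output corresponds to one level of depth in the tree.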

Setting max_depth

I re-fit the decision tree classifier with max_depth = 3.

(It became much simpler.)
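For reference, the re-fit could look like the sketch below. It assumes the same iris split and random seeds as the earlier code; `get_depth()` and `get_n_leaves()` are used to compare the two trees numerically instead of by eye.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=11)

full = DecisionTreeClassifier(random_state=156).fit(x_train, y_train)
shallow = DecisionTreeClassifier(max_depth=3, random_state=156).fit(x_train, y_train)

# Report how deep each tree grew and how many leaves it ended up with.
for name, model in (("default", full), ("max_depth=3", shallow)):
    print(f"{name:>12}: depth={model.get_depth()}, leaves={model.get_n_leaves()}")
```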

Setting min_samples_split

This time, min_samples_split = 4.

Look closely at the red box: the node holds only 3 samples, so it is not split even though it still contains different classes.
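The same effect can be checked without the picture by inspecting the fitted tree's arrays. This sketch assumes the same iris split as before and reads the low-level `tree_` attribute, where `children_left == -1` marks a leaf.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=11)

clf = DecisionTreeClassifier(min_samples_split=4, random_state=156).fit(x_train, y_train)

tree = clf.tree_
is_leaf = tree.children_left == -1                 # leaves have no children
internal_sizes = tree.n_node_samples[~is_leaf]     # sample counts of split nodes
impure_leaves = int(np.sum(is_leaf & (tree.impurity > 0)))

print("smallest node that was split:", internal_sizes.min())
print("leaves that still mix classes:", impure_leaves)
```

Every node that was split holds at least 4 samples; any leaf that still mixes classes is one that was too small to split further.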

Setting min_samples_leaf

This time, min_samples_leaf = 4.

A split is only allowed if each resulting child keeps at least 4 samples, so small nodes become leaves and are not split further even when their Gini value is still high. The number of branch nodes drops and the decision tree becomes simpler.
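A minimal sketch of this setting, again assuming the same iris split and seeds as the earlier code, is shown below. It reads the leaf sizes from `tree_` to confirm that no leaf falls under the requested minimum.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=11)

base = DecisionTreeClassifier(random_state=156).fit(x_train, y_train)
pruned = DecisionTreeClassifier(min_samples_leaf=4, random_state=156).fit(x_train, y_train)

# Sample counts of the leaves only (children_left == -1 marks a leaf).
leaf_sizes = pruned.tree_.n_node_samples[pruned.tree_.children_left == -1]

print("leaves without the constraint:", base.get_n_leaves())
print("leaves with min_samples_leaf=4:", pruned.get_n_leaves())
print("smallest leaf:", leaf_sizes.min())
```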

feature_importances_

In a decision tree, what matters is which attribute is chosen, and under which rule condition, based on uniformity. A few important features contribute heavily to building a clear rule tree, and relying on them makes the model simpler and more robust to outliers.
feature_importances_ returns its values as an ndarray, with one value per feature in feature order.

import seaborn as sns
%matplotlib inline

print('Feature importances:\n{0}'.format(np.round(dt_clf.feature_importances_, 3)))

for name, value in zip(iris_data.feature_names, dt_clf.feature_importances_):
    print('{0} : {1:.3f}'.format(name, value))
    
sns.barplot(x=dt_clf.feature_importances_, y=iris_data.feature_names)


Judging from the plot, petal length (cm) is the most important feature.
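The ranking can also be read off without a plot. The sketch below assumes the same iris split and seeds as before, sorts the importances in descending order, and relies on the fact that they are normalized to sum to 1.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=11)

clf = DecisionTreeClassifier(random_state=156).fit(x_train, y_train)

importances = clf.feature_importances_
order = np.argsort(importances)[::-1]          # indices from most to least important
for i in order:
    print(f"{iris.feature_names[i]:<20} {importances[i]:.3f}")

top_feature = iris.feature_names[order[0]]
print("most important:", top_feature)
```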

Decision tree overfitting

Let's use a visualization to see first-hand how a decision tree overfits when it classifies data. This time the code works on synthetic data.

from sklearn.datasets import make_classification
import matplotlib.pyplot as plt
%matplotlib inline

# Generate synthetic data (features: 2, classes: 3)
plt.title("3 Class values with 2 features Sample creation")

x_features, y_labels = make_classification(n_features=2, n_redundant=0, n_informative=2,
                                           n_classes=3, n_clusters_per_class=1, random_state=0)

# Plot the first feature on the x-axis and the second feature on the y-axis
plt.scatter(x_features[:, 0], x_features[:, 1], marker='o', c=y_labels, s=25, edgecolor='k')

This generates a random dataset with

  • features : 2
  • classes : 3

To see how a decision tree overfits, let's first classify it with a plain decision tree, without any hyperparameter tuning.

from sklearn.tree import DecisionTreeClassifier

dt_clf = DecisionTreeClassifier().fit(x_features, y_labels)

The decision tree model has now been trained on this data. Next, a filled-contour plot lets us see directly how the data above is classified.

import numpy as np

def visualize_boundary(model, x, y):
    fig, ax = plt.subplots()

    # Scatter plot of the training data, colored by class
    ax.scatter(x[:, 0], x[:, 1], c=y, s=25, cmap='rainbow', edgecolor='k',
               clim=(y.min(), y.max()), zorder=3)
    ax.axis('tight')   # tight : set the limits just large enough to show all the data
    ax.axis('off')     # off : hide the axes and labels
    xlim_start, xlim_end = ax.get_xlim()   # read back the axis limits
    ylim_start, ylim_end = ax.get_ylim()

    # Fit the model on the given data
    model.fit(x, y)

    # Build a 200 x 200 grid covering the plotted area
    xx, yy = np.meshgrid(np.linspace(xlim_start, xlim_end, num=200), np.linspace(ylim_start, ylim_end, num=200))
    # Predict the class at every grid point; z plays the role of the contourf height
    z = model.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

    n_classes = len(np.unique(y))
    # Draw filled contours, one color band per class
    ax.contourf(xx, yy, z, alpha=0.3,
                levels=np.arange(n_classes + 1) - 0.5,
                cmap='rainbow', clim=(y.min(), y.max()),
                zorder=1)

visualize_boundary(dt_clf, x_features, y_labels)


Without any hyperparameter tuning, the tree adds many decision boundaries just to classify the outliers, so the model ends up complex. A model this complex loses accuracy as soon as it predicts on a dataset whose characteristics differ even slightly from the training data.
This time, let's set min_samples_leaf=6 and look at the decision boundary.

After tuning the hyperparameter, the boundary no longer reacts strongly to outliers. A model like this tends to generalize better than the more complex one.
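One way to put numbers on this comparison is to hold out part of the synthetic data and score both trees on it. This is only a sketch: the 70/30 split, the `random_state` values, and the `report` dictionary are assumptions added here, not part of the original experiment.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_features=2, n_redundant=0, n_informative=2,
                           n_classes=3, n_clusters_per_class=1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

models = {
    "default": DecisionTreeClassifier(random_state=0),
    "min_samples_leaf=6": DecisionTreeClassifier(min_samples_leaf=6, random_state=0),
}

report = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # (number of leaves, training accuracy, held-out accuracy)
    report[name] = (model.get_n_leaves(),
                    model.score(X_train, y_train),
                    model.score(X_test, y_test))
    print(name, report[name])

# Leaf sizes of the constrained tree (children_left == -1 marks a leaf).
tuned = models["min_samples_leaf=6"].tree_
tuned_leaf_sizes = tuned.n_node_samples[tuned.children_left == -1]
```

The unconstrained tree memorizes the training set, so the held-out column is the one to compare; results on a dataset this small will vary with the split.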
