[ML] Decision Tree

cha-suyeonยท2022๋…„ 3์›” 9์ผ
0

Reference

๐Ÿ’ป ๋”ฅ๋Ÿฌ๋‹์˜ ๊นŠ์ด ์žˆ๋Š” ์ดํ•ด๋ฅผ ์œ„ํ•œ ๋จธ์‹ ๋Ÿฌ๋‹ ๊ฐ•์˜ 2-1
๐Ÿ”— Decision Tree Concept of Purity
๐Ÿ”— What is a Decision Tree?

์ €๋Š” ํ•ด๋‹น ๊ฐ•์˜๋ฅผ ๋“ฃ๊ณ  greedy recursive splitting ์ „๋žต์„ ์‚ฌ์šฉํ•œ๋‹ค๋Š” ๋ถ€๋ถ„์—์„œ ์ดํ•ด๊ฐ€ ๋ช…ํ™•ํžˆ ๋˜์ง€ ์•Š์•„,

๋จธ์‹ ๋Ÿฌ๋‹ ์ง€๋„ํ•™์Šต์˜ ๊ฐ€์žฅ ๊ธฐ๋ณธ์ ์ธ ์•Œ๊ณ ๋ฆฌ์ฆ˜์ธ Decision tree๋ฅผ ๋‹ค์‹œ ํ•œ ๋ฒˆ ์ •๋ฆฌํ•˜๋ ค๊ณ  ํ•ฉ๋‹ˆ๋‹ค.

ํ•ด๋‹น ํฌ์ŠคํŒ…์˜ ๋‚ด์šฉ๊ณผ ์ด๋ฏธ์ง€๋Š” ์œ„์˜ ๊ฐ•์˜์™€ ์•„๋ž˜ ํฌ์ŠคํŒ…์„ ์ฐธ๊ณ ํ–ˆ์Œ์„ ๋ฐํž™๋‹ˆ๋‹ค.


์œ„ ์ด๋ฏธ์ง€๋ฅผ ์‚ดํŽด๋ด…์‹œ๋‹ค.

์€ํ–‰์—์„œ ๋Œ€์ถœ์„ ๋ฐ›์œผ๋ ค๋ฉด ๊ณ ๋ คํ•ด์•ผ ํ•  ์ ๋“ค์ด ์ž‡์Šต๋‹ˆ๋‹ค.

์ˆ˜์ž…์ด ์–ด๋Š ์ •๋„์ธ์ง€, ์ง์žฅ์„ ์–ผ๋งˆ๋‚˜ ์˜ค๋ž˜ ๋‹ค๋…”๋Š”์ง€, ์‹ ์šฉ์นด๋“œ๋กœ ๊ฒฐ์ œ๋ฅผ ํ•˜๋Š”์ง€ ๋“ฑ์˜ ์—ฌ๋ถ€๋ฅผ ๋ง์ด์ฃ .

์ด๊ฑด ๊ฐ€์žฅ ๊ธฐ๋ณธ์ ์ธ ํ˜•ํƒœ์˜ decision tree์ž…๋‹ˆ๋‹ค.


decision tree

decision tree๋Š” regression ๋ฐ classification ๋ฌธ์ œ์— ์ฃผ๋กœ ์‚ฌ์šฉ๋˜๋Š” non-parametric ๊ธฐ๊ณ„ํ•™์Šต ๋ชจ๋ธ์ž…๋‹ˆ๋‹ค.

decision tree๋Š” ์˜ˆ์ธกํ•œ ๋ณ€์ˆ˜ ๋ฐ์ดํ„ฐ๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ ๊ฒฐ๊ณผ ๋ณ€์ˆ˜์— ๋Œ€ํ•ด ์ˆœ์ฐจ์ ์ด๊ณ  ๊ณ„์ธต์ ์ธ ๊ฒฐ์ •์„ ๋‚ด๋ฆฝ๋‹ˆ๋‹ค.

๊ทธ๋ž˜์„œ ์ข…๋ฅ˜๊ฐ€ ๋งค์šฐ ๋‹ค์–‘ํ•˜๋ฉฐ, ์ ์šฉ๋˜๋Š” ์‘์šฉ ๋ถ„์•ผ์— ๋”ฐ๋ผ ์‚ฌ์šฉ์ž๊ฐ€ ๋งˆ์Œ๋Œ€๋กœ ๋ณ€ํ™˜ํ•ด์„œ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ๋Š” ์•Œ๊ณ ๋ฆฌ์ฆ˜์ด๊ธฐ๋„ ํ•ฉ๋‹ˆ๋‹ค.

non-parametric์€ error ๋˜๋Š” data distribution์— ๋Œ€ํ•œ ๊ธฐ๋ณธ ๊ฐ€์ •์ด ์—†์Œ์„ ์˜๋ฏธํ•ฉ๋‹ˆ๋‹ค.

๊ธฐ๋ณธ์ ์œผ๋กœ ๊ด€์ฐฐ๋œ ๋ฐ์ดํ„ฐ๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ model์„ ๊ตฌ์„ฑํ•ฉ๋‹ˆ๋‹ค.

๊ทธ๋ž˜์„œ decision tree์˜ ํŠน์ง•์œผ๋กœ

  • feature type์— ๋”ฐ๋ผ ์ข…๋ฅ˜๊ฐ€ ๋‹ฌ๋ผ์ง
  • threshold ๊ฐ’ ๊ฒฐ์ •์— ๋”ฐ๋ผ ์ข…๋ฅ˜๊ฐ€ ๋‹ฌ๋ผ์ง
  • stop criterion์— ๋”ฐ๋ผ ์ข…๋ฅ˜๊ฐ€ ๋‹ฌ๋ผ์ง

์ด ์žˆ์Šต๋‹ˆ๋‹ค.

๊ฒฐ๊ตญ์€ ํ•™์Šต ๋ฐ์ดํ„ฐ๋ฅผ ๊ฐ€์žฅ ์ž˜ ํ‘œํ˜„ํ•˜๋Š” decison tree๋ฅผ ์ฐพ์•„๋‚ด๋Š” ๊ณผ์ •์ž…๋‹ˆ๋‹ค.

ํ•™์Šตํ•˜๋Š” ๋ฐฉ๋ฒ•์œผ๋กœ

  1. ์ฃผ์–ด์ง„ ๋ฐ์ดํ„ฐ๋ฅผ ๋‘ ๊ฐœ์˜ ๋‚˜๋‰œ ๋ฐ์ดํ„ฐ์˜ ๊ฐœ์ˆ˜๊ฐ€ ์„œ๋กœ ๋น„์Šทํ•˜๋„๋ก ํ•˜๋Š” ๋ฐฉ๋ฒ•
  2. ๋‚˜๋ˆด์„ ๋•Œ ์ •ํ™•๋„๊ฐ€ ๊ฐ€์žฅ ๋†’์€ ๋ฐฉํ–ฅ์œผ๋กœ ๋‚˜๋ˆ„๋Š” ๋ฐฉ๋ฒ•
  3. ๋‚˜๋ˆด์„ ๋Œ€ ๋‹จ ํ•œ ๊ฐœ์˜ sample๋งŒ ํ•œ์ชฝ ๋ฐฉํ–ฅ์œผ๋กœ ๊ฐ€๊ณ  ๋‚˜๋จธ์ง€ sample์€ ๋ฐ˜๋Œ€์ชฝ์œผ๋กœ ๋น ์ง€๊ฒŒ ํ•˜๋Š” ๋ฐฉ๋ฒ•

๋“ฑ์ด ์žˆ์Šต๋‹ˆ๋‹ค.

3๋ฒˆ์งธ ๋ฐฉ๋ฒ•์€ tree์˜ ๋†’์ด๊ฐ€ ๊ธธ์–ด์ง€๊ณ , ํ•œ์ชฝ์œผ๋กœ ์ ๋ฆฌ๊ณ , split์„ ๋งŽ์ด ํ•ด์•ผํ•œ๋‹ค๋Š” ๋‹จ์ ์ด ์žˆ๋Š”๋ฐ์š”.

์ด๋Š” sample์˜ ๊ฐœ์ˆ˜ n๊ฐœ, feature type d๊ฐœ, k๊ฐœ์˜ threshold์— ๋”ฐ๋ผ nโˆ—dโˆ—kn*d*k๊ฐœ์˜ ๊ณ„์‚ฐ์œผ๋กœ ์ด๋ฃจ์–ด์ง‘๋‹ˆ๋‹ค.

sample์ด ๋งŽ์œผ๋ฉด ์ž์—ฐ์Šค๋Ÿฝ๊ฒŒ ํ•™์Šต ์‹œ๊ฐ„์ด ๊ธธ์–ด์ง€๋ฉฐ, feature type์ด ๋งŽ๊ฑฐ๋‚˜ threshold๋ฅผ ๋„ˆ๋ฌด ์„ธ์„ธํ•˜๊ฒŒ ๋‚˜๋ˆ„๋ฉด ๊ณ„์‚ฐ๋Ÿ‰์ด ๋Š˜์–ด๋‚ฉ๋‹ˆ๋‹ค.

๊ทธ๋ž˜์„œ ๊ฐ๊ฐ์˜ ๋…ธ๋“œ๋ฅผ ๊ณ„์‚ฐํ•  ๋•Œ feature type์˜ ๊ฒฝ์šฐ๋ฅผ randomํ•˜๊ฒŒ ๊ฒฐ์ •ํ•˜๋Š” ๋ฐฉ๋ฒ•๊ณผ threshold๋ฅผ ๋‚˜๋ˆ„๋Š” ๊ธฐ์ค€์„ 2~3๊ฐœ ์ •๋„๋กœ ์ ๊ฒŒ ํ•ด์„œ ๊ณ„์‚ฐ๋Ÿ‰์„ ์ค„์ด๋Š” ๋ฐฉ๋ฒ•์„ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค.

split node๋Š” ๋งŽ์€ ์กฐํ•ฉ์ด ๋‚˜์˜ฌ ๊ฐ€๋Šฅ์„ฑ์ด ์žˆ๊ณ , ์™„๋ฒฝํ•œ decision tree๋ฅผ ํ•™์Šตํ•˜๋Š” ๊ฒƒ์€ ๋ถˆ๊ฐ€๋Šฅํ•ฉ๋‹ˆ๋‹ค.

๊ทธ๋Ÿฐ ์˜๋ฏธ์—์„œ decision tree๋Š” ์ผ๋ฐ˜์ ์œผ๋กœ greedy recursive splitting ์ „๋žต์„ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค.


Splitting Algorithm

decision tree์˜ ๋ชฉํ‘œ๋Š” ๊ฐ ๋…ธ๋“œ์˜ ๋์—์„œ ์ตœ์„ ์˜ ์„ ํƒ์„ ํ•˜๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค.

์ด ๊ด€์ ์—์„œ ์ด๋ฅผ ์ˆ˜ํ–‰ํ•˜๋Š” ์•Œ๊ณ ๋ฆฌ์ฆ˜์ด ํƒ์š•์ (greedy)์ด๋ฉฐ ์žฌ๊ท€์ (recursive)ํ•˜๋‹ค๊ณ  ๋ณผ ์ˆ˜ ์žˆ๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค.

Greedy๋Š” ๊ฐ€์žฅ ์ตœ์ ์˜ ๊ฒฐ์ •์„ ๋‚ด๋ฆฌ๋Š” ๊ฒƒ์„ ์˜๋ฏธํ•˜๊ณ  ์žฌ๊ท€์  ์˜๋ฏธ๋Š” ๋” ํฐ ์งˆ๋ฌธ์„ ๋” ์ž‘์€ ์งˆ๋ฌธ์œผ๋กœ ๋‚˜๋ˆ„๊ณ  ๊ฐ™์€ ๋ฐฉ์‹์œผ๋กœ ํ•ด๊ฒฐํ•œ๋‹ค๋Š” ์˜๋ฏธ์ž…๋‹ˆ๋‹ค.

๊ฐ ๋…ธ๋“œ์—์„œ splitting ํ•˜๋Š” decision์€ purity์ด๋ผ๋Š” metric์— ์ด๋ฃจ์–ด์ง‘๋‹ˆ๋‹ค.

๋งŒ์•ฝ ๋…ธ๋“œ๊ฐ€ 50:50 ๋น„์œจ๋กœ ๊ท ๋“ฑํ•˜๊ฒŒ ๋ถ„ํ• ๋˜๋ฉด ๋…ธ๋“œ๋Š” 100% ์ˆœ์ˆ˜ํ•˜์ง€ ์•Š๊ณ , ๋ชจ๋“  ๋ฐ์ดํ„ฐ๊ฐ€ ๋‹จ์ผ ํด๋ž˜์Šค์— ์†ํ•  ๋•Œ 100% ์ˆœ์ˆ˜ํ•˜๋‹ค๊ณ  ํ‘œํ˜„ํ•ฉ๋‹ˆ๋‹ค.

๋ชจ๋ธ์„ ์ตœ์ ํ™”ํ•˜๊ธฐ ์œ„ํ•ด ์ตœ๋Œ€ purity์— ๋„๋‹ฌํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค.

decision tree์—์„œ purity ๊ฐœ๋…์€ ํ•˜์œ„ ์ง‘๋‹จ์— ์†ํ•˜๋Š” ๊ทธ๋ฃน์˜ ๋ฐ์ดํ„ฐ ์š”์†Œ ๋น„์œจ์„ ๊ธฐ๋ฐ˜์œผ๋กœ ๊ฒฐ์ •ํ•˜๋Š”๋ฐ์š”.

์˜ˆ๋ฅผ ๋“ค์–ด ์„ธํŠธ๊ฐ€ ํด๋ž˜์Šค A์˜ ํ•ญ๋ชฉ 60%, ํด๋ž˜์Šค B์˜ 30%, ํด๋ž˜์Šค C์˜ 10%๋กœ ๊ตฌ์„ฑ๋œ ๊ฒฝ์šฐ purity๋Š” 60%์ž…๋‹ˆ๋‹ค.

decision tree๋Š” ํ–‰์„ ์ž์‹ ๋…ธ๋“œ๋กœ ๋‚˜๋ˆ„๋Š” ๋ถ„ํ• ๋กœ ๊ตฌ์„ฑ๋˜๊ธฐ ๋•Œ๋ฌธ์—, tree๊ฐ€ binary๋กœ ๊ฐ„์ฃผ๋˜๋Š” ๊ฒฝ์šฐ ํ•ด๋‹น ๋…ธ๋“œ์—๋Š” ๋‘ ๊ฐœ์˜ ์ž์‹์ด ์žˆ๊ฒŒ ๋ฉ๋‹ˆ๋‹ค.

์ด๋Ÿฐ ๋™์ผํ•œ ์ ˆ์ฐจ๋กœ ํ•˜์œ„ ๊ทธ๋ฃน์„ ๋ถ„ํ• ํ•˜๋ฉฐ, ์ด ๊ณผ์ •์„ ์•ž์— ๋‚˜์™”๋˜ recursiveํ•˜๋‹ค๊ณ  ๋ณผ ์ˆ˜ ์žˆ๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค.

spliting์€ ๋Œ€์ƒ ๋ณ€์ˆ˜์˜ ๊ฐ’์„ ์˜ˆ์ธกํ•˜๋Š”๋ฐ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ๋Š” tree๋ฅผ ๊ตฌ์„ฑํ•˜๊ธฐ ์œ„ํ•ด ์„ ํƒ๋˜๋ฉฐ,

training set์—์„œ decision tree๋ฅผ ์œ ๋„ํ•˜๊ธฐ ์œ„ํ•œ ๊ธฐ๋ณธ ์•Œ๊ณ ๋ฆฌ์ฆ˜์ด greedy์ธ ๊ฒƒ์ž…๋‹ˆ๋‹ค.

๊ทธ๋ฆฌ๊ณ  ์ด ํ•˜์œ„ ์ง‘๋‹จ์˜ ๊ฐ€์žฅ purityํ•จ, ์ฆ‰ ๋ถ„๊ธฐ ๋…ธ๋“œ์˜ ๊ฐ€์žฅ ๋ช…ํ™•ํ•œ ๋ถ„ํ• ์„ ๋ชฉํ‘œ๋กœ ํ•™์Šต์ด ์ง„ํ–‰๋ฉ๋‹ˆ๋‹ค.

purity์™€ gni ๊ณ„์ˆ˜ ๋“ฑ์— ๋Œ€ํ•œ ๊ฐœ๋… ์„ค๋ช…์€ ์œ„ํ‚ค๋ฐฑ๊ณผ์—์„œ ๋” ์ž์„ธํžˆ ๋ณผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

profile
๋ฏธ๋‚จ์ด ๊ท€์—ฝ์ฃ 

0๊ฐœ์˜ ๋Œ“๊ธ€