- Decision Tree
The depth of the tree (its length) depends on which feature is selected at the root node. Which feature is used first is a major factor in determining the depth.
Most important point: which descriptive feature is used at the root node.
- Entropy
A measure of uncertainty.
The higher the entropy, the higher the uncertainty.
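*High information content: information about an event that occurs rarely.
It can also be read as the amount of surprise: the rarer the event, the more surprising it is.
I(x) = log2(1/p(x)) = -log2 p(x)
Entropy is the expected information content: H(X) = -Σ p(x) log2 p(x)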
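The two quantities above can be checked numerically. The following is a minimal sketch, assuming the standard Shannon definition of entropy in bits; the function names and the sample labels are illustrative, not taken from the notes.

```python
import math
from collections import Counter

def information_content(p):
    """Self-information I(x) = -log2 p(x), measured in bits."""
    return -math.log2(p)

def entropy(labels):
    """Expected information content of the label distribution."""
    n = len(labels)
    return sum((c / n) * information_content(c / n)
               for c in Counter(labels).values())

# A rare event carries more information than a common one.
rare = information_content(1 / 8)
common = information_content(1 / 2)

# Hypothetical label lists: evenly mixed versus a single class.
mixed = entropy(["spam", "ham", "spam", "ham"])
pure = entropy(["spam", "spam", "spam"])

print(rare, common, mixed, pure)
```

An evenly mixed two-class set gives the maximum entropy of 1 bit, while a single-class set gives 0, which matches the reading of entropy as uncertainty.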
- Impurity Metrics
Measures of impurity (heterogeneity).
A decision tree splits in the direction that reduces impurity.
Each subset produced by a split is weighted by its share of the instances.
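The weighting step can be sketched as follows, assuming entropy is the impurity metric and that each branch is weighted by its fraction of the instances. The function name `weighted_entropy` and the example split are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def weighted_entropy(partitions):
    """Entropy of each subset, weighted by its share of all instances."""
    total = sum(len(part) for part in partitions)
    return sum(len(part) / total * entropy(part) for part in partitions)

# Hypothetical split of six labels into a small mixed branch
# and a larger pure branch.
small_branch = ["spam", "ham"]
large_branch = ["ham", "ham", "ham", "ham"]

rem = weighted_entropy([small_branch, large_branch])
print(rem)
```

The mixed branch has entropy 1 but holds only 2 of the 6 instances, so it contributes 2/6 to the total. The pure branch contributes nothing.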
- Information Gain
The larger the value, the more the split reduced the entropy, so a larger information gain is better.
It is applied top-down.
-> Entropy of the original dataset is 1.
-> a's E = weighted entropy after splitting the dataset on "Suspicious Words" as the root node = 0.0
b's E = weighted entropy after splitting the dataset on "Unknown Sender" as the root node = 0.9182958340544896
c's E = weighted entropy after splitting the dataset on "Contains Images" as the root node = 1.0
-> a's I = Entropy of original dataset - a's E = 1 - 0.0 = 1.0
b's I = Entropy of original dataset - b's E = 1 - 0.9183 ≈ 0.0817
c's I = Entropy of original dataset - c's E = 1 - 1.0 = 0.0
a has the largest information gain, so "Suspicious Words" is chosen as the root node.
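The comparison above can be reproduced with a short sketch. The notes do not include the dataset rows, so the six emails below are an assumption, constructed only so that the weighted entropies come out to 0.0, about 0.918, and 1.0 as listed. The function and key names are likewise illustrative.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, feature, target="class"):
    """Entropy of the whole set minus the weighted entropy after the split."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[feature]].append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[target] for row in rows]) - remainder

# Assumed rows (not from the notes): (suspicious_words, unknown_sender,
# contains_images, class)
raw = [
    (True,  False, True,  "spam"),
    (True,  True,  False, "spam"),
    (True,  True,  False, "spam"),
    (False, True,  True,  "ham"),
    (False, False, False, "ham"),
    (False, False, False, "ham"),
]
features = ("suspicious_words", "unknown_sender", "contains_images")
emails = [dict(zip(features + ("class",), r)) for r in raw]

gains = {f: information_gain(emails, f) for f in features}
best = max(gains, key=gains.get)  # feature to place at the root node
print(gains, best)
```

Under these assumed rows, `suspicious_words` separates the classes perfectly, so it has the highest gain and is selected first.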
- ID3 Algorithm
*Iterative Dichotomiser 3
cf. CART (Classification and Regression Trees), another tree-building algorithm.
ID3 chooses the feature at each node by entropy-based information gain.
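The top-down procedure can be sketched as a short recursive function. This is a simplified illustration for categorical features, not a full ID3 implementation: it omits handling of feature values that never appear in the training data. The weather-style rows and all names are hypothetical.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def partition(rows, feature):
    """Group rows by their value for the given feature."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[feature]].append(row)
    return groups

def information_gain(rows, feature, target):
    remainder = sum(
        len(g) / len(rows) * entropy([r[target] for r in g])
        for g in partition(rows, feature).values()
    )
    return entropy([r[target] for r in rows]) - remainder

def id3(rows, features, target):
    """Return a nested dict {feature: {value: subtree_or_label}}."""
    labels = [r[target] for r in rows]
    # Stop when the node is pure or no features are left to test.
    if len(set(labels)) == 1 or not features:
        return Counter(labels).most_common(1)[0][0]
    # Pick the feature with the highest information gain at this node.
    best = max(features, key=lambda f: information_gain(rows, f, target))
    remaining = [f for f in features if f != best]
    return {best: {value: id3(subset, remaining, target)
                   for value, subset in partition(rows, best).items()}}

# Hypothetical toy data: (outlook, windy, play)
raw = [
    ("sunny", False, "yes"), ("sunny", True, "yes"), ("sunny", True, "yes"),
    ("rainy", False, "yes"), ("rainy", True, "no"), ("rainy", True, "no"),
]
rows = [dict(zip(("outlook", "windy", "play"), r)) for r in raw]

tree = id3(rows, ["outlook", "windy"], "play")
print(tree)
```

In this toy data `outlook` has the higher gain at the root. The "sunny" branch is already pure, so only the "rainy" branch needs a second test on `windy`.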
<Deciding which descriptive feature should be used as the Root Node>
<Deciding which descriptive feature should be used as the First Interior Node>
<Final Decision Tree for the vegetation dataset>