Naive Bayes Classifier
$$P(Y_c \mid X_1, \dots, X_n) = \frac{P(Y_c)\prod_{i=1}^{n} P(X_i \mid Y_c)}{\prod_{i=1}^{n} P(X_i)}$$
where $Y_c$ is a class label.
Computing the likelihood
$$\prod_{i=1}^{n} P(X_i \mid Y_c)$$
$$P(X_i \mid Y_c) = \frac{\sum \mathrm{tf}(x_i, d \in Y_c) + \alpha}{\sum N_{d \in Y_c} + \alpha \cdot V}$$
- $X_i$: a word from the feature vector $X$ of a particular sample.
- $\sum \mathrm{tf}(x_i, d \in Y_c)$: the sum of raw term frequencies of word $x_i$ over all documents belonging to class $Y_c$.
- $\sum N_{d \in Y_c}$: the sum of all term frequencies in the training documents of class $Y_c$.
- $\alpha$: an additive smoothing parameter ($\alpha = 1$ for Laplace smoothing).
- $V$: the size of the vocabulary (number of distinct words in the training set).
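The smoothed likelihood above can be sketched in a few lines of Python (a minimal sketch; the function name and argument layout are my own):

```python
from collections import Counter

def smoothed_likelihood(word, class_docs, vocab, alpha=1.0):
    """P(word | class) with additive (Laplace) smoothing.

    class_docs: list of tokenized documents belonging to one class.
    vocab: set of all distinct words in the training set.
    """
    # term frequencies of every word within this class
    counts = Counter(w for doc in class_docs for w in doc)
    # sum of all term frequencies for the class
    total = sum(counts.values())
    return (counts[word] + alpha) / (total + alpha * len(vocab))
```

For example, with the class-c training documents from the table below, `smoothed_likelihood("Chinese", c_docs, vocab)` evaluates to $(5+1)/(8+6) = 6/14$.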
Multinomial Naive Bayes
Bag of words
- Represent every sample text as a single vector.
- Assign an index to each word and represent each sentence's (or document's) word counts as a vector (the collection of texts is also called a corpus).
Example
Data
| | Doc | Words | Class |
|---|---|---|---|
| Training | 1 | Chinese Beijing Chinese | c |
| | 2 | Chinese Chinese Shanghai | c |
| | 3 | Chinese Macao | c |
| | 4 | Tokyo Japan Chinese | j |
| Test | 5 | Chinese Chinese Chinese Tokyo Japan | ? |
Bag of words
| | Doc | Chinese | Beijing | Shanghai | Macao | Tokyo | Japan | Class |
|---|---|---|---|---|---|---|---|---|
| Training | 1 | 2 | 1 | 0 | 0 | 0 | 0 | c |
| | 2 | 2 | 0 | 1 | 0 | 0 | 0 | c |
| | 3 | 1 | 0 | 0 | 1 | 0 | 0 | c |
| | 4 | 1 | 0 | 0 | 0 | 1 | 1 | j |
| Test | 5 | 3 | 0 | 0 | 0 | 1 | 1 | ? |
Classification score: prior × likelihood
Probability that the test document is classified as 'c':
$$P(c \mid d_5) \propto P(c)\,P(\text{Chinese} \mid c)^3\,P(\text{Tokyo} \mid c)\,P(\text{Japan} \mid c) = \frac{3}{4} \cdot \left(\frac{5+1}{8+6}\right)^3 \cdot \frac{0+1}{8+6} \cdot \frac{0+1}{8+6} \approx 0.0003$$
Probability that the test document is classified as 'j':
$$P(j \mid d_5) \propto P(j)\,P(\text{Chinese} \mid j)^3\,P(\text{Tokyo} \mid j)\,P(\text{Japan} \mid j) = \frac{1}{4} \cdot \left(\frac{1+1}{3+6}\right)^3 \cdot \frac{1+1}{3+6} \cdot \frac{1+1}{3+6} \approx 0.0001$$

Since $P(c \mid d_5) > P(j \mid d_5)$, document 5 is classified as class c.
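The whole worked example can be reproduced end to end (a minimal sketch; variable and function names are my own, matching the training/test data in the tables above):

```python
from collections import Counter
from math import prod

train = [
    (["Chinese", "Beijing", "Chinese"], "c"),
    (["Chinese", "Chinese", "Shanghai"], "c"),
    (["Chinese", "Macao"], "c"),
    (["Tokyo", "Japan", "Chinese"], "j"),
]
test = ["Chinese", "Chinese", "Chinese", "Tokyo", "Japan"]

vocab = {w for doc, _ in train for w in doc}        # V = 6 distinct words
classes = {label for _, label in train}

def score(doc, cls, alpha=1.0):
    """prior * likelihood for one class, with Laplace smoothing."""
    class_docs = [d for d, label in train if label == cls]
    prior = len(class_docs) / len(train)             # P(Y_c)
    counts = Counter(w for d in class_docs for w in d)
    total = sum(counts.values())                     # all term frequencies in the class
    likelihood = prod((counts[w] + alpha) / (total + alpha * len(vocab))
                      for w in doc)
    return prior * likelihood

scores = {cls: score(test, cls) for cls in classes}
print(max(scores, key=scores.get))  # -> c
```

This reproduces the two hand computations above: 3/4 · (6/14)³ · (1/14)² for class c and 1/4 · (2/9)³ · (2/9)² for class j, so the test document goes to class c.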