Classification: Classification and Algorithms

Yougurt_Man · May 28, 2022

Classification K-Nearest Logistic Regression Naive Bayes Random Forest SGD decision tree


Machine Learning Theory


Classification

Classification is the process of categorizing a given set of data into classes.
It can be performed on both structured and unstructured data. The process starts with predicting the class of given data points. The classes are often referred to as targets, labels, or categories.

In other words, classification sorts a given dataset into classes. The model assigns each class a probability between 0 and 1.
Example: {0: fruit, 1: vegetable, 2: grain} $\rightarrow$ {0.25, 0.5, 0.25}
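
To make the mapping from probabilities to a predicted class concrete, here is a minimal sketch. It reuses the three classes and probabilities from the example above. The variable names (`labels`, `probs`, `predicted`) are assumptions made for illustration, and taking the most probable class (argmax) is one common decision rule, not the only one.

```python
# Class labels and per-class probabilities from the example above.
labels = {0: "fruit", 1: "vegetable", 2: "grain"}
probs = [0.25, 0.5, 0.25]

# Each probability lies in [0, 1], and together they sum to 1.
assert all(0.0 <= p <= 1.0 for p in probs)
assert abs(sum(probs) - 1.0) < 1e-9

# Predict the class with the highest probability (argmax).
predicted = max(range(len(probs)), key=lambda i: probs[i])
print(predicted, labels[predicted])  # 1 vegetable
```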

Classification Algorithm

Logistic Regression

An algorithm that models the relationship between the independent variables (inputs) and the dependent variable (output) in order to perform binary classification.
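
As a rough illustration of how inputs become a binary decision, the sketch below passes a weighted sum of the inputs through the sigmoid function and thresholds the result at 0.5. The function names (`sigmoid`, `predict_proba`, `predict`) and the weights are hypothetical and hand-picked. Nothing here is fitted to data, and a real model would learn the weights during training.

```python
import math

def sigmoid(z):
    """Map a real number to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, weights, bias):
    """Probability of the positive class for one input vector x."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return sigmoid(z)

def predict(x, weights, bias, threshold=0.5):
    """Return 1 if the probability reaches the threshold, else 0."""
    return 1 if predict_proba(x, weights, bias) >= threshold else 0

# Hypothetical, hand-picked parameters (not learned from any data).
weights = [2.0, -1.0]
bias = 0.0

p = predict_proba([1.0, 2.0], weights, bias)  # z = 2*1 - 1*2 + 0 = 0
print(p)                                      # 0.5
print(predict([0.0, 3.0], weights, bias))     # z = -3, so class 0
```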

Naive Bayes Classifier

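A classification algorithm built on the assumption that all features of the data are independent of one another (more precisely, conditionally independent given the class). It estimates the class probabilities of an input from the conditional probability $P(A|B)$, the probability that event A occurs given that event B has occurred.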

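The following uses Bayes' theorem to compute $P(\text{Rain} \mid \text{Play Soccer})$, the probability that it rains on a day when soccer is played.

$P(\text{Play Soccer} \mid \text{Rain})$, the probability of playing soccer when it rains: 0.28
$P(\text{Play Soccer})$, the probability of playing soccer: 0.5
$P(\text{Rain})$, the probability of rain: 0.35
$P(\text{Rain} \mid \text{Play Soccer}) = P(\text{Play Soccer} \mid \text{Rain}) \cdot P(\text{Rain}) / P(\text{Play Soccer})$, the probability of rain when soccer is played: $0.28 \times 0.35 / 0.5 = 0.196 \approx 0.2$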
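
The arithmetic above can be checked with a few lines of Python. The three input probabilities come from the text, and the variable names are chosen here for illustration. The exact result is 0.196, which rounds to the 0.2 quoted above.

```python
# Probabilities given in the text.
p_soccer_given_rain = 0.28  # P(Play Soccer | Rain)
p_rain = 0.35               # P(Rain)
p_soccer = 0.5              # P(Play Soccer)

# Bayes' theorem: P(Rain | Play Soccer).
p_rain_given_soccer = p_soccer_given_rain * p_rain / p_soccer
print(round(p_rain_given_soccer, 3))  # 0.196
```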

Note, however, that this approach requires the features to be independent of one another.
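
The independence assumption is what lets the likelihood be written as a product of per-feature probabilities. The sketch below shows that factorization. Every number, class name (`play`, `no_play`), and feature name (`rain`, `windy`) is hypothetical and invented for illustration; none of it comes from a real dataset or from the soccer example above.

```python
# Hypothetical priors P(class) and likelihoods P(feature | class).
priors = {"play": 0.5, "no_play": 0.5}
likelihoods = {
    "play": {"rain": 0.2, "windy": 0.3},
    "no_play": {"rain": 0.6, "windy": 0.5},
}

def posterior(observed_features):
    """Return P(class | features) under the naive independence assumption."""
    scores = {}
    for cls, prior in priors.items():
        score = prior
        # Independence: multiply the per-feature likelihoods together.
        for feature in observed_features:
            score *= likelihoods[cls][feature]
        scores[cls] = score
    # Normalize so the class probabilities sum to 1.
    total = sum(scores.values())
    return {cls: s / total for cls, s in scores.items()}

result = posterior(["rain", "windy"])
best = max(result, key=result.get)
print(best)  # no_play
```

With these made-up numbers the unnormalized scores are 0.5 × 0.2 × 0.3 = 0.03 for `play` and 0.5 × 0.6 × 0.5 = 0.15 for `no_play`, so `no_play` gets 5/6 of the probability mass.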

We will look at conditional probability in more detail later, together with maximum likelihood estimation.

Stochastic Gradient Descent (SGD)

Stochastic gradient descent (often abbreviated SGD) is an iterative method for optimizing an objective function with suitable smoothness properties (e.g. differentiable or subdifferentiable). It can be regarded as a stochastic approximation of gradient descent optimization, since it replaces the actual gradient (calculated from the entire data set) by an estimate thereof (calculated from a randomly selected subset of the data).
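
A minimal sketch of the idea, assuming a one-variable linear model with squared-error loss. Each step estimates the gradient from a single randomly chosen sample (a subset of size one) instead of the whole data set. The function name `sgd_linear_fit`, the learning rate, the step count, and the toy data are all assumptions made for illustration.

```python
import random

def sgd_linear_fit(xs, ys, lr=0.05, steps=2000, seed=0):
    """Fit y ≈ w * x + b with SGD, using one random sample per step."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for _ in range(steps):
        i = rng.randrange(len(xs))       # randomly selected sample
        error = (w * xs[i] + b) - ys[i]  # prediction error on that sample
        # Gradient of 0.5 * error**2 with respect to w and b,
        # estimated from this single sample only.
        w -= lr * error * xs[i]
        b -= lr * error
    return w, b

# Noise-free toy data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]

w, b = sgd_linear_fit(xs, ys)
print(w, b)  # should be close to 2 and 1
```

Because the toy data contain no noise, the single-sample updates settle near the true parameters. On noisy data the estimates would keep fluctuating unless the learning rate is decreased over time.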