[ML] Week 3-2: Cost Function for Logistic Regression

k_dah · November 15, 2021

machine learning

MachineLearning_AndrewNg

Machine Learning by professor Andrew Ng in Coursera

Logistic Regression Model

1) Cost Function

This section covers how to fit the parameters $\theta$ in logistic regression.
Recall that linear regression, covered earlier, uses the following cost function.

Linear Regression:

J(\theta)=\frac{1}{m} \sum_{i=1}^{m} \underbrace{ \frac{1}{2} \left(h_\theta(x^{(i)})-y^{(i)}\right)^2}_{\color{royalblue}{\textrm{cost}\left(h_\theta (x) , y\right)}}

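However, plugging the sigmoid hypothesis of logistic regression into this same cost function makes $J(\theta)$ non-convex.
Gradient descent can still be run on it, but it is no longer reliable:
on a non-convex function, gradient descent can get stuck in a local minimum, so there is no guarantee that it reaches the global minimum.
=> A different cost function is needed.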
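
The non-convexity can be checked numerically with a midpoint test: a convex function never rises above the chord between two of its points, so $J\left(\frac{a+b}{2}\right) \le \frac{J(a)+J(b)}{2}$ must hold for every pair $a, b$. The sketch below is not from the lecture. It assumes a single hypothetical training example ($x = 1$, $y = 1$) and a scalar $\theta$, and compares the squared-error cost with a log-based cost of the kind introduced in the next part.

```python
import math

def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical one-example dataset (x = 1, y = 1), so h_theta(x) = sigmoid(theta).
def squared_error_cost(theta):
    return 0.5 * (sigmoid(theta) - 1.0) ** 2

def log_cost(theta):
    return -math.log(sigmoid(theta))

def violates_midpoint_convexity(cost, a, b):
    """True if the cost at the midpoint lies above the chord between a and b."""
    midpoint_value = cost((a + b) / 2.0)
    chord_value = (cost(a) + cost(b)) / 2.0
    return midpoint_value > chord_value

squared_is_nonconvex = violates_midpoint_convexity(squared_error_cost, -6.0, -2.0)
log_violates = violates_midpoint_convexity(log_cost, -6.0, -2.0)
print(squared_is_nonconvex, log_violates)
```

A single violating pair is enough to show that the squared-error cost is non-convex here. Passing the test for one pair does not by itself prove that the log cost is convex; it only shows that this particular pair raises no objection.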

Logistic Regression Cost Function

\text{cost}\big( h_\theta (x), y \big) = \begin{cases} -\log\big(h_\theta (x)\big) & \text{if } y=1 \\ -\log \big(1-h_\theta (x) \big) & \text{if } y=0 \end{cases}

$\text{cost}=0 \text{ if } y = 1 \text{ and } h_\theta(x) = 1$
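If the prediction is 1 and the actual y is also 1, then cost = 0.
Conversely, when $y = 1$, $\text{cost} \to \infty \text{ as } h_\theta(x) \to 0$: a confident but wrong prediction is penalized very heavily.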
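
To make this behavior concrete, here is a minimal sketch of the per-example cost. The function name `cost` and the sample prediction values are illustrative assumptions, not part of the lecture.

```python
import math

def cost(h, y):
    """Per-example logistic cost for a prediction h in (0, 1) and a label y in {0, 1}."""
    if y == 1:
        return -math.log(h)
    return -math.log(1.0 - h)

# With y = 1, the cost grows as the prediction h moves away from 1 toward 0.
for h in (0.99, 0.5, 0.01):
    print(f"y=1, h={h}: cost={cost(h, 1):.4f}")
```

The predictions are kept strictly between 0 and 1 because `math.log(0)` raises an error, which is the numerical counterpart of the cost going to infinity.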

2) Simplified Cost Function and Gradient Descent

Logistic Regression Cost Function

Logistic Regression:

J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \text{cost}\left( h_\theta ( x^{(i)}), y^{(i)} \right)

where \text{cost}\left( h_\theta (x), y \right) = \begin{cases} -\log(h_\theta (x)) & \text{if } y=1 \\ -\log(1-h_\theta (x)) & \text{if } y=0 \end{cases}

The piecewise definition above can be written compactly as a single expression:

\text{cost}\left( h_\theta(x), y \right) = -y\log(h_\theta(x)) - (1 - y)\log(1 - h_\theta(x))
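
Because $y$ is always 0 or 1, one of the two terms vanishes and the single expression reduces to the matching case. A quick sketch of that check follows; the function names and the sample values of $h$ are hypothetical.

```python
import math

def cost_piecewise(h, y):
    """Case-by-case definition of the cost."""
    if y == 1:
        return -math.log(h)
    return -math.log(1.0 - h)

def cost_compact(h, y):
    """Single-expression form: the factor y or (1 - y) switches one term off."""
    return -y * math.log(h) - (1 - y) * math.log(1.0 - h)

# Largest disagreement between the two forms over a few sample predictions.
max_gap = max(
    abs(cost_piecewise(h, y) - cost_compact(h, y))
    for h in (0.1, 0.5, 0.9)
    for y in (0, 1)
)
print(max_gap)
```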

Final logistic regression cost function:

J(\theta) = - \frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta (x^{(i)}) + (1-y^{(i)}) \log \left( 1-h_\theta (x^{(i)}) \right) \right]
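
As a rough illustration, $J(\theta)$ can be computed directly from this formula. The sketch below uses plain Python lists; the toy dataset and the names `hypothesis` and `cost_function` are assumptions made for the example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def hypothesis(theta, x):
    """h_theta(x) = sigmoid(theta^T x); x includes the intercept term x_0 = 1."""
    return sigmoid(sum(t * x_j for t, x_j in zip(theta, x)))

def cost_function(theta, X, y):
    """Average logistic cost J(theta) over the m training examples."""
    m = len(y)
    total = 0.0
    for x_i, y_i in zip(X, y):
        h = hypothesis(theta, x_i)
        total += y_i * math.log(h) + (1 - y_i) * math.log(1.0 - h)
    return -total / m

# Hypothetical toy data: each row is [x_0 = 1, x_1].
X = [[1.0, 0.5], [1.0, 1.5], [1.0, 3.0], [1.0, 4.0]]
y = [0, 0, 1, 1]

print(cost_function([0.0, 0.0], X, y))   # theta = 0 predicts 0.5 for every example
print(cost_function([-4.5, 2.0], X, y))  # a theta that fits the data better
```

With $\theta = 0$ every prediction is 0.5, so the cost equals $\log 2$; a parameter vector that separates the two classes yields a lower value.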

The optimal parameters $\theta$ that minimize $J(\theta)$ are found with gradient descent, shown below.

Repeat \{ \theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta) \} (update all \theta_j simultaneously)

where

\frac{\partial}{\partial \theta_j} J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}

Compared with gradient descent for linear regression, the update rule has exactly the same form.
What differs is the definition of the hypothesis $h_\theta(x)$.

linear regression :

h_\theta(x) = \theta^T x

logistic regression :

h_\theta(x) = \frac{1}{1 + e^{-\theta^Tx}}
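
Putting the update rule, the gradient, and the logistic hypothesis together gives a loop like the one sketched below. This is a minimal illustration, not the course's reference code: the toy dataset, the learning rate `alpha=0.1`, and the iteration count are all assumed values.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def hypothesis(theta, x):
    """Logistic hypothesis: sigmoid(theta^T x), with x_0 = 1 as the intercept."""
    return sigmoid(sum(t * x_j for t, x_j in zip(theta, x)))

def cost_function(theta, X, y):
    m = len(y)
    return -sum(
        y_i * math.log(hypothesis(theta, x_i))
        + (1 - y_i) * math.log(1.0 - hypothesis(theta, x_i))
        for x_i, y_i in zip(X, y)
    ) / m

def gradient(theta, X, y):
    """Partial derivatives (1/m) * sum_i (h_theta(x^(i)) - y^(i)) * x_j^(i)."""
    m = len(y)
    grad = [0.0] * len(theta)
    for x_i, y_i in zip(X, y):
        error = hypothesis(theta, x_i) - y_i
        for j, x_ij in enumerate(x_i):
            grad[j] += error * x_ij / m
    return grad

def gradient_descent(X, y, alpha, iterations):
    theta = [0.0] * len(X[0])
    for _ in range(iterations):
        grad = gradient(theta, X, y)  # every partial derivative uses the old theta
        theta = [t - alpha * g for t, g in zip(theta, grad)]  # simultaneous update
    return theta

# Hypothetical, deliberately non-separable toy data: each row is [x_0 = 1, x_1].
X = [[1.0, 0.5], [1.0, 1.5], [1.0, 2.0], [1.0, 2.5], [1.0, 3.0], [1.0, 4.0]]
y = [0, 0, 1, 0, 1, 1]

theta = gradient_descent(X, y, alpha=0.1, iterations=2000)
initial_cost = cost_function([0.0, 0.0], X, y)
final_cost = cost_function(theta, X, y)
print(initial_cost, final_cost)
```

Computing the full gradient before touching `theta` is what makes the update simultaneous. Swapping `hypothesis` for a plain `theta^T x` would turn the same loop into gradient descent for linear regression.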

In linear regression, feature scaling was used to make gradient descent converge faster; the same technique applies to logistic regression.
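
One common form of feature scaling is standardization, which rescales each feature to zero mean and unit standard deviation. The sketch below assumes that variant (dividing by the range instead is another option), and the feature names and values are made up for illustration.

```python
import math

def standardize(column):
    """Rescale one feature column to zero mean and unit standard deviation."""
    mean = sum(column) / len(column)
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / len(column))
    return [(v - mean) / std for v in column]

# Hypothetical features on very different scales.
size_sqft = [850.0, 1200.0, 1600.0, 2100.0]
bedrooms = [1.0, 2.0, 3.0, 4.0]

scaled_size = standardize(size_sqft)
scaled_bedrooms = standardize(bedrooms)
print(scaled_size)
print(scaled_bedrooms)
```

The intercept column $x_0 = 1$ should be left unscaled, since its standard deviation is zero.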

3) Advanced Optimization

Optimization algorithm

1. Gradient Descent
2. Conjugate gradient
3. BFGS
4. L-BFGS

1 vs 2, 3, 4

Advantages of 2, 3, 4

No need to pick a learning rate $\alpha$ manually.
Often faster than gradient descent.

Disadvantages of 2, 3, 4

More complex.
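
In practice these algorithms are usually called from a library rather than implemented by hand. As one possible illustration, the sketch below uses SciPy's `scipy.optimize.minimize` with `method="BFGS"`; the choice of SciPy and NumPy, the toy dataset, and the function name `cost_and_gradient` are assumptions made for this example and are not taken from the lecture.

```python
import numpy as np
from scipy.optimize import minimize

def cost_and_gradient(theta, X, y):
    """Return J(theta) and its gradient, the two quantities the optimizer needs."""
    h = 1.0 / (1.0 + np.exp(-X @ theta))
    m = y.size
    cost = -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))
    grad = X.T @ (h - y) / m
    return cost, grad

# Hypothetical, non-separable toy data: each row is [x_0 = 1, x_1].
X = np.array([[1.0, 0.5], [1.0, 1.5], [1.0, 2.0], [1.0, 2.5], [1.0, 3.0], [1.0, 4.0]])
y = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 1.0])

# jac=True tells minimize that the function returns (cost, gradient) together.
result = minimize(cost_and_gradient, x0=np.zeros(2), args=(X, y), jac=True, method="BFGS")
print(result.x, result.fun)
```

Note that no learning rate appears anywhere in the call: the optimizer chooses its own step sizes, which is the first advantage listed above. Passing `method="CG"` or `method="L-BFGS-B"` selects SciPy's conjugate gradient or limited-memory BFGS variants instead.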
