Logistic Regression

이세현 · March 25, 2024

Introduction to Machine Learning


Classification

Predicting a discrete value: classification problems

Examples

Spam / Not Spam
Yes / No
cf. Regression: predicting a continuous real value

Solving a classification problem with (linear) regression

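Threshold $h_\theta(x)$ at 0.5
- if $h_\theta(x) \geq 0.5$, predict $y=1$
- if $h_\theta(x) < 0.5$, predict $y=0$
If the model is trained on the error between the prediction and the label, it does not learn a good classifier.
- It cannot cope with outliers: a single extreme example shifts the fitted line and moves the 0.5 threshold.
The output is not limited to the range 0 to 1; it can be any real number.
- Classification: $y=0$ or $y=1$
- $h_\theta(x)$ can be greater than 1 or less than 0.
- Logistic Regression: $0 \leq h_\theta(x) \leq 1$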
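Here is a minimal Python sketch of both problems. It assumes a made-up one-feature data set (think tumor size) with 0/1 labels, fits a straight line by ordinary least squares, and thresholds the line at 0.5. The helper names `fit_line` and `threshold_point` are hypothetical.

```python
# Sketch: least-squares line on 0/1 labels, thresholded at 0.5.
# The toy data below is invented for illustration.

def fit_line(xs, ys):
    """Ordinary least squares for h(x) = theta0 + theta1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    theta1 = sxy / sxx
    theta0 = mean_y - theta1 * mean_x
    return theta0, theta1

def threshold_point(theta0, theta1):
    """Input value at which the fitted line crosses 0.5."""
    return (0.5 - theta0) / theta1

xs = [1, 2, 3, 4, 5, 6]
ys = [0, 0, 0, 1, 1, 1]
t0, t1 = fit_line(xs, ys)
print(threshold_point(t0, t1))  # about 3.5, between the two groups

# Add one extreme but correctly labeled positive example.
xs_out = xs + [30]
ys_out = ys + [1]
u0, u1 = fit_line(xs_out, ys_out)
print(threshold_point(u0, u1))  # the 0.5 crossing moves past x = 4
print(u0 + u1 * 30)             # the fitted output at x = 30 exceeds 1
```

With the six original points, the 0.5 crossing sits at about 3.5. Adding the single point at x = 30 flattens the line and moves the crossing to about 4.5, so x = 4 is now predicted as 0. The same line also outputs a value above 1 at x = 30.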

Hypothesis Representation

Logistic Regression Model

Despite the name "regression", it is a model for solving classification problems.

Goal: $0 \leq h_\theta(x) \leq 1$

Linear regression hypothesis: $h_\theta(x) = \theta^Tx$
→ Classification hypothesis

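The output must be a real number between 0 and 1.
- The sigmoid function (logistic function) restricts the output to this range. Its input, $\theta^Tx$, is the same linear combination used in regression.
- The sigmoid turns the model into a probabilistic model.
The final output is set to 0 or 1 using 0.5 as the threshold.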
$h_\theta(x) = g(\theta^Tx)$
$g(z) = \frac{1}{1+e^{-z}}, z = \theta^Tx$
$\theta$ is learned so that $z$ takes appropriate values for each $x$: positive where $y=1$ and negative where $y=0$.
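The hypothesis can be sketched in a few lines of Python. This assumes, as is common but not stated above, that `x[0] = 1` is a bias feature so that $\theta_0$ acts as an intercept. The function names are illustrative. The sigmoid branches on the sign of `z` so that `math.exp` never overflows.

```python
import math

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z)), written to avoid overflow in exp."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use the equivalent form e^z / (1 + e^z).
    ez = math.exp(z)
    return ez / (1.0 + ez)

def hypothesis(theta, x):
    """h_theta(x) = g(theta^T x); x[0] is assumed to be the bias feature 1."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return sigmoid(z)

print(sigmoid(0))                           # 0.5
print(sigmoid(10), sigmoid(-10))            # close to 1, close to 0
print(hypothesis([-1.0, 2.0], [1.0, 0.5]))  # theta^T x = 0, so 0.5
```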

Interpretation of Hypothesis Output

For an input $x$, the output $h_\theta(x)$ is interpreted as the probability that $y=1$.

Example: a tumor classifier outputs $h_\theta(x) = 0.7$ for the positive class $y = 1$
- The model estimates a 70% probability that the tumor is positive.
$h_\theta(x) = P(y=1|x;\theta)$
- $y=0$ or $y = 1$
- $P(y=0|x;\theta)+P(y=1|x;\theta)=1$, so $P(y=0|x;\theta) = 1 - h_\theta(x)$
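As a quick numeric check of this interpretation, the sketch below uses a single hypothetical parameter. It is chosen so that $\theta^Tx = \ln(0.7/0.3)$, the inverse of the sigmoid at 0.7, and so reproduces the tumor example's $h_\theta(x) = 0.7$. The function `predict_proba` is an illustrative name, not a library call.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(theta, x):
    """Return (P(y=0|x;theta), P(y=1|x;theta)) for one input x."""
    p1 = sigmoid(sum(t * v for t, v in zip(theta, x)))  # h_theta(x)
    return 1.0 - p1, p1

# Hypothetical parameter chosen so that theta^T x = ln(0.7 / 0.3), i.e. h_theta(x) = 0.7.
theta = [math.log(0.7 / 0.3)]
x = [1.0]
p0, p1 = predict_proba(theta, x)
print(round(p1, 6), round(p0, 6))  # 0.7 0.3
```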

Decision boundary

The prediction depends on which side of a boundary the input falls on.

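Because the sigmoid satisfies $g(z) \geq 0.5$ exactly when $z \geq 0$:
- predict $y=1$ if $h_\theta(x) \geq 0.5$, i.e. $\theta^Tx \geq 0$
- predict $y=0$ if $h_\theta(x) < 0.5$, i.e. $\theta^Tx < 0$
The decision boundary is the set of points where $\theta^Tx = 0$.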
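A minimal sketch of the decision rule. The parameters $\theta = (-3, 1, 1)$ are hypothetical and give $\theta^Tx = -3 + x_1 + x_2$, so the boundary is the line $x_1 + x_2 = 3$. The code checks that thresholding $\theta^Tx$ at 0 gives the same answer as thresholding $h_\theta(x)$ at 0.5.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict(theta, x):
    """Predict y = 1 exactly when theta^T x >= 0 (equivalently h_theta(x) >= 0.5)."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return 1 if z >= 0 else 0

# Hypothetical parameters: boundary is x1 + x2 = 3.
theta = [-3.0, 1.0, 1.0]
for x1, x2 in [(1.0, 1.0), (2.0, 2.0), (3.0, 0.0), (0.5, 2.0)]:
    x = [1.0, x1, x2]  # x0 = 1 is the bias feature
    z = sum(t * xi for t, xi in zip(theta, x))
    # The rules agree because g is increasing and g(0) = 0.5.
    assert predict(theta, x) == (1 if sigmoid(z) >= 0.5 else 0)
    print((x1, x2), predict(theta, x))
```

Points on the boundary itself, such as (3, 0), get $h_\theta(x) = 0.5$ and are predicted as $y=1$ under the "$\geq$" convention above.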

Cost Function

$$Cost(h_\theta(x), y) = \begin{cases} -\log(h_\theta(x)) & \text{if } y=1\\ -\log(1-h_\theta(x)) & \text{if } y=0 \end{cases}$$

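If the true label is $y=1$, the cost approaches 0 as $h_\theta(x)$ approaches 1.
- As $h_\theta(x)$ approaches 0, the cost grows without bound. Symmetrically, for $y=0$ the cost blows up as $h_\theta(x)$ approaches 1.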

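Because $y$ is always 0 or 1, one of the two terms always vanishes, so the two cases combine into a single expression:

$Cost(h_\theta(x),y) = -y\log(h_\theta(x))-(1-y)\log(1-h_\theta(x))$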
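A minimal Python sketch of the per-example cost, assuming the natural logarithm and $0 < h_\theta(x) < 1$ (at exactly 0 or 1 the logarithm is undefined). It checks that the combined formula matches the two-case definition and prints how the cost changes as the prediction moves away from the label. The function names are illustrative.

```python
import math

def cost(h, y):
    """Combined form: -y*log(h) - (1-y)*log(1-h). Requires 0 < h < 1."""
    return -y * math.log(h) - (1 - y) * math.log(1 - h)

def cost_cases(h, y):
    """Two-case form: -log(h) if y == 1, else -log(1 - h)."""
    return -math.log(h) if y == 1 else -math.log(1 - h)

for h in (0.99, 0.5, 0.01):
    # The two forms agree for both labels.
    assert math.isclose(cost(h, 1), cost_cases(h, 1))
    assert math.isclose(cost(h, 0), cost_cases(h, 0))
    print(f"h={h}: cost if y=1 is {cost(h, 1):.4f}, cost if y=0 is {cost(h, 0):.4f}")
```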

Simplified cost function and gradient descent

$J(\theta)=-\frac{1}{m}\sum_{i=1}^{m} \big[ y^{(i)}\log h_\theta(x^{(i)})+(1-y^{(i)})\log(1-h_\theta(x^{(i)})) \big]$
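$J(\theta)$ is the average of the per-example cost over the $m$ training examples. The sketch below computes it in plain Python. It assumes each row of `X` starts with a bias feature 1 and uses a tiny invented data set. The names `compute_cost`, `X`, and `y` are illustrative.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def compute_cost(theta, X, y):
    """J(theta) = -(1/m) * sum(y log h + (1-y) log(1-h)).

    Each row of X starts with the bias feature 1; assumes every h stays strictly between 0 and 1.
    """
    m = len(X)
    total = 0.0
    for xi, yi in zip(X, y):
        h = sigmoid(sum(t * v for t, v in zip(theta, xi)))
        total += yi * math.log(h) + (1 - yi) * math.log(1 - h)
    return -total / m

X = [[1.0, 1.0], [1.0, 2.0], [1.0, 4.0], [1.0, 5.0]]
y = [0, 0, 1, 1]
print(compute_cost([0.0, 0.0], X, y))   # every h = 0.5, so J = ln 2
print(compute_cost([-6.0, 2.0], X, y))  # parameters that separate the data give a lower J
```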

Derivative with respect to $\theta$
Let $f=\log h_\theta(x)$ and $t=h_\theta(x)$.

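Here $\log$ is the natural logarithm, so $\frac{\partial f}{\partial t} = \frac{1}{t}$. A base-10 logarithm would only add a constant factor $\frac{1}{\ln 10}$.

$$\frac{\partial f}{\partial \theta_j} = \frac{\partial f}{\partial t}\frac{\partial t}{\partial \theta_j} = \frac{1}{t}\frac{\partial t}{\partial \theta_j} = (1+e^{-\theta^Tx})\frac{\partial t}{\partial \theta_j}$$

$$\frac{\partial t}{\partial \theta_j} = \frac{\partial t}{\partial (\theta^Tx)}\frac{\partial (\theta^Tx)}{\partial \theta_j} = \left(\frac{1}{1+e^{-\theta^Tx}}\right)\left(1-\frac{1}{1+e^{-\theta^Tx}}\right)\frac{\partial (\theta^Tx)}{\partial \theta_j}$$

$$\frac{\partial (\theta^Tx)}{\partial \theta_j} = x_j$$

When the three pieces are multiplied, the factor $(1+e^{-\theta^Tx})$ cancels the first sigmoid factor:

$$\frac{\partial f}{\partial \theta_j} = \left(1-\frac{1}{1+e^{-\theta^Tx}}\right)x_j = (1-h_\theta(x))\,x_j$$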
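The derivation above stops at the derivative of $\log h_\theta(x)$, which is $(1-h_\theta(x))\,x_j$. The remaining steps, using the same symbols and the natural logarithm, give the gradient of $J(\theta)$. The learning rate $\alpha$ in the last line is a symbol introduced here, not in the original notes.

$$\frac{\partial}{\partial \theta_j}\log(1-h_\theta(x)) = -\frac{1}{1-h_\theta(x)}\,h_\theta(x)(1-h_\theta(x))\,x_j = -h_\theta(x)\,x_j$$

$$\frac{\partial}{\partial \theta_j}\Big[y\log h_\theta(x)+(1-y)\log(1-h_\theta(x))\Big] = y(1-h_\theta(x))\,x_j-(1-y)\,h_\theta(x)\,x_j = (y-h_\theta(x))\,x_j$$

$$\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m}\sum_{i=1}^{m}\big(h_\theta(x^{(i)})-y^{(i)}\big)x_j^{(i)}$$

$$\theta_j := \theta_j - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\big(h_\theta(x^{(i)})-y^{(i)}\big)x_j^{(i)}$$

All $\theta_j$ are updated simultaneously. The update has the same form as in linear regression; only the definition of $h_\theta(x)$ differs.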

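Gradient descent applies the update $\theta_j := \theta_j - \alpha\frac{1}{m}\sum_{i}(h_\theta(x^{(i)}) - y^{(i)})x_j^{(i)}$ repeatedly. Below is a minimal batch version in plain Python. The learning rate `alpha=0.1`, the iteration count, the bias column in `X`, and the toy data are all assumptions chosen for illustration, not values from the lecture.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def gradient_descent(X, y, alpha=0.1, iterations=5000):
    """Batch gradient descent for logistic regression (rows of X start with the bias 1)."""
    m, n = len(X), len(X[0])
    theta = [0.0] * n
    for _ in range(iterations):
        # Residuals h_theta(x_i) - y_i computed with the current theta.
        errors = [sigmoid(sum(t * v for t, v in zip(theta, xi))) - yi
                  for xi, yi in zip(X, y)]
        # Every theta_j is updated from the same residuals (simultaneous update).
        theta = [theta[j] - alpha * sum(e * xi[j] for e, xi in zip(errors, X)) / m
                 for j in range(n)]
    return theta

X = [[1.0, 1.0], [1.0, 2.0], [1.0, 4.0], [1.0, 5.0]]
y = [0, 0, 1, 1]
theta = gradient_descent(X, y)
preds = [1 if sum(t * v for t, v in zip(theta, xi)) >= 0 else 0 for xi in X]
print(theta)
print(preds)  # [0, 0, 1, 1]
```

Building the full `errors` list before touching `theta` is what makes the update simultaneous. Updating `theta[j]` in place inside the loop would mix old and new parameters.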
Multi-class classification: One-vs-all

Multiclass classification

When there are three or more classes, train one binary logistic regression classifier per class: that class vs. all the others.

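For a given input, the per-class outputs might be
- class 1: 40%
- class 2: 20%
- class 3: 20%

so the probabilities need not sum to 1, because each classifier is trained separately.
Each classifier $i$ estimates $h_\theta^{(i)}(x) = P(y=i|x;\theta)$ for $i = 1, 2, 3$.
For a new input $x$, predict the class $i$ whose classifier gives the largest output, $\arg\max_i h_\theta^{(i)}(x)$.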
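A minimal one-vs-all sketch in plain Python, under these assumptions: the toy 2-D data with three clusters is invented, each binary classifier is trained by batch gradient descent with an arbitrary `alpha` and iteration count, and classes are indexed 0, 1, 2 in code rather than 1, 2, 3 as in the text. The function names are hypothetical.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def train_binary(X, y, alpha=0.1, iterations=5000):
    """Batch gradient descent for one binary logistic regression classifier."""
    m, n = len(X), len(X[0])
    theta = [0.0] * n
    for _ in range(iterations):
        errors = [sigmoid(sum(t * v for t, v in zip(theta, xi))) - yi
                  for xi, yi in zip(X, y)]
        theta = [theta[j] - alpha * sum(e * xi[j] for e, xi in zip(errors, X)) / m
                 for j in range(n)]
    return theta

def one_vs_all(X, labels, num_classes):
    """Train classifier i on relabeled targets: 1 if the label is i, else 0."""
    return [train_binary(X, [1 if lab == i else 0 for lab in labels])
            for i in range(num_classes)]

def predict_class(thetas, x):
    """Return argmax_i h^(i)(x); the scores need not sum to 1."""
    scores = [sigmoid(sum(t * v for t, v in zip(theta, x))) for theta in thetas]
    return max(range(len(scores)), key=lambda i: scores[i])

# Toy 2-D data (bias feature 1 first): three clusters labeled 0, 1, 2.
X = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [1.0, 0.0, 1.0],
     [1.0, 5.0, 0.0], [1.0, 6.0, 0.0], [1.0, 5.0, 1.0],
     [1.0, 0.0, 5.0], [1.0, 1.0, 5.0], [1.0, 0.0, 6.0]]
labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]
thetas = one_vs_all(X, labels, 3)
for point in ([1.0, 0.5, 0.5], [1.0, 5.5, 0.5], [1.0, 0.5, 5.5]):
    print(point[1:], predict_class(thetas, point))
```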
