Linear Regression with Multiple variables

이세현 · March 21, 2024

Multiple features

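In real data, the output is rarely determined by a single input.

Example: the factors (features) that determine a house price include the number of rooms, the number of floors, and the age of the building.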

Notation

$n$ : number of features
$x^{(i)}$ : input features of the $i$-th training example
$x_j^{(i)}$ : value of the $j$-th feature of the $i$-th training example

Hypothesis

h_{\theta}(x) = \theta_0 + \theta_1x_1 + \theta_2x_2 + ... + \theta_nx_n

By convention, $x_0$ is defined as 1 so that the hypothesis can be written in vector form.
$\theta_0$ can be viewed as the bias.

\mathbf{x} = \begin{bmatrix} x_0 \\ x_1 \\ \vdots \\ x_n \end{bmatrix}

\theta = \begin{bmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_n \end{bmatrix}

$h_{\theta}(x) = \theta^{\mathbf{T}}x$
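The vector form of the hypothesis can be sketched in NumPy as a single dot product. This is only an illustration: the function name `hypothesis` and the parameter values are assumptions made for the example, not values from the post.

```python
import numpy as np

def hypothesis(theta, x):
    """Return h_theta(x) = theta^T x for one example.

    x holds the n raw features; x_0 = 1 is prepended here.
    """
    x_with_bias = np.concatenate(([1.0], x))
    return float(theta @ x_with_bias)

# Hypothetical parameters and one house: rooms, floors, age.
theta = np.array([50.0, 10.0, 5.0, -1.0])
x = np.array([3.0, 2.0, 10.0])
price = hypothesis(theta, x)  # 50 + 10*3 + 5*2 - 1*10 = 80
```

Prepending $x_0 = 1$ inside the function keeps the bias $\theta_0$ in the same vector as the other parameters.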

Gradient descent for multiple variables

Notations

Hypothesis: $h_{\theta}(x) = \theta^{\mathbf{T}}x = \theta_0x_0 + \theta_1x_1 + ... + \theta_nx_n$
Parameters: $\theta_0, \theta_1, ..., \theta_n$
Cost function: $J(\theta) = \frac{1}{2m} \sum_{i=1}^{m}(h_\theta(x^{(i)})-y^{(i)})^2$
Gradient descent: Repeat $\theta_j := \theta_j - \alpha\frac{\partial}{\partial \theta_j}J(\theta)$

The gradient descent update must be applied simultaneously for every $j$, that is, for all parameters $\theta_0, \theta_1, ..., \theta_n$.
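For the cost function above, the partial derivative is $\frac{\partial}{\partial \theta_j}J(\theta) = \frac{1}{m}\sum_{i=1}^{m}(h_\theta(x^{(i)})-y^{(i)})x_j^{(i)}$. A minimal sketch of batch gradient descent follows, assuming a design matrix `X` whose first column is $x_0 = 1$. The function names, the toy data, and the default values of `alpha` and `num_iters` are illustrative assumptions.

```python
import numpy as np

def cost(theta, X, y):
    """J(theta) = 1/(2m) * sum((X @ theta - y)^2)."""
    m = len(y)
    residual = X @ theta - y
    return float(residual @ residual) / (2 * m)

def gradient_descent(X, y, alpha=0.1, num_iters=2000):
    """Batch gradient descent; X must already contain the x_0 = 1 column."""
    m, num_params = X.shape
    theta = np.zeros(num_params)
    for _ in range(num_iters):
        # The full gradient is computed from the old theta before any
        # component changes, so all theta_j are updated simultaneously.
        grad = X.T @ (X @ theta - y) / m
        theta = theta - alpha * grad
    return theta

# Toy data generated from y = 1 + 2*x1 + 3*x2 (illustrative values only).
X_raw = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.2]])
y = 1.0 + 2.0 * X_raw[:, 0] + 3.0 * X_raw[:, 1]
X = np.column_stack([np.ones(len(X_raw)), X_raw])
theta = gradient_descent(X, y)
```

Updating `theta` as a whole vector is what makes the update simultaneous; updating one component at a time and reusing it within the same step would be a different algorithm.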

Gradient descent in Practice I: Feature Scaling

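The shape of the cost function's contour map depends on the scale of the features.

Idea: bring all features to a similar scale.

Range of $x_1$: 0 to 200
Range of $x_2$: 1 to 5
$x_1$, the feature with the larger scale, dominates.
With the same step size (learning rate), the parameter $\theta_2$ has to move much further.
The feature ranges matter because they change the path gradient descent takes to the minimum cost.

To normalize a feature to between 0 and 1, subtract its minimum and divide by the size of its range: $x_j := \frac{x_j - \min(x_j)}{\max(x_j) - \min(x_j)}$.

The appropriate scaling can be found experimentally and depends on the model and the data, but in most cases features can be scaled to roughly the range -1 to 1.
Common scaling methods include normalization and standardization.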
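The two scaling methods named above can be sketched as follows. The function names and the sample matrix are assumptions for illustration, and both functions assume that no column is constant (otherwise the denominator would be zero).

```python
import numpy as np

def min_max_normalize(X):
    """Rescale each column to [0, 1]: (x - min) / (max - min)."""
    x_min = X.min(axis=0)
    x_max = X.max(axis=0)
    return (X - x_min) / (x_max - x_min)

def standardize(X):
    """Rescale each column to zero mean and unit standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

# Illustrative data: x_1 spans 0..200, x_2 spans 1..5.
X = np.array([[0.0, 1.0], [50.0, 2.0], [120.0, 4.0], [200.0, 5.0]])
X_norm = min_max_normalize(X)
X_std = standardize(X)
```

In practice the minimum, maximum, mean, and standard deviation would be computed on the training set and reused unchanged for new inputs.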

Gradient descent in Practice II: Learning rate

$\theta_j := \theta_j - \alpha\frac{\partial}{\partial\theta_j}J(\theta)$

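Gradient descent
- Debugging: checking that gradient descent is working correctly
- Choosing an appropriate learning rate $\alpha$
- The loss that is finally reached should be the smallest achievable value.
- Convergence is typically declared when the loss decreases by less than 0.001 in a single iteration.
- If the learning rate $\alpha$ is too small, convergence is slow; if it is too large, the loss may not decrease on every iteration and may never converge.
- If gradient descent still misbehaves after the learning rate has been adjusted, the model itself should be revised.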
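The checks in the list above can be sketched by recording the loss at every iteration. This is a rough illustration under assumptions: the function name, the toy data, and the two learning rates are made up for the example, and `tol=1e-3` mirrors the 0.001 threshold mentioned above.

```python
import numpy as np

def run_gradient_descent(X, y, alpha, max_iters=10000, tol=1e-3):
    """Return (theta, loss history, converged flag) for a given alpha."""
    m = len(y)
    theta = np.zeros(X.shape[1])
    history = [float(np.sum((X @ theta - y) ** 2)) / (2 * m)]
    for _ in range(max_iters):
        theta = theta - alpha * X.T @ (X @ theta - y) / m
        history.append(float(np.sum((X @ theta - y) ** 2)) / (2 * m))
        decrease = history[-2] - history[-1]
        if decrease < 0:
            # The loss went up: alpha is probably too large.
            return theta, history, False
        if decrease < tol:
            # The loss barely changed: treat this as convergence.
            return theta, history, True
    return theta, history, False

# Toy data from y = 2 + 3x (illustrative values only).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
X = np.column_stack([np.ones_like(x), x])
y = 2.0 + 3.0 * x

_, history_ok, converged_ok = run_gradient_descent(X, y, alpha=0.1)
_, history_bad, converged_bad = run_gradient_descent(X, y, alpha=1.0)
```

Plotting `history_ok` and `history_bad` against the iteration number is a common way to see at a glance whether a given $\alpha$ works. A tolerance of 0.001 is fairly loose, so the returned `theta` may still be some distance from the exact minimizer.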

Features and Polynomial regression

Polynomial regression

\begin{aligned} h_\theta(x) &= \theta_0 + \theta_1x + \theta_2x^2 + ... + \theta_nx^n \\ &= \theta_0 + \theta_1x_1 + \theta_2x_2 + ... + \theta_nx_n \end{aligned}

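Adding second- and third-order terms of the same feature can give a better fit; each power is treated as its own feature, $x_j = x^j$.
The hypothesis is a nonlinear function of $x$ but a linear function of $\theta$.
Whenever the hypothesis can be written as $\theta^Tx$ as above, the problem can be solved with linear regression.
Linearity conditions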
- $f(ax)=af(x)$
- $f(x+y) = f(x) + f(y)$
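Because the model stays linear in $\theta$, polynomial regression reduces to ordinary least squares on expanded features. A minimal sketch, assuming noise-free toy data and a hypothetical helper `polynomial_features`:

```python
import numpy as np

def polynomial_features(x, degree):
    """Map a 1-D array x to columns [1, x, x^2, ..., x^degree]."""
    return np.vander(x, N=degree + 1, increasing=True)

# Illustrative data from y = 1 - 2x + 0.5x^2 (no noise).
x = np.linspace(-3.0, 3.0, 13)
y = 1.0 - 2.0 * x + 0.5 * x**2
X_poly = polynomial_features(x, degree=2)

# Least-squares fit: linear in theta even though it is nonlinear in x.
theta, *_ = np.linalg.lstsq(X_poly, y, rcond=None)
```

Since the powers of $x$ can differ greatly in magnitude, feature scaling tends to matter even more here if gradient descent is used instead of a direct solver.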

Normal equation

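A method for solving for $\theta$ analytically.

In a linear regression problem, if the loss function is MSE and convex, every local minimum is the global minimum, so the solution $\theta$ can be found in one step by differentiation.
For a linear hypothesis, an MSE cost function is always convex.

For general models, checking convexity is difficult.
There can also be a very large number of extrema, in which case solving for a single local minimum is not meaningful.
In those cases it is better to train with gradient descent.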

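1D case

J(\theta) = a\theta^2 + b\theta + c

Differentiate and find the $\theta$ that satisfies $\frac{d}{d\theta}J(\theta) = 0$. Here $2a\theta + b = 0$, so $\theta = -\frac{b}{2a}$ (a minimum when $a > 0$).

For higher-dimensional inputs

Take the partial derivative with respect to every $\theta_j$ and find the values that make all of them zero.
When $X^TX$ is invertible, where $X$ is the matrix whose rows are the training inputs $x^{(i)}$ and $y$ is the vector of targets:
normal equation: $\theta = (X^TX)^{-1}X^Ty$ (the proof is omitted because this is an introductory course)
This method can be computationally expensive, but it can be useful when the dataset is small.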
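Setting the gradient $\frac{1}{m}X^T(X\theta - y)$ to zero gives $X^TX\theta = X^Ty$, which is the system the normal equation solves. A small sketch under the assumption that $X^TX$ is invertible; the function name and toy data are illustrative.

```python
import numpy as np

def normal_equation(X, y):
    """Solve (X^T X) theta = X^T y; X includes the x_0 = 1 column."""
    # np.linalg.solve avoids forming the inverse explicitly but gives the
    # same theta as (X^T X)^{-1} X^T y when X^T X is invertible.
    return np.linalg.solve(X.T @ X, X.T @ y)

# Toy data from y = 4 + 1.5*x1 - 2*x2 (illustrative values only).
X_raw = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0]])
y = 4.0 + 1.5 * X_raw[:, 0] - 2.0 * X_raw[:, 1]
X = np.column_stack([np.ones(len(X_raw)), X_raw])
theta = normal_equation(X, y)
```

No learning rate and no iteration are involved, which is the trade-off against the cost of solving an $(n+1) \times (n+1)$ system.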

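With $m$ training examples and $n$ features

| Gradient Descent | Normal Equation |
| --- | --- |
| Works well even when $n$ is large. | No iterations, but slow when $n$ is large. |
| $\alpha$ must be chosen through debugging. | No $\alpha$ is needed. |
| Converges eventually after many iterations. | $(X^TX)^{-1}$ must be computed. |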

Normal equation and non-invertibility

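When $X^TX$ is not invertible (for example, when features are redundant or there are more features than training examples), compute the solution with the pseudoinverse (pinv) instead.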
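A brief sketch of the pseudoinverse approach using `np.linalg.pinv`. The data are made up so that one feature is an exact multiple of another, which makes $X^TX$ singular; the explicit `rcond` and `tol` values are assumptions chosen to make the rank cutoff unambiguous in this example.

```python
import numpy as np

# Illustrative design matrix with a redundant feature: x_2 = 2 * x_1.
x1 = np.array([1.0, 2.0, 3.0, 4.0])
X = np.column_stack([np.ones(4), x1, 2.0 * x1])
y = 1.0 + 3.0 * x1

# X^T X is 3x3 but only has rank 2, so its ordinary inverse does not exist.
rank = np.linalg.matrix_rank(X.T @ X, tol=1e-8)

# The pseudoinverse still returns a least-squares solution
# (the one with the smallest norm among all minimizers).
theta = np.linalg.pinv(X.T @ X, rcond=1e-10) @ X.T @ y
predictions = X @ theta
```

The individual values in `theta` are not unique in this situation, but the predictions `X @ theta` are, so the fit itself is still well defined.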
