Regression: Non-Linear Regression

챱챱챱스테이크 · July 8, 2022

regression

Non-Linear

The relationship between x and y cannot be described by a straight line; it is described by a curve.

Why use polynomial regression?

Example problem

When the relationship is non-linear, fitting a Linear Regression gives a large MAE.

When the regression is fitted with a curve instead, the MAE drops.
Result

For a non-linear relationship, the curve prediction graph is more accurate than the linear prediction graph.

Polynomial regression is used to produce a regression graph with this kind of curved prediction line.
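
The MAE comparison above can be tried out with a small sketch. The data here is synthetic and the seed, noise level, and variable names are my own assumptions for illustration, not the data from the original figures.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)  # fixed seed (assumption) so the run is repeatable

# Synthetic quadratic data (assumption, for illustration only)
x = rng.uniform(-3, 3, size=(200, 1))
y = 5 * x**2 + 0.9 * x + rng.normal(0, 1, size=(200, 1))

# Straight-line fit on the raw feature
linear = LinearRegression().fit(x, y)
mae_linear = mean_absolute_error(y, linear.predict(x))

# Curve fit: the same LinearRegression, but trained on [x, x^2]
x_poly = PolynomialFeatures(degree=2, include_bias=False).fit_transform(x)
curve = LinearRegression().fit(x_poly, y)
mae_curve = mean_absolute_error(y, curve.predict(x_poly))

print(f"MAE (line):  {mae_linear:.2f}")
print(f"MAE (curve): {mae_curve:.2f}")
```

On data like this the curve fit should report a much smaller MAE than the straight line, which is the point the example problem makes.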

About Polynomial Regression

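A polynomial regression model 'can be computed' as a multiple regression model.

Polynomial regression can be viewed as a special case of multiple regression.

Feature change
As in the figure, where the single independent variable X is expanded into x^0, x^1, x^2, one feature is expanded into several features.
- Because of this property, it can be seen as a special case of multiple regression.
  (several independent variables -> predict the dependent variable)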
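
To make the feature expansion concrete, here is a minimal sketch using `PolynomialFeatures` on a tiny hand-made array (the two sample values are arbitrary, chosen only so the powers are easy to check by eye).

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# One feature, two samples: x = 2 and x = 3
X = np.array([[2.0], [3.0]])

# degree=2 with include_bias=True produces the columns x^0, x^1, x^2
expanded = PolynomialFeatures(degree=2, include_bias=True).fit_transform(X)
print(expanded)
# [[1. 2. 4.]
#  [1. 3. 9.]]
```

Each row now has three columns, so a plain multiple regression on these columns is a quadratic fit in the original x.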
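Overfitting
As the degree increases, variance increases.
High variance shows up as overfitting. (Bias-Variance Trade-off)
- It is important to find the Goldilocks zone so that overfitting does not occur.
Goldilocks zone

This is the point where the total error is lowest.
Because of the bias-variance trade-off, variance and bias cannot both be minimized at once, so we have to find the model complexity at which the total error is lowest.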
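
One common way to look for the Goldilocks zone is to sweep the polynomial degree and compare training error with error on held-out data. The sketch below assumes synthetic quadratic data, a 50/50 split, and degrees 1 to 10; all of these choices are mine, for illustration only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)  # fixed seed (assumption)

# Small noisy quadratic data set (assumption, for illustration only)
x = rng.uniform(-3, 3, size=(60, 1))
y = 5 * x**2 + 0.9 * x + rng.normal(0, 3, size=(60, 1))

x_train, x_val, y_train, y_val = train_test_split(
    x, y, test_size=0.5, random_state=0
)

results = {}  # degree -> (train MSE, validation MSE)
for degree in range(1, 11):
    model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression())
    model.fit(x_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(x_train))
    val_mse = mean_squared_error(y_val, model.predict(x_val))
    results[degree] = (train_mse, val_mse)
    print(f"degree={degree:2d}  train MSE={train_mse:8.2f}  val MSE={val_mse:8.2f}")

# The degree with the lowest validation error is the candidate Goldilocks zone
best_degree = min(results, key=lambda d: results[d][1])
print("lowest validation error at degree", best_degree)
```

Training error keeps shrinking as the degree grows, while validation error typically stops improving and can rise again; the gap between the two is the overfitting described above.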
Example code

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# imported for feature expansion
from sklearn.preprocessing import PolynomialFeatures

x = 10 * np.random.rand(200, 1) - 3
y = 5 * pow(x, 2) + 0.9 * x + 10 * np.random.randn(200, 1)

plt.scatter(x, y)
plt.xlabel("X")
plt.ylabel("Y")
plt.show()

[Figure: scatter plot of the generated non-linear data]

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=2
    )

# feature expansion (degree 2 here, to match the quadratic data)
poly = PolynomialFeatures(degree=2, include_bias=True)

# fit the transformer on the training set only, then apply it to both sets
x_train_trans = poly.fit_transform(x_train)
x_test_trans = poly.transform(x_test)

Before and after the transformation: as in the figure example above, each x becomes x^0, x^1, x^2.

lr = LinearRegression()
lr.fit(x_train_trans, y_train)

print(lr.score(x_test_trans, y_test))

R² score (the value returned by lr.score)
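
For reference, `LinearRegression.score` returns the coefficient of determination R² = 1 - SS_res / SS_tot. The sketch below recomputes it by hand on a small synthetic data set (the data and names are assumptions for illustration) and compares it with `score`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)  # fixed seed (assumption)

# Small synthetic linear data set (assumption, for illustration only)
x = rng.uniform(-3, 3, size=(50, 1))
y = 2 * x + rng.normal(0, 1, size=(50, 1))

model = LinearRegression().fit(x, y)
pred = model.predict(x)

ss_res = np.sum((y - pred) ** 2)      # residual sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)  # total sum of squares
r2_manual = 1 - ss_res / ss_tot

print(r2_manual, model.score(x, y))  # the two values should agree
```

A score close to 1 means the model explains most of the variation in y; a score near 0 means it does little better than predicting the mean.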

# cover the full range of the generated x, not only [-3, 3]
X_new = np.linspace(x.min(), x.max(), 200).reshape(200, 1)
X_new_poly = poly.transform(X_new)

y_new = lr.predict(X_new_poly)

plt.plot(X_new, y_new, "r-", linewidth=2, label="Predictions")
plt.plot(x_train, y_train, "b.",label='Training points')
plt.plot(x_test, y_test, "g.",label='Testing points')

plt.xlabel("X")
plt.ylabel("y")
plt.legend()
plt.show()

Result visualization

Example source
Polynomial regression with several independent variables is also possible.
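
As a rough illustration of the multi-variable case, `PolynomialFeatures` also accepts more than one input column and then adds the cross term as well. The two feature values below (a = 2, b = 3) are arbitrary, picked only to make the output easy to verify.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# One sample with two features: a = 2, b = 3
X = np.array([[2.0, 3.0]])

# degree=2 produces the columns 1, a, b, a^2, a*b, b^2
expanded = PolynomialFeatures(degree=2, include_bias=True).fit_transform(X)
print(expanded)
# [[1. 2. 3. 4. 6. 9.]]
```

The number of generated columns grows quickly with the number of features and the degree, so the overfitting concern discussed earlier applies even more strongly here.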
