[Regressor] Polynomial Regression

안암동컴맹 · April 7, 2024

Polynomial Regression

Introduction

Polynomial regression is a statistical technique that extends linear regression by modeling the relationship between the independent variable $x$ and the dependent variable $y$ as an $n$-th degree polynomial. It models nonlinear relationships with a linear model by introducing polynomial features, thereby accommodating a broader range of data structures.

Background and Theory

Polynomial Features and Vandermonde Matrix

Polynomial regression can be implemented by transforming the original input features into polynomial features. This process involves generating every combination of features raised to every power up to the $n$-th degree. For a single independent variable $x$, the polynomial features would be $x, x^2, x^3, \ldots, x^n$. This transformation of the input variable into a set of polynomial features is represented by the Vandermonde matrix.

A Vandermonde matrix for a single independent variable $x$ with $m$ observations and a polynomial degree of $n$ is structured as follows:

$$
\mathbf{V} = \begin{bmatrix}
1 & x_1 & x_1^2 & \cdots & x_1^n \\
1 & x_2 & x_2^2 & \cdots & x_2^n \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & x_m & x_m^2 & \cdots & x_m^n
\end{bmatrix}
$$

This matrix is used to transform the one-dimensional input $x$ into a multi-dimensional feature space, where linear regression techniques can then be applied.
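
For a single feature, this matrix can be built directly with NumPy. The snippet below is a minimal sketch (independent of the luma library used later) based on np.vander, whose increasing=True option matches the column order shown above.

import numpy as np

# Three observations of a single feature x, expanded to degree n = 3.
x = np.array([0.5, 1.0, 2.0])
n = 3

# increasing=True orders the columns as [1, x, x^2, ..., x^n],
# matching the Vandermonde matrix V above.
V = np.vander(x, N=n + 1, increasing=True)
# Each row is [1, x_i, x_i^2, x_i^3]:
#   [1.0, 0.5, 0.25, 0.125]
#   [1.0, 1.0, 1.00, 1.000]
#   [1.0, 2.0, 4.00, 8.000]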

Polynomial Regression Model

Given the Vandermonde matrix $\mathbf{V}$ and a response vector $\mathbf{y}$ with $m$ observations, the polynomial regression model can be expressed as:

$$\mathbf{y} = \mathbf{V}\boldsymbol{\beta} + \boldsymbol{\epsilon}$$

where:

  • $\mathbf{y}$ is the vector of observed values of the dependent variable,
  • $\mathbf{V}$ is the Vandermonde matrix of transformed polynomial features,
  • $\boldsymbol{\beta}$ is the vector of coefficients $[\beta_0, \beta_1, \cdots, \beta_n]^T$,
  • $\boldsymbol{\epsilon}$ represents the vector of errors or residuals.
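
As a small, hypothetical illustration of this model, the sketch below simulates data from a known coefficient vector $\boldsymbol{\beta}$ plus Gaussian noise; the numbers are arbitrary and unrelated to the example later in the post.

import numpy as np

rng = np.random.default_rng(0)

# True coefficients [beta_0, beta_1, beta_2] of a degree-2 polynomial.
beta_true = np.array([1.0, -2.0, 0.5])

x = np.linspace(-3, 3, 50)
V = np.vander(x, N=beta_true.size, increasing=True)   # columns: 1, x, x^2

# y = V beta + epsilon, with epsilon drawn as i.i.d. Gaussian noise.
epsilon = rng.normal(scale=0.3, size=x.size)
y = V @ beta_true + epsilon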

Derivation of the Normal Equation

The normal equation is derived from the principle of least squares, which aims to minimize the sum of squared residuals ($RSS$) between the observed values $y$ and the values predicted by the model $\hat{y}$.

Given the model equation in matrix form as $\mathbf{y} = \mathbf{V}\boldsymbol{\beta} + \boldsymbol{\epsilon}$, where $\mathbf{V}$ is the Vandermonde matrix of polynomial features and $\mathbf{y}$ is the vector of observed values, the $RSS$ is defined as:

$$RSS = (\mathbf{y} - \mathbf{V}\boldsymbol{\beta})^T(\mathbf{y} - \mathbf{V}\boldsymbol{\beta})$$

To find the minimum of $RSS$, we take its derivative with respect to $\boldsymbol{\beta}$ and set it to zero. This process involves the following steps:

  1. Expand the RSS equation:

     $$RSS = \mathbf{y}^T\mathbf{y} - \mathbf{y}^T\mathbf{V}\boldsymbol{\beta} - \boldsymbol{\beta}^T\mathbf{V}^T\mathbf{y} + \boldsymbol{\beta}^T\mathbf{V}^T\mathbf{V}\boldsymbol{\beta}$$
  2. Take the derivative of $RSS$ with respect to $\boldsymbol{\beta}$; since $\mathbf{y}^T\mathbf{V}\boldsymbol{\beta} = \boldsymbol{\beta}^T\mathbf{V}^T\mathbf{y}$ (both are scalars), the two middle terms combine:

     $$\frac{\partial RSS}{\partial \boldsymbol{\beta}} = -2\mathbf{V}^T\mathbf{y} + 2\mathbf{V}^T\mathbf{V}\boldsymbol{\beta}$$
  3. Set the derivative to zero and solve for $\boldsymbol{\beta}$:

     $$\mathbf{V}^T\mathbf{V}\boldsymbol{\beta} = \mathbf{V}^T\mathbf{y}$$
     $$\boldsymbol{\beta} = (\mathbf{V}^T\mathbf{V})^{-1}\mathbf{V}^T\mathbf{y}$$

This final equation, $\boldsymbol{\beta} = (\mathbf{V}^T\mathbf{V})^{-1}\mathbf{V}^T\mathbf{y}$, is known as the normal equation. It provides a direct method to compute the coefficients $\boldsymbol{\beta}$ that minimize the $RSS$, and thus the error between the predicted and observed values.
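
To sanity-check the derivation numerically, the sketch below (on synthetic data, unrelated to the example later in the post) fits $\boldsymbol{\beta}$ with a standard least-squares solver and confirms that it satisfies the normal equation, i.e. that the gradient of the $RSS$ vanishes at the minimizer.

import numpy as np

rng = np.random.default_rng(42)
x = np.linspace(-2, 2, 100)
V = np.vander(x, N=4, increasing=True)                 # degree-3 features
y = V @ np.array([0.5, -1.0, 2.0, 0.3]) + rng.normal(scale=0.1, size=x.size)

# Minimize the RSS with a numerically stable least-squares solver.
beta, *_ = np.linalg.lstsq(V, y, rcond=None)

# At the minimizer, V^T V beta = V^T y, so the gradient
# -2 V^T y + 2 V^T V beta is (numerically) zero.
print(np.allclose(V.T @ V @ beta, V.T @ y))            # True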

Detailed Computation Steps

Normal Equation Computation

  1. Generate the Vandermonde matrix $\mathbf{V}$ for your data.
  2. Compute $\mathbf{V}^T\mathbf{V}$: Multiply the transpose of $\mathbf{V}$ by $\mathbf{V}$.
  3. Compute $(\mathbf{V}^T\mathbf{V})^{-1}$: Find the inverse of the matrix obtained in step 2.
  4. Compute $\mathbf{V}^T\mathbf{y}$: Multiply the transpose of $\mathbf{V}$ by the response vector $\mathbf{y}$.
  5. Calculate $\boldsymbol{\beta}$: Multiply the matrix from step 3 by the vector obtained in step 4 (a literal NumPy translation of these steps follows below).
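
Translated literally into NumPy, the five steps might look like the sketch below (synthetic data; the explicit inverse in step 3 is kept only to mirror the recipe, since np.linalg.solve or np.linalg.lstsq is preferable in practice).

import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 30)
y = 1.0 + 2.0 * x - 3.0 * x**2 + rng.normal(scale=0.05, size=x.size)

# Step 1: Vandermonde matrix for a degree-2 fit (columns: 1, x, x^2).
V = np.vander(x, N=3, increasing=True)

# Step 2: V^T V
VtV = V.T @ V

# Step 3: (V^T V)^{-1} -- explicit inverse, for illustration only.
VtV_inv = np.linalg.inv(VtV)

# Step 4: V^T y
Vty = V.T @ y

# Step 5: beta = (V^T V)^{-1} V^T y
beta = VtV_inv @ Vty
print(beta)   # roughly [1.0, 2.0, -3.0]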

Polynomial Feature Generation

To generate polynomial features and the Vandermonde matrix in practice, one often uses computational tools or libraries that automate this process, especially for datasets with multiple features and higher-degree polynomials.
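
For instance, scikit-learn (a separate library, not part of luma) provides PolynomialFeatures, which also handles the multi-feature case with cross terms; a minimal sketch:

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Two observations with two features each.
X = np.array([[1.0, 2.0],
              [3.0, 4.0]])

# Degree-2 expansion yields the columns [1, a, b, a^2, a*b, b^2].
poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)
# X_poly:
#   [ 1.  1.  2.  1.  2.  4.]
#   [ 1.  3.  4.  9. 12. 16.]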

Implementation

Parameters

  • deg: int, default = 2
    Degree of a polynomial function
  • alpha: float, default = 1.0
    Regularization strength
  • l1_ratio: float, default = 0.5
    Balancing parameter between L1 and L2 regularization
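
Before the full search-based example below, a direct fit using only these parameters might look like the following minimal sketch, which assumes the same fit/predict interface used in that example:

import numpy as np
from luma.regressor.poly import PolynomialRegressor

X = np.linspace(0.1, 3, 100).reshape(-1, 1)
y = np.sin(X).flatten() + 0.1 * np.random.randn(100)

# Degree-3 fit with the regularization strength and L1/L2 balance set explicitly.
reg = PolynomialRegressor(deg=3, alpha=0.1, l1_ratio=0.5)
reg.fit(X, y)
y_pred = reg.predict(X)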

Examples

from luma.regressor.poly import PolynomialRegressor
from luma.model_selection.search import RandomizedSearchCV
from luma.metric.regression import RSquaredScore
from luma.visual.evaluation import ResidualPlot

import matplotlib.pyplot as plt
import numpy as np

# Synthetic 1-D data: a nonlinear target with additive Gaussian noise
X = np.linspace(0.1, 3, 200).reshape(-1, 1)
y = (np.cos(X**2) * np.log(X)).flatten() + 0.2 * np.random.randn(200)

# Search space for the randomized hyperparameter search
param_dist = {
    "deg": range(2, 10),
    "alpha": np.logspace(-3, 3, 5),
    "l1_ratio": np.linspace(0, 1, 5),
    "regularization": ["l1", "l2", "elastic-net"],
}

rand = RandomizedSearchCV(
    estimator=PolynomialRegressor(),
    param_dist=param_dist,
    max_iter=100,
    cv=5,
    metric=RSquaredScore,
    maximize=True,
    refit=True,
    shuffle=True,
    random_state=42,
)
rand.fit(X, y)
print(rand.best_params, rand.best_score)
reg = rand.best_model

# Assemble a printable form of the fitted polynomial, highest degree first
est_func = ""
for i, coef in enumerate(reg.coef_):
    est_func = f"+({coef:.2f})x^{i}" + est_func
print(est_func)

fig = plt.figure(figsize=(10, 5))
ax1 = fig.add_subplot(1, 2, 1)
ax2 = fig.add_subplot(1, 2, 2)

ax1.scatter(X, y, s=10, c="black", alpha=0.4)
ax1.plot(X, reg.predict(X), lw=2, c="b", alpha=0.7, label="Predicted Plot")
ax1.fill_between(X.flatten(), y, reg.predict(X), color="b", alpha=0.1)
ax1.set_xlabel("x")
ax1.set_ylabel("y")
ax1.set_title(
    f"{type(reg).__name__} Result ["
    + r"$R^2$"
    + f": {reg.score(X, y, metric=RSquaredScore):.4f}]"
)
ax1.legend()
ax1.grid(alpha=0.2)

res = ResidualPlot(reg, X, y)
res.plot(ax=ax2, show=True)

  • Predicted plot:
    $y = 1.48x^7 - 17.86x^6 + 88.24x^5 - 231.27x^4 + 347.48x^3 - 300.51x^2 + 139.62x - 27.29$

Applications

Polynomial regression is widely used in fields such as:

  • Economics: For modeling economic growth patterns or the impact of policy changes.
  • Engineering: In signal processing and the analysis of stress-strain curves.
  • Environmental Science: Modeling the relationship between environmental factors and biological responses.

Strengths and Limitations

Strengths

  • Flexibility: Can model complex, nonlinear relationships between dependent and independent variables.
  • Simplicity: Despite being able to fit nonlinear patterns, it remains a linear model, making it relatively straightforward to analyze and interpret.

Limitations

  • Overfitting: Higher-degree polynomials can lead to overfitting the training data, capturing noise rather than the underlying trend.
  • Extrapolation: Polynomial regression models can exhibit extreme behavior outside the range of the training data, making predictions less reliable.
  • Computation: The inversion of $\mathbf{V}^T\mathbf{V}$ can be computationally expensive and numerically unstable for high-degree polynomials or large datasets, as the short conditioning check below illustrates.
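
A quick way to see the numerical issue is to look at the condition number of $\mathbf{V}^T\mathbf{V}$ as the degree grows (a small NumPy sketch, independent of the example above):

import numpy as np

x = np.linspace(0, 1, 100)
for deg in (3, 6, 9, 12):
    V = np.vander(x, N=deg + 1, increasing=True)
    # The condition number of V^T V grows by many orders of magnitude
    # with the degree, so explicit inversion quickly becomes unreliable.
    print(deg, np.linalg.cond(V.T @ V))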