[Regressor] Inverse Gaussian Regression

안암동컴맹 · April 6, 2024

Inverse Gaussian Regression

Introduction

Inverse Gaussian Regression is a specialized regression model used for positive continuous outcomes, particularly when the data exhibit a long tail to the right, such as in the case of time until an event or repair times. It assumes that the response variable follows an Inverse Gaussian distribution, making it a powerful tool for analyzing duration and rate data.

Background and Theory

Inverse Gaussian Distribution

The Inverse Gaussian distribution, also known as the Wald distribution, is defined for positive continuous variables and is characterized by its mean ($\mu$) and shape ($\lambda$) parameters. Its probability density function (PDF) is:

$$
f(y; \mu, \lambda) = \left( \frac{\lambda}{2\pi y^3} \right)^{1/2} \exp\left( -\frac{\lambda (y - \mu)^2}{2\mu^2 y} \right)
$$

where $y > 0$, $\mu > 0$ is the mean, and $\lambda > 0$ is the shape parameter.
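
As a quick check of the density, a minimal NumPy sketch is shown below. It assumes SciPy's `invgauss` parameterization, in which the shape argument `mu` equals the mean divided by $\lambda$ and `scale` equals $\lambda$.

```python
import numpy as np
from scipy.stats import invgauss

def inv_gaussian_pdf(y, mu, lam):
    """PDF of the Inverse Gaussian distribution with mean mu and shape lam."""
    return np.sqrt(lam / (2 * np.pi * y**3)) * np.exp(-lam * (y - mu) ** 2 / (2 * mu**2 * y))

y = np.linspace(0.1, 5.0, 50)
mu, lam = 1.5, 2.0

direct = inv_gaussian_pdf(y, mu, lam)
via_scipy = invgauss.pdf(y, mu / lam, scale=lam)  # scipy's shape argument is mean / shape

print(np.allclose(direct, via_scipy))  # expected: True
```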

Inverse Gaussian Regression Model

In this regression model, the mean of the dependent variable $Y$ is related to the predictors $X$ through a link function. A commonly used choice in Inverse Gaussian regression is the reciprocal link, $g(\mu) = 1/\mu$ (the canonical link is actually the inverse-squared link, $g(\mu) = 1/\mu^2$):

$$
\frac{1}{\mu} = \beta_0 + \beta_1 X_1 + \ldots + \beta_n X_n
$$
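
Concretely, the reciprocal link maps the linear predictor back to the mean as $\mu = 1/(X\beta)$. The small sketch below (my own helper, with an arbitrary clipping safeguard) also illustrates that the linear predictor must stay positive for $\mu$ to be a valid mean.

```python
import numpy as np

def mean_from_linear_predictor(X, beta, eps=1e-8):
    """Reciprocal link: eta = X @ beta, mu = 1 / eta; eta is clipped to stay positive."""
    eta = X @ beta
    return 1.0 / np.clip(eta, eps, None)

X = np.column_stack([np.ones(3), [0.5, 1.0, 2.0]])  # intercept plus one feature
beta = np.array([0.2, 0.3])
print(mean_from_linear_predictor(X, beta))           # three positive fitted means
```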

Optimization Process

Maximum Likelihood Estimation (MLE)

MLE is employed to estimate the model parameters by maximizing the likelihood of observing the given sample data. The likelihood function for the Inverse Gaussian distribution, given observations $(y_i, x_i)$, is:

$$
L(\beta, \lambda; y, X) = \prod_{i=1}^{N} \left( \frac{\lambda}{2\pi y_i^3} \right)^{1/2} \exp\left( -\frac{\lambda (y_i - \mu_i)^2}{2\mu_i^2 y_i} \right)
$$

Log-Likelihood Function

The log-likelihood function simplifies the optimization process:

$$
\log L(\beta, \lambda; y, X) = \frac{N}{2} \log(\lambda) - \frac{N}{2} \log(2\pi) - \frac{3}{2} \sum_{i=1}^{N} \log(y_i) - \sum_{i=1}^{N} \frac{\lambda (y_i - \mu_i)^2}{2\mu_i^2 y_i}
$$
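
The expression above translates almost line by line into code. The sketch below assumes the reciprocal link for $\mu_i$ and uses my own function name, so it should be read as an illustration rather than a reference implementation.

```python
import numpy as np

def inv_gaussian_log_likelihood(beta, lam, X, y):
    """Log-likelihood of Inverse Gaussian regression under the reciprocal link."""
    mu = 1.0 / (X @ beta)                 # mu_i from the linear predictor
    n = y.shape[0]
    return (n / 2 * np.log(lam)
            - n / 2 * np.log(2 * np.pi)
            - 1.5 * np.sum(np.log(y))
            - np.sum(lam * (y - mu) ** 2 / (2 * mu**2 * y)))
```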

Gradient Calculations

The optimization of the log-likelihood with respect to the parameters $\beta$ and $\lambda$ requires the calculation of gradients. Let us denote $\theta = (\beta, \lambda)$ as the parameter vector. The gradient of the log-likelihood function $\nabla_\theta \log L(\theta; y, X)$ involves partial derivatives with respect to each $\beta_j$ and $\lambda$.

For $\beta_j$, differentiating with respect to $\mu_i$ and applying the chain rule through the reciprocal link ($\partial \mu_i / \partial \beta_j = -\mu_i^2 x_{ij}$):

$$
\frac{\partial \log L}{\partial \beta_j} = \sum_{i=1}^{N} \frac{\lambda (y_i - \mu_i)}{\mu_i^3} \left( -\mu_i^2 x_{ij} \right) = -\sum_{i=1}^{N} \frac{\lambda (y_i - \mu_i)}{\mu_i}\, x_{ij}
$$

For $\lambda$:

$$
\frac{\partial \log L}{\partial \lambda} = \frac{N}{2\lambda} - \sum_{i=1}^{N} \frac{(y_i - \mu_i)^2}{2\mu_i^2 y_i}
$$
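
Both gradients can be vectorized directly from these expressions. The sketch below keeps the same reciprocal-link assumption and naming conventions as the previous snippet.

```python
import numpy as np

def inv_gaussian_gradients(beta, lam, X, y):
    """Gradients of the log-likelihood w.r.t. beta (through the reciprocal link) and lambda."""
    mu = 1.0 / (X @ beta)
    grad_beta = -X.T @ (lam * (y - mu) / mu)                       # one entry per beta_j
    grad_lam = y.shape[0] / (2 * lam) - np.sum((y - mu) ** 2 / (2 * mu**2 * y))
    return grad_beta, grad_lam
```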

Optimization via Gradient Ascent

Numerical optimization techniques, such as gradient ascent or Newton-Raphson, are used to iteratively adjust $\beta$ and $\lambda$ by moving in the direction of the gradient of the log-likelihood until convergence is achieved.
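
As a rough illustration of the full loop, the self-contained sketch below runs plain gradient ascent on simulated data. The initialization, step size, iteration count, and the clip that keeps $\lambda$ positive are arbitrary choices of mine, and no safeguard is included against the linear predictor turning nonpositive.

```python
import numpy as np

def fit_inverse_gaussian(X, y, lr=1e-3, max_iter=2000):
    """Estimate beta and lambda by gradient ascent on the log-likelihood (reciprocal link)."""
    n, p = X.shape
    beta = np.zeros(p)
    beta[0] = 1.0 / y.mean()                    # assumes the first column of X is an intercept
    lam = 1.0
    for _ in range(max_iter):
        mu = 1.0 / (X @ beta)
        grad_beta = -X.T @ (lam * (y - mu) / mu)
        grad_lam = n / (2 * lam) - np.sum((y - mu) ** 2 / (2 * mu**2 * y))
        beta += lr * grad_beta                  # move in the direction of the gradient
        lam = max(lam + lr * grad_lam, 1e-6)    # keep the shape parameter positive
    return beta, lam

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.uniform(0.5, 2.0, 200)])
mu_true = 1.0 / (X @ np.array([0.4, 0.3]))
y = rng.wald(mu_true, 2.0)                      # NumPy's Wald sampler: (mean, shape)
print(fit_inverse_gaussian(X, y))               # rough MLE estimates of beta and lambda
```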

Implementation

Parameters

  • learning_rate: float, default = 0.01
    Step size of the gradient descent update
  • max_iter: int, default = 100
    Number of iterations
  • l1_ratio: float, default = 0.5
    Balancing parameter of L1 and L2 in elastic-net regularization
  • alpha: float, default = 0.01
    Regularization strength
  • phi: float, default = 1.0
    Shape parameter of inverse Gaussian density
  • regularization: Literal['l1', 'l2', 'elastic-net'], default = None
    Regularization type
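
The listed parameters suggest a scikit-learn-style estimator. The skeleton below is a hypothetical sketch of how they could be wired into the gradient-based update; the class name, the treatment of `phi` as a fixed dispersion constant, and the `fit`/`predict` interface are my assumptions, not the post's actual implementation.

```python
import numpy as np

class InverseGaussianRegressor:
    """Hypothetical sketch of an estimator exposing the parameters listed above."""

    def __init__(self, learning_rate=0.01, max_iter=100, l1_ratio=0.5,
                 alpha=0.01, phi=1.0, regularization=None):
        self.learning_rate = learning_rate
        self.max_iter = max_iter
        self.l1_ratio = l1_ratio
        self.alpha = alpha
        self.phi = phi
        self.regularization = regularization

    def _penalty_grad(self, beta):
        # Gradient of the penalty term; L1 uses the subgradient sign(beta).
        if self.regularization == 'l1':
            return self.alpha * np.sign(beta)
        if self.regularization == 'l2':
            return self.alpha * beta
        if self.regularization == 'elastic-net':
            return self.alpha * (self.l1_ratio * np.sign(beta)
                                 + (1.0 - self.l1_ratio) * beta)
        return np.zeros_like(beta)

    def fit(self, X, y):
        n, p = X.shape
        self.beta_ = np.zeros(p)
        self.beta_[0] = 1.0 / y.mean()          # assumes an intercept column in X
        for _ in range(self.max_iter):
            mu = 1.0 / (X @ self.beta_)
            # Average the log-likelihood gradient over samples and subtract the penalty gradient.
            grad = -X.T @ (self.phi * (y - mu) / mu) / n - self._penalty_grad(self.beta_)
            self.beta_ += self.learning_rate * grad
        return self

    def predict(self, X):
        return 1.0 / (X @ self.beta_)
```

With `regularization='elastic-net'` the penalty is balanced by `l1_ratio`, while `None` leaves the update as pure maximum likelihood.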

Applications

Inverse Gaussian Regression is suited for:

  • Time-to-event data analysis: such as survival times and failure times.
  • Rate data analysis: including the speed of progression of diseases or other processes.

Strengths and Limitations

Strengths

  • Flexibility for Skewed Data: Capable of modeling right-skewed data effectively.
  • Mean-Dispersion Relationship: Incorporates variability through the shape parameter, offering a nuanced understanding of data dispersion.

Limitations

  • Computational Complexity: Gradient calculations and optimization can be computationally intensive.
  • Data Requirements: Requires positive continuous data, limiting its applicability.

Advanced Considerations

  • Convergence Issues: Special care is needed to ensure convergence in the optimization process, especially for data that are highly skewed or have outliers.
  • Software Implementations: Several statistical software packages offer built-in functions for Inverse Gaussian regression, abstracting away the complexities of MLE and gradient calculations.
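
For instance, statsmodels exposes this model through its GLM interface with the `InverseGaussian` family. The short example below uses the family's default inverse-squared (canonical) link and simulated data purely for illustration.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = sm.add_constant(rng.uniform(0.5, 2.0, size=(200, 1)))   # intercept + one feature
mu = 1.0 / np.sqrt(X @ np.array([0.5, 0.4]))                 # canonical link: 1/mu^2 = X @ beta
y = rng.wald(mu, 2.0)                                        # simulated Inverse Gaussian responses

model = sm.GLM(y, X, family=sm.families.InverseGaussian())   # default link is 1/mu^2
result = model.fit()
print(result.summary())
```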

Conclusion

Inverse Gaussian Regression provides a robust framework for analyzing positive continuous data, particularly where the distribution is skewed. Through the use of MLE and careful gradient calculations, it offers detailed insights into the underlying processes governing the data, making it a valuable tool in statistical modeling and data analysis.
