[Regressor] Inverse Gaussian Regression

안암동컴맹 · April 6, 2024

Inverse Gaussian Regression

Introduction

Inverse Gaussian Regression is a specialized regression model used for positive continuous outcomes, particularly when the data exhibit a long tail to the right, such as in the case of time until an event or repair times. It assumes that the response variable follows an Inverse Gaussian distribution, making it a powerful tool for analyzing duration and rate data.

Background and Theory

Inverse Gaussian Distribution

The Inverse Gaussian distribution, also known as the Wald distribution, is defined for positive continuous variables and is characterized by its mean ($\mu$) and shape ($\lambda$) parameters. Its probability density function (PDF) is:

$$
f(y; \mu, \lambda) = \left( \frac{\lambda}{2\pi y^3} \right)^{1/2} \exp\left( -\frac{\lambda (y - \mu)^2}{2\mu^2 y} \right)
$$

where $y > 0$, $\mu > 0$ is the mean, and $\lambda > 0$ is the shape parameter.
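
As a quick check of the density, a minimal NumPy sketch is shown below. It assumes SciPy's `invgauss` parameterization, in which the shape argument `mu` equals the mean divided by $\lambda$ and `scale` equals $\lambda$.

```python
import numpy as np
from scipy.stats import invgauss

def inv_gaussian_pdf(y, mu, lam):
    """PDF of the Inverse Gaussian distribution with mean mu and shape lam."""
    return np.sqrt(lam / (2 * np.pi * y**3)) * np.exp(-lam * (y - mu) ** 2 / (2 * mu**2 * y))

y = np.linspace(0.1, 5.0, 50)
mu, lam = 1.5, 2.0

direct = inv_gaussian_pdf(y, mu, lam)
via_scipy = invgauss.pdf(y, mu / lam, scale=lam)  # scipy's shape argument is mean / shape

print(np.allclose(direct, via_scipy))  # expected: True
```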

Inverse Gaussian Regression Model

In this regression model, the mean of the dependent variable $Y$ is related to the predictors $X$ through a link function. A commonly used choice in Inverse Gaussian regression is the reciprocal link, $g(\mu) = 1/\mu$ (the canonical link is actually the inverse-squared link, $g(\mu) = 1/\mu^2$):

$$
\frac{1}{\mu} = \beta_0 + \beta_1 X_1 + \ldots + \beta_n X_n
$$
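
Concretely, the reciprocal link maps the linear predictor back to the mean as $\mu = 1/(X\beta)$. The small sketch below (my own helper, with an arbitrary clipping safeguard) also illustrates that the linear predictor must stay positive for $\mu$ to be a valid mean.

```python
import numpy as np

def mean_from_linear_predictor(X, beta, eps=1e-8):
    """Reciprocal link: eta = X @ beta, mu = 1 / eta; eta is clipped to stay positive."""
    eta = X @ beta
    return 1.0 / np.clip(eta, eps, None)

X = np.column_stack([np.ones(3), [0.5, 1.0, 2.0]])  # intercept plus one feature
beta = np.array([0.2, 0.3])
print(mean_from_linear_predictor(X, beta))           # three positive fitted means
```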

Optimization Process

Maximum Likelihood Estimation (MLE)

MLE is employed to estimate the model parameters by maximizing the likelihood of observing the given sample data. The likelihood function for the Inverse Gaussian distribution, given observations $(y_i, x_i)$, is:

$$
L(\beta, \lambda; y, X) = \prod_{i=1}^{N} \left( \frac{\lambda}{2\pi y_i^3} \right)^{1/2} \exp\left( -\frac{\lambda (y_i - \mu_i)^2}{2\mu_i^2 y_i} \right)
$$

Log-Likelihood Function

The log-likelihood function simplifies the optimization process:

$$
\log L(\beta, \lambda; y, X) = \frac{N}{2} \log(\lambda) - \frac{N}{2} \log(2\pi) - \frac{3}{2} \sum_{i=1}^{N} \log(y_i) - \sum_{i=1}^{N} \frac{\lambda (y_i - \mu_i)^2}{2\mu_i^2 y_i}
$$
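
The expression above translates almost line by line into code. The sketch below assumes the reciprocal link for $\mu_i$ and uses my own function name, so it should be read as an illustration rather than a reference implementation.

```python
import numpy as np

def inv_gaussian_log_likelihood(beta, lam, X, y):
    """Log-likelihood of Inverse Gaussian regression under the reciprocal link."""
    mu = 1.0 / (X @ beta)                 # mu_i from the linear predictor
    n = y.shape[0]
    return (n / 2 * np.log(lam)
            - n / 2 * np.log(2 * np.pi)
            - 1.5 * np.sum(np.log(y))
            - np.sum(lam * (y - mu) ** 2 / (2 * mu**2 * y)))
```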

Gradient Calculations

The optimization of the log-likelihood with respect to the parameters $\beta$ and $\lambda$ requires the calculation of gradients. Let us denote $\theta = (\beta, \lambda)$ as the parameter vector. The gradient of the log-likelihood function $\nabla_\theta \log L(\theta; y, X)$ involves partial derivatives with respect to each $\beta_j$ and $\lambda$.

For $\beta_j$, differentiating with respect to $\mu_i$ and applying the chain rule through the reciprocal link ($\partial \mu_i / \partial \beta_j = -\mu_i^2 x_{ij}$):

$$
\frac{\partial \log L}{\partial \beta_j} = \sum_{i=1}^{N} \frac{\lambda (y_i - \mu_i)}{\mu_i^3} \left( -\mu_i^2 x_{ij} \right) = -\sum_{i=1}^{N} \frac{\lambda (y_i - \mu_i)}{\mu_i}\, x_{ij}
$$

For $\lambda$:

$$
\frac{\partial \log L}{\partial \lambda} = \frac{N}{2\lambda} - \sum_{i=1}^{N} \frac{(y_i - \mu_i)^2}{2\mu_i^2 y_i}
$$
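
Both gradients can be vectorized directly from these expressions. The sketch below keeps the same reciprocal-link assumption and naming conventions as the previous snippet.

```python
import numpy as np

def inv_gaussian_gradients(beta, lam, X, y):
    """Gradients of the log-likelihood w.r.t. beta (through the reciprocal link) and lambda."""
    mu = 1.0 / (X @ beta)
    grad_beta = -X.T @ (lam * (y - mu) / mu)                       # one entry per beta_j
    grad_lam = y.shape[0] / (2 * lam) - np.sum((y - mu) ** 2 / (2 * mu**2 * y))
    return grad_beta, grad_lam
```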

Optimization via Gradient Ascent

Numerical optimization techniques, such as gradient ascent or Newton-Raphson, are used to iteratively adjust $\beta$ and $\lambda$ by moving in the direction of the gradient of the log-likelihood until convergence is achieved.
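
As a rough illustration of the full loop, the self-contained sketch below runs plain gradient ascent on simulated data. The initialization, step size, iteration count, and the clip that keeps $\lambda$ positive are arbitrary choices of mine, and no safeguard is included against the linear predictor turning nonpositive.

```python
import numpy as np

def fit_inverse_gaussian(X, y, lr=1e-3, max_iter=2000):
    """Estimate beta and lambda by gradient ascent on the log-likelihood (reciprocal link)."""
    n, p = X.shape
    beta = np.zeros(p)
    beta[0] = 1.0 / y.mean()                    # assumes the first column of X is an intercept
    lam = 1.0
    for _ in range(max_iter):
        mu = 1.0 / (X @ beta)
        grad_beta = -X.T @ (lam * (y - mu) / mu)
        grad_lam = n / (2 * lam) - np.sum((y - mu) ** 2 / (2 * mu**2 * y))
        beta += lr * grad_beta                  # move in the direction of the gradient
        lam = max(lam + lr * grad_lam, 1e-6)    # keep the shape parameter positive
    return beta, lam

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.uniform(0.5, 2.0, 200)])
mu_true = 1.0 / (X @ np.array([0.4, 0.3]))
y = rng.wald(mu_true, 2.0)                      # NumPy's Wald sampler: (mean, shape)
print(fit_inverse_gaussian(X, y))               # rough MLE estimates of beta and lambda
```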

Implementation

Parameters

  • learning_rate: float, default = 0.01
    Step size of the gradient descent update
  • max_iter: int, default = 100
    Number of iterations
  • l1_ratio: float, default = 0.5
    Balancing parameter of L1 and L2 in elastic-net regularization
  • alpha: float, default = 0.01
    Regularization strength
  • phi: float, default = 1.0
    Shape parameter of inverse Gaussian density
  • regularization: Literal['l1', 'l2', 'elastic-net'], default = None
    Regularization type
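
The listed parameters suggest a scikit-learn-style estimator. The skeleton below is a hypothetical sketch of how they could be wired into the gradient-based update; the class name, the treatment of `phi` as a fixed dispersion constant, and the `fit`/`predict` interface are my assumptions, not the post's actual implementation.

```python
import numpy as np

class InverseGaussianRegressor:
    """Hypothetical sketch of an estimator exposing the parameters listed above."""

    def __init__(self, learning_rate=0.01, max_iter=100, l1_ratio=0.5,
                 alpha=0.01, phi=1.0, regularization=None):
        self.learning_rate = learning_rate
        self.max_iter = max_iter
        self.l1_ratio = l1_ratio
        self.alpha = alpha
        self.phi = phi
        self.regularization = regularization

    def _penalty_grad(self, beta):
        # Gradient of the penalty term; L1 uses the subgradient sign(beta).
        if self.regularization == 'l1':
            return self.alpha * np.sign(beta)
        if self.regularization == 'l2':
            return self.alpha * beta
        if self.regularization == 'elastic-net':
            return self.alpha * (self.l1_ratio * np.sign(beta)
                                 + (1.0 - self.l1_ratio) * beta)
        return np.zeros_like(beta)

    def fit(self, X, y):
        n, p = X.shape
        self.beta_ = np.zeros(p)
        self.beta_[0] = 1.0 / y.mean()          # assumes an intercept column in X
        for _ in range(self.max_iter):
            mu = 1.0 / (X @ self.beta_)
            # Average the log-likelihood gradient over samples and subtract the penalty gradient.
            grad = -X.T @ (self.phi * (y - mu) / mu) / n - self._penalty_grad(self.beta_)
            self.beta_ += self.learning_rate * grad
        return self

    def predict(self, X):
        return 1.0 / (X @ self.beta_)
```

With `regularization='elastic-net'` the penalty is balanced by `l1_ratio`, while `None` leaves the update as pure maximum likelihood.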

Applications

Inverse Gaussian Regression is suited for:

  • Time-to-event data analysis: such as survival times and failure times.
  • Rate data analysis: including the speed of progression of diseases or other processes.

Strengths and Limitations

Strengths

  • Flexibility for Skewed Data: Capable of modeling right-skewed data effectively.
  • Mean-Dispersion Relationship: Incorporates variability through the shape parameter, offering a nuanced understanding of data dispersion.

Limitations

  • Computational Complexity: Gradient calculations and optimization can be computationally intensive.
  • Data Requirements: Requires positive continuous data, limiting its applicability.

Advanced Considerations

  • Convergence Issues: Special care is needed to ensure convergence in the optimization process, especially for data that are highly skewed or have outliers.
  • Software Implementations: Several statistical software packages offer built-in functions for Inverse Gaussian regression, abstracting away the complexities of MLE and gradient calculations.
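
For instance, statsmodels exposes this model through its GLM interface with the `InverseGaussian` family. The short example below uses the family's default inverse-squared (canonical) link and simulated data purely for illustration.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = sm.add_constant(rng.uniform(0.5, 2.0, size=(200, 1)))   # intercept + one feature
mu = 1.0 / np.sqrt(X @ np.array([0.5, 0.4]))                 # canonical link: 1/mu^2 = X @ beta
y = rng.wald(mu, 2.0)                                        # simulated Inverse Gaussian responses

model = sm.GLM(y, X, family=sm.families.InverseGaussian())   # default link is 1/mu^2
result = model.fit()
print(result.summary())
```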

Conclusion

Inverse Gaussian Regression provides a robust framework for analyzing positive continuous data, particularly where the distribution is skewed. Through the use of MLE and careful gradient calculations, it offers detailed insights into the underlying processes governing the data, making it a valuable tool in statistical modeling and data analysis.
