Gamma Regression is utilized for modeling positive continuous data with skewed distributions, often applied in fields such as insurance, medical research, and economics. It is particularly suited for data that are not only positive but also exhibit variability not conforming to normal distributions. This regression model assumes that the dependent variable follows a Gamma distribution, a flexible family of distributions that can capture a wide range of data shapes.
The Gamma distribution is defined for positive continuous variables and is characterized by its shape () and scale () parameters. The probability density function (PDF) is given by:
where , is the shape parameter, is the scale parameter, and is the gamma function evaluated at .
In Gamma regression, the response variable is assumed to follow a Gamma distribution. The mean of , , is linked to a linear combination of explanatory variables through a link function, commonly the natural logarithm, ensuring that predictions remain positive:
MLE is used to estimate the parameters () of the Gamma regression model. The likelihood function for a set of parameters given the observed data is constructed based on the joint PDF of the observed responses, which, under the assumption of independence, is the product of individual Gamma PDFs.
The likelihood function for the Gamma regression model, given independent observations , is:
where each is related to the mean (a function of and ) and the shape parameter is typically assumed to be constant across observations.
The log-likelihood function, which is more tractable for optimization purposes, is given by:
Notably, is a function of the explanatory variables and the parameters to be estimated.
The optimization of the log-likelihood function involves calculating its gradient with respect to the parameters and iteratively adjusting these parameters to find the maximum log-likelihood. Due to the complexity of the Gamma distribution's log-likelihood function, numerical methods such as gradient ascent or quasi-Newton methods are employed to estimate the parameters.
alpha
: float
, default = 1.0beta
: float
, default = 1.0learning_rate
: float
, default = 0.01max_iter
: int
, default = 100l1_ratio
: float
, default = 0.5reg_strength
: float
, defualt = 0.01regularization
: Literal['l1', 'l2', 'elastic-net']
, default = NoneGamma regression is widely used in scenarios where the dependent variable is strictly positive and highly skewed, such as:
Gamma Regression provides a robust framework for modeling positive continuous data, especially when dealing with skewed distributions. The use of MLE for parameter estimation, while computationally demanding, allows for precise model fitting, making Gamma Regression a powerful tool in statistical analysis and prediction across various fields.
- McCullagh, Peter, and John Nelder. "Generalized Linear Models." Chapman & Hall/CRC, 1989.
- Dunn, Peter K., and Gordon K. Smyth. "Generalized Linear Models With Examples in R." Springer, 2018.