Negative Binomial Regression offers a robust alternative to Poisson regression for modeling count data, particularly when the data exhibit overdispersion, i.e., when the variance exceeds the mean. By introducing an additional parameter that accounts for this extra variability, the model fits count variables more accurately, making it invaluable in scientific and analytical applications where count data do not conform to the strict assumptions of the Poisson distribution.
Overdispersion in count data challenges traditional Poisson regression, which assumes that the mean and variance of the distribution are equal. This assumption often does not hold in real-world data, leading to underestimation of standard errors and potentially misleading inferences.
The Negative Binomial distribution extends the Poisson by introducing an overdispersion parameter, allowing the variance to exceed the mean. Its probability mass function is defined as:

$$P(Y = y) = \binom{y + r - 1}{y}\, p^{r} (1 - p)^{y}, \qquad y = 0, 1, 2, \dots$$

where $y$ represents the number of failures before achieving $r$ successes, and $p$ is the probability of a success.
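This mass function is implemented in SciPy as `scipy.stats.nbinom`, which uses the same failures-before-the-$r$-th-success parameterization; a minimal sketch (all values are illustrative):

```python
import numpy as np
from scipy.stats import nbinom

r, p = 5, 0.3                   # r successes, success probability p (illustrative)
y = np.arange(10)
pmf = nbinom.pmf(y, r, p)       # P(Y = y): y failures before the r-th success

# Unlike the Poisson distribution, the variance exceeds the mean.
mean, var = nbinom.stats(r, p, moments='mv')
print(float(mean), float(var))  # roughly 11.67 vs. 38.89
```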
Negative Binomial Regression links the mean of the count variable to predictors using a log-link function:

$$\log(\mu_i) = \mathbf{x}_i^{\top} \boldsymbol{\beta}$$

with $\mu_i = \mathbb{E}[y_i \mid \mathbf{x}_i]$ and the variance given by:

$$\operatorname{Var}(y_i) = \mu_i + \alpha \mu_i^{2}$$

where $\alpha$ is the overdispersion parameter.
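For concreteness, a small numerical sketch of the link and mean–variance relationship (variable names and values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))        # 5 observations, 3 predictors
beta = np.array([0.2, -0.5, 0.1])
alpha = 0.8                        # overdispersion parameter

mu = np.exp(X @ beta)              # log link: log(mu_i) = x_i' beta
var = mu + alpha * mu ** 2         # exceeds mu whenever alpha > 0
```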
The likelihood function for a Negative Binomial model with data $\{(y_i, \mathbf{x}_i)\}_{i=1}^{n}$, where $y_i$ is the observed count and $\mathbf{x}_i$ is the vector of explanatory variables for observation $i$, is constructed based on the Negative Binomial probability distribution:

$$L(\boldsymbol{\beta}, \alpha) = \prod_{i=1}^{n} \frac{\Gamma(y_i + 1/\alpha)}{\Gamma(y_i + 1)\,\Gamma(1/\alpha)} \left( \frac{1}{1 + \alpha \mu_i} \right)^{1/\alpha} \left( \frac{\alpha \mu_i}{1 + \alpha \mu_i} \right)^{y_i}$$

where $\mu_i = \exp(\mathbf{x}_i^{\top} \boldsymbol{\beta})$ represents the mean count for the $i$-th observation, $\Gamma(\cdot)$ is the gamma function, $\boldsymbol{\beta}$ are the coefficients to be estimated, and $\alpha$ is the overdispersion parameter.
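One way to evaluate this product numerically is through `scipy.stats.nbinom`, since the parameterization above corresponds to $n = 1/\alpha$ and $p = 1/(1 + \alpha \mu_i)$; a sketch with simulated data (all names and values are illustrative):

```python
import numpy as np
from scipy.stats import nbinom

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(50), rng.normal(size=50)])  # intercept + one predictor
beta = np.array([0.5, 0.3])
alpha = 0.6

mu = np.exp(X @ beta)                                    # mean counts under the log link
y = nbinom.rvs(1.0 / alpha, 1.0 / (1.0 + alpha * mu), random_state=rng)

# Each factor is a Negative Binomial probability with n = 1/alpha, p = 1/(1 + alpha*mu_i).
likelihood = np.prod(nbinom.pmf(y, 1.0 / alpha, 1.0 / (1.0 + alpha * mu)))
```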
To find the values of $\boldsymbol{\beta}$ and $\alpha$ that maximize $L(\boldsymbol{\beta}, \alpha)$, partial derivatives of the log-likelihood function with respect to these parameters are taken and set to zero. However, due to the complexity of the likelihood function for the Negative Binomial distribution, analytical solutions are infeasible, so numerical optimization techniques such as Newton-Raphson or gradient descent are employed.
The log-likelihood function is given by:

$$\ell(\boldsymbol{\beta}, \alpha) = \sum_{i=1}^{n} \left[ \log \Gamma\!\left(y_i + \tfrac{1}{\alpha}\right) - \log \Gamma(y_i + 1) - \log \Gamma\!\left(\tfrac{1}{\alpha}\right) - \frac{1}{\alpha} \log(1 + \alpha \mu_i) + y_i \log\!\left( \frac{\alpha \mu_i}{1 + \alpha \mu_i} \right) \right]$$
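A direct translation of this expression, using the log-gamma function `scipy.special.gammaln` (a sketch; the function and variable names are illustrative):

```python
import numpy as np
from scipy.special import gammaln

def nb_log_likelihood(beta, alpha, X, y):
    """Log-likelihood of the Negative Binomial model at coefficients beta and dispersion alpha."""
    mu = np.exp(X @ beta)                  # log link
    inv_alpha = 1.0 / alpha
    return np.sum(
        gammaln(y + inv_alpha) - gammaln(y + 1.0) - gammaln(inv_alpha)
        - inv_alpha * np.log1p(alpha * mu)
        + y * np.log(alpha * mu / (1.0 + alpha * mu))
    )
```

With the data from the previous sketch, `nb_log_likelihood(beta, alpha, X, y)` agrees with `nbinom.logpmf(y, 1/alpha, 1/(1 + alpha*mu)).sum()`, which makes for a convenient sanity check.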
The gradient (the vector of first derivatives) and the Hessian (the matrix of second derivatives) of the log-likelihood function with respect to $\boldsymbol{\beta}$ and $\alpha$ are used in these optimization techniques to iteratively adjust the parameter estimates until the maximum likelihood estimates are found.
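Continuing the sketches above (reusing `nb_log_likelihood`, `X`, and `y`), one possible route is to hand the negative log-likelihood to a quasi-Newton optimizer such as BFGS from `scipy.optimize`, which builds its own gradient and Hessian approximations when analytic derivatives are not supplied; parameterizing the dispersion as $\log \alpha$ keeps it positive:

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(params, X, y):
    """Negative log-likelihood over (beta, log(alpha)) for unconstrained optimization."""
    beta, alpha = params[:-1], np.exp(params[-1])
    return -nb_log_likelihood(beta, alpha, X, y)

x0 = np.zeros(X.shape[1] + 1)      # start at beta = 0 and log(alpha) = 0, i.e. alpha = 1
result = minimize(neg_log_likelihood, x0, args=(X, y), method="BFGS")

beta_hat = result.x[:-1]
alpha_hat = float(np.exp(result.x[-1]))
```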
The following hyperparameters control the fitting procedure (a usage sketch follows the list):

- `learning_rate` : `float`, default = 0.01
- `max_iter` : `int`, default = 100
- `l1_ratio` : `float`, default = 0.5
- `alpha` : `float`, default = 0.01
- `regularization` : `Literal['l1', 'l2', 'elastic-net']`, default = None
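A minimal usage sketch; the `NegativeBinomialRegression` class name, its import path, and the `fit`/`predict` interface are assumptions for illustration, not confirmed by this documentation:

```python
import numpy as np

# Hypothetical import path and class name, shown for illustration only.
from negative_binomial import NegativeBinomialRegression

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = rng.poisson(np.exp(0.4 + X @ np.array([0.3, -0.2, 0.1])))  # toy count response

model = NegativeBinomialRegression(
    learning_rate=0.01,
    max_iter=200,
    regularization='elastic-net',
    l1_ratio=0.5,
    alpha=0.01,
)
model.fit(X, y)              # assumed interface
y_pred = model.predict(X)
```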
Negative Binomial Regression is widely applicable in scenarios where count data exhibit overdispersion.