[Regressor] Negative Binomial Regression



Introduction

Negative Binomial Regression offers a robust alternative to Poisson regression for modeling count data, particularly when the data exhibit overdispersion, i.e., when the variance is greater than the mean. By introducing an additional parameter that accounts for overdispersion, it models count variables more accurately, making it invaluable in scientific and analytical applications where count data do not conform to the strict assumptions of the Poisson distribution.

Background and Theory

The Challenge of Overdispersion

Overdispersion in count data challenges traditional Poisson regression, which assumes that the mean and variance of the distribution are equal. This assumption often does not hold in real-world data, leading to underestimation of standard errors and potentially misleading inferences.
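To see the problem concretely, here is a minimal sketch (synthetic data; NumPy and SciPy assumed available) in which the sample variance of simulated counts far exceeds the sample mean:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Synthetic counts from a Negative Binomial distribution
# (r = 3 successes, success probability p = 0.2).
counts = stats.nbinom.rvs(3, 0.2, size=10_000, random_state=rng)

# Under a Poisson model these two quantities would be roughly equal;
# here the variance is about five times the mean -- overdispersion.
print(f"mean     = {counts.mean():.2f}")   # approx. 12
print(f"variance = {counts.var():.2f}")    # approx. 60
```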

Negative Binomial Distribution

The Negative Binomial distribution extends the Poisson by introducing an overdispersion parameter, allowing the variance to exceed the mean. Its probability mass function is defined as:

$$P(Y = y) = \binom{y + r - 1}{y} p^r (1 - p)^y$$

where $y$ represents the number of failures before achieving $r$ successes, and $p$ is the probability of a success.
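As a quick sanity check, this formula can be evaluated directly and compared against SciPy's `nbinom`, which uses the same failures-before-the-$r$-th-success parameterization (a minimal sketch; the helper `nb_pmf` is illustrative):

```python
import numpy as np
from scipy import stats
from scipy.special import comb

def nb_pmf(y: int, r: int, p: float) -> float:
    """P(Y = y): probability of y failures before the r-th success."""
    return comb(y + r - 1, y) * p**r * (1 - p) ** y

# The manual formula matches scipy's nbinom(r, p) parameterization.
for y in range(10):
    assert np.isclose(nb_pmf(y, r=3, p=0.2), stats.nbinom.pmf(y, 3, 0.2))
```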

Mathematical Formulation

Negative Binomial Regression links the mean of the count variable $Y$ to predictors $X$ using a log-link function:

$$\log(\mu) = \beta_0 + \beta_1 X_1 + \ldots + \beta_n X_n$$

with $\mu = \mathbb{E}[Y \mid X]$ and the variance given by:

$$\mathrm{Var}(Y \mid X) = \mu + \alpha \mu^2$$

where $\alpha$ is the overdispersion parameter.
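A short sketch of these two equations with made-up coefficients ($\beta_0$, $\beta$, and $\alpha$ here are arbitrary illustrative values):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))            # 5 observations, 2 predictors
beta0, beta = 0.5, np.array([0.8, -0.3])
alpha = 0.7                            # overdispersion parameter

mu = np.exp(beta0 + X @ beta)          # log-link: mu = E[Y|X]
var = mu + alpha * mu**2               # NB variance
print(np.all(var > mu))                # True: variance exceeds the mean
```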

Optimization Process

Formulating the Likelihood Function

The likelihood function for a Negative Binomial model with data $(y_i, x_i),\ i = 1, \ldots, N$, where $y_i$ is the observed count and $x_i$ is the vector of explanatory variables for observation $i$, is constructed based on the Negative Binomial probability distribution:

$$L(\beta, \alpha; y, X) = \prod_{i=1}^{N} \frac{\Gamma(y_i + 1/\alpha)}{\Gamma(1/\alpha)\, y_i!} \left(\frac{1}{1 + \alpha \mu_i}\right)^{1/\alpha} \left(\frac{\alpha \mu_i}{1 + \alpha \mu_i}\right)^{y_i}$$

where $\mu_i = \exp(\beta_0 + \beta_1 x_{i1} + \ldots + \beta_n x_{in})$ represents the mean count for the $i$-th observation, $\Gamma$ is the gamma function, $\beta$ are the coefficients to be estimated, and $\alpha$ is the overdispersion parameter.

Maximizing the Likelihood

To find the values of $\beta$ and $\alpha$ that maximize $L(\beta, \alpha; y, X)$, partial derivatives of the log-likelihood function with respect to these parameters are taken and set to zero. However, due to the complexity of the likelihood function for the Negative Binomial distribution, analytical solutions are infeasible, so numerical optimization techniques such as Newton-Raphson or gradient descent are employed.

The log-likelihood function is given by:

$$\log L(\beta, \alpha; y, X) = \sum_{i=1}^{N} \log \left[ \frac{\Gamma(y_i + 1/\alpha)}{\Gamma(1/\alpha)\, y_i!} \left(\frac{1}{1 + \alpha \mu_i}\right)^{1/\alpha} \left(\frac{\alpha \mu_i}{1 + \alpha \mu_i}\right)^{y_i} \right]$$
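In code, this expression is usually evaluated through log-gamma functions rather than raw gamma functions and factorials, which would overflow for even moderate counts. A minimal sketch (the function name and the choice to optimize $\alpha$ on the log scale are illustrative, not the post's confirmed implementation):

```python
import numpy as np
from scipy.special import gammaln

def nb_log_likelihood(params, X, y):
    """Negative Binomial log-likelihood.

    params = [log(alpha), beta_0, ..., beta_n]; alpha is optimized on
    the log scale so that it stays positive. X must contain an
    intercept column of ones.
    """
    alpha = np.exp(params[0])
    beta = params[1:]
    mu = np.exp(X @ beta)
    inv_a = 1.0 / alpha
    return np.sum(
        gammaln(y + inv_a) - gammaln(inv_a) - gammaln(y + 1)  # log of the Gamma ratio over y_i!
        - inv_a * np.log1p(alpha * mu)                        # (1/alpha) * log(1 / (1 + alpha*mu))
        + y * np.log(alpha * mu) - y * np.log1p(alpha * mu)   # y * log(alpha*mu / (1 + alpha*mu))
    )
```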

Gradient and Hessian

The gradient (the vector of first derivatives) and the Hessian (the matrix of second derivatives) of the log-likelihood function with respect to $\beta$ and $\alpha$ are used in these optimization techniques to iteratively adjust the parameter estimates until the maximum likelihood estimates are found.
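For example, quasi-Newton methods such as BFGS build up an approximation to the inverse Hessian from successive gradient evaluations, so supplying only the log-likelihood is enough for a working estimator. A sketch reusing the hypothetical `nb_log_likelihood` from above (the simulation setup is illustrative):

```python
import numpy as np
from scipy import stats
from scipy.optimize import minimize

rng = np.random.default_rng(1)
N = 2_000
X = np.column_stack([np.ones(N), rng.normal(size=N)])  # intercept + 1 predictor
true_beta, true_alpha = np.array([0.3, 0.6]), 0.5
mu = np.exp(X @ true_beta)

# NB2 simulation: scipy's (n, p) corresponds to n = 1/alpha, p = 1/(1 + alpha*mu).
y = stats.nbinom.rvs(1 / true_alpha, 1 / (1 + true_alpha * mu), random_state=rng)

# Maximize the log-likelihood by minimizing its negative; BFGS uses
# finite-difference gradients when no analytic gradient is supplied.
x0 = np.zeros(1 + X.shape[1])          # start at log(alpha) = 0, beta = 0
res = minimize(lambda p: -nb_log_likelihood(p, X, y), x0, method="BFGS")
alpha_hat, beta_hat = np.exp(res.x[0]), res.x[1:]
print(alpha_hat, beta_hat)             # close to 0.5 and [0.3, 0.6]
```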

Implementation

Parameters

  • learning_rate: float, default = 0.01
    Step size of the gradient descent update
  • max_iter: int, default = 100
    Maximum number of iterations
  • l1_ratio: float, default = 0.5
    Balancing parameter between L1 and L2 in elastic-net regularization
  • alpha: float, default = 0.01
    Regularization strength (distinct from the overdispersion parameter $\alpha$ above)
  • regularization: Literal['l1', 'l2', 'elastic-net'], default = None
    Regularization type (see the sketch after this list)
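To make the regularization parameters concrete, here is a sketch of the standard elastic-net penalty that alpha and l1_ratio would control; this illustrates the usual formulation rather than the confirmed internals of this implementation:

```python
import numpy as np

def elastic_net_penalty(beta: np.ndarray, alpha: float, l1_ratio: float) -> float:
    """Standard elastic-net penalty on the coefficients.

    l1_ratio = 1.0 gives pure L1 (lasso); l1_ratio = 0.0 gives pure L2 (ridge).
    """
    l1 = np.sum(np.abs(beta))
    l2 = 0.5 * np.sum(beta**2)
    return alpha * (l1_ratio * l1 + (1.0 - l1_ratio) * l2)

# The penalized objective would be: negative log-likelihood + penalty,
# minimized by gradient descent with the given learning_rate for at
# most max_iter updates.
print(elastic_net_penalty(np.array([0.5, -1.2]), alpha=0.01, l1_ratio=0.5))
```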

Applications

Negative Binomial Regression is widely applicable in scenarios where count data exhibit overdispersion:

  • Public Health: Modeling the number of disease incidents in different populations.
  • Insurance: Predicting the number of claims filed by policyholders.
  • Ecological Studies: Counting the number of species or events within a given area or time period.

Strengths and Limitations

Strengths

  • Flexibility: Can accurately model data with overdispersion.
  • Interpretability: Coefficients can be directly interpreted in terms of the expected change in the log-count for a unit change in predictors.

Limitations

  • Complexity: More parameters to estimate compared to Poisson regression, which may require more data and computational power.
  • Assumptions: Still relies on assumptions that may not hold for all count data, such as the form of the mean-variance relationship.

Advanced Topics

  • Model Diagnostics and Validation: Techniques for assessing the fit and predictive accuracy of Negative Binomial models.
  • Zero-Inflated Negative Binomial Regression: Addressing datasets with an excess of zeros by combining Negative Binomial distribution for count data and a binary distribution for the occurrence of zeros.
