[Regressor] Negative Binomial Regression



Introduction

Negative Binomial Regression offers a robust alternative to Poisson regression for modeling count data, particularly when the data exhibit overdispersion, i.e., when the variance is greater than the mean. By introducing an additional parameter that accounts for overdispersion, it models count variables more accurately, making it invaluable in scientific and analytical applications where count data do not conform to the strict assumptions of the Poisson distribution.

Background and Theory

The Challenge of Overdispersion

Overdispersion in count data challenges traditional Poisson regression, which assumes that the mean and variance of the distribution are equal. This assumption often does not hold in real-world data, leading to underestimation of standard errors and potentially misleading inferences.
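To see the problem concretely, here is a minimal sketch (synthetic data; NumPy and SciPy assumed available) in which the sample variance of simulated counts far exceeds the sample mean:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Synthetic counts from a Negative Binomial distribution
# (r = 3 successes, success probability p = 0.2).
counts = stats.nbinom.rvs(3, 0.2, size=10_000, random_state=rng)

# Under a Poisson model these two quantities would be roughly equal;
# here the variance is about five times the mean -- overdispersion.
print(f"mean     = {counts.mean():.2f}")   # approx. 12
print(f"variance = {counts.var():.2f}")    # approx. 60
```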

Negative Binomial Distribution

The Negative Binomial distribution extends the Poisson by introducing an overdispersion parameter, allowing the variance to exceed the mean. Its probability mass function is defined as:

$$P(Y = y) = \binom{y + r - 1}{y} p^r (1 - p)^y$$

where $y$ represents the number of failures before achieving $r$ successes, and $p$ is the probability of a success.
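As a quick sanity check, this formula can be evaluated directly and compared against SciPy's `nbinom`, which uses the same failures-before-the-$r$-th-success parameterization (a minimal sketch; the helper `nb_pmf` is illustrative):

```python
import numpy as np
from scipy import stats
from scipy.special import comb

def nb_pmf(y: int, r: int, p: float) -> float:
    """P(Y = y): probability of y failures before the r-th success."""
    return comb(y + r - 1, y) * p**r * (1 - p) ** y

# The manual formula matches scipy's nbinom(r, p) parameterization.
for y in range(10):
    assert np.isclose(nb_pmf(y, r=3, p=0.2), stats.nbinom.pmf(y, 3, 0.2))
```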

Mathematical Formulation

Negative Binomial Regression links the mean of the count variable $Y$ to predictors $X$ using a log-link function:

$$\log(\mu) = \beta_0 + \beta_1 X_1 + \ldots + \beta_n X_n$$

with $\mu = \mathbb{E}[Y \mid X]$ and the variance given by:

$$\mathrm{Var}(Y \mid X) = \mu + \alpha \mu^2$$

where $\alpha$ is the overdispersion parameter.
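A short sketch of these two equations with made-up coefficients ($\beta_0$, $\beta$, and $\alpha$ here are arbitrary illustrative values):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))            # 5 observations, 2 predictors
beta0, beta = 0.5, np.array([0.8, -0.3])
alpha = 0.7                            # overdispersion parameter

mu = np.exp(beta0 + X @ beta)          # log-link: mu = E[Y|X]
var = mu + alpha * mu**2               # NB variance
print(np.all(var > mu))                # True: variance exceeds the mean
```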

Optimization Process

Formulating the Likelihood Function

The likelihood function for a Negative Binomial model with data $(y_i, x_i),\ i = 1, \ldots, N$, where $y_i$ is the observed count and $x_i$ is the vector of explanatory variables for observation $i$, is constructed based on the Negative Binomial probability distribution:

$$L(\beta, \alpha; y, X) = \prod_{i=1}^{N} \frac{\Gamma(y_i + 1/\alpha)}{\Gamma(1/\alpha)\, y_i!} \left(\frac{1}{1 + \alpha \mu_i}\right)^{1/\alpha} \left(\frac{\alpha \mu_i}{1 + \alpha \mu_i}\right)^{y_i}$$

where $\mu_i = \exp(\beta_0 + \beta_1 x_{i1} + \ldots + \beta_n x_{in})$ represents the mean count for the $i$-th observation, $\Gamma$ is the gamma function, $\beta$ are the coefficients to be estimated, and $\alpha$ is the overdispersion parameter.

Maximizing the Likelihood

To find the values of $\beta$ and $\alpha$ that maximize $L(\beta, \alpha; y, X)$, partial derivatives of the log-likelihood function with respect to these parameters are taken and set to zero. However, due to the complexity of the likelihood function for the Negative Binomial distribution, analytical solutions are infeasible, so numerical optimization techniques such as Newton-Raphson or gradient descent are employed.

The log-likelihood function is given by:

$$\log L(\beta, \alpha; y, X) = \sum_{i=1}^{N} \log \left[ \frac{\Gamma(y_i + 1/\alpha)}{\Gamma(1/\alpha)\, y_i!} \left(\frac{1}{1 + \alpha \mu_i}\right)^{1/\alpha} \left(\frac{\alpha \mu_i}{1 + \alpha \mu_i}\right)^{y_i} \right]$$
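In code, this expression is usually evaluated through log-gamma functions rather than raw gamma functions and factorials, which would overflow for even moderate counts. A minimal sketch (the function name and the choice to optimize $\alpha$ on the log scale are illustrative, not the post's confirmed implementation):

```python
import numpy as np
from scipy.special import gammaln

def nb_log_likelihood(params, X, y):
    """Negative Binomial log-likelihood.

    params = [log(alpha), beta_0, ..., beta_n]; alpha is optimized on
    the log scale so that it stays positive. X must contain an
    intercept column of ones.
    """
    alpha = np.exp(params[0])
    beta = params[1:]
    mu = np.exp(X @ beta)
    inv_a = 1.0 / alpha
    return np.sum(
        gammaln(y + inv_a) - gammaln(inv_a) - gammaln(y + 1)  # log of the Gamma ratio over y_i!
        - inv_a * np.log1p(alpha * mu)                        # (1/alpha) * log(1 / (1 + alpha*mu))
        + y * np.log(alpha * mu) - y * np.log1p(alpha * mu)   # y * log(alpha*mu / (1 + alpha*mu))
    )
```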

Gradient and Hessian

The gradient (the vector of first derivatives) and the Hessian (the matrix of second derivatives) of the log-likelihood function with respect to $\beta$ and $\alpha$ are used in these optimization techniques to iteratively adjust the parameter estimates until the maximum likelihood estimates are found.
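For example, quasi-Newton methods such as BFGS build up an approximation to the inverse Hessian from successive gradient evaluations, so supplying only the log-likelihood is enough for a working estimator. A sketch reusing the hypothetical `nb_log_likelihood` from above (the simulation setup is illustrative):

```python
import numpy as np
from scipy import stats
from scipy.optimize import minimize

rng = np.random.default_rng(1)
N = 2_000
X = np.column_stack([np.ones(N), rng.normal(size=N)])  # intercept + 1 predictor
true_beta, true_alpha = np.array([0.3, 0.6]), 0.5
mu = np.exp(X @ true_beta)

# NB2 simulation: scipy's (n, p) corresponds to n = 1/alpha, p = 1/(1 + alpha*mu).
y = stats.nbinom.rvs(1 / true_alpha, 1 / (1 + true_alpha * mu), random_state=rng)

# Maximize the log-likelihood by minimizing its negative; BFGS uses
# finite-difference gradients when no analytic gradient is supplied.
x0 = np.zeros(1 + X.shape[1])          # start at log(alpha) = 0, beta = 0
res = minimize(lambda p: -nb_log_likelihood(p, X, y), x0, method="BFGS")
alpha_hat, beta_hat = np.exp(res.x[0]), res.x[1:]
print(alpha_hat, beta_hat)             # close to 0.5 and [0.3, 0.6]
```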

Implementation

Parameters

  • learning_rate: float, default = 0.01
    Step size of the gradient descent update
  • max_iter: int, default = 100
    Maximum number of iterations
  • l1_ratio: float, default = 0.5
    Balancing parameter between L1 and L2 in elastic-net regularization
  • alpha: float, default = 0.01
    Regularization strength (distinct from the overdispersion parameter $\alpha$ above)
  • regularization: Literal['l1', 'l2', 'elastic-net'], default = None
    Regularization type (see the sketch after this list)
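To make the regularization parameters concrete, here is a sketch of the standard elastic-net penalty that alpha and l1_ratio would control; this illustrates the usual formulation rather than the confirmed internals of this implementation:

```python
import numpy as np

def elastic_net_penalty(beta: np.ndarray, alpha: float, l1_ratio: float) -> float:
    """Standard elastic-net penalty on the coefficients.

    l1_ratio = 1.0 gives pure L1 (lasso); l1_ratio = 0.0 gives pure L2 (ridge).
    """
    l1 = np.sum(np.abs(beta))
    l2 = 0.5 * np.sum(beta**2)
    return alpha * (l1_ratio * l1 + (1.0 - l1_ratio) * l2)

# The penalized objective would be: negative log-likelihood + penalty,
# minimized by gradient descent with the given learning_rate for at
# most max_iter updates.
print(elastic_net_penalty(np.array([0.5, -1.2]), alpha=0.01, l1_ratio=0.5))
```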

Applications

Negative Binomial Regression is widely applicable in scenarios where count data exhibit overdispersion:

  • Public Health: Modeling the number of disease incidents in different populations.
  • Insurance: Predicting the number of claims filed by policyholders.
  • Ecological Studies: Counting the number of species or events within a given area or time period.

Strengths and Limitations

Strengths

  • Flexibility: Can accurately model data with overdispersion.
  • Interpretability: Coefficients can be directly interpreted in terms of the expected change in the log-count for a unit change in predictors.

Limitations

  • Complexity: More parameters to estimate compared to Poisson regression, which may require more data and computational power.
  • Assumptions: Still relies on assumptions that may not hold for all count data, such as the form of the mean-variance relationship.

Advanced Topics

  • Model Diagnostics and Validation: Techniques for assessing the fit and predictive accuracy of Negative Binomial models.
  • Zero-Inflated Negative Binomial Regression: Addressing datasets with an excess of zeros by combining Negative Binomial distribution for count data and a binary distribution for the occurrence of zeros.
