This series is written with reference to Professor Jungseul Ok's Machine Learning (CSED515) course at POSTECH and [Probabilistic Machine Learning: Advanced Topics], among other sources.

In the previous post, we covered various multivariate distributions and the Gaussian joint distribution. This time, we take a comprehensive look at the exponential family.

Exponential Family

Definition

A distribution in the exponential family is generally written in the following form:

$$p(x|\eta) = h(x) \exp\left( \eta^T T(x) - A(\eta) \right)$$

where

  • $\eta$ is the natural parameter vector.
  • $T(x)$ is the sufficient statistic vector.
  • $h(x)$ is the base measure function.
  • $A(\eta)$ is the normalizer, also called the log-partition function, which ensures the distribution sums (or integrates) to 1 over all $x$.
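
To make the four ingredients concrete, here is a minimal Python sketch (assuming NumPy is available; the helper name `log_expfam_density` is hypothetical, for illustration only) that evaluates the canonical log-density $\log p(x|\eta) = \log h(x) + \eta^T T(x) - A(\eta)$. The distribution-specific checks below follow the same pattern inline.

```python
import numpy as np

def log_expfam_density(x, eta, T, log_h, A):
    """Canonical exponential-family log-density:
    log p(x | eta) = log h(x) + eta^T T(x) - A(eta).
    `T` and `log_h` are callables; `A` is the precomputed value A(eta)."""
    eta = np.atleast_1d(np.asarray(eta, dtype=float))
    t = np.atleast_1d(np.asarray(T(x), dtype=float))
    return float(log_h(x) + eta @ t - A)
```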

e.g., Bernoulli distribution

  • Probability mass function (PMF):
    $$p(x|\theta) = \theta^x (1 - \theta)^{1 - x}, \quad x \in \{0,1\}$$
  • Exponential family form:
    $$p(x|\eta) = \exp\left( x \eta - \log(1 + e^\eta) \right)$$
    where
    • natural parameter: $\eta = \log\left(\frac{\theta}{1 - \theta}\right)$
    • sufficient statistic: $T(x) = x$
    • base measure: $h(x) = 1$
    • log-partition function: $A(\eta) = \log(1 + e^\eta)$
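
A quick numerical sanity check (a sketch assuming NumPy and SciPy are installed, with an arbitrary example value of $\theta$) that the canonical form above reproduces the Bernoulli PMF:

```python
import numpy as np
from scipy.stats import bernoulli

theta = 0.3
eta = np.log(theta / (1 - theta))     # natural parameter
A = np.log1p(np.exp(eta))             # log-partition A(eta) = log(1 + e^eta)

for x in (0, 1):
    canonical = np.exp(eta * x - A)   # h(x) = 1, T(x) = x
    assert np.isclose(canonical, bernoulli.pmf(x, theta))
```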

e.g., Binomial distribution

  • Probability mass function (PMF):
    $$p(x|n,\theta) = \binom{n}{x} \theta^x (1 - \theta)^{n - x}, \quad x \in \{0,1,\ldots,n\}$$
  • Exponential family form:
    $$p(x|\eta) = h(x) \exp\left( \eta T(x) - A(\eta) \right)$$
    where
    • natural parameter: $\eta = \log\left(\frac{\theta}{1 - \theta}\right)$
    • sufficient statistic: $T(x) = x$
    • base measure: $h(x) = \binom{n}{x}$
    • log-partition function: $A(\eta) = n \log(1 + e^\eta)$
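
The same kind of check for the binomial case, which now has a non-trivial base measure $h(x) = \binom{n}{x}$ (a sketch assuming SciPy, with arbitrary example values of $n$ and $\theta$):

```python
import numpy as np
from scipy.stats import binom
from scipy.special import comb

n, theta = 10, 0.3
eta = np.log(theta / (1 - theta))   # natural parameter (same as Bernoulli)
A = n * np.log1p(np.exp(eta))       # A(eta) = n * log(1 + e^eta)

for x in range(n + 1):
    canonical = comb(n, x) * np.exp(eta * x - A)   # h(x) = C(n, x)
    assert np.isclose(canonical, binom.pmf(x, n, theta))
```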

e.g., Gaussian distribution

  • Probability density function (PDF):
    $$p(x|\mu,\sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)$$
  • Exponential family form:
    $$p(x|\eta) = h(x) \exp\left( \eta^T T(x) - A(\eta) \right)$$
    where
    • natural parameter: $\eta = \left( \frac{\mu}{\sigma^2}, -\frac{1}{2\sigma^2} \right)^T$
    • sufficient statistic: $T(x) = \begin{pmatrix} x \\ x^2 \end{pmatrix}$
    • base measure: $h(x) = \frac{1}{\sqrt{2\pi}}$
    • log-partition function: $A(\eta) = \frac{\mu^2}{2\sigma^2} + \frac{1}{2} \log \sigma^2$ (the $2\pi$ factor is already absorbed into $h(x)$)
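
A numerical check for the two-parameter Gaussian case (a sketch assuming NumPy/SciPy, with arbitrary example values of $\mu$ and $\sigma^2$):

```python
import numpy as np
from scipy.stats import norm

mu, sigma2 = 1.5, 0.8
eta = np.array([mu / sigma2, -1.0 / (2.0 * sigma2)])   # natural parameters
A = mu**2 / (2.0 * sigma2) + 0.5 * np.log(sigma2)      # log-partition (2*pi sits in h)

for x in np.linspace(-3.0, 3.0, 7):
    T = np.array([x, x**2])                                  # sufficient statistics
    canonical = np.exp(eta @ T - A) / np.sqrt(2.0 * np.pi)   # h(x) = 1/sqrt(2*pi)
    assert np.isclose(canonical, norm.pdf(x, loc=mu, scale=np.sqrt(sigma2)))
```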

e.g., Exponential distribution

  • Probability density function (PDF):
    $$p(x|\lambda) = \lambda e^{-\lambda x}, \quad x \geq 0$$
  • Exponential family form:
    $$p(x|\eta) = h(x) \exp\left( \eta T(x) - A(\eta) \right)$$
    where
    • natural parameter: $\eta = -\lambda$
    • sufficient statistic: $T(x) = x$
    • base measure: $h(x) = 1$
    • log-partition function: $A(\eta) = -\log(-\eta) = -\log \lambda$
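
A check that the leading $\lambda$ factor of the PDF is recovered from $e^{-A(\eta)}$ rather than from the base measure (a sketch assuming SciPy, with an arbitrary example rate):

```python
import numpy as np
from scipy.stats import expon

lam = 2.0
eta = -lam                            # natural parameter
A = -np.log(-eta)                     # A(eta) = -log(-eta) = -log(lambda)

for x in (0.1, 0.5, 1.0, 3.0):
    canonical = np.exp(eta * x - A)   # h(x) = 1, T(x) = x
    assert np.isclose(canonical, expon.pdf(x, scale=1.0 / lam))
```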

e.g., Gamma distribution

  • Probability density function (PDF):
    $$p(x|\alpha,\beta) = \frac{\beta^\alpha}{\Gamma(\alpha)} x^{\alpha - 1} e^{-\beta x}, \quad x > 0$$
  • Exponential family form:
    $$p(x|\eta) = h(x) \exp\left( \eta^T T(x) - A(\eta) \right)$$
    where
    • natural parameter: $\eta = \begin{pmatrix} \alpha - 1 \\ -\beta \end{pmatrix}$
    • sufficient statistic: $T(x) = \begin{pmatrix} \log x \\ x \end{pmatrix}$
    • base measure: $h(x) = 1$
    • log-partition function: $A(\eta) = \log \Gamma(\alpha) - \alpha \log \beta$
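
A numerical check of the Gamma case, including the sign of the log-partition term (a sketch assuming SciPy, where `scale = 1/beta` corresponds to the rate parameterization used above):

```python
import numpy as np
from scipy.stats import gamma
from scipy.special import gammaln

alpha, beta_ = 3.0, 2.0
eta = np.array([alpha - 1.0, -beta_])          # natural parameters
A = gammaln(alpha) - alpha * np.log(beta_)     # A(eta) = log Gamma(alpha) - alpha * log(beta)

for x in (0.2, 1.0, 2.5):
    T = np.array([np.log(x), x])               # sufficient statistics
    canonical = np.exp(eta @ T - A)            # h(x) = 1
    assert np.isclose(canonical, gamma.pdf(x, a=alpha, scale=1.0 / beta_))
```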

e.g., Chi-squared distribution

  • Probability density function (PDF):
    $$p(x|k) = \frac{1}{2^{k/2} \Gamma(k/2)} x^{k/2 - 1} e^{-x/2}, \quad x > 0$$
  • Exponential family form (this is just the Gamma case with $\alpha = k/2$ and $\beta = \frac{1}{2}$):
    $$p(x|\eta) = h(x) \exp\left( \eta^T T(x) - A(\eta) \right)$$
    where
    • natural parameter: $\eta = \begin{pmatrix} \frac{k}{2} - 1 \\ -\frac{1}{2} \end{pmatrix}$ (the second component is fixed, so only $k$ is free)
    • sufficient statistic: $T(x) = \begin{pmatrix} \log x \\ x \end{pmatrix}$
    • base measure: $h(x) = 1$
    • log-partition function: $A(\eta) = \frac{k}{2} \log 2 + \log \Gamma(k/2)$
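
A check that the chi-squared PDF comes out of the same Gamma-style ingredients with $\alpha = k/2$ and $\beta = 1/2$ (a sketch assuming SciPy, with an arbitrary example degrees of freedom):

```python
import numpy as np
from scipy.stats import chi2
from scipy.special import gammaln

k = 5.0
eta = np.array([k / 2.0 - 1.0, -0.5])              # natural parameters (second entry fixed at -1/2)
A = (k / 2.0) * np.log(2.0) + gammaln(k / 2.0)     # A = (k/2) log 2 + log Gamma(k/2)

for x in (0.5, 2.0, 6.0):
    T = np.array([np.log(x), x])                   # sufficient statistics
    canonical = np.exp(eta @ T - A)                # h(x) = 1
    assert np.isclose(canonical, chi2.pdf(x, df=k))
```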

e.g., Poisson distribution

  • Probability mass function (PMF):
    $$p(x|\lambda) = \frac{\lambda^x e^{-\lambda}}{x!}, \quad x \in \{0,1,2,\ldots\}$$
  • Exponential family form:
    $$p(x|\eta) = h(x) \exp\left( \eta x - A(\eta) \right)$$
    where
    • natural parameter: $\eta = \log \lambda$
    • sufficient statistic: $T(x) = x$
    • base measure: $h(x) = \frac{1}{x!}$
    • log-partition function: $A(\eta) = e^\eta = \lambda$
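
A check for the Poisson case, working with $\log h(x) = -\log x!$ for numerical stability (a sketch assuming SciPy, with an arbitrary example rate):

```python
import numpy as np
from scipy.stats import poisson
from scipy.special import gammaln

lam = 3.0
eta = np.log(lam)              # natural parameter
A = np.exp(eta)                # A(eta) = e^eta = lambda

for x in range(10):
    log_h = -gammaln(x + 1.0)  # log h(x) = -log(x!)
    canonical = np.exp(log_h + eta * x - A)
    assert np.isclose(canonical, poisson.pmf(x, lam))
```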

e.g., Beta distribution

  • Probability density function (PDF):
    $$p(x|\alpha,\beta) = \frac{x^{\alpha - 1} (1 - x)^{\beta - 1}}{B(\alpha, \beta)}, \quad 0 \leq x \leq 1$$
    where $B(\alpha, \beta)$ is the beta function.
  • Exponential family form:
    $$p(x|\eta) = h(x) \exp\left( \eta^T T(x) - A(\eta) \right)$$
    where
    • natural parameter: $\eta = \begin{pmatrix} \alpha - 1 \\ \beta - 1 \end{pmatrix}$
    • sufficient statistic: $T(x) = \begin{pmatrix} \log x \\ \log(1 - x) \end{pmatrix}$
    • base measure: $h(x) = 1$
    • log-partition function: $A(\eta) = \log B(\alpha, \beta) = \log \Gamma(\alpha) + \log \Gamma(\beta) - \log \Gamma(\alpha + \beta)$
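
A check for the Beta case (a sketch assuming SciPy; `betaln` computes $\log B(\alpha, \beta)$ directly, and the parameter values are arbitrary examples):

```python
import numpy as np
from scipy.stats import beta as beta_dist
from scipy.special import betaln

a, b = 2.0, 5.0
eta = np.array([a - 1.0, b - 1.0])            # natural parameters
A = betaln(a, b)                              # A(eta) = log B(a, b)

for x in (0.1, 0.4, 0.9):
    T = np.array([np.log(x), np.log1p(-x)])   # sufficient statistics (log x, log(1 - x))
    canonical = np.exp(eta @ T - A)           # h(x) = 1
    assert np.isclose(canonical, beta_dist.pdf(x, a, b))
```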

e.g., Dirichlet distribution

  • Probability density function (PDF):
    $$p(\mathbf{x}|\boldsymbol{\alpha}) = \frac{1}{B(\boldsymbol{\alpha})} \prod_{i=1}^K x_i^{\alpha_i - 1}, \quad \mathbf{x} \in \Delta^{K-1}$$
    where $\Delta^{K-1}$ is the probability simplex, i.e., the set of $(x_1, \ldots, x_K)$ with $x_i \geq 0$ for all $i$ and $\sum_{i=1}^K x_i = 1$.
  • Exponential family form:
    $$p(\mathbf{x}|\eta) = h(\mathbf{x}) \exp\left( \eta^T T(\mathbf{x}) - A(\eta) \right)$$
    where
    • natural parameter: $\eta = \boldsymbol{\alpha} - 1$
    • sufficient statistic: $T(\mathbf{x}) = (\log x_1, \ldots, \log x_K)^T$
    • base measure: $h(\mathbf{x}) = 1$
    • log-partition function: $A(\eta) = \log B(\boldsymbol{\alpha}) = \sum_{i=1}^K \log \Gamma(\alpha_i) - \log \Gamma\left(\sum_{i=1}^K \alpha_i\right)$
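
A check for the Dirichlet case with the element-wise log sufficient statistic (a sketch assuming SciPy, with an arbitrary $\boldsymbol{\alpha}$ and an arbitrary point on the simplex):

```python
import numpy as np
from scipy.stats import dirichlet
from scipy.special import gammaln

alpha = np.array([2.0, 3.0, 4.0])
eta = alpha - 1.0                                 # natural parameters
A = gammaln(alpha).sum() - gammaln(alpha.sum())   # A = sum_i log Gamma(a_i) - log Gamma(sum_i a_i)

x = np.array([0.2, 0.3, 0.5])                     # a point on the simplex
T = np.log(x)                                     # sufficient statistic: element-wise log
canonical = np.exp(eta @ T - A)                   # h(x) = 1
assert np.isclose(canonical, dirichlet.pdf(x, alpha))
```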

Summary of Natural Parameters and Sufficient Statistics

To summarize:

| Distribution | natural parameter ($\eta$) | sufficient statistic ($T(x)$) |
| --- | --- | --- |
| Bernoulli | $\eta = \log\left(\frac{\theta}{1 - \theta}\right)$ | $T(x) = x$ |
| Binomial | $\eta = \log\left(\frac{\theta}{1 - \theta}\right)$ | $T(x) = x$ |
| Gaussian | $\eta = \left( \frac{\mu}{\sigma^2}, -\frac{1}{2\sigma^2} \right)^T$ | $T(x) = (x, x^2)^T$ |
| Exponential | $\eta = -\lambda$ | $T(x) = x$ |
| Gamma | $\eta = (\alpha - 1, -\beta)^T$ | $T(x) = (\log x, x)^T$ |
| Chi-squared | $\eta = \left(\frac{k}{2} - 1, -\frac{1}{2}\right)^T$ | $T(x) = (\log x, x)^T$ |
| Poisson | $\eta = \log \lambda$ | $T(x) = x$ |
| Beta | $\eta = (\alpha - 1, \beta - 1)^T$ | $T(x) = (\log x, \log(1 - x))^T$ |
| Dirichlet | $\eta = \boldsymbol{\alpha} - 1$ | $T(\mathbf{x}) = (\log x_1, \ldots, \log x_K)^T$ |

In this post, we looked at the concept of the exponential family and how common distributions fit into it. This pays off later: mixture models, conjugacy, estimation, and related topics become much easier to reason about once distributions are written in this form. Further insights can be drawn from the exponential family as well, and we will see how they apply when they come up in later posts. For now, the key takeaway is how various probability distributions can be converted into the canonical form of the exponential family and how their natural parameters and sufficient statistics can be read off from that form. The related meaning and intuition will be covered in future posts.
