Overview
So far, detection under
Neyman-Pearson criterion (maximize $P_D$ subject to $P_{FA}=$ constant) : likelihood ratio test, threshold set by $P_{FA}$
Minimize Bayesian risk (assign costs to decisions, assume priors on the hypotheses) : likelihood ratio test, threshold set by priors + costs
minimum probability of error = maximum a posteriori detection
maximum likelihood detection = minimum probability of error with equal priors
Known deterministic signals in Gaussian noise : correlators
Random signals : estimator-correlators, energy detectors
All assume knowledge of $p(x;\mathcal{H}_0)$ and $p(x;\mathcal{H}_1)$
What if we don't know the distribution of $x$ under the two hypotheses?
What if under hypothesis 0 the distribution lies in some set, and under hypothesis 1 it lies in another set: can we distinguish between the two?
Composite Hypothesis Testing
Signal and/or noise PDFs have unknown parameters, e.g., noise variance, exact carrier frequency, signal variance
Composite hypothesis test : must accommodate unknown parameters
cf. simple hypothesis test : the PDFs are completely known
Ex) DC level in WGN with unknown amplitude $A>0$:
$$\mathcal{H}_0 : x[n] = w[n] \quad \text{vs} \quad \mathcal{H}_1 : x[n] = A + w[n], \quad n = 0, 1, \dots, N-1$$
NP test: decide $\mathcal{H}_1$ if
$$\frac{p(x; A, \mathcal{H}_1)}{p(x; \mathcal{H}_0)} = \frac{\exp \left[ -\frac{1}{2\sigma^2} \sum_{n=0}^{N-1} (x[n] - A)^2 \right]}{\exp \left[ -\frac{1}{2\sigma^2} \sum_{n=0}^{N-1} x^2[n] \right]} > \gamma,$$
$$T(x) = \frac{1}{N} \sum_{n=0}^{N-1} x[n] > \frac{\sigma^2}{NA} \ln \gamma + \frac{A}{2} = \gamma'.$$
Can we implement this detector without knowledge of the exact value of $A$?
The test statistic does not depend on $A$, but at first glance the threshold $\gamma'$ appears to (in fact, it does not):
$$T(x) \sim \begin{cases} \mathcal{N}\left(0, \frac{\sigma^2}{N}\right) & \text{under } \mathcal{H}_0 \\ \mathcal{N}\left(A, \frac{\sigma^2}{N}\right) & \text{under } \mathcal{H}_1 \end{cases}$$
$$P_{FA} = \Pr(T(x) > \gamma'; \mathcal{H}_0) = Q\left(\frac{\gamma'}{\sqrt{\sigma^2 / N}}\right), \quad P_D = \Pr(T(x) > \gamma'; \mathcal{H}_1) = Q\left(\frac{\gamma' - A}{\sqrt{\sigma^2 / N}}\right)$$
$$\gamma' = \sqrt{\frac{\sigma^2}{N}}\, Q^{-1}(P_{FA}) : \text{independent of } A$$
$$P_D = Q\left(Q^{-1}(P_{FA}) - \sqrt{\frac{NA^2}{\sigma^2}}\right) : \text{depends on the value of } A$$
The test $\frac{1}{N} \sum_{n=0}^{N-1} x[n] > \sqrt{\frac{\sigma^2}{N}}\, Q^{-1}(P_{FA})$ yields the highest $P_D$ for any value of $A > 0$.
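As a quick numerical check, the sketch below simulates this detector and compares the empirical $P_{FA}$ and $P_D$ against the closed-form expressions above (Python with numpy/scipy; the values of $N$, $\sigma^2$, $A$, and $P_{FA}$ are arbitrary choices for illustration):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
N, sigma2, A, Pfa = 20, 1.0, 0.5, 0.01   # illustrative values
trials = 100_000

# Threshold from the closed form: gamma' = sqrt(sigma^2/N) * Q^{-1}(Pfa)
gamma = np.sqrt(sigma2 / N) * norm.isf(Pfa)     # norm.isf is Q^{-1}

# Sample-mean statistic under H0 (noise only) and H1 (DC level A + noise)
w = rng.normal(0.0, np.sqrt(sigma2), size=(trials, N))
T0 = w.mean(axis=1)
T1 = (A + w).mean(axis=1)

print("empirical Pfa:", np.mean(T0 > gamma))
print("empirical Pd :", np.mean(T1 > gamma))
# Theory: Pd = Q(Q^{-1}(Pfa) - sqrt(N A^2 / sigma^2))
print("theoretical Pd:", norm.sf(norm.isf(Pfa) - np.sqrt(N * A**2 / sigma2)))
```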
Uniformly Most Powerful (UMP) test
If $-\infty < A < \infty$, a different test is needed for positive and negative $A$
The hypothesis testing problem $\rightarrow$ parameter testing problem:
$$\mathcal{H}_0 : A = 0, \quad \mathcal{H}_1 : A > 0 \quad \text{(one-sided test → UMP test exists)}$$
$$\mathcal{H}_0 : A = 0, \quad \mathcal{H}_1 : A \neq 0 \quad \text{(two-sided test → UMP test does not exist)}$$
When a UMP test does not exist, we have to implement suboptimal tests
The optimal NP test, though unrealizable, provides an upper bound on the performance
Clairvoyant detector : a detector that assumes perfect knowledge of the unknown parameter in order to design the NP detector
Example of DC level in WGN with unknown amplitude $-\infty < A < \infty$
Clairvoyant detector : decide $\mathcal{H}_1$ if
$$\frac{1}{N} \sum_{n=0}^{N-1} x[n] > \gamma'_+ \quad \text{for } A > 0, \qquad \frac{1}{N} \sum_{n=0}^{N-1} x[n] < \gamma'_- \quad \text{for } A < 0.$$
Clearly unrealizable, but it provides an upper bound on performance.
Under $\mathcal{H}_0$, as $\bar x \sim \mathcal{N}\left(0, \frac{\sigma^2}{N}\right)$:
$$P_{FA} = \Pr\{\bar{x} > \gamma'_+; \mathcal{H}_0\} = Q\left(\frac{\gamma'_+}{\sqrt{\sigma^2/N}}\right) \quad \text{if } A > 0,$$
$$P_{FA} = \Pr\{\bar{x} < \gamma'_-; \mathcal{H}_0\} = 1 - Q\left(\frac{\gamma'_-}{\sqrt{\sigma^2/N}}\right) = Q\left(\frac{-\gamma'_-}{\sqrt{\sigma^2/N}}\right) \quad \text{if } A < 0.$$
For a constant $P_{FA}$, we should choose $\gamma'_- = -\gamma'_+$
Under $\mathcal{H}_1$, as $\bar x \sim \mathcal{N}\left(A, \frac{\sigma^2}{N}\right)$:
$$P_D = Q\left(\frac{\gamma'_+ - A}{\sqrt{\sigma^2/N}}\right) = Q\left(Q^{-1}(P_{FA}) - \sqrt{\frac{NA^2}{\sigma^2}}\right) \quad \text{if } A > 0,$$
$$P_D = 1 - Q\left(\frac{\gamma'_- - A}{\sqrt{\sigma^2/N}}\right) = Q\left(\frac{-\gamma'_- + A}{\sqrt{\sigma^2/N}}\right) = Q\left(Q^{-1}(P_{FA}) - \sqrt{\frac{NA^2}{\sigma^2}}\right) \quad \text{if } A < 0.$$
A candidate detector : decide $\mathcal{H}_1$ if $\left|\frac{1}{N}\sum_{n=0}^{N-1} x[n]\right| > \gamma''$:
$$P_D = Q\left(Q^{-1}\left(\frac{P_{FA}}{2}\right) - \sqrt{\frac{NA^2}{\sigma^2}}\right) + Q\left(Q^{-1}\left(\frac{P_{FA}}{2}\right) + \sqrt{\frac{NA^2}{\sigma^2}}\right).$$
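The two $P_D$ expressions make the price of not knowing the sign of $A$ easy to quantify; a minimal sketch that evaluates the clairvoyant bound against the realizable two-sided detector (parameter values are illustrative):

```python
import numpy as np
from scipy.stats import norm

N, sigma2, Pfa = 20, 1.0, 0.01
d = lambda A: np.sqrt(N * A**2 / sigma2)   # deflection sqrt(N A^2 / sigma^2)

for A in (0.2, 0.5, 1.0):
    Pd_clairvoyant = norm.sf(norm.isf(Pfa) - d(A))
    Pd_two_sided = (norm.sf(norm.isf(Pfa / 2) - d(A))
                    + norm.sf(norm.isf(Pfa / 2) + d(A)))
    print(f"A={A}: clairvoyant Pd={Pd_clairvoyant:.4f}, "
          f"two-sided Pd={Pd_two_sided:.4f}")
```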
Composite Hypothesis Testing Approaches
Two major approaches
Bayesian approach
: to consider the unknown parameters as realizations of random variables and to assign a prior PDF
Requires prior knowledge of the unknown parameters
Requires multidimensional integration
Generalized likelihood ratio test (GLRT)
: to estimate the unknown parameters for use in a likelihood ratio test
More popular due to its ease of implementation and less restrictive assumptions
Prior knowledge is not necessary
Bayesian approach
$$p(x; \mathcal{H}_0) = \int p(x | \theta_0; \mathcal{H}_0)\, p(\theta_0)\, d\theta_0, \qquad p(x; \mathcal{H}_1) = \int p(x | \theta_1; \mathcal{H}_1)\, p(\theta_1)\, d\theta_1$$
Decide $\mathcal{H}_1$ if
$$\frac{p(x; \mathcal{H}_1)}{p(x; \mathcal{H}_0)} = \frac{\int p(x | \theta_1; \mathcal{H}_1)\, p(\theta_1)\, d\theta_1}{\int p(x | \theta_0; \mathcal{H}_0)\, p(\theta_0)\, d\theta_0} > \gamma.$$
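The integration is rarely available in closed form. The sketch below illustrates the idea on the DC-level example, assuming (my choice, not from the lecture) a Gaussian prior $A \sim \mathcal{N}(0, \sigma_A^2)$ and a simple grid integration over $A$:

```python
import numpy as np

def bayesian_lr(x, sigma2=1.0, sigma2_A=1.0):
    """Integrated likelihood ratio for H1: x[n] = A + w[n], with an
    illustrative Gaussian prior A ~ N(0, sigma2_A)."""
    grid = np.linspace(-5, 5, 2001)                 # grid of A values
    # p(x | A): Gaussian log-likelihood evaluated on the grid of A
    loglik = -0.5 / sigma2 * ((x[:, None] - grid[None, :]) ** 2).sum(axis=0)
    prior = np.exp(-0.5 * grid**2 / sigma2_A) / np.sqrt(2 * np.pi * sigma2_A)
    # p(x; H1) by grid integration; common factor (2 pi sigma^2)^{-N/2}
    # is dropped from both numerator and denominator
    p1 = np.sum(np.exp(loglik) * prior) * (grid[1] - grid[0])
    p0 = np.exp(-0.5 / sigma2 * (x**2).sum())       # p(x; H0), same factor dropped
    return p1 / p0

rng = np.random.default_rng(1)
x = 0.8 + rng.normal(0, 1, 20)           # data actually drawn under H1 with A = 0.8
print("integrated LR:", bayesian_lr(x))  # compare against a threshold gamma
```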
Generalized Likelihood Ratio Test
The GLRT replaces the unknown parameters by their maximum likelihood estimators (MLEs)
There is no optimality associated with the GLRT, but it works well in practice
GLRT : Decide $\mathcal{H}_1$ if
$$L_G(x) = \frac{p(x; \hat{\theta}_1, \mathcal{H}_1)}{p(x; \hat{\theta}_0, \mathcal{H}_0)} > \gamma,$$
where $\hat{\theta}_i$ is the MLE of $\theta_i$ assuming $\mathcal{H}_i$ is true (i.e., $\hat{\theta}_i$ maximizes $p(x; \theta_i, \mathcal{H}_i)$).
Example of DC level in WGN with unknown amplitude - GLRT ($\theta_1 = A$)
$$\mathcal{H}_0 : A = 0, \quad \mathcal{H}_1 : A \neq 0$$
$$\hat{A} = \bar{x} \;\rightarrow\; L_G(x) = \frac{p(x; \hat{A}, \mathcal{H}_1)}{p(x; \mathcal{H}_0)} > \gamma,$$
$$L_G(x) = \frac{\exp\left[-\frac{1}{2\sigma^2} \sum_{n=0}^{N-1} (x[n] - \bar{x})^2\right]}{\exp\left[-\frac{1}{2\sigma^2} \sum_{n=0}^{N-1} x^2[n]\right]},$$
$$\ln L_G(x) = -\frac{1}{2\sigma^2} \left( \sum_{n=0}^{N-1} x^2[n] - 2\bar{x} \sum_{n=0}^{N-1} x[n] + N\bar{x}^2 - \sum_{n=0}^{N-1} x^2[n] \right) = -\frac{1}{2\sigma^2} \left( -2N\bar{x}^2 + N\bar{x}^2 \right) = \frac{N\bar{x}^2}{2\sigma^2}$$
$$\rightarrow \text{decide } \mathcal{H}_1 \text{ if } |\bar{x}| > \gamma'.$$
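A Monte Carlo sketch of this GLRT (illustrative values). It uses the equivalent statistic $2\ln L_G(x) = N\bar x^2/\sigma^2$, which under $\mathcal{H}_0$ is exactly $\chi^2_1$ here since $\bar x$ is Gaussian, so the threshold can be set from the chi-squared tail:

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(2)
N, sigma2, Pfa, trials = 20, 1.0, 0.05, 100_000

# Under H0, 2 ln L_G = N xbar^2 / sigma^2 ~ chi^2 with 1 degree of freedom
thresh = chi2.isf(Pfa, df=1)

x0 = rng.normal(0, np.sqrt(sigma2), size=(trials, N))   # H0 data
stat0 = N * x0.mean(axis=1) ** 2 / sigma2               # 2 ln L_G(x)
print("empirical Pfa:", np.mean(stat0 > thresh))        # should be close to 0.05
```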
Alternative form of GLRT
$$L_G(x) = \frac{\max_{\theta_1} p(x; \theta_1, \mathcal{H}_1)}{\max_{\theta_0} p(x; \theta_0, \mathcal{H}_0)}.$$
If the PDF under $\mathcal{H}_0$ is completely known,
$$L_G(x) = \frac{\max_{\theta_1} p(x; \theta_1, \mathcal{H}_1)}{p(x; \mathcal{H}_0)} = \max_{\theta_1} \frac{p(x; \theta_1, \mathcal{H}_1)}{p(x; \mathcal{H}_0)} = \max_{\theta_1} L(x; \theta_1).$$
Example of DC level in WGN with unknown amplitude and variance - GLRT
$$\mathcal{H}_0 : A = 0, \; \sigma^2 > 0, \qquad \mathcal{H}_1 : A \neq 0, \; \sigma^2 > 0, \qquad \sigma^2 : \text{nuisance parameter}$$
(not of immediate interest, but must be accounted for in the analysis of the parameters of interest)
GLRT : decide $\mathcal{H}_1$ if
$$L_G(\mathbf{x}) = \frac{p(\mathbf{x}; \hat{A}, \hat{\sigma}_1^2, \mathcal{H}_1)}{p(\mathbf{x}; \hat{\sigma}_0^2, \mathcal{H}_0)} > \gamma,$$
$$\hat{A} = \bar{x}, \quad \hat{\sigma}_0^2 = \frac{1}{N} \sum_{n=0}^{N-1} x^2[n], \quad \hat{\sigma}_1^2 = \frac{1}{N} \sum_{n=0}^{N-1} \left( x[n] - \bar{x} \right)^2.$$
$$p(\mathbf{x}; \hat{A}, \hat{\sigma}_1^2, \mathcal{H}_1) = \frac{1}{(2\pi \hat{\sigma}_1^2)^{N/2}} \exp\left[ -\frac{1}{2\hat{\sigma}_1^2} \sum_{n=0}^{N-1} \left( x[n] - \hat{A} \right)^2 \right] = \frac{1}{(2\pi \hat{\sigma}_1^2)^{N/2}} \exp\left[ -\frac{N}{2} \right],$$
$$p(\mathbf{x}; \hat{\sigma}_0^2, \mathcal{H}_0) = \frac{1}{(2\pi \hat{\sigma}_0^2)^{N/2}} \exp\left[ -\frac{N}{2} \right],$$
$$2 \ln L_G(\mathbf{x}) = N \ln \frac{\hat{\sigma}_0^2}{\hat{\sigma}_1^2}.$$
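A sketch of this statistic (illustrative values); setting the threshold from the asymptotic $\chi^2_1$ distribution of $2\ln L_G$ under $\mathcal{H}_0$ is a common choice, though the lecture does not prescribe it:

```python
import numpy as np
from scipy.stats import chi2

def glrt_unknown_variance(x):
    """2 ln L_G = N ln(sigma0_hat^2 / sigma1_hat^2) for a DC level in WGN
    with unknown noise variance (nuisance parameter)."""
    N = len(x)
    s0 = np.mean(x**2)                  # MLE of variance under H0 (A = 0)
    s1 = np.mean((x - x.mean())**2)     # MLE of variance under H1
    return N * np.log(s0 / s1)

rng = np.random.default_rng(3)
x = 0.7 + rng.normal(0, 1.5, 50)        # H1 data: A = 0.7, sigma = 1.5 (both unknown)
print("2 ln L_G =", glrt_unknown_variance(x), "threshold =", chi2.isf(0.05, df=1))
```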
Locally Most Powerful Detectors
For two-sided tests, a UMP test does not exist; for one-sided tests, a UMP test may not exist. Consider a one-sided test without any nuisance parameters:
$$\mathcal{H}_0 : \theta = \theta_0, \quad \mathcal{H}_1 : \theta > \theta_0$$
If we wish to test for values of $\theta$ that are near $\theta_0$, then a locally most powerful (LMP) test exists.
The LMP test does not guarantee optimality if $|\theta - \theta_0|$ is large.
NP test : decide $\mathcal{H}_1$ if
$$\frac{p(\mathbf{x}; \theta)}{p(\mathbf{x}; \theta_0)} > \gamma \quad \rightarrow \quad \ln p(\mathbf{x}; \theta) - \ln p(\mathbf{x}; \theta_0) > \ln \gamma$$
$$\ln p(\mathbf{x}; \theta) \approx \ln p(\mathbf{x}; \theta_0) + \left. \frac{\partial \ln p(\mathbf{x}; \theta)}{\partial \theta} \right|_{\theta = \theta_0} (\theta - \theta_0)$$
$$\rightarrow \quad \left. \frac{\partial \ln p(\mathbf{x}; \theta)}{\partial \theta} \right|_{\theta = \theta_0} > \frac{\ln \gamma}{\theta - \theta_0} = \gamma',$$
$$T_\text{LMP}(\mathbf{x}) = \frac{\left. \frac{\partial \ln p(\mathbf{x}; \theta)}{\partial \theta} \right|_{\theta = \theta_0}}{\sqrt{I(\theta_0)}} : \text{scaled statistic}.$$
Example of correlation testing
2-D IID Gaussian vectors $\{\mathbf{x}[0], \mathbf{x}[1], \dots, \mathbf{x}[N-1]\}$, $\mathbf{x}[n] = \begin{bmatrix} x_1[n] & x_2[n] \end{bmatrix}^T$:
$$\mathbf{x}[n] \sim \mathcal{N}(\mathbf{0}, \mathbf{C}), \quad \mathbf{C} = \sigma^2 \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix}, \quad \mathbf{C}^{-1} = \sigma^{-2} \begin{bmatrix} \frac{1}{1-\rho^2} & -\frac{\rho}{1-\rho^2} \\ -\frac{\rho}{1-\rho^2} & \frac{1}{1-\rho^2} \end{bmatrix}$$
$$\mathcal{H}_0 : \rho = 0, \quad \mathcal{H}_1 : \rho > 0$$
$$p(\mathbf{x}; \rho) = \prod_{n=0}^{N-1} \frac{1}{2\pi \det^{1/2}(\mathbf{C})} \exp\left( -\frac{1}{2} \mathbf{x}[n]^T \mathbf{C}^{-1} \mathbf{x}[n] \right)$$
$$\ln p(\mathbf{x}; \rho) = -N \ln 2\pi - \frac{N}{2} \ln \sigma^4 (1-\rho^2) - \frac{1}{2\sigma^2} \sum_{n=0}^{N-1} \mathbf{x}[n]^T \mathbf{C}_0^{-1} \mathbf{x}[n], \quad \mathbf{C}_0 = \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix}$$
$$\frac{\partial \ln p(\mathbf{x}; \rho)}{\partial \rho} = \frac{N\rho}{1-\rho^2} - \frac{1}{2\sigma^2} \sum_{n=0}^{N-1} \mathbf{x}[n]^T \frac{\partial \mathbf{C}_0^{-1}}{\partial \rho} \mathbf{x}[n]$$
$$\frac{\partial \mathbf{C}_0^{-1}}{\partial \rho} = \begin{bmatrix} \frac{2\rho}{(1-\rho^2)^2} & -\frac{1+\rho^2}{(1-\rho^2)^2} \\ -\frac{1+\rho^2}{(1-\rho^2)^2} & \frac{2\rho}{(1-\rho^2)^2} \end{bmatrix}$$
$$\left. \frac{\partial \ln p(\mathbf{x}; \rho)}{\partial \rho} \right|_{\rho=0} = -\frac{1}{2\sigma^2} \sum_{n=0}^{N-1} \mathbf{x}^T[n] \begin{bmatrix} 0 & -1 \\ -1 & 0 \end{bmatrix} \mathbf{x}[n] = \frac{\sum_{n=0}^{N-1} x_1[n] x_2[n]}{\sigma^2}$$
$$I(\rho) = \frac{N(1+\rho^2)}{(1-\rho^2)^2} \implies I(0) = N$$
$$T_{LMP}(\mathbf{x}) = \frac{\sum_{n=0}^{N-1} x_1[n] x_2[n]}{\sqrt{N}\,\sigma^2} = \sqrt{N}\hat{\rho} > \gamma'$$
$$\hat{\rho} = \frac{1}{N} \sum_{n=0}^{N-1} \frac{x_1[n] x_2[n]}{\sigma^2} \quad \text{is an estimate of } \rho, \text{ although it is not the MLE.}$$
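A Monte Carlo sketch of the LMP correlation test (illustrative values); under $\mathcal{H}_0$, $T_{LMP}$ is approximately $\mathcal{N}(0,1)$ for large $N$, so a Gaussian threshold is used:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
N, sigma2, rho, Pfa, trials = 100, 1.0, 0.3, 0.01, 50_000
thresh = norm.isf(Pfa)   # T_LMP is approximately N(0,1) under H0 for large N

C = sigma2 * np.array([[1, rho], [rho, 1]])
L = np.linalg.cholesky(C)                    # to draw correlated pairs under H1

def t_lmp(x):  # x has shape (trials, N, 2)
    return (x[..., 0] * x[..., 1]).sum(axis=-1) / (np.sqrt(N) * sigma2)

x0 = rng.normal(0, np.sqrt(sigma2), size=(trials, N, 2))   # H0: rho = 0
x1 = rng.normal(0, 1, size=(trials, N, 2)) @ L.T           # H1: rho > 0
print("empirical Pfa:", np.mean(t_lmp(x0) > thresh))
print("empirical Pd :", np.mean(t_lmp(x1) > thresh))
```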
Multiple Hypothesis Testing
Without unknown parameters, the optimal Bayesian approach with the minimum probability of error criterion and equally likely hypotheses leads to the maximum likelihood rule
Choose the hypothesis for which $p(\mathbf{x}|\mathcal{H}_i)$ is maximum
How about the case with unknown parameters?
Bayesian approach
$p(\mathbf{x}|\mathcal{H}_i) = \int p(\mathbf{x}|\theta_i, \mathcal{H}_i)\, p(\theta_i)\, d\theta_i$ : the ML rule can be implemented
Still not so popular due to the difficulty of performing the integration
How about GLRT? Can it be extended to multiple hypothesis test?
GLRT for multiple hypothesis test : not possible
Example : detecting a signal that is modeled as a DC level or a line in WGN
$$\mathcal{H}_0 : x[n] = w[n], \quad \mathcal{H}_1 : x[n] = A + w[n], \quad \mathcal{H}_2 : x[n] = A + Bn + w[n]$$
The unknown parameters for the PDFs conditioned on $\mathcal{H}_0$ and $\mathcal{H}_1$ are a subset of those for the PDF conditioned on $\mathcal{H}_2$:
$$\theta_0 = \sigma^2, \quad \theta_1 = \begin{bmatrix} \sigma^2 \\ A \end{bmatrix} = \begin{bmatrix} \theta_0 \\ \theta_A \end{bmatrix}, \quad \theta_2 = \begin{bmatrix} \sigma^2 \\ A \\ B \end{bmatrix} = \begin{bmatrix} \theta_1 \\ \theta_B \end{bmatrix}$$
The parameter spaces are nested
If we decide $\mathcal{H}_k$ when $\max_{\theta_i} p(\mathbf{x}; \theta_i|\mathcal{H}_i)$ is maximum for $i = k$, we always choose $\mathcal{H}_2$ because of the nesting
Alternative approaches
Include a term that penalizes the number of parameters
Generalized ML rule : decide $\mathcal{H}_k$ if
$$\xi_i = \ln p\left(\mathbf{x}; \hat{\boldsymbol{\theta}}_i \mid \mathcal{H}_i \right) - \frac{1}{2} \ln \det \left( \mathbf{I}(\hat{\boldsymbol{\theta}}_i) \right) \text{ is maximized for } i = k$$
$\det(\mathbf{I}(\hat\theta_i))$ increases with the number of parameters
Minimum description length (MDL) : choose the hypothesis that minimizes
$$\text{MDL}(i) = -\ln p(\mathbf{x}; \hat{\theta}_i|\mathcal{H}_i) + \frac{n_i}{2}\ln N,$$
where $n_i$ is the number of estimated parameters.
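A sketch of the MDL rule applied to the DC-level / line example above, assuming WGN with unknown variance so that the maximized log-likelihood reduces to $-\frac{N}{2}(\ln 2\pi\hat\sigma_i^2 + 1)$ (model setup as in the example; the data-generation values are illustrative):

```python
import numpy as np

def mdl_select(x):
    """MDL(i) = -ln p(x; theta_hat_i | H_i) + (n_i / 2) ln N for the three models."""
    N = len(x)
    n = np.arange(N)
    designs = {0: np.empty((N, 0)),                  # H0: x[n] = w[n]
               1: np.ones((N, 1)),                   # H1: x[n] = A + w[n]
               2: np.column_stack([np.ones(N), n])}  # H2: x[n] = A + B n + w[n]
    scores = {}
    for i, H in designs.items():
        resid = x - H @ np.linalg.lstsq(H, x, rcond=None)[0] if H.size else x
        s2 = np.mean(resid**2)                        # MLE of the noise variance
        loglik = -N / 2 * (np.log(2 * np.pi * s2) + 1)
        n_i = H.shape[1] + 1                          # fitted coefficients + sigma^2
        scores[i] = -loglik + n_i / 2 * np.log(N)
    return min(scores, key=scores.get), scores

rng = np.random.default_rng(5)
x = 1.0 + 0.05 * np.arange(100) + rng.normal(0, 1, 100)   # data drawn under H2
print(mdl_select(x))   # should typically pick hypothesis 2
```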
All content is based on the lectures of Prof. Eui-Seok Hwang at GIST (Detection and Estimation).