[Theory of Statistics] 2. Decision Theory

woongineer · March 15, 2024

2.1 Elements of Decision Theory

Observation model

Model: $(\mathcal{X}, \mathcal{Y}, (\mathbf{P}_\theta)_{\theta \in \Theta})$
$\quad\quad$ $\rightarrow$ $\mathcal{X}$: sample space, $\quad \mathcal{Y}$: a $\sigma$-field on $\mathcal{X}$, $\quad (\mathbf{P}_\theta)_{\theta \in \Theta}$: a family of probability measures on $\mathcal{Y}$.
Observation: $\mathbf{X} \sim \mathbf{P}_\theta$
$\mathbf{X}=(X_1,\ldots,X_n)$ with $X_1,\ldots,X_n$ i.i.d. with density $f_\theta$, so $\mathbf{P}_\theta$ has joint density $\prod_{i=1}^n f_\theta(x_i)$.

Action space

$\mathcal{A}$: the space of all actions that can be taken.

Example
(i) Estimation
$\quad\mathbf{X} \sim \mathbf{P}_\theta$; the goal is to estimate $\theta$.
$\quad\mathcal{A}=\Theta$

(ii) Testing
$\quad H_0:\theta \in \Theta_0\quad$ vs. $\quad H_1:\theta \in \Theta_1$
$\quad \mathcal{A}=\{H_0,H_1\}$

(iii) Ranking
$\quad$ We want to rank three mobile phone companies $a$, $b$, $c$.
$\quad \mathcal{A}=\{(a,b,c), (a,c,b), \cdots\}$
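For three companies the action space is the set of all orderings, i.e. the $3! = 6$ permutations. A minimal Python sketch that enumerates it, using the labels `a`, `b`, `c` from the example as hypothetical company names:

```python
from itertools import permutations

# Hypothetical labels for the three companies being ranked.
companies = ("a", "b", "c")

# Action space A: every possible ranking (best to worst) of the companies.
action_space = list(permutations(companies))

print(len(action_space))   # 6
print(action_space[:2])    # [('a', 'b', 'c'), ('a', 'c', 'b')]
```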

Loss function

$L:\Theta \times \mathcal{A}\rightarrow\mathbf{R}\quad$ (the loss incurred by taking an action when a given parameter value is the truth)
$\quad (\theta, a)\longmapsto L(\theta, a)$
$\quad L(\theta, a)$: the amount of loss incurred when $\theta$ is the true parameter and action $a$ is taken.

Example. Estimation
$L(\theta, a)=(\theta-a)^2$: squared error loss
$L(\theta, a)=|\theta-a|$: absolute error loss
$L(\theta, a)=\|\theta-a\|^2$: for $\theta, a\in\mathbf{R}^k$ (multivariate)
$L(f, a)=\int(f(x)-a(x))^2dP(x)$: when $f$ and $a$ are functions
$L(\Sigma, A)=\|\Sigma-A\|_F^2\quad$ (covariance matrix estimation; Frobenius norm)
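The losses above translate directly into code. A minimal sketch in plain Python (no external libraries), assuming vectors and matrices are given as lists; the function-valued loss is approximated here by a Monte Carlo average over points `xs` assumed to be drawn from $P$, which is an illustrative choice rather than part of the definition:

```python
def squared_loss(theta, a):
    """Squared error loss (theta - a)^2 for scalars."""
    return (theta - a) ** 2


def absolute_loss(theta, a):
    """Absolute error loss |theta - a| for scalars."""
    return abs(theta - a)


def multivariate_squared_loss(theta, a):
    """||theta - a||^2 for theta, a in R^k given as equal-length sequences."""
    if len(theta) != len(a):
        raise ValueError("theta and a must have the same dimension k")
    return sum((t - b) ** 2 for t, b in zip(theta, a))


def function_loss(f, a, xs):
    """Monte Carlo approximation of int (f(x) - a(x))^2 dP(x), assuming xs are draws from P."""
    return sum((f(x) - a(x)) ** 2 for x in xs) / len(xs)


def frobenius_loss(sigma, a):
    """||Sigma - A||_F^2 for matrices given as lists of rows."""
    return sum((s - x) ** 2 for row_s, row_a in zip(sigma, a) for s, x in zip(row_s, row_a))


print(squared_loss(3.0, 1.0))                              # 4.0
print(absolute_loss(3.0, 1.0))                             # 2.0
print(multivariate_squared_loss([1.0, 2.0], [0.0, 0.0]))   # 5.0
print(frobenius_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]]))  # 2.0
```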

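Example. Hypothesis testing
$H_0:\theta\in\Theta_0\quad$ vs. $\quad H_1:\theta\in\Theta_1$
$\mathcal{A}=\{H_0, H_1\}$, or equivalently $\{0, 1\}$
Identifying each action with its parameter set ($H_0 \leftrightarrow \Theta_0$, $H_1 \leftrightarrow \Theta_1$), the 0-1 loss is
$L(\theta, a) = \begin{cases} 0 &,\; \theta \in a \quad \text{(correct decision)} \\ 1 &,\; \theta \notin a \quad \text{(wrong decision)} \end{cases}$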
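The 0-1 loss can be coded directly. A minimal sketch, assuming (purely for illustration) the hypothetical split $\Theta_0 = (-\infty, 0]$ and $\Theta_1 = (0, \infty)$, with actions encoded as 0 for $H_0$ and 1 for $H_1$:

```python
def in_theta0(theta):
    # Hypothetical null region Theta_0 = (-inf, 0]; Theta_1 is its complement.
    return theta <= 0


def zero_one_loss(theta, a):
    """0-1 loss: 0 if the chosen hypothesis (a=0 -> H0, a=1 -> H1) contains theta, else 1."""
    true_hypothesis = 0 if in_theta0(theta) else 1
    return 0 if a == true_hypothesis else 1


print(zero_one_loss(-1.0, 1))  # 1: H0 is true but H1 was chosen (type I error)
print(zero_one_loss(2.0, 0))   # 1: H1 is true but H0 was chosen (type II error)
```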

Decision rule

$\delta:\mathcal{X}\rightarrow\mathcal{A}$
$\quad\; x \longmapsto \delta(x)$
$\rightarrow$ a function that specifies, for each observed $x$, which action to take.

Example.
(i) Estimation
An estimator is a decision rule, e.g.
$\delta(x)=\bar{x}$
$\delta(x)=\frac{1}{n}\sum_{i=1}^n(x_i-\bar{x})^2$

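(ii) Testing
For example, with $X_i \sim N(\mu,\sigma^2)$, $\sigma$ known, and $H_0:\mu\le\mu_0$ vs. $H_1:\mu>\mu_0$:
$\delta(x) = \begin{cases} H_1 &,\; \frac{\sqrt{n}}{\sigma}(\bar{x}-\mu_0)>z_\alpha \\ H_0 &,\; \frac{\sqrt{n}}{\sigma}(\bar{x}-\mu_0)\leq z_\alpha \end{cases}$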
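As a sketch, the one-sided z-test rule can be written as a function from the data to an action. It assumes normal data with known $\sigma$ and uses `statistics.NormalDist` from the standard library for $z_\alpha$; the data values in the usage lines are made up:

```python
from math import sqrt
from statistics import NormalDist, fmean


def z_test_rule(x, mu0, sigma, alpha=0.05):
    """Decision rule delta(x) for H0: mu <= mu0 vs H1: mu > mu0, sigma known.

    Returns "H1" (reject H0) when sqrt(n) * (xbar - mu0) / sigma > z_alpha, else "H0".
    """
    n = len(x)
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # upper-alpha quantile of N(0, 1)
    z = sqrt(n) * (fmean(x) - mu0) / sigma
    return "H1" if z > z_alpha else "H0"


print(z_test_rule([1.2, 0.8, 1.5, 1.1], mu0=0.0, sigma=1.0))   # H1 (z = 2.3)
print(z_test_rule([0.1, -0.2, 0.3, 0.0], mu0=0.0, sigma=1.0))  # H0 (z = 0.1)
```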

Randomized decision rule

$\delta:\mathcal{X}\rightarrow \mathcal{P}(\mathcal{A})$; $\; \mathcal{P}(\mathcal{A})$: the set of probability measures on $\mathcal{A}$
$\quad\;\; x \longmapsto \delta(x,\cdot)$; $\;$ a probability measure on $\mathcal{A}$
Once $x$ is observed, draw $a\sim\delta(x, \cdot)$ and take action $a$.
Equivalently, $\delta$ is a kernel $\delta: \mathcal{X}\times(\sigma\text{-field on } \mathcal{A}) \rightarrow [0,1]$
$\quad\quad\quad\quad\quad\quad\quad(x,A)\longmapsto\delta(x,A)\in[0,1]$

Loss of a randomized rule
$L(\theta,\delta(x,\cdot))=\int_{\mathcal{A}} L(\theta, a)\,\delta(x,da)$
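For a finite action space the integral is a weighted sum, and it equals the long-run average loss of actually drawing $a\sim\delta(x,\cdot)$. A minimal sketch, assuming a hypothetical $\delta(x,\cdot)$ that chooses $H_1$ with probability 0.7 and summarizing $\theta$ by which hypothesis it satisfies (0 or 1):

```python
import random


def randomized_loss(theta, delta_x, loss):
    """L(theta, delta(x, .)) = sum over a of L(theta, a) * delta(x, {a}), finite action space."""
    return sum(prob * loss(theta, a) for a, prob in delta_x.items())


def sample_action(delta_x, rng):
    """Draw one action a ~ delta(x, .)."""
    actions = list(delta_x)
    weights = [delta_x[a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]


def zero_one_loss(true_hypothesis, a):
    # Hypothetical encoding: theta is summarized by the hypothesis (0 or 1) it satisfies.
    return 0 if a == true_hypothesis else 1


# Hypothetical delta(x, .): after some x, choose H1 (=1) w.p. 0.7 and H0 (=0) w.p. 0.3.
delta_x = {0: 0.3, 1: 0.7}
exact = randomized_loss(0, delta_x, zero_one_loss)  # H0 true: expected loss = P(choose H1)

rng = random.Random(0)
draws = [zero_one_loss(0, sample_action(delta_x, rng)) for _ in range(100_000)]
mc = sum(draws) / len(draws)
print(exact, mc)  # the Monte Carlo average should be close to 0.7
```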

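Risk function

$\quad\quad$ cf. The loss of a randomized rule averages over the action $a$; the risk instead averages the loss of a decision rule over the data $X$.

The expected loss of a decision rule when $\theta$ is the true parameter:
$\begin{aligned}R(\theta,\delta)&=\int L(\theta, \delta(x))\,\mathbf{P}_\theta(dx)\\ &=\mathbf{E}_\theta[ L(\theta,\delta(X))] \end{aligned}$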
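Since the risk is an expectation over $X\sim\mathbf{P}_\theta$, it can be approximated by simulating many data sets and averaging the loss. A minimal sketch, assuming a hypothetical setting $X_1,\ldots,X_{10}$ i.i.d. $N(\theta, 2^2)$ with $\delta(x)=\bar{x}$ and squared error loss, where the exact risk is $\sigma^2/n = 0.4$:

```python
import random
from statistics import fmean


def monte_carlo_risk(theta, delta, loss, sample, n_rep=20_000, seed=0):
    """Approximate R(theta, delta) = E_theta[L(theta, delta(X))] by averaging
    the loss over n_rep data sets X simulated from P_theta."""
    rng = random.Random(seed)
    return fmean(loss(theta, delta(sample(theta, rng))) for _ in range(n_rep))


# Hypothetical setting: X_1, ..., X_10 i.i.d. N(theta, 2^2), delta(x) = xbar, squared error loss.
sigma, n = 2.0, 10


def sample_normal(theta, rng):
    return [rng.gauss(theta, sigma) for _ in range(n)]


risk = monte_carlo_risk(1.0, fmean, lambda t, a: (t - a) ** 2, sample_normal)
print(risk, sigma ** 2 / n)  # simulated risk vs exact risk sigma^2 / n
```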

Example. Estimation
$\nu(\theta)$: the parameter to estimate
$\delta(x)=\hat{\nu}(x)$: an estimator of $\nu$
$L(\theta,a)=(\nu(\theta)-a)^2$
$\begin{aligned} R(\theta,\delta)=\mathbf{E}_\theta[\nu(\theta)-\delta(X)]^2&=\mathbf{E}_\theta[\nu(\theta)-\mathbf{E}_\theta\delta(X)+\mathbf{E}_\theta\delta(X)-\delta(X)]^2\\ &=\{\nu(\theta)-\mathbf{E}_\theta\delta(X)\}^2+\mathbf{E}_\theta[\delta(X)-\mathbf{E}_\theta\delta(X)]^2\\ &=Var_\theta\,\delta(X)+Bias_\theta^2(\delta)\\ &= MSE(\theta,\delta) \end{aligned}$
The cross term vanishes because $\nu(\theta)-\mathbf{E}_\theta\delta(X)$ is a constant and $\mathbf{E}_\theta[\mathbf{E}_\theta\delta(X)-\delta(X)]=0$.
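The same decomposition holds exactly for the empirical distribution of simulated estimates, which gives a quick numerical check. A minimal sketch, assuming a hypothetical setting $\nu(\theta)=\sigma^2=4$, $n=5$ normal observations, and the biased estimator $\delta(x)=\frac{1}{n}\sum(x_i-\bar{x})^2$ from the earlier example (computed with `statistics.pvariance`), whose mean is $\frac{n-1}{n}\sigma^2 = 3.2$:

```python
import random
from statistics import fmean, pvariance


def mse_decomposition(estimates, nu):
    """Return (MSE, variance, squared bias) of simulated estimates of the target nu."""
    mse = fmean([(d - nu) ** 2 for d in estimates])
    var = pvariance(estimates)                # (1/R) * sum (d - mean(d))^2
    bias_sq = (fmean(estimates) - nu) ** 2
    return mse, var, bias_sq


# Hypothetical setting: X_1..X_5 i.i.d. N(0, sigma^2 = 4), delta(x) = (1/n) sum (x_i - xbar)^2.
rng = random.Random(1)
sigma2, n, n_rep = 4.0, 5, 20_000
estimates = [pvariance([rng.gauss(0.0, sigma2 ** 0.5) for _ in range(n)]) for _ in range(n_rep)]

mse, var, bias_sq = mse_decomposition(estimates, sigma2)
print(mse, var + bias_sq)                      # identical up to rounding
print(fmean(estimates), (n - 1) / n * sigma2)  # simulated vs exact E[delta]
```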

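Example. Comparing two estimators
$\mu:$ mean household income in Gwanak-gu (the quantity to estimate)
$\mu_0:$ the known national mean household income of South Korea

$X_1,\cdots,X_n\sim N(\mu, \sigma^2):$ incomes of sampled Gwanak-gu residents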
$\begin{cases} \delta_1=0.2\mu_0+0.8\bar{X}&\\ \delta_2=\bar{X}& \end{cases}$

$\begin{cases} MSE_\mu(\delta_1)=R(\mu,\delta_1)=0.64\frac{\sigma^2}{n}+0.04(\mu_0-\mu)^2&\\ MSE_\mu(\delta_2)=Var_\mu(\bar{X})=\frac{\sigma^2}{n}=R(\mu,\delta_2)& \end{cases}$

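Neither $\delta_1$ nor $\delta_2$ is uniformly better, because we are comparing functions $R(\mu,\delta)$ of $\mu$ rather than numbers: $R(\mu,\delta_1)<R(\mu,\delta_2)$ exactly when $0.04(\mu_0-\mu)^2<0.36\frac{\sigma^2}{n}$, i.e. $|\mu-\mu_0|<3\sigma/\sqrt{n}$, and $\delta_2$ is better otherwise.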
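The crossing of the two risk curves can be checked numerically from the closed-form risks above. A minimal sketch with hypothetical values $\mu_0=0$, $\sigma=1$, $n=25$, so the curves meet at $|\mu-\mu_0| = 3\sigma/\sqrt{n} = 0.6$:

```python
from math import sqrt


def risk_delta1(mu, mu0, sigma, n):
    """R(mu, delta_1) for delta_1 = 0.2*mu0 + 0.8*Xbar under squared error loss."""
    return 0.64 * sigma ** 2 / n + 0.04 * (mu0 - mu) ** 2


def risk_delta2(mu, sigma, n):
    """R(mu, delta_2) for delta_2 = Xbar; it does not depend on mu."""
    return sigma ** 2 / n


mu0, sigma, n = 0.0, 1.0, 25      # hypothetical values
crossover = 3 * sigma / sqrt(n)   # the two risk curves meet at |mu - mu0| = 3 sigma / sqrt(n)
for mu in (0.0, 0.3, 0.9, 1.5):
    better = "delta_1" if risk_delta1(mu, mu0, sigma, n) < risk_delta2(mu, sigma, n) else "delta_2"
    print(mu, better)
```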

Method 1) Reduce $R(\theta,\delta)$ to a single number: the Bayes approach, the minimax approach
Method 2) Restrict the class of rules $\delta$ being compared: UMVUE, invariant estimators

Example. Testing
Loss function (0-1 loss)

Action \ Truth	H0	H1
H0	0	Type II error (1)
H1	Type I error (1)	0

Test function

A randomized decision rule
$\delta:\mathcal{X}\rightarrow [0,1]$
$\quad\;\; x \longmapsto \delta(x)=$ the probability of rejecting $H_0$ after observing $x$.
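Randomization matters mostly for discrete data, where a nonrandomized test often cannot attain a given size $\alpha$ exactly. A minimal sketch, assuming a hypothetical binomial model $X\sim \mathrm{Bin}(n,p)$ with $H_0: p=p_0$ vs. $H_1: p>p_0$, builds a test function $\phi(x)=1$ if $x>c$, $\gamma$ if $x=c$, $0$ if $x<c$, choosing $c$ and $\gamma$ so that $\mathbf{E}_{p_0}\phi(X)=\alpha$:

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Bin(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def randomized_test(n, p0, alpha):
    """Test function phi for H0: p = p0 vs H1: p > p0, X ~ Bin(n, p), with size exactly alpha.

    phi(x) = 1 if x > c, gamma if x == c, 0 if x < c, where phi(x) is the
    probability of rejecting H0 after observing x.
    """
    tail = 0.0  # P_{p0}(X > c) for the current candidate c
    for c in range(n, -1, -1):
        pc = binom_pmf(c, n, p0)
        if tail + pc > alpha:
            gamma = (alpha - tail) / pc  # randomize at the boundary so the size hits alpha

            def phi(x, c=c, gamma=gamma):
                if x > c:
                    return 1.0
                return gamma if x == c else 0.0

            return phi
        tail += pc
    return lambda x: 1.0  # alpha >= 1: always reject


phi = randomized_test(10, 0.5, 0.05)
size = sum(phi(x) * binom_pmf(x, 10, 0.5) for x in range(11))
print(phi(9), phi(8), phi(7), size)  # reject if X >= 9, reject w.p. about 0.893 if X = 8
```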

Rejection region

$\delta(x)=I(x\in C)$, $\quad C:$ the rejection region

Risk function

Let $\delta(x)=I(x\in C)$. Under the 0-1 loss,

$\begin{aligned} R(\theta,\delta) &= \mathbf{E}_{\theta} L(\theta,\delta(X)) \\ &= \begin{cases} \mathbf{E}_\theta\delta(X), & \text{if } \theta \in \Theta_0 \\ \mathbf{E}_\theta(1-\delta(X)), & \text{if } \theta \in \Theta_1 \end{cases}\\ &= \begin{cases} P_{\theta}(X \in C), & \text{if } \theta \in \Theta_0 \\ P_{\theta}(X \notin C), & \text{if } \theta \in \Theta_1 \end{cases}\\ &= \begin{cases} \text{probability of choosing } H_1 \text{ when } H_0 \text{ is true}, & \text{if } \theta \in \Theta_0 \quad; \text{ type I error probability}\\ \text{probability of choosing } H_0 \text{ when } H_1 \text{ is true}, & \text{if } \theta \in \Theta_1\quad; \text{ type II error probability} \end{cases} \end{aligned}$
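For a concrete rejection region the risk function can be computed in closed form. A minimal sketch, assuming the one-sided z-test setting ($X_i\sim N(\mu,\sigma^2)$, $\sigma$ known, $H_0:\mu\le\mu_0$ vs. $H_1:\mu>\mu_0$, $C=\{\sqrt{n}(\bar{x}-\mu_0)/\sigma>z_\alpha\}$), where $P_\mu(X\in C)=1-\Phi\big(z_\alpha-\sqrt{n}(\mu-\mu_0)/\sigma\big)$:

```python
from math import sqrt
from statistics import NormalDist


def z_test_risk(mu, mu0, sigma, n, alpha=0.05):
    """Risk of delta(x) = I(x in C), C = {sqrt(n)(xbar - mu0)/sigma > z_alpha}, under 0-1 loss.

    Assumed hypotheses: H0: mu <= mu0 (Theta_0) vs H1: mu > mu0 (Theta_1).
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha)
    reject_prob = 1 - std_normal.cdf(z_alpha - sqrt(n) * (mu - mu0) / sigma)  # P_mu(X in C)
    if mu <= mu0:
        return reject_prob    # type I error probability
    return 1 - reject_prob    # type II error probability


for mu in (-0.5, 0.0, 0.2, 0.5, 1.0):
    print(mu, round(z_test_risk(mu, 0.0, 1.0, 25), 4))
```

The risk equals $\alpha$ at the boundary $\mu=\mu_0$, is below $\alpha$ elsewhere in $\Theta_0$, and the type II error probability shrinks as $\mu$ moves away from $\mu_0$ into $\Theta_1$.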

2.2 Comparing Decision Rules

$\delta$ is better than $\delta'$ ($\delta$ improves on $\delta'$)

$\begin{aligned} \iff &(i)\; R(\theta,\delta) \leq R(\theta,\delta'), \; \forall\theta \in \Theta\\ &(ii)\; R(\theta,\delta)<R(\theta,\delta') \text{ for some } \theta\in\Theta. \end{aligned}$
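Conditions (i) and (ii) can be checked approximately on a finite grid of parameter values. A minimal sketch in the normal-mean setting with hypothetical $\mu_0=0$, $\sigma=1$, $n=25$; the rule $\delta'=X_1$ (use only the first observation, risk $\sigma^2$) is a hypothetical comparison added for illustration:

```python
def improves(risk_delta, risk_delta_prime, thetas, tol=1e-12):
    """Check, on a finite grid of theta values, whether delta is better than delta'.

    (i)  R(theta, delta) <= R(theta, delta') at every grid point, and
    (ii) R(theta, delta) <  R(theta, delta') at some grid point.
    A finite grid only approximates the "for all theta" in condition (i).
    """
    diffs = [risk_delta(t) - risk_delta_prime(t) for t in thetas]
    return all(d <= tol for d in diffs) and any(d < -tol for d in diffs)


def risk_xbar(mu):
    return 1 / 25                    # delta = Xbar: sigma^2 / n


def risk_first_obs(mu):
    return 1.0                       # hypothetical delta' = X_1: sigma^2


def risk_shrink(mu):
    return 0.64 / 25 + 0.04 * mu ** 2  # delta_1 = 0.2*mu0 + 0.8*Xbar with mu0 = 0


grid = [i / 10 for i in range(-30, 31)]
print(improves(risk_xbar, risk_first_obs, grid))  # True
print(improves(risk_shrink, risk_xbar, grid))     # False
print(improves(risk_xbar, risk_shrink, grid))     # False
```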

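Admissible, inadmissible

$\delta:$ inadmissible $\iff$ there exists a $\delta'$ that is better than $\delta$. $\quad\quad$ i.e. if a better rule exists, $\delta$ should not be used: it is "bad".
$\delta:$ admissible $\iff$ $\delta$ is not inadmissible. $\quad\quad$ But does admissible mean good? No; it only means "not bad". For example, in the normal-mean model under squared error loss, the constant estimator $\delta(x)\equiv\mu_0$ ignores the data, yet it is admissible: any rule at least as good must have zero risk at $\mu=\mu_0$, hence equal $\mu_0$ almost surely, hence have the same risk everywhere.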

Bayes rule

(i) Bayes risk

$\quad r(\pi,\delta)=\int_\Theta R(\theta,\delta)\pi(d\theta)$, $\quad \pi:$ a distribution on $\Theta$ (prior distribution)

(ii) Bayes rule with respect to the prior $\pi$

$\quad \delta^B:= \underset{\delta}{\arg\min}\ r(\pi,\delta)$
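In the income example, consider the family $\delta_w = w\mu_0+(1-w)\bar{X}$ ($\delta_1$ is $w=0.2$, $\delta_2$ is $w=0$), whose risk by the same computation is $(1-w)^2\sigma^2/n + w^2(\mu_0-\mu)^2$. Under a hypothetical prior $\mu\sim N(\mu_0,\tau^2)$, the Bayes risk is $r(\pi,\delta_w)=(1-w)^2\sigma^2/n + w^2\tau^2$. The sketch below minimizes it over a grid of $w$; the search is restricted to this family, but for this normal prior the standard posterior-mean result says the Bayes rule is $\delta_w$ with $w=\frac{\sigma^2/n}{\sigma^2/n+\tau^2}$, so nothing is lost here:

```python
def bayes_risk(w, s, tau2):
    """r(pi, delta_w) for delta_w = w*mu0 + (1-w)*Xbar with prior mu ~ N(mu0, tau2), s = sigma^2/n.

    R(mu, delta_w) = (1-w)^2 * s + w^2 * (mu0 - mu)^2, and E_pi[(mu0 - mu)^2] = tau2.
    """
    return (1 - w) ** 2 * s + w ** 2 * tau2


def bayes_weight(s, tau2, grid_size=1000):
    """Grid-search the w in [0, 1] that minimizes the Bayes risk within the family delta_w."""
    grid = [i / grid_size for i in range(grid_size + 1)]
    return min(grid, key=lambda w: bayes_risk(w, s, tau2))


s, tau2 = 0.25, 1.0              # hypothetical sigma^2/n and prior variance
print(bayes_weight(s, tau2))     # 0.2
print(s / (s + tau2))            # 0.2, the posterior-mean weight
```

With these values the search returns $w=0.2$, i.e. $\delta_1$ is the Bayes rule whenever $\tau^2 = 4\sigma^2/n$.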

Minimax rule

$\delta^m:= \underset{\delta}{\arg\min}\;\underset{\theta \in\Theta}{\sup}\; R(\theta,\delta)$ $\quad\quad\quad \Rightarrow\quad$ minimize the worst-case risk, i.e. guard against the worst outcome (e.g., insurance).
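Over $\Theta=\mathbf{R}$ in the income example, $\sup_\mu R(\mu,\delta_w)=\infty$ for every $w>0$ in the family $\delta_w = w\mu_0+(1-w)\bar{X}$, so within that family only $\delta_2=\bar{X}$ has finite worst-case risk. To see a nontrivial tradeoff, the sketch below assumes a hypothetical bounded parameter space $|\mu-\mu_0|\le M$, where the maximum risk $(1-w)^2\sigma^2/n + w^2M^2$ is minimized at $w=\frac{\sigma^2/n}{\sigma^2/n+M^2}$. The search covers only the family $\delta_w$, so it finds the best rule within that family, not necessarily the minimax rule over all $\delta$:

```python
def max_risk(w, s, mu0, mus):
    """max over the grid mus of R(mu, delta_w) = (1-w)^2 * s + w^2 * (mu0 - mu)^2, s = sigma^2/n."""
    return max((1 - w) ** 2 * s + w ** 2 * (mu0 - mu) ** 2 for mu in mus)


def minimax_weight(s, mu0, mus, grid_size=1000):
    """Grid-search the w in [0, 1] that minimizes the worst-case risk over mus."""
    grid = [i / grid_size for i in range(grid_size + 1)]
    return min(grid, key=lambda w: max_risk(w, s, mu0, mus))


s, mu0, M = 1.0, 0.0, 1.0  # hypothetical sigma^2/n, mu0 and bound M
mus = [mu0 - M + 2 * M * j / 200 for j in range(201)]  # grid over [mu0 - M, mu0 + M], endpoints included
print(minimax_weight(s, mu0, mus))  # 0.5 = s / (s + M^2)
```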

Complete class theorem

Suppose the statistical model $(\mathcal{X},\mathcal{Y},(\mathbf{P}_\theta)_{\theta \in\Theta})$ satisfies the following conditions.

(a) Regularity of the probabilities

$\mathbf{P}:\mathcal{Y}\times\Theta \rightarrow [0,1]$ is a stochastic kernel.
i.e., (i) for every $A \in \mathcal{Y}$ ($A$ fixed), $\theta \longmapsto \mathbf{P}_\theta(A)$ is Borel measurable (for this to make sense, $\Theta$ needs a Borel $\sigma$-field; at minimum, $\Theta$ is a metric space).
$\quad\ \;$ (ii) for every $\theta \in \Theta$ ($\theta$ fixed), $\mathbf{P}_\theta$ is a probability measure on $\mathcal{Y}$.

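(b) Continuity of the model

The parameter space $\Theta$ is a metric space, and $\theta \longmapsto \mathbf{P}_\theta$ is continuous in the $L_1$ norm:
$\|\mathbf{P}_{\theta_1}-\mathbf{P}_{\theta_2}\|_{L_1}=\int|f_{\theta_1}(x)-f_{\theta_2}(x)|\,d\mu(x)$, where $f_\theta$ is the density of $\mathbf{P}_\theta$ with respect to a dominating measure $\mu$.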
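As an illustration of condition (b), take the hypothetical model $\mathbf{P}_\theta = N(\theta,1)$ dominated by Lebesgue measure. Its $L_1$ distance has the closed form $2\,(2\Phi(|\theta_1-\theta_2|/2)-1)$, which tends to 0 as $\theta_2\to\theta_1$, so $\theta\mapsto\mathbf{P}_\theta$ is $L_1$-continuous. The sketch compares a midpoint-rule approximation of the integral with that closed form:

```python
from math import exp, pi, sqrt
from statistics import NormalDist


def normal_pdf(x, theta):
    """Density of N(theta, 1)."""
    return exp(-0.5 * (x - theta) ** 2) / sqrt(2 * pi)


def l1_distance(theta1, theta2, lo=-20.0, hi=20.0, steps=100_000):
    """Midpoint-rule approximation of int |f_theta1(x) - f_theta2(x)| dx for N(theta, 1) densities."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        total += abs(normal_pdf(x, theta1) - normal_pdf(x, theta2))
    return total * h


def l1_distance_exact(theta1, theta2):
    """Closed form 2 * (2 * Phi(|theta1 - theta2| / 2) - 1) for unit-variance normals."""
    return 2 * (2 * NormalDist().cdf(abs(theta1 - theta2) / 2) - 1)


for d in (1.0, 0.1, 0.01):
    print(d, l1_distance(0.0, d), l1_distance_exact(0.0, d))  # shrinks toward 0 with d
```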
$\mathcal{A}:$ action space, $\quad \mathcal{P}(\mathcal{A}):$ the set of probability measures on $\mathcal{A}$, i.e. the randomized action space.

Complete class theorem (statement)

Parameter space $\Theta:$ a separable metric space.
$\mathcal{A}:$ a compact metric space.
The loss function $L(\theta,a)$ is bounded and continuous in $(\theta,a)$.
$\Rightarrow$ For every randomized decision rule $\delta$ (with $\delta(x,\cdot)\in\mathcal{P}(\mathcal{A})$), there exist a randomized rule $\delta_0$ and a sequence of priors $\pi_k$ such that
$\quad$ $\delta_k\rightsquigarrow\delta_0$ (weak convergence, in the spirit of convergence in distribution), where $\delta_k$ is a Bayes rule with respect to $\pi_k$,
$\quad$ and $R(\theta,\delta_0)\leq R(\theta,\delta),\quad \forall\theta\in\Theta$.

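cf. In words, the complete class theorem says that for every decision rule $\delta$ there is a limit of Bayes rules that performs at least as well as $\delta$, so one may restrict attention to (limits of) Bayes rules.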

