Basic Markov Chain Theory

woozins·2024년 8월 19일

Statistical Machine Learning

목록 보기

2/7

MCMC(markov chain monte carlo)를 이해하기 위해, 간단한 markov chain theory에 대하여 알 필요가 있다. 바로 본론으로 들어가도록 하자.

Markov Chain

The process $\{X_n : n\in T\}$ is a Markov chain if

P(X_n = x | X_0,...,X_{n-1}) = P(X_n = x|X_{n-1})

즉, markov chain은 오직 이전의 상태(state)에만 영향을 받는 확률과정을 이른다.
따라서 다음과 같은 성질이 성립한다.

f(x_1,...,x_n) = f(x_1)f(x_2|x_1)f(x_3|x_2)...f(x_n|x_{n-1})

이 포스트에서, 우리의 관심사는 다음과 같다.

Markov Chain이 언젠가 평형상태를 이루는 조건은 무엇인가?

참고로, 여기서 평형상태라는 것은 n이 커질 때 $X_n$ 의 분포가 어떠한 상태로 수렴한다는 것을 의미한다. 이의 정의에 대해선 뒤에서 다루도록 하자.

Some definitions

Transition Matrix

We call $p_{ij} = P(X_{n+1} = j |X_n = i)$ the transition probabilities.
The matrix P whose (i,j) element is $p_{ij}$ is called the transition matrix.

이 때, $p_{ij}$ 가 시간 n에 영향을 받지않으면, 이러한 markov chain을 Homogenous 라고 한다. 우리는 오직 homogenous한 상황에 대하여만 다룬다.

N-step transition probabilities

$p_{ij}(n) = P(X_{m+n} = j |X_m = i)$
즉 상태가 i인 state로부터 n시간이 흐른 후, state가 j일 확률이다.

The Chapman-Kolmogorov equations

The n-step probabilities satisfy
$p_{ij}(m+n) = \sum_k p_{ik}(m)p_{kj}(n)$

이는

P_{m+n} = P_mP_n

임을 의미한다.
$\because P_{m+n,ij} = \sum_k P_{m,ik}P_{n, kj}$

Communication relation

Communication relation

We say that i reaches j if $p_{ij}(n) >0$ for some n, and we write $i \rightarrow j$ .
If $i \rightarrow j$ and $j \rightarrow i$ , we write $i \leftrightarrow j$ and say that i and j communicate.

Theorem

The communication relation satisfies the following properties.

$i \leftrightarrow i$

if $i \leftrightarrow j$ then $j \leftrightarrow i$

if $i \leftrightarrow j$ and $j \leftrightarrow k$ then $i \leftrightarrow k$

The set of states $\chi$ can be written as a disjoint union of classes $\chi = \chi_1 \cup \chi_2 \cup ...$ where any two states i and j communicate each other if and only if they are in the same class.

위의 정리에서 만약, 모든 상태가 하나의 Class에 있다면, 우리는 그 Chain을 irreducible 하다고 부른다.
또한, 만약 어느 class안의 상태에서 class 외부로 이동이 불가하다면, 그러한 class를 closed라고 부르고, 만약 그러한 class의 원소가 하나라면, 그 원소를 absorbing state라고 부른다.

Recurrent state

Definition
State i is recurrent or persistent if

P(X_n = i \; for \; some \;n \ge 1|X_0 = i)

Otherwise, state i is transient.

즉, 우리는 어떠한 상태 i가, 언젠가 반드시 돌아온다면 이러한 상태를 "재귀적"이라고 한다.
반대로, 어떠한 상태 영원히 돌아오지 않을 확률이 존재한다면, 이러한 상태를 "일시적"이라고 한다.

Theorem

A state i is recurrent if and only if
$\sum_n p_{ii}(n) = \infty$

A state i is transient if and only if
$\sum_n p_{ii}(n) < \infty$

PF) Let $Y = \sum_n I_n$ , where $I_n$ : state가 n 시간 후 다시 i로 돌아오는 사건.
$E(Y|X_0 = i) = \sum_n p_{ii}(n)$ 이러한 관점에서 생각해볼때, 만약 i가 recurrent하다면, $Y = \infty$ 이어야 한다.

한편, 다시 state i로 돌아올 확률을 $\alpha< 1$ 라고하자.
그렇다면, $P(Y|X_o =i) = \alpha^n(1-\alpha)$ 이다. 즉 $Y|X_0$ 는 geometric 분포를 따르고,이의 평균은 $\frac{1}{\alpha} < \infty$ .

Properties of recurrence

If state i is recurrent and $i \leftrightarrow j$ , then j is recurrent

If state i is transient and $i \leftrightarrow j$ , then j is transient

A finite Markov chain must have at least one recurrent state.

The states of a finite, irreducible markov chain are all recurrent.

위 명제들에 대하여 생각해보자.
1. i는 반드시 i로 돌아온다. 만약 i상태에서 출발해 j상태에 도착했다고 해도, 반드시 i로 돌아와야 한다. 즉, $i \rightarrow j \rightarrow i$ .
만약, j가 transient하다면, 앞으로 state i가 계속해서 무한번 반복되는 동안, 즉 $i \rightarrow i$ 의 경로 사이에 j가 단 한번도 없어야 한다.
이 확률은 0이 될 수 밖에 없다.

볼차노-바이어슈트라스 정리 : 유한한 실수 공간에서의 임의의 유계 수열은 적어도 하나의 수렴 부분수열을 가짐. 비슷한 아이디어로 생각해보자.
1,3에 의해 자명하다.

위의 명제들은 아래의 정리로 자명하게 이어진다.
Decomposition Theorem
The state space $\chi$ can be written as the disjoint union

\chi = \chi_T \cup \chi_1 \cup \chi_2 ...

where $\chi_T$ are the transient states and each $\chi_i$ is a closed, irreducible set of recurrent states.

Convergence of Markov Chains

Some definitions

recurrence time
Let $X_0 = i$
$T_{ij} = \min \{n > 0: X_n = j\}$

mean recurrence time
$m_i = E(T_{ii}) = \sum_n nf_{ii}(n)$ , where
$f_{ij}(n) = P(X_1 \neq j, ... X_{n-1} \neq j, X_n = j|X_0 = j)$

A recurrent state is null recurrent or null if $m_i = \infty$ otherwise it is called non-null or positive.

Period
The period of state i is d if $p_{ii}(n) = 0$ whenever n is not divisible by d and d is the largest integer with this property. A state with period 1 is called aperiodic.

Lemma

If state i has period d and $i \leftrightarrow j$ , then j has period d.
pf) Let j is not d-periodic. i.e, $\exist N \; s.t.\; p_{jj}(N) \ne 0$ , (N은 d의 배수가 아님)
$i \leftrightarrow j$ 이므로, $i \rightarrow j$ 를 갈 때 a시간이 걸리고, $j \rightarrow i$ 로 갈 때 b시간이 걸린다면, a+b는 반드시 d의 배수이다.
다음과 같은 경로를 생각하자. $i \rightarrow j \rightarrow j \rightarrow i$ . 여기서 j가 d-periodic이 아니면 $i \rightarrow i$ 가 d의 배수가 아닌 확률로 돌아올 확률이 양수가 된다. Q.E.D

Ergodic

A state is ergodic if it is recurrent, non-null and aperiodic. A chain is ergodic if all its states are ergodic

Stationary distribution

We say that $\pi$ is a stationary distribution if $\pi = \pi P$ , where $\pi = \{\pi_i : i \in \chi \}$ is a vector of non-negative numbers, that sum to 1.

Main Theorem : When a markov chain has a stationary distributon?

1 본 정리는 Markov Chain의 극한분포의 유일 존재 조건을 나타낸다.

An irreducible, ergodic Markov chain has a unique stationary distribution $\pi$ . The limiting distribution exists and is equal to $\pi$ . If g is any bounded function, then,

lim_{N \rightarrow \infty} \frac{1}{N} \sum_{n=1}^{N}g(X_n) \rightarrow E_{\pi}(g) \approx \sum_j \pi_j g(j), \; a.s.

2 본 정리는 마코프체인의 정상분포의 존재성이 보장된 상황에서, 정상분포가 무엇인지 확인하는 데 쓰이는 정리임
We say that $\pi$ satisfies detailed balance if

\pi_i p_{ij} = p_{ji}\pi_j

If $\pi$ satisfies detailed balance, then $\pi$ is a stationary distribution.

pf) WTS : $\pi = \pi P$ .

$(\pi P)_j = \sum_k \pi_k p_{kj} = \sum_k p_{jk}\pi_j$
= $\pi_j$

Appendix : Inference for Markov chains.

그렇다면, 전이행렬 P는 어떻게 추정할 수 있을까?

P의 각 행은, multinomial 분포를 따른다.
즉, MLE를 구하려면, $\hat{p}_{ij} = \frac{n_{ij}}{n_i}$ 로 구하면 됨.

woozins

공부 메모 공간

이전 포스트

Convexity

다음 포스트