Probability and Statistics Summary

유승우 · September 26, 2023

Math

Notion link

Basic concept

Probability Space

Sample space $\Omega$: the (fixed) set of all possible outcomes
Event space $\mathcal F$: a set of events, where each event is a subset of $\Omega$
Probability measure $\mathbb P:\mathcal F \rightarrow[0,1]$

📐 Axioms

For $\{A_i\}\subseteq\mathcal F$:

$0 \le \mathbb P(A_i)\le1$
$\mathbb P(\Omega)=1$
If $A_i\cap A_j=\emptyset$ for all $i\ne j$, then $\mathbb P(\bigcup_iA_i)=\sum_i\mathbb P(A_i)$

The triple $(\Omega, \mathcal F, \mathbb P)$ is called a probability space.
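A minimal sketch, assuming a fair six-sided die as $\Omega$ and the power set of $\Omega$ as $\mathcal F$ (both chosen only for illustration), that checks the three axioms with exact fractions. `prob` and `power_set` are hypothetical helper names.

```python
from fractions import Fraction
from itertools import chain, combinations

# Illustrative finite probability space: a fair six-sided die.
omega = frozenset(range(1, 7))
weight = {w: Fraction(1, 6) for w in omega}


def prob(event):
    """P(A): sum of the weights of the outcomes in event A (a subset of omega)."""
    return sum((weight[w] for w in event), Fraction(0))


def power_set(s):
    """All subsets of s; for a finite omega this serves as the event space F."""
    items = list(s)
    subsets = chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))
    return [frozenset(c) for c in subsets]


events = power_set(omega)

# Axiom 1: 0 <= P(A) <= 1 for every event A
axiom1 = all(0 <= prob(a) <= 1 for a in events)
# Axiom 2: P(Omega) = 1
axiom2 = prob(omega) == 1
# Axiom 3 (finite case): P(A u B) = P(A) + P(B) for disjoint A, B
evens, odds = frozenset({2, 4, 6}), frozenset({1, 3, 5})
axiom3 = prob(evens | odds) == prob(evens) + prob(odds)

print(axiom1, axiom2, axiom3)  # True True True
```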

Basic properties of probability

Let $A$ be an event.

$\mathbb P(A^c) = 1-\mathbb P(A)$
If $B$ is an event and $B\subseteq A$, then $\mathbb P(B)\le \mathbb P(A)$
$0=\mathbb P(\emptyset)\le\mathbb P(A)\le\mathbb P(\Omega)=1$

Other concepts

Boole’s inequality (Union bound)

$$\mathbb P\left(\bigcup_i A_i\right)\le\sum_i\mathbb P(A_i)$$
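A quick numeric check of the union bound, assuming three overlapping events on a fair die (the events are illustrative). The right-hand side here even exceeds 1, which shows how loose the bound can be when events overlap heavily.

```python
from fractions import Fraction


# Uniform probability on the faces 1..6 of a fair die (illustrative).
def prob(event):
    return Fraction(len(event), 6)


a1 = {1, 2, 3}  # "at most 3"
a2 = {2, 4, 6}  # "even"
a3 = {3, 6}     # "multiple of 3"

lhs = prob(a1 | a2 | a3)              # P(A1 u A2 u A3) = 5/6
rhs = prob(a1) + prob(a2) + prob(a3)  # P(A1) + P(A2) + P(A3) = 4/3
print(lhs <= rhs)  # True
```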

Conditional probability

$$\mathbb P(A|B)={\mathbb P(A\cap B)\over \mathbb P(B)},\quad \text{defined when } \mathbb P(B)>0$$
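A sketch of the definition by counting equally likely outcomes, assuming two fair dice as the sample space; `sum_is_8` and `first_even` are illustrative events. Conditioning on "first die is even" leaves 18 outcomes, 3 of which sum to 8.

```python
from fractions import Fraction
from itertools import product

# Two fair dice: 36 equally likely outcomes (illustrative sample space).
omega = list(product(range(1, 7), repeat=2))


def prob(pred):
    """P(event), where the event is given as a predicate on outcomes."""
    return Fraction(sum(1 for w in omega if pred(w)), len(omega))


def cond_prob(pred_a, pred_b):
    """P(A | B) = P(A and B) / P(B), defined only when P(B) > 0."""
    p_b = prob(pred_b)
    if p_b == 0:
        raise ValueError("P(B) must be positive")
    return prob(lambda w: pred_a(w) and pred_b(w)) / p_b


def sum_is_8(w):
    return w[0] + w[1] == 8


def first_even(w):
    return w[0] % 2 == 0


print(prob(sum_is_8), cond_prob(sum_is_8, first_even))  # 5/36 1/6
```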

Bayes’ rule

$$\mathbb P(A|B)={\mathbb P(B|A)\mathbb P(A)\over\mathbb P(B)}$$

$$\text{posterior}={\text{likelihood}\times\text{prior}\over\text{evidence}}$$
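A worked example of Bayes' rule with hypothetical screening-test numbers (not from these notes). The evidence $\mathbb P(B)$ is computed with the law of total probability, $\mathbb P(B)=\mathbb P(B|A)\mathbb P(A)+\mathbb P(B|A^c)\mathbb P(A^c)$, which is an extra step beyond the formula above.

```python
from fractions import Fraction


def bayes(likelihood, prior, evidence):
    """posterior = likelihood * prior / evidence, i.e. P(A|B) = P(B|A) P(A) / P(B)."""
    return likelihood * prior / evidence


# Hypothetical screening-test numbers, for illustration only.
prior = Fraction(1, 100)        # P(A): having the condition
likelihood = Fraction(99, 100)  # P(B|A): positive test given the condition
false_pos = Fraction(5, 100)    # P(B|A^c): positive test without the condition

# Evidence via the law of total probability
evidence = likelihood * prior + false_pos * (1 - prior)
posterior = bayes(likelihood, prior, evidence)
print(posterior)  # 1/6
```

Even with a 99% sensitive test, a low prior keeps the posterior at only 1/6.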

Random variable

A random variable on a probability space $(\Omega, \mathcal F, \mathbb P)$ is a function $X: \Omega\rightarrow \mathbb R$

$$\mathbb P(X=x)=\mathbb P(\{\omega\in\Omega:X(\omega)=x\})$$

As the definition shows, the event $X=x$ is a set: the subset of $\Omega$ that $X$ maps to $x$.
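To make "the event $X=x$ is a set" concrete, here is a sketch assuming two fair coin flips as $\Omega$ and $X$ = number of heads (an illustrative choice). `event(x)` returns the preimage $\{\omega: X(\omega)=x\}$ and the probability is read off from it.

```python
from fractions import Fraction
from itertools import product

# Illustrative sample space: two fair coin flips, each outcome equally likely.
omega = list(product("HT", repeat=2))


def X(w):
    """Random variable X: Omega -> R, here the number of heads in outcome w."""
    return sum(1 for c in w if c == "H")


def event(x):
    """The event {X = x}: the set of outcomes that X maps to x."""
    return {w for w in omega if X(w) == x}


def prob_X_equals(x):
    """P(X = x) = P({w in Omega : X(w) = x})."""
    return Fraction(len(event(x)), len(omega))


print(sorted(event(1)), prob_X_equals(1))  # [('H', 'T'), ('T', 'H')] 1/2
```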

Types of random variables

Discrete random variables
- probability mass function (PMF) $p:X(\Omega)\rightarrow[0,1]$ with $\sum_{x\in X(\Omega)}p(x)=1$ and $\mathbb P(X=x)=p(x)$
Continuous random variables
- probability density function (PDF) $p: \mathbb R\rightarrow [0,\infty)$ with $\int_{-\infty}^\infty p(x)\,dx=1$ and CDF $F(x)=\int_{-\infty}^x p(z)\,dz$
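A sketch of both normalization conditions. The discrete case uses a fair die; the continuous case uses an illustrative density $p(x)=2x$ on $[0,1]$ (0 elsewhere), with the integrals approximated by a midpoint rule. The density and the helper `integrate` are assumptions for the example, not part of the notes.

```python
import math
from fractions import Fraction

# Discrete example: PMF of a fair die; the probabilities must sum to 1.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}


# Continuous example (chosen for illustration): p(x) = 2x on [0, 1], 0 elsewhere.
def pdf(x):
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0


def integrate(f, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * math.fsum(f(a + (i + 0.5) * h) for i in range(n))


def cdf(x):
    """F(x) = integral of p from -inf to x; p is 0 outside [0, 1]."""
    upper = min(x, 1.0)
    return integrate(pdf, 0.0, upper) if upper > 0 else 0.0
```

For this density $F(x)=x^2$ on $[0,1]$, so `cdf(0.5)` should be close to 0.25.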

Joint probability

The random variables $\{X_i\}_{i\in I}$ are independent if, for every finite subset of indices $i_1,\dots,i_k\in I$,

$$p(X_{i_1},\dots,X_{i_k})=\prod_{j=1}^kp(X_{i_j})$$
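A sketch of the two-variable case of this definition: compute the marginals from a joint PMF and compare every joint probability with the product of marginals. It assumes two fair dice; the pair (first die, sum of both) is included as an illustrative dependent counterexample.

```python
from fractions import Fraction
from itertools import product

# Joint PMF of two fair dice (X1, X2): 1/36 for every pair.
joint = {(a, b): Fraction(1, 36) for a, b in product(range(1, 7), repeat=2)}


def marginal(joint, axis):
    """Marginal PMF of coordinate `axis`, obtained by summing out the other coordinate."""
    m = {}
    for pair, p in joint.items():
        m[pair[axis]] = m.get(pair[axis], 0) + p
    return m


def is_independent(joint):
    """True iff p(x1, x2) = p(x1) p(x2) for every x1, x2 in the marginal supports."""
    p1, p2 = marginal(joint, 0), marginal(joint, 1)
    return all(joint.get((a, b), 0) == p1[a] * p2[b] for a in p1 for b in p2)


# For contrast: the pair (X1, X1 + X2) is dependent.
dependent = {}
for (a, b), p in joint.items():
    dependent[(a, a + b)] = dependent.get((a, a + b), 0) + p

print(is_independent(joint), is_independent(dependent))  # True False
```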

i.i.d

Random variables $X_1,\dots,X_n$ are independent and identically distributed (i.i.d.) if

$$p(X_1,\dots,X_n)=\prod_{i=1}^np(X_i)$$

where $X_1,\dots,X_n$ all share the same PMF/PDF.

In words: the variables are mutually independent and all follow the same distribution.

CDF (cumulative distribution function)

$$F_X(x)=\mathbb P(X\le x)$$
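For a discrete variable the CDF is a step function obtained by summing the PMF. A sketch assuming a fair die; it also shows the common identity $\mathbb P(a<X\le b)=F_X(b)-F_X(a)$, which follows directly from the definition.

```python
from fractions import Fraction

# PMF of a fair die (illustrative).
pmf = {x: Fraction(1, 6) for x in range(1, 7)}


def cdf(x):
    """F_X(x) = P(X <= x): sum the PMF over all support points v <= x."""
    return sum((p for v, p in pmf.items() if v <= x), Fraction(0))


# P(a < X <= b) = F_X(b) - F_X(a), e.g. P(2 < X <= 4) = 1/3
print(cdf(3.5), cdf(4) - cdf(2))  # 1/2 1/3
```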

Probability Distributions

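Each expectation and variance below can be derived directly by evaluating the corresponding sum (discrete case) or integral (continuous case).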

Uniform Distribution

A distribution in which every value in its range is equally likely

Discrete

$X\sim Unif(k,l)$
$P_X(x) = \begin{cases} {1\over(l-k+1)},\ x=k,k+1,\cdots,l-1,l \\ 0,\ otherwise \end{cases}$
$\mathbb E[X] = {(k+l)/2};\ Var[X]=(l-k)(l-k+2)/12$
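A sketch that computes the mean and variance of $Unif(k,l)$ exactly from the PMF, so the closed forms can be checked; for a fair die ($k=1,l=6$) the variance is $35/12$, which matches $(l-k)(l-k+2)/12$. The function name is a hypothetical helper.

```python
from fractions import Fraction


def discrete_uniform_moments(k, l):
    """Exact mean and variance of Unif{k, ..., l}, computed directly from the PMF."""
    support = range(k, l + 1)
    p = Fraction(1, l - k + 1)  # every value has the same probability
    mean = sum(p * x for x in support)
    var = sum(p * (x - mean) ** 2 for x in support)
    return mean, var


mean, var = discrete_uniform_moments(1, 6)  # a fair die
print(mean, var)  # 7/2 35/12
```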

Continuous

$X\sim Unif(a,b)$
$f_X(x) = \begin{cases} {1\over(b-a)},\ a\le x<b \\ 0,\ otherwise \end{cases}$
$\mathbb E[X] = {(a+b)/2};\ Var[X]=(b-a)^2/12$
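A numeric sanity check of the continuous formulas, approximating $\int x f_X(x)\,dx$ and $\int x^2 f_X(x)\,dx$ with a midpoint rule (an illustrative numerical method, not from the notes) and using $Var[X]=\mathbb E[X^2]-\mathbb E[X]^2$.

```python
import math


def uniform_moments(a, b, n=100_000):
    """Mean and variance of Unif(a, b) via midpoint-rule integration of x f(x) and x^2 f(x)."""
    h = (b - a) / n
    xs = [a + (i + 0.5) * h for i in range(n)]
    density = 1.0 / (b - a)  # f_X(x) on [a, b)
    mean = math.fsum(x * density * h for x in xs)
    second = math.fsum(x * x * density * h for x in xs)
    return mean, second - mean**2


mean, var = uniform_moments(2.0, 5.0)
```

For $Unif(2,5)$ these should be close to $3.5$ and $9/12=0.75$.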


Bernoulli Distribution

A random binary outcome: $1$ (success) with probability $p$, $0$ (failure) otherwise

$X\sim Bern(p)$
$P_X(x) = \begin{cases} p,\quad x=1\\ 1-p,\quad x=0\\ 0,\quad otherwise \end{cases}$
$\mathbb E[X] = p;\ Var[X]=p(1-p)$

Geometric Distribution

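The probability that, in repeated Bernoulli trials, the first success occurs on trial $x$ (i.e., of $x$ trials only the last one succeeds)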

$X\sim Geo(p)$
$P_X(x) = p(1-p)^{x-1}\ for\ x=1,2,3,\cdots$
$\mathbb E[X] = 1/p;\ Var[X]=(1-p)/p^2$
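A sketch that evaluates the mean and variance series of $Geo(p)$ numerically, truncated after 2000 terms (an assumption: for $p=0.2$ the remaining tail is far below floating-point precision). The results should match $1/p=5$ and $(1-p)/p^2=20$.

```python
import math


def geo_pmf(x, p):
    """P(X = x) = p (1 - p)^(x - 1): first success on trial x, for x = 1, 2, ..."""
    return p * (1 - p) ** (x - 1) if x >= 1 else 0.0


def series_moments(pmf, xs):
    """Mean and variance from a (truncated) series over the support points xs."""
    probs = [pmf(x) for x in xs]
    mean = math.fsum(x * q for x, q in zip(xs, probs))
    var = math.fsum((x - mean) ** 2 * q for x, q in zip(xs, probs))
    return mean, var


p = 0.2
xs = range(1, 2001)  # truncation: the tail beyond x = 2000 is negligible for p = 0.2
mean, var = series_moments(lambda x: geo_pmf(x, p), xs)
```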

Binomial Distribution

The number of successes in $n$ independent $Bern(p)$ trials

$X\sim B(n,p)$
$P_X(x) = \binom{n}{x}p^x(1-p)^{n-x}\ for\ x=0,1,2,\cdots,n$
$\mathbb E[X] = np;\ Var[X]=np(1-p)$
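A sketch that builds the full $B(n,p)$ PMF with exact fractions (`math.comb`, Python 3.8+) and checks that it sums to 1 and that its mean and variance equal $np$ and $np(1-p)$. The values $n=10$, $p=3/10$ are illustrative.

```python
from fractions import Fraction
from math import comb


def binom_pmf(x, n, p):
    """P(X = x) = C(n, x) p^x (1 - p)^(n - x) for x = 0, ..., n."""
    if not 0 <= x <= n:
        return 0
    return comb(n, x) * p**x * (1 - p) ** (n - x)


n, p = 10, Fraction(3, 10)
pmf = [binom_pmf(x, n, p) for x in range(n + 1)]
mean = sum(x * q for x, q in enumerate(pmf))
var = sum((x - mean) ** 2 * q for x, q in enumerate(pmf))
print(mean, var)  # 3 21/10
```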


Negative Binomial Distribution

Negative binomial distribution: the number of successes before the $r$-th failure

$X\sim NB(r,p)$
$P_X(x) = \binom{x+r-1}{x}p^x(1-p)^r\ for\ x=0,1,2,3,\cdots$
$\mathbb E[X] = rp/(1-p);\ Var[X]=rp/(1-p)^2$
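A numeric check of the negative binomial moments in the parameterization used here ($x$ successes before the $r$-th failure, success probability $p$). The series is truncated at $x<500$, which is an assumption that only holds because $p=0.4$ makes the tail negligible.

```python
import math


def nb_pmf(x, r, p):
    """P(X = x) = C(x+r-1, x) p^x (1-p)^r: x successes before the r-th failure."""
    return math.comb(x + r - 1, x) * p**x * (1 - p) ** r


r, p = 3, 0.4
xs = range(0, 500)  # truncated support; the tail is negligible for p = 0.4
probs = [nb_pmf(x, r, p) for x in xs]
mean = math.fsum(x * q for x, q in zip(xs, probs))
var = math.fsum((x - mean) ** 2 * q for x, q in zip(xs, probs))
```

With these parameters the mean is $rp/(1-p)=2$ and the variance is $rp/(1-p)^2\approx3.33$.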


Poisson Distribution

The distribution of the number of events $x$ actually observed, when $\alpha$ events are observed on average

$X\sim Poi(\alpha)$
$P_X(x) = \begin{cases} \alpha^xe^{-\alpha}/x!,\quad x=0,1,2,\cdots\\ 0,\quad otherwise \end{cases}$
$\mathbb E[X] = \alpha;\ Var[X]=\alpha$
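A sketch of the Poisson PMF with a truncated series check that both the mean and the variance equal $\alpha$. The value $\alpha=4$ and the cutoff at 100 terms are illustrative assumptions.

```python
import math


def poisson_pmf(x, alpha):
    """P(X = x) = alpha^x e^(-alpha) / x! for x = 0, 1, 2, ..."""
    if x < 0:
        return 0.0
    return alpha**x * math.exp(-alpha) / math.factorial(x)


alpha = 4.0
xs = range(0, 100)  # truncated; terms beyond x = 100 are negligible for alpha = 4
probs = [poisson_pmf(x, alpha) for x in xs]
mean = math.fsum(x * q for x, q in zip(xs, probs))
var = math.fsum((x - mean) ** 2 * q for x, q in zip(xs, probs))
```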


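Suppose that over $n$ time slots $\alpha$ events occur on average, with at most one event per slot and the slots independent of each other. Then an event occurs in each slot with probability $\alpha /n$, and if $X$ is the number of events actually observed, $X\sim B(n,\alpha/n)$.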

As $n\rightarrow \infty$, $B(n,\alpha/n)\rightarrow Poi(\alpha)$.
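A numeric illustration of the limit: for a fixed $x$, the gap between the $B(n,\alpha/n)$ and $Poi(\alpha)$ probabilities shrinks as $n$ grows (roughly like $1/n$). The choices $\alpha=3$, $x=2$ and the list of $n$ values are illustrative.

```python
import math


def binom_pmf(x, n, p):
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)


def poisson_pmf(x, alpha):
    return alpha**x * math.exp(-alpha) / math.factorial(x)


alpha, x = 3.0, 2
ns = (10, 100, 1000, 10000)
# Absolute gap between B(n, alpha/n) and Poi(alpha) at x = 2, for growing n
gaps = [abs(binom_pmf(x, n, alpha / n) - poisson_pmf(x, alpha)) for n in ns]
print(gaps)
```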


Exponential Distribution

The waiting time until an event occurs, when events occur on average $\lambda$ times per unit time

$X\sim Exp(\lambda)$
$f_X(x) = \lambda e^{-\lambda x}\ for\ x\ge0$
$\mathbb E[X] = {1\over \lambda};\ Var[X]={1\over \lambda^2}$

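Let $K$ be the discrete random variable with $\mathbb P(K=k)=\mathbb P(k-1<X\le k)$ for $k=1,2,\dots$ (that is, $K=\lceil X\rceil$).

Since $F_X(x)=1-e^{-\lambda x}$, we get $P_K(k) = F_X(k)-F_X(k-1)=(1-e^{-\lambda})(e^{-\lambda})^{k-1}$, so $K\sim Geo(1-e^{-\lambda})$.

The geometric distribution is the waiting time (in trials) until the first success, so the exponential distribution can be viewed as its continuous-time counterpart: the waiting time until the first event.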
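A numeric check of the discretization argument: compute $P_K(k)=F_X(k)-F_X(k-1)$ from the exponential CDF and compare it with the $Geo(1-e^{-\lambda})$ PMF. The rate $\lambda=0.7$ and the range $k=1,\dots,10$ are illustrative; larger $k$ would lose relative precision in the CDF subtraction.

```python
import math

lam = 0.7  # illustrative rate


def exp_cdf(x, lam):
    """F_X(x) = 1 - e^(-lam x) for x >= 0, and 0 for x < 0."""
    return 1 - math.exp(-lam * x) if x >= 0 else 0.0


def k_pmf(k, lam):
    """P(K = k) = P(k-1 < X <= k) = F_X(k) - F_X(k-1)."""
    return exp_cdf(k, lam) - exp_cdf(k - 1, lam)


def geo_pmf(k, p):
    """Geo(p): p (1-p)^(k-1) for k = 1, 2, ..."""
    return p * (1 - p) ** (k - 1)


q = 1 - math.exp(-lam)  # success probability of the matching geometric distribution
matches = all(math.isclose(k_pmf(k, lam), geo_pmf(k, q)) for k in range(1, 11))
print(matches)  # True
```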


Gaussian (Normal) Distribution

A bell-shaped distribution determined by its mean $\mu$ and variance $\sigma^2$

$X\sim N(\mu,\sigma^2)$
$f_X(x) = {1\over\sqrt{2\pi\sigma^2}}e^{-{(x-\mu)^2\over2\sigma^2}},\ F_X(x)=\Phi({x-\mu\over\sigma})$
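$\mathbb E[X] = \mu;\ Var[X]=\sigma^2$
Standard normal distribution: $Z=(X-\mu)/\sigma\sim N(0,1)$, and $\Phi$ denotes its CDF.
Central limit theorem
- If $X_1,X_2,\dots,X_n$ are i.i.d. with mean $\mu$ and finite variance $\sigma^2$, and $Y=X_1+X_2+\cdots+X_n$, then $(Y-n\mu)/(\sigma\sqrt n)$ converges in distribution to $N(0,1)$ as $n\rightarrow \infty$; informally, $Y$ is approximately $N(n\mu,n\sigma^2)$ for large $n$.
- Many real-world quantities arise as the sum of many small, independent effects, which is why the normal distribution is so important in practice.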
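A sketch using Python's `statistics.NormalDist` (3.8+): first it checks $F_X(x)=\Phi((x-\mu)/\sigma)$, then it illustrates the CLT by comparing the exact CDF of a sum of $n$ i.i.d. $Bern(p)$ variables (i.e. $B(n,p)$) with the normal approximation $N(np,np(1-p))$. The parameter values are illustrative, and the $+0.5$ continuity correction is a standard refinement that the notes do not mention.

```python
import math
from statistics import NormalDist

mu, sigma = 2.0, 3.0
X = NormalDist(mu, sigma)
Z = NormalDist()  # standard normal N(0, 1); Z.cdf plays the role of Phi

# F_X(x) = Phi((x - mu) / sigma)
x = 4.5
lhs, rhs = X.cdf(x), Z.cdf((x - mu) / sigma)


def binom_cdf(k, n, p):
    """Exact P(Y <= k) for Y ~ B(n, p), the sum of n i.i.d. Bern(p) variables."""
    return math.fsum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


# CLT sketch: B(n, p) versus N(np, np(1-p))
n, p, k = 1000, 0.5, 510
exact = binom_cdf(k, n, p)
approx = NormalDist(n * p, math.sqrt(n * p * (1 - p))).cdf(k + 0.5)  # +0.5: continuity correction
print(exact, approx)
```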
