7.1 Probability Concepts
Probability
- It is the relative frequency of an event, e.g., P(A=true)
Conditional Probability
- Usually we are more interested in the probability of some event given a situation, i.e., after another event has been observed
- P(A=true|B=true)
- Out of all the outcomes in which B is true, what fraction also has A equal to true?
7.2 Probability Theorems
- Law of Total Probability
- a.k.a "summing out" or marginalization
- P(a) = ∑_b P(a, b) = ∑_b P(a∣b) P(b)
- Holds when B is any other random variable
- The joint distribution contains all the information we need to compute any marginal or conditional probability
Computing with Probabilities: Chain Rule of Factorization
- P(a, b, c, ..., z) = P(a∣b, c, ..., z) P(b, c, ..., z)
- Repeatedly applying this idea, we can write
- P(a, b, c, ..., z) = P(a∣b, c, ..., z) P(b∣c, ..., z) ... P(z)
Joint Probability Distribution
- Joint probabilities can be between any number of variables
- P(Int=true, Effort=true, GPA=true)
- P(Int=true) = sum of P(Int, Effort, GPA) over the rows where Int=true
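As a concrete sketch, assuming a made-up joint table over binary Int, Effort, GPA (the probabilities are hypothetical and only need to sum to 1), marginals and conditionals fall out by summation:

```python
# Hypothetical joint distribution over three binary variables
# (Int, Effort, GPA); the numbers are made up for illustration.
joint_table = {
    (True,  True,  True):  0.20, (True,  True,  False): 0.05,
    (True,  False, True):  0.10, (True,  False, False): 0.10,
    (False, True,  True):  0.05, (False, True,  False): 0.15,
    (False, False, True):  0.05, (False, False, False): 0.30,
}

# Marginalization ("summing out"): P(Int=true) is the sum of the
# rows in which Int is true.
p_int = sum(p for (i, e, g), p in joint_table.items() if i)
print(p_int)  # 0.45

# Conditional probability from the joint: P(Int=true | Effort=true).
p_int_and_effort = sum(p for (i, e, g), p in joint_table.items() if i and e)
p_effort = sum(p for (i, e, g), p in joint_table.items() if e)
print(p_int_and_effort / p_effort)  # 0.25 / 0.45 ≈ 0.556
```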
Independence
- Variables A and B are independent if any of the following hold:
- P(A|B) = P(A)
- P(A,B) = P(A)P(B)
- P(B|A) = P(B)
Conditional vs. Marginal Independence
Marginal independence
- If P(A∣B) > P(A), then A and B are not marginally independent!
- X and Y are marginally independent if and only if P(X) = P(X∣Y)
Conditional independence
- P(A∣B,C) = P(A∣C)
- Given C, A and B are independent (see the numeric sketch below)
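A minimal numeric check, assuming a hypothetical common-cause structure C → A, C → B with made-up CPT entries: A and B are marginally dependent but conditionally independent given C.

```python
# Hypothetical common-cause model C -> A, C -> B: A and B are
# dependent marginally but independent once C is given.
P_c = {True: 0.5, False: 0.5}
P_a = {True: 0.9, False: 0.2}   # P(A=true | C=c)
P_b = {True: 0.8, False: 0.1}   # P(B=true | C=c)

def cc_joint(a, b, c):
    """Factorized joint P(c) P(a|c) P(b|c)."""
    pa = P_a[c] if a else 1 - P_a[c]
    pb = P_b[c] if b else 1 - P_b[c]
    return P_c[c] * pa * pb

T = (True, False)
p_a = sum(cc_joint(True, b, c) for b in T for c in T)
p_b_true = sum(cc_joint(a, True, c) for a in T for c in T)
p_ab = sum(cc_joint(True, True, c) for c in T)
print(p_a, p_ab / p_b_true)   # 0.55 vs ≈0.82: P(A|B) > P(A), not marginal
for c in T:                   # but P(A|B,C) equals P(A|C): conditional
    p_abc = cc_joint(True, True, c)
    p_bc = sum(cc_joint(a, True, c) for a in T)
    p_ac = sum(cc_joint(True, b, c) for b in T)
    print(p_abc / p_bc, p_ac / P_c[c])  # 0.9 0.9, then 0.2 0.2
```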
7.3 Interpretation of Bayesian Network
Detour: Naive Bayes Classifier (all features assumed conditionally independent given the class)
Bayesian Network
- A graphical notation of
- Random variables
- Conditional independence
- To obtain a compact representation of the full joint distribution
- Syntax
- An acyclic (no cycles) and directed (edges have direction) graph, i.e., a DAG
- A set of nodes
- A random variable
- A conditional distribution given its parents
- P(X1∣Parents(X1))
- A set of links
- Direct influence from the parent to the child
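As a sketch of this syntax, the classic burglary network (Burglary, Earthquake → Alarm → JohnCalls, MaryCalls) used throughout the examples below can be written as plain CPT dictionaries; the numbers are the usual textbook ones for this example and are only illustrative. The later sections reuse joint() and these tables.

```python
# The burglary network as plain dictionaries: one prior or CPT per
# node. The numbers are the usual textbook ones, shown for illustration.
P_B = {True: 0.001, False: 0.999}   # P(Burglary)
P_E = {True: 0.002, False: 0.998}   # P(Earthquake)
P_A = {                              # P(Alarm=true | Burglary, Earthquake)
    (True, True): 0.95, (True, False): 0.94,
    (False, True): 0.29, (False, False): 0.001,
}
P_J = {True: 0.90, False: 0.05}     # P(JohnCalls=true | Alarm)
P_M = {True: 0.70, False: 0.01}     # P(MaryCalls=true | Alarm)

def joint(b, e, a, j, m):
    """Factorized joint: P(b) P(e) P(a|b,e) P(j|a) P(m|a)."""
    pa = P_A[(b, e)] if a else 1 - P_A[(b, e)]
    pj = P_J[a] if j else 1 - P_J[a]
    pm = P_M[a] if m else 1 - P_M[a]
    return P_B[b] * P_E[e] * pa * pj * pm
```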
Interpretation of Bayesian Network
Components of Bayesian Network
- Qualitative components
- Prior knowledge of causal relations
- Learning from data
- Frequently used structures
- Structural aspects
- Quantitative components
- Conditional probability tables
- Probability distribution assigned to nodes
- Probability computation depends on both
- the qualitative component (structure) and the quantitative component (CPTs)
7.4 Bayes Ball Algorithm
Typical Local Structures
- Common parent
- Fixing "alarm" decouples "JohnCalls" and "MaryCalls"
- J ⊥ M ∣ A: given the alarm (whether or not it rang), JohnCalls and MaryCalls are independent
- P(J,M|A)=P(J|A)P(M|A)
- Cascading
- Fixing "alarm" decouples "Buglary" and "MaryCalls"
- B⊥M|A ~ 알람이 주어졌을 때 도둑이 들거나 안들거나 상관없이 MarryCalls의 확률은 독립적이다.
- P(M|B,A)=P(M|A)
- V-Structure
- Fixing "alarm" couples "Buglary" and "Earthquake" (B와 E가 A라는 common child를 가짐)
- 특정 정보를 알게되면, 다른 정보들의 관계가 생김
- 알람이 울렸을 때 지진이 일어나지 않았다면 도둑이 들었다고 유추할 수 있음
- ∼ (B ⊥ E|A) : A가 일어났을 때 B와 E는 독립적이지 않음. 즉, 관계가 생김.
- P(B,E,A)=P(B)P(E)P(A|B,E)
- 특정 정보가 알려져 있지 않으면 독립임
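Reusing joint() from the sketch in 7.3, a brute-force marginalizer makes the (de)coupling visible numerically; prob() and its keyword convention are illustrative helpers, not part of the course material:

```python
from itertools import product

bools = (True, False)

def prob(b=None, e=None, a=None, j=None, m=None):
    """Marginal/joint probability: sum joint() over every assignment
    consistent with the fixed arguments (None = summed out)."""
    fixed = (b, e, a, j, m)
    return sum(
        joint(*vals)
        for vals in product(bools, repeat=5)
        if all(f is None or v == f for f, v in zip(fixed, vals))
    )

# Marginally, B and E are independent: P(B) equals P(B | E).
print(prob(b=True), prob(b=True, e=True) / prob(e=True))  # 0.001 0.001
# Observing the common child A couples them: the earthquake
# "explains away" the alarm, so P(B | a, e) drops far below P(B | a).
print(prob(b=True, a=True) / prob(a=True))                  # ≈ 0.374
print(prob(b=True, a=True, e=True) / prob(a=True, e=True))  # ≈ 0.003
```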
Bayes Ball Algorithm
Knowing the conditional independence relations is important!
Is one random variable independent of another? And under which conditions do they become independent?
Roll a Bayes ball on the graph: if it can reach the other node, the two variables are not independent
- Purpose: checking X_A ⊥ X_B ∣ X_C
- Shade all nodes in X_C
- Place balls at each node in X_A
- Let the balls roll on the graph according to the Bayes ball rules
- Then, ask whether any ball reaches X_B
- Markov Blanket
- A ⊥ B ∣ Blanket
- Blanket = {parents, children, children's other parents}
- Given its Markov blanket, a node is conditionally independent of all nodes outside the blanket (= B)
- D-Separation
- X is d-separated (directionally separated) from Z given Y if we cannot send a ball from any node in X to any node in Z using the Bayes ball algorithm (see the sketch below)
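To make these checks mechanical, here is a minimal Python sketch of the Bayes ball reachability test (the standard BFS formulation over (node, direction) states); the child-list encoding and node names are assumptions for illustration:

```python
from collections import deque

def d_separated(children, X, Z, observed):
    """Bayes ball: True iff X is d-separated from Z given `observed`.
    `children` maps every node (including leaves) to its child list."""
    parents = {n: set() for n in children}
    for n, chs in children.items():
        for c in chs:
            parents[c].add(n)

    # Ancestors of observed nodes, needed for the v-structure rule.
    ancestors, frontier = set(), set(observed)
    while frontier:
        n = frontier.pop()
        for p in parents[n] - ancestors:
            ancestors.add(p)
            frontier.add(p)

    # Roll balls from X; a state is (node, direction of arrival).
    visited, reachable = set(), set()
    queue = deque((x, 'up') for x in X)  # start as if from a child
    while queue:
        node, direction = queue.popleft()
        if (node, direction) in visited:
            continue
        visited.add((node, direction))
        if node not in observed:
            reachable.add(node)
        if direction == 'up' and node not in observed:
            # Arrived from a child: pass up to parents, down to children.
            queue.extend((p, 'up') for p in parents[node])
            queue.extend((c, 'down') for c in children[node])
        elif direction == 'down':
            if node not in observed:
                # Cascading: keep rolling down to the children.
                queue.extend((c, 'down') for c in children[node])
            if node in observed or node in ancestors:
                # V-structure: an observed (or ancestor-of-observed)
                # collider bounces the ball back up to the parents.
                queue.extend((p, 'up') for p in parents[node])
    return not (reachable & set(Z))

g = {'B': ['A'], 'E': ['A'], 'A': ['JC', 'MC'], 'JC': [], 'MC': []}
print(d_separated(g, {'JC'}, {'MC'}, {'A'}))  # True: common parent given
print(d_separated(g, {'B'}, {'E'}, set()))    # True: v-structure unobserved
print(d_separated(g, {'B'}, {'E'}, {'A'}))    # False: explaining away
```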
7.5 Factorization of Bayesian Network
- The joint distribution of the random variables (RVs) in the graph can be written as the product of each family's conditional distribution P(Child ∣ Parents)
- P(X_1, X_2, ..., X_n) = ∏_i P(X_i ∣ Parents(X_i))
- Factorization theorem
- It can greatly reduce the number of parameters! (see the count below)
- Factorizes according to the node given its parents
- P(X) = ∏_i P(X_i ∣ X_{π_i})
- X is the set of joint random variables, e.g., X_1, X_2, ..., X_n
- X_{π_i} is the set of parent nodes of X_i
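A quick count for the burglary network (a small sketch, assuming all five variables are binary):

```python
# Parameter counting: the full joint over 5 binary variables needs
# 2^5 - 1 free parameters, the factorized form only one parameter
# per CPT row, i.e., 2^|parents| per node.
num_parents = {'B': 0, 'E': 0, 'A': 2, 'JC': 1, 'MC': 1}
print(2**5 - 1)                                 # 31
print(sum(2**k for k in num_parents.values()))  # 1+1+4+2+2 = 10
```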
Plate Notation
- Example: N i.i.d. observations from a Gaussian model are drawn once inside a plate labeled N, instead of repeating the node N times
7.6 Inference Questions on Bayesian Network
Question 1 : Likelihood
- Given a set of evidence, what is the likelihood of the evidence set?
- X = X_1, ..., X_N: all random variables
- X_V = X_{k+1}, ..., X_N: evidence variables (the values already observed)
- X_H = X − X_V = X_1, ..., X_k: hidden variables
- General form
- P(X_V) = ∑_{X_H} P(X_H, X_V)
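With the brute-force prob() helper sketched in 7.4, the likelihood question reduces to one call (illustrative only; plain enumeration is exponential in the number of hidden variables):

```python
# Likelihood of the evidence X_V = {jc=true, mc=true}: sum the
# factorized joint over the hidden variables B, E, A.
print(prob(j=True, m=True))  # ≈ 0.00208 with the textbook CPTs
```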
Question 2 : Conditional Probability
- Given a set of evidence, what is the conditional probability of the hidden variables of interest?
- X_H = {Y, Z}
- Y: hidden variables of interest
- Z: hidden variables not of interest
- General form
P(Y ∣ X_V) = ∑_Z P(Y, Z ∣ X_V) = ∑_Z P(Y, Z, X_V) / P(X_V) = ∑_Z P(Y, Z, X_V) / ∑_{y,z} P(Y=y, Z=z, X_V)
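The same prob() helper answers the conditional question: it implicitly sums out the remaining hidden variables Z = {E, A}, and dividing by the evidence likelihood normalizes:

```python
# Conditional query P(B=true | jc=true, mc=true): sum out Z = {E, A},
# then normalize by the evidence likelihood P(jc, mc).
print(prob(b=True, j=True, m=True) / prob(j=True, m=True))  # ≈ 0.284
```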
Question 3 : Most Probable Assignment
argmax_a P(A=a ∣ B=true, MC=true)
- Given a set of evidence, what is the most probable assignment, or explanation given the evidence?
- Some variables of interest
- Need to utilize the inference question 2
- Maximum a posteriori configuration of Y
- Applications: finding the most probable explanation given the evidence (see the argmax sketch below)
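A brute-force sketch of the argmax, reusing joint() and bools from the earlier sketches; since P(jc, mc) is a constant, normalization cannot change the argmax and is skipped:

```python
# MAP assignment of the hidden variables (B, E, A) with the evidence
# jc = mc = true clamped: argmax over the unnormalized posterior.
best = max(
    ((b, e, a) for b in bools for e in bools for a in bools),
    key=lambda h: joint(*h, True, True),
)
print(best)  # (False, False, True) with the textbook CPTs
```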
7.7 Variable Elimination
Marginalization and Elimination
- Computing joint probabilities is the key
- How to compute them?
- Many, many, many multiplications and summations
P(a=true, b=true, mc=true) = ∑_JC ∑_E P(a, b, E, JC, mc) = ∑_JC ∑_E P(JC∣a) P(mc∣a) P(a∣b,E) P(E) P(b)
- In big-O notation?
- Is there any better method than brute force?
- What if we move the summations around?
P(a, b, mc) = ∑_JC ∑_E P(a, b, E, JC, mc) = P(b) P(mc∣a) ∑_JC P(JC∣a) ∑_E P(a∣b,E) P(E)
- Did we reduce the computational complexity?
Variable Elimination
- Preliminary
- P(e∣jc,mc)=αP(e,jc,mc)
- α = 1 / P(jc, mc)
- Joint probability (e = jc = mc = true)
P(e ∣ jc, mc) = α P(e, jc, mc) = α P(e) ∑_B P(b) ∑_A P(a∣b,e) P(jc∣a) P(mc∣a)
- Line up the terms in the topological order (B, E, A, JC, MC)
- Consider a probability distribution as a function
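A sketch of this elimination order in code, reusing the CPT dictionaries and bools from the earlier sketches: A is summed out innermost, then B, and α normalizes at the end:

```python
# Variable elimination for P(E | jc=true, mc=true).
def unnorm_e(e):
    total = 0.0
    for b in bools:                        # eliminate B
        inner = 0.0
        for a in bools:                    # eliminate A first
            pa = P_A[(b, e)] if a else 1 - P_A[(b, e)]
            inner += pa * P_J[a] * P_M[a]  # P(a|b,e) P(jc|a) P(mc|a)
        total += P_B[b] * inner
    return P_E[e] * total

alpha = 1.0 / sum(unnorm_e(e) for e in bools)  # alpha = 1 / P(jc, mc)
print({e: alpha * unnorm_e(e) for e in bools})  # P(E=true | jc, mc) ≈ 0.18
```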
7.8 Potential Function and Clique Graph
Potential function
P(A,B,C,D)=P(A∣B)P(B∣C)P(C∣D)P(D)
- Let's define a potential function
- Potential function: a function that is not itself a probability function, but once normalized it becomes a probability distribution
- Potential function on nodes
- ψ(a,b),ψ(b,c),ψ(c,d)
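A minimal sketch, assuming made-up potential tables on the chain A - B - C - D; it only illustrates that normalizing a product of potentials yields a valid distribution:

```python
from itertools import product

# Hypothetical unnormalized potentials; the entries need not sum
# to 1 or come from any probability table.
psi_ab = {(a, b): 1.0 + a + 2 * b for a in (0, 1) for b in (0, 1)}
psi_bc = {(b, c): 2.0 - b + c for b in (0, 1) for c in (0, 1)}
psi_cd = {(c, d): 1.0 + 3 * c * d for c in (0, 1) for d in (0, 1)}

def unnorm(a, b, c, d):
    return psi_ab[(a, b)] * psi_bc[(b, c)] * psi_cd[(c, d)]

# Normalizing the product of potentials gives a joint distribution:
# P(a,b,c,d) = (1/Z) psi(a,b) psi(b,c) psi(c,d).
Z = sum(unnorm(*v) for v in product((0, 1), repeat=4))
P = {v: unnorm(*v) / Z for v in product((0, 1), repeat=4)}
assert abs(sum(P.values()) - 1.0) < 1e-9
```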
References
문일철. 2017. 인공지능 및 기계학습 개론 II. KOOC.