Chap7. Bayesian Network

Kiwoong Park · March 13, 2024

7.1 Probability Concepts

Probability

  • It is a relative frequency (e.g., $P(A=\text{true})$)

Conditional Probability

  • What we are usually more interested in is the probability that some other event occurs given a particular situation, i.e., when a new event has occurred
  • P(A=true|B=true)
    • Out of all the outcomes in which B is true, what fraction also have A equal to true?

7.2 Probability Theorems

  • Law of Total Probability
    • a.k.a "summing out" or marginalization
    • $P(a) = \sum_{b} P(a,b) = \sum_{b} P(a|b)P(b)$
    • where $B$ is any random variable
  • The joint distribution contains all the information we need to compute any probability of interest

Computing with Probabilities: Chain Rule of Factorization

  • $P(a,b,c,\ldots,z) = P(a|b,c,\ldots,z)\,P(b,c,\ldots,z)$
  • Repeatedly applying this idea, we can write
    • $P(a,b,c,\ldots,z) = P(a|b,c,\ldots,z)\,P(b|c,\ldots,z)\cdots P(z)$
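
As a quick sanity check, here is a minimal sketch in plain Python (the joint probabilities are made up) verifying the two-variable case $P(a,b) = P(a|b)P(b)$:

```python
# Minimal check of the chain rule P(a,b) = P(a|b)P(b) on a
# hypothetical two-variable joint distribution (numbers sum to 1).
joint = {
    (True, True): 0.30, (True, False): 0.20,
    (False, True): 0.10, (False, False): 0.40,
}

p_b = sum(p for (a, b), p in joint.items() if b)   # P(B=true) = 0.4
p_a_given_b = joint[(True, True)] / p_b            # P(A=true|B=true) = 0.75
print(abs(p_a_given_b * p_b - joint[(True, True)]) < 1e-12)  # True
```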

Joint Probability Distribution

  • Joint probabilities can be between any number of variables
    • P(Int=true, Effort=true, GPA=true)
  • $P(I=\text{true})$ = the sum of $P(I,E,G)$ over all rows of the joint table where Int = true
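
A small sketch of this row-summing, using a hypothetical joint table over three boolean variables (the numbers are invented and sum to 1):

```python
from itertools import product

# Hypothetical joint table P(Int, Effort, GPA) over boolean values.
probs = [0.20, 0.10, 0.08, 0.02, 0.12, 0.18, 0.10, 0.20]
joint = dict(zip(product([True, False], repeat=3), probs))

# Marginalization: P(Int=true) = sum of the rows where Int is true.
p_int = sum(p for (i, e, g), p in joint.items() if i)
print(p_int)  # 0.20 + 0.10 + 0.08 + 0.02 = 0.40
```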

Independence

  • Variables A and B are independent if any of the following (equivalent) conditions hold:
    • P(A|B) = P(A)
    • P(A,B) = P(A)P(B)
    • P(B|A) = P(B)
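
For instance, a joint built as the outer product of two marginals satisfies all three conditions by construction; a brute-force check over every cell (invented numbers):

```python
# A and B are independent by construction: P(A,B) = P(A)P(B) in every cell.
p_a = {True: 0.6, False: 0.4}
p_b = {True: 0.3, False: 0.7}
joint = {(a, b): p_a[a] * p_b[b] for a in p_a for b in p_b}

# Recover the marginals from the joint and verify the product rule.
marg_a = {a: sum(joint[(a, b)] for b in p_b) for a in p_a}
marg_b = {b: sum(joint[(a, b)] for a in p_a) for b in p_b}
print(all(abs(joint[(a, b)] - marg_a[a] * marg_b[b]) < 1e-12
          for a in p_a for b in p_b))  # True
```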

Conditional vs. Marginal Independence

  • Marginal independence

    • Example: P(A|B) > P(A)
    • This means A and B are not marginally independent!
      • X and Y are independent if and only if P(X) = P(X|Y)
  • Conditional independence

    • P(A|B,C) = P(A|C)
    • The case where, given C, A and B are independent

7.3 Interpretation of Bayesian Network

Detour: Naive Bayes Classifier
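
The Naive Bayes classifier is the simplest "common parent" Bayesian network: the class $C$ is the lone parent of every feature, so the features are assumed conditionally independent given $C$, and $P(C \mid x_1, \ldots, x_n) \propto P(C) \prod_i P(x_i \mid C)$. A minimal sketch (the class names, features, and probabilities below are all hypothetical):

```python
# Naive Bayes: P(C | x1..xn) ∝ P(C) * Π_i P(xi | C).
p_c = {"spam": 0.4, "ham": 0.6}            # prior P(C), hypothetical
p_x_given_c = {                            # P(feature present | C), hypothetical
    "spam": {"money": 0.7, "meeting": 0.2},
    "ham":  {"money": 0.1, "meeting": 0.6},
}

def posterior(features):
    score = {c: p for c, p in p_c.items()}
    for c in score:
        for f in features:
            score[c] *= p_x_given_c[c][f]
    z = sum(score.values())                # normalize over classes
    return {c: s / z for c, s in score.items()}

print(posterior(["money"]))                # "spam" wins: 0.28 vs 0.06 before normalizing
```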

Bayesian Network

  • A graphical notation of
    • Random variables
    • Conditional independence
    • To obtain a compact representation of the full joint distributions
  • Syntax
    • An acyclic (no cycles) and directed (edges have direction) graph
    • A set of nodes
      • A random variable
      • A conditional distribution given its parents
      • $P(X_1 \mid \mathrm{Parents}(X_1))$
    • A set of links
      • Direct influence from the parent to the child
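
To make the syntax concrete, here is one way to write down the classic burglary-alarm network used in §7.4 below (Burglary, Earthquake → Alarm → JohnCalls, MaryCalls) as one conditional distribution per node. The numbers are the usual textbook values, included only for illustration:

```python
# Each node stores P(node=true | values of its parents).
# Standard textbook CPT values for the burglary network (illustrative).
P_B = 0.001                                        # P(Burglary=true)
P_E = 0.002                                        # P(Earthquake=true)
P_A = {(True, True): 0.95, (True, False): 0.94,    # P(Alarm=true | B, E)
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}                    # P(JohnCalls=true | A)
P_M = {True: 0.70, False: 0.01}                    # P(MaryCalls=true | A)

def pr(p_true, val):
    """P(X=val) from P(X=true)."""
    return p_true if val else 1.0 - p_true

def joint(b, e, a, j, m):
    """Factorized joint P(B)P(E)P(A|B,E)P(J|A)P(M|A) -- see 7.5."""
    return (pr(P_B, b) * pr(P_E, e) * pr(P_A[(b, e)], a)
            * pr(P_J[a], j) * pr(P_M[a], m))
```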

Interpretation of Bayesian Network

  • Topology of network encodes conditional independence assertions

    • Often from the domain experts
    • What is related to what and how
      • Weather // Cavity → Toothache, Cavity → Stench
  • Interpretation

    • Weather is independent of the other variables
    • Toothache and Stench are conditionally independent given Cavity
      (The probability of a toothache may be related to the probability of bad breath, but once the cavity status is given, the probability of a toothache and the probability of bad breath are no longer related)

Components of Bayesian Network

  • Qualitative components
    • Prior knowledge of causal relations
    • Learning from data
    • Frequently used structures
    • Structural aspects
  • Quantitative components
    • Conditional probability tables
    • Probability distribution assigned to nodes
  • Probability computing is related to both
    • Quantitative and Qualitative

7.4 Bayes Ball Algorithm

Typical Local Structures

  • Common parent
    • Fixing "alarm" decouples "JohnCalls" and "MaryCalls"
    • $J \perp M \mid A$: given A (whether or not the alarm rang), JohnCalls and MaryCalls are independent
    • P(J,M|A)=P(J|A)P(M|A)
  • Cascading
    • Fixing "alarm" decouples "Buglary" and "MaryCalls"
    • B\perpM|A ~ 알람이 주어졌을 때 도둑이 들거나 안들거나 상관없이 MarryCalls의 확률은 독립적이다.
    • P(M|B,A)=P(M|A)
  • V-Structure
    • Fixing "alarm" couples "Buglary" and "Earthquake" (B와 E가 A라는 common child를 가짐)
    • 특정 정보를 알게되면, 다른 정보들의 관계가 생김
    • 알람이 울렸을 때 지진이 일어나지 않았다면 도둑이 들었다고 유추할 수 있음
    • \sim (B \perp E|A) : A가 일어났을 때 B와 E는 독립적이지 않음. 즉, 관계가 생김.
      • P(B,E,A)=P(B)P(E)P(A|B,E)
    • 특정 정보가 알려져 있지 않으면 독립임
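
The v-structure effect ("explaining away") can be checked numerically on the burglary network; a sketch reusing the textbook CPT values (illustrative):

```python
from itertools import product

# Explaining away at the v-structure B -> A <- E (textbook CPTs, illustrative).
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
pr = lambda p, v: p if v else 1 - p

def p_burglary(**given):
    """P(B=true | given), by enumerating the tiny joint P(B)P(E)P(A|B,E)."""
    num = den = 0.0
    for b, e, a in product([True, False], repeat=3):
        if any({"b": b, "e": e, "a": a}[k] != v for k, v in given.items()):
            continue
        w = pr(P_B, b) * pr(P_E, e) * pr(P_A[(b, e)], a)
        den += w
        num += w if b else 0.0
    return num / den

print(p_burglary())                # 0.001: marginally, B ignores E
print(p_burglary(a=True))          # ~0.374: the alarm suggests a burglary
print(p_burglary(a=True, e=True))  # ~0.003: the earthquake explains the alarm away
```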

Bayes Ball Algorithm

Knowing conditional independence is important!
Is one particular random variable independent of another? And under which conditions is it independent?
Roll a Bayes ball: if it can roll from one variable to the other, they are not independent (= non-independent).

  • Purpose: checking $X_A \perp X_B \mid X_C$
    • Shade all nodes in $X_C$
    • Place balls at each node in $X_A$
    • Let the balls roll on the graph according to the Bayes ball rules
    • Then, ask whether any ball reaches $X_B$
  • Markov Blanket
    • $A \perp B \mid \mathrm{Blanket}$
    • Blanket = {parents, children, children's other parents}
    • Given its Markov blanket, a node (= A) is conditionally independent of every node outside the blanket (= B)
  • D-Separation
    • X is d-separated (directionally separated) from Z given Y if we can't send a ball from any node in X to any node in Z using the Bayes ball algorithm
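
Below is a compact sketch of the Bayes ball reachability check (the standard "reachable nodes" formulation; the graph encoding and helper names are mine, not from the lecture). $X \perp Y \mid Z$ holds exactly when no node of $Y$ is reachable from $X$:

```python
def reachable(children, start, observed):
    """Nodes reachable from `start` given `observed`, via Bayes ball rules."""
    parents = {n: set() for n in children}
    for p, kids in children.items():
        for k in kids:
            parents[k].add(p)
    # Nodes that activate a v-structure: Z and all ancestors of Z.
    anc, frontier = set(observed), list(observed)
    while frontier:
        n = frontier.pop()
        for p in parents[n] - anc:
            anc.add(p)
            frontier.append(p)
    visited, result = set(), set()
    frontier = [(start, "up")]          # as if the ball arrived from a child
    while frontier:
        node, direction = frontier.pop()
        if (node, direction) in visited:
            continue
        visited.add((node, direction))
        if node not in observed:
            result.add(node)
        if direction == "up" and node not in observed:
            frontier += [(p, "up") for p in parents[node]]    # chain upward
            frontier += [(c, "down") for c in children[node]]
        elif direction == "down":
            if node not in observed:                          # cascade onward
                frontier += [(c, "down") for c in children[node]]
            if node in anc:                                   # active v-structure
                frontier += [(p, "up") for p in parents[node]]
    return result

# Burglary graph: B -> A <- E, A -> J, A -> M
g = {"B": {"A"}, "E": {"A"}, "A": {"J", "M"}, "J": set(), "M": set()}
print("E" in reachable(g, "B", set()))   # False: B ⊥ E (marginally)
print("E" in reachable(g, "B", {"A"}))   # True: observing A couples B and E
print("M" in reachable(g, "J", {"A"}))   # False: J ⊥ M | A
```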

7.5 Factorization of Bayesian Network

  • The joint distribution of the random variables (RVs) in the graph can be expressed as the product of the conditional distributions $P(\mathrm{Child} \mid \mathrm{Parents})$ of every family in the graph.

  • $P(X_1, X_2, \ldots, X_n) = \prod_i P(X_i \mid \mathrm{Parents}(X_i))$

  • Factorization theorem

    This can dramatically reduce the number of parameters!

    • The joint distribution factorizes into each node's distribution given its parents
      • $P(X) = \prod_i P(X_i \mid X_{\pi_i})$
      • $X$ is the set of joint random variables, e.g., $x_1, x_2, \ldots, x_n$
    • $X_{\pi_i}$ is the set of parent nodes of $X_i$
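
For the five-variable burglary network this saving is easy to count (a back-of-the-envelope check, assuming all variables are boolean):

```python
# Free parameters for 5 boolean variables:
full_joint = 2**5 - 1           # 31: one probability per row, minus normalization
factorized = 1 + 1 + 4 + 2 + 2  # 10: P(B), P(E), P(A|B,E), P(J|A), P(M|A)
print(full_joint, factorized)   # 31 vs 10 -- and the gap grows exponentially
```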

Plate Notation

  • Let's consider a certain Gaussian model
    • Many Xs: rather than drawing every repeated variable, a plate (a rectangle with a count N) stands for N copies of the node inside it

7.6 Inference Question on Bayesian Network

Question 1 : Likelihood

  • Given a set of evidence, what is the likelihood of the evidence set?
    • $X = \{X_1, \ldots, X_N\}$: all random variables
    • $X_V = \{X_{k+1}, \ldots, X_N\}$: evidence variables (what has already been observed, what we want to examine, etc.)
    • $X_H = X - X_V = \{X_1, \ldots, X_k\}$: hidden variables
  • General form
    • $P(X_V) = \sum_{X_H} P(X_H, X_V)$

Question 2 : Conditional Probability

  • Given a set of evidence, what is the conditional probability of the hidden variables of interest?
    • $X_H = \{Y, Z\}$
      • Y: hidden variables of interest
      • Z: hidden variables not of interest
  • General form
    • $P(Y|X_V) = \sum_Z P(Y,Z|X_V) = \frac{\sum_Z P(Y,Z,X_V)}{P(X_V)} = \frac{\sum_Z P(Y,Z,X_V)}{\sum_{y,z} P(Y=y, Z=z, X_V)}$
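
Both questions can be answered (inefficiently) by enumerating the full joint; a brute-force sketch on the burglary network with the textbook CPT values, for evidence $jc = mc = \text{true}$ (variable elimination in §7.7 does this more cleverly):

```python
from itertools import product

# Brute-force inference on the burglary network (textbook CPTs, illustrative).
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}
P_M = {True: 0.70, False: 0.01}
pr = lambda p, v: p if v else 1 - p

def joint(b, e, a, j, m):
    return (pr(P_B, b) * pr(P_E, e) * pr(P_A[(b, e)], a)
            * pr(P_J[a], j) * pr(P_M[a], m))

# Question 1 -- likelihood of the evidence: sum out the hidden B, E, A.
likelihood = sum(joint(b, e, a, True, True)
                 for b, e, a in product([True, False], repeat=3))

# Question 2 -- conditional of the hidden variable of interest, B:
p_b = sum(joint(True, e, a, True, True)
          for e, a in product([True, False], repeat=2)) / likelihood
print(likelihood, p_b)   # ~0.00208 and ~0.284
```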

Question 3 : Most Probable Assignment

$\underset{a}{\arg\max}\, P(A \mid B=\text{true}, MC=\text{true})$

  • Given a set of evidence, what is the most probable assignment, or explanation, given the evidence?
    • Some variables of interest
    • Need to utilize inference question 2
      • Conditional probability
    • Maximum a posteriori configuration of Y
  • Applications of the a posteriori configuration
    • Prediction
      • B, E → A
    • Diagnosis
      • A → B, E
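
A sketch of the argmax above: score each assignment of $A$ by its (unnormalized) posterior given $B = mc = \text{true}$, summing out the remaining hidden variable $E$ (JC is unobserved and sums to 1, so it drops out; textbook CPT values again):

```python
# Most probable assignment of A given B = mc = true (illustrative CPTs).
P_E = 0.002
P_A = {(True, True): 0.95, (True, False): 0.94}  # P(A=true | B=true, E)
P_M = {True: 0.70, False: 0.01}                  # P(mc=true | A)
pr = lambda p, v: p if v else 1 - p

def score(a):  # proportional to P(A=a, mc=true | B=true), E summed out
    return sum(pr(P_E, e) * pr(P_A[(True, e)], a)
               for e in (True, False)) * P_M[a]

print(max((True, False), key=score))  # True: the alarm most likely rang
```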

7.7 Variable Elimination

Marginalization and Elimination

  • Computing joint probabilities is a key
    • How to compute them?
    • Many, many, many multiplications and summations
      • $P(a=\text{true}, b=\text{true}, mc=\text{true}) = \sum_{JC}\sum_{E} P(a,b,E,JC,mc) = \sum_{JC}\sum_{E} P(JC|a)\,P(mc|a)\,P(a|b,E)\,P(E)\,P(b)$
    • In big-O notation?
  • Is there any better method (other than brute force)?
    • What if we move the summations around?
      $P(a,b,mc) = \sum_{JC}\sum_{E} P(a,b,E,JC,mc) = P(b)\,P(mc|a)\sum_{JC} P(JC|a)\sum_{E} P(a|b,E)\,P(E)$
    • Did we reduce the computational complexity?

Variable Elimination

  • Preliminary
    • $P(e|jc,mc) = \alpha P(e,jc,mc)$
    • $\alpha = \frac{1}{P(jc,mc)}$
  • Joint probability ($e = jc = mc = \text{true}$)
    • $P(e|jc,mc) = \alpha P(e) \sum_B P(B) \sum_A P(A|B,e)\,P(jc|A)\,P(mc|A)$
    • Line up the terms in topological order (B, E, A, JC, MC)
    • Consider a probability distribution as a function
      • $f_E(E=t) = 0.002$
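
Putting this together, a sketch of the elimination itself: sum out $A$ first into a factor $f_A(B, E)$, then $B$ into $f_B(E)$, and finally normalize over $E$ (textbook CPTs, evidence $jc = mc = \text{true}$):

```python
# Variable elimination for P(E | jc=mc=true) on the burglary network
# (textbook CPTs, illustrative): push the summations inward and cache factors.
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}
P_M = {True: 0.70, False: 0.01}
pr = lambda p, v: p if v else 1 - p

# Eliminate A: f_A(b, e) = sum_a P(a|b,e) P(jc|a) P(mc|a)
f_A = {(b, e): sum(pr(P_A[(b, e)], a) * P_J[a] * P_M[a] for a in (True, False))
       for b in (True, False) for e in (True, False)}

# Eliminate B: f_B(e) = sum_b P(b) f_A(b, e)
f_B = {e: sum(pr(P_B, b) * f_A[(b, e)] for b in (True, False))
       for e in (True, False)}

# Normalize over E: P(e | jc, mc) = alpha * P(e) * f_B(e)
unnorm = {e: pr(P_E, e) * f_B[e] for e in (True, False)}
alpha = 1.0 / sum(unnorm.values())
print(alpha * unnorm[True])   # ~0.176: P(E=true | jc, mc)
```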

7.8 Potential Function and Clique Graph

Potential function

  • $P(A,B,C,D) = P(A|B)\,P(B|C)\,P(C|D)\,P(D)$
  • Let's define a potential function
    • Potential function:
      a function that is not yet a probability function, but which becomes a probability distribution function once normalized
    • Potential functions on cliques of nodes
      • $\psi(a,b), \psi(b,c), \psi(c,d)$
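
A one-step illustration with an invented potential: its values need not sum to 1, but dividing by the normalizing constant $Z$ (the partition function) turns it into a distribution:

```python
# An unnormalized potential psi(a, b); hypothetical values.
psi = {(True, True): 8.0, (True, False): 2.0,
       (False, True): 1.0, (False, False): 5.0}

Z = sum(psi.values())                     # partition function, here 16.0
p = {ab: v / Z for ab, v in psi.items()}  # a proper joint distribution now
print(sum(p.values()))                    # 1.0
```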

References
문일철 (Il-Chul Moon). 2017. 인공지능 및 기계학습 개론 II (Introduction to Artificial Intelligence and Machine Learning II). KOOC.
