7.1 Probability Concepts
Probability
- It is the relative frequency of an event, e.g., P(A=true)
Conditional Probability
- Usually we are more interested in the probability of some event given a situation, i.e., after another event has been observed
- P(A=true|B=true)
- Out of all the outcomes in which B is true, what fraction also has A equal to true?
7.2 Probability Theorems
- Law of Total Probability
- a.k.a "summing out" or marginalization
- P(a) = ∑_b P(a, b) = ∑_b P(a∣b) P(b)
- Holds when B is any other random variable
- The joint distribution contains all the information we need to compute any marginal or conditional probability
Computing with Probabilities: Chain Rule of Factorization
- P(a, b, c, ..., z) = P(a∣b, c, ..., z) P(b, c, ..., z)
- Repeatedly applying this idea, we can write
- P(a, b, c, ..., z) = P(a∣b, c, ..., z) P(b∣c, ..., z) ... P(z)
Joint Probability Distribution
- Joint probabilities can be between any number of variables
- P(Int=true, Effort=true, GPA=true)
- P(Int=true) = sum of P(Int, Effort, GPA) over the rows where Int=true
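As a concrete sketch, assuming a made-up joint table over binary Int, Effort, GPA (the probabilities are hypothetical and only need to sum to 1), marginals and conditionals fall out by summation:

```python
# Hypothetical joint distribution over three binary variables
# (Int, Effort, GPA); the numbers are made up for illustration.
joint_table = {
    (True,  True,  True):  0.20, (True,  True,  False): 0.05,
    (True,  False, True):  0.10, (True,  False, False): 0.10,
    (False, True,  True):  0.05, (False, True,  False): 0.15,
    (False, False, True):  0.05, (False, False, False): 0.30,
}

# Marginalization ("summing out"): P(Int=true) is the sum of the
# rows in which Int is true.
p_int = sum(p for (i, e, g), p in joint_table.items() if i)
print(p_int)  # 0.45

# Conditional probability from the joint: P(Int=true | Effort=true).
p_int_and_effort = sum(p for (i, e, g), p in joint_table.items() if i and e)
p_effort = sum(p for (i, e, g), p in joint_table.items() if e)
print(p_int_and_effort / p_effort)  # 0.25 / 0.45 ≈ 0.556
```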
Independence
- Variables A and B are independent if any of the following hold:
- P(A|B) = P(A)
- P(A,B) = P(A)P(B)
- P(B|A) = P(B)
Conditional vs. Marginal Independence
Marginal independence
- If P(A∣B) > P(A), then A and B are not marginally independent!
- X and Y are marginally independent if and only if P(X) = P(X∣Y)
Conditional independence
- P(A∣B,C) = P(A∣C)
- Given C, A and B are independent (see the numeric sketch below)
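A minimal numeric check, assuming a hypothetical common-cause structure C → A, C → B with made-up CPT entries: A and B are marginally dependent but conditionally independent given C.

```python
# Hypothetical common-cause model C -> A, C -> B: A and B are
# dependent marginally but independent once C is given.
P_c = {True: 0.5, False: 0.5}
P_a = {True: 0.9, False: 0.2}   # P(A=true | C=c)
P_b = {True: 0.8, False: 0.1}   # P(B=true | C=c)

def cc_joint(a, b, c):
    """Factorized joint P(c) P(a|c) P(b|c)."""
    pa = P_a[c] if a else 1 - P_a[c]
    pb = P_b[c] if b else 1 - P_b[c]
    return P_c[c] * pa * pb

T = (True, False)
p_a = sum(cc_joint(True, b, c) for b in T for c in T)
p_b_true = sum(cc_joint(a, True, c) for a in T for c in T)
p_ab = sum(cc_joint(True, True, c) for c in T)
print(p_a, p_ab / p_b_true)   # 0.55 vs ≈0.82: P(A|B) > P(A), not marginal
for c in T:                   # but P(A|B,C) equals P(A|C): conditional
    p_abc = cc_joint(True, True, c)
    p_bc = sum(cc_joint(a, True, c) for a in T)
    p_ac = sum(cc_joint(True, b, c) for b in T)
    print(p_abc / p_bc, p_ac / P_c[c])  # 0.9 0.9, then 0.2 0.2
```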
7.3 Interpretation of Bayesian Network
Detour: Naive Bayes Classifier (all features assumed conditionally independent given the class)
Bayesian Network
- A graphical notation of
- Random variables
- Conditional independence
- To obtain a compact representation of the full joint distribution
- Syntax
- An acyclic (no cycles) and directed (edges have direction) graph, i.e., a DAG
- A set of nodes
- A random variable
- A conditional distribution given its parents
- P(X1∣Parents(X1))
- A set of links
- Direct influence from the parent to the child
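As a sketch of this syntax, the classic burglary network (Burglary, Earthquake → Alarm → JohnCalls, MaryCalls) used throughout the examples below can be written as plain CPT dictionaries; the numbers are the usual textbook ones for this example and are only illustrative. The later sections reuse joint() and these tables.

```python
# The burglary network as plain dictionaries: one prior or CPT per
# node. The numbers are the usual textbook ones, shown for illustration.
P_B = {True: 0.001, False: 0.999}   # P(Burglary)
P_E = {True: 0.002, False: 0.998}   # P(Earthquake)
P_A = {                              # P(Alarm=true | Burglary, Earthquake)
    (True, True): 0.95, (True, False): 0.94,
    (False, True): 0.29, (False, False): 0.001,
}
P_J = {True: 0.90, False: 0.05}     # P(JohnCalls=true | Alarm)
P_M = {True: 0.70, False: 0.01}     # P(MaryCalls=true | Alarm)

def joint(b, e, a, j, m):
    """Factorized joint: P(b) P(e) P(a|b,e) P(j|a) P(m|a)."""
    pa = P_A[(b, e)] if a else 1 - P_A[(b, e)]
    pj = P_J[a] if j else 1 - P_J[a]
    pm = P_M[a] if m else 1 - P_M[a]
    return P_B[b] * P_E[e] * pa * pj * pm
```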
Interpretation of Bayesian Network
Components of Bayesian Network
- Qualitative components
- Prior knowledge of causal relations
- Learning from data
- Frequently used structures
- Structural aspects
- Quantitative components
- Conditional probability tables
- Probability distribution assigned to nodes
- Probability computation depends on both
- the qualitative component (structure) and the quantitative component (CPTs)
7.4 Bayes Ball Algorithm
Typical Local Structures
- Common parent
- Fixing "alarm" decouples "JohnCalls" and "MaryCalls"
- J ⊥ M ∣ A: given the alarm (whether or not it rang), JohnCalls and MaryCalls are independent
- P(J,M|A)=P(J|A)P(M|A)
- Cascading
- Fixing "alarm" decouples "Buglary" and "MaryCalls"
- B⊥M|A ~ 알람이 주어졌을 때 도둑이 들거나 안들거나 상관없이 MarryCalls의 확률은 독립적이다.
- P(M|B,A)=P(M|A)
- V-Structure
- Fixing "alarm" couples "Buglary" and "Earthquake" (B와 E가 A라는 common child를 가짐)
- 특정 정보를 알게되면, 다른 정보들의 관계가 생김
- 알람이 울렸을 때 지진이 일어나지 않았다면 도둑이 들었다고 유추할 수 있음
- ∼ (B ⊥ E|A) : A가 일어났을 때 B와 E는 독립적이지 않음. 즉, 관계가 생김.
- P(B,E,A)=P(B)P(E)P(A|B,E)
- 특정 정보가 알려져 있지 않으면 독립임
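Reusing joint() from the sketch in 7.3, a brute-force marginalizer makes the (de)coupling visible numerically; prob() and its keyword convention are illustrative helpers, not part of the course material:

```python
from itertools import product

bools = (True, False)

def prob(b=None, e=None, a=None, j=None, m=None):
    """Marginal/joint probability: sum joint() over every assignment
    consistent with the fixed arguments (None = summed out)."""
    fixed = (b, e, a, j, m)
    return sum(
        joint(*vals)
        for vals in product(bools, repeat=5)
        if all(f is None or v == f for f, v in zip(fixed, vals))
    )

# Marginally, B and E are independent: P(B) equals P(B | E).
print(prob(b=True), prob(b=True, e=True) / prob(e=True))  # 0.001 0.001
# Observing the common child A couples them: the earthquake
# "explains away" the alarm, so P(B | a, e) drops far below P(B | a).
print(prob(b=True, a=True) / prob(a=True))                  # ≈ 0.374
print(prob(b=True, a=True, e=True) / prob(a=True, e=True))  # ≈ 0.003
```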
Bayes Ball Algorithm
Knowing the conditional independence relations is important!
Is one random variable independent of another? And under which conditions do they become independent?
Roll a Bayes ball on the graph: if it can reach the other node, the two variables are not independent
- Purpose: checking X_A ⊥ X_B ∣ X_C
- Shade all nodes in X_C
- Place balls at each node in X_A
- Let the balls roll on the graph according to the Bayes ball rules
- Then, ask whether any ball reaches X_B
- Markov Blanket
- A ⊥ B ∣ Blanket
- Blanket = {parents, children, children's other parents}
- Given its Markov blanket, a node is conditionally independent of all nodes outside the blanket (= B)
- D-Separation
- X is d-separated (directionally separated) from Z given Y if we cannot send a ball from any node in X to any node in Z using the Bayes ball algorithm (see the sketch below)
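To make these checks mechanical, here is a minimal Python sketch of the Bayes ball reachability test (the standard BFS formulation over (node, direction) states); the child-list encoding and node names are assumptions for illustration:

```python
from collections import deque

def d_separated(children, X, Z, observed):
    """Bayes ball: True iff X is d-separated from Z given `observed`.
    `children` maps every node (including leaves) to its child list."""
    parents = {n: set() for n in children}
    for n, chs in children.items():
        for c in chs:
            parents[c].add(n)

    # Ancestors of observed nodes, needed for the v-structure rule.
    ancestors, frontier = set(), set(observed)
    while frontier:
        n = frontier.pop()
        for p in parents[n] - ancestors:
            ancestors.add(p)
            frontier.add(p)

    # Roll balls from X; a state is (node, direction of arrival).
    visited, reachable = set(), set()
    queue = deque((x, 'up') for x in X)  # start as if from a child
    while queue:
        node, direction = queue.popleft()
        if (node, direction) in visited:
            continue
        visited.add((node, direction))
        if node not in observed:
            reachable.add(node)
        if direction == 'up' and node not in observed:
            # Arrived from a child: pass up to parents, down to children.
            queue.extend((p, 'up') for p in parents[node])
            queue.extend((c, 'down') for c in children[node])
        elif direction == 'down':
            if node not in observed:
                # Cascading: keep rolling down to the children.
                queue.extend((c, 'down') for c in children[node])
            if node in observed or node in ancestors:
                # V-structure: an observed (or ancestor-of-observed)
                # collider bounces the ball back up to the parents.
                queue.extend((p, 'up') for p in parents[node])
    return not (reachable & set(Z))

g = {'B': ['A'], 'E': ['A'], 'A': ['JC', 'MC'], 'JC': [], 'MC': []}
print(d_separated(g, {'JC'}, {'MC'}, {'A'}))  # True: common parent given
print(d_separated(g, {'B'}, {'E'}, set()))    # True: v-structure unobserved
print(d_separated(g, {'B'}, {'E'}, {'A'}))    # False: explaining away
```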
7.5 Factorization of Bayesian Network
- The joint distribution of the random variables (RVs) in the graph can be written as the product of each family's conditional distribution P(Child ∣ Parents)
- P(X_1, X_2, ..., X_n) = ∏_i P(X_i ∣ Parents(X_i))
- Factorization theorem
- It can greatly reduce the number of parameters! (see the count below)
- Factorizes according to the node given its parents
- P(X) = ∏_i P(X_i ∣ X_{π_i})
- X is the set of joint random variables, e.g., X_1, X_2, ..., X_n
- X_{π_i} is the set of parent nodes of X_i
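A quick count for the burglary network (a small sketch, assuming all five variables are binary):

```python
# Parameter counting: the full joint over 5 binary variables needs
# 2^5 - 1 free parameters, the factorized form only one parameter
# per CPT row, i.e., 2^|parents| per node.
num_parents = {'B': 0, 'E': 0, 'A': 2, 'JC': 1, 'MC': 1}
print(2**5 - 1)                                 # 31
print(sum(2**k for k in num_parents.values()))  # 1+1+4+2+2 = 10
```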
Plate Notation
- Example: N i.i.d. observations from a Gaussian model are drawn once inside a plate labeled N, instead of repeating the node N times
7.6 Inference Questions on Bayesian Network
Question 1 : Likelihood
- Given a set of evidence, what is the likelihood of the evidence set?
- X = X_1, ..., X_N: all random variables
- X_V = X_{k+1}, ..., X_N: evidence variables (the values already observed)
- X_H = X − X_V = X_1, ..., X_k: hidden variables
- General form
- P(X_V) = ∑_{X_H} P(X_H, X_V)
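With the brute-force prob() helper sketched in 7.4, the likelihood question reduces to one call (illustrative only; plain enumeration is exponential in the number of hidden variables):

```python
# Likelihood of the evidence X_V = {jc=true, mc=true}: sum the
# factorized joint over the hidden variables B, E, A.
print(prob(j=True, m=True))  # ≈ 0.00208 with the textbook CPTs
```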
Question 2 : Conditional Probability
- Given a set of evidence, what is the conditional probability of the hidden variables of interest?
- X_H = {Y, Z}
- Y: hidden variables of interest
- Z: hidden variables not of interest
- General form
P(Y ∣ X_V) = ∑_Z P(Y, Z ∣ X_V) = ∑_Z P(Y, Z, X_V) / P(X_V) = ∑_Z P(Y, Z, X_V) / ∑_{y,z} P(Y=y, Z=z, X_V)
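The same prob() helper answers the conditional question: it implicitly sums out the remaining hidden variables Z = {E, A}, and dividing by the evidence likelihood normalizes:

```python
# Conditional query P(B=true | jc=true, mc=true): sum out Z = {E, A},
# then normalize by the evidence likelihood P(jc, mc).
print(prob(b=True, j=True, m=True) / prob(j=True, m=True))  # ≈ 0.284
```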
Question 3 : Most Probable Assignment
argmax_a P(A=a ∣ B=true, MC=true)
- Given a set of evidence, what is the most probable assignment, or explanation given the evidence?
- Some variables of interest
- Need to utilize the inference question 2
- Maximum a posteriori configuration of Y
- Applications: finding the most probable explanation given the evidence (see the argmax sketch below)
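A brute-force sketch of the argmax, reusing joint() and bools from the earlier sketches; since P(jc, mc) is a constant, normalization cannot change the argmax and is skipped:

```python
# MAP assignment of the hidden variables (B, E, A) with the evidence
# jc = mc = true clamped: argmax over the unnormalized posterior.
best = max(
    ((b, e, a) for b in bools for e in bools for a in bools),
    key=lambda h: joint(*h, True, True),
)
print(best)  # (False, False, True) with the textbook CPTs
```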
7.7 Variable Elimination
Marginalization and Elimination
- Computing joint probabilities is the key
- How to compute them?
- Many, many, many multiplications and summations
P(a=true, b=true, mc=true) = ∑_JC ∑_E P(a, b, E, JC, mc) = ∑_JC ∑_E P(JC∣a) P(mc∣a) P(a∣b,E) P(E) P(b)
- In big-O notation?
- Is there any better method than brute force?
- What if we move the summations around?
P(a, b, mc) = ∑_JC ∑_E P(a, b, E, JC, mc) = P(b) P(mc∣a) ∑_JC P(JC∣a) ∑_E P(a∣b,E) P(E)
- Did we reduce the computational complexity?
Variable Elimination
- Preliminary
- P(e∣jc,mc)=αP(e,jc,mc)
- α = 1 / P(jc, mc)
- Joint probability (e = jc = mc = true)
P(e ∣ jc, mc) = α P(e, jc, mc) = α P(e) ∑_B P(b) ∑_A P(a∣b,e) P(jc∣a) P(mc∣a)
- Line up the terms in the topological order (B, E, A, JC, MC)
- Consider a probability distribution as a function
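A sketch of this elimination order in code, reusing the CPT dictionaries and bools from the earlier sketches: A is summed out innermost, then B, and α normalizes at the end:

```python
# Variable elimination for P(E | jc=true, mc=true).
def unnorm_e(e):
    total = 0.0
    for b in bools:                        # eliminate B
        inner = 0.0
        for a in bools:                    # eliminate A first
            pa = P_A[(b, e)] if a else 1 - P_A[(b, e)]
            inner += pa * P_J[a] * P_M[a]  # P(a|b,e) P(jc|a) P(mc|a)
        total += P_B[b] * inner
    return P_E[e] * total

alpha = 1.0 / sum(unnorm_e(e) for e in bools)  # alpha = 1 / P(jc, mc)
print({e: alpha * unnorm_e(e) for e in bools})  # P(E=true | jc, mc) ≈ 0.18
```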
7.8 Potential Function and Clique Graph
Potential function
P(A,B,C,D)=P(A∣B)P(B∣C)P(C∣D)P(D)
- Let's define a potential function
- Potential function: a function that is not itself a probability function, but once normalized it becomes a probability distribution
- Potential function on nodes
- ψ(a,b),ψ(b,c),ψ(c,d)
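A minimal sketch, assuming made-up potential tables on the chain A - B - C - D; it only illustrates that normalizing a product of potentials yields a valid distribution:

```python
from itertools import product

# Hypothetical unnormalized potentials; the entries need not sum
# to 1 or come from any probability table.
psi_ab = {(a, b): 1.0 + a + 2 * b for a in (0, 1) for b in (0, 1)}
psi_bc = {(b, c): 2.0 - b + c for b in (0, 1) for c in (0, 1)}
psi_cd = {(c, d): 1.0 + 3 * c * d for c in (0, 1) for d in (0, 1)}

def unnorm(a, b, c, d):
    return psi_ab[(a, b)] * psi_bc[(b, c)] * psi_cd[(c, d)]

# Normalizing the product of potentials gives a joint distribution:
# P(a,b,c,d) = (1/Z) psi(a,b) psi(b,c) psi(c,d).
Z = sum(unnorm(*v) for v in product((0, 1), repeat=4))
P = {v: unnorm(*v) / Z for v in product((0, 1), repeat=4)}
assert abs(sum(P.values()) - 1.0) < 1e-9
```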
References
문일철. 2017. 인공지능 및 기계학습 개론 II. KOOC.