Message Passing and Junction Tree Algorithms

홍성우 · January 15, 2021

Recap

  • Message passing operates on undirected graphs, so the first step is to convert a directed graphical model into an undirected one by moralizing the graph (a minimal sketch of this step follows below).
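
A minimal sketch of that conversion, assuming the DAG is given as a {child: parents} dict (the function name and input format here are illustrative, not from the original post):

```python
def moralize(parents):
    """Moralize a DAG given as {child: [parent, ...]}; return undirected edges."""
    edges = set()
    for child, pas in parents.items():
        for p in pas:
            edges.add(frozenset((child, p)))   # keep original links, now undirected
        for i, p in enumerate(pas):
            for q in pas[i + 1:]:
                edges.add(frozenset((p, q)))   # "marry" every pair of co-parents
    return edges

# DAG from the example below (G has parents D and I, J has parents L and S, ...)
dag = {'D': ['C'], 'G': ['D', 'I'], 'S': ['I'], 'L': ['G'],
       'J': ['L', 'S'], 'H': ['G', 'J']}
print(sorted(tuple(sorted(e)) for e in moralize(dag)))
```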

Example
Let's assume we have a directed graph as below.

The joint probability of all the variables in the graph is given by,

P(C,D,I,G,S,L,J,H) = P(C)P(D|C)P(I)P(G|D,I)P(S|I)P(L|G)P(J|L,S)P(H|G,J)

Now, as preparation for message passing, we moralize the graph as follows.

And the joint probability of the undirected graphical model is given by,

p(C,D,I,G,S,L,J,H) = \psi_C(C)\psi_D(D,C)\psi_I(I)\psi_G(G,I,D)\psi_S(S,I)\psi_L(L,G,S)\psi_J(L,S,J)\psi_H(H,G,J)

Now we pick an ordering in which to eliminate the random variables.

Factor Graph

  • Factor graph is a graphical model representation that unifies directed and undirected models (a data-structure sketch follows below).
  • Undirected bipartite graph with two kinds of nodes:
    - Round nodes represent variables
    - Square nodes represent factors
  • There is an edge from each variable to every factor that mentions it.

  • Variables:
    X = \{X_1, ..., X_i, ..., X_n\}
  • Factors:
    \psi_a, \psi_b, \psi_c, \ldots, where a, b, c, \ldots \subseteq \{1, ..., n\}
  • Joint Distribution (Gibbs Distribution):
    p(x) = \frac{1}{Z} \prod_\alpha \psi_\alpha(x_\alpha)
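
Since a factor graph is just a bipartite adjacency structure, a small data-structure sketch may help. The class and method names below are illustrative, not a standard API:

```python
import numpy as np

class FactorGraph:
    """Bipartite structure: variable nodes on one side, factor nodes on the other."""
    def __init__(self):
        self.cards = {}     # variable name -> cardinality
        self.factors = []   # list of (scope, table) pairs

    def add_variable(self, name, cardinality):
        self.cards[name] = cardinality

    def add_factor(self, scope, table):
        table = np.asarray(table, dtype=float)
        # an edge implicitly runs from this factor to every variable in its scope
        assert table.shape == tuple(self.cards[v] for v in scope)
        self.factors.append((tuple(scope), table))

    def neighbors(self, var):
        """Indices of the factor nodes adjacent to a variable node."""
        return [a for a, (scope, _) in enumerate(self.factors) if var in scope]

fg = FactorGraph()
for v in 'abc':
    fg.add_variable(v, 2)
fg.add_factor(('a', 'b'), np.ones((2, 2)))
fg.add_factor(('b', 'c'), np.ones((2, 2)))
print(fg.neighbors('b'))  # -> [0, 1]: variable b touches both factors
```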

Inference Example

Consider a directed graph below.

The joint probability of the directed graph above is represented as follows.

p(a,b,c,d,e) = p(a|b)p(b|c,d)p(c)p(d)p(e|d)

We can moralize the directed graph as follows.

The undirected graph above can be factorized as follows.

f_1(a,b)f_2(b,c,d)f_3(c)f_4(d,e)f_5(d)
  • How would we calculate p(a,b)? (A numeric check follows below.)
    p(a,b) = f_1(a,b) \sum_{c,d} f_2(b,c,d) f_3(c) f_5(d) \sum_e f_4(d,e)
  • Note that f_1(a,b) = P(a|b) and f_4(d,e) = P(e|d)
  • Note that \sum_e f_4(d,e) = m_e(d)
  • Note that f_5(d) = m_d(d)
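
We can sanity-check this computation numerically. The sketch below uses random tables for the factors (shapes and seed are arbitrary, so the result is unnormalized unless the tables are true CPDs) and compares the staged elimination against brute-force summation:

```python
import numpy as np

rng = np.random.default_rng(0)
f1 = rng.random((2, 2))      # f1(a,b) = P(a|b)
f2 = rng.random((2, 2, 2))   # f2(b,c,d)
f3 = rng.random(2)           # f3(c)
f4 = rng.random((2, 2))      # f4(d,e) = P(e|d)
f5 = rng.random(2)           # f5(d)

m_e = f4.sum(axis=1)                                # sum_e f4(d,e) = m_e(d)
inner = np.einsum('bcd,c,d,d->b', f2, f3, f5, m_e)  # sum_{c,d} of the rest
p_ab = f1 * inner[None, :]                          # f1(a,b) * inner(b)

# brute force: sum over c, d, e all at once for comparison
brute = np.einsum('ab,bcd,c,de,d->ab', f1, f2, f3, f4, f5)
assert np.allclose(p_ab, brute)
```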

Message passing in Belief Propagation

  • \psi is an unnormalized function of the dependence relation.
  • Conceptually, we can view \psi as a message passed from other nodes onto a node. The image below gives an intuition of message passing.

  • A "belief" in X_1 is basically the product of all the incoming factor messages, \psi_1 \cdot \psi_2 \cdot \psi_3.

Variable Belief

b_i(x_i) = \prod_{\alpha \in N(i)} \mu_{\alpha \rightarrow i}(x_i)

Variable Message

\mu_{i \rightarrow \alpha}(x_i) = \prod_{\beta \in N(i) \backslash \alpha} \mu_{\beta \rightarrow i}(x_i)

The equation above means that the outbound message from a variable node to a factor \alpha is the product of all its inbound messages, except the one coming from \alpha itself.

Factor Belief

b_\alpha(x_\alpha) = \psi_\alpha(x_\alpha) \prod_{i \in N(\alpha)} \mu_{i \rightarrow \alpha}(x_\alpha[i])

Factor Message

\mu_{\alpha \rightarrow i}(x_i) = \sum_{x_\alpha : x_\alpha[i] = x_i} \psi_\alpha(x_\alpha) \prod_{j \in N(\alpha) \backslash i} \mu_{j \rightarrow \alpha}(x_\alpha[j])

where \alpha is the index of the factor sending the message and i is the index of the variable node receiving it.
The equation above means that the factor message from \alpha to node i is obtained by (1) taking the product of all inbound messages except the one from i, (2) multiplying the factor value into (1), and (3) integrating (summing) out all the unnecessary variables.

In the example below, the process is described by the following.

  • (1) Factor \psi_1 receives a message from node X_3:
    \mu_{x_3 \rightarrow \psi_1}
  • (2) Factor \psi_1 applies the CPD onto the message:
    \psi_1 \cdot \mu_{x_3 \rightarrow \psi_1}
  • (3) Integrate out all the variables unnecessary with regard to X_1:
    \sum_{x_3} \psi_1 \cdot \mu_{x_3 \rightarrow \psi_1}
    The equation above shows the message passed from node X_3 to X_1 (a numeric version follows below).
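
Numerically, with \psi_1 stored as a table over (x_1, x_3), steps (1)–(3) are just an elementwise multiply and a sum. The tables below are made up for illustration:

```python
import numpy as np

psi1 = np.array([[0.9, 0.1],         # psi1(x1, x3), rows indexed by x1
                 [0.4, 0.6]])
m_x3_to_psi1 = np.array([0.7, 0.3])  # (1) inbound message from X3

# (2) apply the factor to the message, then (3) sum x3 out
m_psi1_to_x1 = (psi1 * m_x3_to_psi1[None, :]).sum(axis=1)
print(m_psi1_to_x1)                  # the message is now a function of x1 only
```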

Summary of the Messages

Variable to Factor message

\mu_{x \rightarrow f}(x) = \prod_{g \in N(x) \backslash \{f\}} \mu_{g \rightarrow x}(x)

Factor to Variable message

\mu_{f \rightarrow x}(x) = \sum_{X_f \backslash x} \theta_f(X_f) \prod_{y \in N(f) \backslash \{x\}} \mu_{y \rightarrow f}(y)

where the first part of the equation (the sum over X_f \backslash x) is the process of integrating out all the unnecessary variables.

Sum-Product Belief Propagation

Input: a factor graph with no cycles
Output: exact marginals for each variable and factor.

Algorithm:
1. Initialize the messages to the uniform distribution.
    \mu_{i \rightarrow \alpha}(x_i) = 1,\ \mu_{\alpha \rightarrow i}(x_i) = 1
2. Choose a root node.
3. Send messages from the leaves to the root, then send messages from the root to the leaves.
    \mu_{i \rightarrow \alpha}(x_i) = \prod_{\beta \in N(i) \backslash \alpha} \mu_{\beta \rightarrow i}(x_i)
    \mu_{\alpha \rightarrow i}(x_i) = \sum_{x_\alpha : x_\alpha[i] = x_i} \psi_\alpha(x_\alpha) \prod_{j \in N(\alpha) \backslash i} \mu_{j \rightarrow \alpha}(x_\alpha[j])
4. Compute the beliefs (unnormalized marginals).
    b_i(x_i) = \prod_{\alpha \in N(i)} \mu_{\alpha \rightarrow i}(x_i)
    b_\alpha(x_\alpha) = \psi_\alpha(x_\alpha) \prod_{i \in N(\alpha)} \mu_{i \rightarrow \alpha}(x_\alpha[i])
5. Normalize the beliefs and return the exact marginals (a runnable sketch follows).
    p_i(x_i) \propto b_i(x_i)
    p_\alpha(x_\alpha) \propto b_\alpha(x_\alpha)
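
Putting the whole recipe together, here is a compact sketch on the tiny chain x1 — ψa — x2 — ψb — x3. It uses a flooding schedule (repeated full sweeps) instead of an explicit root-to-leaf ordering, which reaches the same fixed point on a tree; all tables and names are made up for illustration:

```python
import numpy as np

# Chain x1 -- psi_a(x1,x2) -- x2 -- psi_b(x2,x3) -- x3, all variables binary.
factors = {'a': (('x1', 'x2'), np.array([[1.0, 3.0], [2.0, 1.0]])),
           'b': (('x2', 'x3'), np.array([[3.0, 1.0], [1.0, 4.0]]))}
cards = {'x1': 2, 'x2': 2, 'x3': 2}
nbrs = {v: [f for f, (s, _) in factors.items() if v in s] for v in cards}

# Step 1: initialize every message to the uniform (all-ones) message.
mu_vf = {(v, f): np.ones(cards[v]) for v in cards for f in nbrs[v]}
mu_fv = {(f, v): np.ones(cards[v]) for f, (s, _) in factors.items() for v in s}

for _ in range(5):  # more sweeps than the tree diameter => messages are exact
    # Variable -> factor: product of the other inbound factor messages.
    for (v, f) in mu_vf:
        msg = np.ones(cards[v])
        for g in nbrs[v]:
            if g != f:
                msg = msg * mu_fv[(g, v)]
        mu_vf[(v, f)] = msg
    # Factor -> variable: multiply in other inbound messages, sum the rest out.
    for (f, v) in mu_fv:
        scope, table = factors[f]
        prod = table.copy()
        for j, u in enumerate(scope):
            if u != v:
                shape = [1] * len(scope)
                shape[j] = cards[u]
                prod = prod * mu_vf[(u, f)].reshape(shape)
        keep = scope.index(v)
        axes = tuple(k for k in range(len(scope)) if k != keep)
        mu_fv[(f, v)] = prod.sum(axis=axes)

# Steps 4-5: beliefs, then normalize to obtain the exact marginals.
for v in cards:
    b = np.ones(cards[v])
    for f in nbrs[v]:
        b = b * mu_fv[(f, v)]
    print(v, b / b.sum())
```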

Loops cause issues in Message Passing

Recall that message passing requires leaf nodes in order for the (forward, backward) propagation to terminate.

Suppose we have a graph given by

p(a,b,c,d) = f_1(a,b)f_2(b,c)f_3(c,d)f_4(a,d)

If we eliminate variable d, the graph becomes

p(a,b,c) = f_1(a,b)f_2(b,c)\sum_d f_3(c,d)f_4(a,d) = f_1(a,b)f_2(b,c)f_5(a,c)

which still contains a loop: eliminating d just couples a and c, so a, b, c form a cycle and no leaf node ever appears (see the sketch below).
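
The elimination step that produces f_5 is easy to see in code. With random tables (illustrative only), summing d out of f_3(c,d)f_4(a,d) couples a and c:

```python
import numpy as np

rng = np.random.default_rng(1)
f3 = rng.random((2, 2))  # f3(c,d)
f4 = rng.random((2, 2))  # f4(a,d)

# summing d out couples a and c into a brand-new factor
f5 = np.einsum('cd,ad->ac', f3, f4)   # f5(a,c)
print(f5.shape)  # the loop remains: f1(a,b), f2(b,c), f5(a,c)
```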

Clique Graph

Def
A clique graph consists of a set of potentials \phi_1(\chi^1), ..., \phi_n(\chi^n), each defined on a set of variables \chi^i. For neighboring cliques on the graph, defined on sets of variables \chi^i, \chi^j, the intersection \chi^s = \chi^i \cap \chi^j is called the separator and has a corresponding potential \phi_s(\chi^s).

A clique graph represents the function,

\frac{\prod_c \phi_c(\chi^c)}{\prod_s \phi_s(\chi^s)}

Example


\phi(\chi^1)\phi(\chi^2) / \phi(\chi^1 \cap \chi^2)

Example: Probability density
Let's assume we have a graph as follows.

p(a,b,c,d) = \frac{\phi(a,b,c)\phi(b,c,d)}{Z} = \frac{p(a,b,c)p(b,c,d)}{p(b,c)}

Proof Summary

Z \, P(a,b,c) = \sum_d \phi(a,b,c)\phi(b,c,d)
Z \, P(b,c,d) = \sum_a \phi(a,b,c)\phi(b,c,d)

by definition. Multiplying the two equations,

Z^2 \, P(a,b,c)P(b,c,d) = \phi(a,b,c)\phi(b,c,d) \sum_{a,d} \phi(a,b,c)\phi(b,c,d) = Z \, p(a,b,c,d) \cdot Z \, P(b,c)

and dividing both sides by Z^2 P(b,c) gives p(a,b,c,d) = \frac{P(a,b,c)P(b,c,d)}{P(b,c)}, as claimed.
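
The identity is also easy to verify numerically with random positive potentials (a minimal sketch; shapes and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
phi1 = rng.random((2, 2, 2))  # phi(a,b,c)
phi2 = rng.random((2, 2, 2))  # phi(b,c,d)

joint = np.einsum('abc,bcd->abcd', phi1, phi2)
p = joint / joint.sum()                       # p(a,b,c,d)

P_abc = p.sum(axis=3)                         # marginal P(a,b,c)
P_bcd = p.sum(axis=0)                         # marginal P(b,c,d)
P_bc = p.sum(axis=(0, 3))                     # separator marginal P(b,c)

recon = np.einsum('abc,bcd->abcd', P_abc, P_bcd) / P_bc[None, :, :, None]
assert np.allclose(p, recon)                  # p = P(a,b,c)P(b,c,d)/P(b,c)
```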

Junction Tree

  • Form a new representation of the graph in which variables are clustered together, resulting in a singly-connected graph over the cluster variables.

  • Distribution can be written as product of marginal distributions, divided by a product of the intersection of the marginal distributions.

  • Not a remedy for intractability.

  • We can obtain Junction Tree through variable elimination process.

  • The message passing protocol:
    Cluster B is allowed to send a message to a neighbor C only after it has received messages from all neighbors except C.

Pseudo Code

```python
def Collect(C):
    for B in children(C):
        Collect(B)            # recurse down to the leaves first
        send_message(B, C)    # then B passes its message up to C

def Distribute(C):
    for B in children(C):
        send_message(C, B)    # C passes its message down to B first
        Distribute(B)         # then recurse toward the leaves
```
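
A runnable version of the same two recursions on an explicit tree; the children table and send_message stub are illustrative stand-ins for the real clique data structures:

```python
def send_message(src, dst):
    print(f"message {src} -> {dst}")   # stand-in for the actual clique update

children = {'root': ['B1', 'B2'], 'B1': ['B3'], 'B2': [], 'B3': []}

def collect(C):
    for B in children[C]:
        collect(B)
        send_message(B, C)    # inward pass: leaves first, then up toward the root

def distribute(C):
    for B in children[C]:
        send_message(C, B)    # outward pass: root first, then down to the leaves
        distribute(B)

collect('root')
distribute('root')
```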

Message from one Clique to another

The graph above shows the message-passing protocol from cluster B to cluster C.
Clusters B and C each hold their own set of random variables (their cliques).
B \cap C is the set of random variables shared between the two clusters.

\mu_{B \rightarrow C}(u) = \sum_{v \in B \backslash C} \psi_B(u \cup v) \prod_{(A,B) \in E,\ A \neq C} \mu_{A \rightarrow B}(u_A \cup v_A)

In short, the procedure is the following (a numeric sketch follows below).
1. B collects all the incoming messages (except the one coming from C).
2. B applies its own factor (\psi_B in the equation above).
3. Integrate out all the random variables that C doesn't understand (i.e., that C does not include).
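
For concreteness, suppose B holds variables {x, y}, the separator B ∩ C is {x}, and B has one inbound message from A that is a function of x. A minimal numeric sketch (all tables made up):

```python
import numpy as np

psi_B = np.array([[2.0, 1.0],     # psi_B(x, y), rows indexed by x
                  [1.0, 3.0]])
m_A_to_B = np.array([0.6, 0.4])   # inbound message from A, a function of x

# 1. collect the inbound message, 2. apply B's own factor,
# 3. sum out y, the variable that C does not share
mu_B_to_C = (psi_B * m_A_to_B[:, None]).sum(axis=1)
print(mu_B_to_C)                  # a function of the separator variable x only
```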

Formal Junction Tree Algorithm

  1. Moralization: Marry the parents (only for directed distributions).
  2. Triangulation: Ensure that every loop of length 4 or more has a chord; equivalently, after triangulation the largest chordless loops are triangles (cliques of size 3).
  3. Junction Tree: Form a junction tree from the cliques of the triangulated graph, removing any unnecessary links in a loop on the cluster graph. Algorithmically, this can be achieved by finding a maximum-weight spanning tree, with edge weights given by the number of variables in the separator between cliques (see the sketch after this list). Alternatively, given a clique elimination order, one may connect each clique to a single neighboring clique.
  4. Potential Assignment: Assign potentials to junction tree cliques and set the separator potentials to unity.
  5. Message Propagation
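
Step 3 is just a maximum-weight spanning tree over the clique graph, with separator sizes as edge weights. A dependency-free sketch using Kruskal's algorithm with a union-find (the cliques come from a toy triangulated graph):

```python
cliques = [frozenset('abc'), frozenset('bcd'), frozenset('de')]

# candidate edges: weight = size of the separator between two cliques
edges = []
for i in range(len(cliques)):
    for j in range(i + 1, len(cliques)):
        sep = cliques[i] & cliques[j]
        if sep:
            edges.append((len(sep), i, j))

parent = list(range(len(cliques)))   # union-find forest
def find(x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

# Kruskal in descending weight order = maximum-weight spanning tree
tree = []
for w, i, j in sorted(edges, reverse=True):
    ri, rj = find(i), find(j)
    if ri != rj:                     # adding this edge creates no loop
        parent[ri] = rj
        tree.append((set(cliques[i]), set(cliques[j]), w))

print(tree)  # junction tree edges with their separator sizes
```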

Some Facts about Belief Propagation

  • BP is exact on trees. (No approximation)
  • If BP converges, it has reached a local minimum of an objective function (there is no proof of convergence in general, though).
  • If it converges, convergence is fast near the fixed point.

Zoo of Networks

UGM and DGM

  • Used to represent families of probability distributions
  • Arrows and circles have clear, well-defined meanings

Factor Graph

  • A way to represent factorization for both UGM and DGM
  • Bipartite Graph
  • More like a data structure
  • Not meant for reading off independencies (i.e., no I-map functionality)

Clique graph or Junction Tree

  • A data structure used for exact inference and message passing
  • Nodes are clusters of variables
  • Not meant for reading off independencies (i.e., no I-map functionality)
