Now, through the process of moralization, we have converted the graph into an undirected one as follows.
And the joint probability of the undirected graphical model is given by
p(x) = \frac{1}{Z} \prod_{\alpha} \psi_{\alpha}(X_{\alpha})
while the message passed from a factor to a variable is
\mu_{\alpha \to i}(x_i) = \sum_{X_{\alpha} \setminus \{x_i\}} \psi_{\alpha}(X_{\alpha}) \prod_{j \in N(\alpha) \setminus \{i\}} \mu_{j \to \alpha}(x_j)
where α is the index of the message-sending factor node, and i is the index of the variable node that receives the message.
The equation above means that the message from factor α to node i is computed by (1) taking the product of all the inbound messages, (2) multiplying the factor value into that product, and (3) integrating (summing) out all the unnecessary variables.
In the example below, this process works as follows.
(1) Factor ψ1 receives a message from node X3.
\mu_{x_3 \to \psi_1}
(2) Factor ψ1 multiplies its CPD into the message.
\psi_1 \cdot \mu_{x_3 \to \psi_1}
(3) Integrate (sum) out all the unnecessary variables, leaving only X1.
\sum_{x_3} \psi_1 \cdot \mu_{x_3 \to \psi_1}
The equations above show the message passing from node X3 to X1 through factor ψ1.
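To make this concrete, here is a minimal numerical sketch of the three steps, assuming (purely for illustration, not from the notes) that X1 and X3 are binary and that ψ1 is a 2×2 table over (x1, x3):

import numpy as np

# Hypothetical potential psi_1(x1, x3); the values are illustrative only.
psi_1 = np.array([[0.9, 0.1],
                  [0.2, 0.8]])

# (1) Message arriving at factor psi_1 from node X3 (made-up values).
mu_x3_to_psi1 = np.array([0.4, 0.6])

# (2) Multiply the factor by the incoming message (broadcast over the x3 axis).
product = psi_1 * mu_x3_to_psi1[np.newaxis, :]

# (3) Sum out x3, leaving a message defined over x1 only.
mu_psi1_to_x1 = product.sum(axis=1)
print(mu_psi1_to_x1)   # unnormalized message mu_{psi_1 -> x_1}(x1)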
Summary of the Messages
Variable to Factor message
\mu_{x \to f}(x) = \prod_{g \in N(x) \setminus \{f\}} \mu_{g \to x}(x)
Factor to Variable message
\mu_{f \to x}(x) = \sum_{X_f \setminus \{x\}} \theta_f(X_f) \prod_{y \in N(f) \setminus \{x\}} \mu_{y \to f}(y)
Where the first part of the equation (the sum term) is the process of integrating out all the unnecessary variables.
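The two updates can be sketched in a few lines of numpy, assuming messages are stored as 1-D arrays and each factor θ_f as an array with one axis per neighboring variable (the helper names and storage scheme below are my own, not from the notes):

import numpy as np

def variable_to_factor(domain_size, incoming):
    """mu_{x->f}(x): elementwise product of the messages mu_{g->x}(x)
    from every neighboring factor g of x except the target factor f."""
    msg = np.ones(domain_size)
    for m in incoming:          # incoming: 1-D arrays, one per g in N(x)\{f}
        msg = msg * m
    return msg

def factor_to_variable(theta, target_axis, incoming):
    """mu_{f->x}(x): multiply theta_f by the messages from every neighboring
    variable y except x, then sum out every axis other than target_axis.
    `incoming` maps an axis index of theta to the 1-D message arriving there."""
    result = theta.astype(float).copy()
    for axis, m in incoming.items():
        shape = [1] * theta.ndim
        shape[axis] = -1
        result = result * m.reshape(shape)   # align the message with its axis
    other_axes = tuple(a for a in range(theta.ndim) if a != target_axis)
    return result.sum(axis=other_axes)

# Example: a made-up factor theta_f(x, y) with x binary and y ternary.
theta = np.arange(6, dtype=float).reshape(2, 3)
print(factor_to_variable(theta, target_axis=0, incoming={1: np.ones(3)}))

The reshape step aligns each incoming message with the axis of θ_f that corresponds to its variable, so the product broadcasts correctly before the unnecessary axes are summed out.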
Sum-Product Belief Propagation
Input: a factor graph with no cycles.
Output: exact marginals for each variable and factor.
Algorithm:
1. Initialize the messages to the uniform distribution: \mu_{i \to \alpha}(x_i) = 1, \quad \mu_{\alpha \to i}(x_i) = 1
2. Choose a root node.
3. Send messages from the leaves to the root.
4. Send messages from the root to the leaves.
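Here is a small runnable sketch of this schedule on a hypothetical chain factor graph x1 — fa — x2 — fb — x3 (the potentials are made up for illustration); it performs the leaves-to-root sweep with x2 as the root and checks the resulting marginal against brute-force summation:

import numpy as np

# Hypothetical chain factor graph: x1 -- fa -- x2 -- fb -- x3 (all binary).
fa = np.array([[1.0, 2.0],
               [3.0, 1.0]])   # fa(x1, x2)
fb = np.array([[2.0, 1.0],
               [1.0, 4.0]])   # fb(x2, x3)

# 1. Initialize messages to the uniform distribution.
ones = np.ones(2)

# 2. Choose x2 as the root.  3. Send messages from the leaves to the root.
mu_x1_to_fa = ones                                       # leaf variable -> factor
mu_x3_to_fb = ones
mu_fa_to_x2 = (fa * mu_x1_to_fa[:, None]).sum(axis=0)    # sum out x1
mu_fb_to_x2 = (fb * mu_x3_to_fb[None, :]).sum(axis=1)    # sum out x3

# The marginal at the root is the product of its incoming messages (normalized).
p_x2 = mu_fa_to_x2 * mu_fb_to_x2
p_x2 /= p_x2.sum()

# Brute-force check against the full joint fa(x1, x2) * fb(x2, x3).
joint = fa[:, :, None] * fb[None, :, :]
p_x2_brute = joint.sum(axis=(0, 2))
p_x2_brute /= p_x2_brute.sum()
print(np.allclose(p_x2, p_x2_brute))   # True

The root-to-leaves sweep (step 4), not shown here, would send messages back out from x2 so that the marginals of x1 and x3 can be read off in the same way.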
Def
A clique graph consists of a set of potentials, ϕ1(χ1), …, ϕn(χn), each defined on a set of variables χi. For neighboring cliques on the graph, defined on sets of variables χi and χj, the intersection χs = χi ∩ χj is called the separator and has a corresponding potential ϕs(χs).
A clique graph represents the function,
\frac{\prod_c \phi_c(\chi_c)}{\prod_s \phi_s(\chi_s)}
Example
\frac{\phi(\chi_1)\,\phi(\chi_2)}{\phi(\chi_1 \cap \chi_2)}
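As a concrete instance of this ratio (using an illustrative chain a — b — c, which is not necessarily the graph in the notes), take χ1 = {a, b} and χ2 = {b, c}, so the separator is χ1 ∩ χ2 = {b}:

p(a, b, c) = \frac{p(a, b)\, p(b, c)}{p(b)}

which holds because a is independent of c given b on the chain, and has exactly the product-over-cliques divided by product-over-separators form above.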
Example: Probability density
Let's assume we have a graph as follows.
Form a new representation of the graph in which variables are clustered together, resulting in a singly connected graph over the cluster variables.
The distribution can then be written as a product of marginal distributions, divided by a product of the marginals over their intersections (the separators).
This is not, by itself, a remedy for intractability.
We can obtain a junction tree through the variable elimination process.
The message passing protocol
Cluster B is allowed to send a message to a neighbor C only after it has received messages from all of its neighbors except C.
Pseudo Code
def Collect(C):
    for B in children(C):
        Collect(B)
        send message from B to C

def Distribute(C):
    for B in children(C):
        send message from C to B
        Distribute(B)
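A minimal runnable version of this recursion, assuming the tree is stored as a dictionary mapping each cluster to its children and treating "send message" as a simple print (all names here are illustrative):

# Hypothetical junction tree stored as cluster -> list of child clusters.
children = {"root": ["B1", "B2"], "B1": ["L1", "L2"], "B2": [], "L1": [], "L2": []}

def collect(C):
    """Upward pass: C receives a message from each child only after that
    child's subtree has finished its own collect phase."""
    for B in children[C]:
        collect(B)
        print(f"send message {B} -> {C}")

def distribute(C):
    """Downward pass: C sends a message to each child, which then
    distributes further down its own subtree."""
    for B in children[C]:
        print(f"send message {C} -> {B}")
        distribute(B)

collect("root")      # leaves-to-root sweep
distribute("root")   # root-to-leaves sweep

Running collect before distribute enforces the protocol above: every cluster has heard from all of its other neighbors before it sends toward any given neighbor.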
Message from one clique to another
The figure above illustrates the message passing protocol from cluster B to cluster C.
Clusters B and C each have their own set of random variables in their clusters (or cliques). B∩C is the set of random variables shared between the two clusters.
\mu_{B \to C}(u) = \sum_{v \in B \setminus C} \psi_B(u \cup v) \prod_{(A,B) \in E,\ A \neq C} \mu_{A \to B}(u_A \cup v_A)
In short, the procedure is as follows.
1. B collects all the incoming messages.
2. B applies its own factor (ψB in the equation above).
3. B integrates out all the random variables that C does not contain.
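A numerical sketch of these three steps, assuming (for illustration only) binary variables, cliques B = {x1, x2} and C = {x2, x3}, and a single incoming message from some other neighboring clique A defined over {x1}:

import numpy as np

# Hypothetical cliques: B = {x1, x2}, C = {x2, x3}; separator B ∩ C = {x2}.
psi_B = np.array([[0.5, 1.5],
                  [2.0, 1.0]])        # psi_B(x1, x2), made-up values

# 1. B collects its incoming messages (here one message from clique A over {x1}).
mu_A_to_B = np.array([0.3, 0.7])

# 2. B multiplies the collected messages into its own potential psi_B.
belief_B = psi_B * mu_A_to_B[:, None]

# 3. B sums out the variables that C does not contain (here x1),
#    leaving a message over the separator {x2}.
mu_B_to_C = belief_B.sum(axis=0)
print(mu_B_to_C)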
Formal Junction Tree Algorithm
Moralization: Marry the parents (only for directed distributions).
Triangulation: Ensure that every loop of length 4 or more has a chord. In other words, every longer loop is broken down into triangles (cliques of size 3).
Junction Tree: Form a junction tree from the cliques of the triangulated graph, removing any unnecessary links in a loop on the cluster graph. Algorithmically, this can be achieved by finding a maximal-weight spanning tree, with edge weights given by the number of variables in the separator between cliques (a sketch of this step follows the list below). Alternatively, given a clique elimination order, one may connect each clique to the single neighboring clique that appears later in the order.
Potential Assignment: Assign potentials to junction tree cliques and set the separator potentials to unity.
Message Propagation
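A sketch of the maximal-weight spanning tree step, assuming the cliques of the triangulated graph are already available as sets of variable names (the clique list below is made up for illustration); it runs Kruskal's algorithm with separator size as the edge weight:

from itertools import combinations

# Hypothetical cliques from a triangulated graph.
cliques = [frozenset("ABC"), frozenset("BCD"), frozenset("CDE"), frozenset("DEF")]

# Candidate edges, weighted by separator size |Ci ∩ Cj|.
edges = [(len(ci & cj), ci, cj)
         for ci, cj in combinations(cliques, 2) if ci & cj]

# Kruskal's algorithm with a union-find structure, taking heavy edges first.
parent = {c: c for c in cliques}

def find(c):
    while parent[c] != c:
        parent[c] = parent[parent[c]]
        c = parent[c]
    return c

tree = []
for w, ci, cj in sorted(edges, key=lambda e: -e[0]):
    ri, rj = find(ci), find(cj)
    if ri != rj:                      # keep the edge only if it adds no loop
        parent[ri] = rj
        tree.append((ci, cj, w))

for ci, cj, w in tree:
    print(sorted(ci), "--", sorted(cj), "| separator size:", w)

The edges rejected by the union-find check are exactly the "unnecessary links in a loop on the cluster graph" mentioned above.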
Some Facts about Belief Propagation
BP is exact on trees. (No approximation)
If BP converges, it has reached a local minimum of an objective function (there is no proof of convergence in general, though).
If it converges, convergence is fast near the fixed point.
Zoo of Networks
UGM and DGM
Used to represent a family of probability distributions
The arrows and circles have clear, well-defined meanings
Factor Graph
A way to represent the factorization of both UGMs and DGMs
Bipartite Graph
More like a data structure
Not meant for reading off independencies (i.e., it does not function as an I-map)
Clique graph or Junction Tree
A data structure used for exact inference and message passing
Nodes are clusters of variables
Not meant for reading off independencies (i.e., it does not function as an I-map)