\newcommand{\bigci}{\perp\mkern-10mu\perp}
This article is a brief overview of conditional independence in graphical models and the related notion of d-separation. Let us begin with a definition.
For three random variables X, Y and Z, we say X is conditionally independent of Y given Z iff
p(X, Y | Z) = p(X | Z) p(Y | Z).
We can use a shorthand notation
X \bigci Y | Z
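Since the definition is just an equation between probability tables, we can check it mechanically for discrete variables. Below is a minimal Python sketch (the function name and the dict-based representation are my own choices for illustration) that tests whether X \bigci Y | Z holds for a joint distribution given as {(x, y, z): probability}.

```python
from collections import defaultdict

def is_cond_independent(joint, tol=1e-9):
    """Test p(x, y | z) == p(x | z) p(y | z) for every x, y, z.

    joint: dict mapping (x, y, z) -> probability (zero entries may be omitted).
    """
    xs = {x for (x, y, z) in joint}
    ys = {y for (x, y, z) in joint}
    zs = {z for (x, y, z) in joint}

    # Marginals p(z), p(x, z), p(y, z) obtained by summing the joint.
    p_z, p_xz, p_yz = defaultdict(float), defaultdict(float), defaultdict(float)
    for (x, y, z), p in joint.items():
        p_z[z] += p
        p_xz[(x, z)] += p
        p_yz[(y, z)] += p

    # Compare p(x, y | z) with p(x | z) p(y | z) over the whole support.
    for z in zs:
        if p_z[z] == 0:
            continue
        for x in xs:
            for y in ys:
                lhs = joint.get((x, y, z), 0.0) / p_z[z]
                rhs = (p_xz[(x, z)] / p_z[z]) * (p_yz[(y, z)] / p_z[z])
                if abs(lhs - rhs) > tol:
                    return False
    return True
```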
Before we can define d-separation, let us first look at three different types of graphs. Consider the same three variables as before; we'll be interested in whether X and Y are conditionally independent depending on whether we observe Z.
Tail-tail
The first case is called the tail-tail: both edges have their tails at Z, i.e. Z is a common parent of X and Y.
We can factor the joint distribution to get
p(X, Y, Z) = p(X | Z) p(Y | Z) p(Z)
and conditioning on the value of Z we get (using the definition of conditional probability)
p(X, Y | Z) = \frac{p(X, Y, Z)}{p(Z)} = \frac{p(X | Z) p(Y | Z) p(Z)}{p(Z)} = p(X | Z) p(Y | Z).
From this we can immediately see that conditioning on Z in the tail-tail case makes X and Y independent, that is X \bigci Y | Z.
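As a quick numeric sanity check, we can build such a joint distribution from made-up conditional probability tables (the numbers below are arbitrary and purely illustrative) and verify the factorization directly.

```python
# Tail-tail: Z is a common parent of X and Y.
p_z = {0: 0.6, 1: 0.4}
p_x_given_z = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}   # p(x | z)
p_y_given_z = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.1, 1: 0.9}}   # p(y | z)

# Joint for the graph: p(x, y, z) = p(x | z) p(y | z) p(z).
joint = {(x, y, z): p_x_given_z[z][x] * p_y_given_z[z][y] * p_z[z]
         for x in (0, 1) for y in (0, 1) for z in (0, 1)}

# Conditioning on z, p(x, y | z) should equal p(x | z) p(y | z).
for z in (0, 1):
    pz = sum(p for (x, y, zz), p in joint.items() if zz == z)
    for x in (0, 1):
        for y in (0, 1):
            assert abs(joint[(x, y, z)] / pz
                       - p_x_given_z[z][x] * p_y_given_z[z][y]) < 1e-12
print("X and Y are conditionally independent given Z in the tail-tail case")
```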
Head-tail
The second case is called the head-tail: the edges form a chain, with X a parent of Z and Z a parent of Y.
We can again write the joint distribution for the graph
p(X, Y, Z) = p(X) p(Z | X) p(Y | Z)
and again conditioning on Z we get (using rules of conditional probability)
\begin{align} p(X, Y | Z) &= \frac{p(X, Y, Z)}{p(Z)} \\\\ &= \frac{p(X) p(Z | X) p(Y | Z)}{p(Z)} \\\\ &= \frac{p(X, Z) p(Y | Z)}{p(Z)} \\\\ &= \frac{p(X | Z) p(Z) p(Y | Z)}{p(Z)} \\\\ &= p(X | Z) p(Y | Z) \end{align}
and so again, X and Y are conditionally independent given Z, that is X \bigci Y | Z.
Checking marginal independence
For completeness, we can also check whether X and Y are marginally independent. In general they should not be: when Z is unobserved, information can flow from X to Y through Z. Starting again from the factorization
p(X, Y, Z) = p(X) p(Z | X) p(Y | Z)
which gives us the following when marginalizing over Z
p(X, Y) = \sum_Z p(X, Y, Z) = p(X) \sum_Z p(Z | X) p(Y | Z) = p(X) \sum_Z p(Y, Z | X) = p(X) p(Y | X)
from which we can immediately see it does not factorize into p(X) p(Y) in the general case, and thus X and Y are not marginally independent.
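Both claims can again be checked numerically. The sketch below (with arbitrary, made-up tables) builds the joint for the head-tail chain and verifies that conditioning on Z gives independence while marginalizing over Z does not.

```python
# Head-tail chain: X -> Z -> Y, with arbitrary illustrative tables.
p_x = {0: 0.3, 1: 0.7}
p_z_given_x = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}   # p(z | x)
p_y_given_z = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.1, 1: 0.9}}   # p(y | z)

# Joint for the graph: p(x, y, z) = p(x) p(z | x) p(y | z).
joint = {(x, y, z): p_x[x] * p_z_given_x[x][z] * p_y_given_z[z][y]
         for x in (0, 1) for y in (0, 1) for z in (0, 1)}

# Marginals computed from the joint.
p_z  = {z: sum(joint[(x, y, z)] for x in (0, 1) for y in (0, 1)) for z in (0, 1)}
p_xz = {(x, z): sum(joint[(x, y, z)] for y in (0, 1)) for x in (0, 1) for z in (0, 1)}
p_yz = {(y, z): sum(joint[(x, y, z)] for x in (0, 1)) for y in (0, 1) for z in (0, 1)}
p_xy = {(x, y): sum(joint[(x, y, z)] for z in (0, 1)) for x in (0, 1) for y in (0, 1)}
p_y  = {y: sum(p_xy[(x, y)] for x in (0, 1)) for y in (0, 1)}

# p(x, y | z) = p(x | z) p(y | z) holds for every x, y, z ...
cond_indep = all(abs(joint[(x, y, z)] / p_z[z]
                     - (p_xz[(x, z)] / p_z[z]) * (p_yz[(y, z)] / p_z[z])) < 1e-12
                 for x in (0, 1) for y in (0, 1) for z in (0, 1))
# ... but p(x, y) = p(x) p(y) does not.
marg_indep = all(abs(p_xy[(x, y)] - p_x[x] * p_y[y]) < 1e-12
                 for x in (0, 1) for y in (0, 1))
print(cond_indep, marg_indep)   # expected: True False
```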
Head-head
The last case is called the head-head and is a little bit tricky: both edges have their heads at Z, i.e. X and Y are both parents of Z.
We can again write out the joint distribution
p(X, Y, Z) = p(X) p(Y) p(Z | X, Y),
but this does not immediately help us when we try to condition on Z; we would want
p(X, Y | Z) = \frac{p(X, Y, Z)}{p(Z)} \stackrel{?}{=} p(X|Z) p(Y|Z)
which does not hold in general. For example, consider independent X, Y \sim Bernoulli(0.5) and Z = 1 if X = Y, and 0 otherwise. In this case, if we know Z and observe X, that immediately tells us the value of Y; hence X and Y are not conditionally independent given Z.
We can however do a little trick and write p(X, Y) as a marginalization over Z, that is
p(X, Y) = \sum_Z p(X, Y, Z) = \sum_Z p(X) p(Y) p(Z | X, Y) = p(X) p(Y)
since \sum_Z p(Z | X, Y) = 1. As a result, in the head-head case we have marginal independence between X and Y, that is X \bigci Y.
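The Bernoulli example from above can be verified directly. The short sketch below encodes X, Y \sim Bernoulli(0.5) with Z = 1 if X = Y and checks both claims: X and Y are marginally independent, but become dependent once we condition on Z.

```python
# Head-head (collider): X and Y are parents of Z, with Z = 1 if X == Y, else 0.
joint = {(x, y, int(x == y)): 0.25 for x in (0, 1) for y in (0, 1)}

# Marginal independence: p(x, y) = 0.25 = p(x) p(y) for every pair.
p_xy = {(x, y): sum(p for (xx, yy, z), p in joint.items() if (xx, yy) == (x, y))
        for x in (0, 1) for y in (0, 1)}
print(all(abs(p_xy[(x, y)] - 0.25) < 1e-12 for x in (0, 1) for y in (0, 1)))  # True

# Conditional dependence: given Z = 1, observing X pins down Y, e.g.
# p(X=0, Y=1 | Z=1) = 0 while p(X=0 | Z=1) p(Y=1 | Z=1) = 0.25.
p_z1 = sum(p for (x, y, z), p in joint.items() if z == 1)
p_x0_z1 = sum(p for (x, y, z), p in joint.items() if x == 0 and z == 1) / p_z1
p_y1_z1 = sum(p for (x, y, z), p in joint.items() if y == 1 and z == 1) / p_z1
print(joint.get((0, 1, 1), 0.0) / p_z1, p_x0_z1 * p_y1_z1)   # 0.0 vs 0.25
```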
D-separation
Having shown the three cases, we can finally define d-separation. Let G be a DAG, and let A, B, C be disjoint subsets of vertices.
A path between two vertices is blocked if it passes through a vertex v, such that either:
- the edges are head-tail or tail-tail, and v \in C, or
- the edges are head-head, and neither v nor any of its descendants is in C.
We say that A and B are d-separated by C if all paths from a vertex of A to a vertex of B are blocked w.r.t. C. And now comes the important part: if A and B are d-separated by C, then A \bigci B\ |\ C.
This might all look very complicated, but this property of directed graphical models is actually extremely useful, and checking it becomes very quick after seeing just a few examples.
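If you want to check your intuition by machine, here is one possible d-separation checker in Python. It is only a sketch of my own (the function name and graph representation are made up for this post), and instead of enumerating paths it uses the equivalent classical criterion: restrict the DAG to A, B, C and their ancestors, moralize it (connect the parents of every common child and drop edge directions), delete C, and test whether A and B are still connected.

```python
from collections import deque

def d_separated(parents, A, B, C):
    """parents: dict mapping each node to the set of its parents (defines the DAG).
    Returns True if the node sets A and B are d-separated by C."""
    A, B, C = set(A), set(B), set(C)

    # 1. Keep only A, B, C and all of their ancestors.
    keep, stack = set(), list(A | B | C)
    while stack:
        v = stack.pop()
        if v in keep:
            continue
        keep.add(v)
        stack.extend(parents.get(v, ()))

    # 2. Moralize: link every node to its parents, "marry" the parents of each
    #    common child, and forget the edge directions.
    adj = {v: set() for v in keep}
    for v in keep:
        ps = list(parents.get(v, ()))
        for p in ps:
            adj[v].add(p)
            adj[p].add(v)
        for i, p in enumerate(ps):
            for q in ps[i + 1:]:
                adj[p].add(q)
                adj[q].add(p)

    # 3. Remove the conditioning set C and look for any remaining path from A to B.
    seen, queue = set(), deque(A - C)
    while queue:
        v = queue.popleft()
        if v in seen:
            continue
        seen.add(v)
        if v in B:
            return False          # an unblocked connection exists
        queue.extend(adj[v] - C - seen)
    return True
```

For instance, on the head-head graph from above, d_separated({'Z': {'X', 'Y'}}, {'X'}, {'Y'}, set()) returns True, while d_separated({'Z': {'X', 'Y'}}, {'X'}, {'Y'}, {'Z'}) returns False, matching the collider analysis.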
Examples
To get a feel for d-separation, let us look at the following example (B is observed).
We can immediately see that A \bigci D | B since this is the head-tail case. We can also see that A \not{\bigci} E | B (not conditionally independent), because while the path through B is blocked, the path through C is not.