Graphical Models: D-Separation

\newcommand{\bigci}{\perp\mkern-10mu\perp}

This article is a brief overview of conditional independence in graphical models and the related notion of d-separation. Let us begin with a definition.

For three random variables X, Y and Z, we say X is conditionally independent of Y given Z iff

p(X, Y | Z) = p(X | Z) p(Y | Z).

We can use a shorthand notation

X \bigci Y | Z
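To make the definition concrete, here is a minimal numpy sketch. The joint table is made up (not from the article): it is built as p(x, y, z) = p(z) p(x | z) p(y | z), so the factorization p(X, Y | Z) = p(X | Z) p(Y | Z) should hold, and we verify it numerically.

```python
import numpy as np

rng = np.random.default_rng(0)

# A made-up discrete joint table constructed so that X is conditionally
# independent of Y given Z: p(x, y, z) = p(z) p(x | z) p(y | z).
p_z = np.array([0.3, 0.7])                       # p(z),   shape (2,)
p_x_given_z = rng.dirichlet(np.ones(2), size=2)  # p(x|z), shape (2, 2), rows sum to 1
p_y_given_z = rng.dirichlet(np.ones(3), size=2)  # p(y|z), shape (2, 3), rows sum to 1

# Full joint p(z, x, y) with axes ordered (Z, X, Y).
joint = p_z[:, None, None] * p_x_given_z[:, :, None] * p_y_given_z[:, None, :]

# Check the definition: p(x, y | z) == p(x | z) p(y | z) for every z.
p_xy_given_z = joint / joint.sum(axis=(1, 2), keepdims=True)
print(np.allclose(p_xy_given_z, p_x_given_z[:, :, None] * p_y_given_z[:, None, :]))  # True
```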

Before we can define d-separation, let us first look at three different types of graphs. Considering the same three variables as before, we will be interested in whether X and Y are conditionally independent depending on whether we observe Z.

Tail-tail

The first case is called the tail-tail, where Z is a common parent of X and Y, that is X \leftarrow Z \rightarrow Y.

We can factor the joint distribution to get

p(X, Y, Z) = p(X | Z) p(Y | Z) p(Z)

and conditioning on the value of Z we get (using the definition of conditional probability)

p(X, Y | Z) = \frac{p(X, Y, Z)}{p(Z)} = \frac{p(X | Z) p(Y | Z) p(Z)}{p(Z)} = p(X | Z) p(Y | Z).

From this we can immediately see that conditioning on Z in the tail-tail case makes X and Y independent, that is X \bigci Y | Z.
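As a quick sanity check, here is a small simulation sketch of the tail-tail graph; the probabilities 0.5, 0.9/0.1 and 0.8/0.2 are made up for illustration. Marginally X and Y are correlated through their common parent Z, but within each fixed value of Z the correlation vanishes (and for binary variables, zero correlation is equivalent to independence).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Tail-tail: Z is a common parent, X and Y each depend only on Z.
z = rng.random(n) < 0.5
x = (rng.random(n) < np.where(z, 0.9, 0.1)).astype(float)
y = (rng.random(n) < np.where(z, 0.8, 0.2)).astype(float)

print(np.corrcoef(x, y)[0, 1])          # clearly nonzero: marginally dependent
print(np.corrcoef(x[z], y[z])[0, 1])    # ~0: independent once Z = 1 is fixed
print(np.corrcoef(x[~z], y[~z])[0, 1])  # ~0: independent once Z = 0 is fixed
```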

Head-tail

The second case is called the head-tail, where the variables form a chain X \rightarrow Z \rightarrow Y.

We can again write the joint distribution for the graph

p(X, Y, Z) = p(X) p(Z | X) p(Y | Z)

and again conditioning on Z we get (using rules of conditional probability)

\begin{align} p(X, Y | Z) &= \frac{p(X, Y, Z)}{p(Z)} \\\\ &= \frac{p(X) p(Z | X) p(Y | Z)}{p(Z)} \\\\ &= \frac{p(X, Z) p(Y | Z)}{p(Z)} \\\\ &= \frac{p(X | Z) p(Z) p(Y | Z)}{p(Z)} \\\\ &= p(X | Z) p(Y | Z) \end{align}

and so again, X and Y are conditionally independent given Z, that is X \bigci Y | Z.

Checking marginal independence

For completeness, we can also check whether X and Y are marginally independent, that is, independent when Z is not observed. In general they are not, so it really is observing Z that makes them independent.

p(X, Y, Z) = p(X) p(Z | X) p(Y | Z)

which gives us the following when marginalizing over Z

p(X, Y) = \sum_Z p(X, Y, Z) = p(X) \sum_Z p(Z | X) p(Y | Z) = p(X) \sum_Z p(Y, Z | X) = p(X) p(Y | X)

from which we can immediately see it does not factorize into p(X) p(Y) in the general case, and thus X and Y are not marginally independent.
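The same kind of simulation sketch works for the head-tail chain (again with made-up conditional probabilities): X and Y are clearly dependent marginally, but become independent once we condition on Z.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Head-tail chain: X -> Z -> Y, with made-up conditional probabilities.
x_bool = rng.random(n) < 0.5
z = rng.random(n) < np.where(x_bool, 0.9, 0.1)
x = x_bool.astype(float)
y = (rng.random(n) < np.where(z, 0.8, 0.2)).astype(float)

print(np.corrcoef(x, y)[0, 1])          # clearly nonzero: X and Y are marginally dependent
print(np.corrcoef(x[z], y[z])[0, 1])    # ~0: independent once Z = 1 is fixed
print(np.corrcoef(x[~z], y[~z])[0, 1])  # ~0: independent once Z = 0 is fixed
```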

Head-head

The last case is called the head-head, where X and Y are both parents of Z, that is X \rightarrow Z \leftarrow Y. This one is a little bit tricky.

We can again write out the joint distribution

p(X, Y, Z) = p(X) p(Y) p(Z | X, Y),

but this does not immediately help us when we try to condition on Z, we would want

p(X, Y | Z) = \frac{p(X, Y, Z)}{p(Z)} \stackrel{?}{=} p(X|Z) p(Y|Z)

which does not hold in general. For example, consider independent X, Y \sim Bernoulli(0.5) and Z = 1 if X = Y, and 0 otherwise. In this case, if we know Z and then observe X, that immediately tells us the value of Y, hence X and Y are not conditionally independent given Z.

We can, however, do a little trick and write p(X, Y) as a marginalization over Z, that is

p(X, Y) = \sum_Z p(X, Y, Z) = \sum_Z p(X) p(Y) p(Z | X, Y) = p(X) p(Y)

since \sum_Z p(Z | X, Y) = 1. As a result, in the head-head case we have marginal independence between X and Y, that is X \bigci Y.
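The Bernoulli example above is easy to verify by simulation: marginally X and Y look independent, but once Z is fixed, X determines Y exactly.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

# Head-head: X, Y ~ Bernoulli(0.5) independently, Z = 1 iff X = Y (the example from the text).
x = (rng.random(n) < 0.5).astype(float)
y = (rng.random(n) < 0.5).astype(float)
z = x == y

print(np.corrcoef(x, y)[0, 1])          # ~0: X and Y are marginally independent
print(np.corrcoef(x[z], y[z])[0, 1])    #  1: given Z = 1, Y is exactly X
print(np.corrcoef(x[~z], y[~z])[0, 1])  # -1: given Z = 0, Y is exactly 1 - X
```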

D-separation

Having shown the three cases, we can finally define d-separation. Let G be a DAG, and let A, B, C be disjoint subsets of vertices.

A path between two vertices is blocked with respect to C if it passes through a vertex v such that either the arrows on the path meet head-tail or tail-tail at v and v is in C, or the arrows meet head-head at v and neither v nor any of its descendants is in C.

We say that A and B are d-separated by C if all paths from a vertex of A to a vertex of B are blocked w.r.t. C. And now comes the important part: if A and B are d-separated by C, then A \bigci B\ |\ C.

This might all look very complicated, but this property of directed graphical models is actually extremely useful, and very easy to apply quickly after seeing just a few examples.
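For a quick mechanical check, networkx can evaluate d-separation directly. The following is a sketch assuming networkx ≥ 2.8, where nx.d_separated(G, A, B, C) is available (newer releases rename it to nx.is_d_separator), applied to the head-head graph from above.

```python
import networkx as nx

# Head-head (collider): X -> Z <- Y
G = nx.DiGraph([("X", "Z"), ("Y", "Z")])

print(nx.d_separated(G, {"X"}, {"Y"}, set()))  # True:  X and Y are d-separated by the empty set
print(nx.d_separated(G, {"X"}, {"Y"}, {"Z"}))  # False: observing the collider Z unblocks the path
```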

Examples

To get a feel for d-separation, let us look at the following example (B is observed).

We can immediately see that A \bigci D | B since this is the head-tail case. We can also see that A \not{\bigci} E | B (not conditionally independent), because while the path through B is blocked, the path through C is not.
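Since the original figure is not reproduced here, the sketch below uses a hypothetical graph that is merely consistent with the description (A \rightarrow B \rightarrow D, A \rightarrow B \rightarrow E, and a second route A \rightarrow C \rightarrow E) and checks both statements with networkx, under the same version assumption as above.

```python
import networkx as nx

# A hypothetical DAG consistent with the description (the original figure is not shown):
# A -> B -> D, A -> B -> E, and a second route A -> C -> E.
G = nx.DiGraph([("A", "B"), ("B", "D"), ("B", "E"), ("A", "C"), ("C", "E")])

print(nx.d_separated(G, {"A"}, {"D"}, {"B"}))  # True:  A is d-separated from D by B
print(nx.d_separated(G, {"A"}, {"E"}, {"B"}))  # False: the route through C is not blocked by B
```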