Graphical Models: D-Separation | Jakub Arnold's Blog

$$ \newcommand{\bigci}{\perp\mkern-10mu\perp} $$

This article is a brief overview of conditional independence in graphical models, and the related d-separation. Let us begin with a definition.

For three random variables $X$, $Y$ and $Z$, we say $X$ is conditionally independent of $Y$ given $Z$ iff

$$ p(X, Y | Z) = p(X | Z) p(Y | Z). $$

We can use a shorthand notation

$$ X \bigci Y | Z $$

Before we can define d-separation, let us first show three different types of graphs. Consider the same three variables as before, we’ll be interested in conditional independence based on whether we observe $Z$.

Tail-tail

The first case is called the tail-tail.

We can factor the joint distribution to get

$$ p(X, Y, Z) = p(X | Z) p(Y | Z) p(Z) $$

and conditioning on the value of $Z$ we get (using the Bayes’ theorem)

$$ p(X, Y | Z) = \frac{p(X, Y, Z)}{p(Z)} = \frac{p(X | Z) p(Y | Z) p(Z)}{p(Z)} = p(X | Z) p(Y | Z). $$

From this we can immediately see that conditioning on $Z$ in the tail-tail case makes $X$ and $Y$ independent, that is $X \bigci Y | Z$.

Head-tail

The second case is called the head-tail and looks as the following.

We can again write the joint distribution for the graph

$$ p(X, Y, Z) = p(X) p(Z | X) p(Y | Z) $$

and again conditioning on $Z$ we get (using rules of conditional probability)

$$ \begin{align} p(X, Y | Z) &= \frac{p(X, Y, Z)}{p(Z)} \\\\ &= \frac{p(X) p(Z | X) p(Y | Z)}{p(Z)} \\\\ &= \frac{p(X, Z) p(Y | Z)}{p(Z)} \\\\ &= \frac{p(X | Z) p(Z) p(Y | Z)}{p(Z)} \\\\ &= p(X | Z) p(Y | Z) \end{align} $$

and so again, $X$ and $Y$ are conditionally independent given $Z$, that is $X \bigci Y | Z$.

Checking marginal independence

For completeness, we can also check if $X$ and $Y$ are marginally independent, which they shouldn’t be, since we just showed they’re conditionally independent.

$$ p(X, Y, Z) = p(X) p(Z | X) p(Y | Z) $$

which gives us the following when marginalizing over $Z$

$$ p(X, Y) = \sum_Z p(X, Y, Z) = p(X) \sum_Z p(Z | X) p(Y | Z) = p(X) \sum_Z p(Y, Z | X) = p(X) p(Y | X) $$

from which we can immediately see it does not factorize into $p(X) p(Y)$ in the general case, and thus $X$ and $Y$ are not marginally independent.

Head-head

The last case is called the head-head and is a little bit tricky

We can again write out the joint distribution

$$ p(X, Y, Z) = p(X) p(Y) p(Z | X, Y), $$

but this does not immediately help us when we try to condition on $Z$, we would want

$$ p(X, Y | Z) = \frac{p(X, Y, Z)}{p(Z)} \stackrel{?}{=} p(X|Z) p(Y|Z) $$

which does not hold in general. For example, consider $X, Y \sim Bernoulli(0.5)$ and $Z = 1$ if $X = Y$, and $0$ otherwise. In this case if we know $Z$ and observe $X$, it immediately tells us the value of $Y$, hence $X$ and $Y$ are not conditionally independent given $Z$.

We can however do a little trick and write the $p(X, Y)$ as a marginalization over $Z$, that is

$$ p(X, Y) = \sum_Z p(X, Y, Z) = \sum_Z p(X) p(Y) p(Z | X, Y) = p(X) p(Y) $$

since $\sum_Z p(Z | X, Y) = 1$. As a result, in the head-head case we have marginal independence between $X$ and $Y$, that is $X \bigci Y$.

D-separation

Having shown the three cases, we can finally define d-separation. Let $G$ be a DAG, and let $A, B, C$ be disjoint subsets of vertices.

A path between two vertices is blocked if it passes through a vertex $v$, such that either:

the edges are head-tail or tail-tail, and $v \in C$, or
the edges are head-head, and $v \not \in C$, and neither are any of its descendants.

We say that $A$ and $B$ are d-separated by $C$ if all paths from a vertex of $A$ to a vertex of $B$ are blocked w.r.t. $C$. And now comes the important part, if $A$ and $B$ are d-separated by $C$, then $A \bigci B\ |\ C$.

Thig might all look very complicated, but this property of directed graphical models is actually extremely useful, and very easy to do quickly after seeing just a few examples.

Examples

To get a feel for d-separation, let us look at the following example ($B$ is observed).

We can immediately see that $A \bigci D | B$ since this is the head-tail case. We can also see that $A \not{\bigci} E | B$ (not conditionally independent), because while the path through $B$ is blocked, the path through $C$ is not.

Previous:
Variational Inference - Deriving ELBO

Next:
The Gaussian Distribution - Basic Properties