# Graphical Models: D-Separation

5 min read • Published: November 29, 2018

For three random variables $X$, $Y$ and $Z$, we say $X$ is conditionally independent of $Y$ given $Z$ iff

$p(X, Y | Z) = p(X | Z) p(Y | Z).$

We can use a shorthand notation

$X \bigci Y | Z$

Before we can define d-separation, let us first show three different types of graphs. Consider the same three variables as before, we’ll be interested in conditional independence based on whether we observe $Z$.

## Tail-tail

The first case is called the tail-tail.

We can factor the joint distribution to get

$p(X, Y, Z) = p(X | Z) p(Y | Z) p(Z)$

and conditioning on the value of $Z$ we get (using the Bayes’ theorem)

$p(X, Y | Z) = \frac{p(X, Y, Z)}{p(Z)} = \frac{p(X | Z) p(Y | Z) p(Z)}{p(Z)} = p(X | Z) p(Y | Z).$

From this we can immediately see that conditioning on $Z$ in the tail-tail case makes $X$ and $Y$ independent, that is $X \bigci Y | Z$.

The second case is called the head-tail and looks as the following.

We can again write the joint distribution for the graph

$p(X, Y, Z) = p(X) p(Z | X) p(Y | Z)$

and again conditioning on $Z$ we get (using rules of conditional probability)

\begin{aligned} p(X, Y | Z) &= \frac{p(X, Y, Z)}{p(Z)} \\ &= \frac{p(X) p(Z | X) p(Y | Z)}{p(Z)} \\ &= \frac{p(X, Z) p(Y | Z)}{p(Z)} \\ &= \frac{p(X | Z) p(Z) p(Y | Z)}{p(Z)} \\ &= p(X | Z) p(Y | Z) \end{aligned}

and so again, $X$ and $Y$ are conditionally independent given $Z$, that is $X \bigci Y | Z$.

#### Checking marginal independence

For completeness, we can also check if $X$ and $Y$ are marginally independent, which they shouldn’t be, since we just showed they’re conditionally independent.

$p(X, Y, Z) = p(X) p(Z | X) p(Y | Z)$

which gives us the following when marginalizing over $Z$

$p(X, Y) = \sum_Z p(X, Y, Z) = p(X) \sum_Z p(Z | X) p(Y | Z) = p(X) \sum_Z p(Y, Z | X) = p(X) p(Y | X)$

from which we can immediately see it does not factorize into $p(X) p(Y)$ in the general case, and thus $X$ and $Y$ are not marginally independent.

The last case is called the head-head and is a little bit tricky

We can again write out the joint distribution

$p(X, Y, Z) = p(X) p(Y) p(Z | X, Y),$

but this does not immediately help us when we try to condition on $Z$, we would want

$p(X, Y | Z) = \frac{p(X, Y, Z)}{p(Z)} \stackrel{?}{=} p(X|Z) p(Y|Z)$

which does not hold in general. For example, consider $X, Y \sim Bernoulli(0.5)$ and $Z = 1$ if $X = Y$, and $0$ otherwise. In this case if we know $Z$ and observe $X$, it immediately tells us the value of $Y$, hence $X$ and $Y$ are not conditionally independent given $Z$.

We can however do a little trick and write the $p(X, Y)$ as a marginalization over $Z$, that is

$p(X, Y) = \sum_Z p(X, Y, Z) = \sum_Z p(X) p(Y) p(Z | X, Y) = p(X) p(Y)$

since $\sum_Z p(Z | X, Y) = 1$. As a result, in the head-head case we have marginal independence between $X$ and $Y$, that is $X \bigci Y$.

## D-separation

Having shown the three cases, we can finally define d-separation. Let $G$ be a DAG, and let $A, B, C$ be disjoint subsets of vertices.

A path between two vertices is blocked if it passes through a vertex $v$, such that either:

• the edges are head-tail or tail-tail, and $v \in C$, or
• the edges are head-head, and $v \not \in C$, and neither are any of its descendants.

We say that $A$ and $B$ are d-separated by $C$ if all paths from a vertex of $A$ to a vertex of $B$ are blocked w.r.t. $C$. And now comes the important part, if $A$ and $B$ are d-separated by $C$, then $A \bigci B\ |\ C$.

Thig might all look very complicated, but this property of directed graphical models is actually extremely useful, and very easy to do quickly after seeing just a few examples.

## Examples

To get a feel for d-separation, let us look at the following example ($B$ is observed).

We can immediately see that $A \bigci D | B$ since this is the head-tail case. We can also see that $A \not{\bigci} E | B$ (not conditionally independent), because while the path through $B$ is blocked, the path through $C$ is not.