$$ \newcommand{\bigci}{\perp\mkern-10mu\perp} $$
This article is a brief overview of conditional independence in graphical models, and the related d-separation. Let us begin with a definition.
For three random variables $X$, $Y$ and $Z$, we say $X$ is conditionally independent of $Y$ given $Z$ iff
$$ p(X, Y | Z) = p(X | Z) p(Y | Z). $$
We can use a shorthand notation
$$ X \bigci Y | Z $$
Before we can define d-separation, let us first show three different types of graphs. Consider the same three variables as before, we’ll be interested in conditional independence based on whether we observe $Z$.
Tail-tail
The first case is called the tail-tail.
We can factor the joint distribution to get
$$ p(X, Y, Z) = p(X | Z) p(Y | Z) p(Z) $$
and conditioning on the value of $Z$ we get (using the Bayes’ theorem)
$$ p(X, Y | Z) = \frac{p(X, Y, Z)}{p(Z)} = \frac{p(X | Z) p(Y | Z) p(Z)}{p(Z)} = p(X | Z) p(Y | Z). $$
From this we can immediately see that conditioning on $Z$ in the tail-tail case makes $X$ and $Y$ independent, that is $X \bigci Y | Z$.
Head-tail
The second case is called the head-tail and looks as the following.
We can again write the joint distribution for the graph
$$ p(X, Y, Z) = p(X) p(Z | X) p(Y | Z) $$
and again conditioning on $Z$ we get (using rules of conditional probability)
$$ \begin{align} p(X, Y | Z) &= \frac{p(X, Y, Z)}{p(Z)} \\\\ &= \frac{p(X) p(Z | X) p(Y | Z)}{p(Z)} \\\\ &= \frac{p(X, Z) p(Y | Z)}{p(Z)} \\\\ &= \frac{p(X | Z) p(Z) p(Y | Z)}{p(Z)} \\\\ &= p(X | Z) p(Y | Z) \end{align} $$
and so again, $X$ and $Y$ are conditionally independent given $Z$, that is $X \bigci Y | Z$.
Checking marginal independence
For completeness, we can also check if $X$ and $Y$ are marginally independent, which they shouldn’t be, since we just showed they’re conditionally independent.
$$ p(X, Y, Z) = p(X) p(Z | X) p(Y | Z) $$
which gives us the following when marginalizing over $Z$
$$ p(X, Y) = \sum_Z p(X, Y, Z) = p(X) \sum_Z p(Z | X) p(Y | Z) = p(X) \sum_Z p(Y, Z | X) = p(X) p(Y | X) $$
from which we can immediately see it does not factorize into $p(X) p(Y)$ in the general case, and thus $X$ and $Y$ are not marginally independent.
Head-head
The last case is called the head-head and is a little bit tricky
We can again write out the joint distribution
$$ p(X, Y, Z) = p(X) p(Y) p(Z | X, Y), $$
but this does not immediately help us when we try to condition on $Z$, we would want
$$ p(X, Y | Z) = \frac{p(X, Y, Z)}{p(Z)} \stackrel{?}{=} p(X|Z) p(Y|Z) $$
which does not hold in general. For example, consider $X, Y \sim Bernoulli(0.5)$ and $Z = 1$ if $X = Y$, and $0$ otherwise. In this case if we know $Z$ and observe $X$, it immediately tells us the value of $Y$, hence $X$ and $Y$ are not conditionally independent given $Z$.
We can however do a little trick and write the $p(X, Y)$ as a marginalization over $Z$, that is
$$ p(X, Y) = \sum_Z p(X, Y, Z) = \sum_Z p(X) p(Y) p(Z | X, Y) = p(X) p(Y) $$
since $\sum_Z p(Z | X, Y) = 1$. As a result, in the head-head case we have marginal independence between $X$ and $Y$, that is $X \bigci Y$.
D-separation
Having shown the three cases, we can finally define d-separation. Let $G$ be a DAG, and let $A, B, C$ be disjoint subsets of vertices.
A path between two vertices is blocked if it passes through a vertex $v$, such that either:
- the edges are head-tail or tail-tail, and $v \in C$, or
- the edges are head-head, and $v \not \in C$, and neither are any of its descendants.
We say that $A$ and $B$ are d-separated by $C$ if all paths from a vertex of $A$ to a vertex of $B$ are blocked w.r.t. $C$. And now comes the important part, if $A$ and $B$ are d-separated by $C$, then $A \bigci B\ |\ C$.
Thig might all look very complicated, but this property of directed graphical models is actually extremely useful, and very easy to do quickly after seeing just a few examples.
Examples
To get a feel for d-separation, let us look at the following example ($B$ is observed).
We can immediately see that $A \bigci D | B$ since this is the head-tail case. We can also see that $A \not{\bigci} E | B$ (not conditionally independent), because while the path through $B$ is blocked, the path through $C$ is not.
Graphical Models