Graphical Models: D-Separation

5 min read • Published: November 29, 2018

This article is a brief overview of conditional independence in graphical models, and the related d-separation. Let us begin with a definition.

For three random variables $X$, $Y$ and $Z$, we say $X$ is conditionally independent of $Y$ given $Z$ iff

$$p(X, Y \mid Z) = p(X \mid Z)\, p(Y \mid Z).$$

We can use the shorthand notation

$$X \perp\!\!\!\perp Y \mid Z.$$

Before we can define d-separation, let us first look at three different types of graphs. Consider the same three variables as before; we'll be interested in conditional independence based on whether we observe $Z$.


The first case is called the tail-tail.

(Graph: Z → X, Z → Y)

We can factor the joint distribution to get

$$p(X, Y, Z) = p(X \mid Z)\, p(Y \mid Z)\, p(Z)$$

and conditioning on the value of $Z$ we get (using the definition of conditional probability)

$$p(X, Y \mid Z) = \frac{p(X, Y, Z)}{p(Z)} = \frac{p(X \mid Z)\, p(Y \mid Z)\, p(Z)}{p(Z)} = p(X \mid Z)\, p(Y \mid Z).$$

From this we can immediately see that conditioning on $Z$ in the tail-tail case makes $X$ and $Y$ independent, that is $X \perp\!\!\!\perp Y \mid Z$.
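To see this numerically, here is a small Monte Carlo sketch of the tail-tail case. The Bernoulli parameters below are hypothetical (any generic choice works): marginally, $X$ and $Y$ are correlated through their common cause $Z$, but within each stratum of $Z$ the dependence vanishes.

```python
import random

random.seed(0)
n = 200_000

# Tail-tail (common cause): Z -> X, Z -> Y.
# Hypothetical parameters: X and Y are noisy readouts of Z.
samples = []
for _ in range(n):
    z = random.random() < 0.5
    x = random.random() < (0.9 if z else 0.2)
    y = random.random() < (0.8 if z else 0.3)
    samples.append((x, y, z))

def p(pred, cond=lambda s: True):
    """Empirical probability of pred, optionally conditioned on cond."""
    sel = [s for s in samples if cond(s)]
    return sum(pred(s) for s in sel) / len(sel)

# Marginally, X and Y are dependent...
pxy = p(lambda s: s[0] and s[1])
px, py = p(lambda s: s[0]), p(lambda s: s[1])
print(abs(pxy - px * py))  # noticeably above zero

# ...but conditioning on Z = 1 makes them (approximately) independent.
in_z1 = lambda s: s[2]
pxy_z = p(lambda s: s[0] and s[1], in_z1)
px_z, py_z = p(lambda s: s[0], in_z1), p(lambda s: s[1], in_z1)
print(abs(pxy_z - px_z * py_z))  # close to zero
```

The same check within the `Z = 0` stratum gives an analogous result.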


The second case is called the head-tail and looks as follows.

(Graph: X → Z, Z → Y)

We can again write the joint distribution for the graph

$$p(X, Y, Z) = p(X)\, p(Z \mid X)\, p(Y \mid Z)$$

and again conditioning on ZZ we get (using rules of conditional probability)

$$\begin{aligned} p(X, Y \mid Z) &= \frac{p(X, Y, Z)}{p(Z)} \\ &= \frac{p(X)\, p(Z \mid X)\, p(Y \mid Z)}{p(Z)} \\ &= \frac{p(X, Z)\, p(Y \mid Z)}{p(Z)} \\ &= \frac{p(X \mid Z)\, p(Z)\, p(Y \mid Z)}{p(Z)} \\ &= p(X \mid Z)\, p(Y \mid Z) \end{aligned}$$

and so again, $X$ and $Y$ are conditionally independent given $Z$, that is $X \perp\!\!\!\perp Y \mid Z$.

Checking marginal independence

For completeness, we can also check whether $X$ and $Y$ are marginally independent. In general they are not: with $Z$ unobserved, information flows from $X$ to $Y$ along the chain. Starting from the joint distribution

$$p(X, Y, Z) = p(X)\, p(Z \mid X)\, p(Y \mid Z)$$

which gives us the following when marginalizing over ZZ

$$p(X, Y) = \sum_Z p(X, Y, Z) = p(X) \sum_Z p(Z \mid X)\, p(Y \mid Z) = p(X) \sum_Z p(Y, Z \mid X) = p(X)\, p(Y \mid X)$$

from which we can immediately see that it does not factorize into $p(X)\, p(Y)$ in the general case, and thus $X$ and $Y$ are not marginally independent.
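Both head-tail claims can be verified exactly by enumerating a small binary chain. The conditional probability tables below are hypothetical; the point is that $p(X, Y) \neq p(X)\, p(Y)$ while $p(X, Y \mid Z)$ factorizes.

```python
# Head-tail (chain): X -> Z -> Y, with hypothetical binary CPTs.
pX1 = 0.4
pZ_X = {0: 0.2, 1: 0.9}   # p(Z=1 | X=x)
pY_Z = {0: 0.3, 1: 0.8}   # p(Y=1 | Z=z)

def bern(p1, v):
    """Probability of value v under Bernoulli(p1)."""
    return p1 if v else 1 - p1

def joint(x, z, y):
    return bern(pX1, x) * bern(pZ_X[x], z) * bern(pY_Z[z], y)

# Marginal p(X=1, Y=1) vs p(X=1) p(Y=1): they differ.
pxy = sum(joint(1, z, 1) for z in (0, 1))
px = pX1
py = sum(joint(x, z, 1) for x in (0, 1) for z in (0, 1))
print(pxy, px * py)  # unequal: X and Y are marginally dependent

# Conditioned on Z = 1, the joint factorizes exactly.
pz1 = sum(joint(x, 1, y) for x in (0, 1) for y in (0, 1))
lhs = joint(1, 1, 1) / pz1
rhs = (sum(joint(1, 1, y) for y in (0, 1)) / pz1) \
    * (sum(joint(x, 1, 1) for x in (0, 1)) / pz1)
assert abs(lhs - rhs) < 1e-9
```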


The last case is called the head-head and is a little trickier.

(Graph: X → Z, Y → Z)

We can again write out the joint distribution

$$p(X, Y, Z) = p(X)\, p(Y)\, p(Z \mid X, Y),$$

but this does not immediately help us when we try to condition on $Z$; we would want

$$p(X, Y \mid Z) = \frac{p(X, Y, Z)}{p(Z)} \stackrel{?}{=} p(X \mid Z)\, p(Y \mid Z),$$

which does not hold in general. For example, consider $X, Y \sim \mathrm{Bernoulli}(0.5)$ and $Z = 1$ if $X = Y$, and $0$ otherwise. In this case, if we know $Z$ and observe $X$, it immediately tells us the value of $Y$; hence $X$ and $Y$ are not conditionally independent given $Z$.

We can, however, do a little trick and write $p(X, Y)$ as a marginalization over $Z$, that is

$$p(X, Y) = \sum_Z p(X, Y, Z) = \sum_Z p(X)\, p(Y)\, p(Z \mid X, Y) = p(X)\, p(Y)$$

since $\sum_Z p(Z \mid X, Y) = 1$. As a result, in the head-head case we have marginal independence between $X$ and $Y$, that is $X \perp\!\!\!\perp Y$.
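The $Z = \mathbb{1}[X = Y]$ example above is small enough to verify both claims by exact enumeration. A minimal sketch using exact rational arithmetic:

```python
from itertools import product
from fractions import Fraction

# Head-head (collider): X -> Z <- Y, with the example from the text:
# X, Y ~ Bernoulli(1/2) independently, Z = 1 iff X == Y.
half = Fraction(1, 2)
joint = {}  # maps (x, y, z) to its probability
for x, y in product((0, 1), repeat=2):
    z = int(x == y)
    joint[(x, y, z)] = half * half

def p(**fixed):
    """Exact probability of the event fixing some of x, y, z."""
    return sum(pr for (x, y, z), pr in joint.items()
               if all({'x': x, 'y': y, 'z': z}[k] == v
                      for k, v in fixed.items()))

# Marginally, X and Y are independent:
assert p(x=1, y=1) == p(x=1) * p(y=1)

# Conditioned on Z = 1, they determine each other completely:
p_x1_given_z1 = p(x=1, z=1) / p(z=1)
p_y1_given_z1 = p(y=1, z=1) / p(z=1)
p_x1y1_given_z1 = p(x=1, y=1, z=1) / p(z=1)
print(p_x1y1_given_z1, p_x1_given_z1 * p_y1_given_z1)  # 1/2 vs 1/4
```

The final line exhibits the failure of the conditional factorization: $p(X=1, Y=1 \mid Z=1) = 1/2$ but $p(X=1 \mid Z=1)\, p(Y=1 \mid Z=1) = 1/4$.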


Having shown the three cases, we can finally define d-separation. Let $G$ be a DAG, and let $A, B, C$ be disjoint subsets of its vertices.

A path between two vertices is blocked (with respect to $C$) if it passes through a vertex $v$ such that either:

  • the edges at $v$ are head-tail or tail-tail, and $v \in C$, or
  • the edges at $v$ are head-head, and neither $v$ nor any of its descendants is in $C$.

We say that $A$ and $B$ are d-separated by $C$ if all paths from a vertex of $A$ to a vertex of $B$ are blocked w.r.t. $C$. And now comes the important part: if $A$ and $B$ are d-separated by $C$, then $A \perp\!\!\!\perp B \mid C$.

This might all look very complicated, but this property of directed graphical models is actually extremely useful, and checking it becomes quick and easy after seeing just a few examples.


To get a feel for d-separation, let us look at the following example ($B$ is observed).

(Graph: A → B, A → C, B → D, B → E, C → E)

We can immediately see that $A \perp\!\!\!\perp D \mid B$, since this is the head-tail case. We can also see that $A \not\perp\!\!\!\perp E \mid B$ (not conditionally independent), because while the path through $B$ is blocked, the path through $C$ is not.
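Both conclusions can be checked by brute-force enumeration of the example DAG. The conditional probability tables below are hypothetical placeholders; any generic choice exhibits the same (in)dependence pattern.

```python
from itertools import product

# The example DAG: A -> B, A -> C, B -> D, B -> E, C -> E.
pA1 = 0.3                      # p(A=1)
pB_A = {0: 0.2, 1: 0.9}        # p(B=1 | A=a)
pC_A = {0: 0.6, 1: 0.1}        # p(C=1 | A=a)
pD_B = {0: 0.5, 1: 0.8}        # p(D=1 | B=b)
pE_BC = {(0, 0): 0.1, (0, 1): 0.7,
         (1, 0): 0.4, (1, 1): 0.95}  # p(E=1 | B=b, C=c)

def bern(p1, v):
    return p1 if v else 1 - p1

def joint(a, b, c, d, e):
    return (bern(pA1, a) * bern(pB_A[a], b) * bern(pC_A[a], c)
            * bern(pD_B[b], d) * bern(pE_BC[(b, c)], e))

def p(**fixed):
    """Marginal probability of the given assignment, by enumeration."""
    total = 0.0
    for a, b, c, d, e in product((0, 1), repeat=5):
        assign = dict(a=a, b=b, c=c, d=d, e=e)
        if all(assign[k] == v for k, v in fixed.items()):
            total += joint(a, b, c, d, e)
    return total

# A is d-separated from D by B, so p(A, D | B) factorizes for every b:
for b in (0, 1):
    lhs = p(a=1, d=1, b=b) / p(b=b)
    rhs = (p(a=1, b=b) / p(b=b)) * (p(d=1, b=b) / p(b=b))
    assert abs(lhs - rhs) < 1e-9

# A is NOT d-separated from E by B (the path A -> C -> E stays open):
lhs_e = p(a=1, e=1, b=1) / p(b=1)
rhs_e = (p(a=1, b=1) / p(b=1)) * (p(e=1, b=1) / p(b=1))
print(abs(lhs_e - rhs_e))  # noticeably nonzero
```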
