The Gaussian Distribution - Basic Properties

7 min read • Published: December 01, 2018

The Gaussian distribution has many interesting properties, many of which make it useful in a wide range of applications. Before moving further, let us define the univariate PDF with mean $\mu$ and variance $\sigma^2$:

$$\mathcal{N}(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2 \pi \sigma^2}} \exp \left( -\frac{(x - \mu)^2}{2 \sigma^2} \right).$$

In the general $D$-dimensional case, the mean becomes a mean vector, and the variance turns into a $D \times D$ covariance matrix. The PDF then becomes

$$\mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \frac{1}{\sqrt{(2 \pi)^D \det(\boldsymbol{\Sigma})}} \exp \left( -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right)$$

where $\det(\boldsymbol{\Sigma})$ is the determinant of the covariance matrix $\boldsymbol{\Sigma}$. The quadratic form $(\mathbf{x} - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu})$ in the exponent is the squared Mahalanobis distance, and is worth studying in more detail.
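To make the formula concrete, here is a small sketch in Python with NumPy that evaluates the density via the Mahalanobis distance. The function name `gaussian_pdf` is just for illustration, not a library API:

```python
import numpy as np

def gaussian_pdf(x, mu, Sigma):
    """Multivariate Gaussian density; assumes Sigma is positive definite."""
    D = len(mu)
    diff = x - mu
    # Squared Mahalanobis distance: (x - mu)^T Sigma^{-1} (x - mu),
    # computed with a linear solve instead of an explicit inverse.
    maha_sq = diff @ np.linalg.solve(Sigma, diff)
    norm = np.sqrt((2 * np.pi) ** D * np.linalg.det(Sigma))
    return np.exp(-0.5 * maha_sq) / norm

mu = np.zeros(2)
Sigma = np.eye(2)
# At the mean with identity covariance the density is 1 / (2*pi)
print(gaussian_pdf(mu, mu, Sigma))  # -> 0.15915...
```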

Affine property

The first property of the Gaussian states that if $X \sim \mathcal{N}(\mu, \Sigma)$, then $Y = A X + b$ is also a Gaussian, specifically $Y \sim \mathcal{N}(A \mu + b, A \Sigma A^T)$. We can prove this using the definitions of mean and covariance. The mean of $Y$ (denoted $\mu_Y$) can be derived simply from the linearity of expectation, that is

$$\mu_Y = E[Y] = E[A X + b] = E[A X] + E[b] = A E[X] + b = A \mu + b.$$

For the covariance $\Sigma_Y$, we again substitute into the definition of covariance and get

$$\begin{aligned} \Sigma_Y &= E[(Y - \mu_Y) (Y - \mu_Y)^T] \\ &= E[((A X + b) - (A \mu + b)) ((A X + b) - (A \mu + b))^T] \\ &= E[(A(X - \mu)) (A (X - \mu))^T] \\ &= E[A (X - \mu) (X - \mu)^T A^T] \\ &= A E[(X - \mu) (X - \mu)^T] A^T \\ &= A \Sigma A^T \end{aligned}$$

and thus $\Sigma_Y = A \Sigma A^T$, which gives the final result of

$$Y \sim \mathcal{N}(A \mu + b, A \Sigma A^T).$$
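We can check this property numerically. The following NumPy sketch (matrices, vectors, and seed chosen arbitrarily) compares the empirical moments of $Y = AX + b$ against $A\mu + b$ and $A\Sigma A^T$:

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.6], [0.6, 1.0]])
A = np.array([[1.0, 2.0], [0.0, 1.0]])
b = np.array([3.0, -1.0])

X = rng.multivariate_normal(mu, Sigma, size=200_000)
Y = X @ A.T + b  # apply Y = A X + b to each sample row

# Empirical moments should be close to A mu + b and A Sigma A^T
print(Y.mean(axis=0), A @ mu + b)
print(np.cov(Y.T), A @ Sigma @ A.T)
```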

Sampling from a Gaussian

We can immediately make use of the affine property to define how to sample from a multivariate Gaussian. We'll make use of the Cholesky decomposition, which for a positive-definite matrix $\Sigma$ returns a lower triangular matrix $L$, such that

$$L L^T = \Sigma.$$

This together with the affine property defined above gives us

$$\mathcal{N}(\mu, \Sigma) = \mu + L \, \mathcal{N}(0, I).$$

Sampling from the former is thus equivalent to sampling from the latter. Since $\mu$ and $L$ are constant with respect to sampling, we simply have to figure out how to draw samples from $\mathcal{N}(0, I)$ and then apply the affine transform to get back to our original distribution.

Observe that since the covariance of $\mathcal{N}(0, I)$ is diagonal, the individual components of the random vector are independent. This step is a little subtle: uncorrelated random variables are not independent in general, but random variables that are jointly Gaussian and uncorrelated are independent, which is exactly our case.

Finally, because the components are independent, we can sample them independently, which can be done easily using the Box–Muller transform. Once we obtain our $D$ independent samples, we simply multiply by $L$ and add $\mu$ to obtain a correlated sample from our original distribution.
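Putting the pieces together, here is a sketch of the full procedure in NumPy: a Box–Muller step to generate standard normal draws, then the Cholesky-based affine transform. The helper names are illustrative, not a library API:

```python
import numpy as np

def box_muller(n, rng):
    """Turn pairs of uniforms into n independent standard normal samples."""
    u1 = 1.0 - rng.random(n)  # shift to (0, 1] so log(u1) is finite
    u2 = rng.random(n)
    return np.sqrt(-2.0 * np.log(u1)) * np.cos(2 * np.pi * u2)

def sample_mvn(mu, Sigma, n, rng):
    """Sample N(mu, Sigma) as mu + L z, where L L^T = Sigma, z ~ N(0, I)."""
    L = np.linalg.cholesky(Sigma)  # lower triangular factor
    D = len(mu)
    Z = box_muller(n * D, rng).reshape(n, D)
    return mu + Z @ L.T

rng = np.random.default_rng(42)
mu = np.array([0.5, -1.0])
Sigma = np.array([[1.0, 0.8], [0.8, 2.0]])
samples = sample_mvn(mu, Sigma, 200_000, rng)
print(samples.mean(axis=0))  # close to mu
print(np.cov(samples.T))     # close to Sigma
```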

Sum of two independent Gaussians is a Gaussian

If $X$ and $Y$ are independent random variables with Gaussian distributions, where $X \sim \mathcal{N}(\mu_X, \sigma_X^2)$ and $Y \sim \mathcal{N}(\mu_Y, \sigma_Y^2)$, then

$$X + Y \sim \mathcal{N}(\mu_X + \mu_Y, \sigma_X^2 + \sigma_Y^2).$$

This can be proven in many different ways, the simplest of which is probably using moment generating functions. The moment generating function of a Gaussian is

$$M_X(t) = \exp \left( t\mu + \frac{1}{2} \sigma^2 t^2 \right),$$

and using the property of moment generating functions for two independent variables $X$ and $Y$, specifically

$$M_{X + Y}(t) = M_X(t) M_Y(t),$$

we can simply plug in the moment generating function of the Gaussian and get our result

$$\begin{aligned} M_{X + Y}(t) &= M_X(t) M_Y(t) \\ &= \exp \left( t\mu_X + \frac{1}{2} \sigma_X^2 t^2 \right) \exp \left( t\mu_Y + \frac{1}{2} \sigma_Y^2 t^2 \right) \\ &= \exp \left( t(\mu_X + \mu_Y) + \frac{1}{2} t^2 (\sigma_X^2 + \sigma_Y^2) \right), \end{aligned}$$

which is exactly the moment generating function of $\mathcal{N}(\mu_X + \mu_Y, \sigma_X^2 + \sigma_Y^2)$.
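A quick simulation makes the result concrete; the parameters below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(7)
mu_x, var_x = 1.0, 4.0
mu_y, var_y = -2.0, 9.0

# Draw two independent Gaussians and add them sample-wise
x = rng.normal(mu_x, np.sqrt(var_x), size=500_000)
y = rng.normal(mu_y, np.sqrt(var_y), size=500_000)
s = x + y

# The sum's moments should match mu_x + mu_y and var_x + var_y
print(s.mean())  # close to -1
print(s.var())   # close to 13
```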

Deriving the normalizing constant

We can compute the Gaussian integral using polar coordinates. Consider the simplest form of the integral, $\int_{-\infty}^\infty e^{-x^2}\ dx$.

$$\begin{aligned} \left( \int_{-\infty}^\infty e^{-x^2}\ dx \right)^2 &= \int_{-\infty}^\infty e^{-x^2}\ dx \int_{-\infty}^\infty e^{-x^2}\ dx \\ &= \int_{-\infty}^\infty e^{-x^2}\ dx \int_{-\infty}^\infty e^{-y^2}\ dy \qquad \text{renaming $x$ to $y$} \\ &= \int_{-\infty}^\infty \int_{-\infty}^\infty e^{-(x^2 + y^2)}\ dx\ dy. \end{aligned}$$

And now comes an important trick: we do a polar coordinate substitution, since $x^2 + y^2 = r^2$ in $\mathbb{R}^2$, so $e^{-(x^2 + y^2)} = e^{-r^2}$.

$$\begin{aligned} \int_{-\infty}^\infty \int_{-\infty}^\infty e^{-(x^2 + y^2)}\ dx\ dy &= \int_0^{2\pi} \int_0^\infty e^{-r^2} r\ dr\ d\theta \\ &= 2\pi \int_0^\infty e^{-r^2} r\ dr, \end{aligned}$$

now substituting $s = -r^2$ and $ds = -2r\ dr$, where the factor of $r$ from the Jacobian of the polar substitution is exactly what makes the integral tractable. Note that the substitution also maps the upper bound $r = \infty$ to $s = -\infty$, giving us

$$\begin{aligned} 2\pi \int_0^\infty e^{-r^2} r\ dr &= 2\pi \int_0^{-\infty} -\frac{1}{2} e^s\ ds \\ &= \pi \int_{-\infty}^0 e^s\ ds \qquad \text{flipping the integration bounds} \\ &= \pi \left( e^0 - \lim_{s \to -\infty} e^s \right) \\ &= \pi. \end{aligned}$$

Finally, combining this with the initial integral we get

$$\left( \int_{-\infty}^\infty e^{-x^2}\ dx \right)^2 = \pi$$

and as a result

$$\int_{-\infty}^\infty e^{-x^2}\ dx = \sqrt{\pi}.$$
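We can sanity-check this numerically with a simple Riemann sum; the grid bounds are chosen wide enough that the truncated tails are negligible:

```python
import numpy as np

# Integrate e^{-x^2} on a fine grid; the tails beyond |x| = 10 are
# on the order of e^{-100} and contribute nothing at this precision.
x = np.linspace(-10, 10, 100_001)
integral = np.sum(np.exp(-x**2)) * (x[1] - x[0])
print(integral, np.sqrt(np.pi))  # both close to 1.7724538...
```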

Deriving the mean and variance

Lastly, while not necessarily a property of the Gaussian, it is a useful exercise to derive the mean and variance from the PDF. Once again, the PDF is

$$p(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2 \pi \sigma^2}} \exp \left( -\frac{(x - \mu)^2}{2 \sigma^2} \right)$$

and the general formula for $E[X]$ is

$$E[X] = \int_{-\infty}^\infty x\, p(x)\ dx = \int_{-\infty}^\infty x \frac{1}{\sqrt{2 \pi \sigma^2}} \exp \left( -\frac{(x - \mu)^2}{2 \sigma^2} \right)\ dx.$$

We can pull the constant outside of the integral and substitute $u = x - \mu$ and $du = dx$, giving us

$$\begin{aligned} E[X] &= \frac{1}{\sqrt{2 \pi \sigma^2}} \int_{-\infty}^\infty (u + \mu) \exp \left( -\frac{u^2}{2 \sigma^2} \right)\ du \\ &= \frac{1}{\sqrt{2 \pi \sigma^2}} \left( \int_{-\infty}^\infty u \exp \left( -\frac{u^2}{2 \sigma^2} \right)\ du + \mu \int_{-\infty}^\infty \exp \left( -\frac{u^2}{2 \sigma^2} \right)\ du \right) \\ &= \frac{1}{\sqrt{2 \pi \sigma^2}} \int_{-\infty}^\infty u \exp \left( -\frac{u^2}{2 \sigma^2} \right)\ du + \mu. \end{aligned}$$

The second integral is just the normalizing constant $\sqrt{2 \pi \sigma^2}$, which cancels the factor in front and leaves $\mu$. The remaining integrand is odd, which means its integral evaluates to $0$, and we're left with only $\mu$, that is

$$E[X] = \mu,$$

which is what we wanted to prove.

Now for the variance, which is defined as

$$\mathrm{var}(X) = E[(X - \mu)^2]$$

which written again as an integral gives us

$$\mathrm{var}(X) = \int_{-\infty}^\infty (x - \mu)^2 p(x)\ dx = \int_{-\infty}^\infty (x - \mu)^2 \frac{1}{\sqrt{2 \pi \sigma^2}} \exp \left( -\frac{(x - \mu)^2}{2 \sigma^2} \right)\ dx.$$

Again pulling out the constant and substituting $y = x - \mu$ and $dy = dx$ (using $y$ this time, since $u$ will be reused in the integration by parts below), we get

$$\mathrm{var}(X) = \frac{1}{\sqrt{2 \pi \sigma^2}} \int_{-\infty}^\infty y^2 \exp \left( -\frac{y^2}{2 \sigma^2} \right)\ dy.$$

We integrate by parts using $\int u\ v' = u\ v - \int v\ u'$, where we set

$$\begin{aligned} u &= y \\ u' &= 1 \\ v' &= y \, e^{-y^2 / 2\sigma^2}. \end{aligned}$$

To get $v$ we have to compute the integral of $v'$, which we can easily do by substituting $s = -\frac{y^2}{2\sigma^2}$ and $ds = -\frac{y}{\sigma^2}\ dy$, giving us

$$\begin{aligned} \int y \, e^{-y^2 / 2\sigma^2}\ dy &= -\int \sigma^2 e^s\ ds \\ &= -\sigma^2 e^s \\ &= -\sigma^2 e^{-\frac{y^2}{2\sigma^2}}. \end{aligned}$$

Now finishing our integration by parts (restoring the constant in front of the integral), we can write out the final result:

$$\begin{aligned} \mathrm{var}(X) &= \frac{1}{\sqrt{2 \pi \sigma^2}} \left( \left[ y \, (-\sigma^2) \, e^{-\frac{y^2}{2\sigma^2}} \right]_{-\infty}^\infty + \sigma^2 \int_{-\infty}^\infty e^{-\frac{y^2}{2\sigma^2}}\ dy \right) \\ &= 0 + \sigma^2 \cdot \frac{1}{\sqrt{2 \pi \sigma^2}} \int_{-\infty}^\infty e^{-\frac{y^2}{2\sigma^2}}\ dy \\ &= \sigma^2 \cdot 1 = \sigma^2, \end{aligned}$$

where the boundary term vanishes because the exponential decays faster than $y$ grows, and the remaining integral is exactly the normalizing constant.

That is, $\mathrm{var}(X) = \sigma^2$.
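Both results can be verified numerically by integrating against the PDF on a grid; the parameters $\mu = 1.5$, $\sigma = 2$ below are arbitrary:

```python
import numpy as np

mu, sigma = 1.5, 2.0

# Wide grid: beyond 12 standard deviations the integrands are negligible
x = np.linspace(mu - 12 * sigma, mu + 12 * sigma, 400_001)
pdf = np.exp(-(x - mu) ** 2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)
dx = x[1] - x[0]

# E[X] = integral of x p(x), var(X) = integral of (x - E[X])^2 p(x)
mean = np.sum(x * pdf) * dx
var = np.sum((x - mean) ** 2 * pdf) * dx
print(mean, var)  # close to 1.5 and 4.0
```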

