## The Gaussian Distribution - Basic Properties

The Gaussian distribution has many interesting properties, many of which make it useful in various different applications. Before moving further, let us just define the univariate PDF with a mean $\mu$ and variance $\sigma^2$

$$\mathcal{N}(x | \mu, \sigma^2) = \frac{1}{\sqrt{2 \pi \sigma^2}} \exp \left( -\frac{(x - \mu)^2}{2 \sigma^2} \right).$$

In the general multi-dimensional case, the mean becomes a mean vector, and the variance turns into a $D \times D$ covariance matrix. The PDF then becomes

$$\mathcal{N}(\mathbf{x} | \mathbf{\mu}, \mathbf{\Sigma}) = \frac{1}{\sqrt{(2 \pi)^k det(\mathbf{\Sigma})}} \exp \left( -\frac{1}{2} (\mathbf{x} - \mathbf{\mu})^T \mathbf{\Sigma}^{-1} (\mathbf{x} - \mathbf{\mu}) \right)$$

where $det(\Sigma)$ is the determinant of the covariance matrix $\Sigma$. The term in the exponent is called Mahalanobis distance and is useful to study in more detail.

## Affine property

The first property of the Gaussian states that if $X \sim \mathcal{N}(\mu, \Sigma)$, then $Y = A X + b$ is also a Gaussian, specifically $Y \sim \mathcal{N}(A \mu + b, A \Sigma A^T)$. We can prove this using the definition of mean and covariance. The mean of $Y$ (denoted $\mu_Y$) can be derived simply from the linearity of expectation, that is

$$\mu_Y = E[Y] = E[A X + b] = E[A X] + E[b] = A E[X] + b = A \mu + b.$$

And now the covariance $\Sigma_Y$ we again substitute into the definition of covariance and get

\begin{align} \Sigma_Y &= E[(Y - \mu_Y) (Y - \mu_Y)^T] \\\ &= E[((A X + b) - (A \mu + b)) ((A X + b) - (A \mu + b))^T] \\\ &= E[(A(X - \mu)) (A (X - \mu))^T] \\\ &= E[A (X - \mu) (X - \mu)^T A^T] \\\ &= A E[(X - \mu) (X - \mu)^T] A^T \\\ &= A \Sigma A^T \end{align}

and thus $\Sigma_Y = A \Sigma A^T$, which gives the final result of

$$Y \sim \mathcal{N}(A \mu, A \Sigma A^T).$$

## Sampling from a Gaussian

We can immediately make use of the affine property to define how to sample from a multivariate Gaussian. We’ll make use of Cholesky decomposition, which for a positive-definite matrix $\Sigma$ returns a lower triangular matrix $L$, such that

$$L L^T = \Sigma.$$

This together with the affine property defined above gives us

$$\mathcal{N}(\mu, \Sigma) = \mu + L \mathcal{N}(0, I).$$

Sampling from the former is thus equivalent to sampling from the latter. Since $\mu$ and $L$ are constant factors with respect to sampling, we simply have to figure out how to draw samples from $\mathcal{N}(0, I)$ and then do the affine transform back to our original distribution.

Observe that since the covariance of $\mathcal{N}(0, I)$ is diagonal, the individual values in the random vector are independent. Note that this property is special to the Gaussian and is a little bit tricky, but holds in our case, because in this case we’re inferring that individual random variables which are jointly Gaussian but uncorrelated are independent.

Finally, because the variables are independent, we can sample them independently, which can be done easily using the Box-Muller transform. Once we obtain our $D$ independent samples, we simply multiply by $L$ and add $\mu$ to obtain correlated samples from our original distribution.

### Sum of two independent Gaussians is a Gaussian

If $X$ and $Y$ random variables with a Gaussian distributions, where $X \sim \mathcal{N}(\mu_X, \sigma_X^2)$ and $X \sim \mathcal{N}(\mu_X, \sigma_X^2)$, then

$$X + Y \sim \mathcal{N}(\mu_X + \mu_Y, \sigma_X^2 + \sigma_Y^2).$$

This can be proven many different ways, the simplest of which is probably using moment generating functions. With the moment generating function of a Gaussian being

$$M_X(t) = \exp \left( t\mu + \frac{1}{2} \sigma^2 t^2 \right),$$

and using the property of moment generating functions which says how to combine two independent variables $X$ and $Y$, specifically

$$M_{X + Y}(t) = M_X(t) M_Y(t),$$

we can simply plug in our moment generating function for the Gaussian and get our result

\begin{align} M_{X + Y}(t) &= M_X(t) M_Y(t) \\\ &= \exp \left( t\mu_X + \frac{1}{2} \sigma_X^2 t^2 \right) \exp \left( t\mu_Y + \frac{1}{2} \sigma_Y^2 t^2 \right) \\\ &= \exp \left( t(\mu_X + \mu_Y) + \frac{1}{2} t^2 (\sigma_X^2 + \sigma_Y^2) \right) \end{align}

### Deriving the normalizing constant

We can compute the Gaussian integral using polar coordinates. Consider the zero mean unit variance case.

\begin{align} \left( \int_{-\infty}^\infty e^{-x^2} dx \right)^2 &= \int_{-\infty}^\infty e^{-x^2} dx \int_{-\infty}^\infty e^{-x^2} dx \\\ &= \int_{-\infty}^\infty e^{-x^2} dx \int_{-\infty}^\infty e^{-y^2} dy \qquad \text{rename x to y}\\\ &= \int_{-\infty}^\infty \int_{-\infty}^\infty e^{-(x^2 + y^2)} dx\ dy \end{align}

And now comes an important trick, we’ll do a polar coordinate substitution, since $e^{-(x^2 + y^2)} = e^{-r^2}$ in $R^2$.

\begin{align} &= \int_{-\infty}^\infty \int_{-\infty}^\infty e^{-(x^2 + y^2)} dx\ dy\\\ &= \int_0^{2\pi} \int_0^\infty e^{-r^2} r\ dr\ d\theta \\\ &= 2\pi \int_0^\infty e^{-r^2} r\ dr \\\ \end{align}

now substituting $s = -r^2$ and $ds = -2 r\ dr$, giving us

\begin{align} &= 2\pi \int_0^\infty e^{-r^2} r\ dr \\\ &= 2\pi \int_0^\infty -\frac{1}{2} e^s\ ds \\\ &= \pi \int_0^\infty -e^s\ ds \qquad\text{flipping integration bounds} \\\ &= \pi \int_{-\infty}^0 e^s\ ds \\\ &= \pi (e^0 - e^{-\infty}) \\\ &= \pi \end{align}

Finally, combining this with the initial integral we get

$$\left( \int_{-\infty}^\infty e^{-x^2} dx \right)^2 = \pi$$

and as a result

$$\int_{-\infty}^\infty e^{-x^2} dx = \sqrt{\pi}.$$

### Deriving the mean and standard deviation

Lastly, while not necessarily a property of the Gaussian, it is a useful exercise to derive the mean and standard deviation from the PDF. Once again, the PDF is

$$p(x | \mu, \sigma^2) = \frac{1}{\sqrt{2 \pi \sigma^2}} \exp \left( -\frac{(x - \mu)^2}{2 \sigma^2} \right)$$

and the general formula for $E[X]$ is

$$E[X] = \int_{-\infty}^\infty x p(x)\ dx = \int_{-\infty}^\infty x \frac{1}{\sqrt{2 \pi \sigma^2}} \exp \left( -\frac{(x - \mu)^2}{2 \sigma^2} \right)\ dx.$$

We can pull out the constant outside of the integral and substitute $u = x - \mu$ and $du = dx$, giving us

\begin{align} &= \frac{1}{\sqrt{2 \pi \sigma^2}} \int_{-\infty}^\infty (u + \mu) \exp \left( -\frac{u^2}{2 \sigma^2} \right)\ du \\\ &= \frac{1}{\sqrt{2 \pi \sigma^2}} \left( \left( \int_{-\infty}^\infty u \exp \left( -\frac{u^2}{2 \sigma^2} \right)\ du \right) + \mu \left( \int_{-\infty}^\infty \exp \left( -\frac{u^2}{2 \sigma^2} \right)\ du \right) \right) \\\ &= \frac{1}{\sqrt{2 \pi \sigma^2}} \left( \int_{-\infty}^\infty u \exp \left( -\frac{u^2}{2 \sigma^2} \right)\ du \right) + \mu \\\ \end{align}

Here we note that the function being integrated is odd, which means the integral adds up to $0$, and we’re left with only $\mu$, that is

$$E[X] = \mu$$

which is what we wanted to prove.

Now for the variance, which is defined as

$$var(X) = E[(X - \mu)^2]$$

which written again as an integral gives us

$$var(X) = \int_{-\infty}^\infty (x - \mu)^2 p(x)\ dx = \int_{-\infty}^\infty (x - \mu)^2 \frac{1}{\sqrt{2 \pi \sigma^2}} \exp \left( -\frac{(x - \mu)^2}{2 \sigma^2} \right)\ dx.$$

again pulling out the constant and substituting $u = x - \mu$ and $du = dx$ we get

\begin{align} var(X) &= \frac{1}{\sqrt{2 \pi \sigma^2}} \int_{-\infty}^\infty u^2 \exp \left( -\frac{u^2}{2 \sigma^2} \right)\ du. \end{align}

Integrating by parts using the $\int u\ v’ = u\ v - \int v\ u’$ where we set

\begin{align} u &= y \\\ u’ &= 1 \\\ v’ &= y \cdot e^{-y^2 / 2\sigma^2}. \end{align}

To get $v$ we have to compute the integral of $v’$, which we can easily do substituting $u = -\frac{y^2}{2\sigma^2}$ and $du = -\frac{y}{\sigma^2} dy$, giving us

\begin{align} \int y \cdot e^{-y^2 / 2\sigma^2}\ dy &= -\int \sigma^2 e^u\ du \\\ &= -\sigma^2 e^u \\\ &= -\sigma^2 e^{-\frac{y^2}{2\sigma^2}}. \end{align}

Now finishing our integration by parts we can write out the final formula

\begin{align} \int u v’ &= u\ v - \int v\ u’ \\\\ &= \frac{1}{2 \pi \sigma^2} \left( \left[y (-\sigma^2) e^{-\frac{y^2}{2\sigma^2}}\right]_{-\infty}^\infty - \int_{-\infty}^\infty (-s^2) e^{-\frac{y^2}{2s^2}} \ dy \right) \\\ &= 0 + \sigma^2 \cdot 1 = \sigma^2. \end{align}

That is, $var(X) = \sigma^2$.