Eigenvalues and eigenvectors of a matrix $\boldsymbol A$ tell us a lot about the matrix. On the other hand, if we know our matrix $\boldsymbol A$ is somehow special (say symmetric) it will tell us some information about how its eigenvalues and eigenvectors look like.

Let us begin with a definition. Given a matrix $\boldsymbol A$, the vector $x$ is an eigenvector of $\boldsymbol A$ and has a corresponding eigenvalue $\lambda$, if

$$ \boldsymbol A \boldsymbol x = \lambda \boldsymbol x. $$

The eigenvectors of a matrix $\boldsymbol A$ are exactly those vectors which when transformed by the mapping defined by $\boldsymbol A$ are only scaled by $\lambda$, but their direction does not change.

## Eigenvalues and eigenvectors of a projection matrix

To understand what eigenvectors are and how they behave, let us consider a projection matrix $\boldsymbol P$. What are $x$’s and $\lambda$’s for a projection matrix?

The key property we’ll use is $\boldsymbol P^2 = \boldsymbol P$. This is because when we project a vector $x$ onto a plane to get $\hat x$, that is $\boldsymbol P x = \hat x$, we would expect that projecting $\hat x$ again to do nothing, since it already lies in the plane, that is

$$ \hat x = \boldsymbol P \hat x = \boldsymbol P (\boldsymbol P x) = (\boldsymbol P \boldsymbol P) x = \boldsymbol P^2 x. $$

Now thinking about eigenvectors as those vectors which don’t change direction when a projection matrix is applied, we can deduce two cases:

- Any $x$ already in the plane: $\boldsymbol P x = x, \lambda = 1$.
- Any $x$ perpendicular to the plane: $\boldsymbol P x = 0 x, \lambda = 0$.

As a result, a projection matrix $\boldsymbol P$ has two eigenvalues, $\lambda = 0$ and $\lambda = 1$, and two sets of eigenvectors. Those that lie in the projection plane, and those that are perpendicular to it.

## Eigenvalues of a $2 \times 2$ permutation matrix

One more small example, consider a $2 \times 2$ permutation matrix $\boldsymbol A = \begin{pmatrix}0 & 1 \\ 1 & 0 \end{pmatrix}$.

We can find the eigenvectors straight away, at least the first one, which is simply $x = (1\ 1)^T$, since $\boldsymbol A x = x$, and so its corresponding eigenvalue is $\lambda = 1$.

If we think a little harder, we can guess the second eigenvector to be $x = (-1\ 1)^T$, since $\boldsymbol A = -x$ with an eigenvalue $\lambda = -1$.

## Computing eigenvalues and eigenvectors

We can re-arrange the terms in our definition to get a direct way to compute eigenvalues and eigenvectors of a matrix $\boldsymbol A$. Simply move $\lambda x$ to the left

$$ \begin{aligned} \boldsymbol A x &= \lambda x \\\\ (\boldsymbol A - \lambda \boldsymbol I) x &= 0 \\\\ \end{aligned} $$

and then notice that $\boldsymbol A - \lambda \boldsymbol I$ must be singular, because $x$ lies in its nullspace. We know that singular matrices have a zero determinant, and we can use this to compute the eigenvalues $\lambda$ simply by writing

$$ \det (\boldsymbol A - \lambda \boldsymbol I) = 0. $$

This is called the **characteristic equation**. The equation $det(\boldsymbol A

- \lambda \boldsymbol I) = 0$ gives us a polynomial of degree $n$, which we can use to find $n$ solutions $\lambda$. These need not be different, and can even be complex numbers. But once we obtain the $\lambda$’s we can plug them back into

$$ (\boldsymbol A - \lambda \boldsymbol I) x = 0 $$

and one by one obtain their corresponding eigenvectors $x$.

## Eigenvalues and eigenvectors of an upper triangular matrix

For a triangular matrix, the determinant is just the diagonal

$$ det(\boldsymbol A) = \prod_{i=1}^n \boldsymbol A_{ii} $$

which means solving the characteristic equation of $\boldsymbol A$ simply amounts to multiplying out the diagonal

$$ det(\boldsymbol A - \lambda \boldsymbol I) = \prod_{i=1}^n (\boldsymbol A - \lambda \boldsymbol I)_{ii}, $$

which gives us a factored polynomial $(\boldsymbol A_{11} - \lambda)(\boldsymbol A_{22} - \lambda)\ldots(\boldsymbol A_{nn} - \lambda)$, from which we immediately see that the eigenvalues are the diagonal elements.

## Diagonalization $\boldsymbol S^{-1} \boldsymbol A \boldsymbol S = \boldsymbol \Lambda$

Suppose we have $n$ linearly independent eigenvectors of $\boldsymbol A$. Put them int the columns of $\boldsymbol S$. We now write

$$ \def\vertbar{{\rule[-1ex]{0.5pt}{2.5ex}}} $$

$$ \boldsymbol A \boldsymbol S = A \begin{bmatrix} \vertbar & \vertbar & & \vertbar \\\\ x_1 & x_2 & \cdots & x_n \\\\ \vertbar & \vertbar & & \vertbar \end{bmatrix} = \begin{bmatrix} \vertbar & \vertbar & & \vertbar \\\\ \lambda_1 x_1 & \lambda_2 x_2 & \cdots & \lambda_n x_n \\\\ \vertbar & \vertbar & & \vertbar \end{bmatrix} = \boldsymbol S \boldsymbol \Lambda $$

where $\boldsymbol \Lambda$ is a diagonal matrix of eigenvalues. Thus we get $\boldsymbol A \boldsymbol S = \boldsymbol S \boldsymbol \Lambda$. If we have $n$ independent eigenvectors in $\boldsymbol A$, we also get

$$ \begin{align} \boldsymbol A \boldsymbol S &= \boldsymbol S \boldsymbol \Lambda \\\\ \boldsymbol S^{-1} \boldsymbol A \boldsymbol S = \boldsymbol \Lambda \\\\ \boldsymbol A &= \boldsymbol S \boldsymbol \Lambda \boldsymbol S^{-1} \end{align} $$

The matrix $\boldsymbol A$ is sure to have $n$ independent eigenvectors (and be diagonalizable) if all the $\lambda$’s are different (no repeated $\lambda$’s). Repeated eigenvalues mean $\boldsymbol A$ may or may not have $n$ independent eigenvectors.

Proof (ref G. Strang, Introduction to LA): Suppose $c_1 + x_1 + c_2 x_2 = 0$. Multiply by $\boldsymbol A$ to find $c_1 \lambda_1 x_1 + c_2 \lambda_2 x_2 = 0$. Multiply by $\lambda_2$ to find $c_1 \lambda_2 x_1 + c_2 \lambda_2 x_2 = 0$. Now subtracting these two equations gives us

$$ (\lambda_1 - \lambda_2) c_1 x_1 = 0. $$

Since $\lambda_1 \neq \lambda_2$ and $x_1 \neq 0$, we conclude $c_1 = 0$. We can derive $c_2 = 0$ the same way. Since $c_1 = c_2 = 0$ are the only coefficients for which $c_1 x_1 + c_2 x_2 = 0$, we see that $x_1$ and $x_2$ are linearly independent.

The same argument can be extended to $n$ eigenvectors and eigenvalues.

## Sum of eigenvalues equlas the trace

Another very useful fact is that the sum of the eigenvalues equals
the sum of the main diagonal (called the **trace** of $\boldsymbol A$),
that is

$$ \lambda_1 + \lambda_2 + \ldots + \lambda_n = \boldsymbol A_{11} + \boldsymbol A_{22} + \ldots + \boldsymbol A_{nn} = Tr(\boldsymbol A). $$

To prove this we’ll first show that $Tr(\boldsymbol A \boldsymbol B) = Tr(\boldsymbol B \boldsymbol A)$.

To get a single element on the diagonal of $\boldsymbol A \boldsymbol B$ we simply write

$$ (\boldsymbol A \boldsymbol B)_{jj} = \sum_{k} \boldsymbol A_{jk} \boldsymbol B_{kj} $$

and to get the trace we just sum over all possible $j$ as

$$ Tr(\boldsymbol A \boldsymbol B) = \sum_{j} \sum_{k} \boldsymbol A_{jk} \boldsymbol B_{kj}. $$

On the other hand, the $k$-th element on the diagonal of $\boldsymbol B \boldsymbol A$ is

$$ (\boldsymbol B \boldsymbol A)_{kk} = \sum_{j} \boldsymbol B_{kj} \boldsymbol A_{jk} $$

and the trace $Tr(\boldsymbol B \boldsymbol A)$ is

$$ Tr(\boldsymbol B \boldsymbol A) = \sum_{k} \sum_{j} \boldsymbol B_{kj} \boldsymbol A_{jk}. $$

But since we can swap the order of summation and also swap the order of multiplication, we get

$$ Tr(\boldsymbol B \boldsymbol A) = \sum_{k} \sum_{j} \boldsymbol B_{kj} \boldsymbol A_{jk} = \sum_{j} \sum_{k} \boldsymbol A_{jk} \boldsymbol B_{kj} = Tr(\boldsymbol A \boldsymbol B). $$

Now consider we have $n$ different eigenvalues. We can diagonalize the matrix

$$ \boldsymbol S^{-1} \boldsymbol A \boldsymbol S = \boldsymbol \Lambda $$

where $\boldsymbol \Lambda$ is a diagonal matrix of eigenvalues of $\boldsymbol A$. Using our trace trick we can write

$$ Tr(\boldsymbol \Lambda) = Tr(\boldsymbol S^{-1} \boldsymbol A \boldsymbol S) = Tr((\boldsymbol S^{-1} \boldsymbol A) \boldsymbol S) = Tr(\boldsymbol S (\boldsymbol S^{-1} \boldsymbol A)) = Tr((\boldsymbol S \boldsymbol S^{-1}) \boldsymbol A) = Tr(\boldsymbol I \boldsymbol A) = Tr(\boldsymbol A) $$

and thus the sum of eigenvalues is equal the trace of $\boldsymbol A$. We’ve only shown this for the case of $n$ different eigenvalues. This property does hold in general, but requires some properties we haven’t proven yet (Jordan normal form), and thus we skip the rest of the proof.

If you’re interested, check out the following article which shows the whole proof, and possibly the Wikipedia article on Jordan normal form.

## Powers of a matrix

If $\boldsymbol A x = \lambda x$, then we multiply by $\boldsymbol A$ and get

$$ \boldsymbol A^2 x = \lambda \boldsymbol A x = \lambda^2 x. $$

Continuing

$$ \boldsymbol A^2 = \boldsymbol S \boldsymbol \Lambda \boldsymbol S^{-1} \boldsymbol S \boldsymbol \Lambda \boldsymbol S^{-1} = \boldsymbol S \boldsymbol \Lambda^2 \boldsymbol S^{-1} $$

or in general

$$ \boldsymbol A^k = \boldsymbol S \boldsymbol \Lambda^k \boldsymbol S^{-1}. $$

Theorem: $\boldsymbol A^k \rightarrow 0$ as $k \rightarrow \infty$ if all $|\lambda_i| < 1$.

## More properties

$\boldsymbol A$ and $\boldsymbol B$ share the same $n$ independent eigenvectors if and only if $\boldsymbol A \boldsymbol B = \boldsymbol B \boldsymbol A$.

This is true because $\boldsymbol A \boldsymbol B x = \lambda \beta x$ and $\boldsymbol B \boldsymbol A x = \lambda \beta x$ since

$$ \boldsymbol A \boldsymbol B x = \boldsymbol A \beta x = \beta \boldsymbol A x = \beta \lambda x. $$

**But this only holds if $\boldsymbol A$ and $\boldsymbol B$ share the same eigenvectors!**

One last interesting fact we can show is what happens to the eigenvalues when we add a constant $c$ to the matrix $\boldsymbol A$. The proof is rather trivial, if $\boldsymbol A x = \lambda x$, then

$$ (\boldsymbol A + c) x = (\boldsymbol A + c \boldsymbol I) x = (\lambda + c) x. $$

Adding a constant to a matrix causes its eigenvalues to increase by exactly that constant.

## References and visualizations

Mixture of Categoricals and Latent Dirichlet Allocation (LDA)

Git Command Overview with Useful Flags and Aliases

Linear Algebra