Compared to the previous article, where we simply used vector derivatives, we'll now try to derive the formula for least squares using only the properties of linear transformations and the four fundamental subspaces of linear algebra. These are:
- Kernel $\mathrm{Ker}(A)$: The set of all solutions to $Ax = 0$. Sometimes we say nullspace instead of kernel.
- Image $\mathrm{Im}(A)$: The set of all right sides $b$ for which there is a solution to $Ax = b$. We'll show that this is equal to the column space $C(A)$, which is the span of the column vectors in $A$.
- Row space $R(A)$: Span of the row vectors in $A$, sometimes also referred to as $\mathrm{Im}(A^T)$ (the image of $A^T$). We can also refer to this as $C(A^T)$, because since $\mathrm{Im}(A) = C(A)$, then $\mathrm{Im}(A^T) = C(A^T)$.
- Left kernel (or left nullspace) $\mathrm{Ker}(A^T)$: The set of all solutions to $A^T x = 0$. The name comes from left multiplying $A$ by $x^T$, specifically the set of solutions to $x^T A = 0^T$.
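To make the four subspaces concrete, here is a minimal NumPy sketch that reads orthonormal bases for all of them off the singular value decomposition. The SVD is not part of this article's derivation; it is used here only as a convenient numerical tool. The matrix `A` and the tolerance `1e-10` are hypothetical choices for illustration.

```python
import numpy as np

# Hypothetical rank-1 example (3 x 2), chosen so that all four subspaces are nontrivial.
A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])
m, n = A.shape

# Full SVD: A = U @ diag(s) @ Vt, with U (m x m) and Vt (n x n) orthogonal.
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# Numerical rank: number of singular values above an (assumed) tolerance.
r = int(np.sum(s > 1e-10))

image = U[:, :r]        # orthonormal basis of Im(A) = C(A)
left_kernel = U[:, r:]  # orthonormal basis of Ker(A^T)
row_space = Vt[:r].T    # orthonormal basis of Im(A^T) = C(A^T)
kernel = Vt[r:].T       # orthonormal basis of Ker(A)

# Dimensions: dim Im(A) + dim Ker(A^T) = m and dim Im(A^T) + dim Ker(A) = n.
print(image.shape[1], left_kernel.shape[1], row_space.shape[1], kernel.shape[1])

# Defining properties of the two kernels.
print(np.allclose(A @ kernel, 0.0))         # A x = 0 for every kernel vector
print(np.allclose(A.T @ left_kernel, 0.0))  # A^T y = 0 for every left-kernel vector

# The subspaces come in orthogonal pairs.
print(np.allclose(row_space.T @ kernel, 0.0))
print(np.allclose(image.T @ left_kernel, 0.0))
```

For this particular `A`, the image and the row space are each one-dimensional, the kernel is one-dimensional, and the left kernel is two-dimensional.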
For this derivation we assume that $\mathrm{Ker}(A) \perp \mathrm{Im}(A^T)$ and $\mathrm{Im}(A) \perp \mathrm{Ker}(A^T)$.
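If $A$ is not invertible (it could be rectangular), there is in general no exact solution to $Ax = b$, because $b$ can have a component in $\mathrm{Ker}(A^T)$, which is outside the range of $A$ (literally). We can define $b = b_i + b_k$, where $b_i$ is the orthogonal projection of $b$ onto $\mathrm{Im}(A)$, and $b_k$ is the orthogonal projection of $b$ onto $\mathrm{Ker}(A^T)$. In other words, $b_i \perp b_k$.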
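As a small numerical illustration of splitting $b$ into a part in $\mathrm{Im}(A)$ and a part in $\mathrm{Ker}(A^T)$, the sketch below projects a vector onto the column space using an orthonormal basis from a QR factorization. QR is just one convenient way to get such a basis and is an assumption of this sketch, as are the example values of `A` and `b`.

```python
import numpy as np

# Hypothetical example: three equations, two unknowns, no exact solution.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([6.0, 0.0, 0.0])

# Reduced QR: the columns of Q form an orthonormal basis of Im(A).
# This relies on A having linearly independent columns.
Q, _ = np.linalg.qr(A)

b_i = Q @ (Q.T @ b)  # orthogonal projection of b onto Im(A)
b_k = b - b_i        # what is left over, which lies in Ker(A^T)

print(b_i)
print(b_k)
print(np.allclose(A.T @ b_k, 0.0))  # b_k is in the left kernel of A
print(np.isclose(b_i @ b_k, 0.0))   # b_i and b_k are orthogonal
```

For these values the two components work out to $b_i = (5, 2, -1)$ and $b_k = (1, -2, 1)$, which can be checked by hand: they sum to $b$ and their dot product is zero.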
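This decomposition is valid, because we assume $\mathrm{Im}(A) \perp \mathrm{Ker}(A^T)$, and that $\mathrm{Im}(A) \oplus \mathrm{Ker}(A^T) = \mathbb{R}^m$ (where $A$ is an $m \times n$ matrix), in other words that $\mathrm{Im}(A)$ and $\mathrm{Ker}(A^T)$ together generate the whole codomain of our linear mapping $A$. Since $b_i \in \mathrm{Im}(A)$, the system $Ax = b_i$ does have a solution, and since $b_k \in \mathrm{Ker}(A^T)$, we have $A^T b_k = 0$. Now just using basic algebra:

$$
\begin{aligned}
Ax &= b_i \\
A^T A x &= A^T b_i \\
A^T A x &= A^T b_i + A^T b_k = A^T (b_i + b_k) = A^T b \\
x &= (A^T A)^{-1} A^T b
\end{aligned}
$$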
Here we used the fact that $A^TA$ is always a symmetric positive semi-definite matrix, and in case $A$ has linearly independent columns, it is actually positive-definite, which means it is also invertible. This is actually easy to show.
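Before proving this, it may help to check the resulting formula $x = (A^TA)^{-1}A^Tb$ numerically. The sketch below solves the normal equations $A^TAx = A^Tb$ for a small hypothetical example and compares the result with NumPy's built-in least squares routine. It solves the linear system rather than forming the inverse explicitly, which is a common numerical choice and not something the derivation itself requires.

```python
import numpy as np

# Hypothetical example with linearly independent columns, so A^T A is invertible.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([6.0, 0.0, 0.0])

# Normal equations: (A^T A) x = A^T b.
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# Reference solution from NumPy's least squares solver.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

print(x_normal)
print(np.allclose(x_normal, x_lstsq))

# The residual b - A x is orthogonal to every column of A.
residual = b - A @ x_normal
print(np.allclose(A.T @ residual, 0.0))
```

Here $A^TA = \begin{pmatrix} 3 & 3 \\ 3 & 5 \end{pmatrix}$ and $A^Tb = (6, 0)$, so the solution is $x = (5, -3)$.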
First we show that $A^TA$ is symmetric. This is easy to see, because $(A^TA)_{ij}$ is just the dot product of the $i$-th row of $A^T$ with the $j$-th column of $A$. Note that the $i$-th row of $A^T$ is actually the $i$-th column of $A$. From this we see that $(A^TA)_{ij} = (A^TA)_{ji}$, because the dot product is symmetric.
Now we show that $A^TA$ is positive semi-definite. For an arbitrary square matrix $M$, we say that $M$ is positive semi-definite if and only if $x^T M x \geq 0$ for all $x$. We can directly substitute $M = A^TA$ and use the same trick as below:

$$
x^T A^T A x = (Ax)^T (Ax) = \|Ax\|^2 \geq 0
$$

Since $A^TA$ satisfies the definition directly, it is positive semi-definite.
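Both properties are easy to check numerically. The following sketch builds $A^TA$ from a random matrix (the shape and the seed are arbitrary assumptions), then verifies symmetry, non-negative eigenvalues, and the identity $x^TA^TAx = \|Ax\|^2$ for one random vector.

```python
import numpy as np

rng = np.random.default_rng(0)      # fixed seed, chosen arbitrarily
A = rng.standard_normal((5, 3))     # hypothetical 5 x 3 matrix
G = A.T @ A                         # the 3 x 3 matrix A^T A

# Symmetry: G equals its transpose.
is_symmetric = np.allclose(G, G.T)

# Positive semi-definiteness: all eigenvalues of a symmetric PSD matrix are >= 0.
# eigvalsh is the eigenvalue routine for symmetric matrices.
eigenvalues = np.linalg.eigvalsh(G)

# The quadratic form equals the squared norm of A x.
x = rng.standard_normal(3)
quad = x @ G @ x
norm_sq = np.linalg.norm(A @ x) ** 2

print(is_symmetric)
print(eigenvalues.min() >= -1e-10)  # small tolerance for floating-point round-off
print(np.isclose(quad, norm_sq))
```

A numerical check like this is of course not a proof; it only confirms the argument above on one example.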
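There is also another very nice way to show that $A^TA$ is invertible, without showing that it is positive semi-definite. The idea is to show that $\mathrm{Ker}(A^TA) = \mathrm{Ker}(A)$ by proving both inclusions.
Starting with $\mathrm{Ker}(A) \subseteq \mathrm{Ker}(A^TA)$, this follows immediately from $Ax = 0 \Rightarrow A^TAx = 0$.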
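Next $\mathrm{Ker}(A^TA) \subseteq \mathrm{Ker}(A)$: take $A^TAx = 0$, left multiply by $x^T$ and we get:

$$
x^T A^T A x = 0 \;\Rightarrow\; (Ax)^T (Ax) = 0 \;\Rightarrow\; \|Ax\|^2 = 0
$$

Since the norm is zero only if the vector is zero, we get that for any vector $x$ with $A^TAx = 0$ it is also true that $Ax = 0$, which means $\mathrm{Ker}(A^TA) \subseteq \mathrm{Ker}(A)$, and hence $\mathrm{Ker}(A^TA) = \mathrm{Ker}(A)$.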
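Because $\mathrm{Ker}(A^TA) = \mathrm{Ker}(A)$ and both matrices have the same number of columns, the rank-nullity theorem tells us that $\mathrm{rank}(A^TA) = \mathrm{rank}(A)$. This means that if $A$ has linearly independent columns, $A^TA$ is invertible, because it has full rank (this is because $A^TA$ is square and has the same number of rows/columns as $A$ has columns).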
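The rank argument can be illustrated with two small hypothetical matrices, one with independent columns and one without. The sketch below compares the ranks of $A$ and $A^TA$ in both cases; the explicit tolerance passed to `matrix_rank` is an assumption made to keep the rank-deficient case robust against round-off.

```python
import numpy as np

# Hypothetical examples: independent columns vs. a repeated direction.
A_full = np.array([[1.0, 0.0],
                   [1.0, 1.0],
                   [1.0, 2.0]])
A_deficient = np.array([[1.0, 2.0],
                        [2.0, 4.0],
                        [3.0, 6.0]])  # second column is twice the first

ranks = {}
for name, A in [("full", A_full), ("deficient", A_deficient)]:
    G = A.T @ A
    rank_A = int(np.linalg.matrix_rank(A, tol=1e-10))
    rank_G = int(np.linalg.matrix_rank(G, tol=1e-10))
    ranks[name] = (rank_A, rank_G)
    # A^T A is invertible exactly when its rank equals its size (the number of columns of A).
    invertible = rank_G == G.shape[0]
    print(name, rank_A, rank_G, invertible)
```

In the first case both ranks are 2 and $A^TA$ is invertible. In the second both ranks are 1, so $A^TA$ is singular and the formula $x = (A^TA)^{-1}A^Tb$ cannot be applied as written.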