Important matrices
We introduce notation for some commonly used and important matrices.
The \(n \times n\) identity matrix is
\[
\mathbf{I} =
\begin{bmatrix}
1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 1 \\
\end{bmatrix}.
\]
For every \(\mathbf{A}\in\mathbb{R}^{n\times n}\), we have \(\mathbf{AI} = \mathbf{IA} = \mathbf{A}\).
In particular, the identity matrix leaves any vector unchanged. For example, \[
\begin{bmatrix}1 & 0 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix}3\\ -2 \end{bmatrix} = \begin{bmatrix}3\\ -2 \end{bmatrix}.
\] Because of this property we sometimes call \(I\) the "do-nothing" transform.
The inverse of a matrix \(A\in\mathbb{R}^{n\times n}\) is defined as the matrix \(A^{-1}\in\mathbb{R}^{n\times n}\) for which \[AA^{-1} = A^{-1}A = I.\]
When \(A^{-1}\) exists, the matrix \(A\) is said to be invertible.
Note that \((AB)^{-1} = B^{-1}A^{-1}\) for invertible \(A, B\in\mathbb{R}^{n\times n}\).
For example, the \(2\times 2\) matrix \[
A = \begin{bmatrix}1 & 2\\ 3 & 4\end{bmatrix}
\] has inverse \[
A^{-1} = \begin{bmatrix}-2 & 1\\ 1.5 & -0.5\end{bmatrix},
\] and one can check that \(AA^{-1} = I\) and \(A^{-1}A = I\).
Class activity (1–2 minutes): Show that \(AA^{-1} = I\) and \(A^{-1}A = I\).
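To check this numerically, here is a minimal sketch using NumPy (assumed to be installed): `np.linalg.inv` computes the inverse and `np.allclose` compares the products against the identity.

```python
import numpy as np

# The example matrix and its inverse
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
A_inv = np.linalg.inv(A)   # expected: [[-2, 1], [1.5, -0.5]]

# Both products should give the 2x2 identity matrix
print(np.allclose(A @ A_inv, np.eye(2)))  # True
print(np.allclose(A_inv @ A, np.eye(2)))  # True
```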
A diagonal matrix \(D\in\mathbb{R}^{n\times n}\) has entries \(d_{ij}=0\) if \(i\neq j\), i.e.,
\[
D =
\begin{bmatrix}
d_{11} & 0 & \cdots & 0 \\
0 & d_{22} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & d_{nn} \\
\end{bmatrix}.
\]
A square matrix \(Q\in\mathbb{R}^{n\times n}\) is orthogonal if
\[QQ^{T}=Q^{T}Q=I.\]
In particular, the inverse of an orthogonal matrix is its transpose.
Note: Think about this in terms of the row-by-column products in matrix multiplication: each entry of \(Q^TQ\) is a dot product of two columns of \(Q\), which are orthonormal vectors.
A familiar example of an orthogonal matrix is the \(2\times 2\) rotation matrix \[
R(\theta) = \begin{bmatrix} \cos \theta & -\sin \theta\\ \sin \theta & \cos \theta\end{bmatrix}.
\] For \(\theta=45^\circ\) this becomes \(\frac{\sqrt{2}}{2}\begin{bmatrix}1 & -1\\ 1 & 1\end{bmatrix}\), and one can verify that \(R(\theta)R(\theta)^T=I\).
Orthogonal matrices preserve lengths and angles, which is why they model rigid rotations and reflections.
Class activity (1–2 minutes): Take the \(2\times 2\) matrix \(\begin{bmatrix}0 & -1\\ 1 & 0\end{bmatrix}\) and check that it is orthogonal. What geometric transformation does it represent?
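As a quick numerical check (a minimal sketch assuming NumPy is available), we can build \(R(45^\circ)\) and confirm that it is orthogonal and preserves lengths.

```python
import numpy as np

theta = np.pi / 4  # 45 degrees in radians
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# Orthogonality: R R^T should be the identity
print(np.allclose(R @ R.T, np.eye(2)))  # True

# Lengths are preserved: ||R v|| equals ||v||
v = np.array([3.0, -2.0])
print(np.linalg.norm(R @ v), np.linalg.norm(v))  # both equal sqrt(13)
```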
A lower triangular matrix \(L\in\mathbb{R}^{n\times n}\) is a matrix where all the entries above the main diagonal are zero
\[
L =
\begin{bmatrix}
l_{11} & 0 & \cdots & 0 \\
l_{21} & l_{22} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
l_{n1} & l_{n2} & \cdots & l_{nn} \\
\end{bmatrix}.
\]
Solving a linear system \(L\mathbf{x}=\mathbf{b}\) with \(L\) lower triangular can be done efficiently by forward substitution.
For example, let \[
L = \begin{bmatrix}1 & 0\\ 2 & 3\end{bmatrix},\quad \mathbf{b} = \begin{bmatrix}1\\4\end{bmatrix}.
\] Then the system \(L\mathbf{x}=\mathbf{b}\) yields \(x_1=1\) from the first equation and \(2x_1+3x_2=4\) from the second, giving \(x_2=\tfrac{2}{3}\).
Class activity (1–2 minutes): Solve the lower triangular system \(\begin{bmatrix}1 & 0\\ -1 & 2\end{bmatrix}\mathbf{x} = \begin{bmatrix}3\\1\end{bmatrix}\) by forward substitution.
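Forward substitution is easy to write as a short function. Below is a minimal sketch (the helper `forward_substitution` is our own, not a library routine; `scipy.linalg.solve_triangular` offers the same functionality).

```python
import numpy as np

def forward_substitution(L, b):
    """Solve L x = b for lower triangular L by forward substitution."""
    n = L.shape[0]
    x = np.zeros(n)
    for i in range(n):
        # Subtract the contributions of the already-computed unknowns,
        # then divide by the diagonal entry.
        x[i] = (b[i] - L[i, :i] @ x[:i]) / L[i, i]
    return x

L = np.array([[1.0, 0.0],
              [2.0, 3.0]])
b = np.array([1.0, 4.0])
print(forward_substitution(L, b))  # approximately [1.  0.6667], i.e. x2 = 2/3
```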
An upper triangular matrix \(U\in\mathbb{R}^{n\times n}\) is a matrix where all the entries below the main diagonal are zero
\[
U =
\begin{bmatrix}
u_{11} & u_{12} & \cdots & u_{1n} \\
0 & u_{22} & \cdots & u_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & u_{nn} \\
\end{bmatrix}.
\]
Upper triangular systems are solved by backward substitution. For example, if \[
U = \begin{bmatrix}4 & 1\\ 0 & 5\end{bmatrix},\quad \mathbf{b} = \begin{bmatrix}9\\10\end{bmatrix},
\] the equation \(U\mathbf{x}=\mathbf{b}\) implies \(5x_2=10\) so \(x_2=2\), and then \(4x_1 + x_2 =9\) gives \(x_1 = \tfrac{7}{4}\).
Class activity (1–2 minutes): Solve the upper triangular system \(\begin{bmatrix}2 & 3\\ 0 & -1\end{bmatrix}\mathbf{x} = \begin{bmatrix}7\\ -2\end{bmatrix}\) by backward substitution.
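Backward substitution is the mirror image: solve the last equation first and work upward. A minimal sketch with a hypothetical helper `backward_substitution`:

```python
import numpy as np

def backward_substitution(U, b):
    """Solve U x = b for upper triangular U by backward substitution."""
    n = U.shape[0]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        # Subtract the contributions of the already-computed unknowns,
        # then divide by the diagonal entry.
        x[i] = (b[i] - U[i, i+1:] @ x[i+1:]) / U[i, i]
    return x

U = np.array([[4.0, 1.0],
              [0.0, 5.0]])
b = np.array([9.0, 10.0])
print(backward_substitution(U, b))  # [1.75  2.  ]
```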
The inverse of an invertible lower triangular matrix is itself lower triangular, and likewise the inverse of an invertible upper triangular matrix is upper triangular.
Eigenvalues and eigenvectors
An eigenvector of an \(n\times n\) matrix \(\mathbf{A}\) is a nonzero vector \(\mathbf{x}\) such that
\[
A\mathbf{x} = \lambda\mathbf{x}
\]
for some scalar \(\lambda.\)
The scalar \(\lambda\) is called an eigenvalue.
An \(n \times n\) matrix has at most \(n\) distinct eigenvalues and at most \(n\) linearly independent eigenvectors.
For example, consider the matrix \[
A = \begin{bmatrix}2 & 1\\1 & 2\end{bmatrix}.
\] It has eigenvalues \(\lambda_1=3\) and \(\lambda_2=1\) with corresponding eigenvectors \(\mathbf{v}_1=[1,1]^T\) and \(\mathbf{v}_2=[1,-1]^T\).
Eigenvectors reveal the directions that the matrix stretches or compresses without rotating them.
Class activity (1–2 minutes): Verify that \(A\mathbf{v}_1 = 3\mathbf{v}_1\) and \(A\mathbf{v}_2 = 1\mathbf{v}_2\).
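To verify this with NumPy (a minimal sketch), `np.linalg.eig` returns the eigenvalues and a matrix whose columns are the corresponding eigenvectors; the returned eigenvectors are scaled to unit length, so they may differ from \([1,1]^T\) and \([1,-1]^T\) by a constant factor.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

eigvals, eigvecs = np.linalg.eig(A)
print(eigvals)   # [3. 1.]
print(eigvecs)   # columns are unit-length eigenvectors

# Verify A v = lambda v for each eigenpair
for lam, v in zip(eigvals, eigvecs.T):
    print(np.allclose(A @ v, lam * v))  # True, True
```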
Matrix decompositions
Here we introduce several important matrix decompositions. They are useful for solving linear equations and play an important role in many data science applications.
LU factorization
An LU decomposition of a square matrix \(A\in\mathbb{R}^{n\times n}\) is a factorization of \(A\) into a product of matrices
\[
A = LU,
\]
where \(L\) is a lower triangular square matrix and \(U\) is an upper triangular square matrix. For example, when \(n=3\), we have
\[
\begin{bmatrix}
a_{11} & a_{12} & a_{13} \\
a_{21} & a_{22} & a_{23} \\
a_{31} & a_{32} & a_{33} \\
\end{bmatrix}
=
\begin{bmatrix}
l_{11} & 0 & 0 \\
l_{21} & l_{22} & 0\\
l_{31} & l_{32} & l_{33} \\
\end{bmatrix}
\begin{bmatrix}
u_{11} & u_{12} & u_{13} \\
0 & u_{22} & u_{23} \\
0 & 0 & u_{33} \\
\end{bmatrix}
\]
The LU decomposition simplifies the process of solving linear equations and is more numerically stable than computing the inverse of \(A\).
Instead of computing \(A^{-1}\), we can solve the linear system \(A\mathbf{x}=\mathbf{b}\) in two steps:
- Forward substitution: First, solve the equation \(L\mathbf{y} = \mathbf{b}\) for \(\mathbf{y}\).
- Backward substitution: Then, solve \(U\mathbf{x} = \mathbf{y}\) for \(\mathbf{x}\).
This method is particularly useful because triangular matrices are much easier to work with. It can make solving linear systems faster and more efficient, especially for large matrices.
For example, consider the matrix \[
A = \begin{bmatrix}4 & 3\\6 & 3\end{bmatrix}.
\] One possible LU factorization is \[
A = LU = \begin{bmatrix}1 & 0\\ 1.5 & 1\end{bmatrix}\begin{bmatrix}4 & 3\\ 0 & -1.5\end{bmatrix}.
\] Once we have \(L\) and \(U\) we can solve \(A\mathbf{x}=\mathbf{b}\) by first solving \(L\mathbf{y}=\mathbf{b}\) and then \(U\mathbf{x}=\mathbf{y}\).
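In code, one way to use an LU factorization is through SciPy (assumed available): `scipy.linalg.lu_factor` computes the factorization (with row pivoting, so the factors may differ from the hand example) and `scipy.linalg.lu_solve` performs the forward and backward substitutions.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

A = np.array([[4.0, 3.0],
              [6.0, 3.0]])
b = np.array([1.0, 2.0])  # an arbitrary right-hand side for illustration

# Factor once (stores L, U, and the pivoting information) ...
lu, piv = lu_factor(A)

# ... then solve via forward and backward substitution
x = lu_solve((lu, piv), b)
print(x)
print(np.allclose(A @ x, b))  # True
```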
QR decomposition
A QR decomposition of a square matrix \(A\in\mathbb{R}^{n\times n}\) is a factorization of \(A\) into a product of matrices
\[
A=QR,
\] where \(Q\) is an orthogonal square matrix and \(R\) is an upper-triangular square matrix.
QR factorization is useful in solving linear systems and performing least squares fitting.
This factorization has a couple of important benefits:
Solving Linear Systems: When you’re working with a system of equations represented by \(A\mathbf{x} = \mathbf{b}\), you can substitute the QR factorization into this equation:
\[
QR\mathbf{x} = \mathbf{b}
\]
Since \(Q\) is orthogonal, you can multiply both sides on the left by \(Q^T\) (the transpose of \(Q\)), and \(Q^TQ=I\) simplifies the system to:
\[
R\mathbf{x} = Q^T\mathbf{b}
\]
Now, you can solve this upper triangular system for \(\mathbf{x}\) using backward substitution, which is typically easier and more stable.
Least Squares Problems: In many data science applications, you want to find the best fit line or hyperplane for your data. QR factorization is particularly useful here because it helps in minimizing the error when the system is overdetermined (more equations than unknowns). You can solve the least squares problem by leveraging the QR decomposition to find:
\[
\hat{\mathbf{x}} = R^{-1}Q^T\mathbf{b}
\]
By converting the problem into a triangular system, QR factorization often provides a more stable numerical solution than other methods, especially for poorly conditioned matrices.
As a simple example, consider the \(2\times 2\) matrix \[
A = \begin{bmatrix}1 & 1\\1 & -1\end{bmatrix}.
\] Using the Gram–Schmidt process we can factor it as \[
A = QR \quad\text{with}\quad Q = \tfrac{1}{\sqrt{2}}\begin{bmatrix}1 & 1\\1 & -1\end{bmatrix},\quad R = \sqrt{2}\begin{bmatrix}1 & 0\\ 0 & 1\end{bmatrix}.
\] Here \(Q\) has orthonormal columns and \(R\) is upper triangular.
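Numerically, `np.linalg.qr` computes such a factorization (the signs of the columns may differ from the hand computation). The sketch below factors the example matrix and then solves a system \(A\mathbf{x}=\mathbf{b}\) for an illustrative right-hand side via \(R\mathbf{x}=Q^T\mathbf{b}\).

```python
import numpy as np

A = np.array([[1.0,  1.0],
              [1.0, -1.0]])
b = np.array([2.0, 0.0])  # an arbitrary right-hand side for illustration

Q, R = np.linalg.qr(A)
print(np.allclose(Q @ R, A))             # True: Q R reproduces A
print(np.allclose(Q.T @ Q, np.eye(2)))   # True: Q is orthogonal

# Solve A x = b via the upper triangular system R x = Q^T b
x = np.linalg.solve(R, Q.T @ b)
print(np.allclose(A @ x, b))  # True
```

The same recipe handles an overdetermined least-squares problem when \(A\) is tall and has full column rank, since `np.linalg.qr` then returns the reduced factorization.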
Eigendecomposition
Let \(A\in\mathbb{R}^{n\times n}\) have \(n\) linearly independent eigenvectors \(\mathbf{x}_i\), \(i=1,\ldots, n\). Then \(A\) can be factorized as
\[
A = X\Lambda X^{-1},
\]
where the columns of matrix \(X\) are the eigenvectors of \(A\), and
\[
\Lambda =
\begin{bmatrix}
\lambda_{1} & 0 & \cdots & 0 \\
0 & \lambda_{2} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \lambda_{n} \\
\end{bmatrix},
\]
is a diagonal matrix of the eigenvalues.
In this case, the matrix \(A\) is said to be diagonalizable.
Instead of calculating \(A\mathbf{x}\) directly, we can map \(\mathbf{x}\) by first transforming it with \(X^{-1}\), then stretching it by \(\Lambda\), and finally transforming it back with \(X\):
\[
A \mathbf{x} = X \Lambda X^{-1} \mathbf{x}
\]
We’ll use eigendecomposition in Principal Component Analysis (PCA) to reduce dimensionality in datasets while preserving as much variance as possible.
A special case occurs when \(A\) is symmetric. Recall that a matrix is symmetric when \(A = A^T.\)
In this case, it can be shown that the eigenvectors of \(A\) can be chosen to be mutually orthonormal. Consequently, \(X^{-1} = X^{T}\) and we can decompose \(A\) as:
\[A = X\Lambda X^T.\]
This is known as the spectral theorem, and this decomposition of \(A\) is its spectral decomposition. The set of eigenvalues of a matrix is also called its spectrum.
In the earlier example \(A=\begin{bmatrix}2&1\\1&2\end{bmatrix}\), which is symmetric, the eigendecomposition can be written in orthonormal form as \(A = X \Lambda X^T\) with \[
X = \tfrac{1}{\sqrt{2}}\begin{bmatrix}1 & 1\\ 1 & -1\end{bmatrix},\quad \Lambda = \mathrm{diag}(3,1).
\] Because the columns of \(X\) are orthonormal, the factorization \(A = X\Lambda X^T\) is often easier to use numerically and conceptually.
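A numerical sketch (assuming NumPy): `np.linalg.eigh` is specialized for symmetric matrices and returns orthonormal eigenvectors, so we can rebuild \(A\) as \(X\Lambda X^T\); note that it lists the eigenvalues in ascending order, so the columns appear in the opposite order from above.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eigh is designed for symmetric (Hermitian) matrices
eigvals, X = np.linalg.eigh(A)   # eigenvalues in ascending order: [1. 3.]
Lam = np.diag(eigvals)

# The columns of X are orthonormal eigenvectors, so X^{-1} = X^T
print(np.allclose(X.T @ X, np.eye(2)))   # True
print(np.allclose(X @ Lam @ X.T, A))     # True: spectral decomposition
```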
Singular value decomposition
For the previous few decompositions, we required the matrix \(A\) to be square. Now let \(A\in\mathbb{R}^{m\times n}\) with \(m>n\). Then \(A\) admits a decomposition
\[
A = U\Sigma V^{T}.
\] The matrices \(U\in\mathbb{R}^{m\times m}\) and \(V\in\mathbb{R}^{n\times n}\) are orthogonal. The columns of \(U\) are the left singular vectors and the columns of \(V\) are the right singular vectors.
The matrix \(\Sigma\in\mathbb{R}^{m\times n}\) is a diagonal matrix of the form
\[
\Sigma =
\begin{bmatrix}
\sigma_{11} & 0 & \cdots & 0 \\
0 & \sigma_{22} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \sigma_{nn} \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 0 \\
\end{bmatrix},
\]
where the rows below the \(n\)-th are all zero because \(m>n\).
The values \(\sigma_{ii}\) are the singular values of the matrix \(A\).
Amazingly, it can be proven that every matrix \(A\in\mathbb{R}^{m\times n}\) has a singular value decomposition.
For example, let \[
A = \begin{bmatrix}3 & 1\\ 1 & 3\end{bmatrix}.
\] This matrix has singular value decomposition \(A = U\Sigma V^T\) where the singular values are \(\sigma_1=4\) and \(\sigma_2=2\). The first singular vector (in \(U\)) points along the direction \([1,1]^T\) and corresponds to the dominant variation in the matrix. In data analysis we often keep only a few largest singular values to approximate a matrix with lower rank—a technique used in latent semantic analysis and image compression.
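A minimal NumPy sketch: `np.linalg.svd` returns \(U\), the singular values as a 1-D array, and \(V^T\), and keeping only the largest singular value gives the best rank-1 approximation.

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 3.0]])

U, s, Vt = np.linalg.svd(A)
print(s)  # [4. 2.]

# Reconstruct A from its SVD: U diag(s) V^T
print(np.allclose(U @ np.diag(s) @ Vt, A))  # True

# Rank-1 approximation using only the largest singular value
A1 = s[0] * np.outer(U[:, 0], Vt[0, :])
print(A1)  # approximately [[2, 2], [2, 2]]
```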
Class activity (1–2 minutes): Use NumPy (or by hand if you prefer) to compute the singular values of the matrix \(\begin{bmatrix}2 & 0\\ 0 & 1\end{bmatrix}\). Discuss how the singular values relate to how the matrix scales different directions.