Linear Algebra Refresher

Introduction

Introduction to linear algebra

Linear algebra is the branch of mathematics concerned with vectors and matrices, and in particular with how vectors are transformed. Knowledge of linear algebra is essential in data science.

Linear algebra

Linear algebra allows us to understand the operations and transformations used to manipulate and extract information from data.

Examples

  • Deep neural networks use matrix-vector and matrix-matrix multiplication.
  • Natural language processing uses the dot product to determine word similarity.
  • Least squares uses matrix inverses and matrix factorizations to compute models for predicting continuous values.
  • PCA (dimensionality reduction) uses the matrix factorization called the Singular Value Decomposition (SVD).
  • Graphs are described by adjacency matrices. Eigenvectors and eigenvalues of this matrix provide information about the graph structure.
  • Recommendation systems represent the interactions between users and items as a large matrix. Linear algebra is then used to compute similarities and predict missing ratings.

Linear algebra is used to implement data science algorithms efficiently and accurately.

You will not have to program linear algebra algorithms. You will use appropriate Python packages.

The goal of this lecture is to refresh the following topics:

  • vectors,
  • matrices,
  • operations with vectors and matrices,
  • eigenvectors and eigenvalues,
  • linear systems and least squares,
  • matrix factorizations.

Below is a list of very useful resources for learning about linear algebra:

  • Linear Algebra and Its Applications (6th edition), David C. Lay, Judi J. McDonald, and Steven R. Lay, Pearson, 2021,
  • Introduction to Linear Algebra (6th edition), Gilbert Strang, Wellesley-Cambridge Press, 2023,
  • Linear Algebra and Learning from Data, Gilbert Strang, Wellesley-Cambridge Press, 2019,
  • Numerical Linear Algebra, Lloyd N. Trefethen and David Bau, SIAM, 1997.

Vectors and Vector Operations

Vectors

A vector of length \(n\), \(\mathbf{x}\in\mathbb{R}^{n}\), is a 1-dimensional (1-D) array of real numbers

\[ \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}. \]

When discussing vectors we will only consider column vectors. A row vector can always be obtained from a column vector via transposition

\[ \mathbf{x}^{T} = [x_1, x_2, \ldots, x_n]. \]
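
A practical note for later: a 1-D NumPy array does not distinguish between row and column vectors. When the orientation matters, one common convention (a minimal sketch, not the only option) is to store an explicit \(n\times 1\) array:

Code
import numpy as np

# A 1-D NumPy array has no row/column orientation.
x = np.array([1, 2, 3])
print(x.shape)        # (3,)

# An explicit column vector is an n-by-1 2-D array;
# transposing it gives the corresponding 1-by-n row vector.
x_col = x.reshape(-1, 1)
x_row = x_col.T
print(x_col.shape, x_row.shape)   # (3, 1) (1, 3)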

Examples of Vectors

In practice we use vectors to encode measurements or features.

  • a housing dataset with two features: \(\mathbf{x}=[\text{bedrooms},\text{area}]^T\) where the first component counts bedrooms and the second measures square footage.

  • an image patch can be represented by a long vector of pixel intensities.

Class activity (1–2 minutes): With a partner, list three different kinds of data that you can naturally represent as vectors. Share your examples with the class.

Examples: Geographic coordinates (Lat, Long, Elevation), RGB color intensities, term frequencies, sensor readings…

Geometric interpretation of vectors

  • Vectors in \(\mathbb{R}^{2}\) can be visualized as points in a 2-D plane (or arrows originating at the origin),
  • Vectors in \(\mathbb{R}^{3}\) can be visualized as points in a 3-D space (or arrows originating at the origin).
  • Although we can’t visualize vectors in \(\mathbb{R}^{n}\) for \(n>3\), we can still think of them as points in an \(n\)-dimensional space, where the curse of dimensionality comes into play (more on this later).

Let

\[ \mathbf{x}=\begin{bmatrix} 2 \\ 2 \end{bmatrix},~ \mathbf{y} = \begin{bmatrix} 3 \\ -1 \end{bmatrix},~ \mathbf{z} = \begin{bmatrix} -2 \\ -1 \end{bmatrix}. \]

These vectors are illustrated in Figure 1.

Code
import matplotlib.pyplot as plt
import numpy as np
fig = plt.figure()
ax = plt.gca()
# Define the three example vectors
x = np.array([2, 2])
y = np.array([3, -1])
z = np.array([-2, -1])
V = np.array([x, y, z])
# Each arrow starts at the origin
origin = np.array([[0, 0, 0], [0, 0, 0]])
plt.quiver(*origin, V[:, 0], V[:, 1], 
           color=['r', 'b', 'g'], 
           angles='xy', 
           scale_units='xy', 
           scale=1)
ax.set_xlim([-6, 6])
ax.set_ylim([-2, 4])
ax.text(3.3, -1.1, '$(3,-1)$', size=16)
ax.text(2.3, 1.9, '$(2,2)$', size=16)
ax.text(-3.7, -1.3, '$(-2,-1)$', size=16)
ax.grid()
plt.show()
Figure 1: Illustration of vectors

Vector Operations

Scalar multiplication: Let \(c\in\mathbb{R}\), \(\mathbf{x}\in\mathbb{R}^{n}\), then \[ c\mathbf{x} = \begin{bmatrix} cx_1 \\ cx_2 \\ \vdots \\ cx_n \end{bmatrix}. \]


Multiplication by a scalar \(c\in\mathbb{R}\).

  • For \(c>1\) the vector is lengthened.
  • For \(0<c<1\) the vector shrinks.
  • For \(c<0\) the direction of the vector is reversed (flipped 180 degrees).

Figure 2 shows the vector \(\mathbf{x} = [2, 2]\) multiplied by the scalar value \(c=2\).

Code
import matplotlib.pyplot as plt
import numpy as np
fig = plt.figure()
ax = plt.gca()
x = np.array([2, 2])
y = np.array([4, 4])
V = np.array([x, y])
origin = np.array([[0, 0], [0, 0]])
plt.quiver(*origin, V[:, 0], V[:, 1], 
           color=['r', 'b'], 
           angles='xy', 
           scale_units='xy', 
           scale=1,
           alpha= 0.5)
ax.set_xlim([-5, 5])
ax.set_ylim([-1, 5])
ax.text(2.3, 1.9, '$x$', size=16)
ax.text(4.3, 3.9, '$cx$', size=16)
ax.grid()
plt.show()
Figure 2: Scalar multiplication of a vector

Example: Image processing

  • in image processing each pixel’s color is typically represented by a 3‑element RGB vector.
  • Multiplying the RGB vector by a scalar less than one darkens the pixel, while
  • multiplying by a scalar greater than one brightens it.

E.g. scaling the color \((0.2,0.5,0.7)\) by \(0.5\) yields \((0.1,0.25,0.35)\)

Scaling by \(2\) yields \((0.4,1.0,1.0)\) (clipping values to stay between \(0\) and \(1\)).


Vector addition

Let \(\mathbf{u},\mathbf{v}\in\mathbb{R}^{n}\) then

\[ \mathbf{u} + \mathbf{v} = \begin{bmatrix} u_1 + v_1 \\ u_2 + v_2 \\ \vdots \\ u_n + v_n \end{bmatrix}. \]


We plot the sum of \(\mathbf{u} = [1, 2]\) and \(\mathbf{v} = [4, 1]\) in Figure 3. The sum \(\mathbf{u} + \mathbf{v} = [5, 3]\) is obtained geometrically by placing the tail of one vector at the tip of the other.

Code
import matplotlib.pyplot as plt
import numpy as np
fig = plt.figure()
ax = plt.gca()
u = np.array([1, 2])
v = np.array([4, 1])
w = np.array([5, 3])
V = np.array([u, v, w])
origin = np.array([[0, 0, 0], [0, 0, 0]])
plt.quiver(*origin, V[:, 0], V[:, 1], 
           color=['b', 'b', 'r'], 
           angles='xy', 
           scale_units='xy', 
           scale=1)
ax.set_xlim([-1, 6])
ax.set_ylim([-1, 4])
ax.text(1.3, 1.9, '$u$', size=16)
ax.text(4.3, 1.2, '$v$', size=16)
ax.text(5.3, 2.9, '$u+v$', size=16)
plt.plot([1, 5], [2, 3], 'g--')
plt.plot([4, 5], [1, 3], 'g--')
ax.grid()
plt.show()
Figure 3: Vector addition

Vector addition occurs whenever you combine measurements componentwise.

For example,

  • if \(\mathbf{u}=[100,80]\) and \(\mathbf{v}=[20,50]\) record the number of items sold by two shops on Monday and Tuesday, then
  • their sum \(\mathbf{u}+\mathbf{v}=[120,130]\) gives the total items sold by both shops on those days.

Class activity (1–2 minutes): Add the vectors \(\mathbf{u}=[-1,4]\) and \(\mathbf{v}=[3,-2]\) by hand and sketch the result. Discuss how the parallelogram rule visualizes vector addition.


Dot product: Let \(\mathbf{u},\mathbf{v}\in\mathbb{R}^{n}\) then the dot product is defined as \[ \mathbf{u}\cdot\mathbf{v} = \sum_{i=1}^{n} u_i v_i. \]

Vector 2-norm: The 2-norm of a vector \(\mathbf{v}\in\mathbb{R}^{n}\) is defined as \[ \Vert \mathbf{v}\Vert_2 = \sqrt{\mathbf{v}\cdot\mathbf{v}} = \sqrt{\sum_{i=1}^n v_i^2}. \]

This norm is referred to as the \(\ell_2\) norm. In these notes, the notation \(\Vert \mathbf{v} \Vert\) indicates the 2-norm.


Dot products and norms arise constantly in data science.

Let’s look at another numerical example.

If \(\mathbf{u}=[1,2,0]\) and \(\mathbf{v}=[3,-1,4]\), then \[ \mathbf{u}\cdot\mathbf{v} = 1\cdot 3 + 2\cdot(-1) + 0\cdot 4 = 1. \]

  • norm of \(\mathbf{u}\) is \(\|\mathbf{u}\|_2 = \sqrt{1^2+2^2+0^2} = \sqrt{5}\) and the
  • norm of \(\mathbf{v}\) is \(\sqrt{3^2+(-1)^2+4^2}=\sqrt{26}\).
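
These values can be checked directly with NumPy:

Code
import numpy as np

u = np.array([1, 2, 0])
v = np.array([3, -1, 4])

print(np.dot(u, v))         # dot product: 1*3 + 2*(-1) + 0*4 = 1
print(np.linalg.norm(u))    # sqrt(5)
print(np.linalg.norm(v))    # sqrt(26)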

Definitions

Unit vector: A unit vector \(\mathbf{v}\) is a vector such that \(\Vert \mathbf{v} \Vert_2 = 1\). For any nonzero \(\mathbf{v}\), the vector \(\frac{\mathbf{v}}{\Vert \mathbf{v} \Vert_2}\) is a unit vector.

Distance: Let \(\mathbf{u},\mathbf{v}\in\mathbb{R}^{n}\), the distance between \(\mathbf{u}\) and \(\mathbf{v}\) is \[ \Vert \mathbf{u} - \mathbf{v} \Vert_2. \]

Orthogonality: Two vectors \(\mathbf{u},\mathbf{v}\in\mathbb{R}^{n}\) are orthogonal if and only if \(\mathbf{u}\cdot\mathbf{v}=0\).

Angle between vectors: Let \(\mathbf{u},\mathbf{v}\in\mathbb{R}^{n}\), the angle between these vectors is \[ \cos{\theta} = \frac{\mathbf{u}\cdot\mathbf{v}}{\Vert \mathbf{u}\Vert_2 \Vert\mathbf{v}\Vert_2}. \]


The dot product of two vectors \(\mathbf{u}, \mathbf{v}\) can be used to project \(\mathbf{u}\) onto \(\mathbf{v}\)

\[ \mathrm{proj}_{\mathbf{v}}\mathbf{u} = \frac{\mathbf{u}\cdot\mathbf{v}}{\Vert \mathbf{v} \Vert^2}\mathbf{v}. \]

This is illustrated in Figure 4.

Code
import matplotlib.pyplot as plt
import numpy as np

fig = plt.figure()
ax = plt.gca()

# Define vectors u and v
u = np.array([1, 2])
v = np.array([4, 1])

# Project u onto v
projection = np.dot(u, v) * v / np.dot(v, v)

V = np.array([u, v, projection])

origin = np.array([[0, 0, 0], [0, 0, 0]])
plt.quiver(*origin, V[:, 0], V[:, 1], color=['b', 'b', 'r'], angles='xy', scale_units='xy', scale=1)

ax.set_xlim([-1, 6])
ax.set_ylim([-1, 4])
ax.text(1.3, 1.9, r'$\mathbf{u}$', size=16)
ax.text(4.3, 1.2, r'$\mathbf{v}$', size=16)
ax.text(0.4, -0.3, r'$\mathrm{proj}_{\mathbf{v}}\mathbf{u}$', size=16)

plt.plot([u[0], projection[0]], [u[1], projection[1]], 'g--')

ax.grid()
plt.show()
Figure 4: Dot product

Observe in the figure that the dashed segment from \(\mathbf{u}\) to \(\mathrm{proj}_{\mathbf{v}}\mathbf{u}\) meets \(\mathbf{v}\) at a right angle. More generally, two vectors \(\mathbf{u}\) and \(\mathbf{v}\) form a right angle exactly when \(\mathbf{u}\cdot \mathbf{v} = 0\).

And we can calculate the angle between \(\mathbf{u}\) and \(\mathbf{v}\).

Code
# Define vectors u and v
u = np.array([1, 2])
v = np.array([4, 1])

theta = np.arccos(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))
print(f"The angle between u and v is {theta} radians or {np.degrees(theta)} degrees.")
The angle between u and v is 0.8621700546672264 radians or 49.398705354995535 degrees.

Such cosine similarities are used in natural language processing to quantify how similar two word‑embedding vectors are.
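
As a small illustration, the sketch below computes cosine similarities for a few made-up embedding vectors (the vectors king, queen, and apple are illustrative values only, not real embeddings):

Code
import numpy as np

def cosine_similarity(u, v):
    # cos(theta) = (u . v) / (||u|| ||v||)
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Toy "embedding" vectors -- illustrative values only
king  = np.array([0.8, 0.3, 0.1, 0.5])
queen = np.array([0.7, 0.4, 0.2, 0.5])
apple = np.array([0.1, 0.9, 0.7, 0.0])

print(cosine_similarity(king, queen))  # close to 1: similar
print(cosine_similarity(king, apple))  # smaller: less similar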

Class activity (1–2 minutes):

Compute the dot product of \(\mathbf{u}=[1,2,0]\) and \(\mathbf{v}=[3,-1,4]\).

Are the vectors orthogonal?

Next compute \(\|\mathbf{u}\|_2\), \(\|\mathbf{v}\|_2\) and the angle between them.

More Definitions

Linear dependence: A set of \(n\) vectors \(\mathbf{v}_{1}, \ldots, \mathbf{v}_{n}\in\mathbb{R}^n\) is linearly dependent if there exist scalars \(a_1,\ldots, a_n\), not all zero, such that \[ \sum_{i=1}^{n} a_i \mathbf{v}_i = \mathbf{0}. \]

Linear independence: The vectors \(\mathbf{v}_{1}, \ldots, \mathbf{v}_{n}\) are linearly independent if they are not linearly dependent, i.e., the equation \[ a_1 \mathbf{v}_1 + \cdots + a_n \mathbf{v}_n = 0, \]

is only satisfied if \(a_i=0\) for \(i=1, \ldots,n\).

Span: Given a set of vectors \(V = \{\mathbf{v}_{1}, \ldots, \mathbf{v}_{n}\}\), where \(\mathbf{v}_i\in\mathbb{R}^n\), the span of \(V\), written \(\mathrm{span}(V)\), is the set of all linear combinations of the vectors in \(V\).


Let’s look at an example.

For example, the vectors \(\mathbf{v}_1=(1,1,0)\) and \(\mathbf{v}_2=(2,2,0)\) are linearly dependent because \(\mathbf{v}_2=2\mathbf{v}_1\).

In contrast, the standard basis vectors \(e_1=(1,0,0)\), \(e_2=(0,1,0)\) and \(e_3=(0,0,1)\) in \(\mathbb{R}^3\) are linearly independent, and their span is the entire three‑dimensional space.

When working with data, linear independence tells us whether a set of features carries redundant information.
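
One practical way to test for linear (in)dependence is to stack the vectors as columns of a matrix and compute its rank; a short sketch for the two examples above:

Code
import numpy as np

# Stack vectors as columns; they are linearly independent exactly when
# the rank of the resulting matrix equals the number of vectors.
v1 = np.array([1, 1, 0])
v2 = np.array([2, 2, 0])
print(np.linalg.matrix_rank(np.column_stack([v1, v2])))  # 1 -> dependent

E = np.eye(3)  # columns are the standard basis e1, e2, e3
print(np.linalg.matrix_rank(E))                          # 3 -> independent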

Class activity (1–2 minutes): Consider the set of vectors \(\{[1,0], [0,1], [1,1]\}\) in \(\mathbb{R}^2\). Work with a neighbor to decide whether these vectors are linearly independent. What is their span? Justify your answer.

Matrices

Matrices

A matrix \(A\in\mathbb{R}^{m\times n}\) is a 2-D array of numbers

\[ A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} &a_{m2} & \cdots & a_{mn} \\ \end{bmatrix}, \]

with \(m\) rows and \(n\) columns. The element at row \(i\) and column \(j\) is denoted \(a_{ij}\). If \(m=n\) we call it a square matrix.

Note: By convention, matrix entries are indexed row first and column second, which is the opposite order of the \((x, y)\) convention used when plotting points.


Similar to vectors, we can multiply matrices by scalar values and add matrices of the same dimension, i.e.,

Scalar multiplication: Let \(c\in\mathbb{R}\) and \(A\in\mathbb{R}^{m\times n}\), then \[ cA = \begin{bmatrix} ca_{11} & ca_{12} & \cdots & ca_{1n} \\ ca_{21} & ca_{22} & \cdots & ca_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ ca_{m1} & ca_{m2} & \cdots & ca_{mn} \\ \end{bmatrix}. \]


Matrix addition: Let \(A, B\in\mathbb{R}^{m\times n}\), then \[ A + B = \begin{bmatrix} a_{11} + b_{11} & a_{12} + b_{12} & \cdots & a_{1n} + b_{1n} \\ a_{21} + b_{21} & a_{22} + b_{22} & \cdots & a_{2n} + b_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} + b_{m1} & a_{m2} + b_{m2} & \cdots & a_{mn} + b_{mn} \\ \end{bmatrix} \]


Transpose: The transpose \(A^{T}\) is defined as

\[ A^{T} = \begin{bmatrix} a_{11} & a_{21} & \cdots & a_{m1} \\ a_{12} & a_{22} & \cdots & a_{m2} \\ \vdots & \vdots & \ddots & \vdots \\ a_{1n} & a_{2n} & \cdots & a_{mn} \\ \end{bmatrix}. \]

The transpose turns columns of the matrix into rows (equivalently rows into columns). A square matrix is called symmetric if \(A=A^{T}\).

Matrix multiplication

We discuss the following two important matrix multiplication operations

  • matrix-vector multiplication,
  • matrix-matrix multiplication.

Matrix-vector multiplication

Let \(A\in\mathbb{R}^{m\times n}\) and \(\mathbf{x}\in\mathbb{R}^{n}\), then \(A\mathbf{x}\in\mathbb{R}^{m}\) can be defined row-wise as

\[ A\mathbf{x} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \\ \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} x_1a_{11} + x_2 a_{12} + \cdots + x_na_{1n} \\ x_1a_{21} + x_2 a_{22} + \cdots + x_na_{2n} \\ \vdots \\ x_1a_{m1} + x_2 a_{m2} + \cdots + x_na_{mn} \\ \end{bmatrix}. \]

Equivalently, this means that \(A\mathbf{x}\) is a linear combination of the columns of \(A\), i.e.,

\[ A\mathbf{x} = x_1 \begin{bmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{m1} \end{bmatrix} + x_2 \begin{bmatrix} a_{12} \\ a_{22} \\ \vdots \\ a_{m2} \end{bmatrix} + \cdots + x_n \begin{bmatrix} a_{1n} \\ a_{2n} \\ \vdots \\ a_{mn} \end{bmatrix}. \]

Observe that the matrix \(A\) is a linear transformation that maps vectors in \(\mathbb{R}^{n}\) to \(\mathbb{R}^{m}\).

Matrix-matrix multiplication

Let \(A\in\mathbb{R}^{m\times n}\) and \(B\in\mathbb{R}^{n\times p}\), then the elements of \(C=AB\in\mathbb{R}^{m\times p}\) are

\[ c_{ij} = \sum_{k=1}^{n} a_{ik}b_{kj}, \]

for \(i=1,\ldots, m\) and \(j=1, \ldots, p\).

\[ AB = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \\ \end{bmatrix} \begin{bmatrix} b_{11} & b_{12} & \cdots & b_{1p} \\ b_{21} & b_{22} & \cdots & b_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ b_{n1} & b_{n2} & \cdots & b_{np} \\ \end{bmatrix} = \begin{bmatrix} c_{11} & c_{12} & \cdots & c_{1p} \\ c_{21} & c_{22} & \cdots & c_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ c_{m1} & c_{m2} & \cdots & c_{mp} \\ \end{bmatrix}. \]
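
A short NumPy illustration of both operations (the @ operator performs matrix multiplication):

Code
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])          # 2 x 3
x = np.array([1, 0, 2])
B = np.array([[1, 0],
              [0, 1],
              [2, 1]])             # 3 x 2

# Matrix-vector product: a linear combination of the columns of A
print(A @ x)
print(x[0]*A[:, 0] + x[1]*A[:, 1] + x[2]*A[:, 2])  # same result

# Matrix-matrix product: (2 x 3) times (3 x 2) gives a 2 x 2 matrix
print(A @ B)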

Vector spaces

Vector spaces

An essential concept in linear algebra is the notion of a vector space.

A vector space is a set \(V\) such that for any two vectors in the set, say \(\mathbf{u},\mathbf{v}\in V\), and any scalars \(c\) and \(d\), the linear combination \(c\mathbf{u} + d\mathbf{v}\) is also in \(V\).

In addition, a vector space must satisfy the following properties

  1. \(\mathbf{u} + (\mathbf{v} + \mathbf{w}) = (\mathbf{u} + \mathbf{v}) + \mathbf{w}\) (associativity).
  2. \(\mathbf{u} + \mathbf{v} = \mathbf{v} + \mathbf{u}\) (commutativity).
  3. There exists \(\mathbf{0}\in V\) such that \(\mathbf{v} + \mathbf{0} = \mathbf{v}\) for all \(\mathbf{v}\in V\) (additive identity element).
  4. For every \(\mathbf{v}\in V\), there exists \(-\mathbf{v}\in V\) such that \(\mathbf{v} + (-\mathbf{v}) = \mathbf{0}\) (additive inverse).
  5. \(c(d\mathbf{v}) = (cd)\mathbf{v}\) (associativity).
  6. \(1\mathbf{v} = \mathbf{v}\) (multiplicative identity element).
  7. \(c(\mathbf{u} + \mathbf{v}) = c\mathbf{u} + c\mathbf{v}\) (distributivity).
  8. \((c + d)\mathbf{v} = c\mathbf{v} + d\mathbf{v}\) (distributivity).

Some examples of vector spaces are:

  • The set of n-dimensional vectors with real numbers, i.e., \(\mathbb{R}^n\).
  • Given a matrix \(A\in\mathbb{R}^{m\times n}\),
    • the column space \(col(A)\), which is the span of the columns of \(A\), is a vector space.
    • The similarly defined row space is also a vector space.
  • Given a matrix \(A\in\mathbb{R}^{n\times n}\), the set of all solutions to the equation \(A\mathbf{x} = \mathbf{0}\) is a vector space. This space is called the null space of matrix \(A\).
  • The set of all \(m\times n\) matrices with real numbers is also a vector space.
  • The set of all polynomials of degree at most \(n\) is a vector space.
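
As a small illustration of the column space and null space, the sketch below uses np.linalg.matrix_rank together with scipy.linalg.null_space (available in recent SciPy releases) on a rank-deficient matrix:

Code
import numpy as np
from scipy.linalg import null_space

A = np.array([[1., 2.],
              [2., 4.]])  # the second column is twice the first

# Dimension of the column space of A (its rank)
print(np.linalg.matrix_rank(A))   # 1

# Orthonormal basis for the null space of A (all x with Ax = 0)
N = null_space(A)
print(N)
print(np.allclose(A @ N, 0))      # True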

Important matrices

Important matrices

We introduce notation for some commonly used and important matrices.

The \(n \times n\) identity matrix is

\[ \mathbf{I} = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \\ \end{bmatrix}. \]

For every \(A\in\mathbb{R}^{n\times n}\), we have \(AI = IA = A\).

In practice the identity matrix leaves any vector unchanged. For example, \[ \begin{bmatrix}1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix}3\\ -2 \end{bmatrix} = \begin{bmatrix}3\\ -2 \end{bmatrix}. \] Because of this property we sometimes call \(I\) the do nothing transform.

The inverse \(A^{-1}\in\mathbb{R}^{n\times n}\) is defined as the matrix for which \[AA^{-1} = A^{-1}A = I.\]

When \(A^{-1}\) exists, the matrix \(A\) is said to be invertible.

Note that \((AB)^{-1} = B^{-1}A^{-1}\) for invertible \(A, B\in\mathbb{R}^{n\times n}\).


For example, the \(2\times 2\) matrix \[ A = \begin{bmatrix}1 & 2\\ 3 & 4\end{bmatrix} \] has inverse \[ A^{-1} = \begin{bmatrix}-2 & 1\\ 1.5 & -0.5\end{bmatrix}, \] and one can check that \(AA^{-1} = I\) and \(A^{-1}A = I\).



Class activity (1–2 minutes): Show that \(AA^{-1} = I\) and \(A^{-1}A = I\).

A diagonal matrix \(D\in\mathbb{R}^{n\times n}\) has entries \(d_{ij}=0\) if \(i\neq j\), i.e.,

\[ D = \begin{bmatrix} d_{11} & 0 & \cdots & 0 \\ 0 & d_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & d_{nn} \\ \end{bmatrix}. \]

A square matrix \(Q\in\mathbb{R}^{n\times n}\) is orthogonal if

\[QQ^{T}=Q^{T}Q=I.\]

In particular, the inverse of an orthogonal matrix is its transpose.

Note: Think of each entry of \(Q^{T}Q\) as the dot product of two columns of \(Q\): the product equals \(I\) exactly when the columns are mutually orthogonal unit vectors.


A familiar example of an orthogonal matrix is the \(2\times 2\) rotation matrix \[ R(\theta) = \begin{bmatrix} \cos \theta & -\sin \theta\\ \sin \theta & \cos \theta\end{bmatrix}. \] For \(\theta=45^\circ\) this becomes \(\big[\tfrac{\sqrt{2}}{2}, -\tfrac{\sqrt{2}}{2}; \tfrac{\sqrt{2}}{2}, \tfrac{\sqrt{2}}{2}\big]\), and one can verify that \(R(\theta)R(\theta)^T=I\).

Orthogonal matrices preserve lengths and angles, which is why they model rigid rotations and reflections.



Class activity (1–2 minutes): Take the \(2\times 2\) matrix \(\begin{bmatrix}0 & -1\\ 1 & 0\end{bmatrix}\) and check that it is orthogonal. What geometric transformation does it represent?

A lower triangular matrix \(L\in\mathbb{R}^{n\times n}\) is a matrix where all the entries above the main diagonal are zero

\[ L = \begin{bmatrix} l_{11} & 0 & \cdots & 0 \\ l_{21} & l_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ l_{n1} & l_{n2} & \cdots & l_{nn} \\ \end{bmatrix}. \]


Solving a linear system \(L\mathbf{x}=\mathbf{b}\) with \(L\) lower triangular can be done efficiently by forward substitution.

For example, let \[ L = \begin{bmatrix}1 & 0\\ 2 & 3\end{bmatrix},\quad \mathbf{b} = \begin{bmatrix}1\\4\end{bmatrix}. \] Then the system \(L\mathbf{x}=\mathbf{b}\) yields \(x_1=1\) from the first equation and \(2x_1+3x_2=4\) from the second, giving \(x_2=\tfrac{2}{3}\).

Class activity (1–2 minutes): Solve the lower triangular system \(\begin{bmatrix}1 & 0\\ -1 & 2\end{bmatrix}\mathbf{x} = \begin{bmatrix}3\\1\end{bmatrix}\) by forward substitution.

An upper triangular matrix \(U\in\mathbb{R}^{n\times n}\) is a matrix where all the entries below the main diagonal are zero

\[ U = \begin{bmatrix} u_{11} & u_{12} & \cdots & u_{1n} \\ 0 & u_{22} & \cdots & u_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & u_{nn} \\ \end{bmatrix}. \]


Upper triangular systems are solved by backward substitution. For example, if \[ U = \begin{bmatrix}4 & 1\\ 0 & 5\end{bmatrix},\quad \mathbf{b} = \begin{bmatrix}9\\10\end{bmatrix}, \] the equation \(U\mathbf{x}=\mathbf{b}\) implies \(5x_2=10\) so \(x_2=2\), and then \(4x_1 + x_2 =9\) gives \(x_1 = \tfrac{7}{4}\).

Class activity (1–2 minutes): Solve the upper triangular system \(\begin{bmatrix}2 & 3\\ 0 & -1\end{bmatrix}\mathbf{x} = \begin{bmatrix}7\\ -2\end{bmatrix}\) by backward substitution.

The inverse of a lower triangular matrix is itself a lower triangular matrix. This is also true for upper triangular matrices, i.e., the inverse is also upper triangular.
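
A quick numerical sanity check of this fact:

Code
import numpy as np

L = np.array([[2., 0., 0.],
              [1., 3., 0.],
              [4., 5., 6.]])
L_inv = np.linalg.inv(L)
print(L_inv)
# The strictly upper triangular part of L_inv is zero
print(np.allclose(np.triu(L_inv, k=1), 0))   # True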

Eigenvalues and eigenvectors

An eigenvector of an \(n\times n\) matrix \(\mathbf{A}\) is a nonzero vector \(\mathbf{x}\) such that

\[ A\mathbf{x} = \lambda\mathbf{x} \]

for some scalar \(\lambda.\)

The scalar \(\lambda\) is called an eigenvalue.

An \(n \times n\) matrix has at most \(n\) distinct eigenvalues and at most \(n\) linearly independent eigenvectors.


For example, consider the matrix \[ A = \begin{bmatrix}2 & 1\\1 & 2\end{bmatrix}. \] It has eigenvalues \(\lambda_1=3\) and \(\lambda_2=1\) with corresponding eigenvectors \(\mathbf{v}_1=[1,1]^T\) and \(\mathbf{v}_2=[1,-1]^T\).

Eigenvectors reveal directions that the matrix stretches or compresses without changing direction.



Class activity (1–2 minutes): Verify that \(A\mathbf{v}_1 = 3\mathbf{v}_1\) and \(A\mathbf{v}_2 = 1\mathbf{v}_2\).

Matrix decompositions

We introduce here important matrix decompositions. These are useful in solving linear equations. Furthermore they play an important role in various data science applications.

LU factorization

An LU decomposition of a square matrix \(A\in\mathbb{R}^{n\times n}\) is a factorization of \(A\) into a product of matrices

\[ A = LU, \]

where \(L\) is a lower triangular square matrix and \(U\) is an upper triangular square matrix. For example, when \(n=3\), we have

\[ \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \\ \end{bmatrix} = \begin{bmatrix} l_{11} & 0 & 0 \\ l_{21} & l_{22} & 0\\ l_{31} & l_{32} & l_{33} \\ \end{bmatrix} \begin{bmatrix} u_{11} & u_{12} & u_{13} \\ 0 & u_{22} & u_{23} \\ 0 & 0 & u_{33} \\ \end{bmatrix} \]

It simplifies the process of solving linear equations and is more numerically stable than computing the inverse of \(A\).


Then instead of computing the inverse of \(A\), we can solve the linear system \(A\mathbf{x}=\mathbf{b}\) in two steps:

  1. Forward substitution: First, solve the equation \(L\mathbf{y} = \mathbf{b}\) for \(\mathbf{y}\).
  2. Backward substitution: Then, solve \(U\mathbf{x} = \mathbf{y}\) for \(\mathbf{x}\).

This method is particularly useful because triangular matrices are much easier to work with. It can make solving linear systems faster and more efficient, especially for large matrices.


For example, consider the matrix \[ A = \begin{bmatrix}4 & 3\\6 & 3\end{bmatrix}. \] One possible LU factorization is \[ A = LU = \begin{bmatrix}1 & 0\\ 1.5 & 1\end{bmatrix}\begin{bmatrix}4 & 3\\ 0 & -1.5\end{bmatrix}. \] Once we have \(L\) and \(U\) we can solve \(A\mathbf{x}=\mathbf{b}\) by first solving \(L\mathbf{y}=\mathbf{b}\) and then \(U\mathbf{x}=\mathbf{y}\).

QR decomposition

A QR decomposition of a square matrix \(A\in\mathbb{R}^{n\times n}\) is a factorization of \(A\) into a product of matrices

\[ A=QR, \] where \(Q\) is an orthogonal square matrix and \(R\) is an upper-triangular square matrix.

QR factorization is useful in solving linear systems and performing least squares fitting.


This factorization has a couple of important benefits:

  1. Solving Linear Systems: When you’re working with a system of equations represented by \(A\mathbf{x} = \mathbf{b}\), you can substitute the QR factorization into this equation:

    \[ QR\mathbf{x} = \mathbf{b} \]

    Since \(Q\) is orthogonal, you can multiply both sides by \(Q^T\) (the transpose of \(Q\)) to simplify it:

    \[ R\mathbf{x} = Q^T\mathbf{b} \]

    Now, you can solve this upper triangular system for \(\mathbf{x}\) using backward substitution, which is typically easier and more stable.

  2. Least Squares Problems: In many data science applications, you want to find the best fit line or hyperplane for your data. QR factorization is particularly useful here because it helps in minimizing the error when the system is overdetermined (more equations than unknowns). You can solve the least squares problem by leveraging the QR decomposition to find:

    \[ \hat{\mathbf{x}} = R^{-1}Q^T\mathbf{b} \]

By converting the problem into a triangular system, QR factorization often provides a more stable numerical solution than other methods, especially for poorly conditioned matrices.


As a simple example, consider the \(2\times 2\) matrix \[ A = \begin{bmatrix}1 & 1\\1 & -1\end{bmatrix}. \] Using the Gram–Schmidt process we can factor it as \[ A = QR \quad\text{with}\quad Q = \tfrac{1}{\sqrt{2}}\begin{bmatrix}1 & 1\\1 & -1\end{bmatrix},\quad R = \sqrt{2}\begin{bmatrix}1 & 0\\ 0 & 1\end{bmatrix}. \] Here \(Q\) has orthonormal columns and \(R\) is upper triangular.

Eigendecomposition

Let \(A\in\mathbb{R}^{n\times n}\) have \(n\) linearly independent eigenvectors \(\mathbf{x}_i\) for \(i=1,\ldots, n\), then \(A\) can be factorized as

\[ A = X\Lambda X^{-1}, \]

where the columns of matrix \(X\) are the eigenvectors of \(A\), and

\[ \Lambda = \begin{bmatrix} \lambda_{1} & 0 & \cdots & 0 \\ 0 & \lambda_{2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_{n} \\ \end{bmatrix}, \]

is a diagonal matrix of the eigenvalues.

In this case, the matrix \(A\) is said to be diagonalizable.


Instead of calculating \(A\mathbf{x}\) directly, we can compute it by first transforming \(\mathbf{x}\) with \(X^{-1}\), then stretching the result by \(\Lambda\), and finally transforming back with \(X\).

\[ A \mathbf{x} = X \Lambda X^{-1} \mathbf{x} \]

We’ll use eigendecomposition in Principal Component Analysis (PCA) to reduce dimensionality in datasets while preserving as much variance as possible.

A special case occurs when \(A\) is symmetric. Recall that a matrix is symmetric when \(A = A^T.\)

In this case, it can be shown that the eigenvectors of \(A\) are all mutually orthogonal. Consequently, \(X^{-1} = X^{T}\) and we can decompose \(A\) as:

\[A = X\Lambda X^T.\]

This is known as the spectral theorem, and this decomposition of \(A\) is its spectral decomposition. The set of eigenvalues of a matrix is called its spectrum.

For symmetric matrices the eigenvectors can be chosen to be orthonormal, so \(X^{-1} = X^T\). In the earlier example \(A=\begin{bmatrix}2&1\\1&2\end{bmatrix}\), which is symmetric, its eigendecomposition can be written in orthonormal form as \(A = X \Lambda X^T\) with \[ X = \tfrac{1}{\sqrt{2}}\begin{bmatrix}1 & 1\\ 1 & -1\end{bmatrix},\quad \Lambda = \mathrm{diag}(3,1). \] Because the columns of \(X\) are orthonormal, the factorization \(A = X\Lambda X^T\) is often easier to use numerically and conceptually.
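
For a quick numerical check, NumPy's np.linalg.eigh routine is designed for symmetric matrices and returns real eigenvalues together with orthonormal eigenvectors:

Code
import numpy as np

A = np.array([[2., 1.],
              [1., 2.]])

# np.linalg.eigh is intended for symmetric matrices: it returns real
# eigenvalues (in ascending order) and orthonormal eigenvectors.
eigenvalues, X = np.linalg.eigh(A)
print(eigenvalues)                                       # [1. 3.]
print(np.allclose(X @ X.T, np.eye(2)))                   # True: X is orthogonal
print(np.allclose(X @ np.diag(eigenvalues) @ X.T, A))    # True: A = X Lambda X^T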

Singular value decomposition

For the previous few examples, we required \(\mathbf{A}\) to be square. Now let \(A\in\mathbb{R}^{m\times n}\) with \(m>n\), then \(A\) admits a decomposition

\[ A = U\Sigma V^{T}. \] The matrices \(U\in\mathbb{R}^{m\times m}\) and \(V\in\mathbb{R}^{n\times n}\) are orthogonal. The columns of \(U\) are the left singular vectors and the columns of \(V\) are the right singular vectors.

The matrix \(\Sigma\in\mathbb{R}^{m\times n}\) is diagonal, with the singular values on its main diagonal and zeros everywhere else,

\[ \Sigma = \begin{bmatrix} \sigma_{11} & 0 & \cdots & 0 \\ 0 & \sigma_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_{nn} \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 0 \\ \end{bmatrix}, \]

where the bottom \(m-n\) rows are zero because \(m>n\). The diagonal entries \(\sigma_{ii}\) are the singular values of the matrix \(A\), conventionally ordered so that \(\sigma_{11}\ge\sigma_{22}\ge\cdots\ge\sigma_{nn}\ge 0\).

Amazingly, it can be proven that every matrix \(A\in\mathbb{R}^{m\times n}\) has a singular value decomposition.


For example, let \[ A = \begin{bmatrix}3 & 1\\ 1 & 3\end{bmatrix}. \] This matrix has singular value decomposition \(A = U\Sigma V^T\) where the singular values are \(\sigma_1=4\) and \(\sigma_2=2\). The first singular vector (in \(U\)) points along the direction \([1,1]^T\) and corresponds to the dominant variation in the matrix. In data analysis we often keep only a few largest singular values to approximate a matrix with lower rank—a technique used in latent semantic analysis and image compression.

Class activity (1–2 minutes): Use NumPy (or by hand if you prefer) to compute the singular values of the matrix \(\begin{bmatrix}2 & 0\\ 0 & 1\end{bmatrix}\). Discuss how the singular values relate to how the matrix scales different directions.

Linear Systems and Least Squares

Linear systems of equations

A system of \(m\) linear equations in \(n\) unknowns can be written as

\[ \begin{align*} a_{11} x_{1} + a_{12} x_{2} + \cdots + a_{1n} x_{n} &= b_1, \\ a_{21} x_{1} + a_{22} x_{2} + \cdots + a_{2n} x_{n} &= b_2, \\ \vdots \qquad \qquad \quad \\ a_{m1} x_{1} + a_{m2} x_{2} + \cdots + a_{mn} x_{n} &= b_m.\\ \end{align*} \]

Observe that this is simply the matrix vector equation

\[ A\mathbf{x}=\mathbf{b}. \]

\[ A\mathbf{x} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \\ \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix}. \]


A linear system of equations may have

  • no solution,
  • a unique solution,
  • or infinitely many solutions

depending on the relationship between the rows and columns of \(A\).

Whether \(m<n\) or \(m>n\) does not by itself determine the number of solutions; the key is the rank of \(A\).


For example, the system \[ 2x + y = 5,\\ x - y = 1 \] has a unique solution \((x,y)=(2,1)\).

In contrast, the system \[ 2x + y = 5,\\ 4x + 2y = 10 \] has infinitely many solutions because the second equation is just twice the first.

Finally, the system \[ x + y = 2,\\ x + y = 3 \] has no solution because the two lines are parallel and never intersect.
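
A minimal sketch of this rank criterion in NumPy (the helper describe_system is introduced here only for illustration): it compares the rank of \(A\) with the rank of the augmented matrix \([A \mid \mathbf{b}]\) to classify the three systems above.

Code
import numpy as np

def describe_system(A, b):
    # Compare rank(A) with the rank of the augmented matrix [A | b].
    rA = np.linalg.matrix_rank(A)
    rAb = np.linalg.matrix_rank(np.column_stack([A, b]))
    if rA < rAb:
        return "no solution"
    return "unique solution" if rA == A.shape[1] else "infinitely many solutions"

print(describe_system(np.array([[2., 1.], [1., -1.]]), np.array([5., 1.])))
print(describe_system(np.array([[2., 1.], [4., 2.]]), np.array([5., 10.])))
print(describe_system(np.array([[1., 1.], [1., 1.]]), np.array([2., 3.])))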


When the coefficient matrix \(A\) is invertible (square and full rank), there is always a unique solution given by \(\mathbf{x}=A^{-1}\mathbf{b}\).

When \(A\) is rank deficient or when there are more equations than unknowns, a solution may not exist or there may be infinitely many solutions.

Rank deficient means that the matrix has fewer than \(n\) linearly independent columns.

Linear systems of equations continued

We can use matrix factorizations to help us solve a linear system of equations. We demonstrate how to do this with the LU decomposition. Observe that \[A\mathbf{x} = LU\mathbf{x} = \mathbf{b}.\]

Then

\[ \mathbf{x} = U^{-1}L^{-1}\mathbf{b}. \]

In practice we do not form the inverses explicitly: applying \(L^{-1}\) is done by forward substitution and applying \(U^{-1}\) by backward substitution.
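
As a short illustration, SciPy's lu_factor and lu_solve carry out exactly this factor-then-substitute process for the example matrix from the LU section:

Code
import numpy as np
from scipy.linalg import lu_factor, lu_solve

A = np.array([[4., 3.],
              [6., 3.]])
b = np.array([7., 9.])

# Factor A once (LU with partial pivoting), then solve by
# forward and backward substitution.
lu_piv = lu_factor(A)
x = lu_solve(lu_piv, b)
print(x)                          # [1. 1.]
print(np.allclose(A @ x, b))      # True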

Least squares

In data science it is often the case that we have to solve the linear system

\[ A \mathbf{x} = \mathbf{b}. \]

This problem may have no solution – perhaps due to noise or measurement error.

In such a case, we look for a vector \(\mathbf{x}\) such that \(A\mathbf{x}\) is a good approximation to \(\mathbf{b}.\)

The quality of the approximation can be measured using the distance from \(A\mathbf{x}\) to \(\mathbf{b},\) i.e.,

\[ \Vert A\mathbf{x} - \mathbf{b}\Vert_2. \]

The general least-squares problem is: given \(A\in\mathbb{R}^{m\times n}\) and \(\mathbf{b}\in\mathbb{R}^{m}\), find a vector \(\hat{\mathbf{x}}\in\mathbb{R}^{n}\) such that \(\Vert A\mathbf{x}-\mathbf{b}\Vert_2\) is minimized, i.e.,

\[ \hat{\mathbf{x}} = \arg\min_\mathbf{x} \Vert A\mathbf{x} - \mathbf{b}\Vert. \]

This emphasizes the fact that the least squares problem is a minimization problem.

Minimization problems are an example of a broad class of problems called optimization problems. In optimization problems we attempt to find an optimal solution that minimizes (or maximizes) a particular objective function, possibly subject to constraints.

We can connect the above minimization of the distance between vectors to the minimization of the sum of squared errors. Let \(\mathbf{y} = A\mathbf{x}\) and observe that

\[\Vert A\mathbf{x}-\mathbf{b}\Vert_2^2 = \Vert \mathbf{y}-\mathbf{b}\Vert_2^2 = \sum_i (y_i-b_i)^2.\]

The above expression is the sum of squared errors. In statistics, the \(y_i\) are the estimated values and the \(b_i\) are the measured values. The sum of squared errors is the most commonly used measure of error in statistics.

Minimizing the length of \(A\mathbf{x} - \mathbf{b}\) is equivalent to minimizing the sum of the squared errors.

We can find \(\hat{\mathbf{x}}\) using either

  • geometric arguments based on projections of the vector \(\mathbf{b}\),
  • by calculus (taking the derivative of \(\Vert A\mathbf{x} - \mathbf{b}\Vert_2^2\) with respect to \(\mathbf{x}\) and setting it equal to zero).

Either way, we obtain the result that \(\hat{\mathbf{x}}\) is the solution of:

\[A^TA\mathbf{x} = A^T\mathbf{b}.\]

This system of equations is called the normal equations.

We can prove that these equations always have at least one solution.

When \(A^TA\) is invertible (equivalently, when the columns of \(A\) are linearly independent), the least squares problem has a unique solution

\[\hat{\mathbf{x}} = (A^TA)^{-1}A^T\mathbf{b}.\]

Be aware that computing the solution using \((A^TA)^{-1}A^T\) can be numerically unstable. A more stable method is to use the QR decomposition of \(A\), i.e., \(\hat{\mathbf{x}} = R^{-1}Q^T\mathbf{b}\).

The NumPy function np.linalg.lstsq() solves the least squares problem in a stable way.

Consider the following example where the solution to the least squares problem is x=np.array([1, 1]).

import numpy as np
A = np.array([[-1.42382504, -1.4238264 ],
              [ 1.26372846,  1.26372911], 
              [-0.87066174, -0.87066138]])
b = A @ np.array([1, 1])

# Normal equations (explicitly inverting A^T A; can be numerically unstable)
x1 = np.linalg.inv(A.T @ A) @ A.T @ b
print(x1)
# QR
Q, R = np.linalg.qr(A)
x2 = np.linalg.solve(R, Q.T @ b)
print(x2)
# np.linalg.lstsq
x3, _, _, _ = np.linalg.lstsq(A, b, rcond=None)
print(x3) 
print(np.linalg.norm(x1-np.array([1, 1])))
print(np.linalg.norm(x2-np.array([1, 1])))
print(np.linalg.norm(x3-np.array([1, 1])))
[1.00010437 0.99918931]
[1. 1.]
[1. 1.]
0.0008173802348693703
1.3212084675495204e-10
4.1498941065303375e-10

Appendix: Python Implementations of Linear Algebra Operations

1. Vector Operations

Vector Creation and Basic Properties

Code
import numpy as np
import matplotlib.pyplot as plt
from scipy.linalg import qr, lu, svd, eig, solve_triangular
from scipy.sparse.linalg import eigs

# Create vectors
x = np.array([2, 2])
y = np.array([3, -1])
z = np.array([-2, -1])

print(f"Vector x: {x}")
print(f"Vector y: {y}")
print(f"Vector z: {z}")
Vector x: [2 2]
Vector y: [ 3 -1]
Vector z: [-2 -1]

Scalar Multiplication

Code
# Scalar multiplication
c = 2
scaled_x = c * x
print(f"Scalar multiplication: {c} * {x} = {scaled_x}")

# Example: Image processing (RGB scaling)
rgb_color = np.array([0.2, 0.5, 0.7])
darkened = 0.5 * rgb_color
brightened = np.clip(2 * rgb_color, 0, 1)  # Clip to stay in [0,1]
print(f"Original RGB: {rgb_color}")
print(f"Darkened: {darkened}")
print(f"Brightened: {brightened}")
Scalar multiplication: 2 * [2 2] = [4 4]
Original RGB: [0.2 0.5 0.7]
Darkened: [0.1  0.25 0.35]
Brightened: [0.4 1.  1. ]

Vector Addition

Code
# Vector addition
u = np.array([1, 2])
v = np.array([4, 1])
w = u + v
print(f"Vector addition: {u} + {v} = {w}")

# Example: Sales data
shop1_sales = np.array([100, 80])  # Monday, Tuesday
shop2_sales = np.array([20, 50])
total_sales = shop1_sales + shop2_sales
print(f"Total sales: {total_sales}")
Vector addition: [1 2] + [4 1] = [5 3]
Total sales: [120 130]

Dot Product

Code
# Dot product
u = np.array([1, 2, 0])
v = np.array([3, -1, 4])
dot_product = np.dot(u, v)
print(f"Dot product: {u} · {v} = {dot_product}")

# Alternative syntax
dot_product_alt = u @ v  # Matrix multiplication operator
print(f"Dot product (alternative): {dot_product_alt}")
Dot product: [1 2 0] · [ 3 -1  4] = 1
Dot product (alternative): 1

Vector Norms

Code
# Vector 2-norm (Euclidean norm)
norm_u = np.linalg.norm(u)
norm_v = np.linalg.norm(v)
print(f"||u||_2 = {norm_u}")
print(f"||v||_2 = {norm_v}")

# Manual calculation
norm_u_manual = np.sqrt(np.sum(u**2))
print(f"||u||_2 (manual): {norm_u_manual}")
||u||_2 = 2.23606797749979
||v||_2 = 5.0990195135927845
||u||_2 (manual): 2.23606797749979

Unit Vectors

Code
# Create unit vector
unit_u = u / np.linalg.norm(u)
print(f"u: {u}")
print(f"Unit vector from u: {unit_u}")
print(f"Norm of unit vector: {np.linalg.norm(unit_u)}")
u: [1 2 0]
Unit vector from u: [0.4472136  0.89442719 0.        ]
Norm of unit vector: 0.9999999999999999

Distance Between Vectors

Code
# Distance between vectors
distance = np.linalg.norm(u - v)
print(f"Distance between u and v: {distance}")
Distance between u and v: 5.385164807134504

Orthogonality Check

Code
# Check orthogonality
def are_orthogonal(vec1, vec2, tolerance=1e-10):
    return abs(np.dot(vec1, vec2)) < tolerance

print(f"Are u and v orthogonal? {are_orthogonal(u, v)}")

# Example of orthogonal vectors
orth1 = np.array([1, 0])
orth2 = np.array([0, 1])
print(f"Are orth1 and orth2 orthogonal? {are_orthogonal(orth1, orth2)}")
Are u and v orthogonal? False
Are orth1 and orth2 orthogonal? True

Angle Between Vectors

Code
# Angle between vectors
def angle_between_vectors(vec1, vec2):
    cos_theta = np.dot(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2))
    # Clamp to avoid numerical errors
    cos_theta = np.clip(cos_theta, -1.0, 1.0)
    return np.arccos(cos_theta)

u_2d = np.array([1, 2])
v_2d = np.array([4, 1])
theta = angle_between_vectors(u_2d, v_2d)
print(f"Angle between u and v: {theta} radians = {np.degrees(theta)} degrees")
Angle between u and v: 0.8621700546672264 radians = 49.398705354995535 degrees

Vector Projection

Code
# Project u onto v
def project_vector(u, v):
    return (np.dot(u, v) / np.dot(v, v)) * v

projection = project_vector(u_2d, v_2d)
print(f"Projection of u onto v: {projection}")
Projection of u onto v: [1.41176471 0.35294118]

Linear Independence Check

Code
# Check linear independence
def check_linear_independence(vectors):
    """Check if a set of vectors is linearly independent"""
    matrix = np.column_stack(vectors)
    rank = np.linalg.matrix_rank(matrix)
    return rank == len(vectors)

# Example vectors
v1 = np.array([1, 0])
v2 = np.array([0, 1])
v3 = np.array([1, 1])

vectors = [v1, v2, v3]
is_independent = check_linear_independence(vectors)
print(f"Are vectors {v1}, {v2}, {v3} linearly independent? {is_independent}")
Are vectors [1 0], [0 1], [1 1] linearly independent? False

2. Matrix Operations

Matrix Creation and Basic Properties

Code
# Create matrices
A = np.array([[1, 2, 3],
              [4, 5, 6]])
B = np.array([[7, 8],
              [9, 10],
              [11, 12]])

print(f"Matrix A (2x3):\n{A}")
print(f"Matrix B (3x2):\n{B}")
print(f"A shape: {A.shape}")
print(f"B shape: {B.shape}")
Matrix A (2x3):
[[1 2 3]
 [4 5 6]]
Matrix B (3x2):
[[ 7  8]
 [ 9 10]
 [11 12]]
A shape: (2, 3)
B shape: (3, 2)

Scalar Multiplication of Matrices

Code
# Scalar multiplication
c = 2
scaled_A = c * A
print(f"Scalar multiplication:\n{scaled_A}")
Scalar multiplication:
[[ 2  4  6]
 [ 8 10 12]]

Matrix Addition

Code
# Matrix addition (matrices must have same dimensions)
C = np.array([[1, 1, 1],
              [2, 2, 2]])
matrix_sum = A + C
print(f"Matrix addition A + C:\n{matrix_sum}")
Matrix addition A + C:
[[2 3 4]
 [6 7 8]]

Matrix Transpose

Code
# Matrix transpose
A_T = A.T
print(f"A transpose:\n{A_T}")

# Check if matrix is symmetric
symmetric_matrix = np.array([[1, 2, 3],
                           [2, 4, 5],
                           [3, 5, 6]])
is_symmetric = np.allclose(symmetric_matrix, symmetric_matrix.T)
print(f"Is matrix symmetric? {is_symmetric}")
A transpose:
[[1 4]
 [2 5]
 [3 6]]
Is matrix symmetric? True

Matrix-Vector Multiplication

Code
# Matrix-vector multiplication
x = np.array([1, 2, 3])
result = A @ x  # or np.dot(A, x)
print(f"Matrix-vector multiplication A @ x:\n{result}")

# Interpretation as linear combination of columns
col1, col2, col3 = A[:, 0], A[:, 1], A[:, 2]
linear_combination = x[0]*col1 + x[1]*col2 + x[2]*col3
print(f"As linear combination of columns:\n{linear_combination}")
Matrix-vector multiplication A @ x:
[14 32]
As linear combination of columns:
[14 32]

Matrix-Matrix Multiplication

Code
# Matrix-matrix multiplication
product = A @ B
print(f"Matrix multiplication A @ B:\n{product}")

# Element-wise calculation (for demonstration)
def matrix_multiply_manual(A, B):
    m, n = A.shape
    n_b, p = B.shape
    assert n == n_b, "Incompatible dimensions"
    
    C = np.zeros((m, p))
    for i in range(m):
        for j in range(p):
            C[i, j] = np.sum(A[i, :] * B[:, j])
    return C

manual_product = matrix_multiply_manual(A, B)
print(f"Manual matrix multiplication:\n{manual_product}")
Matrix multiplication A @ B:
[[ 58  64]
 [139 154]]
Manual matrix multiplication:
[[ 58.  64.]
 [139. 154.]]

3. Special Matrices

Identity Matrix

Code
# Identity matrix
I_3 = np.eye(3)
print(f"3x3 Identity matrix:\n{I_3}")

# Verify identity property
square_matrix = np.array([[1, 2], [3, 4]])
I_2 = np.eye(2)
print(f"A @ I = I @ A:")
print(f"A @ I:\n{square_matrix @ I_2}")
print(f"I @ A:\n{I_2 @ square_matrix}")
3x3 Identity matrix:
[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]
A @ I = I @ A:
A @ I:
[[1. 2.]
 [3. 4.]]
I @ A:
[[1. 2.]
 [3. 4.]]

Matrix Inverse

Code
# Matrix inverse
A_square = np.array([[1, 2], [3, 4]])
A_inv = np.linalg.inv(A_square)
print(f"Matrix A:\n{A_square}")
print(f"Matrix A inverse:\n{A_inv}")

# Verify inverse property
print(f"A @ A_inv:\n{A_square @ A_inv}")
print(f"A_inv @ A:\n{A_inv @ A_square}")

# Check if matrix is invertible
def is_invertible(matrix):
    return np.linalg.det(matrix) != 0

print(f"Is A invertible? {is_invertible(A_square)}")
Matrix A:
[[1 2]
 [3 4]]
Matrix A inverse:
[[-2.   1. ]
 [ 1.5 -0.5]]
A @ A_inv:
[[1.0000000e+00 0.0000000e+00]
 [8.8817842e-16 1.0000000e+00]]
A_inv @ A:
[[1.00000000e+00 0.00000000e+00]
 [1.11022302e-16 1.00000000e+00]]
Is A invertible? True

Diagonal Matrices

Code
# Diagonal matrix
diagonal_values = [1, 2, 3]
D = np.diag(diagonal_values)
print(f"Diagonal matrix:\n{D}")

# Extract diagonal elements
diag_elements = np.diag(D)
print(f"Diagonal elements: {diag_elements}")
Diagonal matrix:
[[1 0 0]
 [0 2 0]
 [0 0 3]]
Diagonal elements: [1 2 3]

Orthogonal Matrices

Code
# Rotation matrix (orthogonal)
theta = np.pi/4  # 45 degrees
rotation_matrix = np.array([[np.cos(theta), -np.sin(theta)],
                           [np.sin(theta), np.cos(theta)]])
print(f"Rotation matrix (45°):\n{rotation_matrix}")

# Verify orthogonality
is_orthogonal = np.allclose(rotation_matrix @ rotation_matrix.T, np.eye(2))
print(f"Is rotation matrix orthogonal? {is_orthogonal}")

# 90-degree rotation matrix
rotation_90 = np.array([[0, -1], [1, 0]])
print(f"90° rotation matrix:\n{rotation_90}")
print(f"Is 90° rotation orthogonal? {np.allclose(rotation_90 @ rotation_90.T, np.eye(2))}")
Rotation matrix (45°):
[[ 0.70710678 -0.70710678]
 [ 0.70710678  0.70710678]]
Is rotation matrix orthogonal? True
90° rotation matrix:
[[ 0 -1]
 [ 1  0]]
Is 90° rotation orthogonal? True

Triangular Matrices

Code
# Lower triangular matrix
L = np.array([[1, 0, 0],
              [2, 3, 0],
              [4, 5, 6]])
print(f"Lower triangular matrix:\n{L}")

# Upper triangular matrix
U = np.array([[1, 2, 3],
              [0, 4, 5],
              [0, 0, 6]])
print(f"Upper triangular matrix:\n{U}")

# Extract lower/upper triangular parts
matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
lower_part = np.tril(matrix)
upper_part = np.triu(matrix)
print(f"Lower triangular part:\n{lower_part}")
print(f"Upper triangular part:\n{upper_part}")
Lower triangular matrix:
[[1 0 0]
 [2 3 0]
 [4 5 6]]
Upper triangular matrix:
[[1 2 3]
 [0 4 5]
 [0 0 6]]
Lower triangular part:
[[1 0 0]
 [4 5 0]
 [7 8 9]]
Upper triangular part:
[[1 2 3]
 [0 5 6]
 [0 0 9]]

4. Matrix Decompositions

LU Decomposition

Code
# LU decomposition
A_square = np.array([[4, 3], [6, 3]], dtype=float)
P, L, U = lu(A_square)
print(f"Original matrix A:\n{A_square}")
print(f"P (permutation):\n{P}")
print(f"L (lower triangular):\n{L}")
print(f"U (upper triangular):\n{U}")
print(f"Verification P@L@U:\n{P @ L @ U}")

# Solve system using LU decomposition
def solve_with_lu(A, b):
    P, L, U = lu(A)
    # Solve Ly = Pb for y
    Pb = P @ b
    y = solve_triangular(L, Pb, lower=True)
    # Solve Ux = y for x
    x = solve_triangular(U, y, lower=False)
    return x

b = np.array([7, 9])
x_lu = solve_with_lu(A_square, b)
print(f"Solution using LU: {x_lu}")
Original matrix A:
[[4. 3.]
 [6. 3.]]
P (permutation):
[[0. 1.]
 [1. 0.]]
L (lower triangular):
[[1.         0.        ]
 [0.66666667 1.        ]]
U (upper triangular):
[[6. 3.]
 [0. 1.]]
Verification P@L@U:
[[4. 3.]
 [6. 3.]]
Solution using LU: [1. 1.]

QR Decomposition

Code

# QR decomposition
A_rect = np.array([[1, 1], [1, -1]], dtype=float)
Q, R = qr(A_rect)
print(f"Original matrix A:\n{A_rect}")
print(f"Q (orthogonal):\n{Q}")
print(f"R (upper triangular):\n{R}")
print(f"Verification Q@R:\n{Q @ R}")

# Verify Q is orthogonal
print(f"Q@Q.T (should be identity):\n{Q @ Q.T}")

# Solve least squares using QR
def solve_least_squares_qr(A, b):
    # Use the economic (reduced) QR so that R is square and invertible
    Q, R = qr(A, mode='economic')
    return solve_triangular(R, Q.T @ b, lower=False)

# Example overdetermined system
A_over = np.array([[1, 1], [1, 2], [1, 3]], dtype=float)
b_over = np.array([2, 3, 4])
x_qr = solve_least_squares_qr(A_over, b_over)
print(f"Least squares solution using QR: {x_qr}")

Eigendecomposition

Code
# Eigenvalues and eigenvectors
symmetric_A = np.array([[2, 1], [1, 2]], dtype=float)
eigenvalues, eigenvectors = eig(symmetric_A)
print(f"Matrix A:\n{symmetric_A}")
print(f"Eigenvalues: {eigenvalues}")
print(f"Eigenvectors:\n{eigenvectors}")

# Verify eigenvalue equation: Av = λv
for i, (lam, v) in enumerate(zip(eigenvalues, eigenvectors.T)):
    Av = symmetric_A @ v
    lambda_v = lam * v
    print(f"Eigenvalue {i+1}: λ={lam:.3f}")
    print(f"Av = {Av}")
    print(f"λv = {lambda_v}")
    print(f"Close? {np.allclose(Av, lambda_v)}\n")

# Eigendecomposition: A = XΛX^(-1)
X = eigenvectors
Lambda = np.diag(eigenvalues)
A_reconstructed = X @ Lambda @ np.linalg.inv(X)
print(f"Reconstructed A:\n{A_reconstructed}")
print(f"Original A:\n{symmetric_A}")
print(f"Reconstruction accurate? {np.allclose(A_reconstructed, symmetric_A)}")

# For symmetric matrices: A = XΛX^T
if np.allclose(symmetric_A, symmetric_A.T):
    A_symmetric_recon = X @ Lambda @ X.T
    print(f"Symmetric reconstruction:\n{A_symmetric_recon}")
Matrix A:
[[2. 1.]
 [1. 2.]]
Eigenvalues: [3.+0.j 1.+0.j]
Eigenvectors:
[[ 0.70710678 -0.70710678]
 [ 0.70710678  0.70710678]]
Eigenvalue 1: λ=3.000+0.000j
Av = [2.12132034 2.12132034]
λv = [2.12132034+0.j 2.12132034+0.j]
Close? True

Eigenvalue 2: λ=1.000+0.000j
Av = [-0.70710678  0.70710678]
λv = [-0.70710678+0.j  0.70710678+0.j]
Close? True

Reconstructed A:
[[2.+0.j 1.+0.j]
 [1.+0.j 2.+0.j]]
Original A:
[[2. 1.]
 [1. 2.]]
Reconstruction accurate? True
Symmetric reconstruction:
[[2.+0.j 1.+0.j]
 [1.+0.j 2.+0.j]]

Singular Value Decomposition (SVD)

Code
# SVD decomposition
A_rect = np.array([[3, 1], [1, 3]], dtype=float)
U, sigma, Vt = svd(A_rect)
print(f"Original matrix A:\n{A_rect}")
print(f"U (left singular vectors):\n{U}")
print(f"Singular values: {sigma}")
print(f"V^T (right singular vectors transposed):\n{Vt}")

# Reconstruct matrix
Sigma = np.diag(sigma)
A_reconstructed = U @ Sigma @ Vt
print(f"Reconstructed A:\n{A_reconstructed}")
print(f"Reconstruction accurate? {np.allclose(A_reconstructed, A_rect)}")

# Low-rank approximation
def low_rank_approximation(A, k):
    U, sigma, Vt = svd(A)
    return U[:, :k] @ np.diag(sigma[:k]) @ Vt[:k, :]

# Example: approximate with rank 1
A_rank1 = low_rank_approximation(A_rect, 1)
print(f"Rank-1 approximation:\n{A_rank1}")

# SVD for rectangular matrix
A_tall = np.array([[1, 2], [3, 4], [5, 6]], dtype=float)
U_tall, sigma_tall, Vt_tall = svd(A_tall)
print(f"Tall matrix A:\n{A_tall}")
print(f"U shape: {U_tall.shape}")
print(f"Sigma shape: {sigma_tall.shape}")
print(f"Vt shape: {Vt_tall.shape}")
Original matrix A:
[[3. 1.]
 [1. 3.]]
U (left singular vectors):
[[-0.70710678 -0.70710678]
 [-0.70710678  0.70710678]]
Singular values: [4. 2.]
V^T (right singular vectors transposed):
[[-0.70710678 -0.70710678]
 [-0.70710678  0.70710678]]
Reconstructed A:
[[3. 1.]
 [1. 3.]]
Reconstruction accurate? True
Rank-1 approximation:
[[2. 2.]
 [2. 2.]]
Tall matrix A:
[[1. 2.]
 [3. 4.]
 [5. 6.]]
U shape: (3, 3)
Sigma shape: (2,)
Vt shape: (2, 2)

5. Linear Systems and Least Squares

Solving Linear Systems

Code
# Solve Ax = b for square, invertible A
A_system = np.array([[2, 1], [1, -1]], dtype=float)
b_system = np.array([5, 1])

# Method 1: Direct inverse (not recommended for large systems)
x_inverse = np.linalg.inv(A_system) @ b_system
print(f"Solution using inverse: {x_inverse}")

# Method 2: Using numpy.linalg.solve (recommended)
x_solve = np.linalg.solve(A_system, b_system)
print(f"Solution using solve: {x_solve}")

# Verify solution
print(f"Verification A@x = b: {A_system @ x_solve}")
print(f"Original b: {b_system}")
Solution using inverse: [2. 1.]
Solution using solve: [2. 1.]
Verification A@x = b: [5. 1.]
Original b: [5 1]

Forward and Backward Substitution

Code
# Forward substitution for lower triangular system Ly = b
def forward_substitution(L, b):
    n = len(b)
    y = np.zeros(n)
    for i in range(n):
        y[i] = (b[i] - np.dot(L[i, :i], y[:i])) / L[i, i]
    return y

# Backward substitution for upper triangular system Ux = y
def backward_substitution(U, y):
    n = len(y)
    x = np.zeros(n)
    for i in range(n-1, -1, -1):
        x[i] = (y[i] - np.dot(U[i, i+1:], x[i+1:])) / U[i, i]
    return x

# Example
L_example = np.array([[1, 0], [2, 3]], dtype=float)
b_example = np.array([1, 4])
y_forward = forward_substitution(L_example, b_example)
print(f"Forward substitution result: {y_forward}")

U_example = np.array([[4, 1], [0, 5]], dtype=float)
y_example = np.array([9, 10])
x_backward = backward_substitution(U_example, y_example)
print(f"Backward substitution result: {x_backward}")
Forward substitution result: [1.         0.66666667]
Backward substitution result: [1.75 2.  ]

Least Squares Problems

Code

# Overdetermined system (more equations than unknowns)
A_over = np.array([[-1.42382504, -1.4238264],
                   [1.26372846, 1.26372911],
                   [-0.87066174, -0.87066138]])
b_over = A_over @ np.array([1, 1])  # Known solution

# Method 1: Normal equations (can be numerically unstable)
x_normal = np.linalg.inv(A_over.T @ A_over) @ A_over.T @ b_over
print(f"Normal equations solution: {x_normal}")

# Method 2: QR decomposition (more stable)
Q, R = qr(A_over, mode='economic')  # economic QR so that R is square (2x2)
x_qr = solve_triangular(R, Q.T @ b_over, lower=False)
print(f"QR solution: {x_qr}")

# Method 3: NumPy's least squares (most robust)
x_lstsq, residuals, rank, s = np.linalg.lstsq(A_over, b_over, rcond=None)
print(f"lstsq solution: {x_lstsq}")

# Compare errors
true_solution = np.array([1, 1])
print(f"Normal equations error: {np.linalg.norm(x_normal - true_solution)}")
print(f"QR error: {np.linalg.norm(x_qr - true_solution)}")
print(f"lstsq error: {np.linalg.norm(x_lstsq - true_solution)}")

# Pseudoinverse approach
x_pinv = np.linalg.pinv(A_over) @ b_over
print(f"Pseudoinverse solution: {x_pinv}")

Computing Residuals and Fit Quality

Code

# Compute residuals and R-squared
def compute_fit_statistics(A, x, b):
    y_pred = A @ x
    residuals = b - y_pred
    ss_res = np.sum(residuals**2)
    ss_tot = np.sum((b - np.mean(b))**2)
    r_squared = 1 - (ss_res / ss_tot)
    
    return {
        'residuals': residuals,
        'ss_res': ss_res,
        'r_squared': r_squared,
        'rmse': np.sqrt(ss_res / len(b))
    }

stats = compute_fit_statistics(A_over, x_lstsq, b_over)
print(f"Fit statistics:")
print(f"R-squared: {stats['r_squared']:.6f}")
print(f"RMSE: {stats['rmse']:.6f}")
print(f"Sum of squared residuals: {stats['ss_res']:.6f}")

6. Advanced Operations and Applications

Matrix Rank and Condition Number

Code
# Matrix rank
A_rank = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
rank = np.linalg.matrix_rank(A_rank)
print(f"Matrix rank: {rank}")

# Condition number
A_cond = np.array([[1, 2], [2, 4.0001]])  # Nearly singular
cond_num = np.linalg.cond(A_cond)
print(f"Condition number: {cond_num}")
print(f"Matrix is {'well-conditioned' if cond_num < 1e12 else 'ill-conditioned'}")
Matrix rank: 2
Condition number: 250008.00009100154
Matrix is well-conditioned

Matrix Powers and Matrix Functions

Code
# Matrix power
A_power = np.array([[2, 1], [1, 2]])
A_squared = np.linalg.matrix_power(A_power, 2)
print(f"A^2:\n{A_squared}")

# Matrix exponential
from scipy.linalg import expm, logm, sqrtm

A_exp = expm(A_power)
print(f"Matrix exponential e^A:\n{A_exp}")

# Matrix square root
A_sqrt = sqrtm(A_power)
print(f"Matrix square root:\n{A_sqrt}")
print(f"Verification (sqrt(A))^2:\n{A_sqrt @ A_sqrt}")
A^2:
[[5 4]
 [4 5]]
Matrix exponential e^A:
[[11.40190938  8.68362755]
 [ 8.68362755 11.40190938]]
Matrix square root:
[[1.3660254 0.3660254]
 [0.3660254 1.3660254]]
Verification (sqrt(A))^2:
[[2. 1.]
 [1. 2.]]

Kronecker Product

Code
# Kronecker product
A_kron = np.array([[1, 2], [3, 4]])
B_kron = np.array([[5, 6], [7, 8]])
kron_product = np.kron(A_kron, B_kron)
print(f"Kronecker product A ⊗ B:\n{kron_product}")
Kronecker product A ⊗ B:
[[ 5  6 10 12]
 [ 7  8 14 16]
 [15 18 20 24]
 [21 24 28 32]]

Vector and Matrix Norms

Code
# Different vector norms
v_norm = np.array([1, -2, 3])
l1_norm = np.linalg.norm(v_norm, ord=1)  # L1 norm
l2_norm = np.linalg.norm(v_norm, ord=2)  # L2 norm (default)
linf_norm = np.linalg.norm(v_norm, ord=np.inf)  # L∞ norm

print(f"Vector: {v_norm}")
print(f"L1 norm: {l1_norm}")
print(f"L2 norm: {l2_norm}")
print(f"L∞ norm: {linf_norm}")

# Matrix norms
A_norm = np.array([[1, 2], [3, 4]])
frobenius_norm = np.linalg.norm(A_norm, ord='fro')
spectral_norm = np.linalg.norm(A_norm, ord=2)  # Largest singular value

print(f"Matrix Frobenius norm: {frobenius_norm}")
print(f"Matrix spectral norm: {spectral_norm}")
Vector: [ 1 -2  3]
L1 norm: 6.0
L2 norm: 3.7416573867739413
L∞ norm: 3.0
Matrix Frobenius norm: 5.477225575051661
Matrix spectral norm: 5.464985704219043