Why Linear Algebra Keeps Showing Up

Linear algebra keeps showing up everywhere in CS. The moment you need to rotate something on screen, compress data, train a model, or solve a system of equations, you’re doing linear algebra whether you call it that or not. The reason is simple: linearity is the most tractable structure we have. When you can phrase a problem as a matrix operation, you get access to decades of fast, well-understood algorithms. And even when a problem is nonlinear, the first thing we do is approximate it linearly (gradients, Jacobians, Taylor expansions) because that’s where the tools are.

Note

If you take one thing from this note: every matrix is a function. An $m \times n$ matrix is a linear map from $\mathbb{R}^n$ to $\mathbb{R}^m$. Once that clicks, everything else (rank, null space, eigenvalues) becomes a question about the behavior of that function.

Vectors and Vector Spaces

A vector is an ordered $n$-tuple of real numbers. A vector space is a set of vectors closed under addition and scalar multiplication. The dimension of a space equals the size of any basis for it.

A set of vectors spans a space if every vector in that space can be written as a linear combination of them. The set is a basis if it spans the space and is linearly independent, meaning no vector in the set is redundant. For $\mathbb{R}^n$, the standard basis has $n$ vectors: $e_1 = (1, 0, \dots, 0)$, $e_2 = (0, 1, 0, \dots, 0)$, etc.
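
One way to make span and independence concrete is to stack the candidate vectors as columns and check the rank. A small NumPy sketch (the example vectors are made up):

```python
import numpy as np

# Columns of V are three candidate basis vectors for R^3.
V = np.column_stack([[1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0],
                     [1.0, 1.0, 1.0]])

rank = np.linalg.matrix_rank(V)
spans_r3 = (rank == 3)              # full rank: the columns span R^3
independent = (rank == V.shape[1])  # no column is redundant
```

Drop any column and the rank falls to 2: the remaining pair spans only a plane inside $\mathbb{R}^3$.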

Tip

Thinking in terms of span and basis is more useful than thinking about individual vectors. The question “what can this set of vectors reach?” comes up constantly, from solving systems of equations to understanding what a neural network layer can represent.

Matrices as Transformations

An $m \times n$ matrix represents a linear map from $\mathbb{R}^n$ to $\mathbb{R}^m$. The key operations:

  • Matrix-vector product $Av$: applies the transformation to $v$.
  • Matrix multiplication $AB$: composes two transformations (apply $B$ first, then $A$).
  • Transpose $A^T$: swaps rows and columns; $(AB)^T = B^T A^T$.
  • Inverse $A^{-1}$: exists when $A$ is square and $\det A \neq 0$.
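
These operations map directly onto NumPy. A small illustrative check, with made-up matrices:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[0.0, 1.0],
              [1.0, 0.0]])
v = np.array([1.0, 1.0])

Av = A @ v                                  # apply the transformation: [3, 7]
AB = A @ B                                  # compose: B first, then A
assert np.allclose((A @ B).T, B.T @ A.T)    # (AB)^T = B^T A^T
Ainv = np.linalg.inv(A)                     # exists since det(A) = -2 != 0
assert np.allclose(A @ Ainv, np.eye(2))
```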

A function $T$ is linear if $T(\alpha u + \beta v) = \alpha T(u) + \beta T(v)$. Every such transformation can be represented by a matrix. Rotation, scaling, shearing, projection are all linear and compose via matrix multiplication. This is exactly what happens in a graphics pipeline: a vertex $v$ in object space becomes a pixel via $v' = P\,V\,M\,v$ (model, view, and projection matrices), where each matrix is $4 \times 4$ (using homogeneous coordinates so translation becomes linear too). GPUs are essentially massively parallel matrix multiplication engines.
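
The homogeneous-coordinates trick is worth seeing once. A 2D sketch (values chosen for illustration): append a 1 to each point, and translation, which is not linear in 2D, becomes one more matrix in the product.

```python
import numpy as np

def translation(tx, ty):
    """Translation as a 3x3 matrix acting on homogeneous 2D points."""
    return np.array([[1.0, 0.0, tx],
                     [0.0, 1.0, ty],
                     [0.0, 0.0, 1.0]])

def rotation(theta):
    """Counterclockwise rotation by theta, in homogeneous form."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

# Compose once: rotate 90 degrees, then translate by (1, 0).
M = translation(1.0, 0.0) @ rotation(np.pi / 2)
p = np.array([1.0, 0.0, 1.0])   # the point (1, 0) in homogeneous form
q = M @ p                       # rotation sends (1,0) to (0,1); translation to (1,1)
```

This is why the pipeline can bake an arbitrary chain of transforms into a single matrix before touching any vertices.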

Eigenvalues and Eigenvectors

For a square matrix $A$, a nonzero vector $v$ is an eigenvector with eigenvalue $\lambda$ if:

$$Av = \lambda v$$

Eigenvalues are roots of the characteristic polynomial $\det(A - \lambda I) = 0$. Eigenvectors reveal the directions along which a transformation acts as pure scaling, which is enormously useful for understanding what a matrix “does.”

Example

Consider $A = \begin{pmatrix} 2 & 1 \\ 0 & 3 \end{pmatrix}$. The characteristic polynomial is $(2 - \lambda)(3 - \lambda) = 0$, giving eigenvalues $\lambda_1 = 2$ and $\lambda_2 = 3$. For $\lambda_1 = 2$: eigenvector $v_1 = (1, 0)$. For $\lambda_2 = 3$: eigenvector $v_2 = (1, 1)$. The matrix stretches space by factor 2 along $v_1$ and by factor 3 along $v_2$.
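
A quick machine check of an eigendecomposition, using an upper-triangular matrix with eigenvalues 2 and 3 (chosen for illustration):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])

eigenvalues, eigenvectors = np.linalg.eig(A)
# Each column v of `eigenvectors` satisfies A v = lambda v.
for lam, v in zip(eigenvalues, eigenvectors.T):
    assert np.allclose(A @ v, lam * v)
```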

Google’s original PageRank is an eigenvector application. The web is modeled as a column-stochastic matrix $P$ where $P_{ij}$ is the probability of following a link from page $j$ to page $i$. The dominant eigenvector (eigenvalue $\lambda = 1$) gives the steady-state page rankings. This is the Perron–Frobenius theorem put to work; it’s the math that made web search work.
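
That dominant eigenvector can be found by power iteration: repeatedly apply $P$ until the ranking stops changing. A toy sketch on a made-up 3-page web (the real algorithm also adds a damping factor to guarantee convergence):

```python
import numpy as np

# Column j holds the link-following probabilities out of page j (toy numbers).
P = np.array([[0.0, 0.0, 1.0],
              [0.5, 0.0, 0.0],
              [0.5, 1.0, 0.0]])
assert np.allclose(P.sum(axis=0), 1.0)   # column-stochastic

r = np.full(3, 1 / 3)   # start from a uniform ranking
for _ in range(200):
    r = P @ r           # one step of the random surfer
# r now approximates the eigenvalue-1 eigenvector: P r = r.
```

Pages 0 and 2 end up ranked highest here because every path through this little web keeps returning to them.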

Fundamental Subspaces and Rank

The null space of $A$ is $\{v : Av = 0\}$, the set of inputs the transformation kills. The column space is $\{Av : v \in \mathbb{R}^n\}$, the set of all possible outputs. The rank-nullity theorem ties them together:

$$\operatorname{rank}(A) + \operatorname{nullity}(A) = n$$

for an $m \times n$ matrix. This tells you exactly how many “degrees of freedom” a system of equations has.
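
A rank-nullity sanity check: a $3 \times 4$ matrix (made up for illustration) whose third column is the sum of the first two and whose fourth column is zero.

```python
import numpy as np

A = np.array([[1.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 0.0]])
n = A.shape[1]
rank = np.linalg.matrix_rank(A)   # dim of column space: 2

# Two independent directions the transformation kills:
null_basis = np.column_stack([[1.0, 1.0, -1.0, 0.0],
                              [0.0, 0.0, 0.0, 1.0]])
assert np.allclose(A @ null_basis, 0.0)

nullity = null_basis.shape[1]     # dim of null space: 2
assert rank + nullity == n        # 2 + 2 == 4
```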

Other key structural facts:

  • Rank: dimension of the column space; determines solvability of $Ax = b$.
  • Determinant: $\det A \neq 0$ iff $A$ is invertible; geometrically, $|\det A|$ measures volume scaling.
  • Orthogonality: vectors with $u^T v = 0$ are orthogonal. Orthonormal bases simplify nearly every computation, which is why algorithms keep constructing them (Gram-Schmidt, QR factorization).
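
The “algorithms keep constructing them” point in one call: QR factorization builds an orthonormal basis for the column space (NumPy uses Householder reflections under the hood, which are numerically more stable than classical Gram-Schmidt). Example matrix is made up:

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])

Q, R = np.linalg.qr(A)
assert np.allclose(Q.T @ Q, np.eye(2))   # columns of Q are orthonormal
assert np.allclose(Q @ R, A)             # and they reproduce A exactly
```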

Systems and Solving Them

The equation $Ax = b$ asks: which input $x$ maps to output $b$ under transformation $A$? Three cases:

  • Unique solution when $A$ is invertible.
  • Infinitely many when the rank is less than $n$ but $b$ is in the column space.
  • No solution otherwise.

Gaussian elimination solves this in $O(n^3)$ time. In practice we almost never compute $A^{-1}$ directly; we factor or reduce instead.
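
In code that means calling a solver, not inverting. A minimal example with an invented $2 \times 2$ system:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])

# solve() uses an LU factorization internally -- faster and more
# numerically stable than computing inv(A) @ b.
x = np.linalg.solve(A, b)
assert np.allclose(A @ x, b)
```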

Warning

When $Ax = b$ has no exact solution (overdetermined system), the best approximation minimizes $\|Ax - b\|^2$. The least-squares solution $\hat{x} = (A^T A)^{-1} A^T b$ is the foundation of linear regression. If you’ve ever fit a line to data, this is what’s happening underneath.
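
A least-squares line fit in exactly this form. The data points are made up for illustration; note that `lstsq` solves the problem via SVD rather than forming $A^T A$ explicitly, which is better conditioned.

```python
import numpy as np

xs = np.array([0.0, 1.0, 2.0, 3.0])
ys = np.array([0.1, 0.9, 2.1, 2.9])          # roughly y = x, with noise

A = np.column_stack([xs, np.ones_like(xs)])  # columns: slope, intercept
(slope, intercept), *_ = np.linalg.lstsq(A, ys, rcond=None)

residual = np.linalg.norm(A @ np.array([slope, intercept]) - ys)
```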

SVD: The Swiss Army Knife

Singular Value Decomposition factorizes any matrix as $A = U \Sigma V^T$, with $U$ and $V$ orthogonal and $\Sigma$ diagonal with nonnegative entries (the singular values). It’s central to dimensionality reduction (PCA), recommendation systems, and data compression.

Netflix-style recommendations factorize a sparse user-item rating matrix via SVD. The top $k$ singular values and their vectors capture the most significant latent factors (genre preference, production era, etc.), enabling predictions for unrated items.
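
The truncation step, sketched on a toy ratings matrix (numbers invented): keep only the top $k$ singular triplets and rebuild.

```python
import numpy as np

R = np.array([[5.0, 4.0, 1.0],   # toy user x item ratings
              [4.0, 5.0, 1.0],
              [1.0, 1.0, 5.0]])

U, s, Vt = np.linalg.svd(R)
k = 1
R_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # best rank-k approximation

# Eckart-Young: the Frobenius error is exactly the discarded singular values.
err = np.linalg.norm(R - R_k)
assert np.isclose(err, np.sqrt(np.sum(s[k:] ** 2)))
```

Even the rank-1 version already separates the “likes items 1-2” users from the “likes item 3” user, which is the latent-factor idea in miniature.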

Computational Notes

  • Naive matrix multiplication of two $n \times n$ matrices: $O(n^3)$.
  • Strassen’s algorithm: $O(n^{\log_2 7}) \approx O(n^{2.807})$.
  • Current best theoretical bound: approximately $O(n^{2.37})$, though practical implementations rarely beat Strassen for typical sizes.
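
Strassen’s trick, as a sketch: split each matrix into four blocks and spend 7 recursive multiplications instead of 8. This version assumes $n$ is a power of two and falls back to BLAS below a cutoff; real implementations pad odd sizes and tune that cutoff.

```python
import numpy as np

def strassen(A, B, cutoff=64):
    """Multiply square matrices (n a power of two) with 7 recursive products."""
    n = A.shape[0]
    if n <= cutoff:
        return A @ B   # small blocks: plain multiplication is faster
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]

    M1 = strassen(A11 + A22, B11 + B22, cutoff)
    M2 = strassen(A21 + A22, B11, cutoff)
    M3 = strassen(A11, B12 - B22, cutoff)
    M4 = strassen(A22, B21 - B11, cutoff)
    M5 = strassen(A11 + A12, B22, cutoff)
    M6 = strassen(A21 - A11, B11 + B12, cutoff)
    M7 = strassen(A12 - A22, B21 + B22, cutoff)

    C = np.empty_like(A)
    C[:h, :h] = M1 + M4 - M5 + M7
    C[:h, h:] = M3 + M5
    C[h:, :h] = M2 + M4
    C[h:, h:] = M1 - M2 + M3 + M6
    return C
```

The extra additions are why the crossover point is large in practice: the asymptotic win only pays off once $n$ is well past the cutoff.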

Tip

2D rotation is a good sanity check for linear algebra intuition. The matrix $\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$ rotates a vector by $\theta$ counterclockwise. Apply it to $(1, 0)$ and you get $(\cos\theta, \sin\theta)$. If that doesn’t feel obvious yet, work through a few by hand until it does.
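
The same sanity check, executable (angle values chosen arbitrarily):

```python
import numpy as np

def rot(theta):
    """Counterclockwise rotation by theta in 2D."""
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

p = rot(np.pi / 6) @ np.array([1.0, 0.0])   # rotate (1, 0) by 30 degrees
assert np.allclose(p, [np.cos(np.pi / 6), np.sin(np.pi / 6)])

# Composition of rotations is matrix multiplication: two quarter
# turns make a half turn.
assert np.allclose(rot(np.pi / 2) @ rot(np.pi / 2), rot(np.pi))
```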

  • Graph Theory - adjacency matrices are the bridge between graph theory and linear algebra
  • Discrete Probability - Markov chains use stochastic matrices and eigenvector analysis
  • Mathematical Induction - induction on matrix dimension proves many linear algebra theorems
  • Combinatorics - counting arguments for matrix properties like the permanent and determinant