Overview
A matrix is a rectangular array of numbers with m rows and n columns (an m×n matrix). Beyond the algebra, matrices are a data layout with performance implications: indexing conventions, memory layout (row-major vs column-major), and sparsity determine the speed of common operations. This note covers shapes, indexing, core transforms (transpose, identity, diagonal), and outlines of matrix–vector and matrix–matrix multiplication with attention to cache and sparse representations.
Note
Throughout, use 0-based indices: A[i][j] with i ∈ [0..m-1], j ∈ [0..n-1]. Row i has length n; column j has length m.
Motivation
Matrices appear across DS&A and systems:
- Graph algorithms: adjacency matrix vs adjacency list trades time for space.
- Dynamic programming: table-filling is matrix-like; careful iteration order improves locality.
- Geometry/graphics: transforms (rotation/scale) use small fixed-size matrices.
- Linear algebra in ML: heavy reliance on mat–vec and mat–mat kernels; cache-friendly layouts and sparsity are decisive.
Understanding shapes and traversals prevents shape mismatch, cache-thrashing loops, and silent bugs in transposes or in-place updates.
Definition and Formalism
- Shape: A ∈ ℝ^{m×n} has m rows and n columns. Row vector: 1×n. Column vector: m×1.
- Indexing: A[i][j] denotes the element at row i, column j.
- Special matrices (by shape/values):
  - Zero 0_{m×n} — all zeros.
  - Identity I_n — square, ones on the diagonal (i == j), zeros elsewhere.
  - Diagonal — only A[i][i] possibly nonzero.
  - Upper/lower triangular — nonzeros constrained to one side of the diagonal.
  - Symmetric — A = A^T (square).
- Transpose: B = A^T with B[j][i] = A[i][j] → shape n×m.
- Arithmetic (dimension rules):
  - Add/subtract: same shape.
  - Mat–vec: (m×n)·(n×1) → (m×1).
  - Mat–mat: (m×k)·(k×n) → (m×n).
Tip
Think “row by column”: the inner dimensions (k) must match; the outer dimensions become the result shape.
Example or Illustration
Let
A = [[1, 2, 3],
[4, 5, 6]] // A is 2×3
x = [7, 8, 9]^T // x is 3×1
A·x = [1*7+2*8+3*9,
4*7+5*8+6*9]^T
= [50, 122]^T // result 2×1
The transpose A^T is 3×2:
A^T = [[1,4],
[2,5],
[3,6]]
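To make the dimension rules concrete in code, here is a minimal Python sketch (plain lists, no libraries; the helper names mat_vec and transpose are illustrative) that reproduces this example:

def mat_vec(A, x):
    # (m×n)·(n×1) → (m×1): dot each row of A with x
    return [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(A))]

def transpose(A):
    # B[j][i] = A[i][j] → result is n×m
    m, n = len(A), len(A[0])
    return [[A[i][j] for i in range(m)] for j in range(n)]

A = [[1, 2, 3],
     [4, 5, 6]]              # 2×3
x = [7, 8, 9]                # treated as a 3×1 column vector

print(mat_vec(A, x))         # [50, 122]
print(transpose(A))          # [[1, 4], [2, 5], [3, 6]]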
Properties and Relationships
- Identity as neutral element: I_m·A = A (when shapes allow), A·I_n = A.
- Diagonal scaling: multiplying by a diagonal matrix on the left scales rows; on the right, it scales columns (see the sketch after this list).
- Symmetry & transpose: if A is symmetric (A = A^T), matrix–vector operations can reuse either the lower or upper triangle.
- Sparsity pattern: many real matrices are sparse (mostly zeros). Storing only nonzeros reduces memory and speeds up operations that skip zeros.
- Graph tie-in: an adjacency matrix Adj[u][v] indicates edges; iterating over row u is O(n) regardless of degree, in contrast to adjacency lists (see Graph Representations).
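To illustrate the diagonal-scaling property, a minimal Python sketch (plain lists; diag_times and times_diag are hypothetical helpers, not library calls) multiplies a small matrix by a diagonal on each side:

def diag_times(D, A):
    # Left-multiply by diag(D): row i of A is scaled by D[i]
    return [[D[i] * A[i][j] for j in range(len(A[0]))] for i in range(len(A))]

def times_diag(A, D):
    # Right-multiply by diag(D): column j of A is scaled by D[j]
    return [[A[i][j] * D[j] for j in range(len(A[0]))] for i in range(len(A))]

A = [[1, 1, 1],
     [1, 1, 1]]
print(diag_times([2, 3], A))       # rows scaled:    [[2, 2, 2], [3, 3, 3]]
print(times_diag(A, [2, 3, 4]))    # columns scaled: [[2, 3, 4], [2, 3, 4]]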
Implementation or Practical Context
Memory layout and traversal
Two canonical layouts:
- Row-major (C/C++ vectors of vectors, many DS libraries): rows are laid out contiguously; A[i][0..n-1] is one contiguous block.
- Column-major (Fortran, MATLAB, many BLAS implementations): columns are contiguous; A[0..m-1][j] is contiguous.
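The difference between the two layouts comes down to index arithmetic over a flat 1-D buffer; the following small Python sketch (hypothetical helper names, assuming dense storage) shows the mapping each layout implies:

def idx_row_major(i, j, n):
    # Row-major: row i is contiguous; moving down a column jumps by n elements
    return i * n + j

def idx_col_major(i, j, m):
    # Column-major: column j is contiguous; moving along a row jumps by m elements
    return j * m + i

m, n = 2, 3
row_major_buf = [1, 2, 3, 4, 5, 6]   # [[1,2,3],[4,5,6]] stored row by row
col_major_buf = [1, 4, 2, 5, 3, 6]   # the same matrix stored column by column
assert row_major_buf[idx_row_major(1, 2, n)] == 6
assert col_major_buf[idx_col_major(1, 2, m)] == 6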
Rule: Walk memory in the order it’s laid out to exploit caches and prefetchers.
// Row-major: cache-friendly row sweep
for i in 0..m-1:
    sum = 0
    for j in 0..n-1:
        sum += A[i][j] * x[j]
    y[i] = sum

// Column-major alternative: prefer column sweeps
for j in 0..n-1:
    y_add = A[0..m-1][j] * x[j]   // conceptual: scale column j by x[j]
    y += y_add

Warning
Transposed loops: in a row-major layout, making i the inner loop (walking down a column) causes strided access and cache misses. Flip the loops or pre-transpose as needed.
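The two sweep orders can be written as runnable Python over a flat row-major buffer (illustrative names; Python will not show the cache effect itself, but the access pattern each order implies in a compiled language is visible in the index arithmetic):

def gemv_row_sweep(A, m, n, x):
    # Inner loop over j walks row i contiguously: row-major friendly
    y = [0] * m
    for i in range(m):
        s = 0
        for j in range(n):
            s += A[i * n + j] * x[j]
        y[i] = s
    return y

def gemv_col_sweep(A, m, n, x):
    # Inner loop over i jumps by a stride of n elements per step:
    # same arithmetic, but strided (cache-hostile) access in a row-major buffer
    y = [0] * m
    for j in range(n):
        for i in range(m):
            y[i] += A[i * n + j] * x[j]
    return y

A = [1, 2, 3,
     4, 5, 6]                                   # 2×3, row-major
print(gemv_row_sweep(A, 2, 3, [7, 8, 9]))       # [50, 122]
print(gemv_col_sweep(A, 2, 3, [7, 8, 9]))       # [50, 122]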
Core transforms & kernels
Transpose (dense)
Naïve out-of-place:
function TRANSPOSE(B, A):   // B is n×m, A is m×n
    for i in 0..m-1:
        for j in 0..n-1:
            B[j][i] = A[i][j]

Blocked/batched transpose (operating on tiles) improves locality:
function TRANSPOSE_BLOCKED(B, A, BS):   // BS = block size
    for ii in 0..m-1 step BS:
        for jj in 0..n-1 step BS:
            for i in ii..min(ii+BS-1, m-1):
                for j in jj..min(jj+BS-1, n-1):
                    B[j][i] = A[i][j]

Matrix–vector (gemv)
Row-major pattern shown above; complexity Θ(m·n).
Matrix–matrix (gemm)
Classical triple loop (Θ(m·k·n)):
function MATMUL(C, A, B):   // A is m×k, B is k×n, C is m×n (assumed zero-initialized)
    for i in 0..m-1:
        for t in 0..k-1:
            a = A[i][t]
            for j in 0..n-1:
                C[i][j] += a * B[t][j]

- The loop order i–t–j (a.k.a. ikj, with a hoisted out of the inner loop) improves reuse of A[i][t] and gives row-major-friendly access to B[t][j] if B is laid out row-major; otherwise, block both A and B into tiles to keep cache lines hot.
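As a runnable counterpart, here is a minimal Python version of the same i–t–j kernel (matmul_ikj is an illustrative name), including the runtime shape assertion recommended later in this note:

def matmul_ikj(A, B):
    # C = A·B with the i–t–j loop order; A is m×k, B is k×n
    m, k = len(A), len(A[0])
    assert len(B) == k, "inner dimensions must match"
    n = len(B[0])
    C = [[0] * n for _ in range(m)]
    for i in range(m):
        for t in range(k):
            a = A[i][t]                  # hoisted A[i][t]
            for j in range(n):
                C[i][j] += a * B[t][j]   # streams through row t of B
    return C

print(matmul_ikj([[1, 2], [3, 4]], [[5, 6], [7, 8]]))   # [[19, 22], [43, 50]]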
Tip
Blocking (tiling) is essential for performance: operate on BS×BS submatrices so that a working tile fits in L1/L2 cache. This changes constant factors dramatically even though the asymptotics stay Θ(n^3).
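A minimal sketch of the tiled variant in Python (plain lists; the default block size BS is an arbitrary assumption, real kernels choose it from cache sizes):

def matmul_blocked(A, B, BS=32):
    # Tiled C = A·B: visit BS×BS tiles so a tile's worth of data stays cache-resident
    m, k, n = len(A), len(A[0]), len(B[0])
    C = [[0] * n for _ in range(m)]
    for ii in range(0, m, BS):
        for tt in range(0, k, BS):
            for jj in range(0, n, BS):
                for i in range(ii, min(ii + BS, m)):
                    for t in range(tt, min(tt + BS, k)):
                        a = A[i][t]
                        for j in range(jj, min(jj + BS, n)):
                            C[i][j] += a * B[t][j]
    return C

# Same result as the untiled kernel; only the visiting order changes.
print(matmul_blocked([[1, 2], [3, 4]], [[5, 6], [7, 8]], BS=1))   # [[19, 22], [43, 50]]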
Identity & diagonal helpers
function MAKE_IDENTITY(I, n):
    fill I with 0
    for i in 0..n-1: I[i][i] = 1

function SCALE_ROWS_BY_DIAG(B, D, A):   // B = D*A, with the diagonal stored as a vector D[i]
    for i in 0..m-1:
        for j in 0..n-1:
            B[i][j] = D[i] * A[i][j]

Sparse representations
Storing only nonzeros:
- COO (Coordinate): a list of (i, j, val) triplets.
- CSR (Compressed Sparse Row): arrays row_ptr[0..m], col_idx[0..nnz-1], val[0..nnz-1]. Row i's nonzeros live in the slice row_ptr[i]..row_ptr[i+1]-1 (a concrete example follows this list).
- CSC (Compressed Sparse Column): the column-major analog.
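As a concrete illustration of the CSR layout (the small matrix below is made up for this example):

# A = [[5, 0, 0],
#      [0, 0, 3],
#      [0, 2, 4]]         keeping only the nonzeros, row by row:
row_ptr = [0, 1, 2, 4]    # row i occupies entries row_ptr[i]..row_ptr[i+1]-1
col_idx = [0, 2, 1, 2]    # column index of each stored nonzero
val     = [5, 3, 2, 4]    # the nonzero values themselves (nnz = 4)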
Sparse mat–vec (CSR):
function SPMV_CSR(y, A_csr, x):   // y = A*x
    for i in 0..m-1:
        sum = 0
        for p in row_ptr[i]..row_ptr[i+1]-1:
            j = col_idx[p]
            sum += val[p] * x[j]
        y[i] = sum

- Complexity: Θ(nnz) operations; memory traffic dominates. Reordering rows/columns can improve the locality of the x[j] accesses.
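A runnable Python counterpart of SPMV_CSR (spmv_csr is an illustrative name), reusing the small CSR example above:

def spmv_csr(m, row_ptr, col_idx, val, x):
    # y = A*x for an m-row CSR matrix; touches only the stored nonzeros
    y = [0] * m
    for i in range(m):
        s = 0
        for p in range(row_ptr[i], row_ptr[i + 1]):
            s += val[p] * x[col_idx[p]]
        y[i] = s
    return y

# CSR form of [[5, 0, 0], [0, 0, 3], [0, 2, 4]]:
row_ptr, col_idx, val = [0, 1, 2, 4], [0, 2, 1, 2], [5, 3, 2, 4]
print(spmv_csr(3, row_ptr, col_idx, val, [1, 1, 1]))   # [5, 3, 6]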
Complexity snapshots
- Transpose: Θ(m·n) operations; strictly memory-bound.
- Mat–vec: Θ(m·n) (dense), Θ(nnz) (sparse).
- Mat–mat: classical Θ(m·k·n); advanced algorithms (Strassen, O(n^ω)) reduce the exponent to ω < 3 but are complex and numerically sensitive; they are not typical for hand-rolled DS&A code.
Common Misunderstandings
Warning
Shape mismatch. Multiplication (m×k)·(k×n) requires the inner dimensions to be equal. Unit tests should assert shapes at runtime.
Warning
Row vs column major confusion. Traversal order must match layout. A nested loop that steps down columns in a row-major array is still Θ(m·n) on paper but spends its time on cache misses.
Warning
In-place transpose of non-square matrices. A true in-place transpose for m ≠ n requires a nontrivial cycle decomposition over a single buffer; prefer an out-of-place transpose or a block-swap into a new buffer.
Warning
Integer overflow. Accumulators in mat–mat can overflow 32-bit quickly. Promote to 64-bit (or wider) when multiplying and summing integers.
Warning
Sparse misuse. Sparse formats win only when nnz ≪ m·n. On moderately dense data, indirection and poor locality can make sparse slower than dense.
Broader Implications
- Algorithm design: many DP and table algorithms are “matrix programs”; understanding strides and blocking yields large constant-factor speedups.
- Graphs: adjacency matrices enable O(1) edge queries but cost Θ(n^2) space; choose the representation based on density and operation profile (see Graph Representations).
- Numerics & stability: accumulation order matters for floating point; pairwise (tree) summation reduces roundoff error versus naive left-to-right summation (see the sketch after this list).
- Parallelism: mat–mat (GEMM) is highly parallelizable via tiling; mat–vec less so, but it benefits from threading over rows/blocks and careful NUMA placement.
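To make the summation-order point concrete, here is a minimal Python sketch (pairwise_sum is an illustrative name) contrasting pairwise (tree) summation with naive left-to-right accumulation:

def pairwise_sum(xs):
    # Sum halves recursively: roundoff grows roughly like O(log n) instead of O(n)
    if len(xs) <= 2:
        return sum(xs)
    mid = len(xs) // 2
    return pairwise_sum(xs[:mid]) + pairwise_sum(xs[mid:])

xs = [0.1] * 1_000_000
naive = 0.0
for v in xs:                      # left-to-right: error accumulates term by term
    naive += v
print(naive, pairwise_sum(xs))    # the pairwise result is typically closer to 100000.0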
Summary
Matrices combine shape rules, indexing, and layout realities. Know the dimension algebra, implement core transforms (transpose, identity, diagonal scaling), and write cache-aware mat–vec/mat–mat kernels. For large, mostly-zero problems, adopt sparse formats like CSR/CSC to achieve Θ(nnz) performance. Validate shapes, respect memory layout, and choose representations that fit the problem’s density and access patterns.