Intuition

Take any population - skewed, bimodal, uniform, it doesn't matter - and repeatedly draw random samples of size $n$. Compute the sample mean each time. As $n$ grows, those sample means form a distribution that looks increasingly normal, regardless of what the original population looked like. This is the Central Limit Theorem (CLT), and it is the reason the normal distribution dominates statistics: even when individual data aren't Gaussian, averages of enough data points are.
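A quick simulation makes this concrete. The sketch below (standard library only, with an assumed exponential population - heavily right-skewed, so a deliberately non-normal starting point) collects many sample means and checks that they concentrate around the population mean with spread $\sigma/\sqrt{n}$:

```python
import random
import statistics

random.seed(42)

n = 50            # size of each sample
trials = 20_000   # number of sample means to collect

# Exponential(rate=1) population: mean 1, variance 1, strongly right-skewed.
means = [statistics.fmean(random.expovariate(1.0) for _ in range(n))
         for _ in range(trials)]

# The means cluster near the population mean...
print(round(statistics.fmean(means), 3))   # ~1.0
# ...with spread close to sigma / sqrt(n) = 1 / sqrt(50) ~ 0.141
print(round(statistics.stdev(means), 3))
```

Plotting a histogram of `means` would show a nearly symmetric bell shape, even though every individual draw comes from a skewed distribution.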

Definition

Let $X_1, X_2, \ldots, X_n$ be independent and identically distributed (i.i.d.) random variables with mean $\mu$ and finite variance $\sigma^2$. The CLT states that as $n \to \infty$:

$$\frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}} \xrightarrow{d} \mathcal{N}(0, 1)$$

where $\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$ is the sample mean and $\xrightarrow{d}$ denotes convergence in distribution.

In practice, the approximation is considered reliable when $n \geq 30$, though the threshold depends on how non-normal the underlying distribution is. Highly skewed populations may need larger $n$.

Key Formulas

Standard error of the mean:

$$\mathrm{SE}(\bar{X}) = \frac{\sigma}{\sqrt{n}}$$

The standard error shrinks as $1/\sqrt{n}$ - quadrupling the sample size halves the standard error.
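A two-line helper (the name `standard_error` is hypothetical, not from the text) makes the scaling concrete:

```python
import math

def standard_error(sigma: float, n: int) -> float:
    """Standard error of the mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

# Quadrupling the sample size halves the standard error.
print(standard_error(10.0, 100))   # 1.0
print(standard_error(10.0, 400))   # 0.5
```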

Standardized test statistic:

$$z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$

When $\sigma$ is unknown and estimated by the sample standard deviation $s$, use the $t$-distribution with $n - 1$ degrees of freedom instead:

$$t = \frac{\bar{X} - \mu}{s / \sqrt{n}}$$

Sum version: The CLT also applies to sums. If $S_n = X_1 + X_2 + \cdots + X_n$, then for large $n$ the sum is approximately $\mathcal{N}(n\mu,\ n\sigma^2)$:

$$\frac{S_n - n\mu}{\sigma\sqrt{n}} \xrightarrow{d} \mathcal{N}(0, 1)$$
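Both test statistics are straightforward to compute directly. A minimal sketch, using assumed sample data and hypothetical helper names:

```python
import math
import statistics

def z_statistic(sample, mu, sigma):
    """(x_bar - mu) / (sigma / sqrt(n)) -- population sigma known."""
    n = len(sample)
    return (statistics.fmean(sample) - mu) / (sigma / math.sqrt(n))

def t_statistic(sample, mu):
    """(x_bar - mu) / (s / sqrt(n)) -- sigma estimated by the sample
    standard deviation s; compare against t with n - 1 degrees of freedom."""
    n = len(sample)
    s = statistics.stdev(sample)   # uses n - 1 in the denominator
    return (statistics.fmean(sample) - mu) / (s / math.sqrt(n))

data = [102.0, 98.0, 101.0, 99.0, 100.0, 103.0, 101.0, 100.0]
print(round(z_statistic(data, mu=100.0, sigma=2.0), 3))   # sigma assumed known
print(round(t_statistic(data, mu=100.0), 3))              # sigma estimated
```

The two statistics differ only in the denominator; with small $n$ the extra uncertainty in $s$ is what the heavier-tailed $t$-distribution accounts for.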

Tip

The CLT explains why many test statistics (z-tests, t-tests) and confidence intervals rely on the normal distribution - even when the raw data are not normal.

Example

Resistor quality control. A factory produces resistors with mean resistance $\mu = 100\ \Omega$ and standard deviation $\sigma = 10\ \Omega$. The individual resistance distribution is right-skewed (not normal). A quality inspector samples $n = 100$ resistors and measures the average.

By the CLT, $\bar{X}$ is approximately normal, with standard error $\sigma/\sqrt{n} = 10/\sqrt{100} = 1\ \Omega$:

$$\bar{X} \approx \mathcal{N}(100,\ 1)$$

What is the probability the sample average exceeds $101.5\ \Omega$?

$$P(\bar{X} > 101.5) = P\left(Z > \frac{101.5 - 100}{1}\right) = P(Z > 1.5) \approx 0.0668$$

About 6.7% - even though individual resistances are skewed, the CLT lets us use normal probability calculations on the sample mean.

Notice that increasing the sample to $n = 400$ would tighten the standard error to $0.5\ \Omega$, making the same deviation more significant ($z = 3$, $p \approx 0.0013$). The CLT quantifies exactly how more data sharpens inference.
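The tail probabilities above can be checked numerically. A sketch using assumed figures ($\mu = 100\ \Omega$, $\sigma = 10\ \Omega$, $n = 100$) and the normal survival function built from `math.erfc`:

```python
import math

def normal_sf(x: float, mu: float, sigma: float) -> float:
    """P(X > x) for X ~ N(mu, sigma^2), via the complementary error function."""
    z = (x - mu) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2))

mu, sigma, n = 100.0, 10.0, 100       # assumed population parameters (ohms)
se = sigma / math.sqrt(n)             # standard error = 1.0 ohm
print(round(normal_sf(101.5, mu, se), 4))    # ~0.0668, about 6.7%

# Quadruple the sample: the same 1.5-ohm deviation is now 3 standard errors out.
se4 = sigma / math.sqrt(4 * n)        # 0.5 ohm
print(round(normal_sf(101.5, mu, se4), 4))   # ~0.0013
```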

Why It Matters in CS

  • Monte Carlo simulation: averaging many random simulation runs yields normally distributed estimates, enabling confidence intervals on the result.
  • Algorithm analysis: when benchmarking runtime over many random inputs, the mean runtime is approximately normal, justifying Gaussian-based statistical tests for performance comparisons.
  • Large-scale data: in big-data pipelines, aggregate statistics (means, counts per partition) behave normally, which simplifies anomaly detection and threshold setting.
  • A/B testing: conversion rate differences across thousands of users are approximately normal, which is why z-tests power most A/B testing frameworks.
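As a concrete illustration of the Monte Carlo bullet (a hypothetical example, not from the text): estimating $\pi$ by random sampling, with a CLT-based 95% confidence interval on the estimate. Each trial is a Bernoulli "dart lands inside the quarter circle" indicator, and the mean of many trials is approximately normal:

```python
import math
import random
import statistics

random.seed(0)
runs = 100_000

# 1.0 if a uniform random point in the unit square lands inside the
# quarter circle x^2 + y^2 <= 1, else 0.0; the hit rate estimates pi/4.
hits = [1.0 if random.random()**2 + random.random()**2 <= 1.0 else 0.0
        for _ in range(runs)]

p_hat = statistics.fmean(hits)                  # estimates pi / 4
se = statistics.stdev(hits) / math.sqrt(runs)   # standard error of the mean
lo, hi = 4 * (p_hat - 1.96 * se), 4 * (p_hat + 1.96 * se)
print(f"pi ~ {4 * p_hat:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

The interval is valid precisely because the CLT makes the averaged indicator approximately normal; quadrupling `runs` would halve its width.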