Intuition
A random variable is just a rule that turns outcomes into numbers. Flip a coin and you get heads or tails - but if you assign heads = 1 and tails = 0, you now have a random variable. Roll two dice and sum the faces: that sum is a random variable. The outcome is still random, but now it lives on a number line, so you can compute averages, measure spread, and apply the full machinery of mathematics.
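Here is a minimal sketch of the coin example, assuming a fair coin simulated with Python's `random` module. The outcome is the string `"H"` or `"T"`, and the illustrative function `indicator_heads` is the random variable that maps it to 1 or 0. Once the outcomes are numbers, their average is just the observed fraction of heads.

```python
import random

def indicator_heads(outcome: str) -> int:
    """The random variable X: maps the outcome 'H' to 1 and 'T' to 0."""
    return 1 if outcome == "H" else 0

def average_of_x(num_flips: int, seed: int = 0) -> float:
    """Flip a fair coin num_flips times and return the average value of X."""
    rng = random.Random(seed)                                 # seeded for reproducible runs
    outcomes = [rng.choice("HT") for _ in range(num_flips)]   # the random experiment
    values = [indicator_heads(o) for o in outcomes]           # X applied to each outcome
    return sum(values) / num_flips

print(average_of_x(10_000))  # close to 0.5, the fraction of heads
```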
Random variables are the entry point to everything else in statistics. Without them, concepts like expected value, variance, and distributions have no object to act on.
Definition
A random variable is a function from a sample space $\Omega$ to the real numbers:
$$X: \Omega \to \mathbb{R}$$
Each outcome $\omega \in \Omega$ maps to a real number $X(\omega)$. The randomness comes from the underlying experiment, not from $X$ itself - $X$ is a deterministic function applied to a random outcome.
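You can check the definition by brute force on the two-dice example. In this sketch the names are illustrative. The sample space is the 36 ordered pairs of faces, and `dice_sum` plays the role of $X$ as an ordinary deterministic function. Pushing the uniform probability on $\Omega$ through it gives the PMF exactly. `fractions.Fraction` avoids rounding.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Sample space Omega: the 36 equally likely ordered outcomes of two fair dice.
OMEGA = list(product(range(1, 7), repeat=2))

def dice_sum(omega: tuple[int, int]) -> int:
    """The random variable X: a deterministic function of a single outcome."""
    first, second = omega
    return first + second

def pmf_of(random_variable, sample_space) -> dict:
    """Push the uniform probability on the sample space through the random variable."""
    p_outcome = Fraction(1, len(sample_space))  # each outcome has equal probability
    pmf = defaultdict(Fraction)                 # Fraction() == 0, so missing keys start at 0
    for omega in sample_space:
        pmf[random_variable(omega)] += p_outcome
    return dict(pmf)

pmf = pmf_of(dice_sum, OMEGA)
print(pmf[7])  # 1/6: six of the 36 outcomes sum to 7
```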
Discrete vs. continuous
| Type | Values | Described by | Example |
|---|---|---|---|
| Discrete | Countable set | Probability mass function | Number of bugs in a release |
| Continuous | Uncountable interval | Probability density function | Response time of an API call |
For a discrete random variable, probabilities are assigned to individual values: $p_X(x) = P(X = x)$. For a continuous random variable, probability is defined over intervals: $P(a \le X \le b) = \int_a^b f_X(x)\,dx$, and $P(X = x) = 0$ for any single point $x$.
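The API response-time row of the table can be made concrete with a small sketch. It assumes response times follow an exponential distribution with a mean of 200 ms. That is an illustrative model, not a claim about real APIs. The probability of landing in an interval is the area under the density. The hypothetical helper `prob_between` approximates that area with a midpoint Riemann sum, and the result is checked against the closed-form CDF. A zero-width interval has zero area, which is why $P(X = x) = 0$.

```python
import math

RATE = 1 / 200  # assumed: exponential response times with a mean of 200 ms

def density(x: float) -> float:
    """PDF of the assumed exponential response time (in ms); zero for negative x."""
    return RATE * math.exp(-RATE * x) if x >= 0 else 0.0

def prob_between(a: float, b: float, steps: int = 10_000) -> float:
    """P(a <= X <= b) as the area under the density, via the midpoint rule."""
    width = (b - a) / steps
    return sum(density(a + (i + 0.5) * width) for i in range(steps)) * width

# Closed form for comparison: F(x) = 1 - exp(-RATE * x) for x >= 0.
exact = math.exp(-RATE * 100) - math.exp(-RATE * 300)
print(prob_between(100, 300), exact)  # both about 0.3834
print(prob_between(150, 150))          # 0.0: a single point has no area
```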
Key Formulas
Probability mass function (discrete):
$$p_X(x) = P(X = x), \qquad p_X(x) \ge 0, \qquad \sum_x p_X(x) = 1$$
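Probability density function (continuous):
$$f_X(x) \ge 0, \qquad \int_{-\infty}^{\infty} f_X(x)\,dx = 1, \qquad P(a \le X \le b) = \int_a^b f_X(x)\,dx$$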
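Cumulative distribution function (both types):
$$F_X(x) = P(X \le x) = \begin{cases} \sum_{t \le x} p_X(t) & \text{(discrete)} \\ \int_{-\infty}^{x} f_X(t)\,dt & \text{(continuous)} \end{cases}$$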
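For a discrete variable, the CDF is a running sum of the PMF, so it is a step function. Here is a small sketch for one fair die, using exact fractions. The helper `cdf` is illustrative.

```python
from fractions import Fraction

# PMF of one fair six-sided die.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}

def cdf(pmf: dict, x: float) -> Fraction:
    """F_X(x) = P(X <= x): sum the PMF over all values not exceeding x."""
    return sum((p for value, p in pmf.items() if value <= x), Fraction(0))

print(cdf(die_pmf, 3))    # 1/2
print(cdf(die_pmf, 3.5))  # 1/2: the CDF is flat between support points
print(cdf(die_pmf, 6))    # 1
```

Because $F_X$ only jumps at values in the support, $P(a < X \le b) = F_X(b) - F_X(a)$ holds for any $a < b$.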
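Expected value:
$$E[X] = \sum_x x\,p_X(x) \quad \text{(discrete)}, \qquad E[X] = \int_{-\infty}^{\infty} x\,f_X(x)\,dx \quad \text{(continuous)}$$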
Variance:
$$\operatorname{Var}(X) = E\big[(X - E[X])^2\big] = E[X^2] - (E[X])^2$$
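For a finite PMF, both formulas translate directly into code. This sketch uses illustrative helper names and computes the variance from the definition $E[(X - E[X])^2]$. For a fair die it gives $E[X] = 7/2$ and $\operatorname{Var}(X) = 35/12$. That matches the shortcut $E[X^2] - (E[X])^2 = 91/6 - 49/4$.

```python
from fractions import Fraction

def expectation(pmf: dict) -> Fraction:
    """E[X] = sum over x of x * P(X = x)."""
    return sum((x * p for x, p in pmf.items()), Fraction(0))

def variance(pmf: dict) -> Fraction:
    """Var(X) = E[(X - E[X])^2], computed directly from the definition."""
    mu = expectation(pmf)
    return sum(((x - mu) ** 2 * p for x, p in pmf.items()), Fraction(0))

die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}
print(expectation(die_pmf))  # 7/2
print(variance(die_pmf))     # 35/12
```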
Example
Modelling packet loss. A network link drops each packet independently with probability $p$. Define $X$ as the number of dropped packets in a batch of $n$ packets.
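Each packet is a Bernoulli trial, so $X \sim \text{Binomial}(n, p)$:
$$P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}, \qquad E[X] = np, \qquad \operatorname{Var}(X) = np(1-p)$$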
The random variable $X$ lets us move from “packets might get dropped” to precise quantitative statements: on average $np = 2$ drops per batch, with standard deviation $\sqrt{np(1-p)} = \sqrt{2(1-p)}$, just under $\sqrt{2} \approx 1.41$ when $p$ is small. This informs retry buffer sizing and SLA calculations.
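Here is a sketch of the packet-loss numbers. It assumes the illustrative values $n = 100$ and $p = 0.02$. The note does not fix them, and any pair with $np = 2$ gives the same mean. The hypothetical helper `binomial_pmf` builds the Binomial PMF from the standard-library `math.comb` (Python 3.8+). The code then recovers $E[X]$ and $\operatorname{Var}(X)$ from the PMF. It also computes a tail probability of the kind you would use when sizing a retry buffer.

```python
import math

# Assumed values for illustration only: the note does not fix n and p,
# but any pair with n * p = 2 matches "2 drops per batch on average".
N_PACKETS = 100
P_DROP = 0.02

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

pmf = [binomial_pmf(k, N_PACKETS, P_DROP) for k in range(N_PACKETS + 1)]
mean = sum(k * pk for k, pk in enumerate(pmf))
var = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))

print(mean, N_PACKETS * P_DROP)  # both about 2.0
print(math.sqrt(var))            # about 1.4, i.e. sqrt(n * p * (1 - p))
print(sum(pmf[6:]))              # P(X >= 6): chance a batch loses 6 or more packets
```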
Why It Matters in CS
- Formalizing randomness. Randomized algorithms and data structures (quicksort with random pivot selection, universal hashing, skip lists) are analyzed by defining random variables over their internal coin flips.
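- Algorithm analysis. The running time of a randomized algorithm is a random variable. Its expected value is the expected running time. This differs from average-case complexity, which averages over random inputs: randomized quicksort takes expected $O(n \log n)$ time on every input. Its variance tells you how tightly actual running times cluster around that expectation.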
- Probabilistic data structures. Bloom filters, count-min sketches, and HyperLogLog all define random variables whose distributions determine error guarantees.
- Machine learning. Features and labels are random variables $X$ and $Y$, and supervised learning is framed in terms of their joint distribution $P(X, Y)$.
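This sketch treats a running time as a random variable. It counts the comparisons made by quicksort with uniformly random pivots, under the standard cost model of one comparison between the pivot and each other element. The helper names are illustrative. The sample mean should sit near the known exact expectation $2(n+1)H_n - 4n$, where $H_n$ is the $n$-th harmonic number. The sample standard deviation estimates the spread that the variance describes.

```python
import math
import random

def quicksort_comparisons(items: list, rng: random.Random) -> int:
    """Count comparisons made by quicksort with a uniformly random pivot.

    Assumes distinct items. Each partition is charged one comparison per
    non-pivot element, as in the standard analysis.
    """
    if len(items) <= 1:
        return 0
    pivot = rng.choice(items)
    less, greater = [], []
    for x in items:
        if x < pivot:
            less.append(x)
        elif x > pivot:
            greater.append(x)
    return (len(items) - 1
            + quicksort_comparisons(less, rng)
            + quicksort_comparisons(greater, rng))

def expected_comparisons(n: int) -> float:
    """Exact expectation 2(n + 1)H_n - 4n for randomized quicksort on n distinct keys."""
    harmonic = sum(1 / k for k in range(1, n + 1))
    return 2 * (n + 1) * harmonic - 4 * n

rng = random.Random(42)
n, trials = 100, 2_000
# Input order does not matter: the random pivot choice supplies the randomness.
samples = [quicksort_comparisons(list(range(n)), rng) for _ in range(trials)]
mean = sum(samples) / trials
std = math.sqrt(sum((c - mean) ** 2 for c in samples) / (trials - 1))
print(mean, expected_comparisons(n))  # sample mean near the exact value (about 648)
print(std)                            # spread of the running time around that mean
```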
Related Notes
- Expected Value - the mean of a random variable
- Variance and Covariance - measures spread and co-movement of random variables
- Probability Distributions - the families that random variables follow
- Binomial Distribution - a discrete random variable counting successes
- Normal Distribution - the most common continuous random variable model
- Poisson Distribution - a discrete random variable for event counts