Intuition
If you could repeat an experiment infinitely many times and average the results, you would get the expected value. It is the long-run average, the balance point of the distribution, the single number that summarizes where outcomes tend to land.
Expected value does not have to be a value the random variable can actually take. A fair die has $E[X] = 3.5$: you will never roll a 3.5, but over thousands of rolls the average converges to it. This is the essence of the law of large numbers.
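The convergence described above can be watched in a short simulation. This is a minimal sketch: the function name `average_of_rolls` and the fixed seed are illustrative choices, and the printed averages depend on that seed.

```python
import random

def average_of_rolls(num_rolls, seed=0):
    """Average of num_rolls simulated fair-die rolls (seeded for repeatability)."""
    rng = random.Random(seed)
    total = sum(rng.randint(1, 6) for _ in range(num_rolls))
    return total / num_rolls

# The average drifts toward 3.5 as the number of rolls grows.
for n in (10, 1_000, 100_000):
    print(n, average_of_rolls(n))
```

With only 10 rolls the average can be far from 3.5; with 100,000 it is typically within a few hundredths.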
The power of expected value lies in its linearity: you can break complex quantities into simpler pieces, compute each expectation separately, and add them up - no independence required.
Definition
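The expected value (or mean) of a random variable $X$ is the probability-weighted sum (or integral) of its possible values:
- Discrete: $X$ takes values $x_1, x_2, \dots$ with probability mass function $p(x_i) = P(X = x_i)$: $E[X] = \sum_i x_i \, p(x_i)$
- Continuous: $X$ has probability density function $f(x)$: $E[X] = \int_{-\infty}^{\infty} x \, f(x) \, dx$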
Note
The expected value exists only if the sum or integral converges absolutely; otherwise it is undefined (e.g., the Cauchy distribution has no mean).
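Both forms of the definition translate almost directly into code. The sketch below is illustrative only: `expected_value_discrete` and `expected_value_continuous` are hypothetical helper names, the continuous case uses a simple midpoint rule rather than a production-quality integrator, and the uniform density on $[0, 2]$ is an assumed example.

```python
from fractions import Fraction

def expected_value_discrete(pmf):
    """pmf maps each value to its probability; returns sum of value * probability."""
    return sum(value * prob for value, prob in pmf.items())

def expected_value_continuous(pdf, lower, upper, steps=100_000):
    """Midpoint-rule approximation of the integral of x * pdf(x) over [lower, upper]."""
    width = (upper - lower) / steps
    total = 0.0
    for i in range(steps):
        x = lower + (i + 0.5) * width
        total += x * pdf(x) * width
    return total

def uniform_pdf(x):
    """Density of a uniform distribution on [0, 2] (assumed example)."""
    return 0.5

die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}
print(expected_value_discrete(die_pmf))                    # 7/2
print(expected_value_continuous(uniform_pdf, 0.0, 2.0))    # approximately 1.0
```

Using `Fraction` keeps the discrete result exact, which makes it easy to compare against hand calculations.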
Key Formulas
Expectation of a function
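For any function $g$:
$$E[g(X)] = \sum_i g(x_i) \, p(x_i) \quad \text{(discrete, PMF } p\text{)}, \qquad E[g(X)] = \int_{-\infty}^{\infty} g(x) \, f(x) \, dx \quad \text{(continuous, PDF } f\text{)}$$
This is used constantly - for instance, setting $g(x) = x^2$ gives $E[X^2]$, which appears in the variance formula.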
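As a small worked case, the second moment of a fair die can be computed by weighting $g(x) = x^2$ with the die's probabilities. This is a sketch with a hypothetical helper `expectation_of_function`; the last line uses the standard identity $\operatorname{Var}(X) = E[X^2] - (E[X])^2$.

```python
from fractions import Fraction

def expectation_of_function(pmf, g):
    """E[g(X)] for a discrete X: sum of g(value) * probability."""
    return sum(g(value) * prob for value, prob in pmf.items())

die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}

second_moment = expectation_of_function(die_pmf, lambda x: x ** 2)
mean = expectation_of_function(die_pmf, lambda x: x)

print(second_moment)              # 91/6
print(second_moment - mean ** 2)  # 35/12, the variance of a fair die
```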
Linearity of expectation
For any random variables $X$ and $Y$ and constants $a, b$:
$$E[aX + bY] = a \, E[X] + b \, E[Y]$$
This holds regardless of whether $X$ and $Y$ are independent. It is arguably the most useful property in all of probability.
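To see that independence really is not needed, the sketch below checks linearity on two variables that are as dependent as possible: a die roll and its square. The constants 2 and 3 and the helper name `expect` are arbitrary choices for illustration.

```python
from fractions import Fraction

outcomes = range(1, 7)
prob = Fraction(1, 6)

def expect(f):
    """Expectation of f(roll) over one fair die roll."""
    return sum(f(roll) * prob for roll in outcomes)

a, b = 2, 3
# First variable is the roll itself; the second is its square, fully determined by the first.
lhs = expect(lambda roll: a * roll + b * roll ** 2)
rhs = a * expect(lambda roll: roll) + b * expect(lambda roll: roll ** 2)

print(lhs, rhs)  # 105/2 105/2
```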
Expectation of a product (independent case)
If $X$ and $Y$ are independent:
$$E[XY] = E[X] \, E[Y]$$
This does not hold in general. When $X$ and $Y$ are dependent, the difference $E[XY] - E[X] \, E[Y]$ is exactly the covariance $\operatorname{Cov}(X, Y)$.
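The contrast between the independent and dependent cases can be checked by exact enumeration. In this sketch the independent pair is two separate fair dice, and the dependent pair reuses the same roll for both variables; both setups are assumed examples.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
mean = Fraction(sum(faces), 6)  # expected value of one die, 7/2

# Independent case: two separate fair dice, 36 equally likely pairs.
e_xy_independent = sum(Fraction(x * y, 36) for x, y in product(faces, repeat=2))

# Dependent case: the second variable is the same roll as the first.
e_xy_dependent = sum(Fraction(x * x, 6) for x in faces)

print(e_xy_independent, mean * mean)   # 49/4 49/4
print(e_xy_dependent - mean * mean)    # 35/12, the covariance of the roll with itself
```

In the dependent case the gap equals the variance of a single die, since the covariance of a variable with itself is its variance.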
Example
Expected sales commission. A salesperson is working three deals with the following payoffs and close probabilities:
| Deal | Commission | P(close) |
|---|---|---|
| A | $5,000 | 0.40 |
| B | $8,000 | 0.25 |
| C | $2,000 | 0.60 |
The expected commission from each deal: $E_A = 5000 \times 0.40 = \$2{,}000$, $E_B = 8000 \times 0.25 = \$2{,}000$, $E_C = 2000 \times 0.60 = \$1{,}200$.
By linearity (even if the deals are correlated):
$$E[\text{total}] = E_A + E_B + E_C = 2000 + 2000 + 1200 = \$5{,}200$$
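The same arithmetic as a short script, using the figures from the table. Probabilities are stored as `Fraction` values to keep the results exact; the variable names are illustrative.

```python
from fractions import Fraction

# (commission in dollars, probability of closing), taken from the table above
deals = {
    "A": (5000, Fraction("0.40")),
    "B": (8000, Fraction("0.25")),
    "C": (2000, Fraction("0.60")),
}

expected_by_deal = {
    name: commission * p_close for name, (commission, p_close) in deals.items()
}
total_expected = sum(expected_by_deal.values())

for name, value in expected_by_deal.items():
    print(name, value)
print("total", total_expected)  # total 5200
```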
Expected component lifespan. A component’s lifetime follows an exponential distribution with rate $\lambda$ failures per hour. Its expected lifespan is:
$$E[X] = \frac{1}{\lambda} \text{ hours}$$
This directly informs maintenance scheduling and spare-parts inventory.
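The mean of an exponential distribution is the reciprocal of its rate, which a quick simulation can confirm. The rate below is a hypothetical value chosen only for illustration and is not taken from the example above; `simulated_mean_lifetime` is likewise an illustrative name.

```python
import random

HYPOTHETICAL_RATE = 0.001  # failures per hour; assumed value for illustration only

def simulated_mean_lifetime(rate, samples=200_000, seed=0):
    """Average of simulated exponential lifetimes, in hours."""
    rng = random.Random(seed)
    return sum(rng.expovariate(rate) for _ in range(samples)) / samples

print("analytic mean:", 1 / HYPOTHETICAL_RATE)
print("simulated mean:", simulated_mean_lifetime(HYPOTHETICAL_RATE))
```

The simulated mean should land close to the analytic value, with the gap shrinking as `samples` grows.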
Why It Matters in CS
- Average-case algorithm analysis. The expected number of comparisons in randomized Quicksort is $O(n \log n)$, derived using linearity of expectation over indicator random variables. See Best, Worst, and Average Cases.
- Performance modeling. Expected response time, expected throughput, and expected queue length (via Little’s Law: $L = \lambda W$) are the bread and butter of systems performance engineering.
- Network analysis. Expected packet delay, expected number of retransmissions, and expected path latency guide protocol design and capacity planning.
- Machine learning. Loss functions are expectations: $L(\theta) = E_{(x, y) \sim \mathcal{D}}[\ell(f_\theta(x), y)]$. Training minimizes the empirical (sample-average) loss; generalization theory bounds the true expected loss.
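The Quicksort item above is a good illustration of linearity in practice. In the standard indicator argument, the elements of rank $i$ and $j$ (with $i < j$) are compared with probability $2 / (j - i + 1)$, so the expected comparison count is the sum of these probabilities over all pairs. The sketch below evaluates that sum exactly and compares it with a simulation; the function names, the list-based (not in-place) quicksort, and the trial count are illustrative choices.

```python
import random
from fractions import Fraction

def expected_comparisons(n):
    """Sum over gaps d = j - i of (number of pairs with that gap) * 2 / (d + 1)."""
    return sum(Fraction(2 * (n - d), d + 1) for d in range(1, n))

def count_comparisons(items, rng):
    """Comparisons charged to one run of randomized quicksort on distinct items."""
    if len(items) <= 1:
        return 0
    pivot = rng.choice(items)
    smaller = [x for x in items if x < pivot]
    larger = [x for x in items if x > pivot]
    # Charge one comparison per non-pivot element, as an in-place partition would make.
    return (len(items) - 1
            + count_comparisons(smaller, rng)
            + count_comparisons(larger, rng))

def average_comparisons(n, trials=2_000, seed=0):
    """Mean comparison count over repeated randomized runs."""
    rng = random.Random(seed)
    items = list(range(n))
    return sum(count_comparisons(items, rng) for _ in range(trials)) / trials

print(expected_comparisons(3))            # 8/3
print(float(expected_comparisons(50)))    # exact expectation for n = 50
print(average_comparisons(50))            # simulated average, should be close
```

For three elements the exact value 8/3 can be verified by hand: a middle pivot costs 2 comparisons (probability 1/3) and an end pivot costs 3 (probability 2/3).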
Related Notes
- Variance and Covariance - measures how far outcomes spread around the expected value
- Probability Distributions - each distribution has characteristic expected values
- Best, Worst, and Average Cases - expected value defines the average case
- Quick Sort - average-case derived via linearity of expectation