Intuition
The expected value tells you where a distribution is centered, but not how spread out it is. Variance fills that gap: it measures the average squared distance from the mean. A low variance means outcomes cluster tightly; a high variance means they are dispersed.
Covariance extends this idea to pairs of variables. It answers: when $X$ is above its mean, does $Y$ tend to be above its mean too (positive covariance), below it (negative), or neither (zero)? Covariance is the raw material for correlation, regression, and dimensionality reduction.
Definition
Variance
The variance of a random variable $X$ with mean $\mu = E[X]$ is:

$$\operatorname{Var}(X) = E\big[(X - \mu)^2\big]$$

Expanding the square gives the computational formula, which is often easier to evaluate:

$$\operatorname{Var}(X) = E[X^2] - (E[X])^2$$
The standard deviation $\sigma = \sqrt{\operatorname{Var}(X)}$ has the same units as $X$ and is more interpretable as a measure of spread.
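As a quick check that the two formulas agree, here is a minimal sketch (assuming a fair six-sided die as the illustrative distribution) that evaluates both exactly with `fractions`:

```python
from fractions import Fraction

# Fair six-sided die: each outcome 1..6 has probability 1/6.
outcomes = range(1, 7)
p = Fraction(1, 6)

mean = sum(p * x for x in outcomes)                   # E[X]
var_def = sum(p * (x - mean) ** 2 for x in outcomes)  # E[(X - mu)^2]
ex2 = sum(p * x * x for x in outcomes)                # E[X^2]
var_comp = ex2 - mean ** 2                            # computational formula

print(mean)      # 7/2
print(var_def)   # 35/12
print(var_comp)  # 35/12 -- both formulas give the same value
```

Exact rational arithmetic makes the agreement between the two formulas visible without floating-point noise.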
Covariance
The covariance of $X$ and $Y$ with means $\mu_X$ and $\mu_Y$:

$$\operatorname{Cov}(X, Y) = E\big[(X - \mu_X)(Y - \mu_Y)\big]$$

The computational form:

$$\operatorname{Cov}(X, Y) = E[XY] - E[X]\,E[Y]$$
Note
$\operatorname{Cov}(X, X) = \operatorname{Var}(X)$. Variance is just the covariance of a variable with itself.
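These definitions translate directly into code; a small self-contained sketch on illustrative sample data (population-style averaging, dividing by $n$):

```python
def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    # Population covariance: average product of deviations from the means.
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def var(xs):
    return cov(xs, xs)  # Cov(X, X) = Var(X)

hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 68, 74]

print(var(hours))         # 2.0
print(cov(hours, score))  # positive: scores rise with hours
```

Defining `var` in terms of `cov` mirrors the identity in the note above.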
Key Formulas
Properties of variance
$$\operatorname{Var}(X + c) = \operatorname{Var}(X), \qquad \operatorname{Var}(aX) = a^2\,\operatorname{Var}(X)$$

Adding a constant $c$ shifts the distribution but does not change spread. Scaling by $a$ scales variance by $a^2$.
Variance of a sum
In general:

$$\operatorname{Var}(X + Y) = \operatorname{Var}(X) + \operatorname{Var}(Y) + 2\operatorname{Cov}(X, Y)$$

If $X$ and $Y$ are independent, then $\operatorname{Cov}(X, Y) = 0$ and:

$$\operatorname{Var}(X + Y) = \operatorname{Var}(X) + \operatorname{Var}(Y)$$
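This can be verified exactly on two independent fair dice (an illustrative choice), computing the expectations over the full joint distribution:

```python
from fractions import Fraction
from itertools import product

# Two independent fair dice; each joint outcome has probability 1/36.
p = Fraction(1, 36)
joint = list(product(range(1, 7), repeat=2))

def E(f):
    """Expectation of f(x, y) under the joint distribution."""
    return sum(p * f(x, y) for x, y in joint)

mean_sum = E(lambda x, y: x + y)                   # E[X + Y] = 7
var_sum = E(lambda x, y: (x + y - mean_sum) ** 2)  # Var(X + Y)

var_single = Fraction(35, 12)  # variance of one fair die
print(var_sum)                    # 35/6
print(var_sum == 2 * var_single)  # True: variances add under independence
```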
Independence and covariance
If $X$ and $Y$ are independent, then $\operatorname{Cov}(X, Y) = 0$.
Warning
The converse is false: zero covariance does not imply independence. Example: let $X$ be uniform on $\{-1, 0, 1\}$ and $Y = X^2$. Then $\operatorname{Cov}(X, Y) = 0$, but $Y$ is entirely determined by $X$.
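The warning's example can be checked directly; a minimal sketch, assuming $X$ uniform on $\{-1, 0, 1\}$ and $Y = X^2$:

```python
from fractions import Fraction

# X uniform on {-1, 0, 1}; Y = X^2 is fully determined by X.
support = [-1, 0, 1]
p = Fraction(1, 3)

ex = sum(p * x for x in support)             # E[X] = 0
ey = sum(p * x * x for x in support)         # E[Y] = E[X^2] = 2/3
exy = sum(p * x * (x * x) for x in support)  # E[XY] = E[X^3] = 0

cov = exy - ex * ey
print(cov)  # 0 -- yet X and Y are clearly dependent
```

The covariance vanishes because the positive and negative products cancel by symmetry, even though knowing $X$ pins down $Y$ exactly.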
Correlation coefficient
The Pearson correlation normalizes covariance to $[-1, 1]$:

$$\rho_{XY} = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \sigma_Y}$$

$\rho_{XY} = \pm 1$ indicates a perfect linear relationship; $\rho_{XY} = 0$ means no linear association.
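A minimal implementation sketch of this normalization (`pearson` is a hypothetical helper, using population-style averaging); perfectly linear inputs should land at the $\pm 1$ extremes:

```python
from math import sqrt

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = sqrt(sum((x - mx) ** 2 for x in xs) / n)  # sigma_X
    sy = sqrt(sum((y - my) ** 2 for y in ys) / n)  # sigma_Y
    return cov / (sx * sy)

xs = [1, 2, 3, 4]
print(pearson(xs, [2 * x + 1 for x in xs]))  # ~1.0: perfect increasing line
print(pearson(xs, [-x for x in xs]))         # ~-1.0: perfect decreasing line
```

Python 3.10+ also ships `statistics.correlation` for the same computation.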
Example
Manufacturing consistency. Two companies produce resistors rated at 100 ohms. Sample measurements:
| Company | Sample mean | Sample variance |
|---|---|---|
| A | 100.2 | 1.4 |
| B | 99.8 | 8.7 |
Both hit the target mean, but Company A’s resistors are far more consistent ($s_A^2 = 1.4$ vs. $s_B^2 = 8.7$). For precision circuits, Company A is the clear choice.
Covariance in practice. Suppose study hours $X$ and exam score $Y$ have $\sigma_X = 2$, $\sigma_Y = 10$, and $\operatorname{Cov}(X, Y) = 12$. The correlation:

$$\rho_{XY} = \frac{12}{2 \cdot 10} = 0.6$$

A moderately strong positive linear association - more study hours correlate with higher scores.
Why It Matters in CS
- PCA and dimensionality reduction. Principal Component Analysis finds directions of maximum variance by computing eigenvectors of the covariance matrix. Features with high covariance are collapsed into single components, reducing dimensionality while preserving information.
- Stability of randomized algorithms. Low variance in a randomized algorithm’s runtime means its performance is predictable. Chebyshev’s inequality bounds tail probabilities using variance: $P(|X - \mu| \geq k\sigma) \leq 1/k^2$.
- Sensor fusion and robotics. Kalman filters propagate covariance matrices to track how uncertainty evolves over time. Sensor measurements with lower variance receive more weight in the fused estimate.
- Portfolio and resource optimization. In distributed systems, covariance between server loads determines whether load-balancing reduces total variance or not - negatively correlated loads are ideal.
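To make the PCA bullet concrete, a minimal 2-D sketch with illustrative data: the first principal component is the top eigenvector of the covariance matrix, which for a symmetric $2 \times 2$ matrix has the closed-form angle $\theta = \tfrac{1}{2}\operatorname{atan2}(2b,\, a - c)$.

```python
from math import atan2, cos, sin

def principal_direction(points):
    """Unit top eigenvector of the 2x2 covariance matrix of (x, y) points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Entries of the covariance matrix [[a, b], [b, c]].
    a = sum((x - mx) ** 2 for x, _ in points) / n
    c = sum((y - my) ** 2 for _, y in points) / n
    b = sum((x - mx) * (y - my) for x, y in points) / n
    # Closed-form principal-axis angle for a symmetric 2x2 matrix.
    theta = 0.5 * atan2(2 * b, a - c)
    return cos(theta), sin(theta)

# Points scattered along the line y = x: the direction of maximum
# variance should be roughly (1, 1) normalized, i.e. (0.707, 0.707).
pts = [(0, 0), (1, 1.1), (2, 1.9), (3, 3.05)]
ux, uy = principal_direction(pts)
print(ux, uy)
```

Real pipelines would use an eigendecomposition (e.g. of the full covariance matrix) for higher dimensions; the 2-D closed form just makes the variance-maximizing direction easy to see.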
Related Notes
- Expected Value - variance measures spread around the expected value
- Probability Distributions - each distribution has characteristic variance formulas
- Regression Fundamentals - regression coefficients are ratios of covariance to variance