Intuition
Imagine you know something has happened - a piece of evidence, an observation, a constraint. Conditional probability answers: how does this new information change the likelihood of other events?
The idea is simple. When you learn that event $B$ occurred, you throw away every outcome where $B$ didn’t happen. Your universe shrinks from the full sample space $\Omega$ down to just $B$. Now you ask: of the outcomes that remain, what fraction also satisfy $A$? That fraction is the conditional probability $P(A \mid B)$.
This “reduced sample space” perspective is what makes conditional probability the gateway to all of Bayesian reasoning - every update, every inference, every learned model starts here.
Definition
The conditional probability of $A$ given $B$ is defined as:

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$

The requirement $P(B) > 0$ is essential - conditioning on an impossible event is undefined.
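The definition can be made concrete by counting outcomes in a finite sample space. A minimal sketch, using a hypothetical two-dice example (the events `first_even` and `sum_is_8` are illustrative, not from the text):

```python
from fractions import Fraction

# Sample space: all ordered rolls of two fair dice (hypothetical example).
omega = [(i, j) for i in range(1, 7) for j in range(1, 7)]

def prob(event):
    """P(event) under the uniform measure on omega."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

def cond_prob(a, b):
    """P(A | B) = P(A ∩ B) / P(B); undefined when P(B) = 0."""
    pb = prob(b)
    if pb == 0:
        raise ValueError("conditioning on an impossible event is undefined")
    return prob(lambda w: a(w) and b(w)) / pb

first_even = lambda w: w[0] % 2 == 0   # B: first die is even
sum_is_8   = lambda w: sum(w) == 8     # A: the dice sum to 8

print(cond_prob(sum_is_8, first_even))  # 1/6
```

Of the 18 outcomes where the first die is even, exactly 3 sum to 8: (2,6), (4,4), (6,2) - hence 3/18 = 1/6.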
Note
$P(\cdot \mid B)$ is itself a valid probability measure. It satisfies all three Kolmogorov axioms when $B$ is treated as the new sample space.
Key Formulas
Multiplicative (product) rule
Rearranging the definition gives the probability of a joint event:

$$P(A \cap B) = P(A \mid B)\,P(B)$$

This extends to chains of events:

$$P(A_1 \cap A_2 \cap \cdots \cap A_n) = P(A_1)\,P(A_2 \mid A_1)\,P(A_3 \mid A_1 \cap A_2) \cdots P(A_n \mid A_1 \cap \cdots \cap A_{n-1})$$
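The chain rule can be sketched numerically. A hypothetical example (not from the text): the probability of drawing three aces in a row from a standard 52-card deck without replacement, where each factor is a conditional probability given the previous draws:

```python
from fractions import Fraction

# Chain rule: P(A1 ∩ A2 ∩ A3) = P(A1) · P(A2 | A1) · P(A3 | A1 ∩ A2)
# for drawing three aces without replacement (hypothetical example).
p = Fraction(1)
aces, cards = 4, 52
for _ in range(3):
    p *= Fraction(aces, cards)  # P(A_k | A_1 ∩ ... ∩ A_{k-1})
    aces -= 1                   # one fewer ace remains
    cards -= 1                  # one fewer card remains

print(p)  # 1/5525
```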
Independence test
Two events $A$ and $B$ are independent if and only if:

$$P(A \cap B) = P(A)\,P(B)$$

Equivalently, $P(A \mid B) = P(A)$. Knowing $B$ occurred tells you nothing new about $A$.
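The product criterion can be checked by exact counting. A small sketch, assuming a hypothetical two-dice example where "first die even" and "sum is even" happen to be independent:

```python
from fractions import Fraction

# All ordered rolls of two fair dice (hypothetical example).
omega = [(i, j) for i in range(1, 7) for j in range(1, 7)]
prob = lambda e: Fraction(sum(1 for w in omega if e(w)), len(omega))

a = lambda w: w[0] % 2 == 0            # A: first die even, P(A) = 1/2
b = lambda w: (w[0] + w[1]) % 2 == 0   # B: sum is even,   P(B) = 1/2

pa, pb = prob(a), prob(b)
pab = prob(lambda w: a(w) and b(w))    # P(A ∩ B) = 9/36 = 1/4

print(pab == pa * pb)  # True: the independence test passes
```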
Law of total probability
For a partition $B_1, B_2, \ldots, B_n$ of the sample space:

$$P(A) = \sum_{i=1}^{n} P(A \mid B_i)\,P(B_i)$$
This bridges conditional and unconditional probabilities and is the denominator in Bayes’ Rule.
Example
Flight punctuality. An airline reports: 85% of flights depart on time and 82% both depart and arrive on time. What is the probability a flight arrives on time, given that it departed on time?
Let $D$ = departs on time, $A$ = arrives on time. Then $P(D) = 0.85$ and $P(A \cap D) = 0.82$, so

$$P(A \mid D) = \frac{P(A \cap D)}{P(D)} = \frac{0.82}{0.85} \approx 0.965$$

So if a flight leaves on schedule, there is a 96.5% chance it also lands on schedule - departure punctuality is a strong signal of arrival punctuality.
Now suppose only 15% of flights depart late, and of those, 30% still arrive on time: $P(D^c) = 0.15$ and $P(A \mid D^c) = 0.30$.

Using total probability: $P(A) = P(A \mid D)\,P(D) + P(A \mid D^c)\,P(D^c) = 0.82 + 0.045 = 0.865$. Late departures sharply reduce the chance of an on-time arrival.
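The arithmetic above can be checked in a few lines (numbers taken from the example in the text):

```python
# Flight punctuality example: conditional probability + total probability.
p_d = 0.85              # P(D): departs on time
p_a_and_d = 0.82        # P(A ∩ D): departs AND arrives on time

p_a_given_d = p_a_and_d / p_d          # P(A | D) = 0.82 / 0.85
print(round(p_a_given_d, 4))           # 0.9647

p_late = 0.15           # P(D^c): departs late
p_a_given_late = 0.30   # P(A | D^c): arrives on time despite late departure

# Law of total probability over the partition {D, D^c}:
p_a = p_a_given_d * p_d + p_a_given_late * p_late
print(round(p_a, 3))                   # 0.865
```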
Why It Matters in CS
- Natural language processing. Language models estimate $P(w_t \mid w_1, \ldots, w_{t-1})$ - the probability of the next word conditioned on all preceding words. Every autocompletion and translation system is built on conditional probability.
- Markov chains. A Markov process defines transition probabilities $P(X_{t+1} \mid X_t)$, assuming conditional independence from earlier states. This powers PageRank, MCMC sampling, and reinforcement learning.
- Bayesian networks. Each node stores a conditional probability table $P(X \mid \mathrm{parents}(X))$. The full joint distribution factors into a product of conditionals, making inference tractable.
- Conditional independence. Two features $X$ and $Y$ may be dependent overall but independent given a third variable $Z$. Recognizing this structure ($P(X, Y \mid Z) = P(X \mid Z)\,P(Y \mid Z)$) reduces model complexity dramatically.
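The last point can be demonstrated with a small constructed joint distribution: a hypothetical setup (not from the text) where $Z$ is a fair coin and, given $Z$, $X$ and $Y$ are i.i.d. coins whose bias depends on $Z$. Then $X$ and $Y$ are independent given $Z$ by construction, yet dependent marginally:

```python
# Hypothetical joint P(X, Y, Z): Z is fair; given Z, X and Y are
# i.i.d. coins with bias 0.9 (Z=0) or 0.1 (Z=1).
joint = {}
for z, pz, bias in [(0, 0.5, 0.9), (1, 0.5, 0.1)]:
    for x in (0, 1):
        for y in (0, 1):
            p_x = bias if x == 1 else 1 - bias
            p_y = bias if y == 1 else 1 - bias
            joint[(x, y, z)] = pz * p_x * p_y

def marg(pred):
    """Sum the joint over all (x, y, z) satisfying pred."""
    return sum(p for k, p in joint.items() if pred(*k))

# Conditional independence given Z = 0:
pz0 = marg(lambda x, y, z: z == 0)
pxy = marg(lambda x, y, z: x == 1 and y == 1 and z == 0) / pz0
px1 = marg(lambda x, y, z: x == 1 and z == 0) / pz0
py1 = marg(lambda x, y, z: y == 1 and z == 0) / pz0
print(abs(pxy - px1 * py1) < 1e-9)   # True: P(X,Y|Z) = P(X|Z) P(Y|Z)

# Marginal dependence: P(X=1, Y=1) = 0.41, but P(X=1) P(Y=1) = 0.25.
pxy_m = marg(lambda x, y, z: x == 1 and y == 1)
px_m = marg(lambda x, y, z: x == 1)
py_m = marg(lambda x, y, z: y == 1)
print(abs(pxy_m - px_m * py_m) > 1e-6)  # True: X and Y are dependent overall
```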
Related Notes
- Bayes’ Rule - reverses the conditioning direction using conditional probability
- Probability Distributions - the distributions that conditional probabilities operate over
- Bayesian Inference - the full framework for updating beliefs with data