Probability Fundamentals
Unit 7: Probability and Uncertainty in AI — Section 7.2
You have an intuitive sense of probability from everyday life: a coin flip is 50/50, there is a 30% chance of rain, a certain student "probably" passed the exam. This section puts that intuition on a firm mathematical foundation. We will introduce the key concepts — sample spaces, events, conditional probability, and independence — that form the vocabulary every AI probability algorithm speaks.
Sample Spaces and Events
Every probability calculation begins with a clear description of the possible outcomes.
- **Sample Space**: The set of all possible outcomes of a random experiment, usually written as Ω (omega). For a coin flip: Ω = {heads, tails}. For a six-sided die: Ω = {1, 2, 3, 4, 5, 6}. For a weather forecast: Ω = {sunny, cloudy, rainy, snowy}.
- **Event**: Any subset of the sample space, that is, a collection of outcomes we are interested in. For a die: the event "even number" = {2, 4, 6}. For a weather forecast: the event "precipitation" = {rainy, snowy}.
The probability of an event A, written P(A), is a number between 0 and 1 that measures how likely A is to occur. Three axioms (the Kolmogorov axioms) define what counts as a valid probability:
1. Non-negativity: P(A) ≥ 0 for every event A.
2. Normalization: P(Ω) = 1 — something must happen.
3. Additivity: If A and B are mutually exclusive (they cannot both occur), then P(A ∪ B) = P(A) + P(B).
Basic Probability (equally likely outcomes)
P(A) = (number of outcomes in A) / (total number of outcomes in Ω)
Example: A bag contains 3 red marbles, 2 blue marbles, and 5 green marbles. P(red) = 3 / (3 + 2 + 5) = 3/10 = 0.30
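The counting formula can be sketched in a few lines of Python, using the marble bag from the example above (the `prob` helper is an illustrative function, not a standard library one):

```python
from fractions import Fraction

# One list element per marble, so every element is an equally likely outcome.
bag = ["red"] * 3 + ["blue"] * 2 + ["green"] * 5

def prob(event, sample_space):
    """P(A) = (number of outcomes in A) / (total outcomes in the sample space)."""
    favorable = sum(1 for outcome in sample_space if outcome in event)
    return Fraction(favorable, len(sample_space))

print(prob({"red"}, bag))           # 3/10
print(prob({"red", "blue"}, bag))   # 1/2
```

Using `Fraction` keeps the answers exact, which matches how the formula is usually worked by hand.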
Random Variables
In AI, we rarely work with raw outcomes like "heads" or "rainy." Instead, we define random variables that give names to the uncertain aspects of the world.
- **Random Variable**: A variable whose value is determined by the outcome of a random process. A discrete random variable takes on a countable number of values (e.g., Disease ∈ {flu, cold, COVID-19}). A continuous random variable can take on any value in a range (e.g., Temperature ∈ [36.0, 42.0]).
A probability distribution over a discrete random variable lists the probability of each possible value. For example:
P(Weather = sunny) = 0.60
P(Weather = cloudy) = 0.30
P(Weather = rainy) = 0.10
Total = 1.00 (a distribution must sum to 1)
When a distribution covers multiple variables simultaneously, listing a probability for every combination of their values, it is called a full joint distribution.
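A discrete distribution is naturally represented as a mapping from values to probabilities. Here is a minimal sketch, assuming a plain dict representation, that stores the Weather distribution above and checks the axioms:

```python
import math

# The Weather distribution from the text, stored as a dict.
weather = {"sunny": 0.60, "cloudy": 0.30, "rainy": 0.10}

# Kolmogorov axioms in miniature: every probability is non-negative,
# and the probabilities over all values sum to 1.
assert all(p >= 0 for p in weather.values())
assert math.isclose(sum(weather.values()), 1.0)

print(weather["sunny"])  # P(Weather = sunny) -> 0.6
```

The `math.isclose` check avoids spurious failures from floating-point rounding when the probabilities are not exact binary fractions.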
Conditional Probability: Updating Beliefs with Evidence
The most powerful concept in probability for AI applications is conditional probability. When we observe evidence, we update our beliefs. Conditional probability formalizes this update.
Conditional Probability
P(A | B) = P(A ∧ B) / P(B)
Read as: "The probability of A given B." Requires P(B) > 0.
Intuition: knowing B is true restricts our attention to only those outcomes where B occurs. Within that restricted space, we ask what fraction also satisfies A.
Medical Diagnosis: Fever and Flu
Suppose we have data on 100 patients:
|           | Has Flu | No Flu | Total |
|-----------|---------|--------|-------|
| Has Fever | 18      | 12     | 30    |
| No Fever  | 2       | 68     | 70    |
| Total     | 20      | 80     | 100   |
P(Flu | Fever) = ?
Step 1: P(Flu ∧ Fever) = 18/100 = 0.18
Step 2: P(Fever) = 30/100 = 0.30
Step 3: P(Flu | Fever) = 0.18 / 0.30 = 0.60
Interpretation: A patient with fever has a 60% chance of flu — much higher than the baseline flu rate of 20% (20/100 = 0.20). The evidence (fever) updated our belief significantly.
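The three steps above translate directly into code. This sketch recomputes P(Flu | Fever) from the raw counts in the 100-patient table:

```python
# Counts from the 100-patient table above.
counts = {
    ("fever", "flu"): 18,
    ("fever", "no_flu"): 12,
    ("no_fever", "flu"): 2,
    ("no_fever", "no_flu"): 68,
}
total = sum(counts.values())  # 100 patients

# Step 1: joint probability P(Flu ∧ Fever)
p_flu_and_fever = counts[("fever", "flu")] / total

# Step 2: marginal probability P(Fever), summing over both flu values
p_fever = (counts[("fever", "flu")] + counts[("fever", "no_flu")]) / total

# Step 3: conditional probability by definition
p_flu_given_fever = p_flu_and_fever / p_fever
print(round(p_flu_given_fever, 2))  # 0.6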
Independence: When Evidence Doesn’t Change Beliefs
Two events are independent if knowing one tells you nothing about the other.
- **Independence**: Events A and B are independent if and only if P(A | B) = P(A). An equivalent condition: P(A ∧ B) = P(A) × P(B). Knowing B occurred does not change the probability of A.
Independent vs. Dependent Events
Independent:
- Two separate coin flips: P(Heads on flip 2 | Heads on flip 1) = 0.5 = P(Heads)
- Rolling two dice: knowing one die result tells you nothing about the other

Dependent (not independent):
- Fever and flu: P(Flu | Fever) = 0.60 ≠ 0.20 = P(Flu)
- Carrying an umbrella and rain: people carry umbrellas because of rain; these events are highly dependent
Independence is extremely valuable for AI systems because it allows us to simplify computations dramatically. If we know that two variables A and B are independent, we can store P(A) and P(B) separately and compute P(A ∧ B) = P(A) × P(B) without needing to store the full joint distribution.
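The product condition gives a direct numerical test for independence. Applied to the fever/flu numbers from earlier in this section, it confirms the dependence we found via conditional probability:

```python
import math

# Probabilities from the fever/flu table earlier in this section.
p_flu, p_fever, p_flu_and_fever = 0.20, 0.30, 0.18

# Product test for independence: is P(A ∧ B) equal to P(A) × P(B)?
# Here 0.18 != 0.20 * 0.30 = 0.06, so fever and flu are dependent.
print(math.isclose(p_flu_and_fever, p_flu * p_fever))  # False

# Contrast: two fair coin flips pass the test, P(H1 ∧ H2) = 0.5 * 0.5.
print(math.isclose(0.25, 0.5 * 0.5))  # True
```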
Conditional Independence
A subtler and even more useful concept is conditional independence.
- **Conditional Independence**: Variables A and B are conditionally independent given C if P(A | B ∧ C) = P(A | C). Once C is known, learning B gives no additional information about A.

Example: Fever (A) and runny nose (B) are symptoms that are conditionally independent given the disease (C). If you already know a patient has influenza, learning that they also have a runny nose does not change your belief that they have a fever.
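The payoff of conditional independence is factorization: once the disease is known, the joint probability of the symptoms is just the product of their individual conditional probabilities. The sketch below uses illustrative symptom probabilities that are assumed for this example, not taken from the text:

```python
import math

# Illustrative (assumed) probabilities of each symptom, given Disease = flu.
p_fever_given_flu = 0.90
p_runny_given_flu = 0.80

# Conditional independence lets the joint factorize given the disease:
# P(fever ∧ runny | flu) = P(fever | flu) × P(runny | flu)
p_both_given_flu = p_fever_given_flu * p_runny_given_flu

# Consequently, learning "runny nose" does not move P(fever | flu):
# P(fever | runny ∧ flu) = P(fever ∧ runny | flu) / P(runny | flu)
p_fever_given_runny_and_flu = p_both_given_flu / p_runny_given_flu
assert math.isclose(p_fever_given_runny_and_flu, p_fever_given_flu)
print(round(p_both_given_flu, 2))  # 0.72
```

This factorization is exactly what Bayesian networks and naive Bayes classifiers exploit to avoid storing full joint distributions.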
Conditional independence is the key insight behind Bayesian networks and naive Bayes classifiers, which you will study in Sections 7.4 and 7.5.
Marginal Probability and the Law of Total Probability
Sometimes we need to compute the probability of a variable by "summing out" other variables we do not care about.
Law of Total Probability
For a variable B with mutually exclusive values b₁, b₂, …, bₙ:
P(A) = P(A | B=b₁) × P(B=b₁)
+ P(A | B=b₂) × P(B=b₂)
+ ...
+ P(A | B=bₙ) × P(B=bₙ)
This is sometimes called marginalization — computing a marginal probability by summing over all values of another variable.
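Marginalization can be sketched as summing the relevant rows of a joint distribution. Here we recover P(Fever) from the joint probabilities of the 100-patient table (counts divided by 100):

```python
# Joint probabilities P(Fever, Flu) from the 100-patient table above.
joint = {
    ("fever", "flu"): 0.18,
    ("fever", "no_flu"): 0.12,
    ("no_fever", "flu"): 0.02,
    ("no_fever", "no_flu"): 0.68,
}

# Marginalize out Flu: P(Fever = fever) = sum over flu values of P(fever, flu)
p_fever = sum(p for (fever, _), p in joint.items() if fever == "fever")
print(round(p_fever, 2))  # 0.3
```

Summing 0.18 + 0.12 reproduces the marginal P(Fever) = 0.30 used in the conditional probability example.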
Medical Test: Computing P(Positive)
Suppose:
- P(Disease) = 0.01
- P(Positive | Disease) = 0.95
- P(Positive | No Disease) = 0.05
By the law of total probability:
P(Positive) = P(Positive | Disease) × P(Disease)
+ P(Positive | No Disease) × P(No Disease)
= 0.95 × 0.01 + 0.05 × 0.99
= 0.0095 + 0.0495
= 0.059
About 5.9% of people test positive in this population. We will use this result when applying Bayes' theorem in Section 7.4.
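The calculation above is a two-term instance of the law of total probability and takes only a few lines to verify:

```python
# Given quantities from the medical test example above.
p_disease = 0.01
p_pos_given_disease = 0.95
p_pos_given_no_disease = 0.05

# Law of total probability: sum over both values of Disease.
p_positive = (p_pos_given_disease * p_disease
              + p_pos_given_no_disease * (1 - p_disease))
print(round(p_positive, 4))  # 0.059
```

Note that P(No Disease) is computed as 1 − P(Disease), using the normalization axiom.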
Putting It Together: A Probability Calculation Checklist
How to Work Through a Probability Problem
1. Identify the sample space and define the random variables.
2. Write down all given probabilities (priors, likelihoods, false positive rates).
3. Determine what you need to find: joint, marginal, or conditional?
4. Apply the appropriate formula:
   - Joint: product rule
   - Marginal: law of total probability (sum over other variables)
   - Conditional: definition P(A|B) = P(A ∧ B) / P(B)
5. Check: does your answer make intuitive sense?
Return to the medical test example above. If a patient tests positive, what is P(Disease | Positive)?
You now have P(Positive) = 0.059, P(Positive | Disease) = 0.95, and P(Disease) = 0.01. Can you calculate the answer using the conditional probability definition? (Hint: you will need the product rule first.)
We will work through exactly this calculation — and explain why the answer surprises most people — in Section 7.4 when we introduce Bayes' theorem.
Probability content adapted from OpenStax Introductory Statistics, Chapter 3, licensed under CC BY 4.0.
Based on the UC Berkeley CS 188 Online Textbook by Nikhil Sharma, Josh Hug, Jacky Liang, and Henry Zhu, licensed under CC BY-SA 4.0.
This work is licensed under CC BY-SA 4.0.