Unit 7: Probability and Uncertainty in AI — Wrap-Up and Self-Assessment
You have completed one of the most conceptually important units in this course. The shift from deterministic reasoning to probabilistic reasoning is the shift that makes AI applicable to the real world. Let’s consolidate what you have learned.
Real-world AI operates in environments that are partially observable, stochastic, and noisy. Probability provides a principled, consistent mathematical framework for representing uncertainty and making rational decisions despite it. Bayes' theorem is the engine that updates beliefs when evidence arrives. Bayesian networks compactly represent probabilistic relationships among many variables. Naive Bayes applies these ideas to practical classification at scale.
Key Takeaways
Why Uncertainty Matters (Section 7.1)
- Logic works for closed, fully observable worlds; probability works for the open, uncertain real world.
- Three sources of uncertainty: laziness (too many rules to list), theoretical ignorance (science doesn’t fully know), and practical ignorance (the agent lacks access to all the data).
- Noisy sensors and stochastic actions require agents to maintain probability distributions over states rather than asserting definite facts.
Probability Fundamentals (Section 7.2)
- A sample space lists all possible outcomes; an event is a subset.
- The Kolmogorov axioms ensure probabilities are internally consistent.
- Conditional probability P(A | B) = P(A ∧ B) / P(B) is the foundation of probabilistic inference.
- Independence (P(A | B) = P(A)) and conditional independence are the key structural properties that make large models tractable.
- The law of total probability allows marginalization — computing a variable’s probability by summing out other variables.
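The two operations above can be sketched in a few lines. The joint distribution here is made up for illustration (the variable names Rain and Sprinkler are assumptions, not an example from the unit):

```python
# Hypothetical joint distribution over two binary variables.
# Keys are (rain, sprinkler) outcomes; values are probabilities summing to 1.
joint = {
    (True, True): 0.05,
    (True, False): 0.25,
    (False, True): 0.20,
    (False, False): 0.50,
}

# Marginalization (law of total probability): sum out Sprinkler.
p_rain = sum(p for (rain, _), p in joint.items() if rain)

# Conditional probability: P(Rain | Sprinkler) = P(Rain ∧ Sprinkler) / P(Sprinkler).
p_sprinkler = sum(p for (_, spr), p in joint.items() if spr)
p_rain_given_sprinkler = joint[(True, True)] / p_sprinkler

print(p_rain)                  # 0.05 + 0.25 = 0.30
print(p_rain_given_sprinkler)  # 0.05 / 0.25 = 0.2
```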
From Logic to Probability (Section 7.3)
- Logic is the special case of probability in which all degrees of belief are 0 or 1.
- Probability adds: prior knowledge, incremental belief updates, hypothesis ranking, and graceful handling of contradictory evidence.
- The same scenario analyzed with logic produces brittle, all-or-nothing conclusions; analyzed with probability, it produces calibrated, actionable estimates.
Bayesian Reasoning (Section 7.4)
- Bayes' theorem: P(H | E) = P(E | H) × P(H) / P(E).
- The components: the prior (belief before evidence), the likelihood (how well H explains E), the posterior (belief after evidence), and the marginal likelihood (a normalization constant).
- The mammogram paradox shows why low base rates dominate: even an accurate test produces many false positives when the disease is rare.
- Bayesian updating allows sequential reasoning — each new piece of evidence refines the posterior.
Probabilistic Models (Section 7.5)
- A Bayesian network is a DAG whose nodes are random variables and whose edges encode direct causal influence.
- Each node stores a conditional probability table (CPT); the full joint distribution factorizes into a product of CPTs.
- Naive Bayes assumes all features are conditionally independent given the class label.
- Despite this unrealistic assumption, naive Bayes achieves excellent accuracy on text classification because ranking, not exact probability values, is what matters for classification.
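The ideas above can be combined into a tiny naive Bayes text classifier. This is a minimal sketch: the training sentences, labels, and helper names are invented for illustration, and it also previews Laplace smoothing and log probabilities from the summary table below.

```python
import math
from collections import Counter

# Hypothetical labeled training data.
train = [
    ("spam", "win money now"),
    ("spam", "win prize money"),
    ("ham",  "meeting at noon"),
    ("ham",  "lunch meeting today"),
]

# Count class frequencies and per-class word frequencies.
word_counts = {"spam": Counter(), "ham": Counter()}
class_counts = Counter()
for label, text in train:
    class_counts[label] += 1
    word_counts[label].update(text.split())

vocab = {w for counts in word_counts.values() for w in counts}

def log_posterior(label, text):
    # log P(class) + sum of log P(word | class), with add-one (Laplace) smoothing.
    total = sum(word_counts[label].values())
    score = math.log(class_counts[label] / len(train))
    for word in text.split():
        count = word_counts[label][word] + 1  # Laplace smoothing
        score += math.log(count / (total + len(vocab)))
    return score

def classify(text):
    # Ranking is all that matters: pick the class with the highest log posterior.
    return max(word_counts, key=lambda lbl: log_posterior(lbl, text))

print(classify("win money"))      # spam
print(classify("meeting today"))  # ham
```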
Concept Map: Uncertainty → Machine Learning
```
         Uncertainty in the world
                     |
                     v
            Probability theory
                     |
          +----------+----------+
          |                     |
          v                     v
     Conditional          Independence
     probability            structure
          |                     |
          v                     v
   Bayes' theorem       Bayesian networks
          |                     |
          v                     v
   Belief updates     Compact joint models
          |                     |
          +----------+----------+
                     |
                     v
          Naive Bayes classifier
                     |
                     v
         Machine Learning (Unit 8)
```
Summary Table: Core Concepts
| Concept | Definition | Where Used |
|---|---|---|
| Prior P(H) | Probability of the hypothesis before evidence | Medical diagnosis, spam filter |
| Likelihood P(E\|H) | Probability of the evidence if the hypothesis is true | Bayes' theorem computation |
| Posterior P(H\|E) | Updated probability after observing evidence | Classification output |
| Conditional independence | P(A\|B,C) = P(A\|C) — B adds no info once C is known | Bayesian networks, naive Bayes |
| CPT | Conditional probability table stored at each BN node | Bayesian network structure |
| Laplace smoothing | Adding 1 to all word counts to prevent zero probabilities | Naive Bayes training |
| Log probability | log P(x) — prevents numerical underflow for long sequences | Naive Bayes classification |
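The log-probability entry can be demonstrated in a few lines. The per-word probability of 1e-5 and the 100-word message length are arbitrary illustrative values:

```python
import math

# Multiplying many small per-word probabilities underflows to exactly 0.0,
# while summing their logarithms stays in a numerically safe range.
p_word = 1e-5
n_words = 100

product = 1.0
for _ in range(n_words):
    product *= p_word
print(product)  # 0.0 — underflowed below the smallest float

log_sum = sum(math.log(p_word) for _ in range(n_words))
print(log_sum)  # ≈ -1151.3 — still perfectly usable for ranking classes
```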
Complete the final self-assessment covering the Unit 7 concepts before moving on.
Glossary: Unit 7 Key Terms
- **Uncertainty:** The condition in which an agent lacks complete information about the state of the world. Arises from incomplete observation, sensor noise, or stochastic environments.
- **Sample Space (Ω):** The set of all possible outcomes of a random experiment.
- **Event:** Any subset of the sample space; a collection of outcomes we care about.
- **Conditional Probability P(A|B):** The probability of event A given that event B has occurred. Formula: P(A|B) = P(A ∧ B) / P(B).
- **Independence:** Events A and B are independent if P(A|B) = P(A) — knowing B gives no information about A.
- **Conditional Independence:** A is conditionally independent of B given C if P(A|B,C) = P(A|C) — once C is known, B adds nothing.
- **Prior Probability P(H):** Probability of a hypothesis before observing any evidence.
- **Likelihood P(E|H):** Probability of observing evidence E if hypothesis H is true.
- **Posterior Probability P(H|E):** Probability of hypothesis H after observing evidence E. Computed via Bayes' theorem.
- **Bayes' Theorem:** P(H|E) = P(E|H) × P(H) / P(E). The fundamental equation for updating beliefs with evidence.
- **Bayesian Network:** A directed acyclic graph where nodes are random variables and edges encode conditional dependencies; each node has a CPT.
- **Conditional Probability Table (CPT):** A table stored at each node of a Bayesian network giving P(node value | parent values) for all combinations.
- **Naive Bayes Classifier:** A classifier that applies Bayes' theorem with the naive assumption that all features are conditionally independent given the class label.
- **Laplace Smoothing:** Adding a small count (usually 1) to all word counts during naive Bayes training to prevent zero-probability assignments.
- **Degree of Belief:** A numerical value in [0, 1] representing an agent’s confidence that a proposition is true.
Optional Further Reading: Decision Theory and Expected Utility
Probability is not just for classification — it is also the foundation of decision theory. When an agent must choose among actions with uncertain outcomes, it can maximize expected utility: the probability-weighted average payoff.
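As a small sketch of expected-utility maximization, consider a hypothetical umbrella decision; the probability and utility values are invented for illustration:

```python
# Hypothetical decision problem: carry an umbrella or not, given P(rain) = 0.3.
p_rain = 0.3
utility = {
    ("umbrella", "rain"): 80,    ("umbrella", "dry"): 70,
    ("no_umbrella", "rain"): 0,  ("no_umbrella", "dry"): 100,
}

def expected_utility(action):
    # Probability-weighted average payoff over the possible outcomes.
    return p_rain * utility[(action, "rain")] + (1 - p_rain) * utility[(action, "dry")]

best = max(["umbrella", "no_umbrella"], key=expected_utility)
print(expected_utility("umbrella"))     # 0.3*80 + 0.7*70 = 73
print(expected_utility("no_umbrella"))  # 0.3*0 + 0.7*100 = 70
print(best)                             # umbrella
```

Even though staying dry without an umbrella has the highest payoff, the rational agent carries the umbrella because it maximizes the probability-weighted average.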
This topic is explored in the supplementary reading Decision Theory and Expected Utility.
Also recommended: the lecture on expected utility from the video youtube.com/embed/UnX8RPB5vFM (Decision Theory overview).
Preview: Unit 8 — Machine Learning Foundations
You have spent Unit 7 learning how to specify probabilistic models: you built naive Bayes by hand, explicitly computing priors and likelihoods from data.
Unit 8 asks: what if the model is too complex to specify by hand? What if we have thousands of parameters that need to be tuned, and we want the computer to figure them out from data?
That is machine learning. Every machine learning algorithm is, at its core, a system for learning parameters of a probabilistic (or function-fitting) model from data. The probability foundation you built this week is the intellectual foundation for everything in Unit 8.
This work is licensed under CC BY-SA 4.0.