4.5. Sequential Bayesian Updating
Perhaps the most powerful aspect of Bayes’ rule is that it allows us to systematically update our beliefs as new evidence emerges. In this lesson, we’ll explore the mechanics of iterative Bayesian updating by applying it to a real-world problem.
Road Map 🧭
Understand how Bayes’ rule enables sequential probability updates.
Explore the mechanics of iterative Bayesian updating through a coin flip example.
Observe how probabilities converge toward the truth over iterations with sufficient evidence.
4.5.1. Detecting a Biased Coin
Let’s explore a concrete example to illustrate Bayesian updating in action.
Have Your Tools Ready 🔧
This lesson is a focused exploration of concepts from Section 4.4. Review Section 4.4 and keep your notes handy as you read.
The Problem
Suppose you have a bag containing 10 coins. One of these coins is biased, with a probability of heads equal to 0.8. The other 9 coins are fair, with a probability of heads equal to 0.5. You reach into the bag, select a coin at random, and flip it 10 times. Each flip results in heads. What is the probability that you selected the biased coin?
The Setup
To solve this problem, we need to define our events:
Let \(H_i\) denote the event that the coin showed heads on the \(i\)-th flip.
Let \(B\) denote the event that the biased coin was selected.
We know the following initial probabilities:
\(P(B) = 1/10\) (probability of selecting the biased coin)
\(P(B') = 9/10\) (probability of selecting a fair coin)
\(P(H_i|B) = 0.8\) (probability of heads on any flip, given the biased coin)
\(P(H_i|B') = 0.5\) (probability of heads on any flip, given a fair coin)
We want to find \(P(B|H_1,H_2,...,H_{10})\), the probability that the selected coin is biased given that all 10 flips resulted in heads. Rather than tackling this all at once, let us update our probability assessment after each flip.
Flip 1 🪙
After the first flip results in heads, we apply Bayes’ rule:

\[P(B|H_1) = \frac{P(H_1|B)\,P(B)}{P(H_1|B)\,P(B) + P(H_1|B')\,P(B')}\]

Substituting the values,

\[P(B|H_1) = \frac{(0.8)(0.1)}{(0.8)(0.1) + (0.5)(0.9)} = \frac{0.08}{0.53} \approx 0.151\]

After seeing heads once, the probability that we have the biased coin has increased from 0.1 to about 0.151. This makes intuitive sense—getting heads is more likely with the biased coin, so observing heads should increase our confidence that we have the biased coin. However, the change is modest because heads is a common outcome even with a fair coin.
Flip 2 🪙🪙
For the second flip, we treat the updated probability after the first flip, \(P(B|H_1)\), as the new prior:

\[P(B|H_1,H_2) = \frac{P(H_2|B,H_1)\,P(B|H_1)}{P(H_2|B,H_1)\,P(B|H_1) + P(H_2|B',H_1)\,P(B'|H_1)}\]

A key insight here is that individual flips of the same coin are conditionally independent given which coin was selected: once we fix the coin, the outcome of the first flip doesn’t affect the probability of heads on the second flip. This means:

\[P(H_2|B,H_1) = P(H_2|B) = 0.8 \qquad \text{and} \qquad P(H_2|B',H_1) = P(H_2|B') = 0.5\]

Using these simplifications,

\[P(B|H_1,H_2) = \frac{(0.8)(0.1509)}{(0.8)(0.1509) + (0.5)(0.8491)} = \frac{0.1207}{0.5453} \approx 0.221\]

After the second heads, our confidence that we have the biased coin has increased further to about 0.221.
Flip 3 🪙🪙🪙
For the third flip,

\[P(B|H_1,H_2,H_3) = \frac{P(H_3|B)\,P(B|H_1,H_2)}{P(H_3|B)\,P(B|H_1,H_2) + P(H_3|B')\,P(B'|H_1,H_2)}\]

Substituting the values (carrying the unrounded posterior \(P(B|H_1,H_2) \approx 0.2215\) forward),

\[P(B|H_1,H_2,H_3) = \frac{(0.8)(0.2215)}{(0.8)(0.2215) + (0.5)(0.7785)} = \frac{0.1772}{0.5665} \approx 0.313\]

The pattern continues for subsequent flips. With each additional heads, we become more confident that we have the biased coin.
Summary of All 10 Updates
Below is a complete table of the posterior probability \(p_k=P(B\mid H_1,\dots,H_k)\) after each successive flip that comes up heads. Values are rounded to three decimals.
| Flip \(k\) | Posterior \(p_k\) |
|---|---|
| 0 (prior) | 0.100 |
| 1 | 0.151 |
| 2 | 0.221 |
| 3 | 0.313 |
| 4 | 0.421 |
| 5 | 0.538 |
| 6 | 0.651 |
| 7 | 0.749 |
| 8 | 0.827 |
| 9 | 0.884 |
| 10 | 0.924 |
Notice how evidence accumulates rapidly: after just five heads, the biased coin is already more likely than not, and by the tenth head the posterior has climbed to about 0.924. If we had observed tails at any point, our confidence would have decreased instead, since the biased coin is less likely to show tails than a fair coin.
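This whole table can be reproduced with a short loop. Here is a minimal Python sketch (not part of the original lesson; the variable names are ours) that applies Bayes’ rule once per flip, feeding each posterior back in as the next prior.

```python
# Sequential Bayesian updating for the biased-coin example.
# The posterior after each heads becomes the prior for the next flip.

p_heads_biased = 0.8   # P(H | biased coin)
p_heads_fair = 0.5     # P(H | fair coin)

posterior = 0.1        # prior P(B): 1 biased coin out of 10
print(f"flip  0: {posterior:.3f}")

for k in range(1, 11):
    numerator = p_heads_biased * posterior
    evidence = numerator + p_heads_fair * (1 - posterior)  # P(heads on flip k | previous heads)
    posterior = numerator / evidence                        # Bayes' rule
    print(f"flip {k:2d}: {posterior:.3f}")
```

Running it prints the same sequence of posteriors as the table, from 0.100 up to about 0.924.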
The Convergence of Beliefs
An important property of Bayesian updating is that with sufficient evidence, the posterior probabilities tend to converge toward the truth, regardless of the initial prior (as long as the prior isn’t exactly 0 or 1, which would represent absolute certainty).
This is why Bayesian methods are widely used in scientific and statistical reasoning—even if researchers begin with different prior beliefs, accumulating evidence will eventually lead them toward similar conclusions.
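To make this concrete, the sketch below (illustrative only; the starting priors 0.01, 0.10, and 0.50 are arbitrary choices, not from the lesson) runs the same ten-heads update for three observers who begin with very different priors.

```python
# Three observers with different priors on "the coin is biased"
# all watch the same run of 10 heads.

def update(prior, p_heads_biased=0.8, p_heads_fair=0.5):
    """One Bayesian update after observing heads: returns P(biased | heads)."""
    numerator = p_heads_biased * prior
    return numerator / (numerator + p_heads_fair * (1 - prior))

for prior in (0.01, 0.10, 0.50):
    posterior = prior
    for _ in range(10):
        posterior = update(posterior)
    print(f"prior {prior:.2f} -> posterior after 10 heads: {posterior:.3f}")
```

All three posteriors move sharply toward the biased-coin hypothesis (to roughly 0.53, 0.92, and 0.99 here), and further heads would push each of them closer to 1.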
4.5.2. Comparison with One-Step Computation
We could have directly calculated the probability \(P(B|H_1,H_2,...,H_{10})\) by taking the following steps:
Step 1: Apply Bayes’ rule by treating the sequence of ten heads as a single event:

\[P(B|H_1,\dots,H_{10}) = \frac{P(H_1,\dots,H_{10}|B)\,P(B)}{P(H_1,\dots,H_{10}|B)\,P(B) + P(H_1,\dots,H_{10}|B')\,P(B')}\]

Step 2: Compute the components. With conditionally independent flips,

\[P(H_1,\dots,H_{10}|B) = (0.8)^{10} \approx 0.1074\]

And similarly,

\[P(H_1,\dots,H_{10}|B') = (0.5)^{10} \approx 0.00098\]

Step 3: Substitute:

\[P(B|H_1,\dots,H_{10}) = \frac{(0.8)^{10}(0.1)}{(0.8)^{10}(0.1) + (0.5)^{10}(0.9)} \approx \frac{0.01074}{0.01162} \approx 0.924\]
This direct approach yields the same result as sequential updating. However, the sequential approach has a few advantages:
It shows how our beliefs evolve with each new piece of evidence.
It allows us to stop and make decisions at any point in the sequence.
It more naturally accommodates scenarios where evidence arrives over time.
It often involves simpler calculations at each step.
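As a quick numerical check that the two approaches agree, here is a short illustrative sketch (not part of the lesson; variable names are ours) computing the posterior both ways.

```python
# Cross-check: sequential updating and the one-step computation
# give the same posterior after ten heads in a row.

p_h_biased, p_h_fair, prior = 0.8, 0.5, 0.1

# Sequential: reuse each posterior as the next prior.
posterior = prior
for _ in range(10):
    posterior = (p_h_biased * posterior) / (
        p_h_biased * posterior + p_h_fair * (1 - posterior)
    )

# One step: treat the ten heads as a single event (conditionally independent flips).
numerator = (p_h_biased ** 10) * prior
one_step = numerator / (numerator + (p_h_fair ** 10) * (1 - prior))

print(f"sequential: {posterior:.4f}")  # ≈ 0.9243
print(f"one-step:   {one_step:.4f}")   # ≈ 0.9243
```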
4.5.3. Bringing It All Together
Key Takeaways 📝
Bayesian updating allows us to revise probability assessments sequentially as new evidence emerges.
The posterior probability from one calculation becomes the prior probability for the next, creating a chain of updates.
With sufficient evidence, probabilities tend to converge toward the truth regardless of initial priors (unless they’re 0 or 1).
Sequential updating reflects how we naturally learn and revise our beliefs based on accumulated experience.
4.5.4. Exercises
These exercises develop your skills in applying Bayes’ Rule sequentially as new evidence emerges.
Exercise 1: Sequential Updating Basics
A machine learning system is trying to determine whether a user is a human or a bot based on their behavior. Initially, the system assumes:
\(P(\text{Bot}) = 0.20\) (20% of traffic is from bots)
\(P(\text{Human}) = 0.80\)
The system observes behaviors and updates its belief. For the first behavior observed:
\(P(\text{Fast response} | \text{Bot}) = 0.70\)
\(P(\text{Fast response} | \text{Human}) = 0.30\)
The system observes a fast response. Calculate \(P(\text{Bot} | \text{Fast response})\).
Using the result from part (a) as the new prior, suppose the system observes a second behavior — clicking in a perfectly regular pattern:
\(P(\text{Regular click} | \text{Bot}) = 0.80\)
\(P(\text{Regular click} | \text{Human}) = 0.10\)
Calculate the updated probability that the user is a bot.
How much has the probability of “Bot” changed from the initial prior (0.20) to after observing both behaviors?
Solution
Part (a): First Update — Fast Response
Let \(B\) = Bot, \(H\) = Human, \(F\) = Fast response
Apply Bayes’ Rule:

\[P(B|F) = \frac{P(F|B)\,P(B)}{P(F|B)\,P(B) + P(F|H)\,P(H)} = \frac{(0.70)(0.20)}{(0.70)(0.20) + (0.30)(0.80)} = \frac{0.14}{0.38} \approx 0.3684\]

After observing a fast response, \(P(B|F) \approx 0.3684\), or about 36.84%.
Part (b): Second Update — Regular Click Pattern
Now use \(P(B|F) = 0.3684\) as the new prior, with \(P(H|F) = 1 - P(B|F) = 1 - 0.3684 = 0.6316\).

Let \(R\) = Regular click pattern. Then

\[P(B|F,R) = \frac{P(R|B)\,P(B|F)}{P(R|B)\,P(B|F) + P(R|H)\,P(H|F)} = \frac{(0.80)(0.3684)}{(0.80)(0.3684) + (0.10)(0.6316)} \approx 0.8235\]

After both observations, \(P(B|F,R) \approx 0.8235\), or about 82.35%.
Part (c): Total Change
Initial prior: \(P(\text{Bot}) = 0.20\)
After first evidence: \(P(\text{Bot}|F) \approx 0.37\)
After second evidence: \(P(\text{Bot}|F,R) \approx 0.82\)
The probability increased from 20% to 82%, a change of 62 percentage points. The system now strongly believes this is a bot.
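A short script can verify both updates. This is an illustrative check only; the helper name bayes_update is ours, not part of the exercise.

```python
# Exercise 1 check: bot-vs-human belief after two observed behaviors.

def bayes_update(prior, p_given_bot, p_given_human):
    """Return P(bot | new evidence), given P(bot) and the two likelihoods."""
    numerator = p_given_bot * prior
    return numerator / (numerator + p_given_human * (1 - prior))

p_bot = 0.20
p_bot = bayes_update(p_bot, 0.70, 0.30)       # fast response
print(f"after fast response:  {p_bot:.4f}")   # ≈ 0.3684
p_bot = bayes_update(p_bot, 0.80, 0.10)       # perfectly regular click pattern
print(f"after regular clicks: {p_bot:.4f}")   # ≈ 0.8235
```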
Exercise 2: Biased Coin Detection
You have a bag containing 5 coins: 1 biased coin with \(P(\text{Heads}) = 0.9\) and 4 fair coins with \(P(\text{Heads}) = 0.5\). You randomly select one coin and flip it repeatedly.
Before any flips, what is \(P(\text{Biased})\)?
The first flip is Heads. Calculate \(P(\text{Biased} | H_1)\).
The second flip is also Heads. Calculate \(P(\text{Biased} | H_1, H_2)\).
The third flip is Tails. Calculate \(P(\text{Biased} | H_1, H_2, T_3)\).
After observing H, H, T, has your belief about the coin being biased increased or decreased compared to the prior? Explain intuitively why.
Solution
Let \(B\) = Biased coin, \(F\) = Fair coin
Part (a): Prior

\[P(B) = \frac{1}{5} = 0.20, \qquad P(F) = \frac{4}{5} = 0.80\]

Part (b): After First Heads

\[P(B|H_1) = \frac{(0.9)(0.20)}{(0.9)(0.20) + (0.5)(0.80)} = \frac{0.18}{0.58} \approx 0.3103\]
Part (c): After Second Heads
New prior: \(P(B) = 0.3103\), \(P(F) = 0.6897\)

\[P(B|H_1,H_2) = \frac{(0.9)(0.3103)}{(0.9)(0.3103) + (0.5)(0.6897)} = \frac{0.2793}{0.6241} \approx 0.4475\]
Part (d): After Third Flip is Tails
New prior: \(P(B) = 0.4475\), \(P(F) = 0.5525\)

For tails: \(P(T|B) = 0.1\), \(P(T|F) = 0.5\)

\[P(B|H_1,H_2,T_3) = \frac{(0.1)(0.4475)}{(0.1)(0.4475) + (0.5)(0.5525)} = \frac{0.0448}{0.3210} \approx 0.139\]
Part (e): Interpretation
Prior: \(P(B) = 0.20\)
After H, H, T: \(P(B) \approx 0.14\)
The belief has decreased from 20% to about 14%.
Intuitive explanation: While two heads initially increased our belief (the biased coin is more likely to show heads), the single tail dramatically reduced it. The biased coin has only a 10% chance of tails, while a fair coin has 50%. Seeing tails is strong evidence against the biased coin. The tails observation outweighed the two heads because \(P(T|B)/P(T|F) = 0.1/0.5 = 0.2\), making tails 5 times more likely under the fair coin hypothesis.
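The same kind of check works here; the sketch below (illustrative, with our own helper naming) replays the H, H, T sequence.

```python
# Exercise 2 check: posterior on "biased coin" after observing H, H, T.

def bayes_update(prior, p_given_biased, p_given_fair):
    numerator = p_given_biased * prior
    return numerator / (numerator + p_given_fair * (1 - prior))

p_biased = 0.20                 # 1 biased coin out of 5
for flip in ["H", "H", "T"]:
    if flip == "H":
        p_biased = bayes_update(p_biased, 0.9, 0.5)   # heads likelihoods
    else:
        p_biased = bayes_update(p_biased, 0.1, 0.5)   # tails likelihoods
    print(f"after {flip}: {p_biased:.4f}")
# Prints ≈ 0.3103, 0.4475, 0.1394: below the 0.20 prior after the tails.
```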
Exercise 3: Server Fault Diagnosis
A data center engineer is diagnosing why a server is running slowly. Based on experience, the three most common causes and their prior probabilities are:
\(P(\text{Memory leak}) = 0.50\)
\(P(\text{CPU overload}) = 0.30\)
\(P(\text{Network issue}) = 0.20\)
The engineer runs a diagnostic test that checks CPU usage. A high CPU reading has the following probabilities:
\(P(\text{High CPU} | \text{Memory leak}) = 0.40\)
\(P(\text{High CPU} | \text{CPU overload}) = 0.90\)
\(P(\text{High CPU} | \text{Network issue}) = 0.20\)
The test shows high CPU usage. Calculate the posterior probability for each cause.
Verify that your three posterior probabilities sum to 1.
The engineer now runs a memory diagnostic. The probability of detecting elevated memory usage is:
\(P(\text{High memory} | \text{Memory leak}) = 0.85\)
\(P(\text{High memory} | \text{CPU overload}) = 0.30\)
\(P(\text{High memory} | \text{Network issue}) = 0.15\)
The test shows high memory usage. Using the posteriors from part (a) as new priors, calculate the updated probabilities.
Based on all evidence, which cause is most likely? How confident is the engineer?
Solution
Let M = Memory leak, C = CPU overload, N = Network issue, H = High CPU
Part (a): First Update — High CPU
First, calculate \(P(H)\) using the Law of Total Probability:

\[P(H) = (0.40)(0.50) + (0.90)(0.30) + (0.20)(0.20) = 0.20 + 0.27 + 0.04 = 0.51\]

Apply Bayes’ Rule for each cause:

\[P(M|H) = \frac{0.20}{0.51} \approx 0.3922, \qquad P(C|H) = \frac{0.27}{0.51} \approx 0.5294, \qquad P(N|H) = \frac{0.04}{0.51} \approx 0.0784\]
Part (b): Verification
\(0.3922 + 0.5294 + 0.0784 = 1.0000\) ✓
Part (c): Second Update — High Memory
Use posteriors from (a) as new priors. Let \(H_M\) = High memory.
Calculate \(P(H_M \mid \text{first evidence})\):

\[P(H_M|H) = (0.85)(0.3922) + (0.30)(0.5294) + (0.15)(0.0784) \approx 0.3333 + 0.1588 + 0.0118 = 0.5039\]

Apply Bayes’ Rule again:

\[P(M|H,H_M) \approx \frac{0.3333}{0.5039} \approx 0.6615, \qquad P(C|H,H_M) \approx \frac{0.1588}{0.5039} \approx 0.3151, \qquad P(N|H,H_M) \approx \frac{0.0118}{0.5039} \approx 0.0234\]
Part (d): Final Assessment
After both diagnostics:
Memory leak: 66.15% ← Most likely
CPU overload: 31.51%
Network issue: 2.34%
The engineer should be moderately confident that the problem is a memory leak. While it’s the most likely cause at 66%, there’s still about a 32% chance it’s CPU overload. The network issue is very unlikely given the evidence.
Note: The first test pointed toward CPU overload (52.9%), but the high memory reading shifted the probability toward memory leak. This shows how additional evidence can change our conclusions.
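For a multi-hypothesis update like this one, it can help to normalize prior-times-likelihood over all causes in a single step. The sketch below is an illustrative check (the function update and the dictionary keys are our own naming).

```python
# Exercise 3 check: updating three fault hypotheses after two diagnostics.

def update(priors, likelihoods):
    """Multiply prior by likelihood for each cause, then normalize."""
    unnormalized = {cause: priors[cause] * likelihoods[cause] for cause in priors}
    total = sum(unnormalized.values())          # law of total probability
    return {cause: value / total for cause, value in unnormalized.items()}

priors = {"memory leak": 0.50, "cpu overload": 0.30, "network issue": 0.20}

after_cpu = update(priors, {"memory leak": 0.40, "cpu overload": 0.90, "network issue": 0.20})
after_mem = update(after_cpu, {"memory leak": 0.85, "cpu overload": 0.30, "network issue": 0.15})

print({cause: round(p, 3) for cause, p in after_cpu.items()})  # ≈ 0.392, 0.529, 0.078
print({cause: round(p, 3) for cause, p in after_mem.items()})  # ≈ 0.661, 0.315, 0.023
```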
Exercise 4: One-Step vs Sequential Computation
A quality control system uses two sensors to detect defective products.
\(P(\text{Defective}) = 0.05\)
Sensor 1: \(P(\text{Alert}_1 | \text{Defective}) = 0.95\), \(P(\text{Alert}_1 | \text{Good}) = 0.08\)
Sensor 2: \(P(\text{Alert}_2 | \text{Defective}) = 0.90\), \(P(\text{Alert}_2 | \text{Good}) = 0.05\)
Both sensors trigger alerts for a product.
Sequential approach: First update using Sensor 1, then update using Sensor 2. Find \(P(\text{Defective} | \text{Alert}_1, \text{Alert}_2)\).
One-step approach: Compute the probability directly by treating both alerts as a single piece of evidence. Assume sensor readings are conditionally independent given defect status.
Verify that both approaches give the same answer.
What is the advantage of the sequential approach?
Solution
Let D = Defective, G = Good, \(A_1\) = Alert from Sensor 1, \(A_2\) = Alert from Sensor 2
Part (a): Sequential Approach
Step 1: Update with Sensor 1

\[P(D|A_1) = \frac{(0.95)(0.05)}{(0.95)(0.05) + (0.08)(0.95)} = \frac{0.0475}{0.1235} \approx 0.3846\]

Step 2: Update with Sensor 2

New prior: \(P(D) = 0.3846\), \(P(G) = 0.6154\)

\[P(D|A_1,A_2) = \frac{(0.90)(0.3846)}{(0.90)(0.3846) + (0.05)(0.6154)} = \frac{0.34614}{0.37691} \approx 0.9184\]
Part (b): One-Step Approach
With conditional independence:

\[P(A_1,A_2|D) = (0.95)(0.90) = 0.855, \qquad P(A_1,A_2|G) = (0.08)(0.05) = 0.004\]

Apply Bayes’ Rule:

\[P(D|A_1,A_2) = \frac{(0.855)(0.05)}{(0.855)(0.05) + (0.004)(0.95)} = \frac{0.04275}{0.04655} \approx 0.9184\]
Part (c): Verification
Sequential: ≈ 0.9184. One-step: ≈ 0.9184.

Carried at full precision, both approaches yield exactly the same result; any tiny discrepancy (such as 0.9183 vs. 0.9184) comes only from rounding intermediate values. ✓
Part (d): Advantages of Sequential Approach
Intermediate decisions: Can decide after each sensor whether more testing is needed
Easier calculation: Each step is simpler than one large calculation
Real-time updating: Natural when data arrives over time
Insight into evidence: Shows how each piece of evidence changes our belief
Flexibility: Can easily incorporate different types of evidence
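An illustrative cross-check of parts (a) and (b) follows (variable names are ours, not from the exercise).

```python
# Exercise 4 check: sequential vs. one-step update with two sensor alerts.

p_defect = 0.05
s1_d, s1_g = 0.95, 0.08     # Sensor 1 alert rates given defective / good
s2_d, s2_g = 0.90, 0.05     # Sensor 2 alert rates given defective / good

# Sequential: update with Sensor 1, reuse the posterior as the prior for Sensor 2.
p1 = (s1_d * p_defect) / (s1_d * p_defect + s1_g * (1 - p_defect))
p_seq = (s2_d * p1) / (s2_d * p1 + s2_g * (1 - p1))

# One step: both alerts as a single event, conditionally independent given defect status.
numerator = s1_d * s2_d * p_defect
p_one = numerator / (numerator + s1_g * s2_g * (1 - p_defect))

print(f"sequential: {p_seq:.4f}")   # ≈ 0.9184
print(f"one-step:   {p_one:.4f}")   # ≈ 0.9184
```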
Exercise 5: Convergence of Beliefs
Two engineers, Alice and Bob, are trying to determine if a new manufacturing process produces acceptable parts. They have different prior beliefs:
Alice: \(P(\text{Acceptable}) = 0.30\) (skeptical)
Bob: \(P(\text{Acceptable}) = 0.70\) (optimistic)
They both observe the same sequence of test results. For acceptable processes:
\(P(\text{Part passes} | \text{Acceptable}) = 0.85\)
For unacceptable processes:
\(P(\text{Part passes} | \text{Unacceptable}) = 0.50\)
They test 5 parts, and all 5 pass.
Calculate Alice’s posterior probability after each of the 5 tests.
Calculate Bob’s posterior probability after each of the 5 tests.
How much closer are Alice’s and Bob’s beliefs after 5 tests compared to before any tests?
If the process is truly acceptable, what would you expect to happen to both engineers’ beliefs as they test more and more parts?
Solution
Let A = Acceptable, U = Unacceptable, P = Pass
Part (a): Alice’s Updates (Prior = 0.30)
General update formula, where \(p\) is the current probability that the process is acceptable:

\[P(A \mid \text{pass}) = \frac{(0.85)\,p}{(0.85)\,p + (0.50)(1 - p)}\]
| Test | Prior | Posterior |
|---|---|---|
| 0 | — | 0.300 |
| 1 | 0.300 | 0.421 |
| 2 | 0.421 | 0.553 |
| 3 | 0.553 | 0.678 |
| 4 | 0.678 | 0.782 |
| 5 | 0.782 | 0.859 |
Part (b): Bob’s Updates (Prior = 0.70)
| Test | Prior | Posterior |
|---|---|---|
| 0 | — | 0.700 |
| 1 | 0.700 | 0.799 |
| 2 | 0.799 | 0.871 |
| 3 | 0.871 | 0.920 |
| 4 | 0.920 | 0.951 |
| 5 | 0.951 | 0.970 |
Part (c): Convergence Analysis
Initial difference: \(|0.70 - 0.30| = 0.40\)
Final difference: \(|0.970 - 0.859| = 0.111\)
The beliefs have converged significantly. The gap narrowed from 0.40 to 0.11, a reduction of about 72%.
Part (d): Long-term Convergence
If the process is truly acceptable, we expect:
Both Alice and Bob’s posteriors will converge toward 1.0
Their beliefs will become increasingly similar regardless of starting priors
The rate of convergence depends on how distinguishable acceptable and unacceptable processes are (here, 0.85 vs 0.50)
This is a fundamental property of Bayesian updating: with enough evidence, rational observers will reach similar conclusions regardless of their initial beliefs (as long as neither starts with probability 0 or 1).
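An illustrative sketch (our own helper naming) replays the five passing tests for both engineers and tracks the gap between their beliefs.

```python
# Exercise 5 check: Alice's and Bob's posteriors after five passing parts.

def update(prior, p_pass_acceptable=0.85, p_pass_unacceptable=0.50):
    numerator = p_pass_acceptable * prior
    return numerator / (numerator + p_pass_unacceptable * (1 - prior))

alice, bob = 0.30, 0.70
print(f"test 0: Alice {alice:.3f}, Bob {bob:.3f}, gap {bob - alice:.3f}")
for test in range(1, 6):
    alice, bob = update(alice), update(bob)
    print(f"test {test}: Alice {alice:.3f}, Bob {bob:.3f}, gap {bob - alice:.3f}")
# The gap shrinks from 0.400 to about 0.11 after five passing tests.
```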
Exercise 6: Adaptive Testing
A software testing system uses adaptive testing. After each test, it updates its belief about whether a bug exists and decides whether to continue testing. The system starts with:
\(P(\text{Bug exists}) = 0.50\) (uninformative prior)
Each test can result in “Error detected” or “No error.” The probabilities are:
\(P(\text{Error} | \text{Bug exists}) = 0.70\)
\(P(\text{Error} | \text{No bug}) = 0.10\)
The testing stops when \(P(\text{Bug exists})\) drops below 0.10 OR rises above 0.90.
Consider the sequence: Error, No Error, Error, Error.
Calculate the posterior after each observation.
At what point (if any) does the testing stop?
If the first test had shown “No Error” instead, what would the posterior be? Would testing continue?
Why might using thresholds (0.10 and 0.90) be useful in practice?
Solution
Let B = Bug exists, N = No bug, E = Error detected, \(\bar{E}\) = No error
Part (a): Sequential Updates
Test 1: Error detected

\[P(B|E) = \frac{(0.70)(0.50)}{(0.70)(0.50) + (0.10)(0.50)} = \frac{0.35}{0.40} = 0.875\]

Test 2: No Error (prior = 0.875), using \(P(\bar{E}|B) = 0.30\) and \(P(\bar{E}|N) = 0.90\):

\[P(B|E,\bar{E}) = \frac{(0.30)(0.875)}{(0.30)(0.875) + (0.90)(0.125)} = \frac{0.2625}{0.3750} = 0.700\]

Test 3: Error detected (prior = 0.700)

\[P(B|E,\bar{E},E) = \frac{(0.70)(0.700)}{(0.70)(0.700) + (0.10)(0.300)} = \frac{0.49}{0.52} \approx 0.942\]
Summary table:
| Test | Result | Prior | Posterior | Continue? |
|---|---|---|---|---|
| 1 | Error | 0.500 | 0.875 | Yes |
| 2 | No Error | 0.875 | 0.700 | Yes |
| 3 | Error | 0.700 | 0.942 | STOP |
Part (b): Stopping Point
Testing stops after Test 3. The posterior (0.942) exceeds the upper threshold of 0.90, indicating high confidence that a bug exists.
Part (c): Alternative First Test — No Error
If the first test showed “No Error”:

\[P(B|\bar{E}) = \frac{(0.30)(0.50)}{(0.30)(0.50) + (0.90)(0.50)} = \frac{0.15}{0.60} = 0.25\]
Since 0.25 is between 0.10 and 0.90, testing would continue.
This shows how a single piece of evidence can shift beliefs substantially (from 0.50 to 0.25), but may not be enough to reach a decision threshold.
Part (d): Benefits of Threshold-Based Stopping
Efficiency: Stop testing when confident enough — avoid wasting resources on unnecessary tests
Decision support: Clear criteria for when to act (fix the bug) vs. continue investigating
Risk management: The thresholds encode tolerable error rates; declaring a bug only when the posterior exceeds 0.90 limits false positives, and ruling one out only when it drops below 0.10 limits false negatives
Adaptive: Number of tests depends on evidence — uncertain cases get more tests
Interpretable: Easy to explain decisions to stakeholders (“We’re 94% confident a bug exists”)
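The stopping rule is easy to simulate. The sketch below (illustrative only; names are ours) replays the Error, No Error, Error, Error sequence and stops once a threshold is crossed.

```python
# Exercise 6 check: threshold-based stopping for the sequence Error, No Error, Error, Error.

def update(prior, error_observed):
    p_error_bug, p_error_nobug = 0.70, 0.10
    like_bug = p_error_bug if error_observed else 1 - p_error_bug        # 0.70 or 0.30
    like_nobug = p_error_nobug if error_observed else 1 - p_error_nobug  # 0.10 or 0.90
    numerator = like_bug * prior
    return numerator / (numerator + like_nobug * (1 - prior))

p_bug = 0.50
for i, error_observed in enumerate([True, False, True, True], start=1):
    p_bug = update(p_bug, error_observed)
    print(f"test {i}: P(bug) = {p_bug:.3f}")
    if p_bug < 0.10 or p_bug > 0.90:
        print("stop: threshold reached")
        break
# Prints 0.875, 0.700, 0.942 and stops after the third test; the fourth result is never needed.
```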
4.5.5. Additional Practice Problems
True/False Questions (1 point each)
In Bayesian updating, the posterior probability from one calculation becomes the prior for the next calculation.
Ⓣ or Ⓕ
Sequential Bayesian updating always requires more calculations than the one-step approach.
Ⓣ or Ⓕ
If two observers start with different priors but observe the same evidence, their posteriors will always be identical.
Ⓣ or Ⓕ
With sufficient evidence, Bayesian posteriors tend to converge toward the truth regardless of the initial prior (assuming the prior is not 0 or 1).
Ⓣ or Ⓕ
If a piece of evidence is equally likely under both hypotheses, observing that evidence will not change the posterior probability.
Ⓣ or Ⓕ
The order in which evidence is processed affects the final posterior probability in sequential Bayesian updating.
Ⓣ or Ⓕ
Multiple Choice Questions (2 points each)
A coin is either fair (P(H) = 0.5) or biased (P(H) = 0.8). Initially P(Biased) = 0.25. After observing one heads, what is P(Biased)?
Ⓐ 0.25
Ⓑ 0.35
Ⓒ 0.40
Ⓓ 0.50
Starting from P(Disease) = 0.01, a positive test with sensitivity 0.95 and specificity 0.90 yields a posterior of about 0.088. If we get a second positive test (same characteristics, independent), the posterior will be:
Ⓐ About 0.088 (unchanged)
Ⓑ About 0.18 (roughly doubled)
Ⓒ About 0.48 (much higher)
Ⓓ About 0.95 (near certain)
Two scientists with priors P(H) = 0.2 and P(H) = 0.8 observe evidence E where P(E|H) = 0.9 and P(E|H’) = 0.3. After observing E, which statement is true?
Ⓐ Both posteriors increase
Ⓑ Both posteriors decrease
Ⓒ The gap between their posteriors increases
Ⓓ The gap between their posteriors decreases
In sequential Bayesian updating, what does “conditional independence” of observations mean?
Ⓐ Each observation is independent of all previous observations
Ⓑ Observations are independent given the hypothesis (the true state)
Ⓒ The prior probability doesn’t affect the posterior
Ⓓ The order of observations matters for the final answer
Answers to Practice Problems
True/False Answers:
True — This is the core mechanism of sequential Bayesian updating: posterior → new prior.
False — Sequential updating often involves simpler calculations at each step, and can be more practical when evidence arrives over time. The total computational effort is similar.
False — Different priors lead to different posteriors, even with the same evidence. However, with enough evidence, the posteriors will converge.
True — This is a fundamental property of Bayesian inference called “convergence.” As evidence accumulates, the data overwhelms the prior.
True — If P(E|H) = P(E|H’), then the likelihood ratio is 1, and the posterior equals the prior. The evidence is “uninformative.”
False — With conditionally independent observations, the order doesn’t matter. The final posterior is the same regardless of the sequence (though intermediate posteriors will differ).
Multiple Choice Answers:
Ⓑ — \(P(\text{Biased}|H) = \dfrac{(0.8)(0.25)}{(0.8)(0.25) + (0.5)(0.75)} = \dfrac{0.20}{0.575} \approx 0.348\), so the closest answer is Ⓑ 0.35.
Ⓒ — Using the posterior 0.088 as the new prior: P(+|D) = 0.95, P(+|D’) = 0.10. P(D|++) = (0.95 × 0.088) / (0.95 × 0.088 + 0.10 × 0.912) ≈ 0.0836 / 0.175 ≈ 0.48.
Ⓓ — Evidence that favors H (likelihood ratio > 1) will increase both posteriors, but will also bring them closer together. The posteriors move in the same direction but converge.
Ⓑ — Conditional independence means observations are independent given the true state. For example, coin flips are independent given which coin you have, even though they’re not unconditionally independent.
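Finally, an illustrative script (our own helper naming) that reproduces the arithmetic behind the first two multiple-choice answers:

```python
# Arithmetic checks for multiple-choice questions 1 and 2.

def update(prior, p_evidence_given_h, p_evidence_given_not_h):
    numerator = p_evidence_given_h * prior
    return numerator / (numerator + p_evidence_given_not_h * (1 - prior))

# Q1: P(Biased) = 0.25, one heads observed.
print(round(update(0.25, 0.8, 0.5), 3))           # 0.348, closest option is 0.35

# Q2: two independent positive tests, sensitivity 0.95, specificity 0.90.
after_first = update(0.01, 0.95, 0.10)
print(round(after_first, 3))                      # ≈ 0.088
print(round(update(after_first, 0.95, 0.10), 3))  # ≈ 0.477, i.e., about 0.48
```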