4.5. Sequential Bayesian Updating

Perhaps the most powerful aspect of Bayes’ rule is that it allows us to systematically update our beliefs as new evidence emerges. In this lesson, we’ll explore the mechanics of iterative Bayesian updating by applying it to a real-world problem.

Road Map 🧭

  • Understand how Bayes’ rule enables sequential probability updates.

  • Explore the mechanics of iterative Bayesian updating through a coin flip example.

  • Observe how probabilities converge toward the truth over iterations with sufficient evidence.

4.5.1. Detecting a Biased Coin

Let’s explore a concrete example to illustrate Bayesian updating in action.

Have Your Tools Ready 🔧

This lesson is a focused exploration of concepts from Section 4.4. Review Section 4.4 and keep your notes handy as you read.

The Problem

Suppose you have a bag containing 10 coins. One of these coins is biased, with a probability of heads equal to 0.8. The other 9 coins are fair, with a probability of heads equal to 0.5. You reach into the bag, select a coin at random, and flip it 10 times. Each flip results in heads. What is the probability that you selected the biased coin?

The Setup

To solve this problem, we need to define our events:

  • Let \(H_i\) denote the event that the coin showed heads on the \(i\)-th flip.

  • Let \(B\) denote the event that the biased coin was selected.

We know the following initial probabilities:

  • \(P(B) = 1/10\) (probability of selecting the biased coin)

  • \(P(B') = 9/10\) (probability of selecting a fair coin)

  • \(P(H_i|B) = 0.8\) (probability of heads on any flip, given the biased coin)

  • \(P(H_i|B') = 0.5\) (probability of heads on any flip, given a fair coin)

We want to find \(P(B|H_1,H_2,...,H_{10})\), the probability that the selected coin is biased given that all 10 flips resulted in heads. Rather than tackling this all at once, let us update our probability assessment after each flip.

Flip 1 🪙

After the first flip results in heads, we calculate

\[P(B|H_1) = \frac{P(H_1|B) P(B)}{P(H_1|B) P(B) + P(H_1|B') P(B')}\]

Substituting the values,

\[P(B|H_1) = \frac{0.8 \times \frac{1}{10}}{0.8 \times \frac{1}{10} + 0.5 \times \frac{9}{10}} = \frac{0.08}{0.08 + 0.45} = \frac{0.08}{0.53} = \frac{8}{53} \approx 0.151\]

After seeing one heads, the probability that we have the biased coin has increased from 0.1 to about 0.151. This makes intuitive sense—getting heads is more likely with the biased coin, so observing heads should increase our confidence that we have the biased coin. However, the change is modest because heads is a common outcome even with a fair coin.

Flip 2 🪙🪙

For the second flip, we treat the updated probability after the first flip, \(P(B|H_1)\), as the new prior:

\[P(B|H_1,H_2) = \frac{P(H_2|B,H_1) P(B|H_1)}{P(H_2|B,H_1) P(B|H_1) + P(H_2|B',H_1) P(B'|H_1)}\]

A key insight here is that the flips are conditionally independent given which coin was selected: once we know whether the coin is biased or fair, the outcome of the first flip tells us nothing further about the second. This means:

\[\begin{split}P(H_2|B,H_1) = P(H_2|B) = 0.8 \\ P(H_2|B',H_1) = P(H_2|B') = 0.5\end{split}\]

Using these simplifications,

\[\begin{split}P(B|H_1,H_2) &= \frac{0.8 \times \frac{8}{53}}{0.8 \times \frac{8}{53} + 0.5 \times (1 - \frac{8}{53})} \\ &= \frac{0.8 \times \frac{8}{53}}{0.8 \times \frac{8}{53} + 0.5 \times \frac{45}{53}} \\ &= \frac{64}{289} \approx 0.221\end{split}\]

After the second heads, our confidence that we have the biased coin has increased further to about 0.221.

Flip 3 🪙🪙🪙

For the third flip,

\[P(B|H_1,H_2,H_3) = \frac{P(H_3|B) P(B|H_1,H_2)}{P(H_3|B) P(B|H_1,H_2) + P(H_3|B') P(B'|H_1,H_2)}\]

Substituting the values,

\[\begin{split}P(B|H_1,H_2,H_3) &= \frac{0.8 \times \frac{64}{289}}{0.8 \times \frac{64}{289} + 0.5 \times (1 - \frac{64}{289})} \\ &= \frac{0.8 \times \frac{64}{289}}{0.8 \times \frac{64}{289} + 0.5 \times \frac{225}{289}} \\ &= \frac{512}{1637} \approx 0.313\end{split}\]

The pattern continues for subsequent flips. With each additional heads, we become more confident that we have the biased coin.

Summary of All 10 Updates

Below is a complete table of the posterior probability \(p_k=P(B\mid H_1,\dots,H_k)\) after each successive heads. Values are rounded to three decimals.

| Flip \(k\) | Posterior \(p_k\) |
|------------|-------------------|
| 0 (prior)  | 0.100 |
| 1          | 0.151 |
| 2          | 0.221 |
| 3          | 0.313 |
| 4          | 0.421 |
| 5          | 0.538 |
| 6          | 0.651 |
| 7          | 0.749 |
| 8          | 0.827 |
| 9          | 0.884 |
| 10         | 0.924 |

Notice how the evidence accumulates rapidly: after just five heads, the biased coin is already more likely than not, and after the tenth, the posterior has climbed to about 0.924. If we had observed tails at any point, our confidence would have decreased instead, since the biased coin is less likely to show tails than a fair coin.
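The whole chain of updates is easy to reproduce programmatically. Below is a minimal Python sketch (our own illustration, using the probabilities defined above) that applies the one-flip update repeatedly and prints the values in the table.

```python
# Sequential Bayesian updating for the biased-coin example.
# Reproduces the posterior column of the table above.
p_heads_biased = 0.8   # P(H_i | B)
p_heads_fair = 0.5     # P(H_i | B')

posterior = 0.10       # prior P(B) before any flips
print(f"flip  0: {posterior:.3f}")
for k in range(1, 11):
    numerator = p_heads_biased * posterior
    # Yesterday's posterior becomes today's prior.
    posterior = numerator / (numerator + p_heads_fair * (1 - posterior))
    print(f"flip {k:2d}: {posterior:.3f}")
```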

The Convergence of Beliefs

An important property of Bayesian updating is that with sufficient evidence, the posterior probabilities tend to converge toward the truth, regardless of the initial prior (as long as the prior isn’t exactly 0 or 1, which would represent absolute certainty).

This is why Bayesian methods are widely used in scientific and statistical reasoning: even if researchers begin with different prior beliefs, accumulating evidence will eventually lead them toward similar conclusions.

4.5.2. Comparison with One-Step Computation

We could have directly calculated the probability \(P(B|H_1,H_2,...,H_{10})\) by taking the following steps:

Step 1: Apply Bayes’ rule by treating the sequence of ten heads as a single event

\[P(B|H_1,H_2,...,H_{10}) = \frac{P(H_1,H_2,...,H_{10}|B) P(B)}{P(H_1,H_2,...,H_{10}|B) P(B) + P(H_1,H_2,...,H_{10}|B') P(B')}\]

Step 2: Compute the components

With independent flips,

\[P(H_1,H_2,...,H_{10}|B) = P(H_1|B) P(H_2|B) \times ... \times P(H_{10}|B) = (0.8)^{10}\]

And similarly,

\[P(H_1,H_2,...,H_{10}|B') = (0.5)^{10}\]

Step 3: Substitute

\[P(B|H_1,H_2,...,H_{10}) = \frac{(0.8)^{10} \times \frac{1}{10}}{(0.8)^{10} \times \frac{1}{10} + (0.5)^{10} \times \frac{9}{10}} \approx 0.924\]
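As a quick numerical check, the one-step formula can be evaluated directly; the short Python snippet below (our own illustration) reproduces the 0.924 figure.

```python
# Direct (one-step) computation of P(B | ten heads) using the values above.
prior_biased = 0.1
likelihood_biased = 0.8 ** 10   # P(H_1, ..., H_10 | B)
likelihood_fair = 0.5 ** 10     # P(H_1, ..., H_10 | B')

posterior = likelihood_biased * prior_biased / (
    likelihood_biased * prior_biased + likelihood_fair * (1 - prior_biased)
)
print(round(posterior, 3))  # 0.924
```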

This direct approach yields the same result as sequential updating. However, the sequential approach has a few advantages:

  1. It shows how our beliefs evolve with each new piece of evidence.

  2. It allows us to stop and make decisions at any point in the sequence.

  3. It more naturally accommodates scenarios where evidence arrives over time.

  4. It often involves simpler calculations at each step.

4.5.3. Bringing It All Together

Key Takeaways 📝

  1. Bayesian updating allows us to revise probability assessments sequentially as new evidence emerges.

  2. The posterior probability from one calculation becomes the prior probability for the next, creating a chain of updates.

  3. With sufficient evidence, probabilities tend to converge toward the truth regardless of initial priors (unless they’re 0 or 1).

  4. Sequential updating reflects how we naturally learn and revise our beliefs based on accumulated experience.

4.5.4. Exercises

These exercises develop your skills in applying Bayes’ Rule sequentially as new evidence emerges.

Exercise 1: Sequential Updating Basics

A machine learning system is trying to determine whether a user is a human or a bot based on their behavior. Initially, the system assumes:

  • \(P(\text{Bot}) = 0.20\) (20% of traffic is from bots)

  • \(P(\text{Human}) = 0.80\)

The system observes behaviors and updates its belief. For the first behavior observed:

  • \(P(\text{Fast response} | \text{Bot}) = 0.70\)

  • \(P(\text{Fast response} | \text{Human}) = 0.30\)

  1. The system observes a fast response. Calculate \(P(\text{Bot} | \text{Fast response})\).

  2. Using the result from part (a) as the new prior, suppose the system observes a second behavior — clicking in a perfectly regular pattern:

    • \(P(\text{Regular click} | \text{Bot}) = 0.80\)

    • \(P(\text{Regular click} | \text{Human}) = 0.10\)

    Calculate the updated probability that the user is a bot.

  3. How much has the probability of “Bot” changed from the initial prior (0.20) to after observing both behaviors?

Solution

Part (a): First Update — Fast Response

Let \(B\) = Bot, \(H\) = Human, \(F\) = Fast response

Apply Bayes’ Rule:

\[P(B|F) = \frac{P(F|B) \cdot P(B)}{P(F|B) \cdot P(B) + P(F|H) \cdot P(H)}\]
\[P(B|F) = \frac{(0.70)(0.20)}{(0.70)(0.20) + (0.30)(0.80)} = \frac{0.14}{0.14 + 0.24} = \frac{0.14}{0.38} \approx 0.3684\]

After observing a fast response, \(P(\text{Bot}) \approx 36.84\%\).

Part (b): Second Update — Regular Click Pattern

Now use \(P(B) = 0.3684\) as the new prior.

Let \(R\) = Regular click pattern

\[P(B|F,R) = \frac{P(R|B) \cdot P(B|F)}{P(R|B) \cdot P(B|F) + P(R|H) \cdot P(H|F)}\]

Where \(P(H|F) = 1 - P(B|F) = 1 - 0.3684 = 0.6316\)

\[P(B|F,R) = \frac{(0.80)(0.3684)}{(0.80)(0.3684) + (0.10)(0.6316)} = \frac{0.2947}{0.2947 + 0.0632} = \frac{0.2947}{0.3579} \approx 0.8235\]

After both observations, \(P(\text{Bot}) \approx 82.35\%\).

Part (c): Total Change

  • Initial prior: \(P(\text{Bot}) = 0.20\)

  • After first evidence: \(P(\text{Bot}|F) \approx 0.37\)

  • After second evidence: \(P(\text{Bot}|F,R) \approx 0.82\)

The probability increased from 20% to 82%, a change of 62 percentage points. The system now strongly believes this is a bot.
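If you want to check the arithmetic, here is a minimal Python sketch of the two updates; the helper name `bayes_update` is ours, chosen for illustration, not something defined in the lesson.

```python
# Two sequential updates for the bot-detection example.
def bayes_update(prior, lik_if_bot, lik_if_human):
    """Posterior P(bot) after one observation, given the two likelihoods."""
    return lik_if_bot * prior / (lik_if_bot * prior + lik_if_human * (1 - prior))

p_bot = 0.20                                # initial prior P(Bot)
p_bot = bayes_update(p_bot, 0.70, 0.30)     # fast response  -> ~0.3684
p_bot = bayes_update(p_bot, 0.80, 0.10)     # regular clicks -> ~0.8235
print(round(p_bot, 4))
```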


Exercise 2: Biased Coin Detection

You have a bag containing 5 coins: 1 biased coin with \(P(\text{Heads}) = 0.9\) and 4 fair coins with \(P(\text{Heads}) = 0.5\). You randomly select one coin and flip it repeatedly.

  1. Before any flips, what is \(P(\text{Biased})\)?

  2. The first flip is Heads. Calculate \(P(\text{Biased} | H_1)\).

  3. The second flip is also Heads. Calculate \(P(\text{Biased} | H_1, H_2)\).

  4. The third flip is Tails. Calculate \(P(\text{Biased} | H_1, H_2, T_3)\).

  5. After observing H, H, T, has your belief about the coin being biased increased or decreased compared to the prior? Explain intuitively why.

Solution

Let \(B\) = Biased coin, \(F\) = Fair coin

Part (a): Prior

\[P(B) = \frac{1}{5} = 0.20, \quad P(F) = \frac{4}{5} = 0.80\]

Part (b): After First Heads

\[P(B|H_1) = \frac{P(H|B) \cdot P(B)}{P(H|B) \cdot P(B) + P(H|F) \cdot P(F)}\]
\[P(B|H_1) = \frac{(0.9)(0.2)}{(0.9)(0.2) + (0.5)(0.8)} = \frac{0.18}{0.18 + 0.40} = \frac{0.18}{0.58} \approx 0.3103\]

Part (c): After Second Heads

New prior: \(P(B) = 0.3103\), \(P(F) = 0.6897\)

\[P(B|H_1,H_2) = \frac{(0.9)(0.3103)}{(0.9)(0.3103) + (0.5)(0.6897)} = \frac{0.2793}{0.2793 + 0.3449} = \frac{0.2793}{0.6242} \approx 0.4474\]

Part (d): After Third Flip is Tails

New prior: \(P(B) = 0.4474\), \(P(F) = 0.5526\)

For tails: \(P(T|B) = 0.1\), \(P(T|F) = 0.5\)

\[P(B|H_1,H_2,T_3) = \frac{(0.1)(0.4474)}{(0.1)(0.4474) + (0.5)(0.5526)} = \frac{0.0447}{0.0447 + 0.2763} = \frac{0.0447}{0.3210} \approx 0.1393\]

Part (e): Interpretation

  • Prior: \(P(B) = 0.20\)

  • After H, H, T: \(P(B) \approx 0.14\)

The belief has decreased from 20% to about 14%.

Intuitive explanation: While two heads initially increased our belief (the biased coin is more likely to show heads), the single tail dramatically reduced it. The biased coin has only a 10% chance of tails, while a fair coin has 50%. Seeing tails is strong evidence against the biased coin. The tails observation outweighed the two heads because \(P(T|B)/P(T|F) = 0.1/0.5 = 0.2\), making tails 5 times more likely under the fair coin hypothesis.
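A short Python sketch of parts (b)–(d) follows (the `update` helper is our own naming); small differences in the last decimal place relative to the hand calculation come from intermediate rounding in the solution above.

```python
# Updates for H, H, T with the 5-coin bag (1 biased coin at 0.9, 4 fair at 0.5).
def update(prior, lik_biased, lik_fair):
    return lik_biased * prior / (lik_biased * prior + lik_fair * (1 - prior))

p = 0.20                          # prior P(biased)
for flip in ["H", "H", "T"]:
    if flip == "H":
        p = update(p, 0.9, 0.5)   # heads likelihoods
    else:
        p = update(p, 0.1, 0.5)   # tails likelihoods
    print(flip, round(p, 4))      # ~0.3103, ~0.4475, ~0.1394
```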


Exercise 3: Server Fault Diagnosis

A data center engineer is diagnosing why a server is running slowly. Based on experience, the three most common causes and their prior probabilities are:

  • \(P(\text{Memory leak}) = 0.50\)

  • \(P(\text{CPU overload}) = 0.30\)

  • \(P(\text{Network issue}) = 0.20\)

The engineer runs a diagnostic test that checks CPU usage. A high CPU reading has the following probabilities:

  • \(P(\text{High CPU} | \text{Memory leak}) = 0.40\)

  • \(P(\text{High CPU} | \text{CPU overload}) = 0.90\)

  • \(P(\text{High CPU} | \text{Network issue}) = 0.20\)

  1. The test shows high CPU usage. Calculate the posterior probability for each cause.

  2. Verify that your three posterior probabilities sum to 1.

  3. The engineer now runs a memory diagnostic. The probability of detecting elevated memory usage is:

    • \(P(\text{High memory} | \text{Memory leak}) = 0.85\)

    • \(P(\text{High memory} | \text{CPU overload}) = 0.30\)

    • \(P(\text{High memory} | \text{Network issue}) = 0.15\)

    The test shows high memory usage. Using the posteriors from part (a) as new priors, calculate the updated probabilities.

  4. Based on all evidence, which cause is most likely? How confident is the engineer?

Solution

Let M = Memory leak, C = CPU overload, N = Network issue, H = High CPU

Part (a): First Update — High CPU

First, calculate P(High CPU) using the Law of Total Probability:

\[\begin{split}P(H) &= P(H|M)P(M) + P(H|C)P(C) + P(H|N)P(N) \\ &= (0.40)(0.50) + (0.90)(0.30) + (0.20)(0.20) \\ &= 0.20 + 0.27 + 0.04 = 0.51\end{split}\]

Apply Bayes’ Rule for each cause:

\[P(M|H) = \frac{P(H|M)P(M)}{P(H)} = \frac{(0.40)(0.50)}{0.51} = \frac{0.20}{0.51} \approx 0.3922\]
\[P(C|H) = \frac{P(H|C)P(C)}{P(H)} = \frac{(0.90)(0.30)}{0.51} = \frac{0.27}{0.51} \approx 0.5294\]
\[P(N|H) = \frac{P(H|N)P(N)}{P(H)} = \frac{(0.20)(0.20)}{0.51} = \frac{0.04}{0.51} \approx 0.0784\]

Part (b): Verification

\(0.3922 + 0.5294 + 0.0784 = 1.0000\)

Part (c): Second Update — High Memory

Use posteriors from (a) as new priors. Let \(H_M\) = High memory.

Calculate P(High Memory | first evidence):

\[\begin{split}P(H_M) &= P(H_M|M) \cdot P(M|H) + P(H_M|C) \cdot P(C|H) + P(H_M|N) \cdot P(N|H) \\ &= (0.85)(0.3922) + (0.30)(0.5294) + (0.15)(0.0784) \\ &= 0.3334 + 0.1588 + 0.0118 = 0.5040\end{split}\]

Apply Bayes’ Rule again:

\[P(M|H,H_M) = \frac{(0.85)(0.3922)}{0.5040} = \frac{0.3334}{0.5040} \approx 0.6615\]
\[P(C|H,H_M) = \frac{(0.30)(0.5294)}{0.5040} = \frac{0.1588}{0.5040} \approx 0.3151\]
\[P(N|H,H_M) = \frac{(0.15)(0.0784)}{0.5040} = \frac{0.0118}{0.5040} \approx 0.0234\]

Part (d): Final Assessment

After both diagnostics:

  • Memory leak: 66.15% ← Most likely

  • CPU overload: 31.51%

  • Network issue: 2.34%

The engineer should be moderately confident that the problem is a memory leak. While it’s the most likely cause at 66%, there’s still about a 32% chance it’s CPU overload. The network issue is very unlikely given the evidence.

Note: The first test pointed toward CPU overload (52.9%), but the high memory reading shifted the probability toward memory leak. This shows how additional evidence can change our conclusions.
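The two multi-hypothesis updates can be verified with the short Python sketch below (the dictionary layout and function name are ours); it normalizes prior × likelihood over the three causes, which is exactly the law-of-total-probability step used above.

```python
# Multi-hypothesis sequential updating for the server-fault example.
priors = {"memory leak": 0.50, "cpu overload": 0.30, "network issue": 0.20}
high_cpu = {"memory leak": 0.40, "cpu overload": 0.90, "network issue": 0.20}
high_mem = {"memory leak": 0.85, "cpu overload": 0.30, "network issue": 0.15}

def update(priors, likelihoods):
    """Multiply each prior by its likelihood, then normalize over all causes."""
    unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(unnormalized.values())
    return {h: v / total for h, v in unnormalized.items()}

after_cpu = update(priors, high_cpu)       # ~0.392, 0.529, 0.078
after_both = update(after_cpu, high_mem)   # ~0.661, 0.315, 0.023
print(after_both)
```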


Exercise 4: One-Step vs Sequential Computation

A quality control system uses two sensors to detect defective products.

  • \(P(\text{Defective}) = 0.05\)

  • Sensor 1: \(P(\text{Alert}_1 | \text{Defective}) = 0.95\), \(P(\text{Alert}_1 | \text{Good}) = 0.08\)

  • Sensor 2: \(P(\text{Alert}_2 | \text{Defective}) = 0.90\), \(P(\text{Alert}_2 | \text{Good}) = 0.05\)

Both sensors trigger alerts for a product.

  1. Sequential approach: First update using Sensor 1, then update using Sensor 2. Find \(P(\text{Defective} | \text{Alert}_1, \text{Alert}_2)\).

  2. One-step approach: Compute the probability directly by treating both alerts as a single piece of evidence. Assume sensor readings are conditionally independent given defect status.

  3. Verify that both approaches give the same answer.

  4. What is the advantage of the sequential approach?

Solution

Let D = Defective, G = Good, \(A_1\) = Alert from Sensor 1, \(A_2\) = Alert from Sensor 2

Part (a): Sequential Approach

Step 1: Update with Sensor 1

\[P(D|A_1) = \frac{P(A_1|D)P(D)}{P(A_1|D)P(D) + P(A_1|G)P(G)}\]
\[P(D|A_1) = \frac{(0.95)(0.05)}{(0.95)(0.05) + (0.08)(0.95)} = \frac{0.0475}{0.0475 + 0.076} = \frac{0.0475}{0.1235} \approx 0.3846\]

Step 2: Update with Sensor 2

New prior: \(P(D) = 0.3846\), \(P(G) = 0.6154\)

\[P(D|A_1,A_2) = \frac{P(A_2|D)P(D|A_1)}{P(A_2|D)P(D|A_1) + P(A_2|G)P(G|A_1)}\]
\[P(D|A_1,A_2) = \frac{(0.90)(0.3846)}{(0.90)(0.3846) + (0.05)(0.6154)} = \frac{0.3461}{0.3461 + 0.0308} = \frac{0.3461}{0.3769} \approx 0.9183\]

Part (b): One-Step Approach

With conditional independence:

\[P(A_1, A_2 | D) = P(A_1|D) \cdot P(A_2|D) = (0.95)(0.90) = 0.855\]
\[P(A_1, A_2 | G) = P(A_1|G) \cdot P(A_2|G) = (0.08)(0.05) = 0.004\]

Apply Bayes’ Rule:

\[P(D|A_1,A_2) = \frac{(0.855)(0.05)}{(0.855)(0.05) + (0.004)(0.95)} = \frac{0.04275}{0.04275 + 0.0038} = \frac{0.04275}{0.04655} \approx 0.9184\]

Part (c): Verification

Sequential: 0.9183; one-step: 0.9184.

The small difference is due to rounding of the intermediate posterior; carried to full precision, both approaches yield exactly the same result.
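Carrying full precision removes the discrepancy entirely; the following Python sketch (ours, for illustration) computes both routes and confirms they agree.

```python
# Sequential vs. one-step posterior for the two-sensor example.
p_defective = 0.05

# Sequential: update with Sensor 1, then Sensor 2, keeping full precision.
p = 0.95 * p_defective / (0.95 * p_defective + 0.08 * (1 - p_defective))
p = 0.90 * p / (0.90 * p + 0.05 * (1 - p))

# One-step: joint likelihoods under conditional independence.
lik_d = 0.95 * 0.90
lik_g = 0.08 * 0.05
q = lik_d * p_defective / (lik_d * p_defective + lik_g * (1 - p_defective))

print(round(p, 6), round(q, 6))   # both ~0.918367
print(abs(p - q) < 1e-12)         # True: they agree up to floating-point error
```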

Part (d): Advantages of Sequential Approach

  1. Intermediate decisions: Can decide after each sensor whether more testing is needed

  2. Easier calculation: Each step is simpler than one large calculation

  3. Real-time updating: Natural when data arrives over time

  4. Insight into evidence: Shows how each piece of evidence changes our belief

  5. Flexibility: Can easily incorporate different types of evidence


Exercise 5: Convergence of Beliefs

Two engineers, Alice and Bob, are trying to determine if a new manufacturing process produces acceptable parts. They have different prior beliefs:

  • Alice: \(P(\text{Acceptable}) = 0.30\) (skeptical)

  • Bob: \(P(\text{Acceptable}) = 0.70\) (optimistic)

They both observe the same sequence of test results. For acceptable processes:

  • \(P(\text{Part passes} | \text{Acceptable}) = 0.85\)

For unacceptable processes:

  • \(P(\text{Part passes} | \text{Unacceptable}) = 0.50\)

They test 5 parts, and all 5 pass.

  1. Calculate Alice’s posterior probability after each of the 5 tests.

  2. Calculate Bob’s posterior probability after each of the 5 tests.

  3. How much closer are Alice’s and Bob’s beliefs after 5 tests compared to before any tests?

  4. If the process is truly acceptable, what would you expect to happen to both engineers’ beliefs as they test more and more parts?

Solution

Let A = Acceptable, U = Unacceptable, P = Pass

Part (a): Alice’s Updates (Prior = 0.30)

General update formula:

\[P(A|P) = \frac{P(P|A) \cdot P(A)}{P(P|A) \cdot P(A) + P(P|U) \cdot P(U)} = \frac{0.85 \cdot P(A)}{0.85 \cdot P(A) + 0.50 \cdot (1-P(A))}\]
| Test | Prior \(P(A)\) | Posterior \(P(A)\) |
|------|----------------|--------------------|
| 0    | —              | 0.300 |
| 1    | 0.300          | 0.421 |
| 2    | 0.421          | 0.553 |
| 3    | 0.553          | 0.678 |
| 4    | 0.678          | 0.782 |
| 5    | 0.782          | 0.859 |

Part (b): Bob’s Updates (Prior = 0.70)

| Test | Prior \(P(A)\) | Posterior \(P(A)\) |
|------|----------------|--------------------|
| 0    | —              | 0.700 |
| 1    | 0.700          | 0.799 |
| 2    | 0.799          | 0.871 |
| 3    | 0.871          | 0.920 |
| 4    | 0.920          | 0.951 |
| 5    | 0.951          | 0.970 |

Part (c): Convergence Analysis

Initial difference: \(|0.70 - 0.30| = 0.40\)

Final difference: \(|0.970 - 0.859| = 0.111\)

The beliefs have converged significantly. The gap narrowed from 0.40 to 0.11, a reduction of about 72%.

Part (d): Long-term Convergence

If the process is truly acceptable, we expect:

  • Both Alice and Bob’s posteriors will converge toward 1.0

  • Their beliefs will become increasingly similar regardless of starting priors

  • The rate of convergence depends on how distinguishable acceptable and unacceptable processes are (here, 0.85 vs 0.50)

This is a fundamental property of Bayesian updating: with enough evidence, rational observers will reach similar conclusions regardless of their initial beliefs (as long as neither starts with probability 0 or 1).
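The convergence is easy to see numerically with the minimal Python sketch below (our own illustration), which tracks both engineers' posteriors and the gap between them after each passing part.

```python
# Two observers, same evidence, different priors: watch the gap shrink.
p_pass_acceptable = 0.85
p_pass_unacceptable = 0.50

def update(p):
    """One Bayes update of P(acceptable) after observing a passing part."""
    return p_pass_acceptable * p / (
        p_pass_acceptable * p + p_pass_unacceptable * (1 - p)
    )

alice, bob = 0.30, 0.70
for test in range(1, 6):
    alice, bob = update(alice), update(bob)
    print(f"test {test}: Alice {alice:.3f}, Bob {bob:.3f}, gap {bob - alice:.3f}")
```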


Exercise 6: Adaptive Testing

A software testing system uses adaptive testing. After each test, it updates its belief about whether a bug exists and decides whether to continue testing. The system starts with:

  • \(P(\text{Bug exists}) = 0.50\) (uninformative prior)

Each test can result in “Error detected” or “No error.” The probabilities are:

  • \(P(\text{Error} | \text{Bug exists}) = 0.70\)

  • \(P(\text{Error} | \text{No bug}) = 0.10\)

The testing stops when \(P(\text{Bug exists})\) drops below 0.10 OR rises above 0.90.

Consider the sequence: Error, No Error, Error, Error.

  1. Calculate the posterior after each observation.

  2. At what point (if any) does the testing stop?

  3. If the first test had shown “No Error” instead, what would the posterior be? Would testing continue?

  4. Why might using thresholds (0.10 and 0.90) be useful in practice?

Solution

Let B = Bug exists, N = No bug, E = Error detected, \(\bar{E}\) = No error

Part (a): Sequential Updates

Test 1: Error detected

\[P(B|E) = \frac{(0.70)(0.50)}{(0.70)(0.50) + (0.10)(0.50)} = \frac{0.35}{0.35 + 0.05} = \frac{0.35}{0.40} = 0.875\]

Test 2: No Error (prior = 0.875)

\(P(\bar{E}|B) = 0.30\), \(P(\bar{E}|N) = 0.90\)

\[P(B|\bar{E}) = \frac{(0.30)(0.875)}{(0.30)(0.875) + (0.90)(0.125)} = \frac{0.2625}{0.2625 + 0.1125} = \frac{0.2625}{0.375} = 0.70\]

Test 3: Error detected (prior = 0.70)

\[P(B|E) = \frac{(0.70)(0.70)}{(0.70)(0.70) + (0.10)(0.30)} = \frac{0.49}{0.49 + 0.03} = \frac{0.49}{0.52} \approx 0.9423\]

Summary table:

| Test | Observation | Prior | Posterior | Continue? |
|------|-------------|-------|-----------|-----------|
| 1    | Error       | 0.500 | 0.875     | Yes  |
| 2    | No Error    | 0.875 | 0.700     | Yes  |
| 3    | Error       | 0.700 | 0.942     | STOP |

Part (b): Stopping Point

Testing stops after Test 3. The posterior (0.942) exceeds the upper threshold of 0.90, indicating high confidence that a bug exists.

Part (c): Alternative First Test — No Error

If the first test showed “No Error”:

\[P(B|\bar{E}) = \frac{(0.30)(0.50)}{(0.30)(0.50) + (0.90)(0.50)} = \frac{0.15}{0.15 + 0.45} = \frac{0.15}{0.60} = 0.25\]

Since 0.25 is between 0.10 and 0.90, testing would continue.

This shows how a single piece of evidence can shift beliefs substantially (from 0.50 to 0.25), but may not be enough to reach a decision threshold.

Part (d): Benefits of Threshold-Based Stopping

  1. Efficiency: Stop testing when confident enough — avoid wasting resources on unnecessary tests

  2. Decision support: Clear criteria for when to act (fix the bug) vs. continue investigating

  3. Risk management: The thresholds encode how much residual uncertainty is acceptable before acting (declare a bug only above 0.90, rule one out only below 0.10)

  4. Adaptive: Number of tests depends on evidence — uncertain cases get more tests

  5. Interpretable: Easy to explain decisions to stakeholders (“We’re 94% confident a bug exists”)
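A compact Python sketch of the threshold-based stopping rule is shown below (the function and constant names are ours, chosen for illustration); it reproduces the stop after the third observation.

```python
# Adaptive testing: update P(bug) after each test, stop at a threshold.
P_ERROR_GIVEN_BUG = 0.70
P_ERROR_GIVEN_NO_BUG = 0.10
LOWER, UPPER = 0.10, 0.90

def run_tests(observations, p_bug=0.50):
    """Update P(bug) after each observation; stop once a threshold is crossed."""
    for obs in observations:                  # obs is "error" or "no error"
        lik_bug = P_ERROR_GIVEN_BUG if obs == "error" else 1 - P_ERROR_GIVEN_BUG
        lik_ok = P_ERROR_GIVEN_NO_BUG if obs == "error" else 1 - P_ERROR_GIVEN_NO_BUG
        p_bug = lik_bug * p_bug / (lik_bug * p_bug + lik_ok * (1 - p_bug))
        print(obs, round(p_bug, 3))
        if p_bug < LOWER or p_bug > UPPER:
            return "stop", p_bug
    return "continue", p_bug

print(run_tests(["error", "no error", "error", "error"]))
# Stops after the third observation with P(bug) ≈ 0.942.
```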


4.5.5. Additional Practice Problems

True/False Questions (1 point each)

  1. In Bayesian updating, the posterior probability from one calculation becomes the prior for the next calculation.

    Ⓣ or Ⓕ

  2. Sequential Bayesian updating always requires more calculations than the one-step approach.

    Ⓣ or Ⓕ

  3. If two observers start with different priors but observe the same evidence, their posteriors will always be identical.

    Ⓣ or Ⓕ

  4. With sufficient evidence, Bayesian posteriors tend to converge toward the truth regardless of the initial prior (assuming the prior is not 0 or 1).

    Ⓣ or Ⓕ

  5. If a piece of evidence is equally likely under both hypotheses, observing that evidence will not change the posterior probability.

    Ⓣ or Ⓕ

  6. The order in which evidence is processed affects the final posterior probability in sequential Bayesian updating.

    Ⓣ or Ⓕ

Multiple Choice Questions (2 points each)

  1. A coin is either fair (P(H) = 0.5) or biased (P(H) = 0.8). Initially P(Biased) = 0.25. After observing one heads, what is P(Biased)?

    Ⓐ 0.25

    Ⓑ 0.35

    Ⓒ 0.40

    Ⓓ 0.50

  2. Starting from P(Disease) = 0.01, a positive test with sensitivity 0.95 and specificity 0.90 yields a posterior of about 0.088. If we get a second positive test (same characteristics, independent), the posterior will be:

    Ⓐ About 0.088 (unchanged)

    Ⓑ About 0.18 (roughly doubled)

    Ⓒ About 0.48 (much higher)

    Ⓓ About 0.95 (near certain)

  3. Two scientists with priors P(H) = 0.2 and P(H) = 0.8 observe evidence E where P(E|H) = 0.9 and P(E|H’) = 0.3. After observing E, which statement is true?

    Ⓐ Both posteriors increase

    Ⓑ Both posteriors decrease

    Ⓒ The gap between their posteriors increases

    Ⓓ The gap between their posteriors decreases

  4. In sequential Bayesian updating, what does “conditional independence” of observations mean?

    Ⓐ Each observation is independent of all previous observations

    Ⓑ Observations are independent given the hypothesis (the true state)

    Ⓒ The prior probability doesn’t affect the posterior

    Ⓓ The order of observations matters for the final answer

Answers to Practice Problems

True/False Answers:

  1. True — This is the core mechanism of sequential Bayesian updating: posterior → new prior.

  2. False — Sequential updating often involves simpler calculations at each step, and can be more practical when evidence arrives over time. The total computational effort is similar.

  3. False — Different priors lead to different posteriors, even with the same evidence. However, with enough evidence, the posteriors will converge.

  4. True — This is a fundamental property of Bayesian inference called “convergence.” As evidence accumulates, the data overwhelms the prior.

  5. True — If P(E|H) = P(E|H’), then the likelihood ratio is 1, and the posterior equals the prior. The evidence is “uninformative.”

  6. False — With conditionally independent observations, the order doesn’t matter. The final posterior is the same regardless of the sequence (though intermediate posteriors will differ).

Multiple Choice Answers:

  1. Ⓑ — P(Biased|H) = (0.8 × 0.25) / [(0.8)(0.25) + (0.5)(0.75)] = 0.20 / (0.20 + 0.375) = 0.20 / 0.575 ≈ 0.348, so the closest answer is Ⓑ 0.35.

  2. Ⓒ — Using the posterior 0.088 as the new prior: P(+|D) = 0.95, P(+|D’) = 0.10. P(D|++) = (0.95 × 0.088) / (0.95 × 0.088 + 0.10 × 0.912) ≈ 0.0836 / 0.175 ≈ 0.48.

  3. Ⓓ — Evidence that favors H (likelihood ratio > 1) increases both posteriors, but it also brings them closer together: the posteriors move in the same direction and converge.

  4. Ⓑ — Conditional independence means the observations are independent given the true state. For example, coin flips are independent given which coin you have, even though they’re not unconditionally independent.