4.5. Sequential Bayesian Updating

Perhaps the most powerful aspect of Bayes’ rule is that it allows us to systematically update our beliefs as new evidence emerges. In this lesson, we’ll explore the mechanics of iterative Bayesian updating by applying it to a concrete problem: detecting a biased coin.

Road Map 🧭

  • Understand how Bayes’ rule enables sequential probability updates.

  • Explore the mechanics of iterative Bayesian updating through a coin flip example.

  • Observe how probabilities converge toward the truth over iterations with sufficient evidence.

4.5.1. Detecting a Biased Coin

Let’s explore a concrete example to illustrate Bayesian updating in action.

Have Your Tools Ready 🔧

This lesson is a focused exploration of concepts from Section 4.4, so review that section and keep your notes handy as you read.

The Problem

Suppose you have a bag containing 10 coins. One of these coins is biased, with a probability of heads equal to 0.8. The other 9 coins are fair, with a probability of heads equal to 0.5. You reach into the bag, select a coin at random, and flip it 10 times. Each flip results in heads. What is the probability that you selected the biased coin?

The Setup

To solve this problem, we need to define our events:

  • Let \(H_i\) denote the event that the coin showed heads on the \(i\)-th flip.

  • Let \(B\) denote the event that the biased coin was selected.

We know the following initial probabilities:

  • \(P(B) = 1/10\) (probability of selecting the biased coin)

  • \(P(B') = 9/10\) (probability of selecting a fair coin)

  • \(P(H_i|B) = 0.8\) (probability of heads on any flip, given the biased coin)

  • \(P(H_i|B') = 0.5\) (probability of heads on any flip, given a fair coin)

We want to find \(P(B|H_1,H_2,...,H_{10})\), the probability that the selected coin is biased given that all 10 flips resulted in heads. Rather than tackling this all at once, let us update our probability assessment after each flip.

Flip 1 🪙

After the first flip results in heads, we calculate

\[P(B|H_1) = \frac{P(H_1|B) P(B)}{P(H_1|B) P(B) + P(H_1|B') P(B')}\]

Substituting the values,

\[P(B|H_1) = \frac{0.8 \times \frac{1}{10}}{0.8 \times \frac{1}{10} + 0.5 \times \frac{9}{10}} = \frac{0.08}{0.08 + 0.45} = \frac{0.08}{0.53} = \frac{8}{53} \approx 0.151\]

After seeing one heads, the probability that we have the biased coin has increased from 0.1 to about 0.151. This makes intuitive sense—getting heads is more likely with the biased coin, so observing heads should increase our confidence that we have the biased coin. However, the change is modest because heads is a common outcome even with a fair coin.
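
To make the arithmetic concrete, here is a minimal Python sketch of this single update; the function and variable names are our own, not part of the lesson:

```python
def bayes_update(prior_biased, p_heads_biased=0.8, p_heads_fair=0.5):
    """Posterior probability of holding the biased coin after observing one heads."""
    numerator = p_heads_biased * prior_biased
    denominator = numerator + p_heads_fair * (1 - prior_biased)
    return numerator / denominator

print(bayes_update(0.1))  # ≈ 0.1509, i.e. 8/53
```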

Flip 2 🪙🪙

For the second flip, we treat the updated probability after the first flip, \(P(B|H_1)\), as the new prior:

\[P(B|H_1,H_2) = \frac{P(H_2|B,H_1) P(B|H_1)}{P(H_2|B,H_1) P(B|H_1) + P(H_2|B',H_1) P(B'|H_1)}\]

A key insight here is that the flips are conditionally independent given which coin we selected: once we know whether the coin is biased or fair, the outcome of the first flip doesn’t affect the probability of heads on the second. This means:

\[\begin{split}P(H_2|B,H_1) = P(H_2|B) = 0.8 \\ P(H_2|B',H_1) = P(H_2|B') = 0.5\end{split}\]

Using these simplifications,

\[\begin{split}P(B|H_1,H_2) &= \frac{0.8 \times \frac{8}{53}}{0.8 \times \frac{8}{53} + 0.5 \times (1 - \frac{8}{53})} \\ &= \frac{0.8 \times \frac{8}{53}}{0.8 \times \frac{8}{53} + 0.5 \times \frac{45}{53}} \\ &= \frac{64}{289} \approx 0.221\end{split}\]

After the second heads, our confidence that we have the biased coin has increased further to about 0.221.

Flip 3 🪙🪙🪙

For the third flip,

\[P(B|H_1,H_2,H_3) = \frac{P(H_3|B) P(B|H_1,H_2)}{P(H_3|B) P(B|H_1,H_2) + P(H_3|B') P(B'|H_1,H_2)}\]

Substituting the values,

\[\begin{split}P(B|H_1,H_2,H_3) &= \frac{0.8 \times \frac{64}{289}}{0.8 \times \frac{64}{289} + 0.5 \times (1 - \frac{64}{289})} \\ &= \frac{0.8 \times \frac{64}{289}}{0.8 \times \frac{64}{289} + 0.5 \times \frac{225}{289}} \\ &= \frac{512}{1637} \approx 0.313\end{split}\]

The pattern continues for subsequent flips. With each additional heads, we become more confident that we have the biased coin.

Summary of All 10 Updates

Below is a complete table of the posterior probability \(p_k=P(B\mid H_1,\dots,H_k)\) after each successive flip that comes up heads. Values are rounded to three decimals.

Flip \(k\)    Posterior \(p_k\)
0 (prior)     0.100
1             0.151
2             0.221
3             0.313
4             0.421
5             0.538
6             0.651
7             0.749
8             0.827
9             0.884
10            0.924

Notice how the evidence accumulates rapidly: after just five heads, the biased coin is already more likely than not, and by the tenth flip the posterior has climbed to about 0.924. Had we observed tails at any point, our confidence would have decreased instead, since the biased coin is less likely than a fair coin to show tails.
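
The whole table can be reproduced by chaining the same update ten times. The short Python loop below is a sketch of that scheme (variable names are illustrative):

```python
posterior = 0.1  # prior probability of the biased coin, P(B)
print(f"flip  0: {posterior:.3f}")
for k in range(1, 11):
    # after each observed heads, the current posterior serves as the prior for the next flip
    numerator = 0.8 * posterior
    posterior = numerator / (numerator + 0.5 * (1 - posterior))
    print(f"flip {k:2d}: {posterior:.3f}")
```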

The Convergence of Beliefs

An important property of Bayesian updating is that with sufficient evidence, the posterior probabilities tend to converge toward the truth, regardless of the initial prior (as long as the prior isn’t exactly 0 or 1, which would represent absolute certainty).

This is why Bayesian methods are widely used in scientific and statistical reasoning: even if researchers begin with different prior beliefs, accumulating evidence will eventually lead them toward similar conclusions.
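
To see this numerically, the sketch below compares a skeptical prior of 0.01 with a neutral prior of 0.5 (both values are arbitrary choices for illustration), assuming the coin keeps coming up heads:

```python
def posterior_after_k_heads(prior, k, p_biased=0.8, p_fair=0.5):
    # one-step form of the update: P(B | k heads in a row) for a given prior
    num = (p_biased ** k) * prior
    return num / (num + (p_fair ** k) * (1 - prior))

for k in (5, 10, 20, 30):
    print(f"k={k:2d}: prior 0.01 -> {posterior_after_k_heads(0.01, k):.3f}, "
          f"prior 0.50 -> {posterior_after_k_heads(0.5, k):.3f}")
```

Both starting points end up near certainty once enough heads have accumulated, although the more skeptical prior takes longer to get there.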

4.5.2. Comparison with One-Step Computation

We could have directly calculated the probability \(P(B|H_1,H_2,...,H_{10})\) by taking the following steps:

Step 1: Apply Bayes’ rule by treating the sequence of ten heads as a single event

\[P(B|H_1,H_2,...,H_{10}) = \frac{P(H_1,H_2,...,H_{10}|B) P(B)}{P(H_1,H_2,...,H_{10}|B) P(B) + P(H_1,H_2,...,H_{10}|B') P(B')}\]

Step 2: Compute the components

With independent flips,

\[P(H_1,H_2,...,H_{10}|B) = P(H_1|B) P(H_2|B) \times ... \times P(H_{10}|B) = (0.8)^{10}\]

And similarly,

\[P(H_1,H_2,...,H_{10}|B') = (0.5)^{10}\]

Step 3: Substitute

\[P(B|H_1,H_2,...,H_{10}) = \frac{(0.8)^{10} \times \frac{1}{10}}{(0.8)^{10} \times \frac{1}{10} + (0.5)^{10} \times \frac{9}{10}} \approx 0.924\]
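
As a quick sanity check of this one-step result, here is a small Python computation (again with illustrative names):

```python
prior = 0.1
numerator = (0.8 ** 10) * prior
posterior = numerator / (numerator + (0.5 ** 10) * (1 - prior))
print(round(posterior, 3))  # 0.924, matching the sequential result
```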

This direct approach yields the same result as sequential updating. However, the sequential approach has a few advantages:

  1. It shows how our beliefs evolve with each new piece of evidence.

  2. It allows us to stop and make decisions at any point in the sequence.

  3. It more naturally accommodates scenarios where evidence arrives over time.

  4. It often involves simpler calculations at each step.

4.5.3. Bringing It All Together

Key Takeaways 📝

  1. Bayesian updating allows us to revise probability assessments sequentially as new evidence emerges.

  2. The posterior probability from one calculation becomes the prior probability for the next, creating a chain of updates.

  3. With sufficient evidence, probabilities tend to converge toward the truth regardless of initial priors (unless they’re 0 or 1).

  4. Sequential updating reflects how we naturally learn and revise our beliefs based on accumulated experience.

Exercises

  1. You have two bags of marbles. Bag A contains 3 red and 7 blue marbles. Bag B contains 8 red and 2 blue marbles. You select a bag at random (equal probability) and draw a marble that turns out to be red. What is the probability that you selected Bag B? If you replace the marble and draw again, getting another red marble, what is the updated probability?