4.5. Bayesian Updating: Sequential Learning
Perhaps the most powerful aspect of Bayes’ rule is that it allows us to systematically update our beliefs as new evidence emerges. Rather than analyzing all evidence at once, we can process information sequentially, updating our probability assessments step by step. This iterative approach mirrors how humans naturally learn and refine their understanding of the world. In this chapter, we’ll explore the mechanics of Bayesian updating and see how it can be applied to real-world problems.
Road Map 🧭
Understand how Bayes’ rule enables sequential probability updates
Explore the mechanics of Bayesian updating through a coin flip example
Recognize how prior information gets updated with each new observation
Observe how probabilities converge toward the truth with sufficient evidence
Apply these concepts to scenarios where sequential information becomes available
4.5.1. The Power of Bayesian Updating
Bayes’ rule, as we’ve seen in the previous chapter, allows us to calculate the probability of an event A given evidence B:

\[P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}\]
What makes this formula particularly powerful is that once we’ve calculated P(A|B), this posterior probability can serve as the prior probability for the next calculation when new evidence C arrives:

\[P(A \mid B, C) = \frac{P(C \mid A, B)\,P(A \mid B)}{P(C \mid B)}\]
This sequential updating process reflects how learning typically occurs—we start with some initial beliefs, encounter new evidence, update our beliefs, encounter more evidence, update again, and so on. With each update, our probability assessments become more refined and (ideally) closer to the truth.
The Bayesian updating process follows these steps (a short code sketch follows the list):
Start with a prior probability P(A)
Observe evidence B
Calculate the posterior probability P(A|B) using Bayes’ rule
Treat P(A|B) as the new prior
Observe new evidence C
Calculate the new posterior P(A|B,C)
Continue this process as more evidence becomes available
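To make this cycle concrete, here is a minimal Python sketch of the posterior-becomes-prior loop for a single hypothesis H against its complement; the function name and the example evidence stream are our own illustrations, not part of the text above.

```python
def bayes_update(prior, like_h, like_not_h):
    """One Bayesian update: P(H | e) from P(H), P(e | H), and P(e | not H)."""
    numerator = like_h * prior
    return numerator / (numerator + like_not_h * (1 - prior))

# Process evidence sequentially: each posterior becomes the next prior.
p = 0.10                                        # initial prior P(H)
stream = [(0.8, 0.5), (0.8, 0.5), (0.8, 0.5)]   # (P(e|H), P(e|not H)) per observation
for like_h, like_not_h in stream:
    p = bayes_update(p, like_h, like_not_h)
    print(f"P(H | evidence so far) = {p:.3f}")  # prints 0.151, 0.221, 0.313
```

The numbers here anticipate the coin example below: each observation multiplies the weight of H by its likelihood and renormalizes.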
4.5.2. Coin Flip Example: Detecting a Biased Coin
Let’s explore a concrete example to illustrate Bayesian updating in action.
Scenario: Suppose you have a bag containing 10 coins. One of these coins is biased, with a probability of heads equal to 0.8. The other 9 coins are fair, with a probability of heads equal to 0.5. You reach into the bag, select a coin at random, and flip it 10 times. Each flip results in heads. What is the probability that you selected the biased coin?
To solve this problem, we need to define our events:
Let Hᵢ denote that the coin showed heads on the ith flip
Let B denote that the biased coin was selected
We know the following initial probabilities:
P(B) = 1/10 (probability of selecting the biased coin)
P(B') = 9/10 (probability of selecting a fair coin)
P(Hᵢ|B) = 0.8 (probability of heads on any flip, given the biased coin)
P(Hᵢ|B') = 0.5 (probability of heads on any flip, given a fair coin)
We want to find P(B|H₁,H₂,…,H₁₀), the probability that the selected coin is biased given that all 10 flips resulted in heads. Rather than tackling this all at once, we’ll update our probability assessment after each flip.
Flip 1
After the first flip results in heads, we calculate:

\[P(B \mid H_1) = \frac{P(H_1 \mid B)\,P(B)}{P(H_1 \mid B)\,P(B) + P(H_1 \mid B')\,P(B')}\]

Substituting the values:

\[P(B \mid H_1) = \frac{0.8 \times 0.1}{0.8 \times 0.1 + 0.5 \times 0.9} = \frac{0.08}{0.53} \approx 0.151\]
After seeing one heads, the probability that we have the biased coin has increased from 0.1 to about 0.151. This makes intuitive sense—getting heads is more likely with the biased coin, so observing heads should increase our confidence that we have the biased coin. However, the change is modest because heads is a common outcome even with a fair coin.
Flip 2
For the second flip, we treat our updated probability as the new prior:

\[P(B \mid H_1, H_2) = \frac{P(H_2 \mid B, H_1)\,P(B \mid H_1)}{P(H_2 \mid B, H_1)\,P(B \mid H_1) + P(H_2 \mid B', H_1)\,P(B' \mid H_1)}\]

A key insight here is that individual flips of the same coin are conditionally independent: given which coin we have, the outcome of the first flip doesn’t affect the probability of heads on the second flip. This means:

\[P(H_2 \mid B, H_1) = P(H_2 \mid B) = 0.8 \qquad \text{and} \qquad P(H_2 \mid B', H_1) = P(H_2 \mid B') = 0.5\]

Using these simplifications:

\[P(B \mid H_1, H_2) = \frac{0.8 \times 0.151}{0.8 \times 0.151 + 0.5 \times 0.849} \approx 0.221\]
After the second heads, our confidence that we have the biased coin has increased further to about 0.221.
Flip 3
For the third flip, we update again:

\[P(B \mid H_1, H_2, H_3) = \frac{P(H_3 \mid B)\,P(B \mid H_1, H_2)}{P(H_3 \mid B)\,P(B \mid H_1, H_2) + P(H_3 \mid B')\,P(B' \mid H_1, H_2)}\]

Substituting the values (carrying unrounded intermediates):

\[P(B \mid H_1, H_2, H_3) = \frac{0.8 \times 0.221}{0.8 \times 0.221 + 0.5 \times 0.779} \approx 0.313\]
The pattern continues for subsequent flips. With each additional heads, we become more confident that we have the biased coin.
A General Update Formula for Every Flip
We can derive a more elegant recursive update rule for this process. Let’s define:
\(p_k = P(B|{\text{data up to flip }k})\) — our current belief that the coin is biased
\(\text{LR}_H = \frac{P(H|B)}{P(H|B')} = \frac{0.8}{0.5}=1.6\) — the likelihood ratio for heads
\(\text{LR}_T = \frac{P(T|B)}{P(T|B')} = \frac{0.2}{0.5}=0.4\) — the likelihood ratio for tails
After each new flip \(X_k \in \{H,T\}\), the posterior is updated by:

\[p_k = \frac{\text{LR}_{X_k}\, p_{k-1}}{\text{LR}_{X_k}\, p_{k-1} + \left(1 - p_{k-1}\right)} \tag{4.5.1}\]
This formula embodies a simple principle: multiply by the likelihood ratio of what you just observed, then renormalize to keep the result between 0 and 1.
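As a quick illustration, here is Equation 4.5.1 in Python; the helper name `update` is our own:

```python
LR_H, LR_T = 0.8 / 0.5, 0.2 / 0.5   # likelihood ratios: 1.6 for heads, 0.4 for tails

def update(p, lr):
    """Multiply the current belief by the likelihood ratio, then renormalize."""
    return lr * p / (lr * p + (1 - p))

p = 0.10                # prior P(B)
p = update(p, LR_H)     # heads: rises to about 0.151
p = update(p, LR_T)     # tails: falls to about 0.066
print(round(p, 3))
```

Note that one head followed by one tail leaves net evidence against the biased coin, since \(1.6 \times 0.4 = 0.64 < 1\): a tail is stronger evidence here than a head.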
Below is a complete table of the posterior probability \(p_k=P(B\mid H_1,\dots,H_k)\) after each successive head when the first 10 flips are all heads. Values are rounded to three decimals.
| Flip \(k\) | Posterior \(p_k\) |
|---|---|
| 0 (prior) | 0.100 |
| 1 | 0.151 |
| 2 | 0.221 |
| 3 | 0.313 |
| 4 | 0.421 |
| 5 | 0.538 |
| 6 | 0.651 |
| 7 | 0.749 |
| 8 | 0.827 |
| 9 | 0.884 |
| 10 | 0.924 |
Each step uses the update rule:

\[p_k = \frac{1.6\, p_{k-1}}{1.6\, p_{k-1} + (1 - p_{k-1})}\]

where 1.6 is the likelihood ratio \(\text{LR}_H = P(H\mid B)/P(H\mid B') = 0.8/0.5\). Notice how evidence accumulates rapidly: after just five heads the biased coin is already more likely than not, and by the tenth head the posterior has climbed to about 0.924.
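The table can be reproduced with a short loop; this sketch simply iterates the update rule above:

```python
p = 0.10                                # prior P(B)
print(f"flip  0 (prior): {p:.3f}")
for k in range(1, 11):                  # ten heads in a row
    p = 1.6 * p / (1.6 * p + (1 - p))   # Equation 4.5.1 with LR_H = 1.6
    print(f"flip {k:2d}:         {p:.3f}")
```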
Closed Form After h Heads and t Tails
Because flips are conditionally independent given the coin type, we can derive a closed-form expression for any sequence with \(h\) heads and \(t\) tails:

\[P(\text{data} \mid B) = 0.8^h\, 0.2^t \qquad \text{and} \qquad P(\text{data} \mid B') = 0.5^{h+t}\]

Applying Bayes’ rule gives:

\[P(B \mid \text{data}) = \frac{0.8^h\, 0.2^t \times 0.1}{0.8^h\, 0.2^t \times 0.1 + 0.5^{h+t} \times 0.9} = \frac{1.6^h\, 0.4^t}{1.6^h\, 0.4^t + 9} \tag{4.5.2}\]

For our all-heads run (\(t=0\)), this reduces to:

\[P(B \mid H_1, \dots, H_{10}) = \frac{1.6^{10}}{1.6^{10} + 9} \approx 0.924\]
This agrees with our sequential computation and confirms that after observing 10 heads in a row, we’re about 92.4% confident that we have the biased coin. This significant shift from our initial 10% probability demonstrates the power of evidence accumulation through Bayesian updating.
Note that if even one tail appears in the sequence, Equation 4.5.2 automatically down-weights the probability of the biased coin: rewriting it as \(1.6^h / \left(1.6^h + 9\,(5/2)^t\right)\) shows that the factor \((5/2)^t\) in the denominator grows rapidly with each tail.
Equation 4.5.1 is ideal when data arrive one at a time, allowing us to update our beliefs incrementally, while Equation 4.5.2 is more convenient when we can tally all heads and tails at once. Both approaches lead to the same posterior probability.
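A small sketch can confirm that the two routes agree; the function names are ours, and the flip strings are illustrative:

```python
def posterior_sequential(flips, p=0.10):
    """Apply Equation 4.5.1 once per flip ('H' or 'T')."""
    lr = {"H": 0.8 / 0.5, "T": 0.2 / 0.5}
    for x in flips:
        p = lr[x] * p / (lr[x] * p + (1 - p))
    return p

def posterior_closed_form(h, t):
    """Equation 4.5.2 for h heads and t tails."""
    w = 1.6 ** h * 0.4 ** t
    return w / (w + 9)

print(f"{posterior_sequential('H' * 10):.3f}")   # 0.924
print(f"{posterior_closed_form(10, 0):.3f}")     # 0.924 (same value)
print(f"{posterior_closed_form(8, 2):.3f}")      # 0.433 with two tails
```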

Fig. 4.18 The probability of having the biased coin increases with each observed heads, eventually approaching near certainty.
Note that if we had observed tails at any point, our confidence would have decreased instead, since the biased coin is less likely to show tails than a fair coin. Using our closed-form expression (Equation 4.5.2), we can easily calculate that if we had observed, for example, 8 heads and 2 tails, the posterior probability would be approximately 0.43, significantly lower than the 0.924 we obtained with all heads.
4.5.3. The Convergence of Beliefs
An important property of Bayesian updating is that with sufficient evidence, the posterior probabilities tend to converge toward the truth, regardless of the initial prior (as long as the prior isn’t exactly 0 or 1, which would represent absolute certainty).
In our coin example, if we had started with a different prior, say a 50% chance of selecting the biased coin, ten consecutive heads would drive the posterior to about 99.1%. The starting points differ, but in both cases the evidence overwhelms the initial belief and points decisively to the biased coin.
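This is easy to check numerically; the following sketch applies the same all-heads evidence to several priors (the particular priors and flip counts are arbitrary choices for illustration):

```python
def posterior_after_heads(prior, n_heads):
    """Posterior P(B | n heads in a row), computed in odds form."""
    odds = prior / (1 - prior) * 1.6 ** n_heads   # each head multiplies the odds by 1.6
    return odds / (1 + odds)

for prior in (0.01, 0.10, 0.50, 0.90):
    p10 = posterior_after_heads(prior, 10)
    p25 = posterior_after_heads(prior, 25)
    print(f"prior {prior:.2f}: after 10 heads {p10:.3f}, after 25 heads {p25:.3f}")
```

Even the skeptical 1% prior, which still sits near 0.53 after ten heads, is pushed above 0.999 by twenty-five heads.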
This convergence property is why Bayesian methods are widely used in scientific and statistical reasoning: even if researchers begin with different prior beliefs, accumulating evidence will eventually lead them toward similar conclusions.
4.5.4. Conditional Independence in Sequential Events
In our coin flip example, we made use of a property called conditional independence. The flips are independent of each other given that we know which coin we have. Mathematically, for all i ≠ j:

\[P(H_i \mid B, H_j) = P(H_i \mid B) \qquad \text{and} \qquad P(H_i \mid B', H_j) = P(H_i \mid B')\]
This means that once we know whether we have the biased or fair coin, knowing the outcome of any previous flip doesn’t give us additional information about the probability of future flips.
Conditional independence is a powerful simplification that often applies in sequential events and greatly simplifies the calculations in Bayesian updating. Without it, we would need to consider complex dependencies between events, making the calculations much more difficult.
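A short simulation makes the distinction concrete: unconditionally, the first flip does carry information about the second (it hints at which coin we hold), but given the coin the flips are independent. This Monte Carlo sketch estimates both conditional probabilities (the sample size and seed are arbitrary):

```python
import random

random.seed(0)
N = 200_000
h1 = h1h2 = 0        # counts for estimating P(H2 | H1)
b_h1 = b_h1h2 = 0    # counts for estimating P(H2 | H1, B)

for _ in range(N):
    biased = random.random() < 0.1      # draw a coin from the bag
    p_heads = 0.8 if biased else 0.5
    f1 = random.random() < p_heads      # flip it twice
    f2 = random.random() < p_heads
    if f1:
        h1 += 1
        h1h2 += f2
        if biased:
            b_h1 += 1
            b_h1h2 += f2

print(f"P(H2 | H1)    ~ {h1h2 / h1:.3f}")      # about 0.545, above P(H2) = 0.53
print(f"P(H2 | H1, B) ~ {b_h1h2 / b_h1:.3f}")  # about 0.80, i.e. P(H2 | B)
```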
4.5.5. Alternative Forms of Bayesian Updating
For mathematically simpler scenarios like our coin example, there are often more direct ways to calculate the final probability than performing each update sequentially.
For instance, we could have directly calculated:

\[P(B \mid H_1, \dots, H_{10}) = \frac{P(H_1, \dots, H_{10} \mid B)\,P(B)}{P(H_1, \dots, H_{10} \mid B)\,P(B) + P(H_1, \dots, H_{10} \mid B')\,P(B')}\]

With independent flips:

\[P(H_1, \dots, H_{10} \mid B) = 0.8^{10}\]

And similarly:

\[P(H_1, \dots, H_{10} \mid B') = 0.5^{10}\]

Substituting:

\[P(B \mid H_1, \dots, H_{10}) = \frac{0.8^{10} \times 0.1}{0.8^{10} \times 0.1 + 0.5^{10} \times 0.9} \approx 0.924\]
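In code, the direct calculation is a one-liner (numbers as defined in the scenario):

```python
num = 0.8 ** 10 * 0.1                 # P(all heads | B) * P(B)
print(num / (num + 0.5 ** 10 * 0.9))  # ~0.924
```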
This direct approach yields the same result as sequential updating. However, the sequential approach has advantages:
It shows how our beliefs evolve with each new piece of evidence
It allows us to stop and make decisions at any point in the sequence
It more naturally accommodates scenarios where evidence arrives over time
It often involves simpler calculations at each step
4.5.6. Applications of Bayesian Updating
Bayesian updating has numerous practical applications:
Medical Diagnosis: Doctors update their assessment of a patient’s condition as test results come in. Each test result provides new evidence that shifts the probability of various diagnoses.
Spam Filtering: Email filters calculate the probability that a message is spam based on the words it contains. As the filter encounters new emails, it updates its probability estimates for different words (a toy sketch appears after this list).
Quality Control: Manufacturers monitor the quality of their products by testing samples. Each test outcome updates their belief about the overall quality of the production batch.
Weather Forecasting: Meteorologists update their precipitation forecasts as new atmospheric data becomes available, refining their predictions as the forecast day approaches.
Financial Markets: Investors update their beliefs about asset values as new economic data, earnings reports, and other information becomes available.
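To make the spam-filtering example concrete, here is a toy sketch in the spirit of Equation 4.5.1; the per-word likelihood ratios are invented for illustration, since a real filter would estimate them from training data:

```python
# Hypothetical likelihood ratios P(word | spam) / P(word | ham), invented for illustration.
WORD_LR = {"free": 4.0, "winner": 6.0, "meeting": 0.3, "hello": 0.9}

def spam_probability(words, prior=0.5):
    """Update P(spam) word by word, treating words as conditionally independent."""
    p = prior
    for w in words:
        lr = WORD_LR.get(w, 1.0)          # unseen words carry no evidence
        p = lr * p / (lr * p + (1 - p))   # same update rule as Equation 4.5.1
    return p

print(f"{spam_probability(['free', 'winner']):.3f}")    # 0.960: pushed toward spam
print(f"{spam_probability(['hello', 'meeting']):.3f}")  # 0.213: pushed toward ham
```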
In each case, Bayesian updating provides a rigorous framework for incorporating new evidence and refining probability assessments over time.
4.5.7. Bringing It All Together
Key Takeaways 📝
Bayesian updating allows us to revise probability assessments sequentially as new evidence emerges.
The posterior probability from one calculation becomes the prior probability for the next, creating a chain of updates.
With conditional independence, the calculations simplify considerably for sequential events.
With sufficient evidence, probabilities tend to converge toward the truth regardless of initial priors (unless they’re 0 or 1).
Sequential updating reflects how we naturally learn and revise our beliefs based on accumulated experience.
Bayesian updating has numerous practical applications in fields ranging from medicine to finance to artificial intelligence.
Bayes’ rule and the process of Bayesian updating provide a powerful framework for reasoning under uncertainty. They allow us to start with initial beliefs, incorporate new evidence, and systematically update our probability assessments. This process mirrors the scientific method itself: we form hypotheses, test them against evidence, and refine our understanding based on the results.
In the next chapter, we’ll explore how events can be related (or unrelated) to each other through the concept of independence, which will further enhance our toolkit for probability calculations.
Exercises
Basic Updating: You have two bags of marbles. Bag A contains 3 red and 7 blue marbles. Bag B contains 8 red and 2 blue marbles. You select a bag at random (equal probability) and draw a marble that turns out to be red. What is the probability that you selected Bag B? If you replace the marble and draw again, getting another red marble, what is the updated probability?
Medical Testing Sequence: A rare disease affects 1% of the population. A test for this disease has a sensitivity of 95% and a specificity of 98%. A patient tests positive on the first test. The doctor administers a second, independent test with the same characteristics, and the patient tests positive again. Calculate the probability that the patient has the disease after each test.
Quality Control: A factory produces items with a 5% defect rate. Another factory produces the same items with a 2% defect rate. You receive a shipment of these items but don’t know which factory it came from (both are equally likely). You inspect 5 items and find 1 defect. What is the probability the shipment came from the first factory?
Modified Coin Example: Using the same scenario as the coin flip example in this chapter, what would be the probability of having the biased coin if the sequence of 10 flips included exactly 8 heads and 2 tails? How does this compare to the all-heads scenario?
Challenge Problem: Three meteorologists predict the weather for tomorrow. Meteorologist A is correct 70% of the time, B is correct 80% of the time, and C is correct 65% of the time. All three predict rain for tomorrow. Assuming their predictions are conditionally independent given the actual weather, what is the probability it will rain tomorrow? (Assume a 30% prior probability of rain.)