4.4. Law of Total Probability and Bayes’ Rule

When we make decisions under uncertainty, we often need to revise our probability assessments as new information emerges. Medical diagnoses, legal judgments, and even everyday decisions typically involve updating our beliefs based on partial evidence. In this section, we develop the principles behind Bayes’ Rule, which provides a framework for this process of learning from evidence.

Road Map 🧭

  • Define partitions of the sample space and derive the law of partitions.

  • Build upon this to establish the law of total probability.

  • Develop Bayes’ rule for inverting conditional probabilities.

4.4.1. Law of Partitions

What is a Partition?

A collection of events \(\{A_1, A_2, \cdots, A_n\}\) forms a partition of the sample space \(\Omega\) if the following two conditions are satisfied.

  1. The events are mutually exclusive:

    \[A_i \cap A_j = \emptyset \text{ for all } i \neq j.\]
  2. The events are exhaustive:

    \[A_1 \cup A_2 \cup \cdots \cup A_n = \Omega.\]

In other words, a partition divides the sample space into non-overlapping pieces that, when combined, reconstruct the entire space. You can think of a partition as pizza slices: each slice represents an event, the slices do not overlap, and together they make up the whole pizza.

The law of partitions provides a way to calculate the probability of a new event by examining how it intersects with each part of a partition.

Note ✏️

The simplest example of a partition consists of just two events: any event \(A\) and its complement \(A'\). These two events are

  • mutually exclusive because \(A \cap A' = \emptyset\), and

  • exhaustive because together they cover the entire sample space (\(A \cup A' = \Omega\)).

Law of Partitions

If \(\{A_1, A_2, \cdots, A_n\}\) forms a partition of the sample space \(\Omega\), then for any event \(B\) in the same sample space:

\[P(B) = \sum_{i=1}^{n} P(A_i \cap B)\]

What Does It Say?


Fig. 4.17 Law of partitions

Take a partition that consists of three events as in Fig. 4.17. Then, the Law of Partitions can be expanded to

\[P(B) = P(A_1 \cap B) + P(A_2 \cap B) + P(A_3 \cap B).\]

The left-hand side of the equation corresponds to the relative area of the whole blue region, while each term on the right-hand side corresponds to the relative area of a smaller piece created by the overlap of \(B\) with one of the events in the partition.

The core message of the Law of Partitions is quite simple: the probability of the whole is equal to the sum of the probabilities of its parts.
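
To make this concrete, here is a minimal Python sketch using a made-up fair-die example (not one from this chapter) that checks the Law of Partitions numerically: the probability of \(B\) computed directly matches the sum of the intersection probabilities over the partition.

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}     # sample space: one roll of a fair die
A1, A2 = {2, 4, 6}, {1, 3, 5}  # partition of omega into even and odd outcomes
B = {5, 6}                     # event of interest: "roll at least 5"

prob = lambda event: Fraction(len(event), len(omega))  # equally likely outcomes

direct = prob(B)                             # P(B) computed directly
via_partition = prob(A1 & B) + prob(A2 & B)  # sum of P(A_i ∩ B) over the partition
print(direct, via_partition)                 # both print 1/3
```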

4.4.2. Law of Total Probability

The Law of Total Probability takes the Law of Partitions one step further by rewriting the intersection probabilities using the general multiplication rule.

Reminder🔎: The General Multiplication Rule

For any two events \(C\) and \(D\), \(P(C \cap D) = P(C|D) P(D) = P(D|C) P(C).\)

Statement

If \(\{A_1, A_2, \cdots, A_n\}\) forms a partition of the sample space \(\Omega\), then for any event \(B \subseteq \Omega\),

\[P(B) = \sum_{i=1}^{n} P(B|A_i) P(A_i).\]

What Does It Say?


Fig. 4.18 Law of Total Probability

Let us continue to use the simple three-event partition. The Law of Total Probability says

\[P(B) = P(B|A_1)P(A_1) + P(B|A_2)P(A_2) + P(B|A_3)P(A_3).\]

The Law of Total Probability now expresses the probability of event \(B\) as a weighted average of conditional probabilities. Each weight \(P(A_i)\) represents the probability of a particular part in the sample space, and each conditional probability \(P(B|A_i)\) represents the likelihood of \(B\) given that we are in that part.

The Law of Total Probability on a Tree diagram

Recall that in a tree diagram, the set of branches extending from the same node must represent all possible outcomes given the preceding path. This requirement is, in fact, another way of saying that these branches must form a partition. As a result, a tree diagram provides an ideal setting for applying the Law of Total Probability.

Computing a single-stage probability \(P(B)\) using the Law of Total Probability is equivalent to

  1. finding the path probabilities of all paths involving \(B\),

  2. then summing the probabilities.

Try writing these steps down in mathematical notation and confirm that they are identical to applying the Law of Total Probability directly.


Fig. 4.19 Using the Law of Total Probability with a tree diagram

Example💡: The Law of Partitions and the Law of Total Probability


Fig. 4.20 Tree diagram for the Indianapolis problem

Recall the Indianapolis example from the previous section. In this problem, what is the probability that it rains?

\[\begin{split}P(R) &= P(R \cap Sun) + P(R \cap Sat) + P(R \cap Fri) \\ &= P(Sun)P(R|Sun) + P(Sat)P(R|Sat) + P(Fri)P(R|Fri)\\ &= 1/10 + 1/8 + 3/40 \\ &= 0.1 + 0.125 + 0.075 \\ &= 0.3\end{split}\]
  • The first equality uses the Law of Partitions; the second uses the Law of Total Probability.

  • Confirm that the mathematical steps and the final outcome are identical when the tree diagram is used.
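
For a quick numerical check, here is a short Python sketch that sums the three path probabilities from the worked solution above (the individual priors and likelihoods come from the previous section’s tree diagram).

```python
from fractions import Fraction

# Path (intersection) probabilities P(R ∩ day), taken from the worked solution above
path_probs = {"Sun": Fraction(1, 10), "Sat": Fraction(1, 8), "Fri": Fraction(3, 40)}

P_R = sum(path_probs.values())   # Law of Partitions: P(R) is the sum over the partition
print(P_R, float(P_R))           # 3/10 0.3
```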

4.4.3. Bayes’ Rule

Bayes’ rule allows us to invert conditional probabilities. That is, it allows us to compute \(P(A|B)\) from our knowledge of \(P(B|A).\)

Statement

If \(\{A_1, A_2, \cdots, A_n\}\) forms a partition of the sample space \(\Omega\), and \(B\) is an event with \(P(B) > 0\), then for any \(i=1,2,\cdots,n\),

\[P(A_i|B) = \frac{P(B|A_i)P(A_i)}{\sum_{j=1}^{n} P(B|A_j)P(A_j)}.\]

For the simplified case of a three-event partition, Bayes’ rule for \(P(A_1|B)\) is:

\[P(A_1|B) = \frac{P(B|A_1)P(A_1)}{P(B|A_1)P(A_1) + P(B|A_2)P(A_2) + P(B|A_3)P(A_3)}.\]

Graphically, this equation represents the ratio of the area of the first blue piece (\(A_1 \cap B\)) over the whole area of \(B\) in Fig. 4.21.


Fig. 4.21 Visual aid for Bayes’ Rule

Derivation of Bayes’ Rule

\[P(A_i|B) = \frac{P(A_i \cap B)}{P(B)} = \frac{P(A_i \cap B)}{\sum_{j=1}^{n} P(B|A_j)P(A_j)} = \frac{P(B|A_i)P(A_i)}{\sum_{j=1}^{n} P(B|A_j)P(A_j)}\]
  • First equality: definition of conditional probability

  • Second equality: Law of Total Probability for the denominator

  • Third equality: the general multiplication rule for the numerator
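
The derivation translates directly into a few lines of Python. The sketch below defines a hypothetical helper, bayes_posterior, that takes the priors \(P(A_j)\) and likelihoods \(P(B|A_j)\) for a partition and returns all the posteriors at once; the three-event numbers at the end are made up purely for illustration.

```python
def bayes_posterior(priors, likelihoods):
    """priors[j] = P(A_j), likelihoods[j] = P(B | A_j); returns the list of P(A_j | B)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]  # numerators P(B | A_j) P(A_j)
    total = sum(joint)                                     # denominator P(B), by total probability
    return [j / total for j in joint]

# Three-event partition with illustrative numbers
print(bayes_posterior([0.2, 0.3, 0.5], [0.9, 0.5, 0.1]))
# [0.4736..., 0.3947..., 0.1315...]  -- the posteriors always sum to 1
```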

Example💡: Bayes’ Rule

The Indianapolis example is continued. Knowing that it didn’t rain on the day Glen and Jia went to Indianapolis, find the probability that this was Friday.

\[P(Fri|R') = \frac{P(Fri \cap R')}{P(R')} = \frac{P(Fri \cap R')}{1 - P(R)} = \frac{11/120}{1 - 0.3} \approx 0.131\]
  • \(P(R')\) can be computed directly using the tree diagram or the Law of Total Probability. However, using the complement rule is more convenient since we already have \(P(R)\) from the previous part.
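
As a check with exact fractions, the same calculation in Python (the path probability \(11/120\) is taken from the worked solution above):

```python
from fractions import Fraction

P_R = Fraction(3, 10)              # from the previous example
P_Fri_and_dry = Fraction(11, 120)  # P(Fri ∩ R'), read off the tree diagram

P_Fri_given_dry = P_Fri_and_dry / (1 - P_R)     # Bayes' rule, with the complement rule for P(R')
print(P_Fri_given_dry, float(P_Fri_given_dry))  # 11/84 ≈ 0.1310
```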

4.4.4. Understanding the Bayesian Approach to Probability through Bayes’ Rule

Bayes’ rule forms the foundation of the Bayesian approach to probability, which interprets probabilities as degrees of belief that can be updated as new evidence emerges.

Each component of Bayes’ rule has a Bayesian interpretation:

  1. \(P(A_i)\): the prior probability

    The initial assessment of the probability of event \(A_i\).

  2. \(P(B|A_i)\): the likelihood

    The probability of observing new evidence \(B\) given that \(A_i\) holds. This measures how consistent the evidence is with \(A_i\).

  3. \(P(A_i|B)\): the posterior probability

    The updated probability of \(A_i\) accounting for the evidence \(B\).

  4. \(P(B)\): the normalizing constant

    Once the evidence \(B\) is observed, the sample space shrinks to only the region that would have made \(B\) possible. Computing a posterior probability involves a step where we divide probabilities by \(P(B)\), the size of the new whole (see the second step in the derivation of Bayes’ rule).

As we gather more evidence, we can repeatedly apply Bayes’ rule, using the posterior probability from one calculation as the prior probability for the next. This iterative process allows our probability assessments to continuously improve as we incorporate new information.
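
The sketch below illustrates this iterative updating with made-up numbers: two competing hypotheses start with equal priors, the same piece of evidence is observed twice, and each posterior becomes the prior for the next update.

```python
def update(priors, likelihoods):
    """One application of Bayes' rule over a partition of hypotheses."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    return [j / sum(joint) for j in joint]

beliefs = [0.5, 0.5]                          # equal priors for two hypotheses
for likelihoods in [(0.8, 0.3), (0.8, 0.3)]:  # the same evidence, observed twice
    beliefs = update(beliefs, likelihoods)    # yesterday's posterior is today's prior
print(beliefs)                                # ≈ [0.877, 0.123]
```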

Comprehensive Example💡: Medical Testing

Consider a disease that affects a small percentage of the population and a diagnostic test used to detect it.

  • Let \(D\) be the event that a person has the disease.

  • Let \(+\) be the event that the test gives a positive result.

  • Define \(D'\) and \(-\) as the complements of \(D\) and \(+\), respectively.

Given these events, we can identify:

  • \(P(D)\): The prevalence of the disease in the population (prior probability)

  • \(P(+|D)\): The sensitivity of the test (true positive rate)

  • \(P(+|D')\): The false positive rate (1 - specificity)

What doctors and patients typically want to know is \(P(D|+)\), the probability that a person has the disease given a positive test result. This posterior probability can be calculated using Bayes’ rule:

\[P(D|+) = \frac{P(+|D)P(D)}{P(+|D) P(D) + P(+|D')P(D')}\]

Suppose a disease affects 1% of the population, the test has a sensitivity of 95%, and a specificity of 90%. What is the probability that someone with a positive test result actually has the disease?

Step 1: Write the building blocks in mathematical notation

  • \(P(D) = 0.01\)

  • \(P(+|D) = 0.95\)

  • \(P(+|D') = 1-P(-|D') = 1 - 0.9 = 0.1\)

Step 2: Compute the posterior probability

\[\begin{split}P(D|+) &= \frac{(0.95)(0.01)}{(0.95)(0.01) + (0.1)(0.99)} \\ &= \frac{0.0095}{0.0095 + 0.099} \\ &= \frac{0.0095}{0.1085} \\ &\approx 0.0876\end{split}\]

Despite the test being quite accurate (95% sensitivity, 90% specificity), the probability that a positive result indicates disease is less than 9%. This illustrates the importance of considering the base rate (prior probability) when interpreting test results, especially for rare conditions. Even a very accurate test will generate many false positives when applied to a population where the condition is uncommon.

  • Also try solving this problem using a tree diagram, and confirm that the results are consistent.
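
A few lines of Python reproduce the calculation (numbers as given in the example):

```python
prevalence  = 0.01   # P(D)
sensitivity = 0.95   # P(+ | D)
specificity = 0.90   # P(- | D'), so the false positive rate is 1 - specificity

p_positive = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)  # P(+), total probability
p_disease_given_positive = sensitivity * prevalence / p_positive              # Bayes' rule
print(round(p_disease_given_positive, 4))   # 0.0876
```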

4.4.5. Bringing It All Together

Key Takeaways 📝

  1. The Law of Partitions decomposes the probability of an event across a partition.

  2. The Law of Total Probability expresses an event’s probability as a weighted average of conditional probabilities.

  3. Bayes’ rule lets us calculate “inverse” conditional probabilities.

  4. Tree diagrams serve as a helpful tool for applying the three rules above.

  5. Bayes’ rule forms the foundation of the Bayesian approach to probability.

Exercises

  1. Basic Concepts: A box contains 3 fair coins, 2 coins that always show heads, and 1 coin that always shows tails. If a coin is selected at random and flipped twice, showing heads both times, what is the probability that it is a fair coin?

  2. Medical Testing: A test for a certain disease has a sensitivity of 99% and a specificity of 98%. If the disease affects 0.5% of the population,

    1. What is the probability that a person with a positive test result has the disease?

    2. What is the probability that a person with a negative test result does not have the disease?

  3. Card Selection: A standard deck of 52 cards is split into two piles, one containing all 13 spades and the other containing the remaining 39 cards. If you select a pile at random and then draw a card from that pile,

    1. Use the Law of Total Probability to find the probability of drawing a king.

    2. If you draw a king, what is the probability that you selected the pile of spades?

  4. Quality Control: A factory has three machines (A, B, and C) that produce 20%, 30%, and 50% of its output, respectively. The defect rates for these machines are 1%, 2%, and 3%. If a product is selected at random and found to be defective,

    1. What is the probability it was produced by machine B?

    2. What is the total percentage of defective products across all machines?

  5. Challenge Problem: Three identical-looking coins are placed in a box. One coin is fair (\(P(Heads) = 0.5\)), one is biased with \(P(Heads) = 0.6\), and one is two-headed (\(P(Heads) = 1\)). A coin is randomly selected from the box and flipped twice. If both flips result in heads, what is the probability that the selected coin was the fair one?