4.4. Law of Total Probability and Bayes’ Rule

When we make decisions under uncertainty, we often need to revise our probability assessments as new information emerges. Medical diagnoses, legal judgments, and even everyday decisions typically involve updating our beliefs based on partial evidence. In this section, we'll develop the law of total probability and Bayes' rule, which together provide a framework for this process of learning from evidence.

Road Map 🧭

  • Define partitions of the sample space and derive the law of partitions.

  • Build upon this to establish the law of total probability.

  • Develop Bayes’ rule for inverting conditional probabilities.

4.4.1. Law of Partitions

Preliminary: What is a Partition?

A collection of events \(\{A_1, A_2, \cdots, A_n\}\) forms a partition of the sample space \(\Omega\) if:

  1. The events are mutually exclusive:

    \[A_i \cap A_j = \emptyset \text{ for all } i \neq j.\]
  2. The events are exhaustive:

    \[A_1 \cup A_2 \cup \cdots \cup A_n = \Omega.\]

In other words, a partition divides the sample space into non-overlapping pieces that, when combined, reconstruct the entire space. You can think of it as slicing a pizza—each slice represents an event in the partition, with no overlap between slices, and all slices together make up the whole pizza.
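To see the two conditions in code, here is a minimal Python sketch that checks both requirements for a hypothetical partition of the die-roll sample space (the sets `omega` and `parts` are illustrative choices, not from the text):

```python
from itertools import combinations

# Hypothetical example: the sample space of one die roll, split into
# "low" and "high" outcomes.
omega = {1, 2, 3, 4, 5, 6}
parts = [{1, 2, 3}, {4, 5, 6}]

# Condition 1 (mutually exclusive): every pair of parts is disjoint.
mutually_exclusive = all(a.isdisjoint(b) for a, b in combinations(parts, 2))

# Condition 2 (exhaustive): the union of the parts is the whole space.
exhaustive = set().union(*parts) == omega

print(mutually_exclusive and exhaustive)  # True -> the parts form a partition
```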

Given a partition of the sample space, the law of partitions provides a way to calculate the probability of an event by examining how it intersects with each part of the partition.

Note ✏️

The simplest example of a partition consists of just two events: any event \(A\) and its complement \(A'\). These two events are

  • mutually exclusive because \(A \cap A' = \emptyset\), and

  • exhaustive because together they cover the entire sample space (\(A \cup A' = \Omega\)).

Law of Partitions: The Statement

If \(\{A_1, A_2, \cdots, A_n\}\) forms a partition of the sample space \(\Omega\), then for any event \(B \subseteq \Omega\):

\[P(B) = \sum_{i=1}^{n} P(A_i \cap B)\]

What Does It Say?

Fig. 4.16 Law of partitions

When the partition consists of three events as in Fig. 4.16, the law can be written as

\[P(B) = P(A_1 \cap B) + P(A_2 \cap B) + P(A_3 \cap B).\]

The left-hand side of the equation corresponds to the relative area of the whole blue region, while each term on the right-hand side corresponds to the relative area of a smaller piece created by the overlap of \(B\) with one of the events in the partition.

The core message of the law of partitions is quite simple: the probability of the whole equals the sum of the probabilities of its parts.
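As a quick sanity check, the sketch below verifies the law on a hypothetical equally likely die roll, with \(B\) the event of rolling a prime; the numbers are illustrative:

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}           # equally likely outcomes

def prob(event):
    """P(E) = |E| / |Omega| under equally likely outcomes."""
    return Fraction(len(event), len(omega))

partition = [{1, 2}, {3, 4}, {5, 6}]
B = {2, 3, 5}                        # roll a prime number

# Law of partitions: P(B) = sum over i of P(A_i ∩ B).
total = sum(prob(A & B) for A in partition)
print(total, total == prob(B))       # 1/2 True
```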

4.4.2. Law of Total Probability

The Law of Total Probability takes the Law of Partitions one step further by rewriting the intersection probabilities using the general multiplication rule.

Reminder🔎: the general multiplication rule

For any two events C and D, \(P(C \cap D) = P(C|D) P(D) = P(D|C) P(C).\)

Statement

If \(\{A_1, A_2, \cdots, A_n\}\) forms a partition of the sample space \(\Omega\), then for any event \(B \subseteq \Omega\),

\[P(B) = \sum_{i=1}^{n} P(B|A_i) P(A_i).\]

What Does It Say?

Let us continue to use the simple three-event partition. In this case, the Law of Total Probability says

Visual representation of the law of total probability
\[P(B) = P(B|A_1)P(A_1) + P(B|A_2)P(A_2) + P(B|A_3)P(A_3).\]

The law expresses the probability of event \(B\) as a weighted average of conditional probabilities. Each weight \(P(A_i)\) represents the probability of a particular part of the sample space, and each conditional probability \(P(B|A_i)\) represents the likelihood of \(B\) given that we're in that part.
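This weighted-average reading translates directly into code. Here is a minimal sketch with made-up numbers for a three-event partition:

```python
# Made-up numbers for a three-event partition.
priors      = [0.5, 0.3, 0.2]   # P(A_1), P(A_2), P(A_3): must sum to 1
likelihoods = [0.9, 0.5, 0.1]   # P(B|A_1), P(B|A_2), P(B|A_3)

# P(B) as a weighted average of the conditional probabilities,
# weighted by the partition probabilities.
p_B = sum(l * p for l, p in zip(likelihoods, priors))
print(p_B)  # ≈ 0.62
```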

The Law of Total Probability on a Tree Diagram

Fig. 4.17 Using the Law of Total Probability with a tree diagram

Recall that when constructing a tree diagram, the set of branches extending from the same node must represent all possible outcomes given the preceding path. This requirement is, in fact, another way of saying that the events stemming from a single node must form a partition. As a result, a tree diagram provides an ideal setting for applying the Law of Total Probability.

Computing a single-stage probability \(P(B)\) using the Law of Total Probability is equivalent to

  1. finding the path probabilities of all paths involving \(B\),

  2. then summing the probabilities.

Try writing the steps down in mathematical notation and confirm that they are identical to applying the Law of Total Probability directly.

Example💡: Law of partitions and law of total probability

The tree diagram for the Indianapolis problem

Recall the Indianapolis example from the previous section. What is the probability that it rains?

\[\begin{split}P(R) &= P(R \cap Sun) + P(R \cap Sat) + P(R \cap Fri) \\ &= P(Sun)P(R|Sun) + P(Sat)P(R|Sat) + P(Fri)P(R|Fri)\\ &= 1/10 + 1/8 + 3/40 \\ &= 0.1 + 0.125 + 0.075 \\ &= 0.3\end{split}\]
  • First equality uses the Law of Partitions. The second equality uses the Law of Total Probability.
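The same computation can be organized exactly as the two tree-diagram steps above suggest; the path probabilities below are the ones from the worked example:

```python
from fractions import Fraction

# Each path probability P(day ∩ R) = P(day) * P(R|day), read off the
# tree diagram in the worked example above.
paths_with_R = {
    "Sun": Fraction(1, 10),
    "Sat": Fraction(1, 8),
    "Fri": Fraction(3, 40),
}

# Steps 1-2: collect all paths involving R, then sum their probabilities.
p_R = sum(paths_with_R.values())
print(p_R, float(p_R))  # 3/10 0.3
```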

4.4.3. Bayes’ Rule

Bayes’ rule allows us to invert conditional probabilities. That is, it allows us to compute \(P(A|B)\) from our knowledge of \(P(B|A).\)

Statement

If \(\{A_1, A_2, \cdots, A_n\}\) forms a partition of the sample space \(\Omega\), and \(B\) is an event with \(P(B) > 0\), then for any \(i=1,2,\cdots,n\),

\[P(A_i|B) = \frac{P(B|A_i)P(A_i)}{\sum_{j=1}^{n} P(B|A_j)P(A_j)}.\]

Derivation of Bayes’ Rule

\[P(A_i|B) = \frac{P(A_i \cap B)}{P(B)} = \frac{P(A_i \cap B)}{\sum_{j=1}^{n} P(B|A_j)P(A_j)} = \frac{P(B|A_i)P(A_i)}{\sum_{j=1}^{n} P(B|A_j)P(A_j)}\]
  • First equality: Definition of conditional probability

  • Second equality: Law of Total Probability for the denominator

  • Third equality: The general multiplication rule for the numerator

For the simplified case of a three-event partition, Bayes’ rule for \(P(A_1|B)\) is:

\[P(A_1|B) = \frac{P(B|A_1)P(A_1)}{P(B|A_1)P(A_1) + P(B|A_2)P(A_2) + P(B|A_3)P(A_3)}.\]
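A short Python helper makes the structure of the formula explicit. The `bayes` function and its input numbers are illustrative (the same made-up three-event partition as before):

```python
def bayes(priors, likelihoods):
    """Posterior P(A_i|B) for every part of a partition.

    priors[i]      -- P(A_i), must sum to 1
    likelihoods[i] -- P(B|A_i)
    """
    # Denominator: P(B) via the law of total probability.
    p_B = sum(l * p for l, p in zip(likelihoods, priors))
    # Numerator for each i: P(B|A_i) P(A_i), then normalize by P(B).
    return [l * p / p_B for l, p in zip(likelihoods, priors)]

# Same made-up three-event partition as before.
print(bayes([0.5, 0.3, 0.2], [0.9, 0.5, 0.1]))
# ≈ [0.726, 0.242, 0.032] -- the posteriors sum to 1
```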

Example💡: Bayes’ Rule

The Indianapolis example is continued. Knowing that it didn’t rain on the day they went to Indianapolis, find the probability that it was Friday.

\[P(Fri|R') = \frac{P(Fri \cap R')}{P(R')} = \frac{P(Fri \cap R')}{1 - P(R)} = \frac{11/120}{1 - 0.3} \approx 0.131\]
  • \(P(R')\) can be computed directly using the tree diagram or the Law of Total Probability. However, using the complement rule is more convenient since we already have \(P(R)\) from the previous part.
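A quick arithmetic check using the quantities given above:

```python
from fractions import Fraction

p_Fri_and_no_rain = Fraction(11, 120)  # P(Fri ∩ R'), from the tree diagram
p_rain = Fraction(3, 10)               # P(R), computed in the previous example

# Complement rule for the denominator, then the definition of
# conditional probability.
p_Fri_given_no_rain = p_Fri_and_no_rain / (1 - p_rain)
print(p_Fri_given_no_rain, float(p_Fri_given_no_rain))  # 11/84 ≈ 0.131
```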

4.4.4. Understanding the Bayesian Approach to Probability through Bayes’ Rule

Bayes’ rule forms the foundation of the Bayesian approach to probability, which interprets probabilities as degrees of belief that can be updated as new evidence emerges.

Each component of Bayes’ rule has a Bayesian interpretation:

  1. \(P(A_i)\): the prior probability

    Our initial assessment of the probability of event \(A_i\).

  2. \(P(B|A_i)\): the likelihood

    The probability of observing the new evidence \(B\) given that \(A_i\) holds. This measures how consistent the evidence is with \(A_i\).

  3. \(P(A_i|B)\): the posterior probability

    The updated probability of \(A_i\) accounting for the evidence \(B\).

  4. \(P(B)\): the normalizing constant

    The total probability of observing the evidence \(B\) under all possible events in the partition. This ensures the posterior probabilities sum to 1.

As we gather more evidence, we can repeatedly apply Bayes’ rule, using the posterior probability from one calculation as the prior probability for the next. This iterative process allows our probability assessments to continuously improve as we incorporate new information.
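The sketch below illustrates this iterative updating in a hypothetical setting: a coin that is either fair or two-headed, flipped repeatedly and landing heads every time. After each flip, the posterior becomes the prior for the next update:

```python
# Hypothetical setting: a coin is either fair or two-headed, with equal
# prior probability, and every observed flip comes up heads.
priors = [0.5, 0.5]          # P(fair), P(two-headed)
likelihoods = [0.5, 1.0]     # P(heads | fair), P(heads | two-headed)

for flip in range(1, 4):
    # One application of Bayes' rule: the posterior from the previous
    # flip serves as the prior for this one.
    p_heads = sum(l * p for l, p in zip(likelihoods, priors))
    priors = [l * p / p_heads for l, p in zip(likelihoods, priors)]
    print(f"after flip {flip}: P(fair | all heads so far) = {priors[0]:.3f}")

# after flip 1: 0.333, after flip 2: 0.200, after flip 3: 0.111
```

Notice how each run of heads pushes the belief further toward the two-headed coin; no single flip is decisive, but the evidence accumulates.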

Comprehensive Example💡: Medical testing

Consider a disease that affects a small percentage of the population and a diagnostic test used to detect it.

Let’s define:

  • D = “person has the disease”

  • D’ = “person does not have the disease”

  • “+” = “positive test result”

  • “-” = “negative test result”

Given these events, we can identify:

  • P(D): The prevalence of the disease in the population (prior probability)

  • P(+|D): The sensitivity of the test (true positive rate)

  • P(+|D’): The false positive rate (1 - specificity)

What doctors and patients typically want to know is P(D|+), the probability that a person has the disease given a positive test result. This posterior probability can be calculated using Bayes’ rule:

\[P(D|+) = \frac{P(+|D)P(D)}{P(+|D) P(D) + P(+|D')P(D')}\]

Suppose a disease affects 1% of the population, and the test has a sensitivity of 95% (P(+|D) = 0.95) and a specificity of 90%, so that the false positive rate is P(+|D') = 0.1. What is the probability that someone with a positive test result actually has the disease?

\[\begin{split}P(D|+) &= \frac{0.95 \times 0.01}{0.95 \times 0.01 + 0.1 \times 0.99} \\ &= \frac{0.0095}{0.0095 + 0.099} \\ &= \frac{0.0095}{0.1085} \\ &\approx 0.0876\end{split}\]
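The same arithmetic in a few lines of Python:

```python
prevalence  = 0.01   # P(D)
sensitivity = 0.95   # P(+|D)
fpr         = 0.10   # P(+|D') = 1 - specificity

# Bayes' rule: P(D|+) = P(+|D)P(D) / [P(+|D)P(D) + P(+|D')P(D')].
numerator   = sensitivity * prevalence
denominator = numerator + fpr * (1 - prevalence)
print(numerator / denominator)  # ≈ 0.0876
```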

This result is surprising. Despite the test being quite accurate (95% sensitivity, 90% specificity), the probability that a positive test result indicates disease is less than 9%. This is because the disease is rare in the population, so most positive results are false positives.

This example illustrates the importance of considering the base rate (prior probability) when interpreting test results, especially for rare conditions. Even a very accurate test will generate many false positives when applied to a population where the condition is uncommon.

4.4.5. Bringing It All Together

Key Takeaways 📝

  1. The law of partitions decomposes the probability of an event across a partition.

  2. The law of total probability expresses an event’s probability as a weighted average of conditional probabilities.

  3. Bayes’ rule lets us calculate “inverse” conditional probabilities.

  4. Tree diagrams serve as a helpful visual aid for applying the three rules above.

  5. Bayes’ rule forms the foundation of the Bayesian approach to probability.

Exercises

  1. Basic Concepts: A box contains 3 fair coins, 2 coins that always show heads, and 1 coin that always shows tails. If a coin is selected at random and flipped twice, showing heads both times, what is the probability that it is a fair coin?

  2. Medical Testing: A test for a certain disease has a sensitivity of 99% (P(+|D) = 0.99) and a specificity of 98% (P(-|D’) = 0.98). If the disease affects 0.5% of the population:

    1. What is the probability that a person with a positive test result has the disease?

    2. What is the probability that a person with a negative test result does not have the disease?

  3. Card Selection: A standard deck of 52 cards is split into two piles, one containing all 13 spades and the other containing the remaining 39 cards. If you select a pile at random and then draw a card from that pile:

    1. Use the law of total probability to find the probability of drawing a king.

    2. If you draw a king, what is the probability that you selected the pile of spades?

  4. Quality Control: A factory has three machines (A, B, and C) that produce 20%, 30%, and 50% of its output, respectively. The defect rates for these machines are 1%, 2%, and 3%. If a product is selected at random and found to be defective:

    1. What is the probability it was produced by machine B?

    2. What is the total percentage of defective products across all machines?

  5. Challenge Problem: Three identical-looking coins are placed in a box. One coin is fair (P(Heads) = 0.5), one is biased with P(Heads) = 0.6, and one is two-headed (P(Heads) = 1). A coin is randomly selected from the box and flipped twice. If both flips result in heads, what is the probability that the selected coin was the fair one?