Worksheet 3: Conditional Probability and Bayes’ Theorem

Learning Objectives 🎯

  • Understand conditional probability and its notation

  • Create and interpret tree diagrams for multi-stage experiments

  • Apply the multiplication rule and law of total probability

  • Use Bayes’ theorem to update probabilities with new information

  • Implement probability calculations in R

Introduction

Conditional probability allows us to understand how the likelihood of one event changes when we have additional information about another event. It does not assume one event causes another. Instead, it refines our understanding of probabilities by narrowing down the possibilities based on what we know.

Examples:

  • If you know it is cloudy, the likelihood of rain may increase because clouds are often associated with rain

  • If you know the card drawn from a deck is a spade, you can reassess the probability that the card is red — it drops from 1/2 to 0

For example, imagine you are selecting a gummy bear from one of several jars. The probability of selecting a red gummy depends on which jar you chose. If you know the jar you picked, you can narrow down the probability of getting a red gummy. Conditional probability helps us formalize this reasoning.

We write the conditional probability of drawing a red gummy from the i-th jar as:

\[P(\text{Red Gummy}|\text{Jar}_i)\]

Here, the vertical bar | is read “given that.” It indicates that we are finding the probability of the event “a red gummy is drawn,” given that the selected jar is the i-th jar, Jarᵢ.
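In general, for events A and B with P(B) > 0, conditional probability is defined as:

\[P(A \mid B) = \frac{P(A \cap B)}{P(B)}\]

This definition underlies every calculation in this worksheet: conditioning on B restricts attention to the outcomes where B occurs and rescales their probabilities so they sum to 1.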

Part 1: Understanding Conditional Probability

Question 1: To understand the concept of conditional probability, consider three jars filled with gummy bears:

  • Jar₁ contains 30 red, 10 green, and 10 blue gummies

  • Jar₂ contains 20 red and 40 green gummies

  • Jar₃ contains 35 yellow gummies

Answer the following questions using formal probability statements and clearly show your work:

  1. Suppose you are handed Jar₃ and you randomly sample a single gummy bear. Compute the probability that the sampled gummy bear is red.

  2. Suppose instead you are handed Jar₂ and you randomly sample a single gummy bear. Compute the probability that the sampled gummy bear is red.

  3. Suppose you are handed Jar₁ and you randomly sample a single gummy bear. Compute the probability that the sampled gummy bear is either red or blue.

  4. Instead suppose you were not allowed to see the contents of the selected jar but are aware of the distribution of colors in each jar. You randomly sample one gummy, and it is yellow. Determine the probability that the gummy came from each of the three jars (Jar₁, Jar₂, or Jar₃).
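As a template for the formal statements requested above (using a color the questions do not ask about), the probability of drawing a green gummy given Jar₁ is:

\[P(\text{Green} \mid \text{Jar}_1) = \frac{10}{30 + 10 + 10} = \frac{10}{50} = \frac{1}{5}\]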

R Code for Visualization:

# Define jar contents
jar1 <- c(red = 30, green = 10, blue = 10)
jar2 <- c(red = 20, green = 40)
jar3 <- c(yellow = 35)

# Create a visualization of jar contents
library(ggplot2)

# Prepare data for plotting
jar_data <- data.frame(
  jar = c(rep("Jar 1", 3), rep("Jar 2", 2), "Jar 3"),
  color = c(names(jar1), names(jar2), names(jar3)),
  count = c(jar1, jar2, jar3)
)

# Create bar plot
ggplot(jar_data, aes(x = jar, y = count, fill = color)) +
  geom_bar(stat = "identity", position = "stack") +
  scale_fill_manual(values = c("red" = "red", "green" = "green",
                               "blue" = "blue", "yellow" = "gold")) +
  labs(title = "Gummy Bear Distribution in Jars",
       x = "Jar", y = "Number of Gummies") +
  theme_minimal()

Part 2: Tree Diagrams and Sequential Sampling

A tree diagram is a visual representation of the general multiplication rule. It illustrates all possible outcomes of a sequence of events, where each branch of the tree represents a possible event, and probabilities are assigned to these branches. By following the branches of the tree, we can calculate the probabilities of different outcomes by multiplying the conditional probabilities along the paths.

Tree diagrams also rely on the principles of mutual exclusivity and exhaustiveness:

  • Mutual exclusivity ensures that each complete path through the tree represents a distinct and non-overlapping outcome

  • Exhaustiveness ensures that all possible outcomes are represented exactly once in the tree, so the probabilities of the mutually exclusive outcomes add up to 1

Together, these two properties ensure that the tree diagram forms a partition of the sample space, dividing it into distinct and complete subsets of outcomes.
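Because the jar choices partition the sample space, the law of total probability expresses the unconditional probability of any event B (for example, a particular draw outcome) as a weighted sum over the branches:

\[P(B) = \sum_{i=1}^{3} P(\text{Jar}_i)\, P(B \mid \text{Jar}_i)\]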

Question 2: Considering the same jars of gummy bears, suppose you randomly select one jar, with each jar being “equally likely” to be chosen:

\[P(\text{Jar}_1) = P(\text{Jar}_2) = P(\text{Jar}_3) = \frac{1}{3}\]

Next you sample two gummies from the selected jar without replacement.

  1. Create a tree diagram to represent all possible outcomes of sampling two gummies. The tree diagram should convey all probabilities clearly, showing:

    • Unconditional probabilities (e.g., the probability of selecting each jar)

    • Conditional probabilities (e.g., the probabilities of drawing specific gummy colors on the first and second draws, accounting for the reduced contents of the jar after the first draw)

    • Intersection probabilities (e.g., the probabilities at the end of each path, representing the probability of a specific sequence of events occurring)

  2. Logically explain why your tree diagram satisfies both mutual exclusivity (each complete path represents a distinct, non-overlapping outcome) and exhaustiveness (all possible outcomes of the experiment are included in the diagram, and their probabilities sum to 1).

  3. Using your tree diagram, compute the following probabilities (try to maintain mathematical formalism):

    1. The probability of drawing two yellow gummies

    2. The probability of drawing two blue gummies

    3. The probability of drawing two green gummies given that you know the samples came from Jar₁

    4. The probability of drawing two green gummies given that you know the samples came from Jar₂

    5. Use the law of total probability to determine the probability of sampling two green gummies and relate it to the paths of the tree diagram
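For reference, a single path through such a tree multiplies one unconditional probability by the conditional probabilities along the way. For instance, the path “Jar₁, then red on the first draw, then red on the second” (a color combination not asked about above) has intersection probability:

\[P(\text{Jar}_1 \cap R_1 \cap R_2) = \frac{1}{3} \cdot \frac{30}{50} \cdot \frac{29}{49}\]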

R Code for Tree Diagram Calculations:

# Function to calculate probability of drawing two gummies of same color
# without replacement from a specific jar
prob_two_same <- function(jar_contents, color) {
  total <- sum(jar_contents)
  if (!(color %in% names(jar_contents))) return(0)

  n_color <- jar_contents[[color]]  # [[ ]] drops the name so the result is a bare number
  if (n_color < 2) return(0)

  # P(first is color) * P(second is color | first was color)
  prob <- (n_color / total) * ((n_color - 1) / (total - 1))
  return(prob)
}

# Calculate specific probabilities
# P(two yellow gummies | Jar 3)
p_yy_given_jar3 <- prob_two_same(jar3, "yellow")
cat("P(YY|Jar3) =", p_yy_given_jar3, "\n")

# P(two blue gummies | each jar)
p_bb_given_jar1 <- prob_two_same(jar1, "blue")
p_bb_given_jar2 <- prob_two_same(jar2, "blue")  # Will be 0
p_bb_given_jar3 <- prob_two_same(jar3, "blue")  # Will be 0

# Law of total probability for two blue gummies
p_bb_total <- (1/3) * p_bb_given_jar1 + (1/3) * p_bb_given_jar2 + (1/3) * p_bb_given_jar3
cat("P(BB) =", p_bb_total, "\n")
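As a sanity check, the analytic value above can be compared against a quick Monte Carlo simulation. This is an optional sketch, not part of the worksheet; the jar contents are restated so the snippet runs on its own:

```r
# Monte Carlo check of P(two blue gummies) -- an illustrative sketch
set.seed(1)
jars <- list(c(red = 30, green = 10, blue = 10),   # Jar 1
             c(red = 20, green = 40),              # Jar 2
             c(yellow = 35))                       # Jar 3
draw_two_blue <- function() {
  jar <- jars[[sample(3, 1)]]              # pick a jar uniformly at random
  pool <- rep(names(jar), times = jar)     # one entry per gummy
  all(sample(pool, 2) == "blue")           # two draws without replacement
}
p_hat <- mean(replicate(1e5, draw_two_blue()))
p_exact <- (1/3) * (10/50) * (9/49)        # = 3/245, about 0.0122
cat("simulated:", p_hat, " exact:", round(p_exact, 4), "\n")
```

The simulated frequency should land close to the exact value, since only Jar₁ contains any blue gummies.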

Part 3: Bayes’ Theorem and Sequential Updating

Bayes’ Theorem provides a way to compute conditional probabilities when directly calculating them is difficult. In particular, it allows us to express a probability in terms of its reverse conditional, which is often easier to determine. It helps answer questions like:

“Given an observed outcome, how should we determine the probability of an event that may have led to it?”

Beyond this basic use, Bayes’ Theorem also provides a framework for updating probabilities when additional evidence is observed, refining our estimates step by step.

“Given an initial observation, how should we revise our probabilities after receiving additional evidence?”

The first application focuses on computing a conditional probability when its direct computation is complex, while the second extends this concept to sequential updating, where each new observation refines our understanding of the underlying probabilities.
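In symbols, for a partition of hypotheses H₁, …, Hₙ and observed evidence E, Bayes’ theorem states:

\[P(H_i \mid E) = \frac{P(E \mid H_i)\, P(H_i)}{\sum_{j=1}^{n} P(E \mid H_j)\, P(H_j)}\]

The denominator is the law of total probability applied to E, so the posterior probabilities are guaranteed to sum to 1.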

Question 3: To understand the concept of Bayes’ formula, consider the same three jars filled with gummy bears:

  • Jar₁ contains 30 red, 10 green, and 10 blue gummies

  • Jar₂ contains 20 red and 40 green gummies

  • Jar₃ contains 35 yellow gummies

Suppose you randomly select one jar, with each jar being “equally likely” to be chosen. Next, you draw one gummy bear without looking at the contents of the jar and observe that it is red.

  1. Compute the probability that the red gummy bear came from each jar (Jar₁, Jar₂, or Jar₃).

  2. Now assume you draw a second gummy bear from the same jar, and it is also red. Compute the updated probabilities that the jar you sampled from was from Jar₁, Jar₂, or Jar₃. Your tree diagram may be helpful in answering this question.

R Implementation of Bayes’ Theorem:

# Bayes' theorem implementation
bayes_update <- function(prior, likelihoods) {
  # Calculate posterior probabilities
  joint <- prior * likelihoods
  posterior <- joint / sum(joint)
  return(posterior)
}

# Initial setup
prior_probs <- c(1/3, 1/3, 1/3)  # P(Jar_i)

# Likelihoods of observing red from each jar
p_red_given_jar <- c(
  jar1["red"] / sum(jar1),      # P(Red|Jar1) = 30/50
  jar2["red"] / sum(jar2),      # P(Red|Jar2) = 20/60
  0                             # P(Red|Jar3) = 0/35
)

# First update: observed one red gummy
posterior_1 <- bayes_update(prior_probs, p_red_given_jar)
names(posterior_1) <- c("Jar1", "Jar2", "Jar3")

cat("After observing one red gummy:\n")
print(round(posterior_1, 4))

# For second update, calculate P(Red|Jar_i, first was red)
# This accounts for sampling without replacement
p_red2_given_jar_red1 <- c(
  (jar1["red"] - 1) / (sum(jar1) - 1),   # 29/49 for Jar1
  (jar2["red"] - 1) / (sum(jar2) - 1),   # 19/59 for Jar2
  0                                       # Still 0 for Jar3
)

# Second update: observed another red gummy
posterior_2 <- bayes_update(posterior_1, p_red2_given_jar_red1)

cat("\nAfter observing two red gummies:\n")
print(round(posterior_2, 4))

# Visualization of belief updating
update_data <- data.frame(
  jar = rep(c("Jar1", "Jar2", "Jar3"), 3),
  stage = factor(rep(c("Prior", "After 1 Red", "After 2 Red"), each = 3),
                 levels = c("Prior", "After 1 Red", "After 2 Red")),
  probability = c(prior_probs, posterior_1, posterior_2)
)

ggplot(update_data, aes(x = jar, y = probability, fill = stage)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(title = "Bayesian Updating with Sequential Observations",
       x = "Jar", y = "Probability", fill = "Stage") +
  theme_minimal() +
  ylim(0, 1)
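The first update can also be checked by simulation: repeatedly pick a jar, draw one gummy, keep only the runs where it is red, and tabulate which jar those runs used. This is an optional rejection-sampling sketch, with the jar contents restated so it runs standalone:

```r
# Rejection-sampling check of P(Jar_i | first draw is red) -- illustrative sketch
set.seed(42)
jars <- list(c(red = 30, green = 10, blue = 10),   # Jar 1
             c(red = 20, green = 40),              # Jar 2
             c(yellow = 35))                       # Jar 3
n_sims <- 1e5
picked <- sample(3, n_sims, replace = TRUE)        # uniform jar choice
color <- vapply(picked, function(i) {
  pool <- rep(names(jars[[i]]), times = jars[[i]])
  sample(pool, 1)                                  # one draw from that jar
}, character(1))
est <- prop.table(table(factor(picked[color == "red"], levels = 1:3)))
print(round(est, 4))   # analytic posterior: 9/14, 5/14, 0
```

Conditioning corresponds exactly to discarding the runs where the evidence (a red draw) did not occur, which is why the kept frequencies approximate the posterior.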

Key Takeaways

Summary 📝

  • Conditional probability P(A|B) represents the probability of A given that B has occurred

  • Tree diagrams visualize multi-stage experiments and help calculate complex probabilities

  • The multiplication rule states that P(A ∩ B) = P(A) × P(B|A)

  • The law of total probability allows us to find P(B) by summing over all possible ways B can occur

  • Bayes’ theorem provides a way to reverse conditional probabilities: P(A|B) = P(B|A) × P(A) / P(B)

  • Bayes’ theorem can be applied sequentially to update beliefs as new evidence arrives