Worksheet 2: Set Theory and Probability Fundamentals
Learning Objectives 🎯
Master fundamental set theory notation and operations
Understand the axiomatic foundation of probability
Apply set operations to calculate probabilities
Use the complement rule and inclusion-exclusion principle
Visualize complex probability problems using Venn diagrams
Implement set operations and probability calculations in R
Introduction
Probability theory is built on the foundation of set theory. Before we can rigorously discuss probabilities, we must understand how to work with sets—collections of objects that form the building blocks of probabilistic reasoning. This worksheet introduces essential set notation and operations, then connects these concepts to the fundamental axioms of probability.
Part 1: Set Theory Foundations
Set theory provides a precise mathematical language for describing collections of objects. Here are key symbols and their meanings:
- ∈ (element of)
Denotes membership in a set. For example, \(x \in E\) means \(x\) belongs to set \(E\).
- ℤ (integers)
The set of all integers: \(\{..., -2, -1, 0, 1, 2, ...\}\)
- ℕ (natural numbers)
The set of positive integers: \(\{1, 2, 3, ...\}\)
- ∩ (intersection)
Binary operator representing elements in both sets. Associated with “and” in English.
- ∪ (union)
Binary operator representing elements in either set (or both). Associated with “or” in English.
Set-Builder Notation
Set-builder notation defines sets by specifying properties their elements must satisfy. For example:
\[E = \{x \in \mathbb{Z}^+ \mid x \text{ is even and } x \leq 10\}\]
This defines \(E\) as the set of all positive integers \(x\) that are both even and at most 10.
R Implementation of Sets
# In R, we can represent sets as vectors
# Example: E = {x ∈ Z+ | x is even and x ≤ 10}
E <- seq(2, 10, by = 2)
print(E)
# Set operations in R
# Union: union(A, B)
# Intersection: intersect(A, B)
# Set difference: setdiff(A, B)
# Check membership: x %in% A
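The operations listed in the comments above can be tried on two small sets. This is a minimal sketch; the vectors `A_demo` and `B_demo` are made-up values for illustration and are not part of the worksheet's questions. The last line shows one possible way to mirror set-builder notation with base R's `Filter()`.

```r
# Two small illustrative sets stored as vectors (hypothetical values)
A_demo <- c(1, 2, 3, 4)
B_demo <- c(3, 4, 5, 6)

u <- union(A_demo, B_demo)      # elements in A_demo or B_demo (or both): 1 2 3 4 5 6
i <- intersect(A_demo, B_demo)  # elements in both sets: 3 4
d <- setdiff(A_demo, B_demo)    # elements of A_demo that are not in B_demo: 1 2
m <- 3 %in% A_demo              # membership test: TRUE

# Set-builder style: keep the x in {1, ..., 10} for which "x is even" holds
E_demo <- Filter(function(x) x %% 2 == 0, 1:10)  # 2 4 6 8 10

print(u); print(i); print(d); print(m); print(E_demo)
```

Note that `union()`, `intersect()`, and `setdiff()` drop duplicate values, which matches the mathematical convention that a set has no repeated elements.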
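Question 1: Let \(A\) and \(B\) be sets defined as follows:
\[A = \{x \in \mathbb{Z} \mid -5 \leq x \leq 5\}, \qquad B = \{x \in \mathbb{Z}^+ \mid x \text{ is even and } x \leq 10\}\]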
Further consider the sets \(C = A \cap B\) and \(D = A \cup B\).
Write out the expanded set of elements of \(A\), then separately write out the expanded set of elements for \(B\).
# Define set A
A <- -5:5
# Define set B
B <- seq(2, 10, by = 2)
# Print the sets
print(paste("A =", toString(A)))
print(paste("B =", toString(B)))
Determine the elements contained in \(C\), and separately determine the elements contained in \(D\).
# Calculate C = A ∩ B
C <- intersect(A, B)
# Calculate D = A ∪ B
D <- union(A, B)
# Print results
print(paste("C = A ∩ B =", toString(C)))
print(paste("D = A ∪ B =", toString(sort(D))))
Using set-builder notation, express the set \(C\).
Formally describe the set \(D\) in terms of the sets \(A\) and \(B\), combining English and the ‘element of’ (∈) symbol.
Part 2: Probability Axioms
Probability is a function \(P(\cdot)\) that takes a set (or event) \(E\) as input and outputs a real number \(p\) in the interval \([0, 1]\).
The Fundamental Axioms of Probability:
Non-negativity: For any event \(E\), \(0 \leq P(E) \leq 1\).
Normalization: \(P(\Omega) = 1\), where \(\Omega\) denotes the entire sample space.
Additivity: For any event \(E\), \(P(E) = \sum_{\omega \in E} P(\{\omega\})\). We add up the probabilities of all simple events \(\{\omega\}\) in \(E\).
Empty Set: It follows that \(P(\emptyset) = 0\), where \(\emptyset\) denotes the empty set.
R Demonstration: Verifying Probability Axioms
# Example: Rolling a fair die
# Sample space
omega <- 1:6
# Probability function for fair die
prob_fair_die <- function(event) {
  if (length(event) == 0) return(0)  # Empty set
  # Count the distinct outcomes of the event that lie in the sample space
  valid_outcomes <- sum(unique(event) %in% omega)
  return(valid_outcomes / length(omega))
}
# Verify axioms
# Axiom 1: 0 ≤ P(E) ≤ 1
event1 <- c(1, 2, 3)
print(paste("P({1,2,3}) =", prob_fair_die(event1)))
# Axiom 2: P(Ω) = 1
print(paste("P(Ω) =", prob_fair_die(omega)))
# Consequence of the axioms: P(∅) = 0
print(paste("P(∅) =", prob_fair_die(c())))
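The fair-die demonstration does not exercise the additivity axiom directly, because every outcome there has the same probability. As a sketch under assumed values, the block below uses a hypothetical loaded die (the probabilities in `p_outcome` are invented for illustration) and computes \(P(E)\) by summing the probabilities of the simple events in \(E\). The helper name `prob_event` is also an assumption, not something defined elsewhere in this worksheet.

```r
# Hypothetical loaded die: outcome probabilities invented for illustration
p_outcome <- c("1" = 0.1, "2" = 0.1, "3" = 0.1, "4" = 0.2, "5" = 0.2, "6" = 0.3)

# Axiom 3: P(E) is the sum of the probabilities of the simple events in E
prob_event <- function(event, p = p_outcome) {
  outcomes <- intersect(unique(as.character(event)), names(p))
  sum(p[outcomes])  # the sum over an empty event is 0
}

low  <- c(1, 2, 3)
high <- c(4, 5, 6)

print(prob_event(low))                # probability of a low roll
print(prob_event(1:6))                # normalization: should equal 1
print(prob_event(c()))                # empty event: 0
# For disjoint events, the probability of the union is the sum of the parts
print(all.equal(prob_event(union(low, high)),
                prob_event(low) + prob_event(high)))
```

Comparisons use `all.equal()` rather than `==` because sums of decimal fractions are subject to floating-point rounding.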
Question 2: Using these axioms, answer the following questions:
What does it mean for \(P\) to be a function that operates on sets rather than directly on elements of the sample space or numerical values? Why must the input to \(P(\cdot)\) always be a set?
Explain why the following statement is not a valid probability expression: \(P(A) \cap P(B) \cap P(C)\).
If \(A \subseteq B\), use axiom 3 to justify why \(P(A) \leq P(B)\).
The complement of a set \(E\), denoted \(E'\), is defined as \(E' = \{\omega \in \Omega \mid \omega \notin E\}\). Using axiom 2 and axiom 3, derive the complement rule \(P(E') = 1 - P(E)\).
# Demonstrate complement rule
E <- c(1, 2, 3)
E_complement <- setdiff(omega, E)
print(paste("E =", toString(E)))
print(paste("E' =", toString(E_complement)))
print(paste("P(E) =", prob_fair_die(E)))
print(paste("P(E') =", prob_fair_die(E_complement)))
print(paste("P(E) + P(E') =", prob_fair_die(E) + prob_fair_die(E_complement)))
Part 3: Applying Probability Rules
Why Formality and Intermediate Steps Matter
Writing probability statements explicitly and showing intermediate steps ensures:
Clarity: Identifies the correct rules and logic to apply
Accuracy: Reduces errors, especially in multi-step calculations
Preparation for Complexity: Builds habits needed for advanced problems
Communication Skills: Clear steps improve ability to explain and justify work
Question 3: Let \(E_1\) and \(E_2\) be two events of a sample space \(\Omega\), with known probabilities:
\[P(E_1) = 0.3, \qquad P(E_2) = 0.6, \qquad P(E_1 \cup E_2) = 0.75\]
Calculate the following probabilities. Write out probability statements explicitly before performing calculations and include all intermediate steps.
R Helper Functions
# Function to calculate P(A ∩ B) given P(A), P(B), and P(A ∪ B)
prob_intersection <- function(p_a, p_b, p_union) {
  # Using: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
  return(p_a + p_b - p_union)
}
# Given values
p_e1 <- 0.3
p_e2 <- 0.6
p_union <- 0.75
# Calculate P(E1 ∩ E2)
p_intersection_e1_e2 <- prob_intersection(p_e1, p_e2, p_union)
print(paste("P(E1 ∩ E2) =", p_intersection_e1_e2))
Calculate the probability that both \(E_1'\) and \(E_2'\) occur simultaneously.
Calculate the probability that both \(E_1\) and \(E_2\) occur simultaneously.
Calculate the probability that both \(E_1'\) and \(E_2\) occur simultaneously.
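Two identities are useful for the parts above: De Morgan's law, \(P(E_1' \cap E_2') = 1 - P(E_1 \cup E_2)\), and the difference rule, \(P(E_1' \cap E_2) = P(E_2) - P(E_1 \cap E_2)\). The sketch below checks both by direct enumeration on a fair die rather than on the values given in Question 3. The events `X` and `Y` and the helper `p_die` are hypothetical names chosen for this illustration.

```r
# Fair die sample space and a counting-based probability function
die <- 1:6
p_die <- function(event) length(intersect(event, die)) / length(die)

# Two illustrative events (hypothetical choices)
X <- c(1, 2, 3)
Y <- c(2, 4, 6)
X_c <- setdiff(die, X)  # complement of X
Y_c <- setdiff(die, Y)  # complement of Y

# De Morgan: "neither X nor Y" is the complement of "X or Y"
lhs_demorgan <- p_die(intersect(X_c, Y_c))
rhs_demorgan <- 1 - p_die(union(X, Y))

# Difference rule: "Y but not X" removes the overlap from Y
lhs_diff <- p_die(intersect(X_c, Y))
rhs_diff <- p_die(Y) - p_die(intersect(X, Y))

print(all.equal(lhs_demorgan, rhs_demorgan))
print(all.equal(lhs_diff, rhs_diff))
```

Enumeration only confirms the identities for this particular example; the general justification still comes from the axioms.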
Part 4: The Inclusion-Exclusion Principle
Question 4: A festival raffle has a total of \(N\) tickets, divided into the following categories of winners:
\(|A| = 40\): Tickets that win electronics
\(|B| = 30\): Tickets that win gift cards
\(|C| = 20\): Tickets that win home appliances
\(|A \cap B| = 10\): Tickets that win both electronics and gift cards
\(|A \cap C| = 5\): Tickets that win both electronics and home appliances
\(|B \cap C| = 3\): Tickets that win both gift cards and home appliances
\(|A \cap B \cap C| = 2\): Tickets that win in all three categories
The remaining 432 tickets do not win any prizes
Venn Diagram Template
Fill in each region of the Venn diagram below with the number of tickets:
R Visualization Exercise 🖥️
Creating a Venn Diagram with R and AI Assistance
Use your favorite AI assistant (ChatGPT, Claude, etc.) to help you create a Venn diagram visualization in R. Follow these prompting strategies:
Step 1: Initial Setup Prompt
“I need to create a Venn diagram in R for a probability problem. I have three sets A, B, and C with the following properties: |A|=40, |B|=30, |C|=20, |A∩B|=10, |A∩C|=5, |B∩C|=3, |A∩B∩C|=2. What R package would you recommend for creating Venn diagrams, and how do I install it?”
Step 2: Understanding the Package
“Can you explain how the ggVennDiagram package works? What format does it expect the data in? I need to represent sets with specific intersection counts.”
Step 3: Creating the Diagram
“Help me create lists/vectors in R that will produce a Venn diagram with exactly these intersection counts. I want the diagram to show the actual numbers in each region.”
Step 4: Customization
“How can I customize the colors, labels, and title of my Venn diagram? I want Electronics in red, Gift Cards in blue, and Home Appliances in green.”
Verification Questions to Ask Your AI:
“How can I verify that my lists produce the correct intersection counts?”
“What’s the difference between the total count |A| and the exclusive ‘only A’ region?”
“Can you show me how to use R’s intersect() function to check my work?”
Learning Goals:
Through this exercise, you should understand:
- How Venn diagram packages represent overlapping sets
- The relationship between set notation and R’s list/vector structures
- How to verify your mathematical calculations using R functions
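One way to approach the verification questions is to count every exclusive region of a three-set diagram with base R set functions. The following is a sketch only: `venn_region_counts` is a hypothetical helper, and the toy sets are invented so that the raffle answer is not given away. The same named-list layout, `list(A = ..., B = ..., C = ...)`, is assumed to be the kind of input a Venn diagram package would take, but check the package documentation before relying on that.

```r
# Count the seven exclusive regions of a three-set Venn diagram
venn_region_counts <- function(A, B, C) {
  c(
    only_A  = length(setdiff(A, union(B, C))),
    only_B  = length(setdiff(B, union(A, C))),
    only_C  = length(setdiff(C, union(A, B))),
    only_AB = length(setdiff(intersect(A, B), C)),
    only_AC = length(setdiff(intersect(A, C), B)),
    only_BC = length(setdiff(intersect(B, C), A)),
    ABC     = length(Reduce(intersect, list(A, B, C)))
  )
}

# Toy sets of element IDs (hypothetical, not the raffle data)
toy_A <- 1:6
toy_B <- 4:9
toy_C <- c(1, 5, 6, 9, 10)

regions <- venn_region_counts(toy_A, toy_B, toy_C)
print(regions)

# The total |A| is the sum of the four regions that lie inside A
print(length(toy_A) == sum(regions[c("only_A", "only_AB", "only_AC", "ABC")]))
```

The final check illustrates the difference between the total count \(|A|\) and the exclusive "only A" region: the total includes every overlap with the other sets.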
Determine \(N\): Using the inclusion-exclusion principle and additional knowledge, calculate the total number of tickets \(N\).
The inclusion-exclusion principle for three sets states:
\[|A \cup B \cup C| = |A| + |B| + |C| - |A \cap B| - |A \cap C| - |B \cap C| + |A \cap B \cap C|\]
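The three-set inclusion-exclusion formula can be sanity-checked numerically before applying it to the raffle. The sketch below wraps the formula in a hypothetical helper, `union_size_3`, and compares its result with a direct count on small invented sets (`s1`, `s2`, `s3` are illustration values, not worksheet data).

```r
# Inclusion-exclusion for three sets, written in terms of set sizes
union_size_3 <- function(n_a, n_b, n_c, n_ab, n_ac, n_bc, n_abc) {
  n_a + n_b + n_c - n_ab - n_ac - n_bc + n_abc
}

# Small illustrative sets (hypothetical values)
s1 <- 1:6
s2 <- 4:9
s3 <- c(1, 5, 6, 9, 10)

by_formula <- union_size_3(
  length(s1), length(s2), length(s3),
  length(intersect(s1, s2)),
  length(intersect(s1, s3)),
  length(intersect(s2, s3)),
  length(Reduce(intersect, list(s1, s2, s3)))
)
by_counting <- length(Reduce(union, list(s1, s2, s3)))

print(c(formula = by_formula, direct = by_counting))
```

For the raffle, the same function can be called with the seven given counts to obtain the number of winning tickets; the non-winning tickets then have to be added to reach \(N\).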
After determining \(N\), calculate the following probabilities:
The probability of randomly selecting a ticket that wins in exactly one category.
The probability of randomly selecting a ticket that wins in at least two categories.
The probability of randomly selecting a ticket that wins in exactly two categories.
The probability of randomly selecting a ticket that does not win electronics and does not win any gift cards.
Simulation Exercise
Verify your probability calculations through simulation:
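# Region counts derived from the given set sizes.
# Compare these with the values you filled in on your Venn diagram.
all_three <- 2
only_AB <- 10 - all_three   # electronics and gift cards only
only_AC <- 5 - all_three    # electronics and home appliances only
only_BC <- 3 - all_three    # gift cards and home appliances only
only_A <- 40 - only_AB - only_AC - all_three
only_B <- 30 - only_AB - only_BC - all_three
only_C <- 20 - only_AC - only_BC - all_three
n_none <- 432               # tickets that win nothing

# Simulate the raffle
simulate_raffle <- function(n_sim = 10000) {
  # Create the population of tickets, one label per ticket
  tickets <- c(
    rep("A", only_A),
    rep("B", only_B),
    rep("C", only_C),
    rep("AB", only_AB),
    rep("AC", only_AC),
    rep("BC", only_BC),
    rep("ABC", all_three),
    rep("None", n_none)
  )
  # Simulate draws (each draw picks one ticket uniformly at random)
  draws <- sample(tickets, n_sim, replace = TRUE)
  # Calculate empirical probabilities
  cat("\nSimulation Results (", n_sim, "draws):\n")
  cat("P(exactly one) ≈", mean(draws %in% c("A", "B", "C")), "\n")
  cat("P(at least two) ≈", mean(draws %in% c("AB", "AC", "BC", "ABC")), "\n")
  cat("P(exactly two) ≈", mean(draws %in% c("AB", "AC", "BC")), "\n")
  cat("P(not A and not B) ≈", mean(draws %in% c("C", "None")), "\n")
}
# Run simulation after calculating N
simulate_raffle()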
Key Takeaways
Summary 📝
Set theory provides the mathematical foundation for probability
Probability is a function that maps sets (events) to numbers in [0,1]
The axioms of probability ensure consistency and allow derivation of rules
The complement rule and inclusion-exclusion principle are powerful tools
Venn diagrams help visualize complex probability relationships
R provides practical tools for implementing set operations and verifying probability calculations
Submission Guidelines
Show all work and intermediate steps
Use proper mathematical notation
Write probability statements explicitly before calculating
Double-check that all probabilities are between 0 and 1
Fill in all Venn diagram regions clearly