Worksheet 2: Set Theory and Probability Fundamentals

Learning Objectives 🎯

Master fundamental set theory notation and operations
Understand the axiomatic foundation of probability
Apply set operations to calculate probabilities
Use the complement rule and inclusion-exclusion principle
Visualize complex probability problems using Venn diagrams
Implement set operations and probability calculations in R

Introduction

Probability theory is built on the foundation of set theory. Before we can rigorously discuss probabilities, we must understand how to work with sets—collections of objects that form the building blocks of probabilistic reasoning. This worksheet introduces essential set notation and operations, then connects these concepts to the fundamental axioms of probability.

Part 1: Set Theory Foundations

Set theory provides a precise mathematical language for describing collections of objects. Here are key symbols and their meanings:

∈ (element of): Denotes membership in a set. For example, \(x \in E\) means \(x\) belongs to set \(E\).
ℤ (integers): The set of all integers: \(\{..., -2, -1, 0, 1, 2, ...\}\)
ℕ (natural numbers): The set of positive integers: \(\{1, 2, 3, ...\}\)
∩ (intersection): Binary operator representing elements in both sets. Associated with “and” in English.
∪ (union): Binary operator representing elements in either set (or both). Associated with “or” in English.

Set-Builder Notation

Set-builder notation defines sets by specifying properties their elements must satisfy. For example:

\[E = \{x \in \mathbb{Z}^+ \mid x \text{ is even and } x \leq 10\}\]

This defines \(E\) as the set of all positive integers \(x\) that are both even and at most 10.

R Implementation of Sets

# In R, we can represent sets as vectors
# Example: E = {x ∈ Z+ | x is even and x ≤ 10}
E <- seq(2, 10, by = 2)
print(E)

# Set operations in R
# Union: union(A, B)
# Intersection: intersect(A, B)
# Set difference: setdiff(A, B)
# Check membership: x %in% A
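
As a minimal sketch of the functions named in the comments above, the following applies each one to `E` and a second set `G`. The set `G` and the finite range `candidates` are assumptions introduced only for this illustration. The filtering step shows how a logical condition in R mirrors the property in set-builder notation.

```r
# Build E by filtering, mirroring E = {x in Z+ | x is even and x <= 10}
candidates <- 1:20   # finite stand-in for the positive integers (assumption)
E <- candidates[candidates %% 2 == 0 & candidates <= 10]

# G is a hypothetical second set used only for this illustration
G <- 1:4

E_or_G    <- union(E, G)      # elements in E, in G, or in both
E_and_G   <- intersect(E, G)  # elements in both E and G
E_not_G   <- setdiff(E, G)    # elements of E that are not in G
four_in_E <- 4 %in% E         # TRUE if 4 is an element of E

print(sort(E_or_G))
print(E_and_G)
print(E_not_G)
print(four_in_E)
```

Note that `union()`, `intersect()`, and `setdiff()` drop duplicate values, so a vector passed to them behaves like a set even if it contains repeats.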

Question 1: Let \(A\) and \(B\) be sets defined as follows:

\[ \begin{align}\begin{aligned}A = \{x \in \mathbb{Z} \mid -5 \leq x \leq 5\}\\B = \{x \in \mathbb{N} \mid x \text{ is even and } x \leq 10\}\end{aligned}\end{align} \]

Further consider the sets \(C = A \cap B\) and \(D = A \cup B\).

a) Write out the expanded set of elements of \(A\), then separately write out the expanded set of elements for \(B\). Answer this question by hand and then use the R code below to confirm your answer.

# Define set A using the colon operator, a shorthand for a sequence of integers that increments by 1.
A <- -5:5

# Define set B using the seq() function
B <- seq(from = 2, to = 10, by = 2)

# Print the sets
print(paste("A =", toString(A)))
print(paste("B =", toString(B)))

b) Determine the elements contained in \(C\), and separately determine the elements contained in \(D\). Answer this question by hand and then use the R code below to confirm your answer.

# Calculate C = A ∩ B (see help(intersect))
C <- intersect(A, B)

# Calculate D = A ∪ B (see help(union))
D <- union(A, B)

# Print results
print(paste("C = (A ∩ B) = {", toString(C), "}"))
print(paste("D = (A ∪ B) = {", toString(sort(D)), "}"))

c) Using set-builder notation, express the set \(C\).

d) Formally describe the set \(D\) in terms of the sets \(A\) and \(B\), combining English and the ‘element of’ (∈) symbol.

Part 2: Probability Axioms

Probability is a function \(P(\cdot)\) that takes a set (or event) \(E\) as input and outputs a real number \(p\) in the interval \([0, 1]\).

The Kolmogorov Axioms, the Fundamental Axioms of Probability (for discrete sample spaces):

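1. Non-negativity: For any event \(E\), \(P(E) \geq 0\).
2. Unitarity: \(P(\Omega) = 1\), where \(\Omega\) denotes the entire sample space.
3. Additivity: For any event \(E\), \(P(E) = \sum_{\omega \in E} P(\omega)\), where \(P(\omega)\) is shorthand for \(P(\{\omega\})\), the probability of the event containing the single outcome \(\omega\).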

Properties Derived from the Axioms:

Empty set has probability zero: \(P(\emptyset) = 0\) (empty sum equals zero).

Bounded above: For any event \(E\), \(P(E) \leq 1\)

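Proof: Since \(E \subseteq \Omega\) and every term \(P(\omega)\) is non-negative, adding the terms for outcomes outside \(E\) cannot decrease the sum, so \(P(E) = \sum_{\omega \in E} P(\omega) \leq \sum_{\omega \in \Omega} P(\omega) = P(\Omega) = 1\).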
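
The axioms and derived properties above can be checked numerically. The following is a minimal sketch, assuming a fair six-sided die as the sample space. The die and the names `omega`, `mass`, `P`, and `evens` are illustrative choices, not part of the worksheet.

```r
# Hypothetical sample space: one roll of a fair six-sided die
omega <- 1:6
mass <- rep(1 / 6, length(omega))  # probability of each outcome (assumed equally likely)
names(mass) <- omega

# P takes an event (a subset of omega) and sums the outcome probabilities,
# which is the additivity axiom for a discrete sample space
P <- function(event) {
  stopifnot(all(event %in% omega))  # an event must be a subset of the sample space
  sum(mass[as.character(event)])
}

evens <- c(2, 4, 6)

print(P(evens) >= 0)                   # non-negativity
print(isTRUE(all.equal(P(omega), 1)))  # unitarity, up to floating-point error
print(P(integer(0)))                   # empty event: an empty sum is 0
print(P(evens) <= 1)                   # bounded above
```

The function `P` accepts a vector of outcomes rather than a single number's probability, which reflects the definition of probability as a function on events.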

Question 2: Using these axioms, answer the following questions:

What does it mean for \(P\) to be a function that operates on sets rather than directly on elements of the sample space or numerical values? Why must the input to \(P(\cdot)\) always be a set?
Explain why the following statement is not a valid probability expression: \(P(A) \cap P(B) \cap P(C)\).
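If \(A \subset B\), use axioms 1 and 3 to justify why \(P(A) \leq P(B)\). (The inequality need not be strict, because the outcomes in \(B\) that are not in \(A\) may have probability zero.)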
The complement of a set \(E\), denoted \(E'\), is defined as \(E' = \{\omega \in \Omega \mid \omega \notin E\}\). Using axiom 2 and axiom 3, derive the complement rule \(P(E') = 1 - P(E)\).

Part 3: Applying Probability Rules

Why Formality and Intermediate Steps Matter

Writing probability statements explicitly and showing intermediate steps ensures:

Clarity: Identifies the correct rules and logic to apply
Accuracy: Reduces errors, especially in multi-step calculations
Preparation for Complexity: Builds habits needed for advanced problems
Communication Skills: Clear steps improve ability to explain and justify work

Question 3: Let \(E_1\) and \(E_2\) be two events of a sample space \(\Omega\), with known probabilities:

\[P(E_1) = 0.3 \quad P(E_2) = 0.6 \quad P(E_1 \cup E_2) = 0.75\]

Calculate the following probabilities. Write out probability statements explicitly before performing calculations and include all intermediate steps.

Calculate the probability that both \(E_1'\) and \(E_2'\) occur simultaneously.
Calculate the probability that both \(E_1\) and \(E_2\) occur simultaneously.
Calculate the probability that both \(E_1'\) and \(E_2\) occur simultaneously.

Part 4: The Inclusion-Exclusion Principle

Question 4: A festival raffle has a total of \(N\) tickets, divided into the following categories of winners:

\(|A| = 40\): Tickets that win electronics
\(|B| = 30\): Tickets that win gift cards
\(|C| = 20\): Tickets that win home appliances
\(|A \cap B| = 10\): Tickets that win both electronics and gift cards
\(|A \cap C| = 5\): Tickets that win both electronics and home appliances
\(|B \cap C| = 3\): Tickets that win both gift cards and home appliances
\(|A \cap B \cap C| = 2\): Tickets that win in all three categories
The remaining 432 tickets do not win any prizes

Venn Diagram Template

Fill in each region of the Venn diagram below with the number of tickets:

After completing the above Venn Diagram by hand, use R to confirm your results:

R Visualization Exercise 🖥️

Creating a Venn Diagram with R and AI Assistance

Use your favorite AI assistant (ChatGPT, Claude, etc.) to help you create a Venn diagram visualization in R. Follow these prompting strategies:

Step 1: Initial Setup Prompt

“I need to create a Venn diagram in R for a probability problem. I have three sets A, B, and C with the following properties: |A|=40, |B|=30, |C|=20, |A∩B|=10, |A∩C|=5, |B∩C|=3, |A∩B∩C|=2. What R package would you recommend for creating Venn diagrams, and how do I install it?”

Step 2: Understanding the Package

“Can you explain how the ggVennDiagram package works? What format does it expect the data in? I need to represent sets with specific intersection counts.”

Step 3: Creating the Diagram

“Help me create lists/vectors in R that will produce a Venn diagram with exactly these intersection counts. I want the diagram to show the actual numbers in each region.”

Step 4: Customization

“How can I customize the colors, labels, and title of my Venn diagram? I want Electronics in red, Gift Cards in blue, and Home Appliances in green.”

Verification Questions to Ask Your AI:

“How can I verify that my lists produce the correct intersection counts?”
“What’s the difference between the total count |A| and the exclusive ‘only A’ region?”
“Can you show me how to use R’s intersect() function to check my work?”

Learning Goals:

Through this exercise, you should understand:

How Venn diagram packages represent overlapping sets
The relationship between set notation and R’s list/vector structures
How to verify your mathematical calculations using R functions
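
To make the link between region counts and R vectors concrete, here is a minimal sketch for two overlapping sets. The sets `X` and `Y` and their region counts are hypothetical, not the raffle data. Each element gets a unique ID so that the vectors reproduce the chosen counts exactly.

```r
# Hypothetical region counts for two overlapping sets X and Y
# (illustrative numbers, not the raffle data)
n_both   <- 4
n_only_x <- 3
n_only_y <- 2

# Assign unique element IDs region by region
both   <- seq_len(n_both)                        # IDs 1 to 4
only_x <- n_both + seq_len(n_only_x)             # IDs 5 to 7
only_y <- n_both + n_only_x + seq_len(n_only_y)  # IDs 8 to 9

# Each full set is its exclusive region plus the shared region
X <- c(both, only_x)
Y <- c(both, only_y)

print(length(X))                # total count |X|
print(length(setdiff(X, Y)))    # exclusive 'only X' region
print(length(intersect(X, Y)))  # shared region |X ∩ Y|
print(length(union(X, Y)))      # |X ∪ Y|
```

Venn diagram packages generally take a named list of element vectors such as `list(X = X, Y = Y)`, but the exact input format is an assumption here and should be confirmed in the documentation of the package you choose.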

Determine \(N\): Using the inclusion-exclusion principle together with the number of non-winning tickets given above, calculate the total number of tickets \(N\).

The inclusion-exclusion principle for three sets states:

\[|A \cup B \cup C| = |A| + |B| + |C| - |A \cap B| - |A \cap C| - |B \cap C| + |A \cap B \cap C|\]
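
Before applying the formula to the raffle, it can help to confirm it on small sets where every element can be listed. The following is a minimal sketch using three hypothetical sets `X`, `Y`, and `Z` (not the raffle categories). It compares a direct count of the union with the right-hand side of the formula.

```r
# Three small hypothetical sets (illustrative only)
X <- 1:6
Y <- 4:9
Z <- c(2, 5, 8, 11)

# Left-hand side: count the union directly
lhs <- length(union(union(X, Y), Z))

# Right-hand side: singles minus pairwise intersections plus the triple intersection
rhs <- length(X) + length(Y) + length(Z) -
  length(intersect(X, Y)) - length(intersect(X, Z)) - length(intersect(Y, Z)) +
  length(intersect(intersect(X, Y), Z))

print(c(direct = lhs, formula = rhs))
```

The two counts agree because elements shared by two sets are added twice and subtracted once, while elements shared by all three are added three times, subtracted three times, and added back once.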

After determining \(N\), calculate the following probabilities:
1. The probability of randomly selecting a ticket that wins in exactly one category.
2. The probability of randomly selecting a ticket that wins in at least two categories.
3. The probability of randomly selecting a ticket that wins in exactly two categories.
4. The probability of randomly selecting a ticket that wins neither electronics nor gift cards.

Key Takeaways

Summary 📝

Set theory provides the mathematical foundation for probability
Probability is a function that maps sets (events) to numbers in [0,1]
The axioms of probability ensure consistency and allow derivation of rules
The complement rule and inclusion-exclusion principle are powerful tools
Venn diagrams help visualize complex probability relationships
R provides practical tools for implementing set operations and verifying probability calculations

Submission Guidelines

Show all work and intermediate steps
Use proper mathematical notation
Write probability statements explicitly before calculating
Double-check that all probabilities are between 0 and 1
Fill in all Venn diagram regions clearly