.. _worksheet2:

Worksheet 2: Set Theory and Probability Fundamentals
====================================================

.. admonition:: Learning Objectives 🎯
   :class: info

   • Master fundamental set theory notation and operations
   • Understand the axiomatic foundation of probability
   • Apply set operations to calculate probabilities
   • Use the complement rule and inclusion-exclusion principle
   • Visualize complex probability problems using Venn diagrams
   • **Implement set operations and probability calculations in R**

Introduction
------------

Probability theory is built on the foundation of set theory. Before we can rigorously discuss probabilities, we must understand how to work with sets—collections of objects that form the building blocks of probabilistic reasoning. This worksheet introduces essential set notation and operations, then connects these concepts to the fundamental axioms of probability.

Part 1: Set Theory Foundations
-------------------------------

Set theory provides a precise mathematical language for describing collections of objects. Here are key symbols and their meanings:

.. glossary::

   ∈ (element of)
      Denotes membership in a set. For example, :math:`x \in E` means :math:`x` belongs to set :math:`E`.

   ℤ (integers)
      The set of all integers: :math:`\{..., -2, -1, 0, 1, 2, ...\}`

   ℕ (natural numbers)
      The set of positive integers: :math:`\{1, 2, 3, ...\}`

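   ∩ (intersection)
      Binary set operator: :math:`A \cap B` contains the elements that belong to **both** :math:`A` and :math:`B`. Corresponds to "and" in English.

   ∪ (union)
      Binary set operator: :math:`A \cup B` contains the elements that belong to **either** :math:`A` or :math:`B` (or both). Corresponds to the inclusive "or" in English.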

**Set-Builder Notation**

Set-builder notation defines sets by specifying properties their elements must satisfy. For example:

.. math::
   E = \{x \in \mathbb{Z}^+ \mid x \text{ is even and } x \leq 10\}

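This defines :math:`E` as the set of all positive integers :math:`x` that are both even and at most 10, where :math:`\mathbb{Z}^+` denotes the positive integers. Listing its elements gives :math:`E = \{2, 4, 6, 8, 10\}`.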
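
Set-builder notation maps naturally onto logical filtering in R: keep exactly the candidates that satisfy the defining condition. R cannot enumerate all of :math:`\mathbb{Z}^+`, so this sketch assumes a finite search window (``1:100``, an arbitrary choice that is large enough to contain every element of :math:`E`).

.. code-block:: r

   # Finite search window standing in for Z+
   # (assumption: every element of E lies in 1:100)
   candidates <- 1:100

   # Keep x only if it satisfies the condition "x is even and x <= 10"
   E_builder <- candidates[candidates %% 2 == 0 & candidates <= 10]

   print(E_builder)

The window must be wide enough for the condition at hand; if the defining property allowed values above 100, this filter would silently miss them.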

**R Implementation of Sets**

.. code-block:: r

   # In R, we can represent sets as vectors
   # Example: E = {x ∈ Z+ | x is even and x ≤ 10}
   E <- seq(2, 10, by = 2)
   print(E)
   
   # Set operations in R
   # Union: union(A, B)
   # Intersection: intersect(A, B)
   # Set difference: setdiff(A, B)
   # Check membership: x %in% A
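
The functions listed in the comments above can be tried on two small sets before tackling Question 1. This sketch uses :math:`E` from above and a made-up set ``M3`` of multiples of 3 (an illustrative choice, not one of the worksheet's sets). It also shows ``setequal()``, which compares two vectors as sets, ignoring order and duplicates.

.. code-block:: r

   # Two small example sets (illustrative only; not the sets from Question 1)
   E  <- seq(2, 10, by = 2)   # {2, 4, 6, 8, 10}
   M3 <- seq(3, 12, by = 3)   # {3, 6, 9, 12}

   # union() keeps each element once, in order of first appearance; sort() tidies it
   E_union_M3 <- sort(union(E, M3))
   E_inter_M3 <- intersect(E, M3)
   E_minus_M3 <- setdiff(E, M3)

   print(E_union_M3)   # 2 3 4 6 8 9 10 12
   print(E_inter_M3)   # 6
   print(E_minus_M3)   # 2 4 8 10

   # %in% returns TRUE/FALSE for each value on its left-hand side
   print(c(6, 7) %in% E)   # TRUE FALSE

   # setequal() ignores order, so union is symmetric as a set operation
   print(setequal(union(E, M3), union(M3, E)))   # TRUE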

**Question 1:** Let :math:`A` and :math:`B` be sets defined as follows:

.. math::
   A = \{x \in \mathbb{Z} \mid -5 \leq x \leq 5\}
   
   B = \{x \in \mathbb{N} \mid x \text{ is even and } x \leq 10\}

Further consider the sets :math:`C = A \cap B` and :math:`D = A \cup B`.

a) List the elements of :math:`A` in roster (expanded) form, then separately list the elements of :math:`B`.

.. code-block:: r

   # Define set A
   A <- -5:5
   
   # Define set B
   B <- seq(2, 10, by = 2)
   
   # Print the sets
   print(paste("A =", toString(A)))
   print(paste("B =", toString(B)))


b) Determine the elements contained in :math:`C`, and separately determine the elements contained in :math:`D`.

.. code-block:: r

   # Calculate C = A ∩ B
   C <- intersect(A, B)
   
   # Calculate D = A ∪ B
   D <- union(A, B)
   
   # Print results
   print(paste("C = A ∩ B =", toString(C)))
   print(paste("D = A ∪ B =", toString(sort(D))))


c) Using set-builder notation, express the set :math:`C`.


d) Formally describe the set :math:`D` in terms of the sets :math:`A` and :math:`B`, combining English and the 'element of' (∈) symbol.


Part 2: Probability Axioms
--------------------------

Probability is a function :math:`P(\cdot)` that takes a set (or event) :math:`E` as input and outputs a real number :math:`p` in the interval :math:`[0, 1]`.

**The Fundamental Axioms of Probability:**

1. **Non-negativity and boundedness:** For any event :math:`E`, :math:`0 \leq P(E) \leq 1`.

2. **Normalization:** :math:`P(\Omega) = 1`, where :math:`\Omega` denotes the entire sample space.

3. **Additivity:** For any event :math:`E` in a discrete sample space, :math:`P(E) = \sum_{\omega \in E} P(\{\omega\})`. That is, we add up the probabilities of all simple events :math:`\{\omega\}` contained in :math:`E`.

4. **Empty Set:** It follows that :math:`P(\emptyset) = 0`, where :math:`\emptyset` denotes the empty set.

**R Demonstration: Verifying Probability Axioms**

.. code-block:: r

   # Example: Rolling a fair die
   # Sample space
   omega <- 1:6
   
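   # Probability function for a fair die (equally likely outcomes: P(E) = |E| / |Ω|)
   prob_fair_die <- function(event) {
     if (length(event) == 0) return(0)  # Empty set
     # unique(): a set has no repeated elements, so c(1, 1) must be counted once
     valid_outcomes <- sum(unique(event) %in% omega)
     return(valid_outcomes / length(omega))
   }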
   
   # Verify axioms
   # Axiom 1: 0 ≤ P(E) ≤ 1
   event1 <- c(1, 2, 3)
   print(paste("P({1,2,3}) =", prob_fair_die(event1)))
   
   # Axiom 2: P(Ω) = 1
   print(paste("P(Ω) =", prob_fair_die(omega)))
   
   # Axiom 4: P(∅) = 0
   print(paste("P(∅) =", prob_fair_die(c())))
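
The fair-die function never exercises axiom 3 directly, because every outcome has the same probability. The sketch below assumes a hypothetical loaded die whose face probabilities are made up for illustration. It stores :math:`P(\{\omega\})` for each outcome and computes :math:`P(E)` by summing over the outcomes in :math:`E`, as axiom 3 prescribes. The name ``prob_loaded_die`` is illustrative only.

.. code-block:: r

   # Hypothetical loaded die: P({omega}) for faces 1..6 (made-up values summing to 1)
   omega <- 1:6
   p_simple <- c(0.10, 0.10, 0.15, 0.15, 0.20, 0.30)
   names(p_simple) <- omega

   # Axiom 3: P(E) is the sum of P({omega}) over the outcomes omega in E
   prob_loaded_die <- function(event) {
     event <- unique(event)               # a set has no repeated elements
     stopifnot(all(event %in% omega))     # an event must be a subset of the sample space
     sum(p_simple[as.character(event)])   # an empty sum is 0, so P(empty set) = 0
   }

   p_odd   <- prob_loaded_die(c(1, 3, 5))
   p_omega <- prob_loaded_die(omega)      # axiom 2 check: should be 1
   p_empty <- prob_loaded_die(c())

   cat("P({1,3,5}) =", p_odd, "\n")
   cat("P(Omega)   =", p_omega, "\n")
   cat("P(empty)   =", p_empty, "\n")

Replacing ``p_simple`` with ``rep(1/6, 6)`` recovers the fair-die probabilities, which shows that the fair die is just the special case of axiom 3 with equal weights.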

**Question 2:** Using these axioms, answer the following questions:

a) What does it mean for :math:`P` to be a function that operates on sets rather than directly on elements of the sample space or numerical values? Why must the input to :math:`P(\cdot)` always be a set?


b) Explain why the following statement is not a valid probability expression: :math:`P(A) \cap P(B) \cap P(C)`.


c) If :math:`A \subset B`, use axiom 3 to justify why :math:`P(A) \leq P(B)`.


d) The complement of a set :math:`E`, denoted :math:`E'`, is defined as :math:`E' = \{\omega \in \Omega \mid \omega \notin E\}`. Using axiom 2 and axiom 3, derive the complement rule :math:`P(E') = 1 - P(E)`.

.. code-block:: r

   # Demonstrate complement rule
   E <- c(1, 2, 3)
   E_complement <- setdiff(omega, E)
   
   print(paste("E =", toString(E)))
   print(paste("E' =", toString(E_complement)))
   print(paste("P(E) =", prob_fair_die(E)))
   print(paste("P(E') =", prob_fair_die(E_complement)))
   print(paste("P(E) + P(E') =", prob_fair_die(E) + prob_fair_die(E_complement)))


Part 3: Applying Probability Rules
----------------------------------

.. admonition:: Why Formality and Intermediate Steps Matter
   :class: tip
   
   Writing probability statements explicitly and showing intermediate steps ensures:
   
   1. **Clarity:** Identifies the correct rules and logic to apply
   2. **Accuracy:** Reduces errors, especially in multi-step calculations
   3. **Preparation for Complexity:** Builds habits needed for advanced problems
   4. **Communication Skills:** Clear steps improve ability to explain and justify work

**Question 3:** Let :math:`E_1` and :math:`E_2` be two events of a sample space :math:`\Omega`, with known probabilities:

.. math::
   P(E_1) = 0.3 \quad P(E_2) = 0.6 \quad P(E_1 \cup E_2) = 0.75

Calculate the following probabilities. Write out probability statements explicitly before performing calculations and include all intermediate steps.

**R Helper Functions**

.. code-block:: r

   # Function to calculate P(A ∩ B) given P(A), P(B), and P(A ∪ B)
   prob_intersection <- function(p_a, p_b, p_union) {
     # Using: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
     return(p_a + p_b - p_union)
   }
   
   # Given values
   p_e1 <- 0.3
   p_e2 <- 0.6
   p_union <- 0.75
   
   # Calculate P(E1 ∩ E2)
   p_intersection_e1_e2 <- prob_intersection(p_e1, p_e2, p_union)
   print(paste("P(E1 ∩ E2) =", p_intersection_e1_e2))
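
Once :math:`P(A \cap B)` is known, every region of a two-event Venn diagram follows from subtraction and the complement rule. The helper below, ``venn_regions_2``, is a hypothetical sketch (not part of any package). It is demonstrated on made-up values :math:`P(A) = 0.5`, :math:`P(B) = 0.4`, :math:`P(A \cup B) = 0.7` rather than on the values from Question 3.

.. code-block:: r

   # Hypothetical helper: probabilities of the four disjoint regions of a
   # two-event Venn diagram, given P(A), P(B), and P(A ∪ B)
   venn_regions_2 <- function(p_a, p_b, p_union) {
     p_both <- p_a + p_b - p_union   # P(A ∩ B), rearranged from P(A ∪ B)
     c(
       both    = p_both,             # P(A ∩ B)
       a_only  = p_a - p_both,       # P(A ∩ B')
       b_only  = p_b - p_both,       # P(A' ∩ B)
       neither = 1 - p_union         # P(A' ∩ B') = P((A ∪ B)'), by De Morgan and the complement rule
     )
   }

   # Made-up example values (not the values from Question 3)
   regions <- venn_regions_2(p_a = 0.5, p_b = 0.4, p_union = 0.7)
   print(regions)

   # The four regions partition Omega, so their probabilities must sum to 1
   print(sum(regions))

After working Question 3 by hand, you can call the helper with :math:`P(E_1)`, :math:`P(E_2)`, and :math:`P(E_1 \cup E_2)` to check your answers.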

a) Calculate the probability that both :math:`E_1'` and :math:`E_2'` occur simultaneously.


b) Calculate the probability that both :math:`E_1` and :math:`E_2` occur simultaneously.


c) Calculate the probability that both :math:`E_1'` and :math:`E_2` occur simultaneously.


Part 4: The Inclusion-Exclusion Principle
-----------------------------------------

**Question 4:** A festival raffle has a total of :math:`N` tickets. A ticket may win prizes in one or more of the following categories, or in none:

- :math:`|A| = 40`: Tickets that win electronics
- :math:`|B| = 30`: Tickets that win gift cards  
- :math:`|C| = 20`: Tickets that win home appliances
- :math:`|A \cap B| = 10`: Tickets that win both electronics and gift cards
- :math:`|A \cap C| = 5`: Tickets that win both electronics and home appliances
- :math:`|B \cap C| = 3`: Tickets that win both gift cards and home appliances
- :math:`|A \cap B \cap C| = 2`: Tickets that win in all three categories
- The remaining 432 tickets do not win any prizes

**Venn Diagram Template**

Fill in each region of the Venn diagram below with the number of tickets:

.. raw:: html

   <div style="text-align: center; margin: 30px 0;">
   <svg width="550" height="450" xmlns="http://www.w3.org/2000/svg">
     <!-- Background -->
     <rect width="550" height="450" fill="#ffffff" stroke="black" stroke-width="2"/>
     
     <!-- Title -->
     <text x="25" y="30" font-size="22" font-weight="bold">Ω (Sample Space)</text>
     
     <!-- Circle A (Electronics) - Red -->
     <circle cx="200" cy="180" r="110" fill="rgba(255,0,0,0.1)" stroke="red" stroke-width="3"/>
     <text x="120" y="100" font-size="20" font-weight="bold" fill="darkred">A</text>
     <text x="80" y="125" font-size="14" fill="darkred">Electronics</text>
     
     <!-- Circle B (Gift Cards) - Blue -->
     <circle cx="350" cy="180" r="110" fill="rgba(0,0,255,0.1)" stroke="blue" stroke-width="3"/>
     <text x="420" y="100" font-size="20" font-weight="bold" fill="darkblue">B</text>
     <text x="390" y="125" font-size="14" fill="darkblue">Gift Cards</text>
     
     <!-- Circle C (Home Appliances) - Green -->
     <circle cx="275" cy="290" r="110" fill="rgba(0,255,0,0.1)" stroke="green" stroke-width="3"/>
     <text x="265" y="390" font-size="20" font-weight="bold" fill="darkgreen">C</text>
     <text x="210" y="415" font-size="14" fill="darkgreen">Home Appliances</text>
     
     <!-- Region labels with boxes for students to fill -->
     <!-- Only A -->
     <rect x="110" y="160" width="50" height="30" fill="white" stroke="black" stroke-width="1" rx="3"/>
     <text x="135" y="180" font-size="14" text-anchor="middle" fill="#666">Only A</text>
     
     <!-- Only B -->
     <rect x="390" y="160" width="50" height="30" fill="white" stroke="black" stroke-width="1" rx="3"/>
     <text x="415" y="180" font-size="14" text-anchor="middle" fill="#666">Only B</text>
     
     <!-- Only C -->
     <rect x="250" y="340" width="50" height="30" fill="white" stroke="black" stroke-width="1" rx="3"/>
     <text x="275" y="360" font-size="14" text-anchor="middle" fill="#666">Only C</text>
     
     <!-- A ∩ B only -->
     <rect x="250" y="130" width="50" height="30" fill="white" stroke="black" stroke-width="1" rx="3"/>
     <text x="275" y="150" font-size="12" text-anchor="middle" fill="#666">A∩B only</text>
     
     <!-- A ∩ C only -->
     <rect x="175" y="250" width="50" height="30" fill="white" stroke="black" stroke-width="1" rx="3"/>
     <text x="200" y="270" font-size="12" text-anchor="middle" fill="#666">A∩C only</text>
     
     <!-- B ∩ C only -->
     <rect x="325" y="250" width="50" height="30" fill="white" stroke="black" stroke-width="1" rx="3"/>
     <text x="350" y="270" font-size="12" text-anchor="middle" fill="#666">B∩C only</text>
     
     <!-- A ∩ B ∩ C -->
     <rect x="250" y="200" width="50" height="30" fill="white" stroke="black" stroke-width="1" rx="3"/>
     <text x="275" y="220" font-size="14" text-anchor="middle" fill="#666">A∩B∩C</text>
     
     <!-- Outside all circles (None) -->
     <rect x="440" y="380" width="80" height="30" fill="white" stroke="black" stroke-width="1" rx="3"/>
     <text x="480" y="400" font-size="14" text-anchor="middle" fill="#666">None</text>
   </svg>
   </div>

.. admonition:: R Visualization Exercise 🖥️
   :class: tip
   
   **Creating a Venn Diagram with R and AI Assistance**
   
   Use your favorite AI assistant (ChatGPT, Claude, etc.) to help you create a Venn diagram visualization in R. Follow these prompting strategies:
   
   **Step 1: Initial Setup Prompt**
   
   *"I need to create a Venn diagram in R for a probability problem. I have three sets A, B, and C with the following properties: |A|=40, |B|=30, |C|=20, |A∩B|=10, |A∩C|=5, |B∩C|=3, |A∩B∩C|=2. What R package would you recommend for creating Venn diagrams, and how do I install it?"*
   
   **Step 2: Understanding the Package**
   
   *"Can you explain how the ggVennDiagram package works? What format does it expect the data in? I need to represent sets with specific intersection counts."*
   
   **Step 3: Creating the Diagram**
   
   *"Help me create lists/vectors in R that will produce a Venn diagram with exactly these intersection counts. I want the diagram to show the actual numbers in each region."*
   
   **Step 4: Customization**
   
   *"How can I customize the colors, labels, and title of my Venn diagram? I want Electronics in red, Gift Cards in blue, and Home Appliances in green."*
   
   **Verification Questions to Ask Your AI:**
   
   - *"How can I verify that my lists produce the correct intersection counts?"*
   - *"What's the difference between the total count |A| and the exclusive 'only A' region?"*
   - *"Can you show me how to use R's intersect() function to check my work?"*
   
   **Learning Goals:**
   
   Through this exercise, you should understand:

   - How Venn diagram packages represent overlapping sets
   - The relationship between set notation and R's list/vector structures
   - How to verify your mathematical calculations using R functions
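
The gap between a total such as :math:`|A|` and the exclusive "only A" region is the part of :math:`A` shared with :math:`B` or :math:`C`. As a sketch, take three small made-up sets :math:`X`, :math:`Y`, :math:`Z` (illustrative, not the raffle data). The exclusive count is :math:`|X| - |X \cap Y| - |X \cap Z| + |X \cap Y \cap Z|`, where the triple overlap is added back because it was subtracted twice. ``setdiff()`` and ``intersect()`` confirm the formula by brute force.

.. code-block:: r

   # Small made-up sets of ticket IDs (illustrative; not the raffle counts)
   X <- c(1, 2, 3, 4)
   Y <- c(3, 4, 5)
   Z <- c(4, 5, 6, 7)

   # Brute force: elements of X that belong to neither Y nor Z
   only_X_direct <- length(setdiff(X, union(Y, Z)))

   # Formula using only the total |X| and the intersection sizes
   n_XY  <- length(intersect(X, Y))
   n_XZ  <- length(intersect(X, Z))
   n_XYZ <- length(intersect(intersect(X, Y), Z))
   only_X_formula <- length(X) - n_XY - n_XZ + n_XYZ

   cat("Only X (direct): ", only_X_direct, "\n")
   cat("Only X (formula):", only_X_formula, "\n")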

a) Determine :math:`N`: Use the inclusion-exclusion principle, together with the number of tickets that win no prize, to calculate the total number of tickets :math:`N`.

**The inclusion-exclusion principle for three sets states:**

.. math::
   |A \cup B \cup C| = |A| + |B| + |C| - |A \cap B| - |A \cap C| - |B \cap C| + |A \cap B \cap C|
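
The formula can be sanity-checked on small explicit sets before applying it to the raffle counts. This sketch uses the made-up sets :math:`X = \{1, 2, 3, 4\}`, :math:`Y = \{3, 4, 5\}`, :math:`Z = \{4, 5, 6, 7\}`; a direct count of the union and the inclusion-exclusion sum should agree.

.. code-block:: r

   # Small made-up sets (illustrative; not the raffle data)
   X <- c(1, 2, 3, 4)
   Y <- c(3, 4, 5)
   Z <- c(4, 5, 6, 7)

   # Direct count of |X ∪ Y ∪ Z|
   n_union_direct <- length(union(union(X, Y), Z))

   # Inclusion-exclusion: add singles, subtract pairwise overlaps, add back the triple overlap
   n_union_formula <- length(X) + length(Y) + length(Z) -
     length(intersect(X, Y)) - length(intersect(X, Z)) - length(intersect(Y, Z)) +
     length(intersect(intersect(X, Y), Z))

   cat("Direct:", n_union_direct, " Formula:", n_union_formula, "\n")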


b) After determining :math:`N`, calculate the following probabilities:

   i. The probability of randomly selecting a ticket that wins in exactly one category.
   
   .. raw:: html

      <div style="border: 1px solid #ddd; padding: 15px; margin: 15px 0; background-color: #f9f9f9;">
      <em>Space for student answer</em>
      </div>

   ii. The probability of randomly selecting a ticket that wins in at least two categories.
   
   .. raw:: html

      <div style="border: 1px solid #ddd; padding: 15px; margin: 15px 0; background-color: #f9f9f9;">
      <em>Space for student answer</em>
      </div>

   iii. The probability of randomly selecting a ticket that wins in exactly two categories.
   
   .. raw:: html

      <div style="border: 1px solid #ddd; padding: 15px; margin: 15px 0; background-color: #f9f9f9;">
      <em>Space for student answer</em>
      </div>

   iv. The probability of randomly selecting a ticket that does not win electronics and does not win any gift cards.
   
   .. raw:: html

      <div style="border: 1px solid #ddd; padding: 15px; margin: 15px 0; background-color: #f9f9f9;">
      <em>Space for student answer</em>
      </div>


Simulation Exercise
--------------------------

**Verify your probability calculations through simulation:**

.. code-block:: r

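   # Simulate the raffle from your completed Venn diagram.
   # counts: named vector of region sizes (only_A, only_B, only_C, only_AB,
   #         only_AC, only_BC, all_three, n_none) that together sum to N
   simulate_raffle <- function(counts, n_sim = 10000) {
     regions <- c("only_A", "only_B", "only_C", "only_AB",
                  "only_AC", "only_BC", "all_three", "n_none")
     if (any(is.na(counts[regions])) || sum(counts[regions]) == 0) {
       stop("Fill in every region count from your Venn diagram first.")
     }
     
     # Create the population of tickets: one label per ticket, N labels in total
     tickets <- rep(c("A", "B", "C", "AB", "AC", "BC", "ABC", "None"),
                    times = counts[regions])
     
     # Simulate draws with replacement (each draw picks one of the N tickets uniformly)
     draws <- sample(tickets, n_sim, replace = TRUE)
     
     # Calculate empirical probabilities
     cat("\nSimulation Results (N = ", length(tickets), ", ", n_sim, " draws):\n", sep = "")
     cat("P(exactly one) ≈", mean(draws %in% c("A", "B", "C")), "\n")
     cat("P(at least two) ≈", mean(draws %in% c("AB", "AC", "BC", "ABC")), "\n")
     cat("P(exactly two) ≈", mean(draws %in% c("AB", "AC", "BC")), "\n")
     cat("P(not A and not B) ≈", mean(draws %in% c("C", "None")), "\n")
   }
   
   # After calculating N, replace each 0 with the count from your Venn diagram
   my_counts <- c(only_A = 0, only_B = 0, only_C = 0, only_AB = 0,
                  only_AC = 0, only_BC = 0, all_three = 0, n_none = 0)
   simulate_raffle(my_counts)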

Key Takeaways
-------------

.. admonition:: Summary 📝
   :class: important
   
   • Set theory provides the mathematical foundation for probability
   • Probability is a function that maps sets (events) to numbers in [0,1]
   • The axioms of probability ensure consistency and allow derivation of rules
   • The complement rule and inclusion-exclusion principle are powerful tools
   • Venn diagrams help visualize complex probability relationships
   • R provides practical tools for implementing set operations and verifying probability calculations

Submission Guidelines
---------------------

1. Show all work and intermediate steps
2. Use proper mathematical notation  
3. Write probability statements explicitly before calculating
4. Double-check that all probabilities are between 0 and 1
5. Fill in all Venn diagram regions clearly