4.1. Basic Set Theory

Before we can analyze data using statistical inference, we need a mathematical framework to describe uncertainty, namely probability theory. We begin by developing the basic vocabulary and operations of set theory.

Road Map 🧭

  • Define experiments, sample spaces, and events as the building blocks of probability.

  • Explore set operations: complement, union, and intersection.

  • Visualize set relationships using Venn diagrams.

  • Understand the basic rules governing set operations.

4.1.1. Basic Terminology and Notation

Random Experiment

A random experiment is a repeatable process with at least two possible outcomes, where the result of any single run cannot be predicted with certainty.

Examples of random experiments include:

  • Rolling a die

  • Flipping a coin

  • Measuring the height of a randomly selected student

  • Observing whether a manufactured part is defective

Sample space

A sample space is the set of all possible outcomes of a random experiment.

A set is a mathematical object representing a collection of objects. It is conventionally named with a capital letter. When listing its elements, we enclose them in a pair of curly brackets, \(\{\cdots\}\).

Because the sample space is a special set in probability, we set aside the letters \(S\) or \(\Omega\) (Omega) for it and rarely use them to name other sets. When referring to an arbitrary single outcome in a sample space, we use the lower case Greek letter \(\omega\) (omega).

To express that an outcome belongs to a set, we use the notation \(\in\). For example,

\[\omega \in \Omega.\]

Trial

A trial refers to a single execution of a random experiment.

Event

An event is a set of outcomes. An event can be

  • a simple event consisting of a single outcome,

  • a set as large as the whole sample space, or

  • an empty set (denoted by \(\emptyset\) or \(\{\}\)).

An empty set is a valid event representing the set of “impossible” outcomes in the experiment.

Events are denoted by capital letters other than the ones set aside for the sample space.

Example💡: Using set theory terminology and notation correctly

1. Suppose a 20-sided die is being rolled. Identify the random experiment, trial and sample space.

The random experiment is the process of rolling a 20-sided die. A trial consists of rolling the die once and observing the outcome. The sample space can be denoted as:

\[\Omega = \{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20\}\]
  2. Name the events corresponding to

    1. “rolling an even number”

    2. “rolling a value of at least 21”

    3. “rolling a perfect 20”

    4. “rolling a number”

    and list their elements using the correct set theory notation.

    Denote the events described in parts (a)-(d) with capital letters \(A\) - \(D\), respectively.

    • \(A = \{2, 4, 6, 8, 10, 12, 14, 16, 18, 20\}\).

    • \(B = \{\}\) or \(B=\emptyset\) because there is no outcome in the sample space \(\Omega\) which fits the description. It is an “impossible” event.

    • \(C = \{20\}\). It is a simple event.

    • \(D = \Omega\) since any outcome in the sample space fits the description.
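The events in this example map naturally onto Python's built-in `set` type. The following is a minimal sketch rather than part of the original example; the variable names `omega`, `A`, `B`, `C`, and `D` are chosen here only to mirror the notation above.

```python
# Sample space for one roll of a 20-sided die.
omega = set(range(1, 21))

# Events are subsets of the sample space, built from their descriptions.
A = {w for w in omega if w % 2 == 0}   # "rolling an even number"
B = {w for w in omega if w >= 21}      # "rolling a value of at least 21" (impossible)
C = {w for w in omega if w == 20}      # "rolling a perfect 20" (simple event)
D = set(omega)                         # "rolling a number" (whole sample space)

print(sorted(A))     # [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
print(B == set())    # True
print(C)             # {20}
print(D == omega)    # True
print(20 in omega)   # True: the membership test plays the role of the "in" symbol
```

Note that in Python `{}` creates an empty dictionary, so the empty set is written `set()`.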

4.1.2. Visualizing Sets with Venn Diagrams

a simple venn diagram

Fig. 4.1 A simple Venn diagram

Venn diagrams provide a visual tool for understanding relationships between sets. A Venn diagram is constructed based on the following rules:

  • The sample space is represented with the outer rectangle.

  • Circles inside the rectangle represent events.

  • In general, multiple events are drawn to slightly overlap to account for any outcomes that belong to both events.

  • All shapes must be labeled.

4.1.3. Core Set Operations

The power of set theory lies in the operations that allow us to construct new sets from existing ones. This section introduces the three most fundamental set operations: complement, union, and intersection. It then explains the concept of mutually exclusive (disjoint) sets.

Complement

venn diagram with A complement highlighted

Fig. 4.2 Complement of A

The complement of an event A, denoted by A’ (read as “A prime” or “A complement”), consists of outcomes in the sample space that are not in A.

Example💡: Complement

Find the complements of events \(A\) - \(D\) from the previous example.

  1. \(A' = \{1, 3, 5, 7, 9, 11, 13, 15, 17, 19\}\).

  2. \(B' = \Omega\) since no outcome belongs to event \(B\).

  3. \(C' = \{1, 2, \cdots, 19\}\).

  4. \(D' = \emptyset\).
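As a quick check of these answers, the complement can be computed as a set difference from the sample space. This is a sketch under the assumption that events are stored as Python sets; the helper name `complement` is hypothetical.

```python
omega = set(range(1, 21))   # sample space of the 20-sided die
A = set(range(2, 21, 2))    # even numbers
B = set()                   # impossible event
C = {20}                    # simple event
D = set(omega)              # the whole sample space


def complement(event, sample_space):
    """Return the outcomes in sample_space that are not in event."""
    return sample_space - event


print(sorted(complement(A, omega)))    # the odd numbers 1, 3, ..., 19
print(complement(B, omega) == omega)   # True
print(sorted(complement(C, omega)))    # the numbers 1 through 19
print(complement(D, omega))            # set()
```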

Union

venn diagram with the union of A and B highlighted

Fig. 4.3 Union of A and B

The union of events A and B, denoted by \(A \cup B\), contains all outcomes that belong to either A or B or both.

Example💡: Union

Suppose a 20-sided die is being rolled. Continue using the events \(A\) - \(D\) defined in the first example.

  1. Define a new event \(E\) consisting of all outcomes less than 6. Find the union of each of \(A\) - \(D\) with \(E\).

    Begin by writing down the elements of \(E\) formally:

    \[E = \{1,2,3,4,5\}.\]
    1. \(A \cup E = \{1,2,3,4,5,6,8,10,12,14,16,18,20\}\).

    2. \(B \cup E = \emptyset \cup E = \{1,2,3,4,5\} = E\).

    3. \(C \cup E = \{1,2,3,4,5,20\}\).

    4. \(D \cup E = \Omega \cup E = \Omega\).

  2. In general, what is the union of an event and its complement?

    A set and “everything else” make up a whole together. The union of an event with its complement is always the sample space.
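The unions above can be reproduced with Python's `|` operator on sets. The sketch below assumes the same five events and also checks, for these events only, that an event united with its complement gives back the sample space.

```python
omega = set(range(1, 21))
A = set(range(2, 21, 2))          # even numbers
B = set()                         # impossible event
C = {20}                          # rolling a 20
D = set(omega)                    # whole sample space
E = {w for w in omega if w < 6}   # outcomes less than 6

print(sorted(A | E))     # [1, 2, 3, 4, 5, 6, 8, 10, 12, 14, 16, 18, 20]
print((B | E) == E)      # True
print(sorted(C | E))     # [1, 2, 3, 4, 5, 20]
print((D | E) == omega)  # True

# An event together with its complement covers the sample space.
covers = all((event | (omega - event)) == omega for event in (A, B, C, D, E))
print(covers)            # True
```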

Intersection

venn diagram with the intersection of A and B highlighted

Fig. 4.4 Intersection of A and B

The intersection of events A and B, denoted by \(A \cap B\), contains the outcomes that belong to both A and B.

Example💡: Intersection

Suppose a 20-sided die is being rolled. Continue using the events \(A\) - \(E\) defined in the previous examples.

  1. Find the intersection of each of \(A\) - \(D\) with \(E\).

    1. \(A \cap E = \{2,4\}\)

    2. \(B \cap E = \emptyset \cap E= \emptyset\) because the overlap of “nothing” with \(E\) is still “nothing”!

    3. \(C \cap E = \emptyset\). The two events have no shared elements.

    4. \(D \cap E = \Omega \cap E = \{1,2,3,4,5\} = E\). The entirety of \(E\) is an overlap between \(D\) and \(E\).

  2. In general, what is the intersection of an event and its complement?

    If an outcome belongs to an event, it cannot belong to the complement, and vice versa. Therefore, the intersection of an event and its complement is always an empty set.
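The same events can be intersected with Python's `&` operator. This sketch assumes the events are stored as sets and confirms, for these five events, that an event and its complement share no outcomes.

```python
omega = set(range(1, 21))
A = set(range(2, 21, 2))   # even numbers
B = set()                  # impossible event
C = {20}                   # rolling a 20
D = set(omega)             # whole sample space
E = set(range(1, 6))       # outcomes less than 6

print(sorted(A & E))   # [2, 4]
print(B & E)           # set()
print(C & E)           # set()
print((D & E) == E)    # True

# An event and its complement have no outcome in common.
no_overlap = all((event & (omega - event)) == set() for event in (A, B, C, D, E))
print(no_overlap)      # True
```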

Mutually Exclusive Events

venn diagram of mutually exclusive events

Fig. 4.5 Mutually exclusive events

When two events cannot happen simultaneously, we call them mutually exclusive or disjoint. Since no outcome can belong to both sets at the same time, their intersection is always an empty set. In mathematical notation, if events \(A\) and \(B\) are mutually exclusive, we write

\[A \cap B = \emptyset.\]

To express this special relationship graphically, we draw disjoint events as non-overlapping regions on a Venn diagram.

Example💡: Mutually exclusive sets

  1. Among the previously defined events \(A,B, \cdots, E\), can you identify any pair of events that are mutually exclusive?

    • \(C \cap E = \emptyset\) because they do not have any common outcomes.

    • \(A,C,D,E\) are each disjoint from \(B=\emptyset\) because every pairwise intersection with \(B\) is the empty set.
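One way to search for mutually exclusive pairs systematically is to test every pair of events with `set.isdisjoint`. The sketch below assumes the five events are stored in a dictionary keyed by their names; the name `disjoint_pairs` is chosen here for illustration.

```python
from itertools import combinations

omega = set(range(1, 21))
events = {
    "A": set(range(2, 21, 2)),   # even numbers
    "B": set(),                  # impossible event
    "C": {20},                   # rolling a 20
    "D": set(omega),             # whole sample space
    "E": set(range(1, 6)),       # outcomes less than 6
}

# Keep each pair of distinct events whose intersection is empty.
disjoint_pairs = [
    (name1, name2)
    for (name1, set1), (name2, set2) in combinations(events.items(), 2)
    if set1.isdisjoint(set2)
]
print(disjoint_pairs)
# [('A', 'B'), ('B', 'C'), ('B', 'D'), ('B', 'E'), ('C', 'E')]
```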

Subsets

Venn diagram showing set C as a subset of set A

Fig. 4.6 C is a subset of A (C ⊆ A)

We say A is a subset of B, written \(A \subseteq B\), if every outcome in A is also in B. In other words, the whole set A is a part of the set B.

On a Venn diagram, we express the relationship with the subset completely enclosed in the larger event.

Example💡: Subsets

  1. Among the previously defined events \(A,B, \cdots, E\), can you identify an event that is a subset of another event?

    • \(C \subseteq A\) because the outcome 20 makes up the whole event \(C\) but also belongs to the event \(A\).

    • All events are subsets of \(D=\Omega\).

    • Each event is a subset of itself because every outcome in \(A\) is also in \(A\)!

    • \(B = \emptyset\) is a subset of any other event. (This one requires a more subtle argument which we will not go through now. It is included for the completeness of the example.)
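These subset relationships can be checked with Python's `<=` operator on sets, which returns `True` when every element of the left set is also in the right set. A minimal sketch, assuming the same five events:

```python
omega = set(range(1, 21))
A = set(range(2, 21, 2))   # even numbers
B = set()                  # impossible event
C = {20}                   # rolling a 20
D = set(omega)             # whole sample space
E = set(range(1, 6))       # outcomes less than 6

all_events = (A, B, C, D, E)

print(C <= A)                                  # True: 20 is even
print(E <= A)                                  # False: 1, 3, and 5 are not even
print(all(event <= D for event in all_events)) # True: every event is inside omega
print(all(B <= event for event in all_events)) # True: the empty set is inside every event
print(A <= A)                                  # True: an event is a subset of itself
```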

Extending Operations to Multiple Events

The union and intersection operations can be extended to multiple events using indexing notation.

A union of n events \(A_1, A_2, \cdots, A_n\) contains the outcomes which belong to at least one of \(A_1, A_2, \cdots, A_n\). It is denoted by

\[\bigcup_{i=1}^{n} A_i = A_1 \cup A_2 \cup \ldots \cup A_n.\]

An intersection of n events \(A_1, A_2, \cdots, A_n\) contains outcomes which belong to all of \(A_1, A_2, \cdots\), and \(A_n\). It is denoted by

\[\bigcap_{i=1}^{n} A_i = A_1 \cap A_2 \cap \ldots \cap A_n.\]
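The indexed union and intersection can be sketched in Python by passing a whole list of events to `set.union` and `set.intersection`. The three events below are hypothetical, chosen only for illustration on the 20-sided die.

```python
omega = set(range(1, 21))

# Hypothetical events A_1, A_2, A_3.
events = [
    {w for w in omega if w % 2 == 0},   # A_1: even numbers
    {w for w in omega if w % 3 == 0},   # A_2: multiples of 3
    {w for w in omega if w <= 12},      # A_3: at most 12
]

big_union = set().union(*events)              # outcomes in at least one A_i
big_intersection = set.intersection(*events)  # outcomes in every A_i

print(sorted(big_union))          # every outcome except 13, 17, and 19
print(sorted(big_intersection))   # [6, 12]
```

Written this way, `set.intersection(*events)` needs at least one event in the list.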

Important tip 🛑

Being able to move fluently between plain-language descriptions of complex events and their mathematical notation is a vital skill. Recall that the three core set operations—complement, union, and intersection—correspond directly to the key words “not”, “or”, and “and”, respectively. Keep these translations in mind when approaching word problems.

4.1.4. Algebra of Sets

Set operations follow specific algebraic laws that parallel those in ordinary arithmetic. Understanding these laws allows us to manipulate complex expressions involving sets.

Commutative Laws

The order of the operands doesn’t matter for unions and intersections.

\[\begin{split}A \cup B = B \cup A \\ A \cap B = B \cap A\end{split}\]

Associative Laws

When we have operations of the same type (all unions or all intersections), the way we group them doesn’t matter.

\[\begin{split}(A \cup B) \cup C = A \cup (B \cup C) \\ (A \cap B) \cap C = A \cap (B \cap C)\end{split}\]
  • Parentheses indicate that any operations inside must be prioritized over the ones outside.

When dealing with a sequence of the same operation, this property allows us to place parentheses wherever we want without changing the result. For example, when taking the union of three sets, we could first unite A and B and then unite the result with C, or we could first unite B and C and then unite A with that result—the final set will be identical either way.
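A check on a small sample space is not a proof of the laws in general, but it can make them concrete. The sketch below assumes a three-outcome sample space {1, 2, 3}, chosen for illustration, enumerates all 8 of its events, and tests the commutative and associative laws on every pair and triple. The helper name `all_events` is hypothetical.

```python
from itertools import chain, combinations, product


def all_events(sample_space):
    """Return every subset of sample_space as a frozenset."""
    items = sorted(sample_space)
    subsets = chain.from_iterable(
        combinations(items, size) for size in range(len(items) + 1)
    )
    return [frozenset(subset) for subset in subsets]


events = all_events({1, 2, 3})   # 2**3 = 8 events

# Commutative laws: swapping the operands changes nothing.
commutative = all(
    ((A | B) == (B | A)) and ((A & B) == (B & A))
    for A, B in product(events, repeat=2)
)

# Associative laws: regrouping the operands changes nothing.
associative = all(
    (((A | B) | C) == (A | (B | C))) and (((A & B) & C) == (A & (B & C)))
    for A, B, C in product(events, repeat=3)
)

print(len(events), commutative, associative)   # 8 True True
```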

Remark: Justifying the multi-event operations notation

In fact, the right-hand-side expressions of the multi-event operations

\[\bigcup_{i=1}^{n} A_i = A_1 \cup A_2 \cup \cdots \cup A_n\]
\[\bigcap_{i=1}^{n} A_i = A_1 \cap A_2 \cap \cdots \cap A_n\]

require the commutative and associative laws to be well defined. Without the associative property, we would need to specify exactly how the operations are grouped using parentheses. The commutative property further allows us to reorder or reindex the events without changing the result.

Distributive Laws

The distributive laws apply when unions and intersections are used together.

\[\begin{split}A \cup (B \cap C) = (A \cup B) \cap (A \cup C) \\ A \cap (B \cup C) = (A \cap B) \cup (A \cap C)\end{split}\]

They’re called “distributive” because one operation distributes over the other, similar to how multiplication distributes over addition in algebra (a × (b + c) = a × b + a × c).

Let us confirm that these laws hold through a concrete example.

Example💡: Distributive laws

Suppose that we are rolling a 6-sided die. Take the events A = {1, 2, 3}, B = {2, 3, 4}, and C = {3, 4, 5}. Compute \(A \cup (B \cap C)\) using the given expression AND using the appropriate distributive law. Confirm that the two methods yield an identical outcome.

Computing \(A \cup (B \cap C)\) directly

\(B \cap C = \{3,4\}\). Then \(A \cup (B \cap C) = \{1, 2, 3\} \cup \{3,4\} = \{1,2,3,4\}\).

Using the distributive law

Using the right hand side of the first law, let us compute \((A \cup B) \cap (A \cup C).\) First, \(A \cup B = \{1, 2, 3, 4\}\) and \(A \cup C = \{1, 2, 3, 4, 5\}\). Then

\[(A \cup B) \cap (A \cup C) = \{1, 2, 3, 4\} \cap \{1, 2, 3, 4, 5\} = \{1, 2, 3, 4\}.\]

Yes, the two methods give the same answer.
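The same comparison can be run in Python. This sketch uses the three events from the example and, as an addition not worked out above, also checks the second distributive law on them.

```python
A = {1, 2, 3}
B = {2, 3, 4}
C = {3, 4, 5}

# First law: A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
lhs_first = A | (B & C)
rhs_first = (A | B) & (A | C)
print(sorted(lhs_first), sorted(rhs_first))    # [1, 2, 3, 4] [1, 2, 3, 4]

# Second law: A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
lhs_second = A & (B | C)
rhs_second = (A & B) | (A & C)
print(sorted(lhs_second), sorted(rhs_second))  # [2, 3] [2, 3]
```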

In many situations, it is much easier to compute the expression using one side of the equation than the other. The distributive laws allow us to rewrite expressions in forms that might be easier to work with.

The distributive laws extend to multiple sets as well:

\[A \cup \left(\bigcap_{i=1}^{n} B_i\right) = \bigcap_{i=1}^{n} (A \cup B_i)\]
\[A \cap \left(\bigcup_{i=1}^{n} B_i\right) = \bigcup_{i=1}^{n} (A \cap B_i)\]
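To see the multi-set versions in action, the sketch below checks both identities for one event `A` and a list of three events. The sets in `B_events` are hypothetical, chosen only for illustration on a 6-sided die.

```python
A = {1, 2, 3}
B_events = [{2, 3, 4}, {3, 4, 5}, {3, 6}]   # hypothetical B_1, B_2, B_3

# A united with the intersection of all B_i ...
lhs_union = A | set.intersection(*B_events)
# ... equals the intersection of the unions A ∪ B_i.
rhs_union = set.intersection(*(A | B_i for B_i in B_events))
print(sorted(lhs_union), sorted(rhs_union))   # [1, 2, 3] [1, 2, 3]

# A intersected with the union of all B_i ...
lhs_inter = A & set().union(*B_events)
# ... equals the union of the intersections A ∩ B_i.
rhs_inter = set().union(*(A & B_i for B_i in B_events))
print(sorted(lhs_inter), sorted(rhs_inter))   # [2, 3] [2, 3]
```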

De Morgan’s Laws

De Morgan’s laws are a tool for manipulating complements of unions and intersections.

De Morgan’s First Law:

\[(A \cup B)' = A' \cap B'\]

In plain language, this means that for an outcome to be excluded from “A or B,” it must be excluded from both A and B.

De Morgan’s Second Law:

\[(A \cap B)' = A' \cup B'\]

This means that for an outcome to be excluded from “A and B,” it’s sufficient to be excluded from either A or B (or both).

Example💡: De Morgan’s Laws

Let us continue to consider the random experiment of rolling a six-sided die. Define new events:

  • \(D = \{5,6\}\) (the event of rolling a number greater than 4)

  • \(E = \{2, 4, 6\}\) (the event of rolling an even number)

1. Find \((D \cap E)'\) directly and using the appropriate De Morgan’s law. Confirm that the two methods yield the same answer.

Finding \((D \cap E)'\) directly:

\((D \cap E)=\{6\}\). Then \((D \cap E)'=\{1,2,3,4,5\}\).

Using De Morgan’s second law:

\(D' = \{1,2,3,4\}\) and \(E' = \{1,3,5\}\). Therefore, \(D' \cup E' = \{1,2,3,4\} \cup \{1,3,5\} =\{1,2,3,4,5\}\).

Yes, the two approaches result in the same answer.

  2. Try verifying De Morgan’s first law as an independent exercise.
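The check of the second law in part 1 can be mirrored in Python, assuming the six-sided sample space and the events \(D\) and \(E\) defined in this example. The variable names `direct` and `via_law` are chosen here for illustration.

```python
omega = set(range(1, 7))   # sample space of a six-sided die
D = {5, 6}                 # rolling a number greater than 4
E = {2, 4, 6}              # rolling an even number

# Direct computation: complement of the intersection.
direct = omega - (D & E)

# De Morgan's second law: union of the complements.
via_law = (omega - D) | (omega - E)

print(sorted(direct))      # [1, 2, 3, 4, 5]
print(sorted(via_law))     # [1, 2, 3, 4, 5]
print(direct == via_law)   # True
```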

The general form of De Morgan’s first law for n events is:

\[\left(\bigcup_{i=1}^{n} A_i\right)' = \bigcap_{i=1}^{n} A_i'\]

Fig. 4.7 Multi-event first De Morgan’s Law

This reads as “the complement of the union of n events equals the intersection of the complements of those n events.” In other words, for an outcome to be outside the union of all events, it must be outside each individual event.

The general form of De Morgan’s second law for n events is:

\[\left(\bigcap_{i=1}^{n} A_i\right)' = \bigcup_{i=1}^{n} A_i'\]

Fig. 4.8 Multi-event second De Morgan’s Law

This reads as “the complement of the intersection of n events equals the union of the complements of those n events.” This means an outcome is outside the intersection of all events if it’s outside at least one of the individual events.
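Both general forms can be checked on a concrete list of events. The sketch below assumes a 20-sided die and three hypothetical events chosen only for illustration. It confirms the identities for these events and is not a proof of the general statement.

```python
omega = set(range(1, 21))

# Hypothetical events A_1, A_2, A_3.
events = [
    {w for w in omega if w % 3 == 0},   # A_1: multiples of 3
    {w for w in omega if w >= 12},      # A_2: at least 12
    {w for w in omega if w % 4 == 0},   # A_3: multiples of 4
]


def complement(event):
    """Return the outcomes of omega that are not in event."""
    return omega - event


# First law: complement of the union = intersection of the complements.
first_lhs = complement(set().union(*events))
first_rhs = set.intersection(*(complement(A_i) for A_i in events))
print(sorted(first_lhs), first_lhs == first_rhs)   # [1, 2, 5, 7, 10, 11] True

# Second law: complement of the intersection = union of the complements.
second_lhs = complement(set.intersection(*events))
second_rhs = set().union(*(complement(A_i) for A_i in events))
print(second_lhs == omega - {12}, second_lhs == second_rhs)   # True True
```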

4.1.5. Bringing It All Together

In this chapter, we’ve built the mathematical framework for describing uncertainty through set theory.

Key Takeaways 📝

  1. Random experiments produce outcomes with uncertainty; the complete collection of possible outcomes forms the sample space.

  2. Events are subsets of the sample space, representing collections of outcomes we’re interested in studying.

  3. The fundamental set operations—complement, union, and intersection—allow us to create new events from existing ones.

  4. Venn diagrams provide a visual representation of set relationships.

  5. Set operations follow algebraic laws similar to those in ordinary arithmetic.

  6. De Morgan’s laws provide tools for manipulating complex expressions by relating complements of unions and intersections.

In the next chapter, we’ll assign numerical measures to events, allowing us to quantify uncertainty with probability. The set operations we’ve learned will become the building blocks for the probability rules that govern statistical inference.

Exercises

  1. Basics of random experiment. For each scenario below, identify the sample space and three events of interest:

    1. The daily closing stock price of a company listed on Nasdaq

    2. Testing whether manufactured products are defective

  2. 20-sided die. Let S = {1, 2, …, 20}, A = “even numbers”, and B = “numbers less than or equal to 10”. Verify De Morgan’s first law by showing that (A ∪ B)’ = A’ ∩ B’ for these specific sets.

  3. Venn diagram sketch. Draw three overlapping circles labeled A, B, and C. Shade the region corresponding to (A ∩ C’) ∪ (B ∩ C).