4.2. Probability
Now that we’ve established the language of set theory, we can build upon this foundation to describe uncertainty using probability.
Road Map 🧭
Define probability as a function that maps events to numerical values representing likelihood.
Establish the axioms that make a probability measure well-defined.
Compare frequentist and Bayesian interpretations of probability.
Develop fundamental rules for calculating probabilities: complement rule, addition rule.
Apply these concepts to solve probability problems with dice.
4.2.1. Probability as a Function
Probability is a function that maps events (sets) to real numbers in the interval \([0, 1]\):
\[P : \{\text{events}\} \to [0, 1]\]
For any event A in the sample space \(\Omega\), we denote its probability as \(P(A)\). The value of \(P(A)\) expresses how likely it is for \(A\) to occur.
4.2.2. Axioms of Probability
Not every function that maps events to numbers between 0 and 1 is a valid probability measure. To be considered a probability, the function must satisfy three fundamental axioms:
Axiom 1: Non-negativity
For any event A in the sample space Ω, its probability is always non-negative. That is,
\[P(A) \geq 0.\]
Axiom 2: Normalization
The probability of the sample space is 1:
\[P(\Omega) = 1.\]
This axiom ensures that something from the sample space must occur when we perform our random experiment.
Axiom 3: Additivity
For any sequence of mutually exclusive events A₁, A₂, … (that is, events where Aᵢ ∩ Aⱼ = ∅ for all i ≠ j):
\[P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i).\]
This axiom states that the probability of a union of mutually exclusive events equals the sum of their individual probabilities.
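On a finite sample space, the axioms can be checked mechanically. The sketch below is illustrative only: it assumes each outcome carries its own probability stored in a dictionary, and the helper names `is_valid_assignment` and `prob` are hypothetical, not part of the course material.

```python
from fractions import Fraction

def is_valid_assignment(outcome_probs):
    """Check Axioms 1 and 2 for per-outcome probabilities on a finite sample space."""
    non_negative = all(p >= 0 for p in outcome_probs.values())  # Axiom 1
    normalized = sum(outcome_probs.values()) == 1               # Axiom 2
    return non_negative and normalized

def prob(event, outcome_probs):
    """P(event) as the sum of its outcomes' probabilities (Axiom 3 on single outcomes)."""
    return sum((outcome_probs[o] for o in event), Fraction(0))

# A fair coin, and an assignment that breaks normalization (hypothetical data).
fair_coin = {"H": Fraction(1, 2), "T": Fraction(1, 2)}
bad_coin = {"H": Fraction(1, 2), "T": Fraction(2, 5)}

print(is_valid_assignment(fair_coin))   # True
print(is_valid_assignment(bad_coin))    # False
print(prob({"H", "T"}, fair_coin))      # 1
```

Exact fractions are used instead of floats so that the normalization check is not affected by rounding.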
Additional properties
From these three axioms, we can derive several important properties:
The probability of the empty set is zero. P(∅) = 0.
The probability of any event is at most 1. P(A) ≤ 1 for all events A.
If A ⊆ B, then P(A) ≤ P(B).
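As a brief sketch of how the third property follows from the axioms (using the complement notation \(A'\) from set theory): when \(A \subseteq B\), the event \(B\) splits into two disjoint pieces, \(A\) and \(B \cap A'\). Then
\[\begin{split}P(B) &= P(A \cup (B \cap A')) \\ &= P(A) + P(B \cap A') \\ &\geq P(A).\end{split}\]
The second line uses Axiom 3 and the last line uses Axiom 1. Taking \(B = \Omega\) and applying Axiom 2 gives the second property, \(P(A) \leq 1\).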
Example 💡: Two Dice Probability
Suppose we roll a six-sided die followed by a four-sided die, and record the outcome as an ordered pair of the two numbers, with the result of the six-sided die always listed first.
Our sample space Ω consists of all possible ordered pairs:
\[\Omega = \{(i, j) : i \in \{1, \dots, 6\},\ j \in \{1, \dots, 4\}\}\]
There are 6 × 4 = 24 possible outcomes in total. Assuming the dice are fair, each outcome is equally likely with probability 1/24. Define A as the event that the sum equals 6, B as the event that the sum equals 10, and C as the event of rolling doubles.
A = {(2,4), (3,3), (4,2), (5,1)}
B = {(6,4)}
C = {(1,1), (2,2), (3,3), (4,4)}
Compute the probability of event A.
Using the third axiom of probability,
\[\begin{split}P(A) &= P((2,4)) + P((3,3)) + P((4,2)) + P((5,1)) \\ &= \frac{1}{24} + \frac{1}{24} + \frac{1}{24} + \frac{1}{24} = \frac{1}{6}\end{split}\]
Since all outcomes are equally likely, the probability of an event depends only on the number of outcomes it contains, also known as its cardinality. We denote the cardinality of a set \(A\) as \(|A|\), so that
\[P(A) = \frac{|A|}{|\Omega|}.\]
Compute \(P(A \cup C)\).
First, we find that the set \(A \cup C\) consists of \(\{(2,4), (3,3), (4,2), (5,1), (1,1), (2,2), (4,4)\}\). Then
\[P(A \cup C) = \frac{|A \cup C|}{|\Omega|} = \frac{7}{24}\]
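The counting argument above can be reproduced by enumerating the sample space. This is a minimal sketch assuming fair dice; the names `omega` and `prob` are chosen here for illustration.

```python
from fractions import Fraction
from itertools import product

# Sample space: ordered pairs (six-sided result, four-sided result).
omega = set(product(range(1, 7), range(1, 5)))

def prob(event):
    """P(event) = |event| / |omega|, valid because all outcomes are equally likely."""
    return Fraction(len(event), len(omega))

A = {o for o in omega if sum(o) == 6}     # sum equals 6
B = {o for o in omega if sum(o) == 10}    # sum equals 10
C = {o for o in omega if o[0] == o[1]}    # doubles

print(len(omega))     # 24
print(prob(A))        # 1/6
print(prob(A | C))    # 7/24
```

Python's set union operator `|` removes the duplicate outcome (3,3) automatically, which is why the union has 7 elements rather than 8.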
4.2.3. Interpretations of Probability
There are different ways to interpret what probabilities are. The two major interpretations are the frequentist and Bayesian approaches.
Frequentist Interpretation
Frequentists define the probability of an event \(A\) as the limiting relative frequency of its occurrence as the number of trials goes to infinity. Mathematically,
\[P(A) = \lim_{n \to \infty} \frac{n_A}{n},\]
where \(n_A\) is the number of trials, out of \(n\), in which \(A\) occurs.
This interpretation treats probability as an intrinsic property of the random process being studied. For example, the statement “a fair coin has a 50% probability of landing heads” means that as we keep tossing the coin, the proportion of heads approaches 0.5.
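A short simulation can illustrate the frequentist idea. The sketch below assumes a fair coin modeled with Python's pseudo-random generator; the function name `relative_frequency` and the trial counts are arbitrary choices made for illustration.

```python
import random

def relative_frequency(n_trials, seed=0):
    """Fraction of heads in n_trials simulated tosses of a fair coin."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    heads = sum(rng.random() < 0.5 for _ in range(n_trials))
    return heads / n_trials

# The proportion of heads tends to settle near 0.5 as the number of tosses grows.
for n in (10, 1_000, 100_000):
    print(n, relative_frequency(n))
```

A finite simulation can only suggest the limit; it does not prove it.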
Bayesian Interpretation
The Bayesian view on probability is that it is a degree of belief about the likelihood of events, which can be updated as new information becomes available. In this view, probability represents a state of knowledge rather than an intrinsic property of the world.
In the Bayesian framework, we often start with a prior belief about an event’s probability. As we collect data, we update this belief to form a posterior probability that incorporates the new evidence.
While this course primarily uses the frequentist approach, some Bayesian concepts will appear later in the semester.
4.2.4. Basic Rules of Probability
Several rules help us calculate probabilities for complex events based on simpler ones.
Complement Rule
For any event A,
\[P(A') = 1 - P(A).\]
Why does this rule hold?
This rule follows from our axioms. Recall that \(A \cup A' = \Omega\). Two equal events must have the same probability, so
\[P(A \cup A') = P(\Omega).\]
Since \(A\) and \(A'\) are disjoint, Axiom 3 says \(P(A \cup A') = P(A) + P(A')\). Also, Axiom 2 says \(P(\Omega)=1\). Using these, the equation above becomes
\[P(A) + P(A') = 1.\]
The complement rule is simply a rearrangement of this equation.

Fig. 4.9 Graphical illustration of the complement rule
Example 💡: When is the complement rule useful?
Continued from the first example of throwing two dice. Compute the probability that the sum of the two numbers is not equal to 10.
Approach 1: without using the complement rule
Name the event that the sum of the two numbers is not equal to 10.
List the elements in this event.
Add the probabilities of each outcome in the event.
Approach 2: using the complement rule
Recognize that this event is the complement of \(B\). Therefore, we are essentially looking for \(P(B')\). Using the complement rule,
\[P(B') = 1 - P(B) = 1 - \frac{1}{24} = \frac{23}{24}.\]
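Both approaches can be compared directly in code. This is a sketch under the same fair-dice assumption; the variable names `direct` and `via_rule` are illustrative.

```python
from fractions import Fraction
from itertools import product

omega = set(product(range(1, 7), range(1, 5)))   # 24 equally likely outcomes

B = {o for o in omega if sum(o) == 10}            # sum equals 10
not_B = omega - B                                 # complement of B within omega

direct = Fraction(len(not_B), len(omega))         # Approach 1: count all 23 outcomes
via_rule = 1 - Fraction(len(B), len(omega))       # Approach 2: 1 - P(B)

print(direct, via_rule)   # 23/24 23/24
```

Approach 1 has to list 23 outcomes, while Approach 2 only needs the single outcome in \(B\).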
General Addition Rule
For any two events A and B,
\[P(A \cup B) = P(A) + P(B) - P(A \cap B).\]
The general addition rule provides a way of computing the probability of the union of two events.
Why does this rule hold?

Fig. 4.10 Graphical illustration of the general addition rule
The key component of the general addition rule is the final subtraction of the intersection probability. If we simply add \(P(A)\) and \(P(B)\), we would count the outcomes in the intersection twice. Subtracting \(P(A \cap B)\) corrects for this double-counting.
i. A Special Case: Mutually Exclusive Events
If A and B are mutually exclusive (\(A \cap B = \emptyset\)),
\[P(A \cup B) = P(A) + P(B).\]
This is a restatement of the third axiom of probability, but it can also be seen as a special case of the general addition rule. The general rule still applies; however, because A and B are mutually exclusive, we have \(P(A \cap B) = P(\emptyset) = 0\). As a result, the final subtraction of intersection probability becomes unnecessary.
Avoid the common mistake ‼️
Any special case formulas should only be used when the conditions for the special situation are fully met. When unsure, always begin with the general version.
ii. Extension to Multiple Events
The logic used for the two-way addition rule is a special case of a broader rule which works for cases with multiple events. This rule is called the inclusion-exclusion principle. For the general case with n events A₁, A₂, …, Aₙ, it follows the list of steps below:
Add the probabilities of individual events
Subtract the probabilities of all pairwise intersections
Add the probabilities of all triple intersections
Subtract the probabilities of all quadruple intersections
Continue this pattern, with the sign alternating for each term
For three events A, B, and C, the probability of their union is:
\[P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(A \cap C) - P(B \cap C) + P(A \cap B \cap C)\]
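The alternating pattern can be written as a loop over intersection sizes. The sketch below assumes the two-dice sample space from the earlier example with equally likely outcomes; `union_prob` is a hypothetical helper name, not a library function.

```python
from fractions import Fraction
from itertools import combinations, product

omega = set(product(range(1, 7), range(1, 5)))   # six-sided die, then four-sided die

def prob(event):
    """P(event) under equally likely outcomes."""
    return Fraction(len(event), len(omega))

def union_prob(events):
    """P(union of events) computed by inclusion-exclusion."""
    total = Fraction(0)
    for k in range(1, len(events) + 1):
        sign = 1 if k % 2 == 1 else -1            # add odd-sized, subtract even-sized terms
        for group in combinations(events, k):
            total += sign * prob(set.intersection(*group))
    return total

A = {o for o in omega if sum(o) == 6}     # sum equals 6
B = {o for o in omega if sum(o) == 10}    # sum equals 10
C = {o for o in omega if o[0] == o[1]}    # doubles

print(union_prob([A, B, C]))   # 1/3
print(prob(A | B | C))         # 1/3
```

The number of terms grows as \(2^n - 1\), so for many events it is usually easier to compute the union directly when the outcomes can be listed.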
Example 💡: Two Dice Probability
Continued from the previous examples of throwing two dice.
Compute \(P(A\cup C)\) using the general addition rule, and confirm that the answer is the same as our previous approach without the rule.
\(A\cap C = \{(3,3)\}\). Using the rule and the fact that all outcomes are equally likely,
\[\begin{split}P(A\cup C) &= P(A) + P(C) - P(A\cap C) \\ &= \frac{|A|}{|\Omega|} + \frac{|C|}{|\Omega|} - \frac{|A\cap C|}{|\Omega|} \\ &= \frac{4}{24} + \frac{4}{24} - \frac{1}{24} = \frac{7}{24}.\end{split}\]
Compute the probability that the outcome is a double or the sum is equal to 10.
The probability can be written as \(P(B \cup C)\). Since the two events are mutually exclusive, we can use the special addition rule:
\[P(B \cup C) = P(B) + P(C) = \frac{1}{24} + \frac{4}{24} = \frac{5}{24}.\]
4.2.5. Bringing It All Together
Key Takeaways 📝
Probability is a function that maps events to values between 0 and 1, representing the likelihood of those events occurring.
A valid probability function must satisfy three axioms: non-negativity, normalization (sample space has probability 1), and additivity for mutually exclusive events.
The complement rule: P(A') = 1 - P(A). It is useful for calculating probabilities of events defined by “at least one” or “none.”
The general addition rule: P(A ∪ B) = P(A) + P(B) - P(A ∩ B). It can be extended to cases with multiple events using the inclusion-exclusion principle.
In the next chapter, we’ll build on these concepts to explore conditional probability and independence, which allow us to analyze how events influence each other.
Exercises
Basic Concepts: Explain why each of the following is or is not a valid probability function.
P(A) = -0.2 for some event A
P(Ω) = 0.95
P(A ∪ B) = P(A) + P(B) for all events A and B.
Card Probabilities: Suppose each card in a standard 52-card deck has an equal chance of being drawn. Calculate:
The probability of drawing a face card (jack, queen, or king).
The probability of drawing a red card or an ace.
The probability of drawing a card that is neither black nor a face card.
Dice Rolling: If you roll two fair six-sided dice, find:
The probability that their sum equals 7.
The probability that their sum equals 2 or 12.
The probability that the second die shows a larger value than the first die.
The probability that at least one die shows an even number.
Inclusion-Exclusion: In a group of 100 students, 65 study mathematics, 45 study physics, and 25 study both subjects. Calculate:
The probability that a randomly selected student studies mathematics or physics.
The probability that a randomly selected student studies neither subject.
The probability that a randomly selected student studies mathematics but not physics.