
4.2. Probability

Now that we’ve established the language of set theory, we can build upon this foundation to describe uncertainty using probability.

Road Map 🧭

Define probability as a function that maps events to numerical values representing likelihood.
Establish the axioms that make a probability measure well-defined.
Compare frequentist and Bayesian interpretations of probability.
Develop fundamental rules for calculating probabilities.

4.2.1. Probability as a Function

Probability is a function that maps events (sets) to real numbers in the interval \([0, 1]\):

\[P: \{\text{events}\} \rightarrow [0, 1].\]

For any event \(A\) in the sample space \(\Omega\), we denote its probability as \(P(A)\). The value of \(P(A)\) expresses how likely it is for \(A\) to occur.

4.2.2. Axioms of Probability

Not every function that maps events to numbers between 0 and 1 is a valid probability measure. To be considered a probability, the function must satisfy three fundamental axioms:

Axiom 1: Non-negativity

For any event \(A\) in the sample space \(\Omega\), its probability is always non-negative. That is,

\[P(A) \geq 0.\]

Axiom 2: Normalization

The probability of the sample space is 1.

\[P(\Omega) = 1\]

This axiom ensures that something from the sample space must occur when we perform our random experiment.

Axiom 3: Additivity

For any sequence of mutually exclusive events \(A_1, A_2, \ldots\) (that is, \(A_i \cap A_j = \emptyset\) for all \(i\neq j\)),

\[P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i).\]

This axiom states that the probability of a union of mutually exclusive events equals the sum of their individual probabilities. The same identity holds for any finite collection of mutually exclusive events \(A_1, \ldots, A_n\).
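To make the axioms concrete, here is a minimal Python sketch that checks them on a small finite sample space. The fair four-sided die, the dictionary `outcome_probs`, and the helper names `prob` and `all_events` are illustrative assumptions, not part of the course material. Because the sample space is finite, it is enough to check additivity for pairs of disjoint events.

```python
from fractions import Fraction
from itertools import combinations

# Assumed example: a fair four-sided die, each outcome with probability 1/4.
outcome_probs = {face: Fraction(1, 4) for face in (1, 2, 3, 4)}
omega = frozenset(outcome_probs)

def prob(event):
    """P(event): sum of the probabilities of the outcomes in the event."""
    return sum((outcome_probs[w] for w in event), Fraction(0))

def all_events(sample_space):
    """Every subset of a finite sample space, as frozensets."""
    items = list(sample_space)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]

events = all_events(omega)

# Axiom 1: non-negativity.
axiom1 = all(prob(e) >= 0 for e in events)
# Axiom 2: normalization.
axiom2 = prob(omega) == 1
# Axiom 3: additivity, checked for every pair of disjoint events.
axiom3 = all(prob(a | b) == prob(a) + prob(b)
             for a in events for b in events if not (a & b))

print(axiom1, axiom2, axiom3)
```

Changing one entry of `outcome_probs` so that the values no longer sum to 1 makes the normalization check fail, which is a quick way to see why not every assignment of numbers is a probability.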

Additional properties

From these three axioms, we can derive several additional properties:

The probability of the empty set is zero. \(P(\emptyset) = 0\).
The probability of any event is at most 1. \(P(A) \leq 1\) for all events \(A\).
If \(A \subseteq B\), then \(P(A) \leq P(B)\).

You are encouraged to try proving these on your own.

Example 💡: Two Dice Probability

Suppose we roll a six-sided die followed by a four-sided die, and record the outcome as an ordered pair with the result from the six-sided die always listed first.

The sample space \(\Omega\) consists of all possible ordered pairs:

\[\Omega = \{(1,1), (1,2), (1,3), (1,4), (2,1), ..., (6,4)\}\]

There are 6 × 4 = 24 possible outcomes in total. Assuming the dice are fair, each outcome is equally likely with probability 1/24. Define \(A\) as the event that the sum equals 6, \(B\) as the event that the sum equals 10, and \(C\) as the event of rolling doubles.

\(A = \{(2,4), (3,3), (4,2), (5,1)\}\)
\(B = \{(6,4)\}\)
\(C = \{(1,1), (2,2), (3,3), (4,4)\}\)

Compute the probability of \(A\).

Using the third axiom of probability,

\[\begin{split}P(A) &= P((2,4)) + P((3,3)) + P((4,2)) + P((5,1)) \\ &= \frac{1}{24} + \frac{1}{24} + \frac{1}{24} + \frac{1}{24} = \frac{1}{6}\end{split}\]

Since all outcomes are equally likely, the probability of an event depends only on the number of outcomes it contains, also known as its cardinality. We denote the cardinality of a set \(A\) as \(|A|\), so that

\[P(A) = \frac{|A|}{|\Omega|}.\]

Compute \(P(A \cup C)\).

First, we find that the set \(A \cup C\) consists of \(\{(2,4), (3,3), (4,2), (5,1), (1,1), (2,2), (4,4)\}\). Then

\[P(A \cup C) = \frac{|A \cup C|}{|\Omega|} = \frac{7}{24}\]
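The counting in this example can be reproduced by enumerating the sample space. The following is a minimal sketch, assuming fair dice so that every ordered pair is equally likely; the names `omega` and `prob` are illustrative choices.

```python
from fractions import Fraction
from itertools import product

# Sample space: ordered pairs (six-sided result, four-sided result).
omega = set(product(range(1, 7), range(1, 5)))

def prob(event):
    """Equally likely outcomes: P(E) = |E| / |Omega|."""
    return Fraction(len(event), len(omega))

A = {(d6, d4) for (d6, d4) in omega if d6 + d4 == 6}   # sum equals 6
B = {(d6, d4) for (d6, d4) in omega if d6 + d4 == 10}  # sum equals 10
C = {(d6, d4) for (d6, d4) in omega if d6 == d4}       # doubles

p_A = prob(A)
p_A_or_C = prob(A | C)  # the | operator is set union

print(len(omega))  # 24
print(p_A)         # 1/6
print(p_A_or_C)    # 7/24
```

Using `Fraction` keeps the results exact, so they can be compared directly with the hand calculations above.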

4.2.3. Interpretations of Probability

There are different ways to interpret what probabilities are. The two major interpretations are the frequentist and Bayesian approaches.

Frequentist Interpretation

The frequentists define the probability of an event \(A\) as the relative frequency of its occurrence as the number of trials \(n\) goes to infinity. Mathematically,

\[P(A) = \lim_{n \to \infty} \frac{\text{Number of times A occurs}}{n}\]

This view approaches probability as an intrinsic property of the random process being studied. For example, the statement “a fair coin has 0.5 probability of landing heads” means that if we toss the coin infinitely many times, the proportion of heads would approach 0.5.
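A short simulation can illustrate the frequentist idea, although no finite simulation can show the limit itself. The sketch below assumes a pseudo-random generator is an acceptable stand-in for a fair coin; the function name `relative_frequency` and the fixed seed are illustrative choices.

```python
import random

def relative_frequency(n_tosses, seed=0):
    """Proportion of heads in n_tosses simulated tosses of a fair coin."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses

# The proportion of heads tends to settle near 0.5 as n grows.
for n in (10, 1_000, 100_000):
    print(n, relative_frequency(n))
```

For small `n` the proportion can be far from 0.5; the frequentist definition concerns only the long-run behavior.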

Bayesian Interpretation

The Bayesians view probability as a degree of belief that can be updated as new information becomes available. A Bayesian starts from a prior belief about an event's probability and updates it to a posterior probability once new evidence emerges. In this perspective, probability represents a state of knowledge rather than an intrinsic property of the world.

While this course primarily uses the frequentist approach, some Bayesian concepts will appear later in the semester.

4.2.4. Basic Rules of Probability

Several rules help us calculate probabilities for complex events based on simpler ones.

A. Complement Rule

For any event \(A\),

\[P(A') = 1 - P(A).\]

Why does this rule hold?

This rule follows from the Axioms of Probability. Recall that \(A \cup A' = \Omega\). Two equal events must have the same probability, so

\[P(A \cup A') = P(\Omega).\]

Since \(A\) and \(A'\) are disjoint, Axiom 3 says \(P(A \cup A') = P(A) + P(A')\). Also, Axiom 2 says \(P(\Omega)=1\). Using these, the equation above can be updated to

\[P(A) + P(A') = 1.\]

The complement rule is simply a rearrangement of this equation.

Fig. 4.9 Graphical illustration of the complement rule

Example 💡: When is the Complement Rule Useful?

Continued from the first example of throwing two dice. Compute the probability that the sum of the two numbers is not equal to 10.

Approach 1: without using the complement rule

Name the event that the sum of the two numbers is not equal to 10.
List the elements in this event.
Add the probabilities of each outcome in the event.

Approach 2: using the complement rule

Recognize that this event is the complement of \(B\). Therefore, we are essentially looking for \(P(B')\). Using the complement rule,

\[P(B') = 1-P(B) = 1 - \frac{1}{24} = \frac{23}{24}.\]
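Both approaches can be compared in a few lines of Python. This is a sketch under the same equally-likely assumption as the dice example; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Sample space: ordered pairs (six-sided result, four-sided result).
omega = set(product(range(1, 7), range(1, 5)))
B = {w for w in omega if sum(w) == 10}  # sum equals 10

# Approach 1: list every outcome whose sum is not 10 and count them.
not_B = omega - B
p_direct = Fraction(len(not_B), len(omega))

# Approach 2: complement rule, P(B') = 1 - P(B).
p_complement = 1 - Fraction(len(B), len(omega))

print(len(not_B))              # 23
print(p_direct, p_complement)  # 23/24 23/24
```

Approach 1 has to handle 23 outcomes, while Approach 2 only needs the single outcome in \(B\), which is why the complement rule is attractive when the complement is the smaller set.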

B. General Addition Rule

For any two events \(A\) and \(B\),

\[P(A \cup B) = P(A) + P(B) - P(A \cap B).\]

Why does this rule hold?

Fig. 4.10 Graphical illustration of the general addition rule

The key component of the general addition rule is the final subtraction of the intersection probability. If we simply add \(P(A)\) and \(P(B)\), we would count the outcomes in the intersection twice. Subtracting \(P(A \cap B)\) corrects for this double-counting.
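For readers who want to see the rule follow from the axioms, one possible derivation (a sketch, using the complement notation \(A'\) from this section) splits each set into disjoint pieces:

\[\begin{split}A \cup B &= A \cup (A' \cap B), \\ B &= (A \cap B) \cup (A' \cap B).\end{split}\]

The two sets on each right-hand side are disjoint, so Axiom 3 gives

\[\begin{split}P(A \cup B) &= P(A) + P(A' \cap B), \\ P(B) &= P(A \cap B) + P(A' \cap B).\end{split}\]

Solving the second equation for \(P(A' \cap B)\) and substituting into the first yields \(P(A \cup B) = P(A) + P(B) - P(A \cap B)\).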

i. A Special Case: Mutually Exclusive Events

If \(A\) and \(B\) are mutually exclusive (\(A \cap B = \emptyset\)),

\[P(A \cup B) = P(A) + P(B).\]

This is a restatement of the third axiom of probability, but it can also be seen as a special case of the general addition rule. When the rule is applied to the disjoint events \(A\) and \(B\), the third term disappears because \(P(A \cap B) = P(\emptyset) = 0\).

Avoid the common mistake ‼️

Any special case formulas should only be used when the conditions for the special situation are fully met. When unsure, always begin with the general version.

ii. Extension to Multiple Events

This two-way general addition rule is a special case of the broader inclusion-exclusion principle, which provides a way to decompose the union probability of two or more events. For \(n\) events \(A_1, A_2, ..., A_n\), the inclusion-exclusion principle is constructed following the steps below:

Add the probabilities of individual events.

Subtract the probabilities of all pairwise intersections.

Add the probabilities of all triple intersections.

Subtract the probabilities of all quadruple intersections.

Continue this pattern, with the sign alternating for each step.

Applying the principle to three events \(A\), \(B\), and \(C\), the probability of their union is:

\[\begin{split}P(A \cup B \cup C) &= P(A) + P(B) + P(C) \\ &\quad - P(A \cap B) - P(A \cap C) - P(B \cap C) \\ &\quad + P(A \cap B \cap C)\end{split}\]
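The alternating-sign steps above can be sketched as a short Python function. This is an illustration under the equally-likely dice setup, not a general-purpose implementation; the function name `union_prob_inclusion_exclusion` is a hypothetical choice, and the events are assumed to be plain Python `set` objects.

```python
from fractions import Fraction
from itertools import combinations, product

# Sample space: ordered pairs (six-sided result, four-sided result).
omega = set(product(range(1, 7), range(1, 5)))

def prob(event):
    """Equally likely outcomes: P(E) = |E| / |Omega|."""
    return Fraction(len(event), len(omega))

def union_prob_inclusion_exclusion(events):
    """P(union of events) via inclusion-exclusion over all k-way intersections."""
    total = Fraction(0)
    for k in range(1, len(events) + 1):
        sign = 1 if k % 2 == 1 else -1  # add for odd k, subtract for even k
        for group in combinations(events, k):
            total += sign * prob(set.intersection(*group))
    return total

A = {w for w in omega if sum(w) == 6}     # sum equals 6
B = {w for w in omega if sum(w) == 10}    # sum equals 10
C = {w for w in omega if w[0] == w[1]}    # doubles

p_formula = union_prob_inclusion_exclusion([A, B, C])
p_direct = prob(A | B | C)

print(p_formula, p_direct)  # 1/3 1/3
```

The number of intersection terms grows as \(2^n - 1\), so this direct approach is only practical for a small number of events.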

Example 💡: Two Dice Probability

Continued from the previous examples of throwing two dice.

Compute \(P(A\cup C)\) using the general addition rule, and confirm that the answer is the same as our previous approach without the rule.

\(A\cap C = \{(3,3)\}\). Using the rule and the fact that all outcomes are equally likely,

\[\begin{split}P(A\cup C) &= P(A) + P(C) - P(A\cap C) \\ &= \frac{|A|}{|\Omega|} + \frac{|C|}{|\Omega|} - \frac{|A\cap C|}{|\Omega|} \\ &= \frac{4}{24} + \frac{4}{24} - \frac{1}{24} = \frac{7}{24}.\end{split}\]

Compute the probability that the outcome is a double or the sum is equal to 10.

The probability can be written as \(P(B \cup C)\). Since the two events are mutually exclusive, we can use the special addition rule:

\[P(B \cup C) = P(B) + P(C) = \frac{1}{24} + \frac{4}{24} = \frac{5}{24}.\]
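Both calculations can be checked numerically, along with the common mistake of applying the special rule to events that overlap. The sketch below assumes the same equally-likely dice setup; `general_addition` is a hypothetical helper name.

```python
from fractions import Fraction
from itertools import product

# Sample space: ordered pairs (six-sided result, four-sided result).
omega = set(product(range(1, 7), range(1, 5)))

def prob(event):
    """Equally likely outcomes: P(E) = |E| / |Omega|."""
    return Fraction(len(event), len(omega))

def general_addition(E, F):
    """P(E or F) = P(E) + P(F) - P(E and F)."""
    return prob(E) + prob(F) - prob(E & F)

A = {w for w in omega if sum(w) == 6}     # sum equals 6
B = {w for w in omega if sum(w) == 10}    # sum equals 10
C = {w for w in omega if w[0] == w[1]}    # doubles

p_A_or_C = general_addition(A, C)
p_B_or_C = prob(B) + prob(C)      # special rule: valid because B & C is empty
wrong_A_or_C = prob(A) + prob(C)  # special rule misapplied: A & C is not empty

print(p_A_or_C, prob(A | C))  # 7/24 7/24
print(p_B_or_C, prob(B | C))  # 5/24 5/24
print(wrong_A_or_C)           # 1/3, which overcounts (3,3)
```

The last line shows what goes wrong when the mutually exclusive shortcut is used without checking its condition: the outcome \((3,3)\) is counted twice.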

4.2.5. Bringing It All Together

Key Takeaways 📝

Probability is a function that maps events to values between 0 and 1, representing the likelihood of those events occurring.
A valid probability function must satisfy three axioms: non-negativity, normalization (sample space has probability 1), and additivity for mutually exclusive events.
The complement rule: \(P(A') = 1 - P(A)\). It is useful for calculating probabilities of events defined by “at least one” or “none.”
The general addition rule: \(P(A \cup B) = P(A) + P(B) - P(A \cap B)\). It can be extended to cases with multiple events using the inclusion-exclusion principle.

In the next chapter, we’ll build on these concepts to explore conditional probability and independence, which allow us to analyze how events influence each other.

Exercises

Basic Concepts: Explain why each of the following is or is not a valid probability function.
1. \(P(A) = -0.2\) for some event \(A\).
2. \(P(\Omega) = 0.95\).
3. \(P(A \cup B) = P(A) + P(B)\) for all events \(A\) and \(B\).
Card Probabilities: Suppose each card in a standard 52-card deck has an equal chance of being drawn. Calculate:
1. The probability of drawing a face card (jack, queen, or king).
2. The probability of drawing a red card or an ace.
3. The probability of drawing a card that is neither black nor a face card.
Dice Rolling: If you roll two fair six-sided dice, find:
1. The probability that their sum equals 7.
2. The probability that their sum equals 2 or 12.
3. The probability that the second die shows a larger value than the first die.
4. The probability that at least one die shows an even number.
Inclusion-Exclusion: In a group of 100 students, 65 study mathematics, 45 study physics, and 25 study both subjects. Calculate:
1. The probability that a randomly selected student studies mathematics or physics.
2. The probability that a randomly selected student studies neither subject.
3. The probability that a randomly selected student studies mathematics but not physics.