Probability
========================
Now that we've established the language of set theory, we can build upon this foundation to describe
uncertainty using probability.
.. admonition:: Road Map 🧭
:class: important
• Define probability as a function that maps events to numerical values representing
likelihood.
• Establish the axioms that make a probability measure well-defined.
• Compare frequentist and Bayesian interpretations of probability.
• Develop fundamental rules for calculating probabilities: complement rule, addition rule.
• Apply these concepts to solve probability problems with dice.
Probability as a Function
---------------------------
**Probability** is a function that maps events (sets) to real numbers in the interval :math:`[0, 1]`:
.. math:: P: \{\text{events}\} \rightarrow [0, 1].
For any event :math:`A \subseteq \Omega`, where :math:`\Omega` is the sample space, we denote its probability as :math:`P(A)`.
The value of :math:`P(A)` expresses how likely it is for :math:`A` to occur.
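To make the "function" view concrete, here is a minimal sketch in Python. It assumes a hypothetical experiment that is not part of the text above (one roll of a fair six-sided die with equally likely outcomes), and the names ``omega`` and ``prob`` are illustrative only.

```python
from fractions import Fraction

# Hypothetical example: one roll of a fair six-sided die.
omega = frozenset(range(1, 7))

def prob(event):
    """Map an event (a subset of omega) to a number in [0, 1]."""
    event = frozenset(event)
    if not event <= omega:
        raise ValueError("an event must be a subset of the sample space")
    # Assumption: outcomes are equally likely, so we count outcomes.
    return Fraction(len(event), len(omega))

even = {2, 4, 6}
print(prob(even))    # 1/2
print(prob(omega))   # 1
print(prob(set()))   # 0
```

The input is a set and the output is a single number, which is exactly the shape of the mapping :math:`P` described above.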
Axioms of Probability
-----------------------
Not every function that maps events to numbers between 0 and 1 is a valid probability measure.
To be considered a probability, the function must satisfy three fundamental axioms:
Axiom 1: Non-negativity
~~~~~~~~~~~~~~~~~~~~~~~~~
For any event A in the sample space Ω, its probability is always non-negative. That is,
.. math:: P(A) \geq 0.
Axiom 2: Normalization
~~~~~~~~~~~~~~~~~~~~~~~
The probability of the sample space is 1.
.. math:: P(\Omega) = 1
This axiom ensures that something from the sample space must occur when we perform our random experiment.
Axiom 3: Additivity
~~~~~~~~~~~~~~~~~~~~~~~~
For any sequence of mutually exclusive events A₁, A₂, ... (that is, events where Aᵢ ∩ Aⱼ = ∅ for all i ≠ j):
.. math::
P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i)
This axiom states that the probability of a union of mutually exclusive events equals the sum of their individual probabilities.
.. admonition:: Additional properties
:class: important
From these three axioms, we can derive several important properties:
1. The probability of the empty set is zero. P(∅) = 0.
2. The probability of any event is at most 1. P(A) ≤ 1 for all events A.
3. If A ⊆ B, then P(A) ≤ P(B).
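For a small finite sample space, the axioms and the derived properties can be checked by brute force. The sketch below assumes a hypothetical loaded spinner with three outcomes (the outcome labels and weights are made up for illustration). On a finite sample space, checking additivity for pairs of disjoint events is enough, since longer unions follow by repeating the pairwise step.

```python
from fractions import Fraction
from itertools import chain, combinations

# Hypothetical loaded three-outcome spinner: outcome -> probability.
pmf = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}
omega = frozenset(pmf)

def prob(event):
    """P(event): sum the probabilities of the outcomes in the event."""
    return sum((pmf[w] for w in event), Fraction(0))

def all_events(outcomes):
    """Every subset of a finite sample space."""
    items = list(outcomes)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return [frozenset(s) for s in subsets]

events = all_events(omega)

# Axiom 1: non-negativity.
axiom1 = all(prob(e) >= 0 for e in events)
# Axiom 2: normalization.
axiom2 = prob(omega) == 1
# Axiom 3, checked for every pair of disjoint events.
axiom3 = all(prob(e | f) == prob(e) + prob(f)
             for e in events for f in events if not (e & f))

# Derived properties.
empty_is_zero = prob(frozenset()) == 0
at_most_one = all(prob(e) <= 1 for e in events)
monotone = all(prob(e) <= prob(f)
               for e in events for f in events if e <= f)

print(axiom1, axiom2, axiom3, empty_is_zero, at_most_one, monotone)
```

Changing the weights so that they no longer sum to 1, or making one of them negative, causes the corresponding check to fail.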
.. admonition:: Example 💡: Two Dice Probability
:class: note
Suppose we roll a six-sided die followed by a four-sided die, and record the
outcome as an ordered pair of the two numbers,
with the result of the six-sided die always listed first.
Our sample space Ω consists of all possible ordered pairs:
.. math::
\Omega = \{(1,1), (1,2), (1,3), (1,4), (2,1), ..., (6,4)\}
There are 6 × 4 = 24 possible outcomes in total. Assuming the dice are fair,
each outcome is equally likely with probability 1/24. Define A as the event
that the sum equals 6, B as the event that the sum equals 10, and C
as the event of rolling doubles.
- A = {(2,4), (3,3), (4,2), (5,1)}
- B = {(6,4)}
- C = {(1,1), (2,2), (3,3), (4,4)}
1. Compute the probability of event A.
Using the third axiom of probability,
.. math::
P(A) &= P((2,4)) + P((3,3)) + P((4,2)) + P((5,1)) \\
&= \frac{1}{24} + \frac{1}{24} + \frac{1}{24} +
\frac{1}{24} = \frac{1}{6}
Since all outcomes are equally likely, the probability of an event depends only on the number of outcomes
it contains, known as its *cardinality*. We denote the cardinality of a set :math:`A` as :math:`|A|`, so that
.. math:: P(A) = \frac{|A|}{|\Omega|}.
2. Compute :math:`P(A \cup C)`.
First, we find that the set :math:`A \cup C` consists of :math:`\{(2,4), (3,3), (4,2), (5,1), (1,1), (2,2), (4,4)\}`.
Then
.. math:: P(A \cup C) = \frac{|A \cup C|}{|\Omega|} = \frac{7}{24}
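The counting in this example can be reproduced by enumerating the sample space. The following is a minimal sketch, assuming fair dice as in the example. The helper ``prob`` is an illustrative name, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction
from itertools import product

# Sample space: (six-sided result, four-sided result).
omega = set(product(range(1, 7), range(1, 5)))

def prob(event):
    """Equally likely outcomes: P(E) = |E| / |Omega|."""
    return Fraction(len(event), len(omega))

A = {w for w in omega if sum(w) == 6}     # sum equals 6
B = {w for w in omega if sum(w) == 10}    # sum equals 10
C = {w for w in omega if w[0] == w[1]}    # doubles

print(len(omega))    # 24
print(prob(A))       # 1/6
print(prob(A | C))   # 7/24
```

Here ``A | C`` is Python's set union, so the last line counts the seven distinct outcomes in :math:`A \cup C` once each.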
Interpretations of Probability
-------------------------------
There are different ways to interpret what probabilities are. The two major
interpretations are the frequentist and Bayesian approaches.
Frequentist Interpretation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The **frequentists** define the probability of an event :math:`A` as the relative frequency
of its occurrence as the number of trials goes to infinity. Mathematically,
.. math::
P(A) = \lim_{n \to \infty} \frac{\text{Number of times A occurs}}{\text{Number of trials}}
This approach treats probability as **an intrinsic property of the random process** being studied.
For example, the statement "a fair coin has a 50% probability of landing heads" means that
if we toss the coin infinitely many times, the proportion of heads would approach 0.5.
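A simulation cannot run infinitely many trials, but it can suggest how the relative frequency behaves as the number of trials grows. The sketch below assumes a fair coin simulated with Python's standard pseudo-random generator. The function name and the fixed seed are illustrative choices, and the printed values depend on the seed.

```python
import random

def relative_frequency(n_trials, seed=0):
    """Fraction of heads in n_trials simulated fair-coin tosses."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    heads = sum(rng.random() < 0.5 for _ in range(n_trials))
    return heads / n_trials

for n in (10, 1_000, 100_000):
    print(n, relative_frequency(n))
```

With few tosses the proportion of heads can be far from 0.5, and it typically settles closer to 0.5 as the number of tosses increases.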
Bayesian Interpretation
~~~~~~~~~~~~~~~~~~~~~~~~~
The **Bayesian** view on probability is that it is a **degree of belief** about the likelihood
of events, which can be updated as new information becomes available. In this view, probability
represents **a state of knowledge rather than an intrinsic property of the world**.
In the Bayesian framework, we often start with a prior belief about an event's probability.
As we collect data, we update this belief to form a posterior probability that incorporates the new evidence.
While this course primarily uses the frequentist approach, some Bayesian concepts will appear later in the semester.
Basic Rules of Probability
-----------------------------
Several rules help us calculate probabilities for complex events based on simpler ones.
Complement Rule
~~~~~~~~~~~~~~~~~
For any event A,
.. math:: P(A') = 1 - P(A).
**Why does this rule hold?**
This rule follows from our axioms. Recall that :math:`A \cup A' = \Omega`.
Two equal events must have the same probability, so
.. math:: P(A \cup A') = P(\Omega).
Since :math:`A` and :math:`A'` are disjoint, Axiom 3 of probability says :math:`P(A \cup A') = P(A) + P(A')`.
Also, Axiom 2 of probability says :math:`P(\Omega)=1`. Using these, the equation above becomes
.. math:: P(A) + P(A') = 1.
The complement rule is simply a rearrangement of this equation.
.. figure:: https://yjjpfnblgtrogqvcjaon.supabase.co/storage/v1/object/public/stat-350-assets/images/chapter4/complement-rule.png
:alt: Diagram of the complement rule
:align: center
:figwidth: 90%
Graphical illustration of the complement rule
.. admonition:: Example 💡: When is the complement rule useful?
:class: note
Continued from the first example of throwing two dice. Compute the probability that
the sum of the two numbers is not equal to 10.
**Approach 1: without using the complement rule**
1. Name the event that the sum of the two numbers is not equal to 10.
2. List the elements in this event.
3. Add the probabilities of each outcome in the event.
**Approach 2: using the complement rule**
Recognize that this event is the complement of :math:`B`. Therefore, we are essentially
looking for :math:`P(B')`. Using the complement rule,
.. math:: P(B') = 1-P(B) = 1 - \frac{1}{24} = \frac{23}{24}.
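Both approaches can be compared directly in code. This sketch rebuilds the two-dice sample space from the first example and assumes equally likely outcomes. The variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

omega = set(product(range(1, 7), range(1, 5)))
B = {w for w in omega if sum(w) == 10}

# Approach 1: list every outcome whose sum is not 10, then count.
not_B = {w for w in omega if sum(w) != 10}
direct = Fraction(len(not_B), len(omega))

# Approach 2: complement rule, using only the single outcome in B.
via_complement = 1 - Fraction(len(B), len(omega))

print(len(not_B), direct, via_complement)   # 23 23/24 23/24
```

Approach 1 has to handle 23 outcomes, while Approach 2 only needs the one outcome in :math:`B`.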
General Addition Rule
~~~~~~~~~~~~~~~~~~~~~~~~
For any two events A and B,
.. math::
P(A \cup B) = P(A) + P(B) - P(A \cap B).
The general addition rule provides a way to compute the probability of the union of
two events.
**Why does this rule hold?**
.. figure:: https://yjjpfnblgtrogqvcjaon.supabase.co/storage/v1/object/public/stat-350-assets/images/chapter4/add-rule.png
:alt: Diagram of general addition rule
:align: center
:figwidth: 90%
Graphical illustration of the general addition rule
The key component of the general addition rule is the final subtraction of the
intersection probability. If we simply add :math:`P(A)` and :math:`P(B)`,
we would count the outcomes in the intersection twice. Subtracting :math:`P(A \cap B)` corrects for this double-counting.
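The double-counting picture can also be derived from Axiom 3. One way to sketch the argument is to split the union into disjoint pieces, writing :math:`B \cap A'` for the part of :math:`B` that lies outside :math:`A`:

.. math::

   A \cup B = A \cup (B \cap A'), \qquad B = (A \cap B) \cup (B \cap A').

Each right-hand side is a union of mutually exclusive events, so Axiom 3 gives

.. math::

   P(A \cup B) = P(A) + P(B \cap A'), \qquad P(B) = P(A \cap B) + P(B \cap A').

Solving the second equation for :math:`P(B \cap A')` and substituting it into the first yields :math:`P(A \cup B) = P(A) + P(B) - P(A \cap B)`.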
i. A Special Case: Mutually Exclusive Events
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
If A and B are mutually exclusive (:math:`A \cap B = \emptyset`),
.. math::
P(A \cup B) = P(A) + P(B).
This is a restatement of the third axiom of probability, but it can also be seen as a
special case of the general addition rule. The general rule still applies; however, because
A and B are mutually exclusive, we have :math:`P(A \cap B) = P(\emptyset) = 0`. As
a result, the final subtraction of intersection probability becomes unnecessary.
.. admonition:: Avoid the common mistake ‼️
:class: danger
Any special case formulas should only be used when the conditions for the special situation
are fully met. When unsure, always begin with the general version.
ii. Extension to Multiple Events
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The two-event addition rule is a special case of a broader rule that handles any number of events,
called the **inclusion-exclusion principle**.
For n events A₁, A₂, ..., Aₙ, the probability of their union is computed as follows:
- Add the probabilities of individual events
- Subtract the probabilities of all pairwise intersections
- Add the probabilities of all triple intersections
- Subtract the probabilities of all quadruple intersections
- Continue this pattern, with the sign alternating for each term
For three events A, B, and C, the probability of their union is:
.. math::
P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(A \cap C) - P(B \cap C) + P(A \cap B \cap C)
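The alternating-sign pattern can be written as a short loop over intersections of every size. The sketch below is one possible implementation, not a standard library routine. It reuses the events :math:`A`, :math:`B`, and :math:`C` from the two-dice example, and the function name ``union_probability`` is hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product

omega = set(product(range(1, 7), range(1, 5)))

def prob(event):
    """Equally likely outcomes: P(E) = |E| / |Omega|."""
    return Fraction(len(event), len(omega))

def union_probability(events):
    """Inclusion-exclusion: add odd-sized intersections, subtract even-sized."""
    total = Fraction(0)
    for k in range(1, len(events) + 1):
        sign = 1 if k % 2 == 1 else -1
        for group in combinations(events, k):
            total += sign * prob(set.intersection(*group))
    return total

A = {w for w in omega if sum(w) == 6}     # sum equals 6
B = {w for w in omega if sum(w) == 10}    # sum equals 10
C = {w for w in omega if w[0] == w[1]}    # doubles

print(union_probability([A, B, C]))   # 1/3
print(prob(A | B | C))                # 1/3
```

The two printed values agree: the formula gives the same result as counting the outcomes in the union directly.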
.. admonition:: Example 💡: Two Dice Probability
:class: note
Continued from the previous examples of throwing two dice.
1. Compute :math:`P(A\cup C)` using the **general addition rule**, and confirm
that the answer is the same as our previous approach without the rule.
:math:`A\cap C = \{(3,3)\}`. Using the rule and the fact that all outcomes are equally likely,
.. math::
P(A\cup C) &= P(A) + P(C) - P(A\cap C) \\
&= \frac{|A|}{|\Omega|} + \frac{|C|}{|\Omega|} - \frac{|A\cap C|}{|\Omega|} \\
&= \frac{4}{24} + \frac{4}{24} - \frac{1}{24} = \frac{7}{24}.
2. Compute the probability that the outcome is a double **or** the sum is equal to 10.
The probability can be written as :math:`P(B \cup C)`. Since the two events are
mutually exclusive, we can use the **special addition rule**:
.. math:: P(B \cup C) = P(B) + P(C) = \frac{1}{24} + \frac{4}{24} = \frac{5}{24}.
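Both parts of this example can be checked in code, along with the earlier warning about special-case formulas. In the sketch below, the special rule refuses to run when the events overlap. This guard is an illustrative design choice and both function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

omega = set(product(range(1, 7), range(1, 5)))
A = {w for w in omega if sum(w) == 6}     # sum equals 6
B = {w for w in omega if sum(w) == 10}    # sum equals 10
C = {w for w in omega if w[0] == w[1]}    # doubles

def prob(event):
    """Equally likely outcomes: P(E) = |E| / |Omega|."""
    return Fraction(len(event), len(omega))

def general_addition(e, f):
    """P(E or F) for any two events."""
    return prob(e) + prob(f) - prob(e & f)

def special_addition(e, f):
    """Only valid for mutually exclusive events; refuse otherwise."""
    if e & f:
        raise ValueError("events overlap; use the general addition rule")
    return prob(e) + prob(f)

print(general_addition(A, C))   # 7/24
print(special_addition(B, C))   # 5/24
```

Calling ``special_addition(A, C)`` raises an error because :math:`A` and :math:`C` share the outcome (3,3), whereas ``general_addition`` works for any pair of events.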
Bringing It All Together
---------------------------
.. admonition:: Key Takeaways 📝
:class: important
#. **Probability** is a function that maps events to values between 0 and 1,
representing the likelihood of those events occurring.
#. A valid probability function must satisfy three **axioms**: non-negativity,
normalization (sample space has probability 1), and additivity for mutually exclusive events.
#. The **complement rule**: P(A') = 1 - P(A). It is useful for calculating probabilities
of events defined by "at least one" or "none."
#. The **general addition rule**: P(A ∪ B) = P(A) + P(B) - P(A ∩ B). It can be extended to cases with
multiple events using the inclusion-exclusion principle.
In the next chapter, we'll build on these concepts to explore conditional probability and independence,
which allow us to analyze how events influence each other.
Exercises
~~~~~~~~~~~~~~~~~~~~
1. **Basic Concepts**: For each of the following, explain whether it is consistent with a valid probability function.
a) P(A) = -0.2 for some event A
b) P(Ω) = 0.95
c) P(A ∪ B) = P(A) + P(B) for all events A and B.
2. **Card Probabilities**: Suppose each card in a standard 52-card deck has an equal chance
of being drawn. Calculate:
a) The probability of drawing a face card (jack, queen, or king).
b) The probability of drawing a red card or an ace.
c) The probability of drawing a card that is neither black nor a face card.
3. **Dice Rolling**: If you roll two fair six-sided dice, find:
a) The probability that their sum equals 7.
b) The probability that their sum equals 2 or 12.
c) The probability that the second die shows a larger value than the first die.
d) The probability that at least one die shows an even number.
4. **Inclusion-Exclusion**: In a group of 100 students, 65 study mathematics, 45 study physics,
and 25 study both subjects. Calculate:
a) The probability that a randomly selected student studies mathematics or physics.
b) The probability that a randomly selected student studies neither subject.
c) The probability that a randomly selected student studies mathematics but not physics.