5.1. Discrete Random Variables and Probability Mass Distributions
In previous chapters, we used set theory to describe events and their probabilities. While this approach provides a rigorous foundation, it can become cumbersome when dealing with complex scenarios. Random variables offer a more elegant solution by mapping outcomes directly to numbers.
Road Map 🧭
Define random variables as functions that map real-life events of arbitrary complexity to their numerical representations.
Distinguish between discrete and continuous random variables.
Formalize probability mass functions (PMFs) for discrete random variables.
Apply PMFs to calculate probabilities for complex events.
5.1.1. Random Variables: From Sets to Numbers
Definition
A random variable (RV) \(X\) is a function that maps each outcome in the sample space \(\omega \in \Omega\) to a numerical value. Formally, \(X: \Omega \to \mathbb{R}\).
Why is a Random Variable Needed?
Outcomes of random experiments are often multi-faceted and tend to introduce more complexity than necessary. For example, suppose we flip a coin 10 times and count how many heads appear in the sequence. The complete sample space of ten coin flips contains \(2^{10} = 1,024\) different possible sequences.
However, if we’re only interested in the total number of heads, we do not need to examine each sequence individually. For instance, instead of interpreting ‘HHHHHHHHTH’ as a unique sequence, we can view it simply as an outcome that yields the numerical value 9.
This is where a random variable becomes useful. We can define a random variable, say \(X\), to map the outcome ‘HHHHHHHHTH’ to a numerical value that reflects the focus of our interest:

\[X(\text{HHHHHHHHTH}) = 9,\]

and all other 1,023 outcomes in a similar manner. By using a random variable, we reduced our focus from 1,024 sequences to just 11 possible values (0 through 10).
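To make the idea concrete, here is a minimal sketch in Python (the language is our choice; the text itself assumes no code) of a random variable as an ordinary function on the sample space of ten flips: it takes a sequence of H’s and T’s and returns the number of heads.

```python
from itertools import product

# A random variable is just a function from outcomes to numbers.
# Here an outcome is a length-10 string of 'H'/'T', and X counts the heads.
def X(outcome: str) -> int:
    return outcome.count("H")

# The full sample space has 2**10 = 1,024 sequences...
sample_space = ["".join(seq) for seq in product("HT", repeat=10)]

print(len(sample_space))                      # 1024 outcomes
print(X("HHHHHHHHTH"))                        # 9
print(sorted({X(s) for s in sample_space}))   # ...but X takes only 11 values: 0 through 10
```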
Expressing Events With Random Variables
One of the key advantages of introducing a random variable is conciseness. Once an appropriate random variable is defined, most events can be expressed as equalities or inequalities involving the variable. See the table below for some examples:
| Description | Using set notation | Using random variable \(X\) |
|---|---|---|
| Event that there are three heads in the sequence | Define \(A_3\) as the name of the event. List all sequences with three heads in \(A_3 = \{\cdots\}\). | \(X=3\) |
| Event that there are more than 7 heads in the sequence | Define \(A_8, A_9, A_{10}\) as the events of sequences with 8, 9, and 10 heads, respectively. The event of interest is \(A_8 \cup A_9 \cup A_{10}\). | \(X > 7\) |
We no longer need to define a new event for every new question. Instead, we can express various situations compactly using the random variable \(X\).
5.1.2. Types of Random Variables
Random variables fall into two main categories based on the nature of their possible values.
Discrete Random Variables
A random variable is discrete if it can take on only a countable number of possible values. Discrete random variables typically arise when counting things, such as:
The number of heads in coin flips
The number of website hits during a specific time period
The number of customers until the first big-ticket item is sold
Continuous Random Variables
A random variable is continuous if it can take on any value within a continuous range or interval. Continuous random variables typically arise when measuring quantities, such as:
Height, weight, or other physical measurements
Time until a particular event occurs
Temperature, pressure, or other environmental measurements
5.1.3. Probability Distributions
To describe the probabilistic behavior of a random variable, we must specify the probabilities associated with all its possible values. This complete description is called a probability distribution.
Discrete and continuous random variables have different types of probability distributions. A discrete random variable is described by a probability mass function (PMF), while a continuous random variable is described by a probability density function (PDF).
In this chapter, we will focus on discrete random variables and PMFs. As we progress through the course, we will see how PMFs and PDFs share some foundational ideas, while differing in important ways.
5.1.4. Probability Mass Functions
Definition
The probability mass function of a discrete random variable \(X\) is denoted by \(p_X\). For each possible value \(x\) that \(X\) can take, it gives

\[p_X(x) = P(X = x).\]
Different forms of a PMF
A PMF can be represented in several different forms.
A PMF can be organized into a table by listing the possible values with their corresponding probabilities.
Fig. 5.1 Example of a PMF in table form
A bar graph can visually represent a PMF, with possible values on the x-axis and bar heights indicating their probabilities. However, it is rarely used alone, since exact probabilities are hard to read unless the plot is very simple.
For some special random variables, a mathematical formula is used to describe the PMF. One example is:
\[p_X(x) = \frac{e^{-\lambda} \lambda^x}{x!}, \quad \text{for } x = 0, 1, 2, \ldots\]
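As a quick numerical illustration, the short Python sketch below evaluates this formula for an illustrative rate \(\lambda = 2\) (the value is our choice, not the text’s) and checks that the probabilities add up to approximately 1.

```python
from math import exp, factorial

# The formula from the text, p_X(x) = e^(-lambda) * lambda^x / x!,
# evaluated for an illustrative rate lam = 2.
lam = 2.0

def p_X(x: int) -> float:
    return exp(-lam) * lam**x / factorial(x)

print(p_X(0), p_X(1), p_X(2))           # individual probabilities
print(sum(p_X(x) for x in range(50)))   # partial sum over x = 0..49 is ~1.0
```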
Support
The support of a discrete random variable is the set of all possible values with a positive probability:

\[\text{supp}(X) = \{x \in \mathbb{R} : p_X(x) > 0\}\]
Validity of a PMF
For a probability mass function to be valid, the following conditions must hold.
Non-negativity: For all \(x\), \(0 \leq p_X(x) \leq 1.\)
Total probability of 1: The sum of probabilities over all values in the support must equal 1:
\[\sum_{x \in \text{supp}(X)} p_X(x) = 1\]
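These two conditions are easy to check mechanically. Below is a minimal sketch, assuming the PMF is stored as a Python dictionary mapping each support value to its probability; the function name and tolerance are our own choices.

```python
def is_valid_pmf(pmf: dict, tol: float = 1e-9) -> bool:
    """Check the two PMF conditions: every probability lies in [0, 1],
    and the probabilities over the support sum to 1 (up to rounding)."""
    non_negative = all(0.0 <= p <= 1.0 for p in pmf.values())
    sums_to_one = abs(sum(pmf.values()) - 1.0) <= tol
    return non_negative and sums_to_one

# Example: a fair six-sided die.
print(is_valid_pmf({x: 1/6 for x in range(1, 7)}))   # True
print(is_valid_pmf({1: 0.4, 2: 0.7}))                # False: probabilities sum to 1.1
```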
5.1.5. Important Types of Problems Involving PMFs
A. Constructing a PMF from Scratch
It is an important skill for statisticians to be able to “translate” descriptions of a random experiment in plain language to mathematical language involving a random variable and its PMF.
Example💡: Flipping a Biased Coin
Let us try constructing a PMF from scratch, only using descriptions of the experimental setting.
Suppose we flip a biased coin four times, where the probability of heads on each flip is 0.7 (and tails is 0.3). We define a random variable \(H\) to count the number of heads in the four flips. Find the complete PMF for \(H\). Verify that the PMF is valid.
First, let’s identify the sample space. There are \(2^4 = 16\) possible sequences of heads and tails over four flips. However, rather than working with all 16 sequences individually, we can group them based on the number of heads:
\(H = 0\): Only one sequence has zero heads (all tails: TTTT)
\(H = 1\): Four sequences have exactly one head (HTTT, THTT, TTHT, TTTH)
\(H = 2\): Six sequences have exactly two heads
\(H = 3\): Four sequences have exactly three heads
\(H = 4\): Only one sequence has all four heads (HHHH)
Using the independence of the coin flips and the given probabilities,
\(P(H = 0) = P(TTTT) = (0.3)^4 = 0.0081\)
\(P(H = 1) = 4 (0.3)^3 (0.7) = 0.0756\)
\(P(H = 2) = 6 (0.3)^2 (0.7)^2 = 0.2646\)
\(P(H = 3) = 4 (0.3) (0.7)^3 = 0.4116\)
\(P(H = 4) = (0.7)^4 = 0.2401\)
All probabilities are between 0 and 1, satisfying the first condition for validity. The probabilities also sum to 1:

\[0.0081 + 0.0756 + 0.2646 + 0.4116 + 0.2401 = 1.\]
This gives us the complete PMF for our random variable \(H\).
Fig. 5.2 Probability mass function for the number of heads in four flips
Fig. 5.3 Visualization of the PMF for the number of heads in four flips
The PMF reveals that getting three heads is the most likely outcome, with a probability of approximately 0.41, while getting zero heads is very unlikely, with a probability of only about 0.008.
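The same PMF can be reproduced by brute force. The sketch below (a minimal Python illustration, not part of the original example) enumerates all 16 sequences, computes each sequence’s probability from the independence of the flips, and accumulates the probability mass for each value of \(H\).

```python
from itertools import product

# Brute-force construction of the PMF of H = number of heads in four flips
# of a biased coin with P(heads) = 0.7, by enumerating all 16 sequences.
p_heads, p_tails = 0.7, 0.3

pmf = {h: 0.0 for h in range(5)}
for seq in product("HT", repeat=4):
    heads = seq.count("H")
    prob = p_heads**heads * p_tails**(4 - heads)   # independence of the flips
    pmf[heads] += prob

print(pmf)                # ≈ {0: 0.0081, 1: 0.0756, 2: 0.2646, 3: 0.4116, 4: 0.2401}
print(sum(pmf.values()))  # 1.0 (up to floating-point rounding)
```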
B. Completing a Partially Known PMF
Completing a partially specified PMF is a common task in statistics. Typical scenarios include:
One probability in the support is unknown.
Multiple probabilities are unknown, with additional constraints provided.
The coefficient \(k\) that turns a non-negative function \(f(x)\) into a valid PMF \(p_X(x) = kf(x)\) is unknown. This constant \(k\) is called the normalization constant.
In all cases, we must “fill in the blanks” by applying the conditions of a valid PMF.
Example💡: Finding the Normalization Constant
Consider a potential PMF:
To make this a valid PMF, we need to find the value of k that ensures the probabilities sum to 1:
Multiplying both sides by 64 and solving for \(k\),
Therefore, the valid PMF is:
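The same normalization idea carries over to any non-negative \(f\). Since the specific \(f\) used in this example is not reproduced above, the sketch below illustrates the procedure with a hypothetical \(f(x) = x + 1\) on \(\{0, 1, 2, 3\}\) (the same setup as one of the practice problems at the end of this section).

```python
from fractions import Fraction

# Find the normalization constant k so that p_X(x) = k * f(x) is a valid PMF.
# f(x) = x + 1 on {0, 1, 2, 3} is used purely as an illustration.
def f(x: int) -> Fraction:
    return Fraction(x + 1)

support = [0, 1, 2, 3]
k = 1 / sum(f(x) for x in support)       # total mass k * sum f(x) must equal 1
pmf = {x: k * f(x) for x in support}

print(k)                                     # 1/10
print({x: str(p) for x, p in pmf.items()})   # {0: '1/10', 1: '1/5', 2: '3/10', 3: '2/5'}
print(sum(pmf.values()) == 1)                # True
```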
C. Calculating Probabilities with PMFs
Once we have a complete PMF, we can calculate probabilities for various events related to the random variable.
Viewing events as equalities and inequalities involving a random variable, we can express probabilities of unions, intersections, and complements concisely in terms of \(p_X(x)\). Let us first get some practice writing probability statements correctly in terms of \(X\).
Example: Consider a random variable \(X\) whose support consists of all positive integers.
Probability statements for a discrete RV \(X\) with positive-integer support:

Probability that \(X\) is less than 4:

\[\begin{split}&P(X < 4) \\
&= P(X=1 \text{ OR } X=2 \text{ OR } X=3) \\
&= P(X=1 \cup X=2 \cup X=3)\\
&= P(X=1) + P(X=2) + P(X=3)\\
&= p_X(1) + p_X(2) + p_X(3)\end{split}\]

The transition from the third to the fourth line works because the events \(\{X=x\}\) are disjoint for different values of \(x\) (see the special addition rule).

Probability that \(X\) is less than 4 and at least 2:

\[\begin{split}&P(X < 4 \cap X \geq 2)\\
&= P(2 \leq X < 4)\\
&= p_X(2) + p_X(3)\end{split}\]

For intersections and unions of non-disjoint events, think of ways to combine the two separate (in)equalities into one.

Probability that \(X\) is at least 4 or greater than 6:

\[\begin{split}&P(X \geq 4 \cup X > 6) \\
&= P(X \geq 4) \\
&= 1 - P(X < 4)\end{split}\]

To compute \(P(X \geq 4)\) directly, we would have to sum infinitely many terms. The complement rule simplifies computation.
Now, let us apply these skills to solve a problem.
Example💡: Computing probabilities using PMF
Using the PMF we just derived, let’s calculate some probabilities.
The probability that \(X\) is even:

\[\begin{split}P(X \text{ is even}) &= P(X = 0) + P(X = 2) + P(X = 4) + P(X = 6) \\ &= 1/4 + 1/8 + 1/8 + 1/16 = 9/16\end{split}\]

The probability that \(X\) is greater than 3:

\[\begin{split}P(X > 3) &= P(X = 4) + P(X = 5) + P(X = 6) \\ &= 1/8 + 1/16 + 1/16 = 1/4\end{split}\]

Are the events \(\{X = 5 \text{ or } X = 6\}\) and \(\{X > 3\}\) independent?

To show independence between two events \(A\) and \(B\), we must show that they meet the mathematical definition of independence. That is, we must verify \(P(A \mid B) = P(A)\), \(P(B \mid A) = P(B)\), or equivalently \(P(A \cap B) = P(A)P(B)\).

\[\begin{split}P(X = 5 \text{ or } X = 6 \mid X > 3) &= \frac{P((X = 5 \cup X = 6) \cap (X > 3))}{P(X > 3)}\\ &= \frac{P(X = 5 \cup X = 6)}{P(X > 3)}\\ &= (1/16 + 1/16)/(1/4) = 1/2\\ P(X = 5 \cup X = 6) &= 1/16 + 1/16 = 1/8\end{split}\]

Since \(1/2 \neq 1/8\), these events are not independent.
5.1.6. Bringing It All Together
Key Takeaways 📝
Random variables map outcomes from the sample space to numerical values, allowing us to focus on quantities of interest rather than complex sets.
Discrete random variables take on countable values and are typically used when counting things, while continuous random variables can take any value in a continuum and are used for measurements.
A probability mass function (PMF) specifies the probability that a discrete random variable equals each possible value in its support.
Valid PMFs must satisfy two conditions:
all probabilities are between 0 and 1, and
the sum of all probabilities equals 1.
We can calculate probabilities for various events by rewriting the probability statements in terms of the PMF.
5.1.7. Exercises
These exercises develop your skills in defining random variables, constructing and validating PMFs, and calculating probabilities using PMFs.
Exercise 1: Defining Random Variables
For each scenario below, define an appropriate random variable and classify it as discrete or continuous. Identify the support (set of possible values) for each.
A quality control engineer inspects a batch of 8 circuit boards and counts how many have defects.
A data center monitors the time (in hours) until a server experiences its first hardware failure after being powered on.
A network security system tracks the number of failed login attempts to a server during a one-hour period.
A biomedical engineer measures the blood pressure (in mmHg) of patients during a clinical trial.
A software team releases an app update and tracks how many users out of the first 100 downloaders report bugs.
Solution
Part (a): Circuit Board Defects
Random variable: Let \(X\) = number of defective circuit boards in the batch
Type: Discrete (counting)
Support: \(\text{supp}(X) = \{0, 1, 2, 3, 4, 5, 6, 7, 8\}\)
Part (b): Server Failure Time
Random variable: Let \(T\) = time (in hours) until first hardware failure
Type: Continuous (measuring time)
Support: \(\text{supp}(T) = (0, \infty)\) or \([0, \infty)\) depending on interpretation
Part (c): Failed Login Attempts
Random variable: Let \(N\) = number of failed login attempts in one hour
Type: Discrete (counting)
Support: \(\text{supp}(N) = \{0, 1, 2, 3, \ldots\}\) (non-negative integers)
Part (d): Blood Pressure
Random variable: Let \(B\) = blood pressure measurement (mmHg)
Type: Continuous (measuring)
Support: \(\text{supp}(B) = (0, \infty)\) in theory, or a practical range like \([60, 200]\)
Part (e): Bug Reports
Random variable: Let \(R\) = number of users (out of 100) who report bugs
Type: Discrete (counting)
Support: \(\text{supp}(R) = \{0, 1, 2, \ldots, 100\}\)
Key Distinction: Discrete random variables arise from counting (how many), while continuous random variables arise from measuring (how much/long/far).
Exercise 2: Constructing a PMF from Scratch
A software testing process involves running 3 independent test cases on a new feature. Each test case has a 70% probability of passing (and 30% probability of failing) independently.
Let \(X\) = the number of test cases that pass.
What is the support of \(X\)?
List all possible outcomes of the three tests (using P for pass, F for fail) and group them by the value of \(X\).
Calculate \(P(X = k)\) for each value \(k\) in the support to construct the complete PMF.
Verify that your PMF is valid by checking both conditions.
Create a table showing the PMF.
Solution
Part (a): Support
\(\text{supp}(X) = \{0, 1, 2, 3\}\)
(Can pass 0, 1, 2, or all 3 tests)
Part (b): Outcomes Grouped by X
Total outcomes: \(2^3 = 8\) sequences
\(X = 0\): FFF (1 outcome)
\(X = 1\): PFF, FPF, FFP (3 outcomes)
\(X = 2\): PPF, PFP, FPP (3 outcomes)
\(X = 3\): PPP (1 outcome)
Part (c): PMF Calculation
Using \(P(\text{Pass}) = 0.7\) and \(P(\text{Fail}) = 0.3\):

\[\begin{split}P(X = 0) &= (0.3)^3 = 0.027 \\
P(X = 1) &= 3\,(0.7)(0.3)^2 = 0.189 \\
P(X = 2) &= 3\,(0.7)^2(0.3) = 0.441 \\
P(X = 3) &= (0.7)^3 = 0.343\end{split}\]
Part (d): Validity Check
Non-negativity: All probabilities are between 0 and 1 ✓
Sum to 1: \(0.027 + 0.189 + 0.441 + 0.343 = 1.000\) ✓
Part (e): PMF Table
| \(x\) | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| \(p_X(x)\) | 0.027 | 0.189 | 0.441 | 0.343 |
The most likely outcome is passing exactly 2 tests (44.1% probability).
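The same PMF can be checked without listing sequences: there are \(\binom{3}{k}\) ways to choose which \(k\) of the 3 independent tests pass. The short Python sketch below (an illustration, not part of the exercise) verifies the table.

```python
from math import comb

# Binomial check of the Exercise 2 PMF: C(3, k) ways to choose the passing tests,
# each with probability 0.7**k * 0.3**(3 - k).
p_pass = 0.7

pmf = {k: comb(3, k) * p_pass**k * (1 - p_pass)**(3 - k) for k in range(4)}

for k, p in pmf.items():
    print(k, round(p, 3))              # 0 0.027, 1 0.189, 2 0.441, 3 0.343
print(round(sum(pmf.values()), 10))    # 1.0
```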
Exercise 3: Completing a Partially Known PMF
A network router tracks packet delivery status. The number of retransmission attempts \(X\) needed for successful delivery has the following partial PMF:
Find the value of \(c\) that makes this a valid PMF.
What is the probability that a packet requires at least one retransmission?
What is the probability that a packet requires more than 2 retransmissions?
Solution
Part (a): Finding c
For a valid PMF, all probabilities must sum to 1:
Verification: \(0.40 + 0.30 + 0.15 + 0.10 + 0.05 = 1.00\) ✓
Part (b): P(at least one retransmission)
“At least one retransmission” means \(X \geq 1\):
Alternatively, using complement rule:
Part (c): P(more than 2 retransmissions)
“More than 2” means \(X > 2\), i.e., \(X = 3\) or \(X = 4\):
Exercise 4: Finding the Normalization Constant
A data scientist models the number of daily user complaints \(X\) on a support forum. The PMF is proposed to be:

\[p_X(x) = k\,(0.6)^x, \quad x = 0, 1, 2, 3, 4,\]
where \(k\) is a normalization constant.
Find the value of \(k\) that makes this a valid PMF.
Calculate \(P(X \leq 2)\).
Calculate \(P(X \geq 1)\).
Calculate \(P(1 \leq X \leq 3)\).
Solution
Part (a): Finding k
First, compute the sum \(\sum_{x=0}^{4} (0.6)^x\):

\[\sum_{x=0}^{4} (0.6)^x = 1 + 0.6 + 0.36 + 0.216 + 0.1296 = 2.3056\]

For validity, we need:

\[k \sum_{x=0}^{4} (0.6)^x = 2.3056\,k = 1 \quad\Rightarrow\quad k = \frac{1}{2.3056} \approx 0.4337\]
Part (b): P(X ≤ 2)

\[P(X \leq 2) = k(1 + 0.6 + 0.36) = 1.96\,k \approx 0.8501\]

Part (c): P(X ≥ 1)

Using complement rule:

\[P(X \geq 1) = 1 - P(X = 0) = 1 - k \approx 0.5663\]

Part (d): P(1 ≤ X ≤ 3)

\[P(1 \leq X \leq 3) = k(0.6 + 0.36 + 0.216) = 1.176\,k \approx 0.5101\]
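As a numerical check, the Python sketch below assumes, as in the solution above, that the proposed form is \(p_X(x) = k\,(0.6)^x\) on \(\{0, 1, 2, 3, 4\}\) and recomputes the answers.

```python
# Numerical check of Exercise 4, assuming p_X(x) = k * 0.6**x on {0, 1, 2, 3, 4}.
support = range(5)
total = sum(0.6**x for x in support)    # 2.3056
k = 1 / total                           # ~0.4337

pmf = {x: k * 0.6**x for x in support}

print(round(k, 4))                                # 0.4337
print(round(sum(pmf[x] for x in [0, 1, 2]), 4))   # P(X <= 2) ~ 0.8501
print(round(1 - pmf[0], 4))                       # P(X >= 1) ~ 0.5663
print(round(sum(pmf[x] for x in [1, 2, 3]), 4))   # P(1 <= X <= 3) ~ 0.5101
```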
Exercise 5: PMF Validation
Determine whether each of the following is a valid PMF. If not, identify which condition is violated.
\(p_X(x) = \frac{x}{10}\) for \(x = 1, 2, 3, 4\)
\(p_X(x) = \frac{1}{5}\) for \(x = 1, 2, 3, 4, 5\)
\(p_X(x) = \frac{x - 3}{6}\) for \(x = 1, 2, 3, 4, 5\)
\(p_X(x) = \frac{2^x}{31}\) for \(x = 0, 1, 2, 3, 4\)
\(p_X(x) = 0.4\) for \(x = 1\), and \(p_X(x) = 0.7\) for \(x = 2\)
Solution
Part (a): Valid ✓

Check sum: \(\frac{1}{10} + \frac{2}{10} + \frac{3}{10} + \frac{4}{10} = \frac{10}{10} = 1\) ✓

Check non-negativity: All values \(\frac{1}{10}, \frac{2}{10}, \frac{3}{10}, \frac{4}{10}\) are between 0 and 1 ✓

Both conditions hold, so this is a valid PMF.
Part (b): Valid ✓
Non-negativity: \(\frac{1}{5} = 0.2\) is between 0 and 1 for all x ✓
Sum: \(5 \times \frac{1}{5} = 1\) ✓
This is a uniform distribution on {1, 2, 3, 4, 5}.
Part (c): Not Valid ✗
Evaluate probabilities:
\(p_X(1) = \frac{1-3}{6} = \frac{-2}{6} = -\frac{1}{3} < 0\) ✗
\(p_X(2) = \frac{2-3}{6} = -\frac{1}{6} < 0\) ✗
Violates non-negativity: Probabilities cannot be negative.
Part (d): Valid ✓
Check sum: \(\frac{2^0}{31} + \frac{2^1}{31} + \frac{2^2}{31} + \frac{2^3}{31} + \frac{2^4}{31} = \frac{1+2+4+8+16}{31} = \frac{31}{31} = 1\) ✓
Check non-negativity: All values \(\frac{1}{31}, \frac{2}{31}, \frac{4}{31}, \frac{8}{31}, \frac{16}{31}\) are positive and ≤ 1 ✓
Part (e): Not Valid ✗
Check sum: \(0.4 + 0.7 = 1.1 \neq 1\) ✗
Violates the sum-to-1 condition.
Exercise 6: Calculating Probabilities with a PMF
A quality control system categorizes manufactured components by the number of minor defects \(X\). The PMF is:
| \(x\) | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| \(p_X(x)\) | 0.50 | 0.25 | 0.15 | 0.07 | 0.03 |
A component is classified as:
Grade A: 0 defects
Grade B: 1-2 defects
Grade C: 3 or more defects
What is \(P(X < 2)\)?
What is \(P(X \geq 2)\)?
What is \(P(1 \leq X \leq 3)\)?
What is the probability a randomly selected component is Grade B?
Given that a component is NOT Grade A, what is the probability it is Grade B?
Are the events “Grade A” and “X is even” independent? Justify mathematically.
Solution
Part (a): P(X < 2)

\[P(X < 2) = P(X = 0) + P(X = 1) = 0.50 + 0.25 = 0.75\]
Part (b): P(X ≥ 2)
Using complement rule:

\[P(X \geq 2) = 1 - P(X < 2) = 1 - 0.75 = 0.25\]
Or directly: \(P(X = 2) + P(X = 3) + P(X = 4) = 0.15 + 0.07 + 0.03 = 0.25\) ✓
Part (c): P(1 ≤ X ≤ 3)

\[P(1 \leq X \leq 3) = P(X = 1) + P(X = 2) + P(X = 3) = 0.25 + 0.15 + 0.07 = 0.47\]
Part (d): P(Grade B)
Grade B means \(X \in \{1, 2\}\):

\[P(\text{Grade B}) = P(X = 1) + P(X = 2) = 0.25 + 0.15 = 0.40\]
Part (e): P(Grade B | Not Grade A)
“Not Grade A” means \(X \geq 1\):

\[P(X \geq 1) = 1 - P(X = 0) = 1 - 0.50 = 0.50\]

Since Grade B \(\subseteq \{X \geq 1\}\):

\[P(\text{Grade B} \mid X \geq 1) = \frac{P(\text{Grade B})}{P(X \geq 1)} = \frac{0.40}{0.50} = 0.80\]
Given a component is not Grade A, there’s an 80% chance it’s Grade B.
Part (f): Independence Check
Let A = “Grade A” = \(\{X = 0\}\) and E = “X is even” = \(\{X = 0, 2, 4\}\)
Calculate:
\(P(A) = P(X = 0) = 0.50\)
\(P(E) = P(X = 0) + P(X = 2) + P(X = 4) = 0.50 + 0.15 + 0.03 = 0.68\)
\(P(A \cap E) = P(X = 0) = 0.50\) (since \(\{X = 0\} \cap \{X \text{ even}\} = \{X = 0\}\))
Check: \(P(A) \cdot P(E) = 0.50 \times 0.68 = 0.34\)
Since \(P(A \cap E) = 0.50 \neq 0.34 = P(A) \cdot P(E)\), the events are NOT independent.
Intuition: If X = 0 (Grade A), we’re certain X is even. But if X were just some value in the support, there’s only a 68% chance it’s even. Knowing it’s Grade A increases the probability of “even” from 68% to 100%.
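For readers who want to double-check, the short Python sketch below (an illustration, not part of the exercise) recomputes the Exercise 6 answers directly from the PMF table by summing the mass over the values that satisfy each event.

```python
# Recompute the Exercise 6 answers directly from the PMF table.
pmf = {0: 0.50, 1: 0.25, 2: 0.15, 3: 0.07, 4: 0.03}

def prob(event) -> float:
    """P(X in event): sum the PMF over the values x for which event(x) is True."""
    return sum(p for x, p in pmf.items() if event(x))

grade_a = prob(lambda x: x == 0)            # 0.50
grade_b = prob(lambda x: 1 <= x <= 2)       # 0.40
not_a = prob(lambda x: x >= 1)              # 0.50
print(round(grade_b / not_a, 4))            # P(Grade B | not Grade A) = 0.8

even = prob(lambda x: x % 2 == 0)                     # 0.68
both = prob(lambda x: x == 0 and x % 2 == 0)          # 0.50
print(round(both, 2), round(grade_a * even, 2))       # 0.5 vs 0.34 -> not independent
```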
5.1.8. Additional Practice Problems
True/False Questions (1 point each)
A random variable is a function that maps outcomes from the sample space to numerical values.
Ⓣ or Ⓕ
A random variable that counts the number of customers arriving at a store is continuous.
Ⓣ or Ⓕ
The support of a discrete random variable is the set of all values where the PMF is positive.
Ⓣ or Ⓕ
For a valid PMF, the sum of all probabilities in the support must equal 1.
Ⓣ or Ⓕ
If \(p_X(x) = 0.3\) for \(x = 1, 2, 3, 4\), then \(p_X\) is a valid PMF.
Ⓣ or Ⓕ
For a discrete random variable \(X\), \(P(X = 5)\) and \(P(X < 5)\) can both be positive.
Ⓣ or Ⓕ
Multiple Choice Questions (2 points each)
A PMF satisfies \(p_X(x) = k(x+1)\) for \(x = 0, 1, 2, 3\). What is the value of \(k\)?
Ⓐ 1/4
Ⓑ 1/6
Ⓒ 1/10
Ⓓ 1/16
For a discrete random variable \(X\) with support \(\{1, 2, 3, 4, 5\}\) and \(p_X(x) = x/15\), what is \(P(X > 3)\)?
Ⓐ 2/15
Ⓑ 6/15
Ⓒ 9/15
Ⓓ 12/15
Which of the following is NOT a valid PMF?
Ⓐ \(p_X(x) = 1/4\) for \(x = 1, 2, 3, 4\)
Ⓑ \(p_X(x) = x/6\) for \(x = 1, 2, 3\)
Ⓒ \(p_X(x) = (4-x)/6\) for \(x = 1, 2, 3\)
Ⓓ \(p_X(x) = 0.5\) for \(x = 1, 2, 3\)
A random variable \(X\) has \(P(X \leq 3) = 0.7\) and \(P(X \leq 2) = 0.5\). What is \(P(X = 3)\)?
Ⓐ 0.2
Ⓑ 0.3
Ⓒ 0.5
Ⓓ 1.2
Answers to Practice Problems
True/False Answers:
True — By definition, a random variable \(X: \Omega \to \mathbb{R}\) maps each outcome ω in the sample space to a real number.
False — Counting customers yields whole numbers (0, 1, 2, …), making it a discrete random variable. Discrete RVs arise from counting; continuous RVs arise from measuring.
True — The support is defined as \(\text{supp}(X) = \{x \in \mathbb{R} : p_X(x) > 0\}\), exactly those values with positive probability.
True — This is one of the two validity conditions for a PMF: \(\sum_{x \in \text{supp}(X)} p_X(x) = 1\).
False — Sum check: \(4 \times 0.3 = 1.2 \neq 1\). The probabilities sum to more than 1, violating the normalization condition.
True — These events are not mutually exclusive. For example, if \(\text{supp}(X) = \{1, 2, 3, 4, 5, 6\}\), both \(P(X = 5)\) and \(P(X < 5)\) can be positive simultaneously.
Multiple Choice Answers:
Ⓒ — Sum: \(k(0+1) + k(1+1) + k(2+1) + k(3+1) = k(1+2+3+4) = 10k = 1\), so \(k = 1/10\).
Ⓒ — \(P(X > 3) = P(X = 4) + P(X = 5) = 4/15 + 5/15 = 9/15 = 3/5\).
Ⓓ — Check each: (A) \(4 \times 1/4 = 1\) ✓; (B) \(1/6 + 2/6 + 3/6 = 6/6 = 1\) ✓; (C) \(3/6 + 2/6 + 1/6 = 6/6 = 1\) ✓; (D) \(3 \times 0.5 = 1.5 \neq 1\) ✗.
Ⓐ — Since \(P(X \leq 3) = P(X \leq 2) + P(X = 3)\), we have \(P(X = 3) = 0.7 - 0.5 = 0.2\).