5.3. Expected Value of a Discrete Random Variable

A probability mass function provides the probabilities of individual outcomes at a fundamental level, but it also conveys much richer information about the overall behavior of a random variable. The first property we will study is the expected value, which represents the “center” of the values generated by the random variable.

Road Map 🧭

  • Define the expected value of a discrete random variable as a weighted average.

  • Master the Law of the Unconscious Statistician (LOTUS) for computing expectations of functions of random variables.

  • Discover key properties of expectation that simplify calculations.

5.3.1. From Samples to Populations: Expected Value

When analyzing a data set, we use the sample mean to represent the “center” of the data. It is computed by summing all observed values and dividing by the sample size—effectively giving each observation equal weight. If certain values are more likely to occur than others, this tendency is usually reflected in their higher frequencies within the sample.

Behind every generated data set is a random variable and its probability mass function (PMF), which govern how frequently different values are likely to appear. So how do we compute the “center” of this true, or population, distribution?

The key insight is to use probabilities as weights. If certain outcomes are more likely than others, then the center of the distribution should be pulled toward those values.

Definition

The expected value of a discrete random variable \(X\), denoted \(E[X]\) or \(\mu_X\), is the weighted average of all possible values in its support, with weights equal to their probabilities:

\[\mu_X = E[X] = \sum_{x \in \text{supp}(X)} x \, p_X(x) = \sum_{x \in \text{supp}(X)} x \, P(X = x)\]

This formula makes intuitive sense: values that occur with higher probability contribute more to the average, while rare outcomes have less influence.
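
The formula translates directly into a few lines of Python. Below is a minimal sketch; the small PMF is a hypothetical example of ours, chosen only to illustrate the weighted-sum computation.

```python
# A minimal sketch: E[X] as a probability-weighted average over the support.
# Illustrative (hypothetical) PMF: value -> probability.
pmf = {1: 0.50, 2: 0.25, 3: 0.25}

expected_value = sum(x * p for x, p in pmf.items())
print(expected_value)   # 1.75
```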

Example💡: Salamander Insurance Company

Salamander Insurance Company (SIC) has collected data on moving violations among its customers over a three-year period. They want to understand the typical number of violations to better price their policies.

Let \(X\) denote the number of moving violations for a randomly selected customer, with the following distribution:

\(x\)         0      1      2      3
\(p_X(x)\)    0.60   0.25   0.10   0.05

Most customers have no violations, while progressively fewer have one, two, or three violations. The expected number of violations is:

\[\begin{split}E[X] &= (0)(0.60) + (1)(0.25) + (2)(0.10) + (3)(0.05) \\ &= 0 + 0.25 + 0.20 + 0.15 = 0.60\end{split}\]

On average, a customer has 0.6 moving violations over three years. This value helps SIC understand the typical violation behavior, even though no individual customer ever has exactly 0.6 violations.

Do Expected Values Always Exist?

For a random variable with finite support, say \(\text{supp}(X) = \{x_1, x_2, \ldots, x_n\}\), the expected value

\[E[X] = \sum_{i=1}^{n} x_i \, p_X(x_i)\]

is always well-defined, since it is a sum of finitely many finite terms.

However, when dealing with a random variable with countably infinite support, we need an additional condition to ensure the expected value is well-defined. Specifically, the sum must be absolutely convergent:

\[\sum_{x \in \text{supp}(X)} |x| \, p_X(x) < \infty\]

If this condition isn’t met, the expected value might not exist or might depend on the order of summation, creating mathematical ambiguities. Fortunately, all the random variables we’ll encounter in this course have well-defined expectations.

Important Facts About Expected Values

  • The expected value does not necessarily correspond to a value in the support of the random variable.

    For instance, the expected number of heads in one flip of a fair coin is 0.5, even though any single flip must result in either 0 or 1 heads, never 0.5. This is because the expected value reflects the long-term average behavior rather than any single outcome.

  • The expected value goes by several names: expectation, mean of a random variable, and average of a random variable all refer to the same quantity.

    Note that we are now working with two types of means, each with its own definition and formula: the sample mean (computed from data) and the expected value (computed from a probability distribution). If you are ever unsure which one applies, check whether you are working with a data set or with a random variable and its distribution. The sketch below compares the two for a fair coin flip.
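
The following is a small simulation sketch (assuming NumPy is available): it computes the expected value of a fair coin flip exactly from its PMF and compares it with the sample mean of simulated data, which is close to, but generally not equal to, 0.5.

```python
import numpy as np

rng = np.random.default_rng(0)

# Population: number of heads in one flip of a fair coin.
values = np.array([0, 1])
probs = np.array([0.5, 0.5])

expected_value = float(np.sum(values * probs))      # population mean: exactly 0.5
sample = rng.choice(values, size=10_000, p=probs)   # a simulated data set
sample_mean = sample.mean()                         # sample mean: close to 0.5, not exact

print(expected_value, sample_mean)
```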

5.3.2. Expected Value of Functions of a Random Variable

In practice, we’re often interested not just in a random variable itself, but in some function of that variable. For example,

  • if \(X\) represents a measurement in inches, we might want to convert it to centimeters (\(2.54X\)), or

  • if \(X\) represents a count, we might be interested in \(X^2\).

We use the Law of the Unconscious Statistician (LOTUS) for these calculations.

Law of the Unconscious Statistician (LOTUS)

If \(X\) is a discrete random variable and \(g(·)\) is a real-valued function defined at least on the support of \(X\), then:

\[E[g(X)] = \sum_{x\in \text{supp}(X)} g(x) \, p_X(x)\]

This theorem tells us that to find the expected value of \(g(X)\), we don’t need to derive the PMF of the new random variable \(Y = g(X)\). Instead, we can simply apply the function \(g\) to each value in the support of \(X\), weight these transformed values by their original probabilities, and sum them up.

LOTUS greatly simplifies calculations involving functions of random variables, as we’ll see in the examples that follow.
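
As a programming analogue, here is a minimal Python sketch of the LOTUS computation. The helper name `lotus` and the small PMF are our own illustrative choices, not standard library functions.

```python
# A minimal LOTUS sketch: E[g(X)] = sum of g(x) * p_X(x) over the support of X.
def lotus(g, pmf):
    """Expected value of g(X) for a discrete PMF given as {value: probability}."""
    return sum(g(x) * p for x, p in pmf.items())

# Hypothetical PMF used only for illustration.
pmf = {-1: 0.2, 0: 0.5, 2: 0.3}

print(lotus(lambda x: x, pmf))         # E[X]   = -0.2 + 0 + 0.6 = 0.4
print(lotus(lambda x: x**2, pmf))      # E[X^2] =  0.2 + 0 + 1.2 = 1.4
print(lotus(lambda x: 2.54 * x, pmf))  # e.g., an inches-to-centimeters conversion
```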

Example💡: Salamander Insurance Company, Continued

Compute the expectation of \(X^2\), where \(X\) represents the number of moving violations made by a randomly selected customer.

\(x\)         0      1      2      3
\(p_X(x)\)    0.60   0.25   0.10   0.05

Let \(g(x) = x^2\). Using LOTUS,

\[\begin{split}E[X^2] &= g(0)p_X(0) + g(1)p_X(1) + g(2)p_X(2)+ g(3)p_X(3)\\ &=(0^2)(0.6) + (1^2)(0.25) +(2^2)(0.1) + (3^2)(0.05) = 1.1\end{split}\]

Be cautious 🛑

It is generally NOT true that \(E[g(X)] = g(E[X])\). In the Salamander Insurance Company example, \((E[X])^2 = (0.6)^2 = 0.36\), which is not equal to \(E[X^2] = 1.1\).
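
A few lines of Python make the gap explicit. This is a sketch that recomputes both quantities from the PMF tabulated above.

```python
# SIC PMF from the table above: number of violations -> probability.
pmf = {0: 0.60, 1: 0.25, 2: 0.10, 3: 0.05}

e_x  = sum(x * p for x, p in pmf.items())      # E[X]   = 0.6
e_x2 = sum(x**2 * p for x, p in pmf.items())   # E[X^2] = 1.1

print(e_x2, e_x**2)   # roughly 1.1 vs 0.36 -- E[X^2] != (E[X])^2
```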

5.3.3. Leveraging Properties of Expected Value

The expected value follows several elegant properties that streamline calculations. Let’s explore the most important ones.

Linearity of Expectation

If \(g\) is a linear function of the form \(g(x) = ax + b\), where \(a\) and \(b\) are constants, we have:

\[E[aX + b] = a \, E[X] + b.\]

This property is particularly useful because it allows us to “push” the expected value operation through linear operations. For example, if we know \(E[X] = 3\) and we want to find \(E[2X + 5]\), we can directly calculate \(E[2X + 5] = 2 \cdot 3 + 5 = 11\), without repeating the summation.

We can verify this property using LOTUS:

\[\begin{split}E[aX + b] &= \sum_{x\in \text{supp}(X)} (ax + b) \, p_X(x) \\ &= a \sum_{x\in \text{supp}(X)} x \, p_X(x) + b \sum_{x\in \text{supp}(X)} p_X(x) \\ &= a \, E[X] + b \cdot 1 \\ &= a \, E[X] + b\end{split}\]

The second sum equals 1 because the probabilities in a valid PMF must sum to 1.
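
The shortcut can also be checked numerically. The sketch below applies LOTUS to \(g(x) = 2x + 5\) using the SIC violation PMF and compares the result with \(2\,E[X] + 5\).

```python
# Check E[aX + b] = a*E[X] + b on the SIC violation PMF with a = 2, b = 5.
pmf = {0: 0.60, 1: 0.25, 2: 0.10, 3: 0.05}
a, b = 2, 5

e_x      = sum(x * p for x, p in pmf.items())              # E[X] = 0.6
direct   = sum((a * x + b) * p for x, p in pmf.items())    # LOTUS on g(x) = ax + b
shortcut = a * e_x + b                                     # linearity shortcut

print(direct, shortcut)   # both 6.2 (up to floating-point rounding)
```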

Additivity of Expectation

The expected value operation also distributes over sums and differences of random variables. For any two random variables \(X\) and \(Y\):

\[E[X \pm Y] = E[X] \pm E[Y].\]

This property extends naturally to any finite collection of random variables:

\[E[X_1 \pm X_2 \pm \cdots \pm X_n] = E[X_1] \pm E[X_2] \pm \cdots \pm E[X_n].\]

A remarkable aspect of this property is that it holds regardless of whether the random variables are independent. While many properties involving multiple random variables depend on independence, additivity of expectation does not.

The proof of this property involves working with the joint probability mass function, but the intuition is that expected values represent long-term averages, and the average of a sum equals the sum of the averages.
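
To see that independence is not required, here is a sketch built on a small hypothetical joint PMF in which \(X\) and \(Y\) are deliberately dependent; the additivity identity still holds.

```python
# Hypothetical joint PMF p(x, y), chosen so that X and Y are dependent:
# P(X=0, Y=0) = 0.4, but P(X=0) * P(Y=0) = 0.5 * 0.5 = 0.25.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

e_sum = sum((x + y) * p for (x, y), p in joint.items())   # E[X + Y] via LOTUS
e_x   = sum(x * p for (x, y), p in joint.items())         # E[X] from the joint PMF
e_y   = sum(y * p for (x, y), p in joint.items())         # E[Y] from the joint PMF

print(e_sum, e_x + e_y)   # both equal 1.0
```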

Monotonicity

There’s one more property worth mentioning briefly: monotonicity. If a random variable \(X\) is always less than or equal to another random variable \(Y\) (\(P(X > Y) = 0\)), then:

\[E[X] \leq E[Y].\]

Example💡: SIC, Continued

SIC also tracks the number of accidents, \(Y\), and has compiled the joint distribution of violations and accidents:

\(x\) \ \(y\)   0      1       2       \(p_X(x)\)
0               0.58   0.015   0.005   0.60
1               0.18   0.058   0.012   0.25
2               0.02   0.078   0.002   0.10
3               0.02   0.029   0.001   0.05
\(p_Y(y)\)      0.80   0.18    0.02    1

The company determines monthly premiums based on both factors, using the formula:

\[g(X,Y) = 95 + 10Y^3 + 120Y + 25X\]

This formula incorporates a base rate ($95), a penalty that scales linearly with violations ($25X), and terms that increase dramatically with accidents (both linear and cubic terms in \(Y\)).

To find the expected monthly premium across all customers, we use the additivity and linearity properties:

\[\begin{split}E[g(X,Y)] &= E[95 + 10Y^3 + 120Y + 25X] \\ &= 95 + 10E[Y^3] + 120E[Y] + 25E[X].\end{split}\]

We already calculated \(E[X] = 0.6\). For the remaining terms,

\[\begin{split}E[Y] &= (0)(0.80) + (1) (0.18) + (2) (0.02) = 0.22 \\ E[Y^3] &= (0^3) (0.80) + (1^3) (0.18) + (2^3) (0.02) \\ &= 0 + 0.18 + 0.16 = 0.34.\end{split}\]

Substituting these values:

\[\begin{split}E[g(X,Y)] &= 95 + (10) (0.34) + (120) (0.22) + (25) (0.6) \\ &= 95 + 3.4 + 26.4 + 15 = 139.8.\end{split}\]

Therefore, the expected monthly premium is $139.80.
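
As a cross-check, the sketch below computes the same expected premium by applying the two-variable form of LOTUS directly to the joint PMF tabulated above, without splitting the formula into separate expectations.

```python
# SIC joint PMF: (violations x, accidents y) -> probability, from the table above.
joint = {
    (0, 0): 0.58, (0, 1): 0.015, (0, 2): 0.005,
    (1, 0): 0.18, (1, 1): 0.058, (1, 2): 0.012,
    (2, 0): 0.02, (2, 1): 0.078, (2, 2): 0.002,
    (3, 0): 0.02, (3, 1): 0.029, (3, 2): 0.001,
}

def premium(x, y):
    """Monthly premium g(X, Y) = 95 + 10*Y^3 + 120*Y + 25*X."""
    return 95 + 10 * y**3 + 120 * y + 25 * x

# Two-variable LOTUS: sum g(x, y) * p(x, y) over all (x, y) pairs in the support.
expected_premium = sum(premium(x, y) * p for (x, y), p in joint.items())
print(round(expected_premium, 2))   # 139.8
```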

5.3.4. Bringing It All Together

Expected value provides a measure of central tendency for random variables, giving us insight into their typical behavior over many repetitions. Just as the sample mean describes the center of a dataset, the expected value describes the center of a probability distribution.

Key Takeaways 📝

  1. The expected value of a discrete random variable represents its long-term average behavior—a weighted average of all possible outcomes with weights equal to their probabilities.

  2. The Law of the Unconscious Statistician (LOTUS) provides a method for finding the expected value of a function of a random variable.

  3. Expected values follow linearity: \(E[aX + b] = aE[X] + b\), allowing us to simplify calculations involving linear transformations.

  4. Expected values are additive: \(E[X + Y] = E[X] + E[Y]\), regardless of whether X and Y are independent, which greatly simplifies calculations involving sums of random variables.

To fully characterize a random variable, we need more than just its expected value. In the next section, we’ll explore how to quantify the spread or variability of a discrete random variable around its expected value—a concept analogous to the sample variance and standard deviation of a dataset.

Exercises

  1. Basic Calculation: For a random variable \(X\) with PMF \(p_X(-2) = 0.3\), \(p_X(0) = 0.4\), and \(p_X(3) = 0.3\), calculate:

    1. \(E[X]\)

    2. \(E[X^2]\)

    3. \(E[3X - 5]\)

  2. Lottery Analysis: A lottery ticket costs $2 and has the following payoffs:

    • $0 with probability 0.9

    • $5 with probability 0.08

    • $20 with probability 0.01

    • $100 with probability 0.009

    • $1000 with probability 0.001

    Calculate the expected net gain (payoff minus cost) for a single ticket. Is this lottery favorable to the player?

  3. Insurance Model: An insurance company offers a policy that costs $200. The probability that the policyholder will make a claim during the coverage period is 0.05, and the expected claim amount, given that a claim is made, is $3000. What is the expected profit for the insurance company per policy?

  4. Working with Joint Distributions: Given the joint PMF of random variables \(X\) and \(Y\),

    \(x\) \ \(y\)   1     2     3
    1               0.1   0.2   0.1
    2               0.1   0.3   0.2

    Calculate:

    1. \(E[X]\)

    2. \(E[Y]\)

    3. \(E[X + Y]\) without using linearity (using LOTUS directly).

    4. \(E[XY]\)

    5. Verify that \(E[X + Y] = E[X] + E[Y]\).

    6. Is \(E[XY]=E[X]E[Y]\) true?