5.3. Expected Value of a Discrete Random Variable
A probability mass function provides the probabilities of individual outcomes at a fundamental level, but it also conveys much richer information about the overall behavior of a random variable. The first property we will study is the expected value, which represents the “center” of the values generated by the random variable.
Road Map 🧭
Define the expected value of a discrete random variable as a weighted average.
Master the Law of the Unconscious Statistician (LOTUS) for computing expectations of functions of random variables.
Discover key properties of expectation that simplify calculations.
5.3.1. From Samples to Populations: Expected Value
When analyzing a data set, we use the sample mean to represent the “center” of the data. It is computed by summing all observed values and dividing by the sample size—effectively giving each observation equal weight. If certain values are more likely to occur than others, this tendency is usually reflected in their higher frequencies within the sample.
Behind every generated data set is a random variable and its probability mass function (PMF), which govern how frequently different values are likely to appear. So how do we compute the “center” of this true, or population, distribution?
The key insight is to use probabilities as weights. If certain outcomes are more likely than others, then the center of the distribution should be pulled toward those values.
Definition
The expected value of a discrete random variable \(X\), denoted \(E[X]\) or \(\mu_X\), is the weighted average of all possible values in its support, with weights equal to their probabilities:

\[E[X] = \sum_{x \in \text{supp}(X)} x \, p_X(x)\]
This formula makes intuitive sense; values that occur with higher probability contribute more to the average, while rare outcomes have less influence.
Example💡: Salamander Insurance Company
Salamander Insurance Company (SIC) has collected data on moving violations among its customers over a three-year period. They want to understand the typical number of violations to better price their policies.
Let \(X\) denote the number of moving violations for a randomly selected customer, with the following distribution:
| \(x\) | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| \(p_X(x)\) | 0.60 | 0.25 | 0.10 | 0.05 |
Most customers have no violations, while progressively fewer have one, two, or three violations. The expected number of violations is:

\[E[X] = 0(0.60) + 1(0.25) + 2(0.10) + 3(0.05) = 0.6\]
On average, a customer has 0.6 moving violations over three years. This value helps SIC understand the typical violation behavior, even though no individual customer ever has exactly 0.6 violations.
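To see the definition in action, here is a minimal Python sketch (the names `pmf` and `expected_value` are illustrative, not from any particular library) that reproduces the SIC calculation and checks it against a simulated long-run average:

```python
import random

# PMF of X, the number of moving violations (values from the SIC table)
pmf = {0: 0.60, 1: 0.25, 2: 0.10, 3: 0.05}

def expected_value(pmf):
    """Weighted average of the support, with probabilities as weights."""
    return sum(x * p for x, p in pmf.items())

print(expected_value(pmf))  # 0.6

# A long-run average of simulated draws should settle near E[X] = 0.6
draws = random.choices(list(pmf), weights=list(pmf.values()), k=100_000)
print(sum(draws) / len(draws))  # approximately 0.6
```

The simulated mean drifts toward 0.6 as the number of draws grows, previewing the long-run-average interpretation discussed below.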
Do Expected Values Always Exist?
For a random variable with finite support, say \(\text{supp}(X) = \{x_1, x_2, \ldots, x_n\}\), the expected value

\[E[X] = \sum_{i=1}^{n} x_i \, p_X(x_i)\]

is a sum of finitely many finite terms and is therefore always well-defined.
However, when dealing with a random variable with countably infinite support, we need an additional condition to ensure the expected value is well-defined. Specifically, the sum must be absolutely convergent:

\[\sum_{x \in \text{supp}(X)} |x| \, p_X(x) < \infty\]
If this condition is not met, the expected value might not exist or might depend on the order of summation, creating mathematical ambiguities. Fortunately, all the random variables we encounter in this course have well-defined expectations.
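For the curious, one classic example shows how existence can fail. The assignment \(p_X(x) = \frac{1}{x(x+1)}\) for \(x = 1, 2, 3, \ldots\) is a valid PMF, since the probabilities telescope to 1, yet

\[E[X] = \sum_{x=1}^{\infty} x \cdot \frac{1}{x(x+1)} = \sum_{x=1}^{\infty} \frac{1}{x+1} = \infty,\]

so the expected value does not exist: the defining sum fails to converge.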
Important Facts About Expected Values
The expected value does not necessarily correspond to a value in the support.
For instance, the expected number of heads in one flip of a fair coin is 0.5, even though any single flip must result in either 0 or 1 heads, never 0.5. The expected value reflects the long-term average behavior rather than any single outcome.
The expected value goes by several other names: expectation, the mean of a random variable, and the average of a random variable all refer to the same quantity.

Note that we are now working with two types of means, each with its own definition and formula: the sample mean (computed from data) and the expected value (computed from a probability distribution). If it is unclear which one is meant, check whether the mean describes a data set or a random variable.
5.3.2. Expected Value of Functions of a Random Variable
In practice, we’re often interested not only in a random variable itself, but also in its functions. For example,
If \(X\) represents a measurement in inches, we might want to convert it to centimeters (\(2.54X\)).
If \(Y\) represents a count, we might be interested in \(Y^2\).
We use the Law of the Unconscious Statistician (LOTUS) to understand the central behavior of functions of a random variable.
Law of the Unconscious Statistician (LOTUS)
If \(X\) is a discrete random variable and \(g(\cdot)\) is a real-valued function defined on the support of \(X\), then

\[E[g(X)] = \sum_{x \in \text{supp}(X)} g(x) \, p_X(x)\]
This theorem tells us that to find the expected value of \(g(X)\), we don’t need to derive the PMF of the new random variable \(Y = g(X)\). Instead, we can simply apply the function \(g\) to each value in the support of \(X\), weight these transformed values by their original probabilities, and sum them up.
LOTUS greatly simplifies calculations involving functions of random variables, as we’ll see in the examples that follow.
Example💡: Salamander Insurance Company, Continued
Compute the expectation of \(X^2\), where \(X\) represents the number of moving violations made by a randomly selected customer.
| \(x\) | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| \(p_X(x)\) | 0.60 | 0.25 | 0.10 | 0.05 |
Let \(g(x) = x^2\). Using LOTUS,

\[E[X^2] = 0^2(0.60) + 1^2(0.25) + 2^2(0.10) + 3^2(0.05) = 0 + 0.25 + 0.40 + 0.45 = 1.1\]
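As a quick check, a short Python sketch (illustrative names, not course code) computes \(E[X^2]\) both ways: directly via LOTUS, and the long way by first deriving the PMF of \(Y = X^2\):

```python
# PMF of X from the SIC example
pmf_X = {0: 0.60, 1: 0.25, 2: 0.10, 3: 0.05}

# Route 1 (LOTUS): apply g(x) = x**2 and keep the original weights
e_lotus = sum(x**2 * p for x, p in pmf_X.items())

# Route 2 (the long way): derive the PMF of Y = X**2 first, then average
pmf_Y = {}
for x, p in pmf_X.items():
    pmf_Y[x**2] = pmf_Y.get(x**2, 0.0) + p
e_direct = sum(y * p for y, p in pmf_Y.items())

print(e_lotus, e_direct)  # both 1.1
```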
Be cautious 🛑
It is generally NOT true that \(E[g(X)] = g(E[X])\). In the Salamander Insurance Company example, \((E[X])^2 = (0.6)^2 = 0.36\), which is not equal to \(E[X^2] = 1.1\).
5.3.3. Leveraging Properties of Expected Value
The expected value has several elegant properties that streamline calculations. Let’s explore the most important ones.
A. Linearity of Expectation
If \(g\) is a linear function of the form \(g(x) = ax + b\), where \(a\) and \(b\) are constants, we have:

\[E[aX + b] = aE[X] + b\]
This property is particularly useful because it allows us to “push” the expected value operation through linear operations. For example, if we know \(E[X] = 3\) and we want to find \(E[2X + 5]\), we can directly calculate \(E[2X + 5] = 2·3 + 5 = 11\), without repeating the summation.
We can verify this property using LOTUS:

\[E[aX + b] = \sum_{x \in \text{supp}(X)} (ax + b) \, p_X(x) = a \sum_{x \in \text{supp}(X)} x \, p_X(x) + b \sum_{x \in \text{supp}(X)} p_X(x) = aE[X] + b,\]

where \(\sum_{x \in \text{supp}(X)} p_X(x) = 1\) by the second condition for a valid PMF.
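A small numeric check of the same identity, reusing the SIC PMF with the illustrative constants \(a = 2\) and \(b = 5\):

```python
# SIC PMF again; a and b are illustrative constants
pmf_X = {0: 0.60, 1: 0.25, 2: 0.10, 3: 0.05}
a, b = 2, 5

lhs = sum((a * x + b) * p for x, p in pmf_X.items())  # E[aX + b] via LOTUS
rhs = a * sum(x * p for x, p in pmf_X.items()) + b    # aE[X] + b

print(lhs, rhs)  # both 6.2
```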
B. Additivity of Expectation
The expected value operation also distributes over sums and differences of random variables. For any two random variables \(X\) and \(Y\):

\[E[X \pm Y] = E[X] \pm E[Y]\]
This property extends naturally to any finite collection of random variables:

\[E[X_1 + X_2 + \cdots + X_n] = E[X_1] + E[X_2] + \cdots + E[X_n]\]
A remarkable aspect of this property is that it holds regardless of whether the random variables are independent. While many properties involving multiple random variables depend on independence, additivity of expectation does not.
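A brief Python sketch with a deliberately dependent pair (the joint PMF below is invented purely for illustration) shows additivity holding without independence:

```python
# An invented joint PMF where X and Y are clearly dependent:
# P(X=0, Y=0) = 0.4, but P(X=0) * P(Y=0) = 0.5 * 0.5 = 0.25
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

e_X = sum(x * p for (x, y), p in joint.items())
e_Y = sum(y * p for (x, y), p in joint.items())
e_sum = sum((x + y) * p for (x, y), p in joint.items())

print(e_sum, e_X + e_Y)  # both 1.0: additivity holds despite dependence
```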
C. Monotonicity
There’s one more property worth mentioning briefly: monotonicity. If a random variable \(X\) is always less than or equal to another random variable \(Y\), that is, \(P(X > Y) = 0\), then:

\[E[X] \leq E[Y]\]
Example💡: SIC, Continued
SIC also tracks the number of accidents \(Y\) and has compiled the joint distribution of moving violations and accidents:
| \(x\) \ \(y\) | 0 | 1 | 2 | \(p_X(x)\) |
|---|---|---|---|---|
| 0 | 0.58 | 0.015 | 0.005 | 0.60 |
| 1 | 0.18 | 0.058 | 0.012 | 0.25 |
| 2 | 0.02 | 0.078 | 0.002 | 0.10 |
| 3 | 0.02 | 0.029 | 0.001 | 0.05 |
| \(p_Y(y)\) | 0.80 | 0.18 | 0.02 | 1 |
The company determines monthly premiums based on both factors, using the formula:

\[\text{Premium} = 95 + 25X + 35Y + 65Y^3\]

This formula incorporates a base rate ($95), a penalty that scales linearly with violations ($25X), and terms that increase dramatically with accidents (both linear and cubic terms for \(Y\)).
To find the expected monthly premium across all customers, we use the additivity and linearity properties:

\[E[\text{Premium}] = 95 + 25E[X] + 35E[Y] + 65E[Y^3]\]

We already calculated \(E[X] = 0.6\). For the remaining terms, we use the marginal PMF of \(Y\):

\[E[Y] = 0(0.80) + 1(0.18) + 2(0.02) = 0.22\]

\[E[Y^3] = 0^3(0.80) + 1^3(0.18) + 2^3(0.02) = 0.34\]

Substituting these values,

\[E[\text{Premium}] = 95 + 25(0.6) + 35(0.22) + 65(0.34) = 95 + 15 + 7.7 + 22.1 = 139.8\]

Therefore, the expected monthly premium is $139.80.
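As a sanity check, a short Python sketch (illustrative names; coefficients as reconstructed in the formula above) recomputes the marginal expectations from the joint table and evaluates the expected premium:

```python
# Joint PMF of violations X (rows) and accidents Y (columns), from the table
joint = {
    (0, 0): 0.58, (0, 1): 0.015, (0, 2): 0.005,
    (1, 0): 0.18, (1, 1): 0.058, (1, 2): 0.012,
    (2, 0): 0.02, (2, 1): 0.078, (2, 2): 0.002,
    (3, 0): 0.02, (3, 1): 0.029, (3, 2): 0.001,
}

e_X  = sum(x * p for (x, y), p in joint.items())     # 0.6
e_Y  = sum(y * p for (x, y), p in joint.items())     # 0.22
e_Y3 = sum(y**3 * p for (x, y), p in joint.items())  # 0.34

# Expected premium via linearity and additivity
print(95 + 25 * e_X + 35 * e_Y + 65 * e_Y3)  # 139.8 (up to float rounding)
```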
5.3.4. Bringing It All Together
Key Takeaways 📝
The expected value of a discrete random variable represents its long-term average behavior—a weighted average of all possible outcomes with weights equal to their probabilities.
The Law of the Unconscious Statistician (LOTUS) provides a method for finding the expected value of a function of a random variable.
Expected values follow linearity: \(E[aX + b] = aE[X] + b\), allowing us to simplify calculations involving linear transformations.
Expected values are additive: \(E[X + Y] = E[X] + E[Y]\), regardless of whether X and Y are independent, which greatly simplifies calculations involving sums of random variables.
To fully characterize a random variable, we need more than just its expected value. In the next section, we’ll explore how to quantify the spread or variability of a discrete random variable around its expected value—a concept analogous to the sample variance and standard deviation of a dataset.
5.3.5. Exercises
These exercises develop your skills in computing expected values, applying the Law of the Unconscious Statistician (LOTUS), and using the linearity and additivity properties of expectation.
Exercise 1: Basic Expected Value Calculation
A software quality assurance team tracks the number of bugs \(X\) found per code review session. Based on historical data, the PMF is:
| \(x\) | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| \(p_X(x)\) | 0.30 | 0.35 | 0.20 | 0.10 | 0.05 |
(a) Calculate \(E[X]\), the expected number of bugs per review.

(b) Calculate \(E[X^2]\).

(c) If each bug takes an average of 2 hours to fix, and there’s a 30-minute (0.5 hour) overhead for each review session regardless of bugs found, calculate the expected total time spent per review session. That is, find \(E[2X + 0.5]\).

(d) Verify your answer to part (c) using the linearity property.
Solution
Part (a): E[X]

\[E[X] = 0(0.30) + 1(0.35) + 2(0.20) + 3(0.10) + 4(0.05) = 0 + 0.35 + 0.40 + 0.30 + 0.20 = 1.25\]

The expected number of bugs per review is 1.25 bugs.
Part (b): E[X²]

Using LOTUS with \(g(x) = x^2\):

\[E[X^2] = 0^2(0.30) + 1^2(0.35) + 2^2(0.20) + 3^2(0.10) + 4^2(0.05) = 0 + 0.35 + 0.80 + 0.90 + 0.80 = 2.85\]
Part (c): E[2X + 0.5] using LOTUS directly

Using LOTUS with \(g(x) = 2x + 0.5\):

\[E[2X + 0.5] = 0.5(0.30) + 2.5(0.35) + 4.5(0.20) + 6.5(0.10) + 8.5(0.05) = 0.15 + 0.875 + 0.90 + 0.65 + 0.425 = 3.0\]
Part (d): Verification using linearity

By linearity of expectation, \(E[aX + b] = aE[X] + b\):

\[E[2X + 0.5] = 2E[X] + 0.5 = 2(1.25) + 0.5 = 3.0\]
✓ Both methods give the same answer: 3.0 hours expected time per review session.
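For readers who like to verify numerically, a compact Python sketch (illustrative names) reproduces all four parts:

```python
# PMF of bugs per review, from the exercise table
pmf = {0: 0.30, 1: 0.35, 2: 0.20, 3: 0.10, 4: 0.05}

e_X = sum(x * p for x, p in pmf.items())                 # (a) 1.25 bugs
e_X2 = sum(x**2 * p for x, p in pmf.items())             # (b) 2.85
e_time = sum((2 * x + 0.5) * p for x, p in pmf.items())  # (c) 3.0 hours

print(e_X, e_X2, e_time, 2 * e_X + 0.5)  # (d) linearity gives 3.0 as well
```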
Exercise 2: LOTUS with Non-Linear Functions
A data center monitors server response times. Due to system architecture, the actual processing time \(T\) (in milliseconds) follows this distribution:
| \(t\) | 10 | 20 | 50 |
|---|---|---|---|
| \(p_T(t)\) | 0.70 | 0.20 | 0.10 |
The server’s power consumption (in watts) is modeled as \(P(t) = 5 + 0.1t^2\).
(a) Calculate \(E[T]\), the expected processing time.

(b) Calculate the expected power consumption \(E[P(T)] = E[5 + 0.1T^2]\).

(c) A naive engineer calculates power consumption by plugging the expected time into the formula: \(P(E[T]) = 5 + 0.1(E[T])^2\). What value do they get?

(d) Explain why \(E[P(T)] \neq P(E[T])\). Which is the correct expected power consumption?
Solution
Part (a): E[T]

\[E[T] = 10(0.70) + 20(0.20) + 50(0.10) = 7 + 4 + 5 = 16 \text{ ms}\]
Part (b): E[P(T)] = E[5 + 0.1T²]

Using linearity and LOTUS:

\[E[P(T)] = E[5 + 0.1T^2] = 5 + 0.1E[T^2]\]

First, find \(E[T^2]\):

\[E[T^2] = 10^2(0.70) + 20^2(0.20) + 50^2(0.10) = 70 + 80 + 250 = 400\]

Therefore:

\[E[P(T)] = 5 + 0.1(400) = 45 \text{ watts}\]
Part (c): Naive calculation P(E[T])

\[P(E[T]) = 5 + 0.1(E[T])^2 = 5 + 0.1(16)^2 = 5 + 25.6 = 30.6 \text{ watts}\]
Part (d): Explanation
The naive calculation gives 30.6 watts, but the correct expected power is 45 watts.
This illustrates a fundamental principle: E[g(X)] ≠ g(E[X]) in general, unless \(g\) is a linear function.
The function \(P(t) = 5 + 0.1t^2\) is convex (curves upward). For convex functions, Jensen’s inequality tells us \(E[g(X)] \geq g(E[X])\). Indeed, 45 > 30.6.
The correct expected power consumption is 45 watts (using LOTUS). The naive approach underestimates power usage because it ignores the variability in processing times—occasional long processes (50 ms) consume disproportionately more power due to the squared term.
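A quick Python check of the LOTUS-versus-naive comparison (illustrative names):

```python
# PMF of processing time T in milliseconds
pmf_T = {10: 0.70, 20: 0.20, 50: 0.10}

e_T = sum(t * p for t, p in pmf_T.items())                     # 16 ms
e_power = sum((5 + 0.1 * t**2) * p for t, p in pmf_T.items())  # 45 W (LOTUS)
naive = 5 + 0.1 * e_T**2                                       # 30.6 W (wrong)

print(e_T, e_power, naive)
```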
Exercise 3: Expected Value with Negative Outcomes
A venture capital firm evaluates startup investments. For a typical $100,000 investment, let \(X\) represent the return multiplier (how many times the investment is returned):
| \(x\) | 0 | 0.5 | 1 | 3 | 10 |
|---|---|---|---|---|---|
| \(p_X(x)\) | 0.40 | 0.25 | 0.20 | 0.10 | 0.05 |
Note: \(x = 0\) means total loss, \(x = 1\) means breaking even, \(x = 3\) means tripling the investment, etc.
(a) Calculate \(E[X]\), the expected return multiplier.

(b) If the firm invests $100,000, what is the expected dollar return? (Hint: The dollar return is \(100000 \cdot X\).)

(c) What is the expected profit (return minus initial investment)? Is this a profitable investment strategy on average?

(d) The firm considers a “safe” alternative that guarantees \(x = 1.1\) (a 10% return). Which option has the higher expected return multiplier?
Solution
Part (a): E[X]

\[E[X] = 0(0.40) + 0.5(0.25) + 1(0.20) + 3(0.10) + 10(0.05) = 0 + 0.125 + 0.20 + 0.30 + 0.50 = 1.125\]

The expected return multiplier is 1.125.
Part (b): Expected dollar return

\[E[100000X] = 100000 \cdot E[X] = 100000(1.125) = \$112{,}500\]
Part (c): Expected profit

Profit = Return − Investment = \(100000X - 100000 = 100000(X - 1)\)

\[E[100000(X - 1)] = 100000(E[X] - 1) = 100000(1.125 - 1) = \$12{,}500\]

Yes, this is a profitable investment strategy on average, with an expected profit of $12,500 per investment (a 12.5% expected return).
Part (d): Comparison with safe alternative
Risky strategy: \(E[X] = 1.125\) (12.5% expected return)
Safe alternative: \(X = 1.1\) with certainty (10% guaranteed return)
The risky strategy has a higher expected return multiplier (1.125 > 1.1).
However, the risky strategy has a 40% chance of total loss and only a 15% chance of exceeding the safe return. The choice depends on risk tolerance—expected value alone doesn’t capture the full picture.
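A short Python sketch (illustrative names) confirms the arithmetic:

```python
# PMF of the return multiplier X
pmf = {0: 0.40, 0.5: 0.25, 1: 0.20, 3: 0.10, 10: 0.05}

e_X = sum(x * p for x, p in pmf.items())                       # 1.125
e_return = 100_000 * e_X                                       # $112,500
e_profit = sum(100_000 * (x - 1) * p for x, p in pmf.items())  # $12,500

print(e_X, e_return, e_profit)
```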
Exercise 4: Additivity of Expectation
A manufacturing plant has two production lines. Let \(X\) = number of defective items from Line A per hour and \(Y\) = number of defective items from Line B per hour.
The marginal PMFs are:
| \(x\) | 0 | 1 | 2 |
|---|---|---|---|
| \(p_X(x)\) | 0.70 | 0.20 | 0.10 |

| \(y\) | 0 | 1 | 2 |
|---|---|---|---|
| \(p_Y(y)\) | 0.60 | 0.30 | 0.10 |
(a) Calculate \(E[X]\) and \(E[Y]\).

(b) Using additivity of expectation, find \(E[X + Y]\), the expected total defects per hour.

(c) Each defective item costs $50 to rework. Line A has a fixed hourly operating cost of $200, and Line B has a fixed cost of $150. Find the expected total hourly cost: \(E[200 + 50X + 150 + 50Y]\).

(d) Does your calculation in part (c) require knowing whether X and Y are independent? Explain.
Solution
Part (a): E[X] and E[Y]

\[E[X] = 0(0.70) + 1(0.20) + 2(0.10) = 0.4\]

\[E[Y] = 0(0.60) + 1(0.30) + 2(0.10) = 0.5\]
Part (b): E[X + Y] using additivity

By additivity of expectation:

\[E[X + Y] = E[X] + E[Y] = 0.4 + 0.5 = 0.9 \text{ defects per hour}\]
Part (c): Expected total hourly cost

Using linearity and additivity:

\[E[200 + 50X + 150 + 50Y] = 350 + 50E[X] + 50E[Y] = 350 + 50(0.4) + 50(0.5) = 350 + 20 + 25 = 395\]

The expected total hourly cost is $395.
Part (d): Independence not required
No, the calculation does not require knowing whether X and Y are independent.
The additivity property \(E[X + Y] = E[X] + E[Y]\) holds regardless of whether X and Y are independent. This is one of the most powerful aspects of expected value—we can compute expected values of sums using only the marginal distributions.
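A minimal Python check using only the marginals (illustrative names):

```python
# Marginal PMFs of defects on Lines A and B
pmf_X = {0: 0.70, 1: 0.20, 2: 0.10}
pmf_Y = {0: 0.60, 1: 0.30, 2: 0.10}

e_X = sum(x * p for x, p in pmf_X.items())  # 0.4
e_Y = sum(y * p for y, p in pmf_Y.items())  # 0.5

# Additivity needs only the marginals, never the joint distribution
print(e_X + e_Y)                     # 0.9 expected defects per hour
print(200 + 150 + 50 * (e_X + e_Y))  # $395 expected hourly cost
```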
Exercise 5: Insurance Premium Calculation
An auto insurance company models claims using random variable \(N\) = number of claims per year for a policyholder:
| \(n\) | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| \(p_N(n)\) | 0.75 | 0.15 | 0.07 | 0.03 |
Each claim has a cost \(C\) that is independent of \(N\), with:
| \(c\) | $500 | $2000 | $5000 |
|---|---|---|---|
| \(p_C(c)\) | 0.60 | 0.30 | 0.10 |
(a) Calculate \(E[N]\), the expected number of claims per year.

(b) Calculate \(E[C]\), the expected cost per claim.

(c) The total annual payout for a policyholder is approximately \(N \cdot E[C]\) if we assume each claim costs the expected amount. Using this approximation, find \(E[N \cdot E[C]]\).

(d) The company adds a $100 administrative fee plus a 20% markup on expected payouts to set premiums. What annual premium should they charge?

(e) What profit does the company expect per policyholder?
Solution
Part (a): E[N]

\[E[N] = 0(0.75) + 1(0.15) + 2(0.07) + 3(0.03) = 0.15 + 0.14 + 0.09 = 0.38\]
Part (b): E[C]

\[E[C] = 500(0.60) + 2000(0.30) + 5000(0.10) = 300 + 600 + 500 = \$1400\]
Part (c): E[N · E[C]]

Since \(E[C] = 1400\) is a constant:

\[E[N \cdot E[C]] = E[C] \cdot E[N] = 1400(0.38) = \$532\]

The expected annual payout is $532 per policyholder.
Part (d): Annual premium calculation

Premium = Administrative fee + (1 + markup) × Expected payout:

\[\text{Premium} = 100 + 1.2(532) = 100 + 638.40 = \$738.40\]

The company should charge $738.40 annually.
Part (e): Expected profit per policyholder

\[\text{Expected profit} = \text{Premium} - \text{Expected payout} - \text{Administrative cost} = 738.40 - 532 - 100 = \$106.40\]

Alternatively: the 20% markup on the $532 expected payout is $106.40.

Expected profit per policyholder is $106.40 (the 20% margin).
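A final Python sketch (illustrative names) traces the full premium calculation:

```python
# PMFs of claim count N and claim cost C
pmf_N = {0: 0.75, 1: 0.15, 2: 0.07, 3: 0.03}
pmf_C = {500: 0.60, 2000: 0.30, 5000: 0.10}

e_N = sum(n * p for n, p in pmf_N.items())  # 0.38 claims per year
e_C = sum(c * p for c, p in pmf_C.items())  # $1400 per claim

payout = e_N * e_C               # (c) $532 expected annual payout
premium = 100 + 1.2 * payout     # (d) $738.40
profit = premium - payout - 100  # (e) $106.40

print(payout, premium, profit)
```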
5.3.6. Additional Practice Problems
True/False Questions (1 point each)
1. The expected value of a discrete random variable must be one of the values in its support. Ⓣ or Ⓕ

2. For any function \(g\), \(E[g(X)] = g(E[X])\). Ⓣ or Ⓕ

3. \(E[X + Y] = E[X] + E[Y]\) holds only when X and Y are independent. Ⓣ or Ⓕ

4. If \(E[X] = 5\), then \(E[3X - 2] = 13\). Ⓣ or Ⓕ

5. The expected value can be interpreted as the long-run average of repeated observations. Ⓣ or Ⓕ

6. If \(X\) is a random variable with \(E[X] = 3\), then \(E[X^2] = 9\). Ⓣ or Ⓕ
Multiple Choice Questions (2 points each)
1. A random variable \(X\) has \(E[X] = 4\) and \(E[X^2] = 20\). What is \(E[X^2 - 2X + 1]\)?

Ⓐ 13
Ⓑ 9
Ⓒ 17
Ⓓ 5

2. If \(X\) has PMF \(p_X(1) = 0.5\) and \(p_X(3) = 0.5\), what is \(E[X]\)?

Ⓐ 1
Ⓑ 2
Ⓒ 3
Ⓓ 4

3. A game costs $5 to play. You win $20 with probability 0.2 and $0 otherwise. What is the expected net gain?

Ⓐ $4
Ⓑ -$1
Ⓒ $0
Ⓓ -$5

4. A random variable \(X\) takes values 1, 2, and 3 with equal probability. What is \(E[X^2]\)?

Ⓐ 2
Ⓑ 4
Ⓒ 14/3
Ⓓ 6
Answers to Practice Problems
True/False Answers:
1. False — The expected value is a weighted average and need not be in the support. Example: E[X] = 0.5 for a fair coin flip where X ∈ {0, 1}.

2. False — This only holds for linear functions. In general, E[g(X)] ≠ g(E[X]). Example: E[X²] ≠ (E[X])² unless X is constant.

3. False — Additivity of expectation E[X + Y] = E[X] + E[Y] holds regardless of independence. This is one of the most useful properties of expected value.

4. True — By linearity: E[3X - 2] = 3E[X] - 2 = 3(5) - 2 = 15 - 2 = 13.

5. True — This is the frequentist interpretation of expected value. By the Law of Large Numbers, the sample mean converges to E[X] as the number of observations increases.

6. False — In general, \(E[X^2] \neq (E[X])^2\). This is only true if X is a constant (has zero variance). For example, if X takes values 0 and 6 with equal probability, E[X] = 3 but E[X²] = (0² + 6²)/2 = 18 ≠ 9.
Multiple Choice Answers:
1. Ⓐ — Using linearity: E[X² - 2X + 1] = E[X²] - 2E[X] + 1 = 20 - 2(4) + 1 = 13.

2. Ⓑ — E[X] = (1)(0.5) + (3)(0.5) = 0.5 + 1.5 = 2.

3. Ⓑ — Expected winnings = (20)(0.2) + (0)(0.8) = $4. Net gain = 4 - 5 = -$1. This is an unfavorable game for the player.

4. Ⓒ — E[X²] = (1²)(1/3) + (2²)(1/3) + (3²)(1/3) = (1 + 4 + 9)/3 = 14/3 ≈ 4.67. Note that E[X] = (1 + 2 + 3)/3 = 2, and (E[X])² = 4 ≠ 14/3 = E[X²], illustrating that E[X²] ≠ (E[X])² in general.