5.4. Variance of a Discrete Random Variable

Just as the expected value tells us about the center of a probability distribution, we often need to quantify how spread out or dispersed the values are around this center. Variance and standard deviation provide this crucial second dimension to our understanding of random variables.

Road Map 🧭

  • Define variance and standard deviation for discrete random variables.

  • Explore an alternative computational formula for variance.

  • Derive key properties of variance for linear transformations and sums.

5.4.1. From Sample to Population: Defining Variance

In our exploration of sample statistics, we measured the spread of data using sample variance—the average of squared deviations from the mean. For random variables, we take a similar approach, but with an important twist. Instead of averaging deviations with equal weights, we weight each deviation by its probability.

Definition

The variance of a discrete random variable \(X\), denoted \(Var(X)\) or \(\sigma^2_X\), is the expected value of the squared deviation of \(X\) from its mean:

\[\sigma_X^2 = \text{Var}(X) = E[(X - \mu_X)^2] = \sum_{x\in\text{supp}(X)} (x - \mu_X)^2 p_X(x).\]

The standard deviation, denoted \(\sigma_X\), is simply the square root of the variance:

\[\sigma_X = \sqrt{\text{Var}(X)} = \sqrt{E[(X - \mu_X)^2]}.\]

Note the variance has squared units (e.g., dollars² if \(X\) is in dollars). The standard deviation returns us to the original units, making it often more interpretable in practice.

When the support of \(X\) is infinite, this definition requires the series to converge absolutely for the variance to be well-defined.
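For readers who like to verify such sums by computer, here is a minimal Python sketch that applies the definition directly. The PMF used (a fair six-sided die) is an illustrative assumption, not an example from this section.

```python
# Variance from the definition: Var(X) = sum over x of (x - mu)^2 * p(x).
# The PMF below (a fair six-sided die) is a made-up illustration.
pmf = {x: 1 / 6 for x in range(1, 7)}

mu = sum(x * p for x, p in pmf.items())               # E[X] = 3.5
var = sum((x - mu) ** 2 * p for x, p in pmf.items())  # E[(X - mu)^2] ~ 2.9167
sd = var ** 0.5                                       # sigma_X ~ 1.7078

print(mu, var, sd)
```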

A Computational Shortcut for Variance

Calculating variance directly from its definition can be cumbersome. Fortunately, there’s an equivalent formula that is typically easier to apply:

\[\sigma_X^2 = E[X^2] - \mu_X^2.\]

This can be derived by expanding the squared term in the original definition and then simplifying:

\[\begin{split}\begin{align} \text{Var}(X) &= E[(X - \mu_X)^2] \\ &= E[X^2 - 2X\mu_X + \mu_X^2] \\ &= E[X^2] - 2\mu_X E[X] + \mu_X^2 \\ &= E[X^2] - 2\mu_X \mu_X + \mu_X^2 \\ &= E[X^2] - \mu_X^2 \end{align}\end{split}\]

This computational formula often simplifies the work significantly, as we’ll see in our examples.
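A quick sketch confirming numerically that the shortcut agrees with the definition, reusing the hypothetical fair-die PMF from above:

```python
# Shortcut: Var(X) = E[X^2] - (E[X])^2, checked against the definition.
pmf = {x: 1 / 6 for x in range(1, 7)}  # hypothetical fair die, as before

mu = sum(x * p for x, p in pmf.items())        # E[X]
ex2 = sum(x ** 2 * p for x, p in pmf.items())  # E[X^2]

var_shortcut = ex2 - mu ** 2
var_definition = sum((x - mu) ** 2 * p for x, p in pmf.items())

assert abs(var_shortcut - var_definition) < 1e-9
print(var_shortcut)  # ~ 2.9167
```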

Example💡: Bean & Butter

Bean & Butter is a small campus café that sells only two morning items: coffee for $4 per cup and pastries for $3 each. The shop records its sales in waves—each wave is short enough that \(X\) (cups of coffee) and \(Y\) (pastries) follow a stable pattern but long enough to summarize cleanly.

It is known that \(X\) and \(Y\) are independent. The sales distribution for a single wave is:

| Outcome | \(p_X(x)\) (coffee) | \(p_Y(y)\) (pastry) |
|---------|---------------------|---------------------|
| 0       | 0.20                | 0.30                |
| 1       | 0.50                | 0.40                |
| 2       | 0.30                | 0.30                |

Let us first compute the expected sales count of coffee and pastries.

\[\begin{split}E[X] &= (0) (0.2) + (1) (0.5) + (2) (0.3) = 1.1\\ E[Y] &= (0) (0.3) + (1) (0.4) + (2) (0.3) = 1.0\end{split}\]

On average, 1.1 cups of coffee and 1.0 pastry are sold per wave.

For staffing, buying milk, or setting aside cash for the till, the owner also cares about variability of sales—how much does an individual wave fluctuate from the average?

To answer this question, we compute the variance and standard deviation of each random variable. For coffee,

\[\begin{split}E[X^2] &= (0^2) (0.2) + (1^2) (0.5) + (2^2) (0.3) = 1.7\\ \text{Var}(X) &= E[X^2]- (E[X])^2 =1.7 - 1.1^2 = 0.49\\ \sigma_X &= \sqrt{0.49} \approx 0.70\end{split}\]

Similarly for pastries:

\[\begin{split}E[Y^2] &= (0^2) (0.3) + (1^2) (0.4) + (2^2) (0.3) = 1.6\\ \text{Var}(Y) &= 1.6 - 1.0^2 = 0.60\\ \sigma_Y &= \sqrt{0.60} \approx 0.77\end{split}\]

A standard deviation of about 0.70 coffees and 0.77 pastries tells us that, in a typical wave, each count strays by roughly three-quarters of an item from its own average.
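As a quick check, the sketch below reproduces these means, variances, and standard deviations directly from the PMFs in the table:

```python
# Reproducing the Bean & Butter calculations with the shortcut formula.
p_X = {0: 0.20, 1: 0.50, 2: 0.30}  # coffee PMF from the table
p_Y = {0: 0.30, 1: 0.40, 2: 0.30}  # pastry PMF from the table

def mean_var(pmf):
    """Return (E[X], Var(X)) for a PMF given as a {value: probability} dict."""
    mu = sum(x * p for x, p in pmf.items())
    ex2 = sum(x ** 2 * p for x, p in pmf.items())
    return mu, ex2 - mu ** 2

mu_X, var_X = mean_var(p_X)
mu_Y, var_Y = mean_var(p_Y)
print(mu_X, var_X, var_X ** 0.5)  # 1.1, 0.49, 0.70
print(mu_Y, var_Y, var_Y ** 0.5)  # 1.0, 0.60, ~0.77
```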

5.4.2. Properties of Variance

Variance has several key properties that make calculations more manageable, especially when dealing with linear transformations of random variables.

A. Variance of Linear Transformations

For a linear transformation of a random variable, \(g(X) = aX + b\), where \(a\) and \(b\) are constants:

\[\text{Var}(aX + b) = a^2 \text{Var}(X).\]

Notice two important implications:

  • Scaling a random variable by a factor of \(a\) multiplies its variance by \(a^2\).

  • Adding a constant \(b\) has no effect on variance.

This makes intuitive sense. Multiplying all values by \(a\) stretches (or compresses) the distribution, scaling each deviation from the mean by a factor of \(|a|\). Since deviations are squared in the variance calculation, the variance changes by a factor of \(a^2\). Meanwhile, shifting all values by a constant \(b\) moves the entire distribution without stretching or compressing its width.

We can prove this property using the computational formula for variance:

\[\begin{split}\begin{align} \text{Var}(aX + b) &= E[(aX + b)^2] - (E[aX + b])^2 \\ &= E[a^2X^2 + 2abX + b^2] - (a\mu_X + b)^2 \\ &= a^2E[X^2] + 2abE[X] + b^2 - a^2\mu_X^2 - 2ab\mu_X - b^2 \\ &= a^2E[X^2] + 2ab\mu_X + b^2 - a^2\mu_X^2 - 2ab\mu_X - b^2 \\ &= a^2E[X^2] - a^2\mu_X^2 \\ &= a^2(E[X^2] - \mu_X^2) \\ &= a^2\text{Var}(X) \end{align}\end{split}\]
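The property is also easy to verify numerically. The sketch below transforms the coffee PMF from the earlier example using hypothetical constants \(a = 4\) and \(b = 7\) and compares both sides:

```python
# Numerical check that Var(aX + b) = a^2 Var(X).
a, b = 4.0, 7.0                    # arbitrary illustrative constants
p_X = {0: 0.20, 1: 0.50, 2: 0.30}  # coffee PMF from the example

def variance(pmf):
    mu = sum(v * p for v, p in pmf.items())
    return sum((v - mu) ** 2 * p for v, p in pmf.items())

# PMF of Y = aX + b: each value moves, its probability comes along unchanged.
p_Y = {a * x + b: p for x, p in p_X.items()}

assert abs(variance(p_Y) - a ** 2 * variance(p_X)) < 1e-9
print(variance(p_Y))  # 16 * 0.49 = 7.84
```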

B. Variance of Sums of Independent RVs

For independent random variables, the variance of their sum equals the sum of their individual variances:

\[\text{Var}(X \pm Y) = \text{Var}(X) + \text{Var}(Y)\]

This extends to any number of mutually independent random variables:

\[\text{Var}(X_1 \pm X_2 \pm \cdots \pm X_n) = \text{Var}(X_1) + \text{Var}(X_2) + \cdots + \text{Var}(X_n)\]

Why do the negative signs disappear?

You can think of the negative sign as a coefficient of \(-1\) multiplying the second random variable. Then, using the first property of variance,

\[\begin{split}\text{Var}(X-Y) &= \text{Var}(X + (-1)Y)\\ &= \text{Var}(X)+(-1)^2\text{Var}(Y) = \text{Var}(X) + \text{Var}(Y).\end{split}\]
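We can confirm this numerically by building the joint PMF of two independent variables (here the coffee and pastry PMFs from the example, where independence means \(p(x, y) = p_X(x)\,p_Y(y)\)) and computing \(\text{Var}(X - Y)\) directly:

```python
# Checking Var(X - Y) = Var(X) + Var(Y) for independent X and Y.
from collections import defaultdict

p_X = {0: 0.20, 1: 0.50, 2: 0.30}  # coffee PMF
p_Y = {0: 0.30, 1: 0.40, 2: 0.30}  # pastry PMF

# PMF of D = X - Y, built from the product form of the joint PMF.
p_D = defaultdict(float)
for x, px in p_X.items():
    for y, py in p_Y.items():
        p_D[x - y] += px * py

def variance(pmf):
    mu = sum(v * p for v, p in pmf.items())
    return sum((v - mu) ** 2 * p for v, p in pmf.items())

print(variance(p_D))                  # ~1.09
print(variance(p_X) + variance(p_Y))  # 0.49 + 0.60 = 1.09
```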

C. Variance of Linear Combinations of Independent RVs

For linear combinations of independent random variables:

\[\text{Var}(aX \pm bY) = a^2\text{Var}(X) + b^2\text{Var}(Y)\]

This simply combines the two properties we’ve just seen.

Using Variance Properties to Compute Standard Deviation

Properties of variance do not, in general, carry over to standard deviations. To compute the standard deviation of a linear combination of random variables, always compute the variance first, then take its square root.

Example💡: Bean & Butter, Continued

Consider the revenue per wave at Bean & Butter:

\[R = 4X + 3Y.\]

Item-wise revenues are first computed by multiplying the price of each item by its sales count. These individual revenues are then added up to obtain the total revenue.

What is the standard deviation of the total revenue?

We begin by computing the variance of \(R\). Since \(X\) and \(Y\) are independent, we can use the third property of variance:

\[\begin{split}\begin{aligned} \text{Var}(R) &= 4^2 \text{Var}(X) + 3^2 \text{Var}(Y)\\ &= 16(0.49) + 9(0.60) = 13.24\\ \sigma_R &= \sqrt{13.24} \approx \$3.64 \end{aligned}\end{split}\]

The standard deviation of revenue per wave is $3.64.

Suppose a new random variable \(Z\) represents the cost per wave of running the store. It is known that \(\sigma_Z = 2.2\) and that \(Z\) is independent of \(X\) and \(Y\).

What is the standard deviation in the total profit per wave?

The total profit can be expressed as \(P = R - Z\).

Again, we compute the variance of \(P\) first. Because \(R\) and \(Z\) are independent, we can use the second property of variance:

\[\begin{split}\text{Var}(P) &= \text{Var}(R) + \text{Var}(Z) = 13.24 + 2.2^2 = 18.08\\ \sigma_P &= \sqrt{18.08} \approx \$4.25.\end{split}\]

Note that the negative sign between \(R\) and \(Z\) disappears.
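A compact sketch of both calculations, reusing the variances found earlier:

```python
# Revenue R = 4X + 3Y and profit P = R - Z, with X, Y, Z mutually independent.
var_X, var_Y = 0.49, 0.60  # from the first Bean & Butter example
sigma_Z = 2.2              # given cost standard deviation

var_R = 4 ** 2 * var_X + 3 ** 2 * var_Y  # property C
print(var_R, var_R ** 0.5)               # 13.24, ~3.64

var_P = var_R + sigma_Z ** 2             # property B: the minus sign disappears
print(var_P, var_P ** 0.5)               # 18.08, ~4.25
```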

5.4.3. Common Mistakes to Avoid

When working with variance and standard deviation, be careful to avoid these common errors:

Common Mistakes to Avoid 🛑

  1. Forgetting to square the coefficient in variance

\(\text{Var}(aX) = a^2\text{Var}(X)\), not \(a\text{Var}(X)\).

  2. Not including the negative sign when squaring the coefficient

\(\text{Var}(-aX) = (-a)^2\text{Var}(X)\), and \((-a)^2\) is positive!

  3. Assuming standard deviations add

For independent \(X\) and \(Y\), \(\sigma_{X+Y} \neq \sigma_X + \sigma_Y\). Always add variances first, then take the square root.

  4. Blindly applying the independence formula

The formula \(\text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y)\) only applies when \(X\) and \(Y\) are independent.

  5. Calculating \((E[X])^2\) instead of \(E[X^2]\)

\((E[X])^2\) and \(E[X^2]\) are different! \(E[X^2]\) is found by squaring individual outcomes first, then taking their expectation.

5.4.4. Bringing It All Together

Key Takeaways 📝

  1. The variance of a discrete random variable is the expected value of the squared deviation from its mean, measuring how spread out the distribution is.

  2. The standard deviation is the square root of the variance and has the same units as the original random variable.

  3. \(Var(X) = E[X^2] - (E[X])^2\) is often used as a computational shortcut for variance.

  4. For linear transformations, \(Var(aX + b) = a^2Var(X)\), meaning that scaling affects variance quadratically while shifting has no effect.

  5. For independent random variables, \(Var(X \pm Y) = Var(X) + Var(Y)\), showing that variances (not standard deviations) add for independent variables.

  6. When calculating any standard deviation, compute the variance first, then take the square root.

In the next section, we’ll explore how to handle dependent random variables, where the relationship between variables adds another layer of complexity to our analysis.

5.4.5. Exercises

These exercises develop your skills in computing variance and standard deviation, applying the computational shortcut formula, and using variance properties for linear transformations and sums of independent random variables.

Exercise 1: Basic Variance Calculation

A network engineer monitors packet loss \(X\) (number of dropped packets per minute) on a router. The PMF is:

Table 5.14 PMF for Packet Loss

| \(x\)      | 0    | 1    | 2    | 3    |
|------------|------|------|------|------|
| \(p_X(x)\) | 0.50 | 0.30 | 0.15 | 0.05 |

  1. Calculate \(E[X]\).

  2. Calculate \(E[X^2]\).

  3. Calculate \(\text{Var}(X)\) using the computational shortcut \(\text{Var}(X) = E[X^2] - (E[X])^2\).

  4. Calculate \(\sigma_X\), the standard deviation.

  5. Interpret the standard deviation in context.

Solution

Part (a): E[X]

\[E[X] = (0)(0.50) + (1)(0.30) + (2)(0.15) + (3)(0.05) = 0 + 0.30 + 0.30 + 0.15 = 0.75\]

Part (b): E[X²]

\[E[X^2] = (0)^2(0.50) + (1)^2(0.30) + (2)^2(0.15) + (3)^2(0.05)\]
\[= 0 + 0.30 + 0.60 + 0.45 = 1.35\]

Part (c): Var(X)

\[\text{Var}(X) = E[X^2] - (E[X])^2 = 1.35 - (0.75)^2 = 1.35 - 0.5625 = 0.7875\]

Part (d): Standard deviation

\[\sigma_X = \sqrt{0.7875} \approx 0.887\]

Part (e): Interpretation

The standard deviation of about 0.89 packets tells us that the number of dropped packets per minute typically deviates from the average (0.75) by roughly 0.89 packets. This gives the engineer a sense of the typical variability in packet loss.


Exercise 2: Variance of Linear Transformations

A quality control inspector measures the diameter \(X\) of manufactured bolts (in mm). Historical data shows \(E[X] = 10.0\) mm and \(\sigma_X = 0.2\) mm.

The bolt specifications require converting to inches using \(Y = 0.03937X\) (since 1 mm ≈ 0.03937 inches).

  1. Find \(E[Y]\), the expected diameter in inches.

  2. Find \(\text{Var}(Y)\).

  3. Find \(\sigma_Y\), the standard deviation in inches.

  4. A machinist claims “The standard deviation in inches is just 0.03937 times the standard deviation in mm.” Is this correct? Explain.

  5. Suppose measurements are reported as deviations from the target of 10 mm: \(D = X - 10\). Find \(\text{Var}(D)\).

Solution

Part (a): E[Y]

Using linearity: \(E[Y] = E[0.03937X] = 0.03937 \cdot E[X] = 0.03937 \times 10.0 = 0.3937\) inches.

Part (b): Var(Y)

Using \(\text{Var}(aX) = a^2\text{Var}(X)\):

\[\text{Var}(Y) = (0.03937)^2 \cdot \text{Var}(X) = (0.03937)^2 \times (0.2)^2 = 0.00155 \times 0.04 = 0.000062\]

Part (c): σ_Y

\[\sigma_Y = \sqrt{0.000062} \approx 0.00787 \text{ inches}\]

Part (d): Is the machinist correct?

Yes, the machinist is correct! For standard deviation (unlike variance):

\[\sigma_Y = |a| \cdot \sigma_X = 0.03937 \times 0.2 = 0.00787 \text{ inches}\]

This works because \(\sigma_{aX} = \sqrt{a^2\text{Var}(X)} = |a|\sqrt{\text{Var}(X)} = |a|\sigma_X\).

Part (e): Var(D)

Using \(\text{Var}(X + b) = \text{Var}(X)\) (shifting doesn’t change variance):

\[\text{Var}(D) = \text{Var}(X - 10) = \text{Var}(X) = (0.2)^2 = 0.04 \text{ mm}^2\]

Exercise 3: Variance of Independent Sums

A small online store processes orders from two independent sources:

  • Website orders \(W\): \(E[W] = 50\), \(\text{Var}(W) = 100\)

  • Mobile app orders \(M\): \(E[M] = 30\), \(\text{Var}(M) = 64\)

Assume \(W\) and \(M\) are independent.

  1. Find \(E[W + M]\), the expected total orders.

  2. Find \(\text{Var}(W + M)\), the variance of total orders.

  3. Find \(\sigma_{W+M}\), the standard deviation of total orders.

  4. A manager incorrectly calculates \(\sigma_{W+M} = \sigma_W + \sigma_M = 10 + 8 = 18\). What is the correct value, and why is the manager’s approach wrong?

  5. The store’s profit per order is $5 from website and $3 from mobile. Find the variance of total daily profit \(P = 5W + 3M\).

Solution

Part (a): E[W + M]

\[E[W + M] = E[W] + E[M] = 50 + 30 = 80 \text{ orders}\]

Part (b): Var(W + M)

Since W and M are independent:

\[\text{Var}(W + M) = \text{Var}(W) + \text{Var}(M) = 100 + 64 = 164\]

Part (c): σ_{W+M}

\[\sigma_{W+M} = \sqrt{164} \approx 12.81\]

Part (d): Manager’s error

The manager got 18, but the correct answer is 12.81.

Standard deviations do NOT add! Variances add for independent random variables, not standard deviations. The correct approach is:

  1. Add variances: \(100 + 64 = 164\)

  2. Take square root: \(\sqrt{164} \approx 12.81\)

The manager’s error of adding standard deviations overestimates the variability.

Part (e): Var(5W + 3M)

Since W and M are independent:

\[\text{Var}(5W + 3M) = 5^2\text{Var}(W) + 3^2\text{Var}(M) = 25(100) + 9(64) = 2500 + 576 = 3076\]

The variance of daily profit is 3076 (in dollars²).


Exercise 4: The E[X²] = Var(X) + (E[X])² Trick

This exercise demonstrates a powerful technique: rearranging the variance formula to find \(E[X^2]\) when you’re given \(E[X]\) and \(\text{Var}(X)\).

Key Identity 🔑

From \(\text{Var}(X) = E[X^2] - (E[X])^2\), we can rearrange to get:

\[E[X^2] = \text{Var}(X) + (E[X])^2\]

A data center monitors humidity deviation \(X\) from the ideal level (in percentage points). Studies show that \(E[X] = 0\) and \(\text{Var}(X) = 1.86\).

The efficiency loss \(Y\) (in percentage points) due to humidity deviation is modeled as:

\[Y = 0.1X^2 + 2\]
  1. Explain why you cannot directly apply linearity of expectation to find \(E[Y]\).

  2. Use the identity \(E[X^2] = \text{Var}(X) + (E[X])^2\) to find \(E[X^2]\).

  3. Calculate \(E[Y]\), the expected efficiency loss.

  4. If the mean humidity deviation shifts to \(E[X] = 1\) while \(\text{Var}(X)\) remains 1.86, what is the new \(E[X^2]\) and expected efficiency loss?

Solution

Part (a): Why linearity doesn’t directly apply

Linearity of expectation states \(E[aX + b] = aE[X] + b\), which only works for linear functions of X.

The function \(Y = 0.1X^2 + 2\) is not linear in X because of the \(X^2\) term. We cannot simply substitute \(E[X]\) for X.

Specifically, \(E[0.1X^2 + 2] \neq 0.1(E[X])^2 + 2\) in general.

Part (b): Find E[X²] using the trick

Using the rearranged variance formula:

\[E[X^2] = \text{Var}(X) + (E[X])^2 = 1.86 + (0)^2 = 1.86\]

Part (c): Calculate E[Y]

Now we can use linearity on \(E[X^2]\):

\[E[Y] = E[0.1X^2 + 2] = 0.1 \cdot E[X^2] + 2 = 0.1(1.86) + 2 = 0.186 + 2 = 2.186\]

The expected efficiency loss is 2.186 percentage points.

Part (d): New expected loss with shifted mean

With \(E[X] = 1\) and \(\text{Var}(X) = 1.86\):

\[E[X^2] = \text{Var}(X) + (E[X])^2 = 1.86 + (1)^2 = 2.86\]
\[E[Y] = 0.1(2.86) + 2 = 0.286 + 2 = 2.286\]

The new expected efficiency loss is 2.286 percentage points.

Notice that shifting the mean (even though variance stayed the same) increased the expected loss because \(E[X^2]\) depends on both variance and the squared mean.
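The identity is a one-liner to apply in code. This sketch reproduces parts (b) through (d):

```python
# E[X^2] = Var(X) + (E[X])^2, then E[Y] = 0.1 E[X^2] + 2 by linearity.
def expected_loss(mean_x, var_x):
    e_x2 = var_x + mean_x ** 2  # the rearranged variance formula
    return 0.1 * e_x2 + 2

print(expected_loss(0.0, 1.86))  # 2.186  (parts b and c)
print(expected_loss(1.0, 1.86))  # 2.286  (part d)
```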


Exercise 5: Working with Abstract Functions (Don’t Overthink It!)

A machine learning engineer is analyzing model performance. Let \(X\) be the number of training epochs (a discrete random variable with a known PMF). Define:

\[Y = \ln(1 + e^{X^2 - 3X + 2})\]

You are told that \(Y\) has been analyzed and the following quantities are known:

  • \(E[Y] = 5\)

  • \(\text{Var}(Y) = 12\)

Independently, let \(W\) represent a noise term with \(E[W] = 0\) and \(\text{Var}(W) = 3\).

The total model loss is defined as:

\[L = 2Y - 3W + 10\]
  1. Find \(E[L]\).

  2. Find \(\text{Var}(L)\).

  3. Find \(\sigma_L\).

  4. Find \(E[L^2]\) using the identity \(E[L^2] = \text{Var}(L) + (E[L])^2\).

  5. A colleague says “We need to find the PMF of \(Y\) first before we can answer these questions.” Explain why they are wrong.

Solution

The Key Insight: The complicated function \(Y = \ln(1 + e^{X^2 - 3X + 2})\) is irrelevant! We’re given \(E[Y]\) and \(\text{Var}(Y)\) directly. Just let \(Y\) be a random variable with these properties and proceed.

Part (a): E[L]

Using linearity of expectation:

\[E[L] = E[2Y - 3W + 10] = 2E[Y] - 3E[W] + 10 = 2(5) - 3(0) + 10 = 20\]

Part (b): Var(L)

Since \(Y\) and \(W\) are independent:

\[\text{Var}(L) = \text{Var}(2Y - 3W + 10) = 4\text{Var}(Y) + 9\text{Var}(W)\]
\[= 4(12) + 9(3) = 48 + 27 = 75\]

Part (c): σ_L

\[\sigma_L = \sqrt{75} \approx 8.66\]

Part (d): E[L²]

Using \(E[L^2] = \text{Var}(L) + (E[L])^2\):

\[E[L^2] = 75 + 20^2 = 75 + 400 = 475\]

Part (e): Why we don’t need the PMF of Y

The colleague is wrong because:

  1. Expected value and variance are sufficient for the calculations we need. The properties of expectation (linearity, additivity) and variance (scaling, additivity for independent RVs) only require knowing \(E[Y]\) and \(\text{Var}(Y)\).

  2. We never need to compute \(E[\ln(1 + e^{X^2 - 3X + 2})]\) directly—it’s given as 5.

  3. This is the power of working with abstract properties: once we know the mean and variance of a random variable, we can analyze linear combinations without knowing the full distribution.

This is a common exam trick: students waste time trying to compute complicated transformations when all necessary information is already provided!


Exercise 6: LOTUS for Variance (Table Given)

A cybersecurity team monitors the number of intrusion attempts \(X\) per hour on a server. The PMF is:

Table 5.15 PMF for Intrusion Attempts

| \(x\)      | 0    | 1    | 2    | 3    |
|------------|------|------|------|------|
| \(p_X(x)\) | 0.50 | 0.30 | 0.15 | 0.05 |

The security response cost (in hundreds of dollars) is modeled by:

\[C(X) = \frac{X^3 + 2X}{X + 1}\]
  1. Create a table showing \(x\), \(p_X(x)\), \(C(x)\), \(C(x) \cdot p_X(x)\), and \(C(x)^2 \cdot p_X(x)\).

  2. Calculate \(E[C(X)]\) using LOTUS.

  3. Calculate \(E[C(X)^2]\) using LOTUS.

  4. Calculate \(\text{Var}(C(X))\).

  5. If the company budgets for the expected cost plus two standard deviations, what should their hourly budget be (in dollars)?

Solution

Part (a): Build the table

First, compute \(C(x) = \frac{x^3 + 2x}{x + 1}\) for each value:

  • \(C(0) = \frac{0 + 0}{0 + 1} = 0\)

  • \(C(1) = \frac{1 + 2}{1 + 1} = \frac{3}{2} = 1.5\)

  • \(C(2) = \frac{8 + 4}{2 + 1} = \frac{12}{3} = 4\)

  • \(C(3) = \frac{27 + 6}{3 + 1} = \frac{33}{4} = 8.25\)

Table 5.16 LOTUS Calculation Table

| \(x\) | \(p_X(x)\) | \(C(x)\) | \(C(x) \cdot p_X(x)\) | \(C(x)^2 \cdot p_X(x)\) |
|-------|------------|----------|------------------------|--------------------------|
| 0     | 0.50       | 0        | 0                      | 0                        |
| 1     | 0.30       | 1.5      | 0.45                   | 0.675                    |
| 2     | 0.15       | 4        | 0.60                   | 2.40                     |
| 3     | 0.05       | 8.25     | 0.4125                 | 3.403                    |
| Sum   | 1.00       |          | 1.4625                 | 6.478                    |

Part (b): E[C(X)]

\[E[C(X)] = \sum_{x} C(x) \cdot p_X(x) = 0 + 0.45 + 0.60 + 0.4125 = 1.4625\]

Expected cost is $146.25 per hour (since C is in hundreds).

Part (c): E[C(X)²]

\[E[C(X)^2] = \sum_{x} C(x)^2 \cdot p_X(x) = 0 + 0.675 + 2.40 + 3.403 = 6.478\]

Part (d): Var(C(X))

\[\text{Var}(C(X)) = E[C(X)^2] - (E[C(X)])^2 = 6.478 - (1.4625)^2\]
\[= 6.478 - 2.139 = 4.339\]

Part (e): Budget calculation

Standard deviation: \(\sigma_{C(X)} = \sqrt{4.339} \approx 2.083\) (in hundreds)

Budget = \(E[C(X)] + 2\sigma_{C(X)} = 1.4625 + 2(2.083) = 1.4625 + 4.166 = 5.63\) (in hundreds)

Hourly budget should be approximately $563.

Note on Approach

Even though \(C(X) = \frac{X^3 + 2X}{X + 1}\) looks complicated, LOTUS lets us:

  1. Evaluate \(C(x)\) at each value in the support

  2. Weight by the original probabilities \(p_X(x)\)

  3. Sum to get the expected value

We never need to find the PMF of \(C(X)\) itself!
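The table-building above is mechanical enough to hand to a short script. This sketch reproduces the LOTUS sums:

```python
# LOTUS: E[C(X)] and E[C(X)^2] are weighted sums over the support of X.
pmf = {0: 0.50, 1: 0.30, 2: 0.15, 3: 0.05}

def C(x):
    return (x ** 3 + 2 * x) / (x + 1)

e_c = sum(C(x) * p for x, p in pmf.items())        # E[C(X)]   = 1.4625
e_c2 = sum(C(x) ** 2 * p for x, p in pmf.items())  # E[C(X)^2] ~ 6.478
var_c = e_c2 - e_c ** 2                            # ~ 4.339

print(e_c, e_c2, var_c, var_c ** 0.5)
```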


Exercise 7: Sample Mean Properties

Let \(X_1, X_2, X_3, X_4\) be independent and identically distributed (i.i.d.) random variables with \(E[X_i] = 4\) and \(\text{Var}(X_i) = 32\).

Define the sample mean: \(\bar{X} = \frac{1}{4}(X_1 + X_2 + X_3 + X_4)\)

  1. Find \(E[\bar{X}]\).

  2. Find \(\text{Var}(\bar{X})\).

  3. Find \(\text{Var}(2\bar{X} + 1)\).

  4. Find \(E[(2\bar{X} + 1)^2]\).

  5. Compare \(\text{Var}(\bar{X})\) to \(\text{Var}(X_i)\). What happens to the variance as we average more observations?

Solution

Part (a): E[X̄]

Using linearity and additivity:

\[E[\bar{X}] = E\left[\frac{1}{4}(X_1 + X_2 + X_3 + X_4)\right] = \frac{1}{4}(E[X_1] + E[X_2] + E[X_3] + E[X_4])\]
\[= \frac{1}{4}(4 + 4 + 4 + 4) = \frac{16}{4} = 4\]

Part (b): Var(X̄)

Since the \(X_i\) are independent:

\[\text{Var}(\bar{X}) = \text{Var}\left(\frac{1}{4}(X_1 + X_2 + X_3 + X_4)\right) = \frac{1}{16}\text{Var}(X_1 + X_2 + X_3 + X_4)\]
\[= \frac{1}{16}(\text{Var}(X_1) + \text{Var}(X_2) + \text{Var}(X_3) + \text{Var}(X_4)) = \frac{1}{16}(32 \times 4) = \frac{128}{16} = 8\]

Part (c): Var(2X̄ + 1)

Using \(\text{Var}(aX + b) = a^2\text{Var}(X)\):

\[\text{Var}(2\bar{X} + 1) = 4 \cdot \text{Var}(\bar{X}) = 4 \times 8 = 32\]

Part (d): E[(2X̄ + 1)²]

Using \(E[Y^2] = \text{Var}(Y) + (E[Y])^2\):

Let \(Y = 2\bar{X} + 1\). We have:

  • \(E[Y] = 2E[\bar{X}] + 1 = 2(4) + 1 = 9\)

  • \(\text{Var}(Y) = 32\) (from part c)

Therefore:

\[E[(2\bar{X} + 1)^2] = E[Y^2] = \text{Var}(Y) + (E[Y])^2 = 32 + 81 = 113\]

Part (e): Comparison

  • \(\text{Var}(X_i) = 32\)

  • \(\text{Var}(\bar{X}) = 8 = \frac{32}{4}\)

In general, \(\text{Var}(\bar{X}) = \frac{\text{Var}(X)}{n}\) for i.i.d. random variables.

As we average more observations, the variance of the sample mean decreases by a factor of \(n\). This is why averaging reduces variability—the sample mean becomes more stable (less variable) with larger samples. This is a preview of concepts in sampling distributions (Chapter 7).
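A small simulation makes the \(1/n\) shrinkage visible. The two-point distribution below is an assumption chosen only so that \(E[X_i] = 4\) and \(\text{Var}(X_i) = 32\):

```python
# Monte Carlo check that Var(X-bar) ~ Var(X) / n for i.i.d. draws.
import random

def draw():
    # Hypothetical distribution: X = 4 +/- sqrt(32), each with probability 1/2,
    # so E[X] = 4 and Var(X) = 32 as in the exercise.
    return 4 + (32 ** 0.5) * random.choice([-1, 1])

n, trials = 4, 100_000
means = [sum(draw() for _ in range(n)) / n for _ in range(trials)]

m = sum(means) / trials
v = sum((xbar - m) ** 2 for xbar in means) / trials
print(m, v)  # close to E[X-bar] = 4 and Var(X-bar) = 32 / 4 = 8
```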


Exercise 8: Variance from Definition vs. Shortcut

A startup tracks daily user signups \(X\) with the following PMF:

Table 5.17 PMF for Daily Signups

\(x\)

10

20

30

\(p_X(x)\)

0.25

0.50

0.25

  1. Calculate \(E[X]\).

  2. Calculate \(\text{Var}(X)\) using the definition: \(\text{Var}(X) = E[(X - \mu)^2] = \sum_x (x - \mu)^2 p_X(x)\).

  3. Calculate \(\text{Var}(X)\) using the shortcut: \(\text{Var}(X) = E[X^2] - (E[X])^2\).

  4. Verify that both methods give the same answer.

  5. Which method required fewer calculations? When might the definition method be preferable?

Solution

Part (a): E[X]

\[E[X] = (10)(0.25) + (20)(0.50) + (30)(0.25) = 2.5 + 10 + 7.5 = 20\]

Part (b): Var(X) using the definition

With \(\mu = 20\):

\[\text{Var}(X) = \sum_x (x - 20)^2 p_X(x)\]
\[= (10 - 20)^2(0.25) + (20 - 20)^2(0.50) + (30 - 20)^2(0.25)\]
\[= (-10)^2(0.25) + (0)^2(0.50) + (10)^2(0.25)\]
\[= 100(0.25) + 0 + 100(0.25) = 25 + 0 + 25 = 50\]

Part (c): Var(X) using the shortcut

First find \(E[X^2]\):

\[E[X^2] = (10)^2(0.25) + (20)^2(0.50) + (30)^2(0.25)\]
\[= 100(0.25) + 400(0.50) + 900(0.25) = 25 + 200 + 225 = 450\]

Then:

\[\text{Var}(X) = E[X^2] - (E[X])^2 = 450 - 20^2 = 450 - 400 = 50\]

Part (d): Verification

Both methods give \(\text{Var}(X) = 50\)

Part (e): Comparison

The shortcut method typically requires fewer calculations because:

  • You compute \(E[X]\) and \(E[X^2]\) (which you often need anyway).

  • You avoid computing \((x - \mu)^2\) for each value.

The definition method might be preferable when:

  • You want to see how each value contributes to variance.

  • The mean is a “nice” number (like 0) that makes \((x - \mu)^2\) easy to compute.

  • You’re verifying your shortcut calculation.


5.4.6. Additional Practice Problems

True/False Questions (1 point each)

  1. Variance can be negative if the random variable takes negative values.

    Ⓣ or Ⓕ

  2. For any constant \(c\), \(\text{Var}(X + c) = \text{Var}(X) + c\).

    Ⓣ or Ⓕ

  3. \(\text{Var}(2X) = 2\text{Var}(X)\).

    Ⓣ or Ⓕ

  4. For independent \(X\) and \(Y\), \(\text{Var}(X - Y) = \text{Var}(X) - \text{Var}(Y)\).

    Ⓣ or Ⓕ

  5. The standard deviation has the same units as the original random variable.

    Ⓣ or Ⓕ

  6. If \(E[X] = 0\), then \(E[X^2] = \text{Var}(X)\).

    Ⓣ or Ⓕ

Multiple Choice Questions (2 points each)

  1. A random variable \(X\) has \(E[X] = 5\) and \(\text{Var}(X) = 9\). What is \(\text{Var}(3X - 2)\)?

    Ⓐ 25

    Ⓑ 27

    Ⓒ 79

    Ⓓ 81

  2. If \(X\) and \(Y\) are independent with \(\sigma_X = 3\) and \(\sigma_Y = 4\), what is \(\sigma_{X+Y}\)?

    Ⓐ 5

    Ⓑ 7

    Ⓒ 12

    Ⓓ 25

  3. A random variable \(X\) has \(E[X] = 2\) and \(E[X^2] = 8\). What is \(\text{Var}(X)\)?

    Ⓐ 2

    Ⓑ 4

    Ⓒ 6

    Ⓓ 8

  4. Let \(X\) have \(E[X] = 3\) and \(\text{Var}(X) = 4\). What is \(E[X^2]\)?

    Ⓐ 7

    Ⓑ 9

    Ⓒ 12

    Ⓓ 13

Answers to Practice Problems

True/False Answers:

  1. False — Variance is always non-negative: \(\text{Var}(X) \geq 0\). It equals zero only if X is a constant. The sign of X’s values doesn’t affect this.

  2. False — Adding a constant shifts the distribution but doesn’t change its spread. \(\text{Var}(X + c) = \text{Var}(X)\), not \(\text{Var}(X) + c\).

  3. False — \(\text{Var}(aX) = a^2\text{Var}(X)\), so \(\text{Var}(2X) = 4\text{Var}(X)\), not \(2\text{Var}(X)\).

  4. False — For independent RVs, variances add regardless of the sign: \(\text{Var}(X - Y) = \text{Var}(X) + \text{Var}(Y)\).

  5. True — Since \(\sigma_X = \sqrt{\text{Var}(X)}\), the standard deviation has the same units as X, while variance has squared units.

  6. True — From \(\text{Var}(X) = E[X^2] - (E[X])^2\), if \(E[X] = 0\), then \(\text{Var}(X) = E[X^2] - 0 = E[X^2]\).

Multiple Choice Answers:

  1. Ⓓ — \(\text{Var}(3X - 2) = 3^2 \text{Var}(X) = 9 \times 9 = 81\).

  2. Ⓐ — \(\text{Var}(X + Y) = \sigma_X^2 + \sigma_Y^2 = 9 + 16 = 25\), so \(\sigma_{X+Y} = \sqrt{25} = 5\).

  3. Ⓑ — \(\text{Var}(X) = E[X^2] - (E[X])^2 = 8 - 4 = 4\).

  4. Ⓓ — Using the trick: \(E[X^2] = \text{Var}(X) + (E[X])^2 = 4 + 9 = 13\).