5.4. Variance of a Discrete Random Variable

Just as the expected value tells us about the center of a probability distribution, we often need to quantify how spread out or dispersed the values are around this center. Variance and standard deviation provide this crucial second dimension to our understanding of random variables.

Road Map 🧭

  • Define variance and standard deviation for discrete random variables.

  • Explore an alternative computational formula for variance.

  • Derive key properties of variance for linear transformations and sums.

5.4.1. From Sample to Population: Defining Variance

In our exploration of sample statistics, we measured the spread of data using sample variance—the average of squared deviations from the mean. For random variables, we take a similar approach, but with an important twist: instead of averaging deviations with equal weights, we weight each deviation by its probability.

Definition

The variance of a discrete random variable \(X\), denoted \(Var(X)\) or \(\sigma^2_X\), is the expected value of the squared deviation from its mean:

\[\sigma_X^2 = \text{Var}(X) = E[(X - \mu_X)^2] = \sum_{x\in\text{supp}(X)} (x - \mu_X)^2 p_X(x)\]

The standard deviation, denoted \(\sigma_X\), is simply the square root of the variance:

\[\sigma_X = \sqrt{\text{Var}(X)} = \sqrt{E[(X - \mu_X)^2]}\]

Note that the variance has squared units (e.g., dollars² if \(X\) is in dollars). The standard deviation returns us to the original units, which often makes it more interpretable in practice.

This definition requires that the series be absolutely convergent for the variance to be well-defined.
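To make the definition concrete, here is a minimal Python sketch (the PMF is a hypothetical example) that computes the variance directly as a probability-weighted sum of squared deviations:

```python
# Variance from the definition: a probability-weighted sum of squared
# deviations. The PMF here is a hypothetical example for illustration.
pmf = {0: 0.2, 1: 0.5, 2: 0.3}  # maps each value x to p_X(x)

mu = sum(x * p for x, p in pmf.items())               # E[X]
var = sum((x - mu) ** 2 * p for x, p in pmf.items())  # E[(X - mu)^2]
sd = var ** 0.5                                       # standard deviation

print(mu, var, sd)  # 1.1, 0.49, 0.7 (up to floating-point rounding)
```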

A Computational Shortcut for Variance

Calculating variance directly from its definition can be cumbersome, especially when the mean \(\mu_X\) is not a simple value. Fortunately, there’s an equivalent formula that is typically easier to apply:

\[\sigma_X^2 = E[X^2] - \mu_X^2.\]

The derivation of this formula follows from expanding the squared term in the original definition:

\[\begin{split}\begin{align} \text{Var}(X) &= E[(X - \mu_X)^2] \\ &= E[X^2 - 2X\mu_X + \mu_X^2] \\ &= E[X^2] - 2\mu_X E[X] + \mu_X^2 \\ &= E[X^2] - 2\mu_X \mu_X + \mu_X^2 \\ &= E[X^2] - \mu_X^2 \end{align}\end{split}\]

This computational formula often simplifies the work significantly, as we’ll see in our examples.
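As a sanity check, here is the same hypothetical PMF run through both formulas; the two results should agree up to floating-point rounding:

```python
# Compare the definitional formula with the shortcut E[X^2] - (E[X])^2.
pmf = {0: 0.2, 1: 0.5, 2: 0.3}  # hypothetical PMF

mu = sum(x * p for x, p in pmf.items())        # E[X]
ex2 = sum(x ** 2 * p for x, p in pmf.items())  # E[X^2]

var_definition = sum((x - mu) ** 2 * p for x, p in pmf.items())
var_shortcut = ex2 - mu ** 2

print(var_definition, var_shortcut)  # both approximately 0.49
```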

Example💡: Bean & Butter

Bean & Butter is a small campus café that sells only two morning items: coffee for $4 per cup and pastries for $3 each. The shop records its sales in waves, each short enough that \(X\) (cups of coffee) and \(Y\) (pastries) follow a stable pattern but long enough to summarize cleanly.

It is known that \(X\) and \(Y\) are independent. The sales distribution for a single wave is:

| Outcome | \(p_X(x)\) (coffee) | \(p_Y(y)\) (pastry) |
|---------|---------------------|---------------------|
| 0       | 0.20                | 0.30                |
| 1       | 0.50                | 0.40                |
| 2       | 0.30                | 0.30                |

Let us first compute the expected sales count of coffee and pastries.

\[\begin{split}E[X] &= (0) (0.2) + (1) (0.5) + (2) (0.3) = 1.1\\ E[Y] &= (0) (0.3) + (1) (0.4) + (2) (0.3) = 1.0\end{split}\]

On average, 1.1 cups of coffee and 1.0 pastry are sold per wave.

For staffing, buying milk, or setting aside cash for the till, the owner also cares about the variability of sales: how much does an individual wave fluctuate from its average?

To answer this question, we compute the variance and standard deviation of each random variable. For coffee,

\[\begin{split}E[X^2] &= (0^2) (0.2) + (1^2) (0.5) + (2^2) (0.3) = 1.7\\ \text{Var}(X) &= E[X^2]- (E[X])^2 =1.7 - 1.1^2 = 0.49\\ \sigma_X &= \sqrt{0.49} \approx 0.70\end{split}\]

Similarly for pastries:

\[\begin{split}E[Y^2] &= (0^2) (0.3) + (1^2) (0.4) + (2^2) (0.3) = 1.6\\ \text{Var}(Y) &= 1.6 - 1.0^2 = 0.60\\ \sigma_Y &= \sqrt{0.60} \approx 0.77\end{split}\]

A standard deviation of about 0.70 coffees and 0.77 pastries tells us that, in a typical wave, each count strays by roughly three-quarters of an item from its own average.
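A simulation offers an independent check on these hand calculations. The sketch below (NumPy, with an arbitrary seed) draws many waves from the PMFs in the table and compares the empirical means and standard deviations against the theoretical values:

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed for reproducibility
n = 100_000                     # number of simulated waves

# Draw sales counts for many waves from the PMFs in the table.
coffee = rng.choice([0, 1, 2], size=n, p=[0.20, 0.50, 0.30])
pastry = rng.choice([0, 1, 2], size=n, p=[0.30, 0.40, 0.30])

print(coffee.mean(), coffee.std())  # close to 1.1 and 0.70
print(pastry.mean(), pastry.std())  # close to 1.0 and 0.77
```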

5.4.2. Properties of Variance

Variance follows several key properties that make calculations more manageable, especially when dealing with linear transformations of random variables.

1. Variance of Linear Transformations

For a linear transformation of a random variable, \(g(X) = aX + b\), where \(a\) and \(b\) are constants:

\[\text{Var}(aX + b) = a^2 \text{Var}(X).\]

Notice two important implications:

  • Scaling a random variable by a factor of \(a\) multiplies its variance by \(a^2\).

  • Adding a constant \(b\) has no effect on variance.

This makes intuitive sense. Multiplying all values by \(a\) stretches (or compresses) the distribution, scaling each deviation from the mean by a factor of \(a\) as well. Since deviations are squared in the variance calculation, the variance changes by a factor of \(a^2\). Meanwhile, adding a constant \(b\) shifts the entire distribution without stretching or compressing its width, so the variance is unchanged.

We can prove this property using the computational formula for variance:

\[\begin{split}\begin{align} \text{Var}(aX + b) &= E[(aX + b)^2] - (E[aX + b])^2 \\ &= E[a^2X^2 + 2abX + b^2] - (a\mu_X + b)^2 \\ &= a^2E[X^2] + 2abE[X] + b^2 - a^2\mu_X^2 - 2ab\mu_X - b^2 \\ &= a^2E[X^2] + 2ab\mu_X + b^2 - a^2\mu_X^2 - 2ab\mu_X - b^2 \\ &= a^2E[X^2] - a^2\mu_X^2 \\ &= a^2(E[X^2] - \mu_X^2) \\ &= a^2\text{Var}(X) \end{align}\end{split}\]
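A brief simulation also illustrates the property. In the sketch below, \(X\) reuses the coffee PMF from the example, and \(a = 4\), \(b = 10\) are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.choice([0, 1, 2], size=100_000, p=[0.2, 0.5, 0.3])  # coffee PMF

a, b = 4.0, 10.0  # arbitrary constants
print(np.var(a * x + b))   # close to a^2 * Var(X) = 16 * 0.49 = 7.84
print(a ** 2 * np.var(x))  # essentially the same number
```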

2. Variance of Sums of Independent RVs

For independent random variables, the variance of their sum equals the sum of their individual variances:

\[\text{Var}(X \pm Y) = \text{Var}(X) + \text{Var}(Y)\]

This extends to any number of mutually independent random variables:

\[\text{Var}(X_1 \pm X_2 \pm \cdots \pm X_n) = \text{Var}(X_1) + \text{Var}(X_2) + \cdots + \text{Var}(X_n)\]

Why do the negative signs disappear?

You can think of each negative sign as a coefficient of \(-1\) multiplying the random variable that follows. Then, using the first property of variance,

\[\begin{split}\text{Var}(X-Y) &= \text{Var}(X + (-1)Y)\\ &= \text{Var}(X)+(-1)^2\text{Var}(Y) = \text{Var}(X) + \text{Var}(Y).\end{split}\]
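The same fact can be verified numerically; here is a sketch reusing the coffee and pastry PMFs as independent \(X\) and \(Y\):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
x = rng.choice([0, 1, 2], size=n, p=[0.2, 0.5, 0.3])  # coffee PMF
y = rng.choice([0, 1, 2], size=n, p=[0.3, 0.4, 0.3])  # pastry PMF, drawn independently

print(np.var(x - y))          # close to Var(X) + Var(Y) = 0.49 + 0.60 = 1.09
print(np.var(x) + np.var(y))  # essentially the same number
```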

3. Variance of Linear Combinations of Independent RVs

For linear combinations of independent random variables:

\[\text{Var}(aX \pm bY) = a^2\text{Var}(X) + b^2\text{Var}(Y)\]

This simply combines the two properties we’ve just seen.

Using Variance Properties to Compute Standard Deviation

The properties of variance listed in this section do not, in general, apply to standard deviations. To compute the standard deviation of a linear combination of random variables, always compute the variance first, then take its square root.

Example💡: Bean & Butter, Continued

Consider the revenue per wave at Bean & Butter:

\[R = 4X + 3Y.\]

Item-wise revenues are computed by multiplying each item's price by its sales count; adding them gives the total revenue per wave.

What is the standard deviation of the total revenue?

We begin by computing the variance of \(R\). Since \(X\) and \(Y\) are independent, we can use the third property of variance:

\[\begin{split}\begin{aligned} \text{Var}(R) &= 4^2 \text{Var}(X) + 3^2 \text{Var}(Y)\\ &= 16(0.49) + 9(0.60) = 13.24\\ \sigma_R &= \sqrt{13.24} \approx \$3.64 \end{aligned}\end{split}\]

The standard deviation of revenue per wave is about $3.64.

Suppose a new random variable \(Z\) represents the cost per wave of running the store. It is known that \(\sigma_Z = 2.2\) and that \(Z\) is independent of \(X\) and \(Y\).

What is the standard deviation in the total profit per wave?

The total profit can be expressed as \(P = R - Z\).

Again, we begin by computing the variance of \(P\). Because \(R\) and \(Z\) are independent, we can use the second property of variance:

\[\begin{split}\text{Var}(P) &= \text{Var}(R) + \text{Var}(Z) = 13.24 + 2.2^2 = 18.08\\ \sigma_P &= \sqrt{18.08} \approx \$4.25.\end{split}\]

Note that the negative sign between \(R\) and \(Z\) disappears.
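The arithmetic in this example is compact enough to script directly; a minimal sketch:

```python
# Revenue R = 4X + 3Y and profit P = R - Z, with X, Y, Z independent
# and sigma_Z = 2.2 given.
var_x, var_y, sd_z = 0.49, 0.60, 2.2

var_r = 4 ** 2 * var_x + 3 ** 2 * var_y  # Var(aX + bY) = a^2 Var(X) + b^2 Var(Y)
var_p = var_r + sd_z ** 2                # Var(R - Z) = Var(R) + Var(Z)

print(var_r, var_r ** 0.5)  # 13.24 and about 3.64
print(var_p, var_p ** 0.5)  # 18.08 and about 4.25
```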

5.4.3. Common Mistakes to Avoid

When working with variance and standard deviation, be careful to avoid these common errors:

Common Mistakes to Avoid 🛑

  1. Forgetting to square the coefficient in variance

\(Var(aX) = a^2 Var(X)\), not \(a Var(X)\).

  2. Not including the negative sign when squaring the coefficient

\(Var(-aX) = (-a)^2 Var(X)\), and \((-a)^2\) is positive!

  3. Assuming standard deviations add

For independent \(X\) and \(Y\), \(\sigma_{X+Y} \neq \sigma_X + \sigma_Y\). Always add variances first, then take the square root.

  4. Blindly applying the independence formula

The formula \(Var(X + Y) = Var(X) + Var(Y)\) only applies when \(X\) and \(Y\) are independent.

  5. Calculating \(E[X]^2\) instead of \(E[X^2]\)

\(E[X]^2\) and \(E[X^2]\) are different! \(E[X^2]\) is found by squaring the individual outcomes first, then taking their expectation (see the sketch after this list).
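To see mistake 5 concretely, here is a short sketch contrasting the two quantities on the coffee PMF from the earlier example:

```python
# E[X]^2 versus E[X^2] for the coffee PMF.
pmf = {0: 0.2, 1: 0.5, 2: 0.3}

ex = sum(x * p for x, p in pmf.items())         # E[X] = 1.1
e_x2 = sum(x ** 2 * p for x, p in pmf.items())  # E[X^2] = 1.7

print(ex ** 2, e_x2)  # 1.21 vs. 1.7; the difference is Var(X) = 0.49
```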

5.4.4. Bringing It All Together

Key Takeaways 📝

  1. The variance of a discrete random variable is the expected value of the squared deviation from its mean, measuring how spread out the distribution is.

  2. The standard deviation is the square root of the variance and has the same units as the original random variable.

  3. \(Var(X) = E[X^2] - (E[X])^2\) is often used as a computational shortcut for variance.

  4. For linear transformations, \(Var(aX + b) = a^2Var(X)\), meaning that scaling affects variance quadratically while shifting has no effect.

  5. For independent random variables, \(Var(X \pm Y) = Var(X) + Var(Y)\), showing that variances (not standard deviations) add for independent variables.

  6. When calculating any standard deviation, compute the variance first, then take the square root.

In the next section, we’ll explore how to handle dependent random variables, where the relationship between variables adds another layer of complexity to our analysis.

Exercises

  1. Basic Calculations: For a random variable \(X\) with PMF \(p_X(0) = 0.2, p_X(1) = 0.5\), and \(p_X(2) = 0.3\):

    1. Calculate \(E[X]\) and \(Var(X)\).

    2. Calculate \(E[2X + 3]\) and \(Var(2X + 3)\).

  2. Game of Chance: In a certain game, you flip a fair coin. If it lands heads, you win $5; if it lands tails, you lose $3.

    1. Let \(X\) be your net gain. Find \(E[X]\) and \(Var(X)\).

    2. If you play this game 100 times independently, what is the expected value and variance of your total net gain?

    3. What is the standard deviation of your total net gain after 100 plays?