.. _5-4-variance-of-discrete-rv:
Variance of a Discrete Random Variable
==================================================

Just as the expected value tells us about the center of a probability distribution, we often need to quantify how spread out or dispersed the values are around this center. Variance and standard deviation provide this crucial second dimension to our understanding of random variables.

.. admonition:: Road Map 🧭
   :class: important

   • Define **variance** and **standard deviation** for discrete random variables.
   • Explore an alternative **computational formula** for variance.
   • Derive key **properties of variance** for linear transformations and sums.

From Sample to Population: Defining Variance
------------------------------------------------

In our exploration of sample statistics, we measured the spread of data using the sample variance, the average of squared deviations from the mean. For random variables, we take a similar approach, but with an important twist: instead of averaging deviations with equal weights, we **weight each deviation by its probability**.

Definition
~~~~~~~~~~~~~~~~~~~~

The variance of a discrete random variable :math:`X`, denoted :math:`\text{Var}(X)` or :math:`\sigma^2_X`, is the expected value of the squared deviation from its mean:

.. math::
   \sigma_X^2 = \text{Var}(X) = E[(X - \mu_X)^2] = \sum_{x\in\text{supp}(X)} (x - \mu_X)^2 p_X(x)

The standard deviation, denoted :math:`\sigma_X`, is simply the square root of the variance:

.. math::
   \sigma_X = \sqrt{\text{Var}(X)} = \sqrt{E[(X - \mu_X)^2]}

Note that the **variance has squared units** (e.g., dollars² if :math:`X` is in dollars). The standard deviation returns us to the original units, which often makes it more interpretable in practice.

This definition requires that the series be absolutely convergent for the variance to be well-defined.

A Computational Shortcut for Variance
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Calculating variance directly from its definition can be cumbersome, especially when the mean :math:`\mu_X` is not a simple value. Fortunately, there is an equivalent formula that is typically easier to apply:

.. math::
   \sigma_X^2 = E[X^2] - \mu_X^2.

The derivation of this formula follows from expanding the squared term in the original definition:

.. math::
   \begin{align}
   \text{Var}(X) &= E[(X - \mu_X)^2] \\
   &= E[X^2 - 2X\mu_X + \mu_X^2] \\
   &= E[X^2] - 2\mu_X E[X] + \mu_X^2 \\
   &= E[X^2] - 2\mu_X \mu_X + \mu_X^2 \\
   &= E[X^2] - \mu_X^2
   \end{align}

This computational formula often simplifies the work significantly, as we'll see in our examples.
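To see the computational formula in action, here is a minimal Python sketch that computes :math:`E[X]`, :math:`E[X^2]`, the variance, and the standard deviation from a PMF stored as a dictionary. The function name ``pmf_stats`` and the dictionary layout are illustrative choices of ours, not a standard library interface.

.. code-block:: python

   import math

   def pmf_stats(pmf):
       """Return (mean, variance, sd) for a PMF given as {outcome: probability}."""
       mean = sum(x * p for x, p in pmf.items())              # E[X]
       second_moment = sum(x**2 * p for x, p in pmf.items())  # E[X^2]
       var = second_moment - mean**2                          # E[X^2] - (E[X])^2
       return mean, var, math.sqrt(var)

   # The coffee PMF from the example below:
   mean, var, sd = pmf_stats({0: 0.20, 1: 0.50, 2: 0.30})
   print(mean, var, sd)   # approximately 1.1, 0.49, 0.70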
.. admonition:: Example💡: Bean & Butter
   :class: note

   **Bean & Butter** is a small campus café that sells only two morning items: coffee for $4 per cup and pastry for $3 each. The shop records its sales in *waves*; each wave is short enough that :math:`X` (cups of coffee) and :math:`Y` (pastries) follow a stable pattern, but long enough to summarize cleanly. It is known that :math:`X` and :math:`Y` are independent. The sales distribution for a single wave is:

   .. flat-table::
      :header-rows: 1

      * - Outcome
        - :math:`p_X(x)` (coffee)
        - :math:`p_Y(y)` (pastry)
      * - 0
        - 0.20
        - 0.30
      * - 1
        - 0.50
        - 0.40
      * - 2
        - 0.30
        - 0.30

   Let us first compute the expected sales *count* of coffee and pastries.

   .. math::
      E[X] &= (0)(0.2) + (1)(0.5) + (2)(0.3) = 1.1\\
      E[Y] &= (0)(0.3) + (1)(0.4) + (2)(0.3) = 1.0

   On average, 1.1 cups of coffee and 1.0 pastry are sold per wave.

   For staffing, buying milk, or setting aside cash for the till, the owner also cares about the *variability* of sales: how much does an individual wave fluctuate from the average? To answer this question, we compute the variance and standard deviation of each random variable. For coffee,

   .. math::
      E[X^2] &= (0^2)(0.2) + (1^2)(0.5) + (2^2)(0.3) = 1.7\\
      \text{Var}(X) &= E[X^2] - (E[X])^2 = 1.7 - 1.1^2 = 0.49\\
      \sigma_X &= \sqrt{0.49} = 0.70

   Similarly, for pastries:

   .. math::
      E[Y^2] &= (0^2)(0.3) + (1^2)(0.4) + (2^2)(0.3) = 1.6\\
      \text{Var}(Y) &= 1.6 - 1.0^2 = 0.60\\
      \sigma_Y &= \sqrt{0.60} \approx 0.77

   A standard deviation of about **0.70 coffees** and **0.77 pastries** tells us that, in a typical wave, each count strays by **roughly three-quarters of an item** from its own average.

Properties of Variance
-----------------------

Variance follows several key properties that make calculations more manageable, especially when dealing with linear transformations of random variables.

1. Variance of Linear Transformations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For a linear transformation of a random variable, :math:`g(X) = aX + b`, where :math:`a` and :math:`b` are constants:

.. math::
   \text{Var}(aX + b) = a^2 \text{Var}(X).

Notice two important implications:

* Scaling a random variable by a factor of :math:`a` multiplies its variance by :math:`a^2`.
* Adding a constant :math:`b` has no effect on variance.

This makes intuitive sense. Multiplying all values by :math:`a` stretches (or compresses) the distribution, amplifying (or reducing) the deviations also by a factor of :math:`a`. But since deviations are squared in the variance calculation, the variance changes by a factor of :math:`a^2`. Meanwhile, shifting all values by adding a constant :math:`b` moves the entire distribution without stretching or compressing its *width*.

We can prove this property using the computational formula for variance:

.. math::
   \begin{align}
   \text{Var}(aX + b) &= E[(aX + b)^2] - (E[aX + b])^2 \\
   &= E[a^2X^2 + 2abX + b^2] - (a\mu_X + b)^2 \\
   &= a^2E[X^2] + 2abE[X] + b^2 - a^2\mu_X^2 - 2ab\mu_X - b^2 \\
   &= a^2E[X^2] + 2ab\mu_X + b^2 - a^2\mu_X^2 - 2ab\mu_X - b^2 \\
   &= a^2E[X^2] - a^2\mu_X^2 \\
   &= a^2(E[X^2] - \mu_X^2) \\
   &= a^2\text{Var}(X)
   \end{align}

2. Variance of Sums of Independent RVs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For **independent** random variables, the variance of their sum (or difference) equals the sum of their individual variances:

.. math::
   \text{Var}(X \pm Y) = \text{Var}(X) + \text{Var}(Y)

This extends to any number of **mutually independent** random variables:

.. math::
   \text{Var}(X_1 \pm X_2 \pm \cdots \pm X_n) = \text{Var}(X_1) + \text{Var}(X_2) + \cdots + \text{Var}(X_n)

.. admonition:: Why do the negative signs disappear?
   :class: important

   You can think of each negative sign as a coefficient of :math:`-1` multiplying the random variable that follows it. Then, using the first property of variance,

   .. math::
      \text{Var}(X-Y) &= \text{Var}(X + (-1)Y)\\
      &= \text{Var}(X) + (-1)^2\text{Var}(Y) = \text{Var}(X) + \text{Var}(Y).

A crucial caveat: this property holds only when the random variables are independent. When variables are dependent, we need to account for their covariance, which we'll explore in a later section.

An important consequence: for standard deviation, we cannot simply add the standard deviations of independent random variables. Instead:

.. math::
   \sigma_{X \pm Y} = \sqrt{\sigma_X^2 + \sigma_Y^2}
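Because the sum property is easy to check numerically, the following sketch (our own illustrative check, using the coffee and pastry PMFs from the example above) builds the PMF of :math:`X + Y` under independence and confirms that its variance equals :math:`\text{Var}(X) + \text{Var}(Y)`.

.. code-block:: python

   px = {0: 0.20, 1: 0.50, 2: 0.30}   # coffee PMF
   py = {0: 0.30, 1: 0.40, 2: 0.30}   # pastry PMF

   def var(pmf):
       mean = sum(x * p for x, p in pmf.items())
       return sum(x**2 * p for x, p in pmf.items()) - mean**2

   # Under independence the joint PMF factors: p(x, y) = p_X(x) * p_Y(y),
   # so the PMF of S = X + Y is found by accumulating p_X(x) * p_Y(y)
   # over all (x, y) pairs with x + y = s.
   ps = {}
   for x, p in px.items():
       for y, q in py.items():
           ps[x + y] = ps.get(x + y, 0.0) + p * q

   print(var(ps))             # Var(X + Y): approximately 1.09
   print(var(px) + var(py))   # Var(X) + Var(Y) = 0.49 + 0.60 = 1.09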
3. Variance of Linear Combinations of Independent RVs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For linear combinations of independent random variables:

.. math::
   \text{Var}(aX \pm bY) = a^2\text{Var}(X) + b^2\text{Var}(Y)

This simply combines the two properties we've just seen.

Using Variance Properties to Compute Standard Deviation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The properties of variance listed in this section do not, in general, apply to standard deviations. To compute the standard deviation of a linear combination of random variables, always **compute the variance first, then take its square root**.

.. admonition:: Example💡: Bean & Butter, Continued
   :class: note

   Consider the revenue per wave at Bean & Butter:

   .. math::
      R = 4X + 3Y.

   Item-wise revenues are first computed by multiplying the price of each item by its sales count. These individual revenues are then added up to obtain the total revenue.

   **What is the standard deviation of the total revenue?**

   We begin by computing the variance of :math:`R`. Since :math:`X` and :math:`Y` are **independent**, we can use the third property of variance:

   .. math::
      \begin{aligned}
      \text{Var}(R) &= 4^2 \text{Var}(X) + 3^2 \text{Var}(Y)\\
      &= 16(0.49) + 9(0.60) = 13.24\\
      \sigma_R &= \sqrt{13.24} \approx \$3.64
      \end{aligned}

   The standard deviation of revenue per wave is about $3.64.

   Suppose a new random variable :math:`Z` represents the **cost** per wave of running the store. It is known that :math:`\sigma_Z = 2.2` and that :math:`Z` is independent of :math:`X` and :math:`Y`.

   **What is the standard deviation of the total profit per wave?**

   The total profit can be expressed as :math:`P = R - Z`. Again, we begin by computing the variance of :math:`P`. Because :math:`R` and :math:`Z` are independent, we can use the second property of variance:

   .. math::
      \text{Var}(P) &= \text{Var}(R) + \text{Var}(Z) = 13.24 + 2.2^2 = 18.08\\
      \sigma_P &= \sqrt{18.08} \approx 4.25.

   Note that the negative sign between :math:`R` and :math:`Z` disappears.
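As a quick sanity check on the revenue calculation, here is a small Monte Carlo sketch (an optional verification of ours, not part of the derivation) that simulates many independent waves from the PMFs above and estimates :math:`\sigma_R` directly.

.. code-block:: python

   import random
   import statistics

   outcomes = [0, 1, 2]
   p_coffee = [0.20, 0.50, 0.30]
   p_pastry = [0.30, 0.40, 0.30]

   random.seed(42)
   revenues = []
   for _ in range(100_000):
       x = random.choices(outcomes, weights=p_coffee)[0]   # coffee sold in a wave
       y = random.choices(outcomes, weights=p_pastry)[0]   # pastries sold in a wave
       revenues.append(4 * x + 3 * y)                      # R = 4X + 3Y

   # The sample standard deviation should be close to the exact value 3.64.
   print(statistics.stdev(revenues))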
Illustrating the Concepts: The Dice 💡
--------------------------------------------

Let's apply these concepts to a clear example: rolling multiple fair four-sided dice and finding the variance of their sum.

Suppose we roll :math:`n` independent four-sided dice (with faces labeled 1, 2, 3, and 4) and let :math:`X_i` be the outcome of the :math:`i`-th die. We want to understand both the expected value and the variance of the sum of these dice.

First, let's determine the properties of a single die. Since each face is equally likely to appear, the PMF for each die is:

.. math::
   p_{X_i}(x_i) = \frac{1}{4} \text{ for } x_i \in \{1, 2, 3, 4\} \text{ and for all } i \in \{1, 2, \ldots, n\}

The expected value of each die is:

.. math::
   \mu_{X_i} = E[X_i] = 1 \times \frac{1}{4} + 2 \times \frac{1}{4} + 3 \times \frac{1}{4} + 4 \times \frac{1}{4} = \frac{10}{4} = 2.5

To calculate the variance, we first find :math:`E[X_i^2]`:

.. math::
   E[X_i^2] = 1^2 \times \frac{1}{4} + 2^2 \times \frac{1}{4} + 3^2 \times \frac{1}{4} + 4^2 \times \frac{1}{4} = \frac{1 + 4 + 9 + 16}{4} = \frac{30}{4} = 7.5

Now we can compute the variance:

.. math::
   \text{Var}(X_i) = E[X_i^2] - \mu_{X_i}^2 = 7.5 - 2.5^2 = 7.5 - 6.25 = 1.25

The standard deviation is:

.. math::
   \sigma_{X_i} = \sqrt{\text{Var}(X_i)} = \sqrt{1.25} \approx 1.12

Now, let's consider the sum of all :math:`n` dice, which we'll denote as :math:`S = X_1 + X_2 + \cdots + X_n`. Since the dice are independent, we can apply our properties:

1. Expected value of the sum:

   .. math::
      E[S] = E[X_1 + X_2 + \cdots + X_n] = E[X_1] + E[X_2] + \cdots + E[X_n] = n \times 2.5 = 2.5n

2. Variance of the sum:

   .. math::
      \text{Var}(S) = \text{Var}(X_1 + X_2 + \cdots + X_n) = \text{Var}(X_1) + \text{Var}(X_2) + \cdots + \text{Var}(X_n) = n \times 1.25 = 1.25n

3. Standard deviation of the sum:

   .. math::
      \sigma_S = \sqrt{\text{Var}(S)} = \sqrt{1.25n}

These results tell us that:

- The average sum when rolling :math:`n` four-sided dice is :math:`2.5n`.
- The variance of this sum is :math:`1.25n`.
- The standard deviation grows with the square root of :math:`n`.

This example illustrates a fundamental principle in probability: as we add more independent random variables, the expected value grows linearly with :math:`n`, but the standard deviation grows with :math:`\sqrt{n}`. This is a preview of what we'll see more formally in the Central Limit Theorem in later chapters.

Common Mistakes to Avoid
-------------------------

When working with variance and standard deviation, be careful to avoid these common errors:

.. admonition:: Common Mistakes to Avoid 🛑
   :class: error

   1. **Forgetting to square the coefficient in variance**

      :math:`\text{Var}(aX) = a^2\text{Var}(X)`, not :math:`a\text{Var}(X)`.

   2. **Not including the negative sign when squaring the coefficient**

      :math:`\text{Var}(-aX) = (-a)^2\text{Var}(X)`, and :math:`(-a)^2` is positive!

   3. **Assuming standard deviations add**

      For independent :math:`X` and :math:`Y`, :math:`\sigma_{X+Y} \neq \sigma_X + \sigma_Y`. Always add variances first, then take the square root.

   4. **Blindly applying the independence formula**

      The formula :math:`\text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y)` only applies when :math:`X` and :math:`Y` are independent.

   5. **Calculating** :math:`E[X]^2` **instead of** :math:`E[X^2]`

      :math:`E[X]^2` and :math:`E[X^2]` are different! :math:`E[X^2]` is found by squaring individual outcomes first, then taking their expectation.

Bringing It All Together
---------------------------

.. admonition:: Key Takeaways 📝
   :class: important

   1. The **variance** of a discrete random variable is the expected value of the squared deviation from its mean, measuring how spread out the distribution is.
   2. The **standard deviation** is the square root of the variance and has the same units as the original random variable.
   3. :math:`\text{Var}(X) = E[X^2] - (E[X])^2` is often used as a computational shortcut for variance.
   4. For linear transformations, :math:`\text{Var}(aX + b) = a^2\text{Var}(X)`: scaling affects variance quadratically, while shifting has no effect.
   5. For independent random variables, :math:`\text{Var}(X \pm Y) = \text{Var}(X) + \text{Var}(Y)`: variances (not standard deviations) add for independent variables.
   6. When calculating any standard deviation, compute the variance first, then take the square root.

In the next section, we'll explore how to handle dependent random variables, where the relationship between variables adds another layer of complexity to our analysis.

Exercises
~~~~~~~~~~~~~

1. **Basic Calculations**: For a random variable :math:`X` with PMF :math:`p_X(0) = 0.2`, :math:`p_X(1) = 0.5`, and :math:`p_X(2) = 0.3`:

   a) Calculate :math:`E[X]` and :math:`\text{Var}(X)`.
   b) Calculate :math:`E[2X + 3]` and :math:`\text{Var}(2X + 3)`.

2. **Game of Chance**: In a certain game, you flip a fair coin. If it lands heads, you win $5; if it lands tails, you lose $3.

   a) Let :math:`X` be your net gain. Find :math:`E[X]` and :math:`\text{Var}(X)`.
   b) If you play this game 100 times independently, what are the expected value and variance of your total net gain?
   c) What is the standard deviation of your total net gain after 100 plays?
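Results like those in the dice example are also easy to verify by simulation. The sketch below (an optional self-check of ours; swapping in the coin-game payoffs adapts it to Exercise 2) estimates the mean and standard deviation of the sum of :math:`n` four-sided dice and compares them with the exact values :math:`2.5n` and :math:`\sqrt{1.25n}`.

.. code-block:: python

   import math
   import random
   import statistics

   random.seed(0)
   n = 10   # number of four-sided dice per roll

   # Simulate many sums S = X_1 + ... + X_n of independent fair d4 rolls.
   sums = [sum(random.randint(1, 4) for _ in range(n)) for _ in range(100_000)]

   print(statistics.mean(sums), 2.5 * n)               # both near 25.0
   print(statistics.stdev(sums), math.sqrt(1.25 * n))  # both near 3.54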