5.5. Covariance of Dependent Random Variables

Many real-world scenarios involve random variables that influence each other—driving violations may correlate with accident rates, stock prices often move together, and rainfall affects crop yields. When random variables are dependent, their joint behavior becomes more complex, requiring us to understand how they vary together.

Road Map 🧭

  • Introduce covariance, a measure of how random variables change together.

  • Define correlation as a standardized measure of relationship strength.

  • Extend variance formulas for sums of dependent random variables.

  • Explore the independence property and its effect on covariance.

5.5.1. Beyond Independence: Understanding Covariance

When analyzing two random variables \(X\) and \(Y\) together, we often want to know: When \(X\) is large, does \(Y\) also tend to be larger? Or do \(X\) and \(Y\) tend to move in the opposite direction? Covariance provides a mathematical way to measure this relationship.

Definition

The covariance between two discrete random variables \(X\) and \(Y\), denoted \(Cov(X,Y)\) or \(\sigma_{XY}\), is defined as:

\[\sigma_{XY} = \text{Cov}(X,Y) = E[(X - \mu_X)(Y - \mu_Y)]\]

Interpreting the formula

  • If \(X\) and \(Y\) tend to be simultaneously above their means or simultaneously below their means, their covariance will be positive.

  • If \(Y\) tends to be below its mean when \(X\) is above its mean, and vice versa, their covariance will be negative.

  • If \(X\) and \(Y\) have no systematic relationship, their covariance will be close to zero.

  • In general, the covariance describes the strength (magnitude) and direction (sign) of the linear relationship between \(X\) and \(Y\).

Computational shortcut

Covariance has a computational shortcut similar to that of variance.

\[\sigma_{XY}= E[XY] - \mu_X\mu_Y.\]

Its derivation is analogous to the derivation of the shortcut for variance; we leave it as an independent exercise.

Also note that computing covariance requires working with the joint probability mass function:

\[E[XY] = \sum_{(x,y)\in \text{supp}(X,Y)} xy \, p_{X,Y}(x,y)\]
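To make the computation concrete, here is a minimal Python sketch (our own illustration, not part of the text's examples) that evaluates \(E[XY]\) and the covariance shortcut for a joint PMF stored as a dictionary mapping \((x, y)\) pairs to probabilities; the function name is ours.

```python
# A minimal sketch: covariance from a discrete joint PMF.
# joint_pmf maps (x, y) pairs to probabilities; the function name is illustrative.

def covariance(joint_pmf):
    """Cov(X, Y) = E[XY] - E[X] E[Y] for a discrete joint PMF."""
    e_xy = sum(x * y * p for (x, y), p in joint_pmf.items())
    e_x = sum(x * p for (x, _), p in joint_pmf.items())
    e_y = sum(y * p for (_, y), p in joint_pmf.items())
    return e_xy - e_x * e_y
```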

Example💡: Salamander Insurance Company (SIC), Continued

Recall Salamander Insurance Company (SIC), which keeps track of the probabilities of moving violations (\(X\)) and accidents (\(Y\)) made by its customers.

\(x\) \ \(y\) | 0 | 1 | 2 | \(p_X(x)\)
0 | 0.58 | 0.015 | 0.005 | 0.60
1 | 0.18 | 0.058 | 0.012 | 0.25
2 | 0.02 | 0.078 | 0.002 | 0.10
3 | 0.02 | 0.029 | 0.001 | 0.05
\(p_Y(y)\) | 0.80 | 0.18 | 0.02 | 1

SIC wants to know whether the number of moving violations (\(X\)) and the number of accidents (\(Y\)) made by a customer are linearly associated. To answer this question, we must compute the covariance of the two random variables.

We already know:

  • \(E[X] = 0.6\) (average number of moving violations)

  • \(E[Y] = 0.22\) (average number of accidents)

from the previous examples. To calculate the covariance, we need to find \(E[XY]\).

\[\begin{split}E[XY] =& \sum_{(x,y)\in \text{supp}(X,Y)} xy \, p_{X,Y}(x,y) \\ =& 1 \cdot 1 \cdot 0.058 + 1 \cdot 2 \cdot 0.012 + 2 \cdot 1 \cdot 0.078 \\ &+ 2 \cdot 2 \cdot 0.002 + 3 \cdot 1 \cdot 0.029 + 3 \cdot 2 \cdot 0.001 \\ =& 0.339\end{split}\]

Now we can compute the covariance:

\[\begin{split}\text{Cov}(X,Y) &= E[XY] - E[X]E[Y] = 0.339 - 0.6 \cdot 0.22\\ &= 0.339 - 0.132 = 0.207\end{split}\]

The positive covariance indicates that customers with more moving violations tend to have more accidents, which aligns with our intuition about driving behavior. However, it is not easy to assess the strength of this relationship with covariance alone. To evaluate the strength more objectively, we now turn to our next topic.
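As a quick check of the arithmetic above, the joint PMF from the table can be entered directly; the following sketch (the variable name `sic_pmf` is ours) reproduces \(E[XY] = 0.339\) and \(\text{Cov}(X,Y) = 0.207\).

```python
# Joint PMF of moving violations (X) and accidents (Y) from the SIC table.
sic_pmf = {
    (0, 0): 0.58, (0, 1): 0.015, (0, 2): 0.005,
    (1, 0): 0.18, (1, 1): 0.058, (1, 2): 0.012,
    (2, 0): 0.02, (2, 1): 0.078, (2, 2): 0.002,
    (3, 0): 0.02, (3, 1): 0.029, (3, 2): 0.001,
}

e_xy = sum(x * y * p for (x, y), p in sic_pmf.items())  # 0.339
e_x = sum(x * p for (x, _), p in sic_pmf.items())       # 0.60
e_y = sum(y * p for (_, y), p in sic_pmf.items())       # 0.22
print(e_xy - e_x * e_y)                                 # Cov(X, Y) ≈ 0.207
```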

5.5.2. Correlation: A Standardized Measure

The sign of the covariance tells us the direction of the relationship, but its magnitude is difficult to interpret since it depends on the scales of X and Y. For instance, if we measured X in inches and then converted to centimeters, the covariance would change even though the underlying relationship remains the same.

To address the scale dependency of covariance, we use correlation, which standardizes covariance to a value between -1 and +1.

Definition

The correlation between two discrete random variables \(X\) and \(Y\), denoted \(\rho_{XY}\), is defined as:

\[\rho_{XY} = \frac{\sigma_{XY}}{\sigma_X \sigma_Y}.\]

From the formula, we can see that the correlation is obtained by taking the covariance and then removing the scales of \(X\) and \(Y\) by dividing by both \(\sigma_X\) and \(\sigma_Y\).
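In code, this standardization amounts to dividing the covariance by the two standard deviations computed from the same joint PMF. A minimal sketch (the function name is ours, following the same pattern as the covariance helper above):

```python
import math

def correlation(joint_pmf):
    """rho_XY = Cov(X, Y) / (sigma_X * sigma_Y) for a discrete joint PMF."""
    e_x = sum(x * p for (x, _), p in joint_pmf.items())
    e_y = sum(y * p for (_, y), p in joint_pmf.items())
    e_xy = sum(x * y * p for (x, y), p in joint_pmf.items())
    var_x = sum(x**2 * p for (x, _), p in joint_pmf.items()) - e_x**2
    var_y = sum(y**2 * p for (_, y), p in joint_pmf.items()) - e_y**2
    return (e_xy - e_x * e_y) / math.sqrt(var_x * var_y)
```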

This standardization provides several advantages:

  • Correlation is always between -1 and +1.

  • A correlation of +1 indicates a perfect positive linear relationship.

  • A correlation of -1 indicates a perfect negative linear relationship.

  • A correlation of 0 suggests no linear relationship.

  • Being unitless, correlation allows for meaningful comparisons across different variable pairs.

Fig. 5.4 Plots of joint distributions with varying degrees of correlation

Example💡: Salamander Insurance Company (SIC), Continued

Let us compute the correlation between \(X\) and \(Y\) for an objective assessment of the strength of their linear relationship.

\(x\) \ \(y\) | 0 | 1 | 2 | \(p_X(x)\)
0 | 0.58 | 0.015 | 0.005 | 0.60
1 | 0.18 | 0.058 | 0.012 | 0.25
2 | 0.02 | 0.078 | 0.002 | 0.10
3 | 0.02 | 0.029 | 0.001 | 0.05
\(p_Y(y)\) | 0.80 | 0.18 | 0.02 | 1

We already know:

  • \(E[X] = 0.6\) (average number of moving violations)

  • \(E[Y] = 0.22\) (average number of accidents)

  • \(E[X^2] = 1.1\)

  • \(Cov(X,Y) = 0.207\)

from the previous examples. To use the formula for correlation, we must find the standard deviations of \(X\) and \(Y\).

\[\begin{split}Var(X) &= E[X^2] - (E[X])^2\\ &= 1.1 - (0.6)^2 = 0.74\\ \sigma_X & = \sqrt{0.74} \approx 0.8602\end{split}\]
\[\begin{split}Var(Y) &= E[Y^2] - (E[Y])^2 \\ &= [(1^2)(0.18) + (2^2)(0.02)]- (0.22)^2\\ &= 0.26 - 0.0484 = 0.2116\\ \sigma_Y &= \sqrt{0.2116} = 0.46.\end{split}\]

Now the correlation is

\[\rho_{X,Y} = \frac{Cov(X,Y)}{\sigma_X \sigma_Y} = \frac{0.207}{(0.8602)(0.46)} \approx 0.5231.\]

We now see that the positive linear association between \(X\) and \(Y\) is moderate.
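The same number can be reproduced directly from the quantities computed above (a short sketch using the values already derived in this example):

```python
import math

cov_xy = 0.207               # Cov(X, Y) from the covariance example
var_x, var_y = 0.74, 0.2116  # Var(X) and Var(Y) computed above

rho = cov_xy / math.sqrt(var_x * var_y)
print(round(rho, 4))         # approximately 0.5231
```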

5.5.3. Independence and Covariance

Theorem: Independence implies Zero Covariance

If X and Y are independent random variables, then

\[\text{Cov}(X,Y) = 0.\]

Proof of theorem

We use the expectation independence property:

\[\begin{split}\begin{align} E[XY] &= \sum_{x,y: p_{X,Y}(x,y) > 0} xy \, p_{X,Y}(x,y) \\ &= \sum_{x: p_X(x) > 0} \sum_{y: p_Y(y) > 0} xy \, p_X(x)p_Y(y) \\ &= \sum_{x: p_X(x) > 0} x \, p_X(x) \sum_{y: p_Y(y) > 0} y \, p_Y(y) \\ &= E[X] \cdot E[Y] = \mu_X \mu_Y \end{align}\end{split}\]

Therefore,

\[\text{Cov}(X,Y) = E[XY] - \mu_X\mu_Y = \mu_X\mu_Y - \mu_X\mu_Y = 0.\]

This property is crucial because it allows us to determine when we can use the simpler variance formulas for sums of independent random variables. If covariance is non-zero, we must account for the dependence.
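A small numerical illustration of the theorem (our own construction, with arbitrary hypothetical marginals): if the joint PMF is built as the product of two marginals, so that \(X\) and \(Y\) are independent by construction, the resulting covariance is zero up to floating-point rounding.

```python
# Build a joint PMF as the product of two arbitrary marginals,
# so X and Y are independent by construction, then check Cov(X, Y).
p_x = {0: 0.5, 1: 0.3, 2: 0.2}
p_y = {0: 0.6, 1: 0.4}
joint = {(x, y): px * py for x, px in p_x.items() for y, py in p_y.items()}

e_x = sum(x * p for (x, _), p in joint.items())       # 0.7
e_y = sum(y * p for (_, y), p in joint.items())       # 0.4
e_xy = sum(x * y * p for (x, y), p in joint.items())  # 0.28
print(e_xy - e_x * e_y)                               # 0.0 (up to rounding)
```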

Zero Covariance Does Not Imply Independence

It’s important to note that the converse of the previous theorem is not always true—a zero covariance does not necessarily imply independence. This is because “no linear relationship” does not rule out other types of relationships. See Fig. 5.5 for some examples:


Fig. 5.5 Dependent distributions with zero covariance
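A standard counterexample (our own construction, not from the SIC data) makes the point numerically: if \(X\) is uniform on \(\{-1, 0, 1\}\) and \(Y = X^2\), then \(Y\) is completely determined by \(X\), yet the covariance is zero because the relationship is not linear.

```python
# X uniform on {-1, 0, 1} and Y = X**2: dependent, but Cov(X, Y) = 0.
joint = {(-1, 1): 1/3, (0, 0): 1/3, (1, 1): 1/3}

e_x = sum(x * p for (x, _), p in joint.items())       # 0
e_y = sum(y * p for (_, y), p in joint.items())       # 2/3
e_xy = sum(x * y * p for (x, y), p in joint.items())  # -1/3 + 1/3 = 0
print(e_xy - e_x * e_y)                               # 0.0

# Dependence check: p_{X,Y}(0, 0) = 1/3, but p_X(0) * p_Y(0) = (1/3)(1/3) = 1/9.
```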

5.5.4. Variance of Sums of Dependent Random Variables

When random variables are dependent, the variance of their sum (or difference) includes an additional term that accounts for their covariance:

\[\text{Var}(X \pm Y) = \text{Var}(X) + \text{Var}(Y) \pm 2\text{Cov}(X,Y)\]

For linear combinations:

\[\text{Var}(aX + bY) = a^2\text{Var}(X) + b^2\text{Var}(Y) + 2ab\text{Cov}(X,Y)\]

These formulas highlight a critical insight: dependence between random variables can either increase or decrease the variance of their sum, depending on whether the covariance is positive (variables tend to move together) or negative (variables tend to offset each other).

For n dependent random variables, the formula extends to:

\[\text{Var}(X_1 + X_2 + \ldots + X_n) = \sum_{i=1}^{n} \text{Var}(X_i) + 2 \sum_{i=1}^{n} \sum_{j>i}^{n} \text{Cov}(X_i, X_j)\]

This formula includes the variance of each individual random variable plus the covariance between each pair of variables.
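For computation, the n-variable formula is simply the sum of every entry of the covariance matrix \(C\), where \(C_{ii} = \text{Var}(X_i)\) and \(C_{ij} = \text{Cov}(X_i, X_j)\). A minimal sketch (the function name and matrix layout are ours):

```python
def variance_of_sum(cov_matrix):
    """Var(X_1 + ... + X_n), given the full covariance matrix as a list of rows."""
    return sum(sum(row) for row in cov_matrix)

# Example with the SIC quantities: Var(X) = 0.74, Var(Y) = 0.2116, Cov(X, Y) = 0.207,
# so Var(X + Y) = 0.74 + 0.2116 + 2 * 0.207 = 1.3656.
print(variance_of_sum([[0.74, 0.207], [0.207, 0.2116]]))
```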

Example💡: SIC, Continued

SIC is planning a promotional offer based on a risk score \(Z = 2X + 5Y\), which combines both factors with different weights. The company wants to know the average value and standard deviation of the sum of these scores for its 35 customers.

\(x\) \ \(y\) | 0 | 1 | 2 | \(p_X(x)\)
0 | 0.58 | 0.015 | 0.005 | 0.60
1 | 0.18 | 0.058 | 0.012 | 0.25
2 | 0.02 | 0.078 | 0.002 | 0.10
3 | 0.02 | 0.029 | 0.001 | 0.05
\(p_Y(y)\) | 0.80 | 0.18 | 0.02 | 1

Calculate the Expected Value of Z

The expected value of Z is:

\[\begin{split}E[Z] &= E[2X + 5Y] = 2E[X] + 5E[Y]\\ &= 2 \cdot 0.6 + 5 \cdot 0.22 = 1.2 + 1.1 = 2.3.\end{split}\]

For all 35 customers combined:

\[E[\sum_{i=1}^{35} Z_i] = 35 \cdot E[Z] = 35 \cdot 2.3 = 80.5\]

Calculate the Variance of Z

For a single customer, using the formula for the variance of a linear combination of dependent variables:

\[\begin{split}\begin{align} \text{Var}(Z) &= \text{Var}(2X + 5Y) \\ &= 2^2 \text{Var}(X) + 5^2 \text{Var}(Y) + 2 \cdot 2 \cdot 5 \cdot \text{Cov}(X,Y) \\ &= 4 \cdot 0.74 + 25 \cdot 0.2116 + 20 \cdot 0.207 \\ &= 2.96 + 5.29 + 4.14 \\ &= 12.39 \end{align}\end{split}\]

Now, assuming the 35 customers are independent of each other (one customer’s driving behavior doesn’t affect another’s), the variance of the sum is:

\[\text{Var}(\sum_{i=1}^{35} Z_i) = 35 \cdot \text{Var}(Z) = 35 \cdot 12.39 = 433.65\]

Calculate the Standard Deviation

The standard deviation is the square root of the variance:

\[\sigma_{\sum Z_i} = \sqrt{433.65} \approx 20.82\]

This standard deviation tells SIC how much typical variation to expect in the sum of risk scores across their 35 customers—valuable information for setting appropriate thresholds for their promotional offer.
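Putting the whole calculation together in a few lines (a sketch using only the quantities computed earlier; the last line also recomputes the variance under the incorrect independence assumption discussed next):

```python
import math

e_x, e_y = 0.6, 0.22
var_x, var_y, cov_xy = 0.74, 0.2116, 0.207

e_z = 2 * e_x + 5 * e_y                                   # 2.3
var_z = 2**2 * var_x + 5**2 * var_y + 2 * 2 * 5 * cov_xy  # 12.39
total_mean = 35 * e_z                                     # 80.5
total_sd = math.sqrt(35 * var_z)                          # about 20.82
var_z_if_independent = 2**2 * var_x + 5**2 * var_y        # 8.25 (covariance ignored)
print(e_z, var_z, total_mean, round(total_sd, 2), var_z_if_independent)
```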

The Effect of Dependence on Risk Assessment

It’s worth noting how the dependency between moving violations and accidents affects SIC’s risk calculations. If we had incorrectly assumed that X and Y were independent (ignoring their positive covariance of 0.207), the variance calculation would have been:

\[\begin{split}\begin{align} \text{Var}_{incorrect}(Z) &= 4 \cdot 0.74 + 25 \cdot 0.2116 \\ &= 2.96 + 5.29 \\ &= 8.25 \end{align}\end{split}\]

This would have resulted in an underestimation of the variance by approximately 33% and an underestimation of the standard deviation by about 18%. Such an error could lead to significant mispricing of insurance policies or inadequate risk management.

5.5.5. Bringing It All Together

Key Takeaways 📝

  1. Covariance measures how two random variables change together. Positive values indicate that they tend to move in the same direction, and negative values indicate opposite movements.

  2. Correlation standardizes covariance to a unitless measure between -1 and +1, making it easier to interpret the strength of relationships regardless of variable scales.

  3. Independent random variables have zero covariance, though zero covariance doesn’t necessarily imply independence.

  4. The variance of a linear combination of dependent random variables includes an additional term accounting for their covariances: \(Var(aX + bY) = a^2Var(X) + b^2Var(Y) + 2abCov(X,Y)\).

  5. Positive covariance increases the variance of a sum, while negative covariance decreases it—reflecting how dependencies can either amplify or mitigate variability.

Understanding how random variables covary is essential for modeling complex systems where independence is the exception rather than the rule. While the mathematics becomes more involved when accounting for dependencies, the resulting models more faithfully represent reality, leading to better decisions and predictions.

In our next section, we’ll explore specific named discrete probability distributions that occur frequently in practice, beginning with the binomial distribution—a foundational model for many counting problems.

Exercises

  1. Understanding Covariance: Two discrete random variables X and Y have the joint PMF:

    x \ y | 1 | 2 | 3
    2 | 0.1 | 0.2 | 0.1
    4 | 0.3 | 0.2 | 0.1

    1. Calculate the marginal PMFs for \(X\) and \(Y\).

    2. Calculate \(E[X], E[Y], Var(X)\), and \(Var(Y)\).

    3. Calculate the covariance between \(X\) and \(Y\).

    4. Calculate the correlation between \(X\) and \(Y\).

    5. Interpret the meaning of the correlation in context.

  2. Dependent vs. Independent Sums: Random variables \(X\) and \(Y\) have

    \(Var(X) = 4\), \(Var(Y) = 9\), and \(Cov(X,Y) = -3\).

    1. Calculate \(Var(X + Y)\) accounting for their dependence.

    2. What would \(Var(X + Y)\) be if \(X\) and \(Y\) were independent?

    3. Explain why the variance is lower in the dependent case here.

  3. Linear Combinations with Dependence: Random variables \(X\), \(Y\), and \(Z\) have variances of 2, 3, and 4 respectively. The covariances are

    \(Cov(X,Y) = 1\), \(Cov(X,Z) = -1\), and \(Cov(Y,Z) = 2\).

    1. Calculate \(Var(X + Y + Z)\).

    2. Calculate \(Var(2X - Y + 3Z)\).

    3. Would a portfolio split equally among these three variables have more or less risk than investing in just one variable? Explain.

  4. Insurance Portfolio: An insurance company offers three types of policies: auto (A), home (H), and life (L). The annual claims (in thousands of dollars) for each policy type are random variables with

    • \(E[A] = 1.5, E[H] = 2.0, E[L] = 5.0\)

    • \(Var(A) = 2.25, Var(H) = 4.0, Var(L) = 16.0\)

    • \(Cov(A,H) = 1.5, Cov(A,L) = 0.5, Cov(H,L) = 1.0\).

    1. If the company has 100 auto policies, 80 home policies, and 50 life policies, calculate the expected total annual claims.

    2. Calculate the standard deviation of the total annual claims.

    3. What would the standard deviation be if the claims from different policy types were independent?

  5. Proving Properties: Suppose \(X\) and \(Y\) are dependent random variables. Show that the following statements are true:

    1. \(Var(X - Y) = Var(X) + Var(Y) - 2Cov(X,Y)\).

    2. For any constant \(c\), \(Cov(X,c) = 0\).

    3. \(Cov(X,X) = Var(X)\).

    4. If \(Z = aX + bY\), then \(Cov(X,Z) = aVar(X) + bCov(X,Y)\).