.. _5-5-covariance-of-dependent-rvs:


.. raw:: html

   <div class="video-placeholder">
     <iframe
       src="https://www.youtube.com/embed/xP5_W5ZtBYs?list=PLHKEwTHXfbagA3ybKLAcEJriGT-6k89c6"
       allowfullscreen>
     </iframe>
   </div>

Covariance of Dependent Random Variables
==================================================================

Many real-world scenarios involve random variables that **influence each other**—driving violations 
may correlate with accident rates, stock prices often move together, and rainfall affects crop yields. 
When random variables are dependent, their joint behavior becomes more complex, requiring us to understand 
how they vary together.

.. admonition:: Road Map 🧭
   :class: important

   • Introduce **covariance**, a measure of how random variables change together.
   • Define **correlation** as a standardized measure of relationship strength.
   • Extend variance formulas for **sums of dependent random variables**.
   • Explore the **independence property** and its effect on covariance.

Beyond Independence: Understanding Covariance
-----------------------------------------------

When analyzing two random variables :math:`X` and :math:`Y` together, we often want to know: 
When :math:`X` is large, does :math:`Y` also tend to be larger? 
Or do :math:`X` and :math:`Y` tend to move in the opposite direction? 
Covariance provides a mathematical way to measure this relationship.

Definition
~~~~~~~~~~~~~

The **covariance** between two discrete random variables :math:`X` and :math:`Y`, 
denoted :math:`Cov(X,Y)` or :math:`\sigma_{XY}`, is defined as:

.. math::

   \sigma_{XY} = \text{Cov}(X,Y) = E[(X - \mu_X)(Y - \mu_Y)]

Interpreting the formula
~~~~~~~~~~~~~~~~~~~~~~~~~~~

- If :math:`X` and :math:`Y` tend to be simultaneously above their means or simultaneously 
  below their means, their covariance will be positive.
- If :math:`Y` tends to be below its mean when :math:`X` is above its mean, and vice versa, 
  their covariance will be negative.
- If :math:`X` and :math:`Y` have no systematic relationship, their covariance will be close to zero.
- In general, the covariance describes the strength (magnitude) and direction (sign) of
  the **linear relationship** between :math:`X` and :math:`Y`.

Computational shortcut
~~~~~~~~~~~~~~~~~~~~~~~~

Covariance has a computational shortcut similar to that of variance.

.. math:: 
   \sigma_{XY}= E[XY] - \mu_X\mu_Y.

Its derivation is analogous to the derivation of the shortcut for variance.
We leave it as an indepdent exercise.

Also note that computing covariance requires working with the joint probability mass function:

.. math::

   E[XY] = \sum_{(x,y)\in \text{supp}(X,Y)} xy \, p_{X,Y}(x,y)


.. admonition:: Example💡: Salamander Insurance Company (SIC), Continued
   :class: note

   Recall Salamander Insurance Company (SIC), who keeps track of the proababilities of
   moving violations (:math:`X`) and accidents (:math:`Y`) made by there customers. 

   .. flat-table:: 
      :header-rows: 1
      :width: 80%
      :align: center

      * - :math:`x` \\ :math:`y`
        - 0
        - 1
        - 2
        - :math:`p_X(x)`
      * - 0
        - 0.58
        - 0.015
        - 0.005
        - 0.60
      * - 1
        - 0.18
        - 0.058
        - 0.012
        - 0.25
      * - 2
        - 0.02
        - 0.078
        - 0.002
        - 0.10
      * - 3
        - 0.02
        - 0.029
        - 0.001
        - 0.05
      * - :math:`p_Y(y)`
        - 0.80
        - 0.18
        - 0.02
        - 1

   SIC wants to know whether the number of moving violations (:math:`X`) and the number 
   accidents (:math:`Y`) made by a customer are linearly associated. To answer this
   question, we must compute the covariance of the two random variables.

   We already know:

   - :math:`E[X] = 0.6` (average number of moving violations)
   - :math:`E[Y] = 0.22` (average number of accidents)

   from the previous examples. To calculate the covariance, we need to find :math:`E[XY]`.

   .. math::
      E[XY] =& \sum_{(x,y)\in \text{supp}(X,Y)} xy \, p_{X,Y}(x,y) \\
      =& 1 \cdot 1 \cdot 0.058 + 1 \cdot 2 \cdot 0.012 + 2 \cdot 1 \cdot 0.078 \\
      &+ 2 \cdot 2 \cdot 0.002 + 3 \cdot 1 \cdot 0.029 + 3 \cdot 2 \cdot 0.001 \\
      =& 0.339

   Now we can compute the covariance:

   .. math::

      \text{Cov}(X,Y) &= E[XY] - E[X]E[Y] = 0.339 - 0.6 \cdot 0.22\\
      &= 0.339 - 0.132 = 0.207

   The positive covariance indicates that customers with more moving 
   violations tend to have more accidents, which aligns with our 
   intuition about driving behavior. However, it is not easy to assess 
   the strength of this relationship with covariance alone. 
   To evaluate the strength more objectively, we now turn to our next topic.

Correlation: A Standardized Measure
-------------------------------------

The sign of the covariance tells us the direction of the relationship, 
but its **magnitude is difficult to interpret** since it depends on the scales of X and Y. 
For instance, if we measured X in inches and then converted to centimeters, 
the covariance would change even though the underlying relationship remains the same.

To address the scale dependency of covariance, we use correlation, which **standardizes 
covariance to a value between -1 and +1**.

Definition
~~~~~~~~~~~~

The **correlation** between two discrete random variables :math:`X` and :math:`Y`, denoted 
:math:`\rho_{XY}`, is defined as:

.. math::

   \rho_{XY} = \frac{\sigma_{XY}}{\sigma_X \sigma_Y}.

From the formula, we can say that the correlation is obtained by **taking the covariance,
then removing the scales** of :math:`X` and :math:`Y` by diving by both :math:`\sigma_X` 
and :math:`\sigma_Y`.

This standardization provides several advantages:

- Correlation is always between -1 and +1.
- A correlation of +1 indicates a perfect positive linear relationship.
- A correlation of -1 indicates a perfect negative linear relationship.
- A correlation of 0 suggests no linear relationship.
- Being unitless, correlation allows for meaningful comparisons across different variable pairs.

.. figure:: https://yjjpfnblgtrogqvcjaon.supabase.co/storage/v1/object/public/stat-350-assets/images/chapter5/correlation-plots.png
   :alt: Distributions with different correlations
   :align: center
   :figwidth: 90%

   Plots of joint distributions witth varying degrees of correlation 

.. admonition:: Eaxample💡: Salamander Insurance Company (SIC), Continued
   :class: note

   Let us compute the correlation between :math:`X` and :math:`Y` for an
   objective assessment of the strength of their linear relationship.

   .. flat-table:: 
      :header-rows: 1
      :width: 80%
      :align: center

      * - :math:`x` \\ :math:`y`
        - 0
        - 1
        - 2
        - :math:`p_X(x)`
      * - 0
        - 0.58
        - 0.015
        - 0.005
        - 0.60
      * - 1
        - 0.18
        - 0.058
        - 0.012
        - 0.25
      * - 2
        - 0.02
        - 0.078
        - 0.002
        - 0.10
      * - 3
        - 0.02
        - 0.029
        - 0.001
        - 0.05
      * - :math:`p_Y(y)`
        - 0.80
        - 0.18
        - 0.02
        - 1

   We already know:

   - :math:`E[X] = 0.6` (average number of moving violations)
   - :math:`E[Y] = 0.22` (average number of accidents)
   - :math:`E[X^2] = 1.1`
   - :math:`Cov(X,Y) = 0.207`

   from the previous examples. To use the formula for correlation, we must find
   the standard deviations of :math:`X` and :math:`Y`.

   .. math:: 
      Var(X) &= E[X^2] - (E[X])^2\\
      &= 1.1 - (0.6)^2 = 0.74\\
      \sigma_X & = \sqrt{0.74} \approx 0.8602 

   .. math::    
      Var(Y) &= E[Y^2] - (E[Y])^2 \\
      &= [(1^2)(0.18) + (2^2)(0.02)]- (0.22)^2\\
      &= 0.26 - 0.0484 = 0.2116\\
      \sigma_Y &= \sqrt{0.2116} = 0.46.
   
   Now the correlation is

   .. math::

      \rho_{X,Y} = \frac{Cov(X,Y)}{\sigma_X \sigma_Y} 
      = \frac{0.207}{(0.8602)(0.46)} \approx 0.5231.

   We now see that the positive linear association between
   :math:`X` and :math:`Y` are moderate.

Independence and Covariance
-----------------------------

Theorem: Independence implies Zero Covariance
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If X and Y are independent random variables, then

.. math::

   \text{Cov}(X,Y) = 0.

Proof of theorem
^^^^^^^^^^^^^^^^^^
   We use the expectation independence property:

   .. math::

      \begin{align}
      E[XY] &= \sum_{x,y: p_{X,Y}(x,y) > 0} xy \, p_{X,Y}(x,y) \\
      &= \sum_{x: p_X(x) > 0} \sum_{y: p_Y(y) > 0} xy \, p_X(x)p_Y(y) \\
      &= \sum_{x: p_X(x) > 0} x \, p_X(x) \sum_{y: p_Y(y) > 0} y \, p_Y(y) \\
      &= E[X] \cdot E[Y] = \mu_X \mu_Y
      \end{align}

   Therefore,

   .. math::

      \text{Cov}(X,Y) = E[XY] - \mu_X\mu_Y = \mu_X\mu_Y - \mu_X\mu_Y = 0.

This property is crucial because it allows us to determine when we can use the simpler variance formulas for sums of independent random variables. If covariance is non-zero, we must account for the dependence.

Zero Covariance Does Not Imply Independence
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

It's important to note that the converse of the previous theorem is not always 
true—**a zero covariance does not necessarily imply independence**. 
This is because "no linear relationship" does not rule out other types of relationships.
See :numref:`zero-cov-dependent-plots` for some examples:

.. _zero-cov-dependent-plots:
.. figure:: https://yjjpfnblgtrogqvcjaon.supabase.co/storage/v1/object/public/stat-350-assets/images/chapter5/zero-cov-dependent-plots.png
   :alt: Dependent distributions with zero covariance
   :align: center
   :figwidth: 90%

   Dependent distributions with zero covariance

Variance of Sums of Dependent Random Variables
-------------------------------------------------

When random variables are dependent, the variance of their sum (or difference) 
includes an additional term that accounts for their covariance:

.. math::

   \text{Var}(X \pm Y) = \text{Var}(X) + \text{Var}(Y) \pm 2\text{Cov}(X,Y)

For linear combinations:

.. math::

   \text{Var}(aX + bY) = a^2\text{Var}(X) + b^2\text{Var}(Y) + 2ab\text{Cov}(X,Y)

These formulas highlight a critical insight: dependence between random variables can either increase or decrease the variance of their sum, depending on whether the covariance is positive (variables tend to move together) or negative (variables tend to offset each other).

For n dependent random variables, the formula extends to:

.. math::

   \text{Var}(X_1 + X_2 + \ldots + X_n) = \sum_{i=1}^{n} \text{Var}(X_i) + 2 \sum_{i=1}^{n} \sum_{j>i}^{n} \text{Cov}(X_i, X_j)

This formula includes the variance of each individual random variable plus the covariance between each pair of variables.

.. admonition:: Example💡: SIC, Continued
   :class: note 

   SIC is planning a promotional offer based on a risk score :math:`Z = 2X + 5Y`, 
   which combines both factors with different weights. The company wants 
   to know the average value and standard deviation of **the sum of these scores for
   its 35 customers**.

   .. flat-table:: 
      :header-rows: 1
      :width: 80%
      :align: center

      * - :math:`x` \\ :math:`y`
        - 0
        - 1
        - 2
        - :math:`p_X(x)`
      * - 0
        - 0.58
        - 0.015
        - 0.005
        - 0.60
      * - 1
        - 0.18
        - 0.058
        - 0.012
        - 0.25
      * - 2
        - 0.02
        - 0.078
        - 0.002
        - 0.10
      * - 3
        - 0.02
        - 0.029
        - 0.001
        - 0.05
      * - :math:`p_Y(y)`
        - 0.80
        - 0.18
        - 0.02
        - 1

   **Calculate the Expected Value of Z**

   The expected value of Z is:

   .. math::

      E[Z] &= E[2X + 5Y] = 2E[X] + 5E[Y]\\ 
      &= 2 \cdot 0.6 + 5 \cdot 0.22 = 1.2 + 1.1 = 2.3.

   For all 35 customers combined:

   .. math::

      E[\sum_{i=1}^{35} Z_i] = 35 \cdot E[Z] = 35 \cdot 2.3 = 80.5

   **Calculate the Variance of Z**

   For a single customer, using the formula for the variance of a linear combination of dependent variables:

   .. math::

      \begin{align}
      \text{Var}(Z) &= \text{Var}(2X + 5Y) \\
      &= 2^2 \text{Var}(X) + 5^2 \text{Var}(Y) + 2 \cdot 2 \cdot 5 \cdot \text{Cov}(X,Y) \\
      &= 4 \cdot 0.74 + 25 \cdot 0.2116 + 20 \cdot 0.207 \\
      &= 2.96 + 5.29 + 4.14 \\
      &= 12.39
      \end{align}

   Now, assuming the 35 customers are independent of each other (one customer's driving behavior doesn't affect another's), the variance of the sum is:

   .. math::

      \text{Var}(\sum_{i=1}^{35} Z_i) = 35 \cdot \text{Var}(Z) = 35 \cdot 12.39 = 433.65

   **Calculate the Standard Deviation**

   The standard deviation is the square root of the variance:

   .. math::

      \sigma_{\sum Z_i} = \sqrt{433.65} \approx 20.82

   This standard deviation tells SIC how much typical variation to expect in the sum of 
   risk scores across their 35 customers—valuable information for setting appropriate thresholds for their promotional offer.

   **The Effect of Dependence on Risk Assessment**


   It's worth noting how the dependency between moving violations and accidents 
   affects SIC's risk calculations. If we had incorrectly assumed that X and Y were independent 
   (ignoring their positive covariance of 0.207), the variance calculation would have been:

   .. math::

      \begin{align}
      \text{Var}_{incorrect}(Z) &= 4 \cdot 0.74 + 25 \cdot 0.2116 \\
      &= 2.96 + 5.29 \\
      &= 8.25
      \end{align}

   This would have resulted in an underestimation of the variance by approximately 33% and 
   an underestimation of the standard deviation by about 18%. Such an error could lead to 
   significant mispricing of insurance policies or inadequate risk management.

Bringing It All Together
--------------------------

.. admonition:: Key Takeaways 📝
   :class: important

   1. **Covariance** measures how two random variables change together. Positive 
      values indicate that they tend to move in the same direction, and negative values 
      indicate opposite movements.
   
   2. **Correlation** standardizes covariance to a unitless measure between -1 and +1, 
      making it easier to interpret the strength of relationships regardless of variable scales.
   
   3. Independent random variables have **zero covariance**, though zero covariance 
      doesn't necessarily imply independence.
   
   4. The **variance of a linear combination of dependent random variables** includes an additional 
      term accounting for their covariances: :math:`Var(aX + bY) = a^2Var(X) + b^2Var(Y) + ab2Cov(X,Y)`.
   
   5. Positive covariance **increases the variance** of a sum, while negative 
      covariance **decreases it**—reflecting how dependencies can either amplify 
      or mitigate variability.

Understanding how random variables covary is essential for modeling complex systems
where independence is the exception rather than the rule. While the mathematics becomes 
more involved when accounting for dependencies, the resulting models more faithfully 
represent reality, leading to better decisions and predictions.

In our next section, we'll explore specific named discrete probability distributions 
that occur frequently in practice, beginning with the binomial distribution—a foundational 
model for many counting problems.

Exercises
~~~~~~~~~~~~~

1. **Understanding Covariance**: Two discrete random variables X and Y have the joint PMF:

   .. list-table::
      :header-rows: 1
      :width: 60%
      :align: center

      * - x \\\ y
        - 1
        - 2
        - 3
      * - 2
        - 0.1
        - 0.2
        - 0.1
      * - 4
        - 0.3
        - 0.2
        - 0.1

   a) Calculate the marginal PMFs for :math:`X` and :math:`Y`.
   b) Calculate :math:`E[X], E[Y], Var(X)`, and :math:`Var(Y)`.
   c) Calculate the covariance between :math:`X` and :math:`Y`.
   d) Calculate the correlation between :math:`X` and :math:`Y`.
   e) Interpret the meaning of the correlation in context.

2. **Dependent vs. Independent Sums**: Random variables :math:`X` and :math:`Y` have 

   :math:`Var(X) = 4`, :math:`Var(Y) = 9`, and :math:`Cov(X,Y) = -3`.

   a) Calculate :math:`Var(X + Y)` accounting for their dependence.
   b) What would :math:`Var(X + Y)` be if :math:`X` and :math:`Y` were independent?
   c) Explain why the variance is lower in the dependent case here.

3. **Linear Combinations with Dependence**: Random variables :math:`X`, :math:`Y`, and :math:`Z` have 
   variances of 2, 3, and 4 respectively. The covariances are 
   
   :math:`Cov(X,Y) = 1`, :math:`Cov(X,Z) = -1`, and :math:`Cov(Y,Z) = 2`.

   a) Calculate :math:`Var(X + Y + Z)`.
   b) Calculate :math:`Var(2X - Y + 3Z)`.
   c) Would a portfolio split equally among these three variables have more or less 
      risk than investing in just one variable? Explain.

4. **Insurance Portfolio**: An insurance company offers three types of policies: 
   auto (A), home (H), and life (L). The annual claims (in thousands of dollars) 
   for each policy type are random variables with 
   
   * :math:`E[A] = 1.5, E[H] = 2.0, E[L] = 5.0` 
   * :math:`Var(A) = 2.25, Var(H) = 4.0  Var(L) = 16.0`
   * :math:`Cov(A,H) = 1.5, Cov(A,L) = 0.5, Cov(H,L) = 1.0`.

   a) If the company has 100 auto policies, 80 home policies, and 50 life policies, 
      calculate the expected total annual claims.
   b) Calculate the standard deviation of the total annual claims.
   c) What would the standard deviation be if the claims from different policy types were independent?

5. **Proving Properties**: Suppose :math:`X` and :math:`Y` are dependent random variables.
   Show that the following statements are true:

   a) :math:`Var(X - Y) = Var(X) + Var(Y) - 2Cov(X,Y)`.
   b) For any constant :math:`c`, :math:`Cov(X,c) = 0`.
   c) :math:`Cov(X,X) = Var(X)`.
   d) If :math:`Z = aX + bY`, then :math:`Cov(X,Z) = aVar(X) + bCov(X,Y)`.