Worksheet 9: The Normal Distribution

Learning Objectives 🎯

Understand the Normal (Gaussian) distribution and its properties
Apply the Empirical Rule (68-95-99.7 rule) for quick probability estimates
Master the standard Normal distribution and z-score transformations
Use Normal tables to calculate probabilities and percentiles
Solve forward problems (finding probabilities) and backward problems (finding values)
Verify Normal distribution calculations using R

Introduction

We have previously discussed the general form of probability density functions and cumulative distribution functions, as well as two fundamental examples of named continuous distributions: the Uniform and Exponential distributions. Many real-world measurements, such as heights and measurement errors, naturally cluster around an average in a roughly symmetrical, bell-shaped pattern. The Gaussian distribution, also called the Normal distribution, is the most widely used model for describing this shape. Moreover, sums and averages of many independent random variables are often approximately Normal even when the individual variables are not (the Central Limit Theorem), which makes the Normal distribution a cornerstone of statistical inference and practical applications.

In this worksheet, we explore the key properties of the Normal distribution, show how the standard Normal serves as a reference distribution, and practice essential probability calculations for finding areas and percentiles under the bell-shaped curve.

Part 1: The Normal Distribution

A continuous Normal random variable \(X\) has support the entire real line \(\mathbb{R} = (-\infty, +\infty)\), and is defined by the following probability density function:

\[f_X(x) = \frac{1}{\sqrt{2\pi}\sigma} e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}\]

The distribution has the following properties: it is symmetric, unimodal, and bell shaped (concave down around the center and concave up beyond the inflection points, which lie exactly one standard deviation on each side of the mean). It is completely determined by two parameters, the mean \(\mu\) and standard deviation \(\sigma\). However, computing probabilities from this distribution directly is non-trivial because the cumulative distribution function has no closed-form expression in terms of elementary functions. Instead, probabilities must be obtained numerically, through integration algorithms, statistical tables, or software.
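
As a quick numerical check of the density formula above, the following R sketch codes the formula by hand, compares it with R's built-in `dnorm()`, and uses `integrate()` to confirm that the area under the curve is 1 and that a numerically integrated area agrees with `pnorm()`. The helper name `normal_pdf`, the parameter values, and the choice to integrate over \(\mu \pm 8\sigma\) (the area outside that range is negligible) are assumptions made for illustration only.

```r
# Hand-coded Normal density (illustrative helper, not a base R function)
normal_pdf <- function(x, mu, sigma) {
  1 / (sqrt(2 * pi) * sigma) * exp(-0.5 * ((x - mu) / sigma)^2)
}

mu <- 50      # example mean (assumed for illustration)
sigma <- 2    # example standard deviation (assumed for illustration)

# The hand-coded formula should agree with the built-in density dnorm()
x_vals <- c(44, 48, 50, 53)
max_diff <- max(abs(normal_pdf(x_vals, mu, sigma) -
                      dnorm(x_vals, mean = mu, sd = sigma)))

# Finite integration limits: practically all of the area lies within 8 SDs
lower_lim <- mu - 8 * sigma
upper_lim <- mu + 8 * sigma

# Total area under the curve, computed numerically
total_area <- integrate(normal_pdf, lower = lower_lim, upper = upper_lim,
                        mu = mu, sigma = sigma, rel.tol = 1e-8)$value

# The CDF has no closed form, so integrate numerically and compare with pnorm()
area_below_53 <- integrate(normal_pdf, lower = lower_lim, upper = 53,
                           mu = mu, sigma = sigma, rel.tol = 1e-8)$value
cdf_at_53 <- pnorm(53, mean = mu, sd = sigma)

cat(sprintf("Largest difference from dnorm(): %.2e\n", max_diff))
cat(sprintf("Total area: %.6f\n", total_area))
cat(sprintf("Numerical area below 53: %.6f, pnorm(53): %.6f\n",
            area_below_53, cdf_at_53))
```

Finite limits are used here because numerical integration over an infinite range can miss a narrow peak that sits far from zero.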

Note

We denote a Normal distribution with mean \(\mu\) and standard deviation \(\sigma\) as \(X \sim N(\mu, \sigma)\). Some textbooks use variance \(\sigma^2\) instead, so always check the notation!

The Standard Normal Distribution

A continuous Normal random variable \(Z\) with mean \(\mu = 0\) and standard deviation \(\sigma = 1\) is known as the standard Normal distribution and is defined by the following density function:

\[f_Z(z) = \frac{1}{\sqrt{2\pi}} e^{-\frac{z^2}{2}}\]

The standard Normal distribution will serve as a reference distribution for our probability calculations for Normal random variables.

The Empirical Rule

One of the most useful properties of the Normal distribution is the Empirical Rule, also known as the 68-95-99.7 rule. This rule provides a quick way to estimate probabilities based on how data is distributed around the mean.

Fig. 1: The Empirical Rule provides quick probability estimates for any Normal distribution (figure showing the 68-95-99.7 percentages).

Specifically, for any Normal distribution:

Approximately 68% of values fall within one standard deviation of the mean (\(\mu \pm \sigma\))
Approximately 95% of values fall within two standard deviations of the mean (\(\mu \pm 2\sigma\))
Approximately 99.7% of values fall within three standard deviations of the mean (\(\mu \pm 3\sigma\))

Key Connection 🔗

The Empirical Rule percentages correspond to specific z-scores:

68% ↔ z = ±1
95% ↔ z = ±2 (an exact 95% corresponds to z = ±1.96)
99.7% ↔ z = ±3

Understanding this connection helps bridge the Empirical Rule with z-score calculations!
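
The percentages in the Empirical Rule are rounded. A minimal R sketch, using only the base functions `pnorm()` and `qnorm()`, shows the exact central areas for \(z = \pm 1, \pm 2, \pm 3\) and the z-value that captures exactly 95% in the middle. The variable names are illustrative.

```r
# Exact central areas within k standard deviations of the mean.
# After standardizing, these do not depend on mu or sigma.
k <- 1:3
central_area <- pnorm(k) - pnorm(-k)

for (i in seq_along(k)) {
  cat(sprintf("Within %d SD of the mean: %.4f\n", k[i], central_area[i]))
}

# z-value that captures exactly 95% in the middle:
# 2.5% must sit in each tail, so look up the 97.5th percentile.
z_exact_95 <- qnorm(0.975)
cat(sprintf("z for an exact central 95%%: %.4f\n", z_exact_95))
```

The exact areas are about 68.27%, 95.45%, and 99.73%, which is why the rule is stated with the word "approximately".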

Question 1: A manufacturing company produces metal rods that follow a Normal distribution with a mean length of 50 cm and a standard deviation of 2 cm.

Let \(X\) denote the length of a randomly selected rod, so \(X \sim N(\mu = 50, \sigma = 2)\).

First, fill in the Empirical Rule diagram for this specific distribution:

Using the Empirical Rule, determine the probability that a randomly selected rod has a length within the following ranges. Clearly write out the probability statement before computing each value.
1. Greater than 52 cm.
  
  \(P(X > 52) =\) ____
  
  Show your reasoning:
2. Less than 48 cm.
  
  \(P(X < 48) =\) ____
  
  Show your reasoning:
3. Between 44 cm and 54 cm.
  
  \(P(44 < X < 54) =\) ____
  
  Show your reasoning:
4. Greater than 54 cm or less than 44 cm.
  
  \(P(\{X > 54\} \cup \{X < 44\}) =\) ____
  
  Show your reasoning:

# Verify your Empirical Rule calculations
mu <- 50
sigma <- 2

# Enter your calculated probabilities
your_prob_i <- ___    # P(X > 52)
your_prob_ii <- ___   # P(X < 48)
your_prob_iii <- ___  # P(44 < X < 54)
your_prob_iv <- ___   # P(X > 54 or X < 44)

# Check against exact Normal probabilities
exact_i <- 1 - pnorm(52, mean = mu, sd = sigma)
exact_ii <- pnorm(48, mean = mu, sd = sigma)
exact_iii <- pnorm(54, mean = mu, sd = sigma) - pnorm(44, mean = mu, sd = sigma)
exact_iv <- pnorm(44, mean = mu, sd = sigma) + (1 - pnorm(54, mean = mu, sd = sigma))

# Display comparison
cat("Comparison of Empirical Rule vs Exact:\n")
cat(sprintf("i.   Your answer: %.3f, Exact: %.4f\n", your_prob_i, exact_i))
cat(sprintf("ii.  Your answer: %.3f, Exact: %.4f\n", your_prob_ii, exact_ii))
cat(sprintf("iii. Your answer: %.3f, Exact: %.4f\n", your_prob_iii, exact_iii))
cat(sprintf("iv.  Your answer: %.3f, Exact: %.4f\n", your_prob_iv, exact_iv))

Suppose you randomly select a rod from a pile labeled “greater than 52 cm”. Using the Empirical Rule, determine the probability that the rod is also greater than 54 cm.

\(P(X > 54 \mid X > 52) = \frac{P(X > 54)}{P(X > 52)} = \frac{\_\_\_}{\_\_\_} =\) ____
Using the Empirical Rule, determine the 84th percentile of rod lengths.

The 84th percentile is: ____ cm

Explain your reasoning:
A quality control engineer decides that rods must be between 46 cm and 54 cm to be within acceptable tolerance limits; rods with lengths outside this range are considered defective. In a batch of 10,000 rods, how many would be expected to fail the quality check?

Probability of failure: ____

Expected number of defective rods: ____
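
Two ideas from the later parts of this question are a conditional probability written as a ratio of tail areas, and an expected count computed as (number of items) × (probability). The sketch below illustrates both on a hypothetical distribution, \(X \sim N(\mu = 100, \sigma = 10)\), with a batch size invented for illustration, so it does not give away the rod answers. It uses the Empirical Rule percentages first and then compares with exact values from `pnorm()`.

```r
# Hypothetical example (values invented for illustration): X ~ N(100, 10)
mu <- 100
sigma <- 10

# Empirical Rule central percentages
within_1sd <- 0.68
within_3sd <- 0.997

# Tail areas follow from symmetry: half of what lies outside the central band
p_above_110 <- (1 - within_1sd) / 2   # P(X > mu + 1 * sigma)
p_above_130 <- (1 - within_3sd) / 2   # P(X > mu + 3 * sigma)

# Conditional probability: {X > 130} is contained in {X > 110},
# so the intersection in the numerator is just P(X > 130)
p_cond_rule <- p_above_130 / p_above_110

# Expected count = n * p, here for values outside mu +/- 3 * sigma
n_items <- 2000
expected_outside_rule <- n_items * (1 - within_3sd)

# Exact versions using pnorm()
p_cond_exact <- pnorm(130, mean = mu, sd = sigma, lower.tail = FALSE) /
  pnorm(110, mean = mu, sd = sigma, lower.tail = FALSE)
expected_outside_exact <- n_items * 2 * pnorm(mu - 3 * sigma, mean = mu, sd = sigma)

cat(sprintf("P(X > 130 | X > 110): rule = %.6f, exact = %.6f\n",
            p_cond_rule, p_cond_exact))
cat(sprintf("Expected count outside 3 SD: rule = %.2f, exact = %.2f\n",
            expected_outside_rule, expected_outside_exact))
```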

Part 2: The Standard Normal Table

While the Empirical Rule provides useful approximations, we need more precise methods for exact probability calculations. The standard Normal table provides cumulative probabilities for the standard Normal distribution.

Note

The standard Normal table is available on Brightspace under “Extra Documents”. You can also access an online version at: https://www.z-table.com/

Alternatively, after learning the table method, you can verify your answers in R using the pnorm() and qnorm() functions.

The cumulative distribution function (CDF) of the standard Normal is denoted \(\Phi(z)\):

\[\Phi(z) = P(Z \leq z) \text{ where } Z \sim N(0, 1)\]
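
In R, \(\Phi(z)\) is `pnorm(z)` with the default mean 0 and standard deviation 1. The sketch below, for an arbitrarily chosen \(z = 1.25\), shows the complement and symmetry rules and rebuilds one row of a cumulative table. It assumes the common layout in which the row label gives \(z\) to one decimal place and the columns give the second decimal; the table you are given may be laid out differently.

```r
# Phi(z) is pnorm(z) when mean = 0 and sd = 1 (the defaults)
z <- 1.25   # example z-value (assumed for illustration)

phi_z <- pnorm(z)            # area to the left of z
upper_tail <- 1 - pnorm(z)   # complement rule: P(Z > z)
phi_neg_z <- pnorm(-z)       # by symmetry, equals P(Z > z)

# Area between -z and z: subtract the two left-tail areas
between <- pnorm(z) - pnorm(-z)

# Rebuild one row of a cumulative table: row label 1.2,
# columns for the second decimal place (0.00 to 0.09)
row_label <- 1.2
second_decimal <- (0:9) / 100
table_row <- round(pnorm(row_label + second_decimal), 4)
names(table_row) <- sprintf("%.2f", second_decimal)
print(table_row)

cat(sprintf("Phi(%.2f) = %.4f\n", z, phi_z))
cat(sprintf("P(Z > %.2f) = %.4f, Phi(-%.2f) = %.4f\n", z, upper_tail, z, phi_neg_z))
cat(sprintf("P(-%.2f < Z < %.2f) = %.4f\n", z, z, between))
```

Reading the entry in row 1.2 and column 0.05 of the rebuilt row gives the same value as `pnorm(1.25)` rounded to four decimals.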

Question 2: Using the standard Normal table, answer the following questions.

Tip

If you get stuck, draw the Normal curve and shade the region you need to find before using the table!

Compute the following probabilities:
1. \(\Phi(2.34) = P(Z \leq 2.34) =\) ____
2. \(P(Z > -0.12) =\) ____
  
  (Hint: Use symmetry or complement rule)
3. \(P(-1.64 < Z < 1.64) =\) ____
  
  Show your work: \(P(Z < 1.64) - P(Z < -1.64) =\) ____ - ____ = ____
Determine the following percentiles (find \(z_p\)):
1. \(P(Z < z_p) = 0.9192\)
  
  \(z_p =\) ____
2. \(P(Z < z_p) = 0.95\)
  
  \(z_p =\) ____
  
  (Note: You may need to interpolate between table values)
Determine the following upper percentiles:
1. \(P(Z > z_p) = 0.017\)
  
  This is equivalent to \(P(Z < z_p) =\) ____, so \(z_p =\) ____
2. \(P(Z > z_p) = 0.9990\)
  
  \(z_p =\) ____
Determine two values \(a\) and \(b\) symmetric about zero such that \(P(a \leq Z \leq b) = 0.95\).

Since the values are symmetric: \(P(Z < a) =\) ____ and \(P(Z < b) =\) ____

Therefore: \(a =\) ____ and \(b =\) ____

# Verify your standard Normal calculations

# Part (a) - Forward problems
your_2a_i <- ___    # Φ(2.34)
your_2a_ii <- ___   # P(Z > -0.12)
your_2a_iii <- ___  # P(-1.64 < Z < 1.64)

exact_2a_i <- pnorm(2.34)
exact_2a_ii <- 1 - pnorm(-0.12)
exact_2a_iii <- pnorm(1.64) - pnorm(-1.64)

cat("Part (a) Verification:\n")
cat(sprintf("i.   Your: %.4f, R: %.4f\n", your_2a_i, exact_2a_i))
cat(sprintf("ii.  Your: %.4f, R: %.4f\n", your_2a_ii, exact_2a_ii))
cat(sprintf("iii. Your: %.4f, R: %.4f\n", your_2a_iii, exact_2a_iii))

# Part (b) - Backward problems
your_2b_i <- ___    # z_p for P(Z < z_p) = 0.9192
your_2b_ii <- ___   # z_p for P(Z < z_p) = 0.95

exact_2b_i <- qnorm(0.9192)
exact_2b_ii <- qnorm(0.95)

cat("\nPart (b) Verification:\n")
cat(sprintf("i.  Your z_p: %.2f, R: %.4f\n", your_2b_i, exact_2b_i))
cat(sprintf("ii. Your z_p: %.3f, R: %.4f\n", your_2b_ii, exact_2b_ii))
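
The verification code above covers left-tail percentiles only. The sketch below illustrates the other backward techniques from this question on different target probabilities (chosen for illustration): linear interpolation between two neighbouring table entries, an upper-tail percentile, and a symmetric interval. The rounded `pnorm()` values stand in for four-decimal table entries, which is an assumption about the table's precision.

```r
# 1. Linear interpolation between two neighbouring table entries.
#    Target: the z-value with cumulative probability 0.90 (example value).
target <- 0.90
z_lo <- 1.28
z_hi <- 1.29
p_lo <- round(pnorm(z_lo), 4)   # table entry just below the target
p_hi <- round(pnorm(z_hi), 4)   # table entry just above the target
z_interp <- z_lo + (target - p_lo) / (p_hi - p_lo) * (z_hi - z_lo)

# 2. Upper-tail problem: P(Z > z_p) = 0.10 is the same as P(Z < z_p) = 0.90
upper_area <- 0.10
z_upper_a <- qnorm(1 - upper_area)
z_upper_b <- qnorm(upper_area, lower.tail = FALSE)   # same answer

# 3. Symmetric interval holding a central area of 0.90:
#    the leftover 0.10 splits evenly, 0.05 in each tail
central <- 0.90
tail_each <- (1 - central) / 2
a <- qnorm(tail_each)
b <- qnorm(1 - tail_each)

cat(sprintf("Interpolated z: %.4f, qnorm(0.90): %.4f\n", z_interp, qnorm(target)))
cat(sprintf("Upper-tail z (two ways): %.4f and %.4f\n", z_upper_a, z_upper_b))
cat(sprintf("Symmetric interval: a = %.4f, b = %.4f\n", a, b))
```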

Part 3: Z-Score Transformation

Not all Normal distributions are standard. We use the z-score transformation to convert any Normal variable to standard form:

\[z = \frac{x - \mu}{\sigma}\]

Problem-Solving Framework 📋

Forward Problems (given value, find probability):
1. Draw and shade the region
2. Convert to z-score: \(z = \frac{x - \mu}{\sigma}\)
3. Use standard Normal table
4. Write conclusion in context

Backward Problems (given probability, find value):
1. Draw and identify the point
2. Find z-score from table
3. Convert back: \(x = \mu + \sigma \cdot z\)
4. Write conclusion in context
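
The computational steps of the two frameworks above can be sketched as a pair of small R functions. The function names `forward_prob` and `backward_value` and the example distribution \(N(\mu = 100, \sigma = 15)\) are hypothetical, introduced only for illustration; `pnorm()` and `qnorm()` play the role of the table.

```r
# Hypothetical helpers mirroring the two frameworks (names are illustrative)

# Forward: value -> z-score -> left-tail probability
forward_prob <- function(x, mu, sigma) {
  z <- (x - mu) / sigma   # standardize
  pnorm(z)                # Phi(z), the value a table lookup approximates
}

# Backward: left-tail probability -> z-score -> value
backward_value <- function(p, mu, sigma) {
  z <- qnorm(p)           # z-score with area p to its left
  mu + sigma * z          # unstandardize
}

mu <- 100     # example mean (assumed for illustration)
sigma <- 15   # example standard deviation (assumed for illustration)
x <- 120

p_left <- forward_prob(x, mu, sigma)
p_direct <- pnorm(x, mean = mu, sd = sigma)   # same probability, no manual z-score

# A table forces z to two decimals, so a table answer can differ slightly
p_table_style <- pnorm(round((x - mu) / sigma, 2))

# Round trip: the backward process recovers the original value
x_back <- backward_value(p_left, mu, sigma)

cat(sprintf("P(X < %.0f): via z-score = %.4f, direct = %.4f, table-style = %.4f\n",
            x, p_left, p_direct, p_table_style))
cat(sprintf("Value recovered by the backward process: %.2f\n", x_back))
```

Standardizing by hand and calling `pnorm()` with `mean` and `sd` give the same probability; the table-style value differs only because the z-score is rounded before the lookup.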

Question 3: Let \(X\) be a Normal random variable with mean \(\mu = 10\) and variance \(\sigma^2 = 25\).

First, identify the standard deviation: \(\sigma = \sqrt{25} =\) ____

Using the forward process, determine the following probabilities.
1. \(P(X < 19.8)\)
  
  Step 1: Convert to z-score: \(z = \frac{19.8 - 10}{\_\_} =\) ____
  
  Step 2: \(P(X < 19.8) = P(Z < \_\_) =\) ____
2. \(P(X > -10)\)
  
  z-score: ____
  
  \(P(X > -10) \approx\) ____
3. \(P(0.2 < X < 19.8)\)
  
  Lower z-score: \(z_1 =\) ____
  
  Upper z-score: \(z_2 =\) ____
  
  \(P(0.2 < X < 19.8) = P(\_\_ < Z < \_\_) =\) ____
Using the backward process, determine the following percentiles:
1. Find \(x_p\) such that \(P(X < x_p) = 0.7486\)
  
  Step 1: Find z-score: \(z_p =\) ____
  
  Step 2: Convert to \(x\): \(x_p = 10 + 5 \cdot\) ____ = ____
2. Find \(x_p\) such that \(P(X < x_p) = 0.8\)
  
  \(z_p =\) ____ (closest table value)
  
  \(x_p =\) ____
Find \(x_p\) such that \(P(X > x_p) = 0.99\)

This is equivalent to \(P(X < x_p) =\) ____

\(z_p =\) ____

\(x_p =\) ____
Determine the interquartile range (IQR). Find \(x_1\) and \(x_2\) such that \(P(x_1 \leq X \leq x_2) = 0.5\) with 25% in each tail.

First quartile (25th percentile): \(z_{0.25} =\) ____, so \(x_1 =\) ____

Third quartile (75th percentile): \(z_{0.75} =\) ____, so \(x_2 =\) ____

IQR = \(x_2 - x_1 =\) ____
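
Because the Normal curve is symmetric, \(z_{0.25} = -z_{0.75}\), so the interquartile range of any Normal distribution equals \(2 z_{0.75}\,\sigma\), roughly \(1.349\sigma\). The sketch below checks this shortcut on a hypothetical distribution, \(N(\mu = 100, \sigma = 15)\), chosen for illustration so that it does not reveal the answer above.

```r
# Quartile z-scores of the standard Normal
z_q1 <- qnorm(0.25)
z_q3 <- qnorm(0.75)

# Hypothetical distribution (values assumed for illustration)
mu <- 100
sigma <- 15

# Backward process for each quartile
q1 <- mu + sigma * z_q1
q3 <- mu + sigma * z_q3
iqr <- q3 - q1

# Shortcut from symmetry: IQR = 2 * z_0.75 * sigma
iqr_shortcut <- 2 * z_q3 * sigma

cat(sprintf("z_0.25 = %.4f, z_0.75 = %.4f\n", z_q1, z_q3))
cat(sprintf("Q1 = %.2f, Q3 = %.2f, IQR = %.2f\n", q1, q3, iqr))
cat(sprintf("Shortcut IQR = %.2f (IQR / sigma = %.4f)\n",
            iqr_shortcut, iqr / sigma))
```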

Question 4: An insurance company is analyzing policyholders who have all survived to age 60. Historical data suggests that their eventual ages at death can be approximated by a Normal distribution with mean of 85 years and a standard deviation of 4 years.

Let \(X \sim N(\mu = 85, \sigma = 4)\) represent age at death.

A newly turned 60-year-old policyholder asks, “What is the probability I will live past 90?”

z-score for 90: ____

\(P(X > 90) =\) ____
The insurance company wants to define an age cutoff above which only 5% of these policyholders will live. Find the 95th percentile.

\(P(X < x_{0.95}) = 0.95\)

\(z_{0.95} =\) ____ (from table or use 1.645)

\(x_{0.95} =\) ____ years

R Visualization Exercise 🖥️

Creating Normal Distribution Visualizations with R

After completing the problems above, create visualizations to better understand the Normal distribution. Use your AI assistant with these prompts:

Step 1: Basic Normal Curve “Help me create a ggplot2 visualization of a Normal distribution with mean 85 and SD 4. Show the density curve and shade the area above 90.”

Step 2: Empirical Rule Visualization “Create a Normal distribution plot that clearly shows the 68-95-99.7 rule with different colors for each region. Add vertical lines at each standard deviation.”

Step 3: Interactive Probability Calculator “Help me build a simple Shiny app where I can input mean, SD, and a value, then see the probability and shaded region on a Normal curve.”

# Complete verification and visualization
library(ggplot2)

# Question 4 visualization
mu <- 85
sigma <- 4

# Create the distribution
x <- seq(mu - 4*sigma, mu + 4*sigma, length.out = 1000)
y <- dnorm(x, mean = mu, sd = sigma)

df <- data.frame(x = x, y = y)

# Your calculations
your_prob_past_90 <- ___
your_95th_percentile <- ___

# Create plot
# Note: line thickness uses `linewidth`; ggplot2 3.4.0 and later deprecate `size` for lines
p <- ggplot(df, aes(x = x, y = y)) +
  geom_line(linewidth = 1.2, color = "darkblue") +
  geom_area(data = subset(df, x > 90),
            aes(x = x, y = y),
            fill = "red", alpha = 0.3) +
  geom_vline(xintercept = c(mu, 90),
             linetype = c("solid", "dashed"),
             color = c("black", "red")) +
  geom_vline(xintercept = your_95th_percentile,
             linetype = "dotted", color = "green", linewidth = 1) +
  labs(title = "Life Expectancy Distribution for 60+ Policyholders",
       subtitle = sprintf("Your P(X > 90) = %.4f, Your 95th percentile = %.2f years",
                        your_prob_past_90, your_95th_percentile),
       x = "Age at Death", y = "Density") +
  theme_minimal() +
  annotate("text", x = 92, y = max(y)*0.5,
           label = "P(X > 90)", color = "red")

print(p)

# Verify calculations
exact_prob <- 1 - pnorm(90, mean = mu, sd = sigma)
exact_percentile <- qnorm(0.95, mean = mu, sd = sigma)

cat("\nVerification:\n")
cat(sprintf("P(X > 90): Your answer = %.4f, Exact = %.4f\n",
            your_prob_past_90, exact_prob))
cat(sprintf("95th percentile: Your answer = %.2f, Exact = %.2f\n",
            your_95th_percentile, exact_percentile))

Key Takeaways

Summary 📝

The Normal distribution is bell-shaped, symmetric, and determined by mean \(\mu\) and SD \(\sigma\)
The Empirical Rule: approximately 68% within \(\pm 1\sigma\), 95% within \(\pm 2\sigma\), 99.7% within \(\pm 3\sigma\)
The standard Normal has \(\mu = 0, \sigma = 1\) and serves as the reference distribution
Z-score transformation: \(z = \frac{x - \mu}{\sigma}\) standardizes any Normal variable
Forward problems: given value, find probability (standardize, then use table)
Backward problems: given probability, find value (use table to find z, then unstandardize)
The CDF is denoted \(\Phi(z) = P(Z \leq z)\) for standard Normal
R functions pnorm() and qnorm() can verify table calculations