6.4. Normal Distribution
We now encounter the most important continuous distribution in all of statistics: the normal distribution.
Road Map 🧭
Understand the historical development and significance of the normal distribution.
Master the mathematical definition and properties of the normal PDF.
Explore how the parameters μ and σ control location and shape.
Learn the famous empirical rule for quick probability estimates.
Understand why standardization is essential for normal computations.
6.4.1. The Historical Legacy: From Gauss to Modern Statistics
The normal distribution carries a rich mathematical heritage spanning over two centuries. While often called the “Gaussian distribution” in honor of Carl Friedrich Gauss (1777-1855), the distribution’s development involved several brilliant mathematicians who recognized patterns in natural variation.
Gauss and the Method of Least Squares

Fig. 6.6 Carl Friedrich Gauss (1777-1855)
In the late 1700s and early 1800s, Gauss was working on astronomical calculations and geodetic surveys—problems requiring precise measurements where small errors were inevitable. He sought to understand how these measurement errors behaved and how to optimally combine multiple measurements of the same quantity.
Gauss discovered that measurement errors followed a specific pattern: most errors were small and clustered around zero, with larger errors becoming increasingly rare. More importantly, he found that this error distribution had a particular exponential form with quadratic decay that optimized his least squares fitting procedure.
The Connection to Binomial Distributions
Gauss recognized that his continuous error distribution emerged as a limiting case of discrete binomial distributions. When the number of trials becomes very large (with the counts suitably centered and scaled), the jagged, discrete binomial distribution smooths into the graceful bell curve we now call the normal distribution.
This connection between discrete counting processes and continuous measurement errors revealed a profound unity in probability theory—the same mathematical structure appears whether we’re flipping coins or measuring stellar positions.
A Universal Pattern in Nature
What makes the normal distribution truly remarkable is its ubiquity. It describes not just measurement errors, but heights and weights of organisms, intelligence test scores, particle velocities in gases, and countless other natural phenomena. This universality isn’t coincidental—it emerges from a deep mathematical principle we’ll encounter later called the Central Limit Theorem.
6.4.2. The Mathematical Definition: Anatomy of the Bell Curve
Notation and Parameters
If a random variable \(X\) has a normal distribution, we write:
\[X \sim N(\mu, \sigma)\]
A normal random variable takes two parameters:
| | Mean \(\mu\) | Standard Deviation \(\sigma\) |
|---|---|---|
| Possible values | \(\mu \in (-\infty, +\infty)\). It can be any real number. | \(\sigma > 0\). It must be a positive value. |
| Interpretation | The location parameter. It represents the center of the distribution of \(X\). | The scale parameter. It represents how spread out the distribution of \(X\) is. |
| Effect on the appearance of the PDF | Slides the curve left or right, without changing the shape. | Makes the graph tall and narrow (small \(\sigma\)) or wide and flat (large \(\sigma\)). It does not change the location of the center. |
Variance or Standard Deviation?
It is standard to describe a normal distribution using either variance or standard deviation, but we must be explicit about which we’re using.
The constraints and interpretations of standard deviation transfer almost directly to variance. Variance must be a positive number, and it controls how wide the distribution is. The only difference is their scale—variance is in the squared scale, while standard deviation is on the same scale as \(X\).

Fig. 6.7 How different values of μ and σ affect the normal distribution’s appearance
The Normal PDF
The PDF of a normal random variable \(X\) takes the form:
\[f_X(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}, \qquad x \in (-\infty, +\infty).\]
This elegant formula contains several key components:
The normalizing constant \(\frac{1}{\sigma\sqrt{2\pi}}\) ensures the total area under the curve equals 1.
The exponential function \(e^{-(\cdot)}\) creates the smooth, continuous decay.
The quadratic expression \(\left(\frac{x-\mu}{\sigma}\right)^2\) in the exponent produces the symmetric, bell-shaped curve.
The parameters \(\mu\) and \(\sigma\) control the distribution’s location and spread.
Fundamental Properties
Regardless of its parameters, every normal distribution satisfies the following properties:
It is symmetrical about the mean \(\mu\).
It is unimodal with a single peak at \(x = \mu\).
Since the distribution is perfectly symmetric, the mean equals the median: \(\mu = \tilde{\mu}\).
It is bell-shaped with smooth, continuous curves.
The two tails approach but never reach zero as \(x \to \pm\infty\). This implies that \(\text{supp}(X) = (-\infty, +\infty)\).
The points where the normal curve changes from concave down to concave up (its inflection points) occur exactly at \(x = \mu - \sigma\) and \(x = \mu + \sigma\).
Fig. 6.8 The normal curve changes concavity at exactly one standard deviation from the mean
6.4.3. The Empirical Rule: A Practical Tool
One of the most useful properties of normal distributions is that they all follow the same probability pattern, regardless of their specific parameter values. This universal pattern is called the empirical rule or 68-95-99.7 rule.
For any normal distribution \(X \sim N(\mu, \sigma)\):
68% of the probability lies within one standard deviation: \(P(\mu - \sigma < X < \mu + \sigma) \approx 0.68\)
95% of the probability lies within two standard deviations: \(P(\mu - 2\sigma < X < \mu + 2\sigma) \approx 0.95\)
99.7% of the probability lies within three standard deviations: \(P(\mu - 3\sigma < X < \mu + 3\sigma) \approx 0.997\)

Fig. 6.9 The empirical rule provides quick probability estimates for any normal distribution
Extended Breakdown of the Empirical Rule
34% of probability lies in each of \((\mu, \mu + \sigma)\) and \((\mu - \sigma, \mu)\).
Each interval is half of 68%.
13.5% of probability lies in each of \((\mu + \sigma, \mu + 2\sigma)\) and \((\mu - 2\sigma, \mu - \sigma)\).
Each interval is half of 95%, minus an interval from #1.
2.35% of probability lies in each of \((\mu + 2\sigma, \mu + 3\sigma)\) and \((\mu - 3\sigma, \mu - 2\sigma)\).
Each interval is half of 99.7%, minus an interval from #2 and an interval from #1.
0.15% of probability lies beyond \(\mu + 3\sigma\) and another 0.15% beyond \(\mu - 3\sigma\).
Each region is half of 100% − 99.7%.
Insights from the Empirical Rule
About 2/3 of values fall within one standard deviation of the mean.
About 19 out of 20 values fall within two standard deviations.
Nearly all values (99.7%) fall within three standard deviations.
Example💡: Computing Normal Probabilities Using Empirical Rules
A chemical lab reports that the amount of active ingredient in a single tablet of a medication is normally distributed with a mean of 500 mg and a standard deviation of 5 mg.
What percentage of the tablets contain between 490 mg and 505 mg of active ingredient?
\(490 = \mu - 2\sigma \text{ and } 505 = \mu + \sigma\). Therefore, we are looking for
\[P(\mu -2\sigma \leq X \leq \mu + \sigma)\]
There are many different ways to solve this using the empirical rule. One way is to view the probability as
\[P(\mu -2\sigma \leq X \leq \mu + 2\sigma) - P(\mu+\sigma \leq X \leq \mu +2\sigma)\]
The first term is approximately 0.95 by the empirical rule, and the second term is approximately 0.135. Then finally,
\[P(\mu -2\sigma \leq X \leq \mu + \sigma) \approx 0.95 - 0.135 = 0.815\]
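If Python with scipy is available, we can compare the empirical-rule answer against the exact normal CDF. This is a quick sanity check, not part of the hand method:

```python
# Compare the empirical-rule estimate with the exact normal CDF.
from scipy.stats import norm

mu, sigma = 500, 5  # tablet example: mean 500 mg, standard deviation 5 mg
exact = norm.cdf(505, mu, sigma) - norm.cdf(490, mu, sigma)
print(f"Exact P(490 < X < 505) = {exact:.4f}")  # ~0.8186; the rule gave 0.815
```

The small gap between 0.8186 and 0.815 comes from the rounded constants (68, 95, 99.7) in the empirical rule.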
6.4.4. The Standard Normal Distribution: The Foundation of All Normal Computations
While normal distributions can have any mean and standard deviation, there’s one particular normal distribution that serves as the foundation for all normal probability calculations.
Definition of the Standard Normal Distribution
The standard normal distribution is the normal distribution with mean 0 and standard deviation 1. When a random variable follows the standard normal distribution, we denote it with \(Z\) and write:
\[Z \sim N(0, 1)\]
Its PDF is obtained by plugging in 0 and 1 for \(\mu\) and \(\sigma\), respectively, in the general form:
\[f_Z(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{z^2}{2}}\]
Because the standard normal is so important, it also gets special notations for its PDF and CDF:
PDF: \(\phi(z) = f_Z(z)\) (lowercase Greek phi)
CDF: \(\Phi(z) = P(Z \leq z)\) (uppercase Greek phi)
Standardization of Normal Random Variables
Any normal random variable can be converted to a standard normal random variable using the standardization formula:
\[Z = \frac{X - \mu}{\sigma}\]
If \(X \sim N(\mu, \sigma)\), then \(Z \sim N(0, 1)\).
Why Standardization Works
Standardization is essentially a change of variables (u-substitution) that:
Centers the distribution at 0 by subtracting the mean.
Rescales the distribution to unit variance by dividing by the standard deviation.
For a more concrete demonstration, we first need to know a special property of normal distribution:
When a constant is added to a normal random variable, or when it is multiplied by a constant, the resulting random variable is still normal, just with a new set of mean and variance parameters.
Since \(\mu\) and \(\sigma\) are constants, the operation on \(X\) to get to \(Z\) leaves us with another normal random variable. Also,
\(E[Z] = E\left[\frac{X-\mu}{\sigma}\right]= \frac{E[X]-\mu}{\sigma} = \frac{\mu - \mu}{\sigma}=0\).
\(\sigma^2[Z] = \text{Var}(Z) = \text{Var}\left(\frac{X-\mu}{\sigma}\right) = \frac{\text{Var}(X)}{\sigma^2}= \frac{\sigma^2}{\sigma^2} =1\).
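A short simulation makes these two facts concrete (a sketch assuming numpy is installed): standardized draws from a normal distribution have sample mean near 0 and sample standard deviation near 1.

```python
# Simulate X ~ N(112, 10), standardize, and check the mean and sd of Z.
import numpy as np

rng = np.random.default_rng(0)          # seeded for reproducibility
x = rng.normal(112, 10, size=100_000)   # draws from N(mu=112, sigma=10)
z = (x - 112) / 10                      # the standardization Z = (X - mu) / sigma
print(f"mean(Z) = {z.mean():.3f}, sd(Z) = {z.std():.3f}")  # ~0.000 and ~1.000
```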
Why Do We Standardize?
The fundamental problem with normal distributions is that their CDFs cannot be expressed in terms of elementary functions. There is no simple formula for:
\[F_X(x) = \int_{-\infty}^{x} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{t-\mu}{\sigma}\right)^2}\, dt\]
However, we can numerically approximate these integrals for the standard normal distribution and tabulate the results. Instead of creating tables of approximations for all possible pairs of parameters—which would be impossible—we standardize, so that we can refer to one table for any normal random variable.
6.4.5. Forward Problems: \(x\) to Probability
Now that we understand the theoretical foundation, let’s learn how to actually compute probabilities for normal distributions. Since we cannot integrate the normal PDF analytically, we rely on numerical approximations tabulated in standard normal tables.
The Standard Normal Table (Z-Table)
Statisticians have computed high-precision numerical approximations for the standard normal CDF \(\Phi(z) = P(Z \leq z)\) and compiled them into tables. These tables typically provide probabilities accurate to four decimal places for z-values given to two decimal places.
For example, to find \(P(Z \leq -1.38)\), first locate \(-1.3\) among the row labels. Then find the column labeled \(0.08\). The intersection of the row and the column gives the desired probability: \(P(Z \leq -1.38)=0.0838\).
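In software, a CDF routine plays the role of the Z-table. For instance, scipy's norm.cdf (assuming scipy is available) reproduces the tabulated value:

```python
# The Z-table lookup, replicated numerically.
from scipy.stats import norm

print(f"{norm.cdf(-1.38):.4f}")  # 0.0838, matching the table entry
```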


The Strategy for Non-standard Normal RVs
We said we would apply the standardization technique to use the Z-table for any normal distribution. How does this work? The key steps are the following:
Recognize that subtracting the same value on both sides or multiplying by the same positive value on both sides does not change the truth of an (in)equality. It follows that the probability of the (in)equality also remains unchanged.
Using #1,
\[P(X \leq x) = P\left(\frac{X - \mu}{\sigma} \leq \frac{x - \mu}{\sigma}\right) = P\left(Z \leq \frac{x - \mu}{\sigma}\right).\]
The Strategy for Probabilities Which Do Not Match the CDF
We are often interested in probabilities which are not in the form \(\Phi(z) = P(Z \leq z)\).
For “greater than” probabilities, use the complement rule: \(P(Z > z) = 1 - \Phi(z)\).
For probabilities of intervals, use \(P(a < Z < b) = \Phi(b) - \Phi(a)\).
Because the standard normal distribution is symmetric around zero, we have an additional tool: \(\Phi(-z) = 1 - \Phi(z)\) (Fig. 6.10).

Fig. 6.10 Due to symmetry around zero, the two grey regions have equal probability.
Forward Problems
When a problem gives a value and asks for a related probability, we call it a forward problem. The systematic approach is:
Identify what probability you need to calculate in correct probability notation. Sketch the region on a normal PDF plot if needed.
Standardize by converting x-values to z-scores using \(z = \frac{x-\mu}{\sigma}\).
Modify the probability statement to an expression involving \(P(Z \leq z)\) only so the Z-table can be used directly.
Round the z-score to two decimal places and look it up in the table.
Write your conclusion in the context of the original problem.
Example💡: Systolic Blood Pressure
Systolic blood pressure readings for healthy adults, in mmHg, follow a normal distribution with \(\mu=112\) and \(\sigma^2= 100\). Find the probability that a randomly selected adult has blood pressure between 90 and 134 mmHg.

Fig. 6.11 A sketch of \(P(90 < X < 134)\)
Step 1: Write the random variable and its distribution in correct notation
Let \(X\) be the blood pressure readings for healthy adults. \(X \sim N(\mu=112, \sigma^2=100)\).
Step 2: Find the correct probability statement
We are looking for
\[P(90 < X < 134) = P(X < 134) - P(X < 90).\]
We need to find \(z_1\) and \(z_2\) such that \(P(X < 134) = P(Z< z_2)\) and \(P(X< 90)=P(Z< z_1)\).
Step 3: Standardize to find \(z_1\) and \(z_2\)
Note that the variance is given for the spread parameter. We must use \(\sigma = \sqrt{100} = 10\) for standardization.
\[z_1 = \frac{90 - 112}{10} = \frac{-22}{10} = -2.2 \text{ and } z_2 = \frac{134 - 112}{10} = \frac{22}{10} = 2.2\]
Step 4: Convert to standard normal probability
\[P(90 < X < 134) = P(Z< z_2) - P(Z< z_1) = \Phi(2.2) - \Phi(-2.2)\]
Step 5: Use symmetry to simplify
We can look up the CDF values for \(z=2.2\) and \(z=-2.2\) separately in the Z-table, but when the two z values are negatives of each other, we can simplify the search step using \(\Phi(-2.2) = 1 - \Phi(2.2)\).
\[P(-2.2 < Z < 2.2) = \Phi(2.2) - (1 - \Phi(2.2)) = 2\Phi(2.2) - 1\]
Step 6: Look up in the Z-table and calculate the final answer
From the standard normal table: \(\Phi(2.2) = 0.9861\). So finally,
\[P(90 < X < 134) = 2(0.9861) - 1 = 0.9722\]
There is approximately a 0.9722 probability that a randomly selected healthy adult will have systolic blood pressure between 90 and 134 mmHg.
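As a check on the table-based answer, the same probability can be computed directly. A sketch assuming scipy; note that scipy takes the standard deviation, not the variance:

```python
# P(90 < X < 134) for X ~ N(112, sigma = 10), computed without a table.
from scipy.stats import norm

p = norm.cdf(134, loc=112, scale=10) - norm.cdf(90, loc=112, scale=10)
print(f"P(90 < X < 134) = {p:.4f}")  # ~0.9722
```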
6.4.6. Backward Problems: Probability to \(x\) (Percentile)
Backward problems reverse the process: given a probability, we must find the corresponding value (percentile).
Walkthrough of a Backward Problem
Consider a typical backward question:
The gas price on a fixed date in State A follows a normal distribution with mean $3.30 and standard deviation $0.12. If Gas Station B has a price higher than 63% of all gas stations in the state that day, what is the gas price at Gas Station B?
In this problem, a probability is given (63% or 0.63), and we are asked for the cutoff whose lower region has an area of 0.63 (the 63rd percentile).
To solve this type of problem, we begin by setting up the correct probability statement:
\[P(X \leq x_{0.63}) = 0.63\]
Standardize to get a probability statement in terms of \(Z\):
\[P\left(Z \leq \frac{x_{0.63} - 3.30}{0.12}\right) = 0.63\]
The right-hand side of the inequality above fits the definition of the 63rd percentile of a standard normal random variable. That is,
\[z_{0.63} = \frac{x_{0.63} - 3.30}{0.12}.\]
We will now look for \(z_{0.63}\) and convert back to \(x_{0.63}\) using the above relationship.
To find \(z_{0.63}\), we locate 0.63 (or the value closest to it) in the main body of the table, then obtain the \(z\) value from its margins. 0.6293 is the value closest to 0.63 in the main body, and its margins give us \(z_{0.63}=0.33\).
Converting back, \(x_{0.63} = \sigma z_{0.63} +\mu = (0.12)(0.33) + 3.3 = 3.3396\).
Finally, the price at Gas Station B is around $3.34.
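The same percentile can be found with an inverse-CDF routine. A sketch using scipy's norm.ppf (assuming scipy is available):

```python
# 63rd percentile of N(3.30, sigma = 0.12), computed without a table.
from scipy.stats import norm

price = norm.ppf(0.63, loc=3.30, scale=0.12)
print(f"${price:.4f}")  # ~$3.3398, matching the table-based $3.34
```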
Summary of the Key Steps
Identify the value you need to find using correct probability notation. Sketch the region if needed.
Find the z-score by looking up the probability in the body of the standard normal table.
Convert the z-score to the original scale using \(x = \mu + \sigma z\).
Write your conclusion in context.
Points That Require Special Attention
The probability given may correspond to an upper region rather than a lower one. Since percentiles are always based on the area in the lower region, you need to adjust accordingly. For example, if Gas Station C has a price lower than 23% of all other stations in the state, its price corresponds to the (100 − 23)th = 77th percentile.
If the given probability does not have an exact match in the table, take the z-value for the closest entry. If it is exactly in the middle of two values, take the average between the z-values of the two entries.
Example💡: Systolic Blood Pressure, Continued
Continue with the RV of blood pressure measurements: \(X \sim N(112, 100)\).
Find the 95th percentile.
We want to find \(x_{0.95}\) such that \(P(X \leq x_{0.95}) = 0.95\). First, find \(z_{0.95}\) such that \(\Phi(z_{0.95}) = 0.95\).
Searching the body of the standard normal table for 0.95, we find it’s between 0.9495 and 0.9505. Since 0.95 is exactly halfway between these values, we average the corresponding z-values:
\[z_{0.95} = \frac{1.64 + 1.65}{2} = 1.645.\]
Converting to the original scale,
\(x_{0.95} = \mu + \sigma z_{0.95} = 112 + 10(1.645) = 128.45\).
Conclusion: The 95th percentile of systolic blood pressure is 128.45 mmHg. This means 95% of healthy adults have blood pressure at or below this value.
Find the cutoffs for the middle 50% of blood pressure measurements. Using the cutoffs, also compute the interquartile range.
Fig. 6.12 A sketch of problem 2
We need to find two cutoffs: the 25th percentile and the 75th percentile.
For the 25th percentile:
\(\Phi(z_{0.25}) = 0.25\)
From the table (using symmetry): \(z_{0.25} = -0.67\)
\(x_{0.25} = 112 + 10(-0.67) = 105.3\) mmHg
For the 75th percentile:
\(\Phi(z_{0.75}) = 0.75\)
From the table: \(z_{0.75} = 0.67\)
\(x_{0.75} = 112 + 10(0.67) = 118.7\) mmHg
Conclusion: The middle 50% of systolic blood pressure readings fall between 105.3 and 118.7 mmHg. The interquartile range is \(118.7 - 105.3 = 13.4\) mmHg.
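For comparison, here is a sketch computing both quartiles directly with scipy's inverse CDF; the small discrepancy with 13.4 comes from rounding \(z_{0.75}\) to 0.67 in the table:

```python
# Quartiles and IQR of X ~ N(112, sigma = 10).
from scipy.stats import norm

q1, q3 = norm.ppf([0.25, 0.75], loc=112, scale=10)
print(f"Q1 = {q1:.1f}, Q3 = {q3:.1f}, IQR = {q3 - q1:.1f}")  # 105.3, 118.7, 13.5
```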
6.4.7. Proving the Theoretical Properties of Normal Distribution
Validity of the PDF
To establish that the normal PDF is legitimate, we must verify that it satisfies the two fundamental requirements for any probability density function.
Property 1: Non-Negativity
We need to show that \(f_X(x) \geq 0\) for all \(x\).
Since \(\sigma > 0\), we have \(\frac{1}{\sigma\sqrt{2\pi}} > 0\). The exponential function \(e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}\) is always positive because:
The exponent \(-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\) is always negative (or zero).
\(e^{\text{negative number}}\) is always positive.
\(e^0 = 1 > 0\).
Therefore, \(f_X(x) > 0\) for all \(x \in \mathbb{R}\). ✓
Property 2: Integration to Unity
We must prove that \(\int_{-\infty}^{\infty} f_X(x) \, dx = 1\).
Step 1: Change of Variables
Let \(z = \frac{x - \mu}{\sigma}\), so \(x = \sigma z + \mu\) and \(dx = \sigma \, dz\).
\(z = -\infty\) when \(x = -\infty\), and \(z = +\infty\) when \(x = +\infty\). The integral becomes:
\[I := \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{z^2}{2}}\, dz\]
Step 2: The Squaring Trick
This integral has no elementary antiderivative, so we use a clever approach. Let's compute \(I^2\):
\[I^2 = \left(\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{z^2}{2}}\, dz\right)\left(\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{v^2}{2}}\, dv\right)\]
Since the integrals converge absolutely, we can rewrite this as a double integral:
\[I^2 = \frac{1}{2\pi} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} e^{-\frac{z^2 + v^2}{2}}\, dz\, dv\]
Step 3: Polar Coordinate Transformation
Let:
\(z = r\cos\theta\)
\(v = r\sin\theta\)
\(z^2 + v^2 = r^2\)
\(dz \, dv = r \, dr \, d\theta\)
The integration limits become:
\(r\): from 0 to \(\infty\)
\(\theta\): from 0 to \(2\pi\)
Therefore:
\[I^2 = \frac{1}{2\pi} \int_0^{2\pi} d\theta \int_0^{\infty} e^{-\frac{r^2}{2}}\, r\, dr\]
Step 4: Separating the Integrals
The first integral gives us \(2\pi\). For the second integral, use the substitution \(u = \frac{r^2}{2}\), so \(du = r \, dr\):
\[\int_0^{\infty} e^{-u}\, du = \left[-e^{-u}\right]_0^{\infty} = 1\]
Hence \(I^2 = \frac{1}{2\pi} \cdot 2\pi \cdot 1 = 1\).
Step 5: Final Result
Since \(I > 0\) (the integrand is positive), we have \(I = 1\). ✓
This completes the proof that the normal PDF is a valid probability density function.
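The proof can also be checked numerically. A sketch using scipy's quad integrator, with arbitrary parameters chosen for illustration:

```python
# Numerically confirm that the normal PDF integrates to 1.
import numpy as np
from scipy.integrate import quad

mu, sigma = 2.0, 1.5  # arbitrary illustrative parameters

def pdf(x):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

area, _ = quad(pdf, -np.inf, np.inf)
print(f"Total area = {area:.6f}")  # 1.000000
```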
The Parameter Relationships: Expected Value and Variance
To complete our theoretical understanding, we must prove that the parameters \(\mu\) and \(\sigma^2\) are indeed the mean and variance of the distribution.
Theorem: The Expected Value is μ
For \(X \sim N(\mu, \sigma)\), \(E[X] = \mu\).
Proof:
Using the standardization substitution \(z = \frac{x-\mu}{\sigma}\), we have \(x = \sigma z + \mu\) and \(dx = \sigma \, dz\).
\[E[X] = \int_{-\infty}^{\infty} x\, f_X(x)\, dx = \int_{-\infty}^{\infty} (\sigma z + \mu)\, \phi(z)\, dz\]
Distributing the integral,
\[E[X] = \sigma \int_{-\infty}^{\infty} z\, \phi(z)\, dz + \mu \int_{-\infty}^{\infty} \phi(z)\, dz\]
The second integral equals 1 since it's the integral of the standard normal PDF. For the first integral, note that \(z \phi(z)\) is an odd function, and we're integrating over a symmetric interval, so:
\[\int_{-\infty}^{\infty} z\, \phi(z)\, dz = 0\]
Therefore, \(E[X] = \sigma \cdot 0 + \mu \cdot 1 = \mu\). ✓
Theorem: The Variance is \(\sigma^2\)
For \(X \sim N(\mu, \sigma)\), \(\text{Var}(X) = \sigma^2\) .
Proof:
Using the standardization \(Z = \frac{X-\mu}{\sigma}\), we know that \(X = \sigma Z + \mu\). By the properties of variance:
\[\text{Var}(X) = \text{Var}(\sigma Z + \mu) = \sigma^2\, \text{Var}(Z)\]
So we need to show that \(\text{Var}(Z) = 1\) for the standard normal. Since \(E[Z] = 0\),
\[\text{Var}(Z) = E[Z^2] = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} z \cdot z\, e^{-\frac{z^2}{2}}\, dz\]
Using integration by parts with \(u = z\) and \(dv = z e^{-\frac{z^2}{2}} dz\), we have \(du = dz\) and \(v = -e^{-\frac{z^2}{2}}\). Then:
\[\int_{-\infty}^{\infty} z^2 e^{-\frac{z^2}{2}}\, dz = \left[-z e^{-\frac{z^2}{2}}\right]_{-\infty}^{\infty} + \int_{-\infty}^{\infty} e^{-\frac{z^2}{2}}\, dz\]
The boundary term \(\left[-ze^{-\frac{z^2}{2}}\right]_{-\infty}^{\infty} = 0\) since exponential decay dominates linear growth. Therefore,
\[\text{Var}(Z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{z^2}{2}}\, dz = \frac{1}{\sqrt{2\pi}} \cdot \sqrt{2\pi} = 1\]
Thus \(\text{Var}(Z) = 1\) and \(\text{Var}(X) = \sigma^2\). ✓
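As with the validity proof, both results can be verified numerically. A sketch reusing the quad integrator and the same illustrative parameters:

```python
# Numerically confirm E[X] = mu and Var(X) = sigma^2.
import numpy as np
from scipy.integrate import quad

mu, sigma = 2.0, 1.5  # arbitrary illustrative parameters

def pdf(x):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

mean, _ = quad(lambda x: x * pdf(x), -np.inf, np.inf)
var, _ = quad(lambda x: (x - mu) ** 2 * pdf(x), -np.inf, np.inf)
print(f"E[X] = {mean:.4f}, Var(X) = {var:.4f}")  # 2.0000 and 2.2500 = 1.5^2
```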
6.4.8. Assessing Normality in Practice: Why It Matters
In statistical practice, we frequently need to determine whether observed data comes from a normal distribution. This assessment is crucial because many statistical procedures—confidence intervals, t-tests, ANOVA, and regression—assume normality or rely on estimators whose sampling distributions are approximately normal.
While we’ve established the theoretical foundation of the normal distribution, real data is messy. Heights, weights, test scores, and measurement errors may approximately follow normal patterns, but we need systematic methods to evaluate how close our data comes to this idealized mathematical model.
The Challenge of Real-World Assessment
Unlike our theoretical examples with known parameters, real data presents several challenges:
We don’t know the true population parameters μ and σ
Sample sizes are finite, introducing sampling variability
Real phenomena may deviate from perfect normality in subtle ways
We need to distinguish between minor departures that don’t affect our analyses and serious violations that require different approaches
A Multi-Faceted Approach
Assessing normality requires multiple complementary methods because no single approach provides complete information. We combine:
Visual methods that reveal patterns and deviations at a glance
Numerical checks that quantify adherence to normal distribution properties
Formal statistical tests that provide rigorous hypothesis testing frameworks
6.4.9. Visual Assessments for Normality
A. Histograms with Overlaid Curves
The most intuitive approach overlays three elements on a histogram of the data:
The histogram itself, showing the actual distribution of observations
A kernel density estimate (smooth red curve) that traces the data’s shape without assuming any particular distribution
A normal density curve (blue curve) fitted using the sample mean and standard deviation

Fig. 6.13 Comparing actual data distribution (purple histogram) with its smooth estimate (red) and fitted normal curve (blue)
When data follows a normal distribution, these three elements align closely. Deviations reveal specific patterns:
Skewness: The red curve shifts away from the blue curve
Heavy tails: The red curve extends further than the blue curve
Light tails: The red curve falls short of the blue curve’s extent
Multimodality: The red curve shows multiple peaks while the blue curve shows only one
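A sketch of how such a plot can be produced, assuming numpy, scipy, and matplotlib are available, with simulated data standing in for real measurements:

```python
# Histogram with overlaid kernel density estimate and fitted normal curve.
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde, norm

rng = np.random.default_rng(1)
data = rng.normal(112, 10, size=500)  # simulated stand-in for real data

grid = np.linspace(data.min(), data.max(), 200)
plt.hist(data, bins=30, density=True, alpha=0.4, label="histogram")
plt.plot(grid, gaussian_kde(data)(grid), color="red", label="kernel density estimate")
plt.plot(grid, norm.pdf(grid, data.mean(), data.std(ddof=1)), color="blue",
         label="fitted normal")
plt.legend()
plt.show()
```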
B. Normal Probability Plots: A Sophisticated Diagnostic
Normal probability plots (also called QQ-plots for “quantile-quantile plots”) provide a more sensitive method for detecting departures from normality. These plots directly compare the quantiles of our data with the quantiles we would expect if the data truly came from a normal distribution.
Steps of Constructing a QQ-Plot
Order the Data
Arrange the n observations from smallest to largest: \(x_{(1)} \leq x_{(2)} \leq \cdots \leq x_{(n)}\)
Assign Theoretical Probabilities
Each ordered observation \(x_{(i)}\) represents approximately the \(\frac{i-0.5}{n}\) quantile of the data distribution. The adjustment of \(-0.5\) centers each data point within its expected quantile interval, providing more accurate comparisons.
Find Corresponding Normal Quantiles
For each probability \(p_i = \frac{i-0.5}{n}\), find the z-value \(z_i\) such that \(\Phi(z_i) = p_i\). These are the theoretical quantiles we would expect if the data came from a standard normal distribution.
Create the Plot
Plot the ordered data values \(x_{(i)}\) (y-axis) against the theoretical quantiles \(z_i\) (x-axis).
Add a Reference Line
The reference line \(y = \bar{x} + s \cdot z\) shows where points would fall if the data perfectly matched a normal distribution with the sample’s mean and standard deviation.
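The five steps translate directly into code. A sketch assuming numpy, scipy, and matplotlib; scipy.stats.probplot automates essentially the same construction:

```python
# Manual QQ-plot following the construction steps above.
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

rng = np.random.default_rng(2)
data = np.sort(rng.normal(112, 10, size=100))  # Step 1: order the observations
n = len(data)
p = (np.arange(1, n + 1) - 0.5) / n            # Step 2: probabilities (i - 0.5)/n
z = norm.ppf(p)                                # Step 3: theoretical normal quantiles
plt.scatter(z, data, s=12)                     # Step 4: ordered data vs. quantiles
plt.plot(z, data.mean() + data.std(ddof=1) * z, color="red")  # Step 5: reference line
plt.xlabel("Theoretical quantiles")
plt.ylabel("Ordered data")
plt.show()
```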
Interpreting QQ-Plots
The power of QQ-plots lies in how different departures from normality create characteristic patterns.
Perfect Normality: Points fall exactly on the reference line.

Fig. 6.14 Normal probability plot for normal data
Long Tails: Points begin below the line but curve above it for larger values.
Data has more extreme values than a normal distribution would predict
The lower tail extends further left, upper tail extends further right
Common in financial data, measurement errors with occasional large mistakes

Fig. 6.15 Normal probability plot for long-tailed data
Short Tails: Points begin above the line but curve below it for larger values.
Data is more concentrated around the center than normal
Fewer extreme values than expected
Sometimes seen in truncated or bounded measurements

Fig. 6.16 Normal probability plot for short-tailed data
Right (Positive) Skewness: Concave-up curve

Fig. 6.17 Normal probability plot for right-skewed data
Left (Negative) Skewness: Concave-down curve

Fig. 6.18 Normal probability plot for left-skewed data
Bimodality: S-shaped curve with plateaus
Points cluster in the middle region of the plot
Suggests the data might come from a mixture of two populations

Fig. 6.19 Normal probability plot for bimodal data
6.4.10. Numerical Assessments for Normality
While visual methods provide intuitive insights, numerical methods offer precise, quantifiable assessments of normality.
A. The Empirical Rule in Reverse
Instead of using the 68-95-99.7 rule to predict probabilities, we can use it in reverse to check whether our data behaves as a normal distribution should:
For truly normal data,
Approximately 68% of observations should fall within one standard deviation: \(\bar{x} \pm s\).
Approximately 95% should fall within two standard deviations: \(\bar{x} \pm 2s\).
Approximately 99.7% should fall within three standard deviations: \(\bar{x} \pm 3s\).
Implementation Steps
Calculate the sample mean \(\bar{x}\) and sample standard deviation \(s\).
Count observations within each interval.
Compare observed proportions to expected proportions (0.68, 0.95, 0.997).
Large deviations suggest non-normality.
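These steps are straightforward to carry out in code. A sketch on simulated data, with numpy assumed:

```python
# Reverse empirical-rule check: observed vs. expected coverage proportions.
import numpy as np

rng = np.random.default_rng(3)
data = rng.normal(112, 10, size=1_000)  # simulated stand-in sample
xbar, s = data.mean(), data.std(ddof=1)

for k, expected in zip((1, 2, 3), (0.68, 0.95, 0.997)):
    observed = np.mean(np.abs(data - xbar) <= k * s)
    print(f"within {k} sd: observed {observed:.3f}, expected {expected}")
```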
B. The IQR-to-Standard Deviation Ratio
For normal distributions, there’s a consistent relationship between the interquartile range and the standard deviation. This relationship arises from the fixed positions of quantiles in any normal distribution.
For any normal distribution \(N(\mu, \sigma)\):
The first quartile (25th percentile) occurs at \(\mu - 0.674\sigma\).
The third quartile (75th percentile) occurs at \(\mu + 0.674\sigma\).
Therefore: \(IQR = Q_3 - Q_1 = 1.348\sigma\).
The ratio \(\frac{IQR}{\sigma} \approx 1.35\) (often rounded to 1.4).
Implementation Steps
Calculate the sample IQR and sample standard deviation \(s\).
Compute the ratio \(\frac{IQR}{s}\).
Values close to 1.35 suggest normality.
Values substantially different indicate departures from normality.
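A sketch of the same check in code, with numpy assumed:

```python
# IQR-to-standard-deviation ratio as a quick normality check.
import numpy as np

rng = np.random.default_rng(4)
data = rng.normal(112, 10, size=1_000)
q1, q3 = np.percentile(data, [25, 75])
ratio = (q3 - q1) / data.std(ddof=1)
print(f"IQR/s = {ratio:.2f}")  # close to 1.35 for approximately normal data
```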
6.4.11. Formal Statistical Tests for Assessing Normality
While visual and numerical methods provide insights, formal statistical tests such as the Shapiro-Wilk test and the Kolmogorov-Smirnov test offer rigorous frameworks for hypothesis testing about normality. These tests are covered in more advanced statistics courses.
6.4.12. Bringing It All Together
Key Takeaways 📝
The normal distribution emerged from Gauss’s work on measurement errors and has become the most important continuous distribution in statistics.
The PDF \(f_X(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}\) is completely determined by two parameters: location \(\mu\) and scale \(\sigma\).
All normal distributions are symmetric, unimodal, and bell-shaped, with inflection points at \(\mu \pm \sigma\).
The empirical rule (68-95-99.7) provides quick probability estimates and applies to every normal distribution regardless of parameters.
Mathematical rigor: We proved the normal PDF is valid through polar coordinate integration and confirmed that \(E[X] = \mu\) and \(\text{Var}(X) = \sigma^2\).
The standard normal \(N(0,1)\) serves as the foundation for all normal computations through the standardization transformation \(Z = \frac{X-\mu}{\sigma}\).
Assessing normality requires multiple approaches: visual methods (histograms, QQ-plots), numerical checks (empirical rule, IQR ratios), and formal tests (Shapiro-Wilk, Kolmogorov-Smirnov variants).
Exercises
Empirical Rule Applications: A normal distribution has \(\mu = 100\) and \(\sigma = 15\).
Find the intervals containing approximately 68%, 95%, and 99.7% of the probability
What percentage of values fall between 85 and 130?
Values beyond what points would be considered unusual (more than 2 standard deviations from the mean)?
Standardization Practice: For \(X \sim N(25, 16)\), find the standardized values corresponding to:
\(x = 25\)
\(x = 29\)
\(x = 17\)
What do these z-values tell you about the original x-values?
Parameter Estimation: If you know that a normal distribution has its inflection points at \(x = 12\) and \(x = 18\), determine \(\mu\) and \(\sigma\).