6.4. Normal Distribution
We now encounter the most important continuous distribution in all of statistics: the normal distribution.
Road Map 🧭
Understand the historical development and significance of the normal distribution.
Master the mathematical definition and properties of the normal PDF.
Explore how the parameters μ and σ control location and shape.
Learn the famous empirical rule for quick probability estimates.
Understand why standardization is essential for normal computations.
6.4.1. The Historical Legacy: From Gauss to Modern Statistics
The normal distribution carries a rich mathematical heritage spanning over two centuries. While often called the “Gaussian distribution” in honor of Carl Friedrich Gauss (1777-1855), the distribution’s development involved several brilliant mathematicians who recognized patterns in natural variation.
Gauss and the Method of Least Squares

Fig. 6.6 Carl Friedrich Gauss (1777-1855)
In the late 1700s and early 1800s, Gauss was working on astronomical calculations and geodetic surveys—problems requiring precise measurements where small errors were inevitable. He sought to understand how these measurement errors behaved and how to optimally combine multiple measurements of the same quantity.
Gauss discovered that measurement errors followed a specific pattern: most errors were small and clustered around zero, with larger errors becoming increasingly rare. More importantly, he found that this error distribution had a particular exponential form with quadratic decay that optimized his least squares fitting procedure.
The Connection to Binomial Distributions
Gauss recognized that his continuous error distribution emerged as a limiting case of discrete binomial distributions. When the number of trials becomes very large (with the counts suitably centered and scaled), the jagged, discrete binomial distribution smooths into the graceful bell curve we now call the normal distribution.
This connection between discrete counting processes and continuous measurement errors revealed a profound unity in probability theory—the same mathematical structure appears whether we’re flipping coins or measuring stellar positions.
A Universal Pattern in Nature
What makes the normal distribution truly remarkable is its ubiquity. It describes not just measurement errors, but heights and weights of organisms, intelligence test scores, particle velocities in gases, and countless other natural phenomena. This universality isn’t coincidental—it emerges from a deep mathematical principle we’ll encounter later called the Central Limit Theorem.
6.4.2. The Mathematical Definition: Anatomy of the Bell Curve
Notation and Parameters
If a random variable \(X\) has a normal distribution, we write:
\[X \sim N(\mu, \sigma)\]
A normal random variable takes two parameters:
| | Mean \(\mu\) | Standard Deviation \(\sigma\) |
|---|---|---|
| Possible values | \(\mu \in (-\infty, +\infty)\). It can be any real number. | \(\sigma > 0\). It must be a positive value. |
| Interpretation | The location parameter. It represents the center of the distribution of \(X\). | The scale parameter. It represents how spread out the distribution of \(X\) is. |
| Effect on the appearance of the PDF | Slides the curve left or right, without changing the shape. | Makes the graph tall and narrow (small \(\sigma\)) or wide and flat (large \(\sigma\)). It does not change the location of the center. |
Variance or Standard Deviation?
It is standard to describe a normal distribution using either variance or standard deviation, but we must be explicit about which we’re using.
The constraints and interpretations of standard deviation transfer almost directly to variance. Variance must be a positive number, and it controls how wide the distribution is. The only difference is their scale—variance is in the squared scale, while standard deviation is on the same scale as \(X\).

Fig. 6.7 How different values of μ and σ affect the normal distribution’s appearance
The Normal PDF
The PDF of a normal random variable \(X\) takes the form:
\[f_X(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}, \qquad x \in (-\infty, +\infty).\]
This elegant formula contains several key components:
The normalizing constant \(\frac{1}{\sigma\sqrt{2\pi}}\) ensures the total area under the curve equals 1.
The exponential function \(e^{-(\cdot)}\) creates the smooth, continuous decay.
The quadratic expression \(\left(\frac{x-\mu}{\sigma}\right)^2\) in the exponent produces the symmetric, bell-shaped curve.
The parameters \(\mu\) and \(\sigma\) control the distribution’s location and spread.
Fundamental Properties
Regardless of its parameters, every normal distribution satisfies the following properties:
It is symmetrical about the mean \(\mu\).
It is unimodal with a single peak at \(x = \mu\).
Since the distribution is perfectly symmetric, the mean equals the median: \(\mu = \tilde{\mu}\).
It is bell-shaped with smooth, continuous curves.
The two tails approach but never reach zero as \(x \to \pm\infty\). This implies that \(\text{supp}(X) = (-\infty, +\infty)\).
The points where the normal curve changes from concave down to concave up (its inflection points) occur exactly at \(x = \mu - \sigma\) and \(x = \mu + \sigma\).
Fig. 6.8 The normal curve changes concavity at exactly one standard deviation from the mean
6.4.3. The Empirical Rule: A Practical Tool
One of the most useful properties of normal distributions is that they all follow the same probability pattern, regardless of their specific parameter values. This universal pattern is called the empirical rule or 68-95-99.7 rule.
For any normal distribution \(X \sim N(\mu, \sigma)\):
68% of the probability lies within one standard deviation: \(P(\mu - \sigma < X < \mu + \sigma) \approx 0.68\)
95% of the probability lies within two standard deviations: \(P(\mu - 2\sigma < X < \mu + 2\sigma) \approx 0.95\)
99.7% of the probability lies within three standard deviations: \(P(\mu - 3\sigma < X < \mu + 3\sigma) \approx 0.997\)

Fig. 6.9 The empirical rule provides quick probability estimates for any normal distribution
Extended Breakdown of the Empirical Rule
34% of probability lies in each of \((\mu, \mu + \sigma)\) and \((\mu - \sigma, \mu)\).
Each interval is half of 68%.
13.5% of probability lies in each of \((\mu + \sigma, \mu + 2\sigma)\) and \((\mu - 2\sigma, \mu - \sigma)\).
Each interval is half of 95%, minus an interval from #1.
2.35% of probability lies in each of \((\mu + 2\sigma, \mu + 3\sigma)\) and \((\mu - 3\sigma, \mu - 2\sigma)\).
Each interval is half of 99.7%, minus an interval from #2 and an interval from #1.
0.15% of probability lies beyond \(\mu + 3\sigma\) and another 0.15% beyond \(\mu - 3\sigma\).
Each region is half of 100% − 99.7%.
Insights from the Empirical Rule
About 2/3 of values fall within one standard deviation of the mean.
About 19 out of 20 values fall within two standard deviations.
Nearly all values (99.7%) fall within three standard deviations.
Example💡: Computing Normal Probabilities Using Empirical Rules
A chemical lab reports that the amount of active ingredient in a single tablet of a medication is normally distributed with a mean of 500 mg and a standard deviation of 5 mg.
What percentage of the tablets contain between 490 mg and 505 mg of active ingredient?
\(490 = \mu - 2\sigma \text{ and } 505 = \mu + \sigma\). Therefore, we are looking for
\[P(\mu -2\sigma \leq X \leq \mu + \sigma)\]
There are many different ways to solve this using the empirical rule. One way is to view the probability as
\[P(\mu -2\sigma \leq X \leq \mu + 2\sigma) - P(\mu+\sigma \leq X \leq \mu +2\sigma)\]
The first term is approximately 0.95 by the empirical rule, and the second term is approximately 0.135. Then finally,
\[P(\mu -2\sigma \leq X \leq \mu + \sigma) \approx 0.95 - 0.135 = 0.815\]
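If Python with scipy is available, we can compare the empirical-rule answer against the exact normal CDF. This is a quick sanity check, not part of the hand method:

```python
# Compare the empirical-rule estimate with the exact normal CDF.
from scipy.stats import norm

mu, sigma = 500, 5  # tablet example: mean 500 mg, standard deviation 5 mg
exact = norm.cdf(505, mu, sigma) - norm.cdf(490, mu, sigma)
print(f"Exact P(490 < X < 505) = {exact:.4f}")  # ~0.8186; the rule gave 0.815
```

The small gap between 0.8186 and 0.815 comes from the rounded constants (68, 95, 99.7) in the empirical rule.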
6.4.4. The Standard Normal Distribution: The Foundation of All Normal Computations
While normal distributions can have any mean and standard deviation, there’s one particular normal distribution that serves as the foundation for all normal probability calculations.
Definition of the Standard Normal Distribution
The standard normal distribution is the normal distribution with mean 0 and standard deviation 1. When a random variable follows the standard normal distribution, we denote it with \(Z\) and write:
\[Z \sim N(0, 1)\]
Its PDF is obtained by plugging in 0 and 1 for \(\mu\) and \(\sigma\), respectively, in the general form:
\[f_Z(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{z^2}{2}}\]
Because the standard normal is so important, it also gets special notations for its PDF and CDF:
PDF: \(\phi(z) = f_Z(z)\) (lowercase Greek phi)
CDF: \(\Phi(z) = P(Z \leq z)\) (uppercase Greek phi)
Standardization of Normal Random Variables
Any normal random variable can be converted to a standard normal random variable using the standardization formula:
\[Z = \frac{X - \mu}{\sigma}\]
If \(X \sim N(\mu, \sigma)\), then \(Z \sim N(0, 1)\).
Why Standardization Works
Standardization is essentially a change of variables (u-substitution) that:
Centers the distribution at 0 by subtracting the mean.
Rescales the distribution to unit variance by dividing by the standard deviation.
For a more concrete demonstration, we first need to know a special property of normal distribution:
When a constant is added to a normal random variable, or when it is multiplied by a constant, the resulting random variable is still normal, just with a new set of mean and variance parameters.
Since \(\mu\) and \(\sigma\) are constants, the operation on \(X\) to get to \(Z\) leaves us with another normal random variable. Also,
\(E[Z] = E\left[\frac{X-\mu}{\sigma}\right]= \frac{E[X]-\mu}{\sigma} = \frac{\mu - \mu}{\sigma}=0\).
\(\sigma^2[Z] = \text{Var}(Z) = \text{Var}\left(\frac{X-\mu}{\sigma}\right) = \frac{\text{Var}(X)}{\sigma^2}= \frac{\sigma^2}{\sigma^2} =1\).
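A short simulation makes these two facts concrete (a sketch assuming numpy is installed): standardized draws from a normal distribution have sample mean near 0 and sample standard deviation near 1.

```python
# Simulate X ~ N(112, 10), standardize, and check the mean and sd of Z.
import numpy as np

rng = np.random.default_rng(0)          # seeded for reproducibility
x = rng.normal(112, 10, size=100_000)   # draws from N(mu=112, sigma=10)
z = (x - 112) / 10                      # the standardization Z = (X - mu) / sigma
print(f"mean(Z) = {z.mean():.3f}, sd(Z) = {z.std():.3f}")  # ~0.000 and ~1.000
```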
Why Do We Standardize?
The fundamental problem with normal distributions is that their CDFs cannot be expressed in terms of elementary functions. There is no simple formula for:
\[F_X(x) = \int_{-\infty}^{x} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{t-\mu}{\sigma}\right)^2}\, dt\]
However, we can numerically approximate these integrals for the standard normal distribution and tabulate the results. Instead of creating tables of approximations for all possible pairs of parameters—which would be impossible—we standardize, so that we can refer to one table for any normal random variable.
6.4.5. Forward Problems: \(x\) to Probability
Now that we understand the theoretical foundation, let’s learn how to actually compute probabilities for normal distributions. Since we cannot integrate the normal PDF analytically, we rely on numerical approximations tabulated in standard normal tables.
The Standard Normal Table (Z-Table)
Statisticians have computed high-precision numerical approximations for the standard normal CDF \(\Phi(z) = P(Z \leq z)\) and compiled them into tables. These tables typically provide probabilities accurate to four decimal places for z-values given to two decimal places.
For example, to find \(P(Z \leq -1.38)\), first locate \(-1.3\) among the row labels. Then find the column labeled \(0.08\). The intersection of the row and the column gives the desired probability: \(P(Z \leq -1.38)=0.0838\).
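In software, a CDF routine plays the role of the Z-table. For instance, scipy's norm.cdf (assuming scipy is available) reproduces the tabulated value:

```python
# The Z-table lookup, replicated numerically.
from scipy.stats import norm

print(f"{norm.cdf(-1.38):.4f}")  # 0.0838, matching the table entry
```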


The Strategy for Non-standard Normal RVs
We said we would apply the standardization technique to use the Z-table for any normal distribution. How does this work? The key steps are the following:
Recognize that subtracting the same value on both sides or multiplying by the same positive value on both sides does not change the truth of an (in)equality. It follows that the probability of the (in)equality also remains unchanged.
Using #1,
\[P(X \leq x) = P\left(\frac{X - \mu}{\sigma} \leq \frac{x - \mu}{\sigma}\right) = P\left(Z \leq \frac{x - \mu}{\sigma}\right).\]
The Strategy for Probabilities Which Do Not Match the CDF
We are often interested in probabilities which are not in the form \(\Phi(z) = P(Z \leq z)\).
For “greater than” probabilities, use the complement rule: \(P(Z > z) = 1 - \Phi(z)\).
For probabilities of intervals, use \(P(a < Z < b) = \Phi(b) - \Phi(a)\).
Because the standard normal distribution is symmetric around zero, we have an additional tool: \(\Phi(-z) = 1 - \Phi(z)\) (Fig. 6.10).

Fig. 6.10 Due to symmetry around zero, the two grey regions have equal probability.
Forward Problems
When a problem gives a value and asks for a related probability, we call it a forward problem. The systematic approach is:
Identify what probability you need to calculate in correct probability notation. Sketch the region on a normal PDF plot if needed.
Standardize by converting x-values to z-scores using \(z = \frac{x-\mu}{\sigma}\).
Modify the probability statement to an expression involving \(P(Z \leq z)\) only so the Z-table can be used directly.
Round the z-score to two decimal places and look it up in the table.
Write your conclusion in the context of the original problem.
Example💡: Systolic Blood Pressure
Systolic blood pressure readings for healthy adults, in mmHg, follow a normal distribution with \(\mu=112\) and \(\sigma^2= 100\). Find the probability that a randomly selected adult has blood pressure between 90 and 134 mmHg.

Fig. 6.11 A sketch of \(P(90 < X < 134)\)
Step 1: Write the random variable and its distribution in correct notation
Let \(X\) be the blood pressure readings for healthy adults. \(X \sim N(\mu=112, \sigma^2=100)\).
Step 2: Find the correct probability statement
We are looking for
\[P(90 < X < 134) = P(X < 134) - P(X < 90).\]
We need to find \(z_1\) and \(z_2\) such that \(P(X < 134) = P(Z< z_2)\) and \(P(X< 90)=P(Z< z_1)\).
Step 3: Standardize to find \(z_1\) and \(z_2\)
Note that the variance is given for the spread parameter. We must use \(\sigma = \sqrt{100} = 10\) for standardization.
\[z_1 = \frac{90 - 112}{10} = \frac{-22}{10} = -2.2 \text{ and } z_2 = \frac{134 - 112}{10} = \frac{22}{10} = 2.2\]
Step 4: Convert to standard normal probability
\[P(90 < X < 134) = P(Z< z_2) - P(Z< z_1) = \Phi(2.2) - \Phi(-2.2)\]
Step 5: Use symmetry to simplify
We can look up the CDF values for \(z=2.2\) and \(z=-2.2\) separately in the Z-table, but when the two z values are negatives of each other, we can simplify the search step using \(\Phi(-2.2) = 1 - \Phi(2.2)\).
\[P(-2.2 < Z < 2.2) = \Phi(2.2) - (1 - \Phi(2.2)) = 2\Phi(2.2) - 1\]
Step 6: Look up in the Z-table and calculate the final answer
From the standard normal table: \(\Phi(2.2) = 0.9861\). So finally,
\[P(90 < X < 134) = 2(0.9861) - 1 = 0.9722\]
There is approximately a 0.9722 probability that a randomly selected healthy adult will have systolic blood pressure between 90 and 134 mmHg.
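As a check on the table-based answer, the same probability can be computed directly. A sketch assuming scipy; note that scipy takes the standard deviation, not the variance:

```python
# P(90 < X < 134) for X ~ N(112, sigma = 10), computed without a table.
from scipy.stats import norm

p = norm.cdf(134, loc=112, scale=10) - norm.cdf(90, loc=112, scale=10)
print(f"P(90 < X < 134) = {p:.4f}")  # ~0.9722
```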
6.4.6. Backward Problems: Probability to \(x\) (Percentile)
Backward problems reverse the process: given a probability, we must find the corresponding value (percentile).
Walkthrough of a Backward Problem
Consider a typical backward question:
The gas price on a fixed date in State A follows a normal distribution with mean $3.30 and standard deviation $0.12. If Gas Station B has a price higher than 63% of all gas stations in the state that day, what is the gas price at Gas Station B?
In this problem, a probability is given (63% or 0.63), and we are asked for the cutoff whose lower region has an area of 0.63 (the 63rd percentile).
To solve this type of problem, we begin by setting up the correct probability statement:
\[P(X \leq x_{0.63}) = 0.63\]
Standardize to get a probability statement in terms of \(Z\):
\[P\left(Z \leq \frac{x_{0.63} - 3.30}{0.12}\right) = 0.63\]
The right-hand side of the inequality above fits the definition of the 63rd percentile of a standard normal random variable. That is,
\[z_{0.63} = \frac{x_{0.63} - 3.30}{0.12}.\]
We will now look for \(z_{0.63}\) and convert back to \(x_{0.63}\) using the above relationship.
To find \(z_{0.63}\), we locate 0.63 (or the value closest to it) in the main body of the table, then obtain the \(z\) value from its margins. 0.6293 is the value closest to 0.63 in the main body, and its margins give us \(z_{0.63}=0.33\).
Converting back, \(x_{0.63} = \sigma z_{0.63} +\mu = (0.12)(0.33) + 3.3 = 3.3396\).
Finally, the price at Gas Station B is around $3.34.
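The same percentile can be found with an inverse-CDF routine. A sketch using scipy's norm.ppf (assuming scipy is available):

```python
# 63rd percentile of N(3.30, sigma = 0.12), computed without a table.
from scipy.stats import norm

price = norm.ppf(0.63, loc=3.30, scale=0.12)
print(f"${price:.4f}")  # ~$3.3398, matching the table-based $3.34
```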
Summary of the Key Steps
Identify the value you need to find using correct probability notation. Sketch the region if needed.
Find the z-score by looking up the probability in the body of the standard normal table.
Convert the z-score to the original scale using \(x = \mu + \sigma z\).
Write your conclusion in context.
Points That Require Special Attention
The probability given may correspond to an upper region rather than a lower one. Since percentiles are always based on the area in the lower region, you need to adjust accordingly. For example, if Gas Station C has a price lower than 23% of all other stations in the state, its price corresponds to the (100 − 23)th = 77th percentile.
If the given probability does not have an exact match in the table, take the z-value for the closest entry. If it is exactly in the middle of two values, take the average between the z-values of the two entries.
Example💡: Systolic Blood Pressure, Continued
Continue with the RV of blood pressure measurements: \(X \sim N(112, 100)\).
Find the 95th percentile.
We want to find \(x_{0.95}\) such that \(P(X \leq x_{0.95}) = 0.95\). First, find \(z_{0.95}\) such that \(\Phi(z_{0.95}) = 0.95\).
Searching the body of the standard normal table for 0.95, we find it’s between 0.9495 and 0.9505. Since 0.95 is exactly halfway between these values, we average the corresponding z-values:
\[z_{0.95} = \frac{1.64 + 1.65}{2} = 1.645.\]
Converting to the original scale,
\(x_{0.95} = \mu + \sigma z_{0.95} = 112 + 10(1.645) = 128.45\).
Conclusion: The 95th percentile of systolic blood pressure is 128.45 mmHg. This means 95% of healthy adults have blood pressure at or below this value.
Find the cutoffs for the middle 50% of blood pressure measurements. Using the cutoffs, also compute the interquartile range.
Fig. 6.12 A sketch of problem 2
We need to find two cutoffs: the 25th percentile and the 75th percentile.
For the 25th percentile:
\(\Phi(z_{0.25}) = 0.25\)
From the table (using symmetry): \(z_{0.25} = -0.67\)
\(x_{0.25} = 112 + 10(-0.67) = 105.3\) mmHg
For the 75th percentile:
\(\Phi(z_{0.75}) = 0.75\)
From the table: \(z_{0.75} = 0.67\)
\(x_{0.75} = 112 + 10(0.67) = 118.7\) mmHg
Conclusion: The middle 50% of systolic blood pressure readings fall between 105.3 and 118.7 mmHg. The interquartile range is \(118.7 - 105.3 = 13.4\) mmHg.
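For comparison, here is a sketch computing both quartiles directly with scipy's inverse CDF; the small discrepancy with 13.4 comes from rounding \(z_{0.75}\) to 0.67 in the table:

```python
# Quartiles and IQR of X ~ N(112, sigma = 10).
from scipy.stats import norm

q1, q3 = norm.ppf([0.25, 0.75], loc=112, scale=10)
print(f"Q1 = {q1:.1f}, Q3 = {q3:.1f}, IQR = {q3 - q1:.1f}")  # 105.3, 118.7, 13.5
```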
6.4.7. Proving the Theoretical Properties of Normal Distribution
Validity of the PDF
To establish that the normal PDF is legitimate, we must verify that it satisfies the two fundamental requirements for any probability density function.
Property 1: Non-Negativity
We need to show that \(f_X(x) \geq 0\) for all \(x\).
Since \(\sigma > 0\), we have \(\frac{1}{\sigma\sqrt{2\pi}} > 0\). The exponential function \(e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}\) is always positive because:
The exponent \(-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\) is always negative (or zero).
\(e^{\text{negative number}}\) is always positive.
\(e^0 = 1 > 0\).
Therefore, \(f_X(x) > 0\) for all \(x \in \mathbb{R}\). ✓
Property 2: Integration to Unity
We must prove that \(\int_{-\infty}^{\infty} f_X(x) \, dx = 1\).
Step 1: Change of Variables
Let \(z = \frac{x - \mu}{\sigma}\), so \(x = \sigma z + \mu\) and \(dx = \sigma \, dz\).
\(z = -\infty\) when \(x = -\infty\), and \(z = +\infty\) when \(x = +\infty\). The integral becomes:
\[I := \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{z^2}{2}}\, dz\]
Step 2: The Squaring Trick
This integral has no elementary antiderivative, so we use a clever approach. Let's compute \(I^2\):
\[I^2 = \left(\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{z^2}{2}}\, dz\right)\left(\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{v^2}{2}}\, dv\right)\]
Since the integrals converge absolutely, we can rewrite this as a double integral:
\[I^2 = \frac{1}{2\pi} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} e^{-\frac{z^2 + v^2}{2}}\, dz\, dv\]
Step 3: Polar Coordinate Transformation
Let:
\(z = r\cos\theta\)
\(v = r\sin\theta\)
\(z^2 + v^2 = r^2\)
\(dz \, dv = r \, dr \, d\theta\)
The integration limits become:
\(r\): from 0 to \(\infty\)
\(\theta\): from 0 to \(2\pi\)
Therefore:
\[I^2 = \frac{1}{2\pi} \int_0^{2\pi} d\theta \int_0^{\infty} e^{-\frac{r^2}{2}}\, r\, dr\]
Step 4: Separating the Integrals
The first integral gives us \(2\pi\). For the second integral, use the substitution \(u = \frac{r^2}{2}\), so \(du = r \, dr\):
\[\int_0^{\infty} e^{-u}\, du = \left[-e^{-u}\right]_0^{\infty} = 1\]
Hence \(I^2 = \frac{1}{2\pi} \cdot 2\pi \cdot 1 = 1\).
Step 5: Final Result
Since \(I > 0\) (the integrand is positive), we have \(I = 1\). ✓
This completes the proof that the normal PDF is a valid probability density function.
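The proof can also be checked numerically. A sketch using scipy's quad integrator, with arbitrary parameters chosen for illustration:

```python
# Numerically confirm that the normal PDF integrates to 1.
import numpy as np
from scipy.integrate import quad

mu, sigma = 2.0, 1.5  # arbitrary illustrative parameters

def pdf(x):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

area, _ = quad(pdf, -np.inf, np.inf)
print(f"Total area = {area:.6f}")  # 1.000000
```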
The Parameter Relationships: Expected Value and Variance
To complete our theoretical understanding, we must prove that the parameters \(\mu\) and \(\sigma^2\) are indeed the mean and variance of the distribution.
Theorem: The Expected Value is μ
For \(X \sim N(\mu, \sigma)\), \(E[X] = \mu\).
Proof:
Using the standardization substitution \(z = \frac{x-\mu}{\sigma}\), we have \(x = \sigma z + \mu\) and \(dx = \sigma \, dz\).
\[E[X] = \int_{-\infty}^{\infty} x\, f_X(x)\, dx = \int_{-\infty}^{\infty} (\sigma z + \mu)\, \phi(z)\, dz\]
Distributing the integral,
\[E[X] = \sigma \int_{-\infty}^{\infty} z\, \phi(z)\, dz + \mu \int_{-\infty}^{\infty} \phi(z)\, dz\]
The second integral equals 1 since it's the integral of the standard normal PDF. For the first integral, note that \(z \phi(z)\) is an odd function, and we're integrating over a symmetric interval, so:
\[\int_{-\infty}^{\infty} z\, \phi(z)\, dz = 0\]
Therefore, \(E[X] = \sigma \cdot 0 + \mu \cdot 1 = \mu\). ✓
Theorem: The Variance is \(\sigma^2\)
For \(X \sim N(\mu, \sigma)\), \(\text{Var}(X) = \sigma^2\) .
Proof:
Using the standardization \(Z = \frac{X-\mu}{\sigma}\), we know that \(X = \sigma Z + \mu\). By the properties of variance:
\[\text{Var}(X) = \text{Var}(\sigma Z + \mu) = \sigma^2\, \text{Var}(Z)\]
So we need to show that \(\text{Var}(Z) = 1\) for the standard normal. Since \(E[Z] = 0\),
\[\text{Var}(Z) = E[Z^2] = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} z \cdot z\, e^{-\frac{z^2}{2}}\, dz\]
Using integration by parts with \(u = z\) and \(dv = z e^{-\frac{z^2}{2}} dz\), we have \(du = dz\) and \(v = -e^{-\frac{z^2}{2}}\). Then:
\[\int_{-\infty}^{\infty} z^2 e^{-\frac{z^2}{2}}\, dz = \left[-z e^{-\frac{z^2}{2}}\right]_{-\infty}^{\infty} + \int_{-\infty}^{\infty} e^{-\frac{z^2}{2}}\, dz\]
The boundary term \(\left[-ze^{-\frac{z^2}{2}}\right]_{-\infty}^{\infty} = 0\) since exponential decay dominates linear growth. Therefore,
\[\text{Var}(Z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{z^2}{2}}\, dz = \frac{1}{\sqrt{2\pi}} \cdot \sqrt{2\pi} = 1\]
Thus \(\text{Var}(Z) = 1\) and \(\text{Var}(X) = \sigma^2\). ✓
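As with the validity proof, both results can be verified numerically. A sketch reusing the quad integrator and the same illustrative parameters:

```python
# Numerically confirm E[X] = mu and Var(X) = sigma^2.
import numpy as np
from scipy.integrate import quad

mu, sigma = 2.0, 1.5  # arbitrary illustrative parameters

def pdf(x):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

mean, _ = quad(lambda x: x * pdf(x), -np.inf, np.inf)
var, _ = quad(lambda x: (x - mu) ** 2 * pdf(x), -np.inf, np.inf)
print(f"E[X] = {mean:.4f}, Var(X) = {var:.4f}")  # 2.0000 and 2.2500 = 1.5^2
```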
6.4.8. Assessing Normality in Practice: Why It Matters
In statistical practice, we frequently need to determine whether observed data comes from a normal distribution. This assessment is crucial because many statistical procedures—confidence intervals, t-tests, ANOVA, and regression—assume normality or rely on estimators whose sampling distributions are approximately normal.
While we’ve established the theoretical foundation of the normal distribution, real data is messy. Heights, weights, test scores, and measurement errors may approximately follow normal patterns, but we need systematic methods to evaluate how close our data comes to this idealized mathematical model.
The Challenge of Real-World Assessment
Unlike our theoretical examples with known parameters, real data presents several challenges:
We don’t know the true population parameters μ and σ
Sample sizes are finite, introducing sampling variability
Real phenomena may deviate from perfect normality in subtle ways
We need to distinguish between minor departures that don’t affect our analyses and serious violations that require different approaches
A Multi-Faceted Approach
Assessing normality requires multiple complementary methods because no single approach provides complete information. We combine:
Visual methods that reveal patterns and deviations at a glance
Numerical checks that quantify adherence to normal distribution properties
Formal statistical tests that provide rigorous hypothesis testing frameworks
6.4.9. Visual Assessments for Normality
A. Histograms with Overlaid Curves
The most intuitive approach overlays three elements on a histogram of the data:
The histogram itself, showing the actual distribution of observations
A kernel density estimate (smooth red curve) that traces the data’s shape without assuming any particular distribution
A normal density curve (blue curve) fitted using the sample mean and standard deviation

Fig. 6.13 Comparing actual data distribution (purple histogram) with its smooth estimate (red) and fitted normal curve (blue)
When data follows a normal distribution, these three elements align closely. Deviations reveal specific patterns:
Skewness: The red curve shifts away from the blue curve
Heavy tails: The red curve extends further than the blue curve
Light tails: The red curve falls short of the blue curve’s extent
Multimodality: The red curve shows multiple peaks while the blue curve shows only one
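A sketch of how such a plot can be produced, assuming numpy, scipy, and matplotlib are available, with simulated data standing in for real measurements:

```python
# Histogram with overlaid kernel density estimate and fitted normal curve.
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde, norm

rng = np.random.default_rng(1)
data = rng.normal(112, 10, size=500)  # simulated stand-in for real data

grid = np.linspace(data.min(), data.max(), 200)
plt.hist(data, bins=30, density=True, alpha=0.4, label="histogram")
plt.plot(grid, gaussian_kde(data)(grid), color="red", label="kernel density estimate")
plt.plot(grid, norm.pdf(grid, data.mean(), data.std(ddof=1)), color="blue",
         label="fitted normal")
plt.legend()
plt.show()
```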
B. Normal Probability Plots: A Sophisticated Diagnostic
Normal probability plots (also called QQ-plots for “quantile-quantile plots”) provide a more sensitive method for detecting departures from normality. These plots directly compare the quantiles of our data with the quantiles we would expect if the data truly came from a normal distribution.
Steps of Constructing a QQ-Plot
Order the Data
Arrange the n observations from smallest to largest: \(x_{(1)} \leq x_{(2)} \leq \cdots \leq x_{(n)}\)
Assign Theoretical Probabilities
Each ordered observation \(x_{(i)}\) represents approximately the \(\frac{i-0.5}{n}\) quantile of the data distribution. The adjustment of \(-0.5\) centers each data point within its expected quantile interval, providing more accurate comparisons.
Find Corresponding Normal Quantiles
For each probability \(p_i = \frac{i-0.5}{n}\), find the z-value \(z_i\) such that \(\Phi(z_i) = p_i\). These are the theoretical quantiles we would expect if the data came from a standard normal distribution.
Create the Plot
Plot the ordered data values \(x_{(i)}\) (y-axis) against the theoretical quantiles \(z_i\) (x-axis).
Add a Reference Line
The reference line \(y = \bar{x} + s \cdot z\) shows where points would fall if the data perfectly matched a normal distribution with the sample’s mean and standard deviation.
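The five steps translate directly into code. A sketch assuming numpy, scipy, and matplotlib; scipy.stats.probplot automates essentially the same construction:

```python
# Manual QQ-plot following the construction steps above.
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

rng = np.random.default_rng(2)
data = np.sort(rng.normal(112, 10, size=100))  # Step 1: order the observations
n = len(data)
p = (np.arange(1, n + 1) - 0.5) / n            # Step 2: probabilities (i - 0.5)/n
z = norm.ppf(p)                                # Step 3: theoretical normal quantiles
plt.scatter(z, data, s=12)                     # Step 4: ordered data vs. quantiles
plt.plot(z, data.mean() + data.std(ddof=1) * z, color="red")  # Step 5: reference line
plt.xlabel("Theoretical quantiles")
plt.ylabel("Ordered data")
plt.show()
```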
Interpreting QQ-Plots
The power of QQ-plots lies in how different departures from normality create characteristic patterns.
Perfect Normality: Points fall exactly on the reference line.

Fig. 6.14 Normal probability plot for normal data
Long Tails: Points begin below the line but curve above it for larger values.
Data has more extreme values than a normal distribution would predict
The lower tail extends further left, upper tail extends further right
Common in financial data, measurement errors with occasional large mistakes

Fig. 6.15 Normal probability plot for long-tailed data
Short Tails: Points begin above the line but curve below it for larger values.
Data is more concentrated around the center than normal
Fewer extreme values than expected
Sometimes seen in truncated or bounded measurements

Fig. 6.16 Normal probability plot for short-tailed data
Right (Positive) Skewness: Concave-up curve

Fig. 6.17 Normal probability plot for right-skewed data
Left (Negative) Skewness: Concave-down curve

Fig. 6.18 Normal probability plot for left-skewed data
Bimodality: S-shaped curve with plateaus
Points cluster in the middle region of the plot
Suggests the data might come from a mixture of two populations

Fig. 6.19 Normal probability plot for bimodal data
6.4.10. Numerical Assessments for Normality
While visual methods provide intuitive insights, numerical methods offer precise, quantifiable assessments of normality.
A. The Empirical Rule in Reverse
Instead of using the 68-95-99.7 rule to predict probabilities, we can use it in reverse to check whether our data behaves as a normal distribution should:
For truly normal data,
Approximately 68% of observations should fall within one standard deviation: \(\bar{x} \pm s\).
Approximately 95% should fall within two standard deviations: \(\bar{x} \pm 2s\).
Approximately 99.7% should fall within three standard deviations: \(\bar{x} \pm 3s\).
Implementation Steps
Calculate the sample mean \(\bar{x}\) and sample standard deviation \(s\).
Count observations within each interval.
Compare observed proportions to expected proportions (0.68, 0.95, 0.997).
Large deviations suggest non-normality.
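These steps are straightforward to carry out in code. A sketch on simulated data, with numpy assumed:

```python
# Reverse empirical-rule check: observed vs. expected coverage proportions.
import numpy as np

rng = np.random.default_rng(3)
data = rng.normal(112, 10, size=1_000)  # simulated stand-in sample
xbar, s = data.mean(), data.std(ddof=1)

for k, expected in zip((1, 2, 3), (0.68, 0.95, 0.997)):
    observed = np.mean(np.abs(data - xbar) <= k * s)
    print(f"within {k} sd: observed {observed:.3f}, expected {expected}")
```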
B. The IQR-to-Standard Deviation Ratio
For normal distributions, there’s a consistent relationship between the interquartile range and the standard deviation. This relationship arises from the fixed positions of quantiles in any normal distribution.
For any normal distribution \(N(\mu, \sigma)\):
The first quartile (25th percentile) occurs at \(\mu - 0.674\sigma\).
The third quartile (75th percentile) occurs at \(\mu + 0.674\sigma\).
Therefore: \(IQR = Q_3 - Q_1 = 1.348\sigma\).
The ratio \(\frac{IQR}{\sigma} \approx 1.35\) (often rounded to 1.4).
Implementation Steps
Calculate the sample IQR and sample standard deviation \(s\).
Compute the ratio \(\frac{IQR}{s}\).
Values close to 1.35 suggest normality.
Values substantially different indicate departures from normality.
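A sketch of the same check in code, with numpy assumed:

```python
# IQR-to-standard-deviation ratio as a quick normality check.
import numpy as np

rng = np.random.default_rng(4)
data = rng.normal(112, 10, size=1_000)
q1, q3 = np.percentile(data, [25, 75])
ratio = (q3 - q1) / data.std(ddof=1)
print(f"IQR/s = {ratio:.2f}")  # close to 1.35 for approximately normal data
```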
6.4.11. Formal Statistical Tests for Assessing Normality
While visual and numerical methods provide insights, formal statistical tests such as the Shapiro-Wilk test and the Kolmogorov-Smirnov test offer rigorous frameworks for hypothesis testing about normality. These tests are covered in more advanced statistics courses.
6.4.12. Bringing It All Together
Key Takeaways 📝
The normal distribution emerged from Gauss’s work on measurement errors and has become the most important continuous distribution in statistics.
The PDF \(f_X(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}\) is completely determined by two parameters: location \(\mu\) and scale \(\sigma\).
All normal distributions are symmetric, unimodal, and bell-shaped, with inflection points at \(\mu \pm \sigma\).
The empirical rule (68-95-99.7) provides quick probability estimates and applies to every normal distribution regardless of parameters.
Mathematical rigor: We proved the normal PDF is valid through polar coordinate integration and confirmed that \(E[X] = \mu\) and \(\text{Var}(X) = \sigma^2\).
The standard normal \(N(0,1)\) serves as the foundation for all normal computations through the standardization transformation \(Z = \frac{X-\mu}{\sigma}\).
Assessing normality requires multiple approaches: visual methods (histograms, QQ-plots), numerical checks (empirical rule, IQR ratios), and formal tests (Shapiro-Wilk, Kolmogorov-Smirnov variants).
Exercises
Empirical Rule Applications: A normal distribution has \(\mu = 100\) and \(\sigma = 15\).
Find the intervals containing approximately 68%, 95%, and 99.7% of the probability
What percentage of values fall between 85 and 130?
Values beyond what points would be considered unusual (more than 2 standard deviations from the mean)?
Standardization Practice: For \(X \sim N(25, 16)\), find the standardized values corresponding to:
\(x = 25\)
\(x = 29\)
\(x = 17\)
What do these z-values tell you about the original x-values?
Parameter Estimation: If you know that a normal distribution has its inflection points at \(x = 12\) and \(x = 18\), determine \(\mu\) and \(\sigma\).