9.1. Introduction to Statistical Inference

After developing the foundational tools of probability theory, exploring random variables, and understanding sampling distributions, we have finally arrived at the core of statistical practice: statistical inference. This exciting chapter marks our transition from describing uncertainty to making decisions under uncertainty—the essence of statistics as a discipline.

Throughout the earlier chapters, we’ve methodically built a toolkit for this moment:

  • We developed the language of probability to quantify uncertainty

  • We explored both discrete and continuous probability distributions as mathematical models for random phenomena

  • We focused particularly on the normal distribution due to its central importance in statistical theory

  • We established the Central Limit Theorem as a bridge connecting theoretical probability to practical data analysis

  • We studied principles of experimental design and sampling to ensure our data collection produces valid information

Now, these elements converge as we learn to estimate unknown population parameters and—crucially—to quantify the uncertainty in our estimates.

Road Map 🧭

  • Distinguish fixed but unknown population parameters from the estimators and point estimates we compute from samples

  • Evaluate estimators using two key properties: bias and variance

  • Survey common estimators (the sample mean, sample proportion, relative frequencies, and the empirical CDF) and verify their unbiasedness

  • Examine the special case of variance estimation and the reason for dividing by \(n-1\)

  • Introduce minimum variance unbiased estimators (MVUEs) and preview confidence intervals

9.1.1. From Population Parameters to Estimators

The Challenge of Unknown Parameters

In statistical research, we aim to understand characteristics of a population that are fixed but unknown to us. These characteristics, called parameters, might include:

  • The population mean (\(\mu\))

  • The population variance (\(\sigma^2\))

  • The population proportion (\(p\))

  • Other quantities describing the population’s distribution

The fundamental challenge we face is that examining every member of a population is typically impractical or impossible. Instead, we must rely on a representative sample to make inferences about these parameters.

Fig. 9.1: The relationship between population parameters and sample estimates. The population parameter (θ) is fixed but unknown, while the sample estimate (θ̂) varies from sample to sample.

Estimation Procedures

An estimator is a procedure (or formula) that tells us how to use sample data to calculate an estimate of a population parameter. The numerical result we obtain from applying this procedure to a particular sample is called a point estimate.

For instance, when trying to estimate the population mean (\(\mu\)), our estimator is the sample mean (\(\bar{x}\)). This calculation procedure—adding all observed values and dividing by the sample size—is our estimator, while the specific numerical result (e.g., 42.7) is our point estimate.

It’s important to realize that because our sample is just one possible sample from the population, our point estimate will likely differ from the true parameter value. This inherent variability is why understanding the sampling distribution of our estimator becomes crucial.
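To make this distinction concrete, the short sketch below (a minimal Python/NumPy illustration; the variable names and seed are illustrative, not from the text) applies the same estimator—the sample mean—to several samples drawn from one population and prints the resulting point estimates.

```python
# A minimal sketch: one estimator (the sample mean), several samples,
# several different point estimates of the same fixed parameter.
import numpy as np

rng = np.random.default_rng(42)
true_mu = 50.0      # population mean (unknown to us in practice)
true_sigma = 10.0   # population standard deviation

for i in range(3):
    sample = rng.normal(true_mu, true_sigma, size=30)   # one sample of n = 30
    print(f"Sample {i + 1}: point estimate of mu = {sample.mean():.2f}")
# Each sample yields a slightly different estimate, even though the
# underlying parameter never changes.
```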

9.1.2. Evaluating Estimators: Bias and Variance

When selecting an estimator, we prioritize procedures that produce values close to the true parameter on average. Two key properties help us evaluate estimators:

Bias: Does the Estimator Target the Right Value?

The bias of an estimator measures whether it systematically overestimates or underestimates the parameter of interest. Mathematically, we define bias as:

\[\text{Bias}(\hat{\theta}) = \mathbb{E}[\hat{\theta}] - \theta\]

Where:

  • \(\hat{\theta}\) is our estimator

  • \(\mathbb{E}[\hat{\theta}]\) is the expected value (average) of the estimator across all possible samples

  • \(\theta\) is the true population parameter

An estimator is unbiased if its expected value equals the parameter it aims to estimate:

\[\mathbb{E}[\hat{\theta}] = \theta\]

The sample mean (\(\bar{x}\)) provides an excellent example of an unbiased estimator for the population mean (\(\mu\)):

\[\mathbb{E}[\bar{x}] = \mu\]

This unbiasedness holds regardless of the population distribution, provided it has a finite mean—a remarkable property that makes the sample mean so useful in statistics.
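A simple simulation can make this claim tangible. The hedged sketch below (assuming NumPy is available; the skewed exponential population is just one illustrative choice) averages the sample mean over many repeated samples and recovers the population mean.

```python
# A small simulation illustrating E[x-bar] = mu: the long-run average of the
# sample mean matches the population mean, even for a skewed population.
import numpy as np

rng = np.random.default_rng(0)
mu = 2.0   # mean of an Exponential(scale=2) population
sample_means = [rng.exponential(scale=mu, size=25).mean() for _ in range(100_000)]

print(f"Population mean:          {mu:.4f}")
print(f"Average of sample means:  {np.mean(sample_means):.4f}")   # close to mu
```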

Variance: How Precise is the Estimator?

The variance of an estimator quantifies the spread of its sampling distribution—essentially how much the estimator fluctuates from sample to sample. Lower variance indicates greater precision and reliability.

When comparing unbiased estimators, we typically prefer the one with lower variance, as this reduces the expected “distance” between our estimate and the true parameter. The ideal is a minimum-variance unbiased estimator (MVUE), which has the smallest possible variance among all unbiased estimators for that parameter.
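To see why variance matters when both candidates are unbiased, consider the sketch below (an illustrative comparison, not from the text): for a normal population, both the sample mean and the sample median are unbiased for \(\mu\) by symmetry, but the mean fluctuates less from sample to sample.

```python
# Comparing the variance of two unbiased estimators of mu for a normal
# population: the sample mean is the more precise (lower-variance) choice.
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, n, reps = 0.0, 1.0, 25, 50_000
samples = rng.normal(mu, sigma, size=(reps, n))

means = samples.mean(axis=1)
medians = np.median(samples, axis=1)
print(f"Var(sample mean)   ~ {means.var():.4f}")    # about sigma^2 / n = 0.04
print(f"Var(sample median) ~ {medians.var():.4f}")  # larger; roughly pi*sigma^2/(2n) for large n
```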

Fig. 9.2: Visual representation of bias and variance. The left panel shows an unbiased estimator with high variance, the middle panel shows a biased estimator with low variance, and the right panel shows an unbiased estimator with low variance (ideal).

9.1.3. Examples of Estimators

To solidify these concepts, let’s examine several common estimators and their properties:

Sample Mean for Population Mean

The sample mean serves as our estimator for the population mean:

\[\bar{X} = \frac{1}{n}\sum_{i=1}^{n}X_i\]

This estimator is unbiased: \(\mathbb{E}[\bar{X}] = \mu\).

When sampling from a normal distribution, the sample mean is also a minimum-variance unbiased estimator (MVUE), making it optimal for estimating the population mean. Even for non-normal distributions, the sample mean remains an excellent estimator, especially as sample size increases, thanks to the Central Limit Theorem.

Sample Proportion for Population Proportion

For categorical variables where we’re interested in the proportion possessing a certain attribute, we use:

\[\hat{p} = \frac{X}{n}\]

Where \(X\) represents the count of “successes” or observations with the attribute of interest.

This estimator is unbiased: \(\mathbb{E}[\hat{p}] = p\).
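The same kind of simulation check applies here. The brief sketch below (hedged; the values of \(p\) and \(n\) are arbitrary illustrations) averages \(\hat{p}\) over many simulated samples and lands on the true proportion.

```python
# A minimal sketch of p-hat = X / n: averaged over many samples,
# the sample proportion settles at the true proportion p (unbiasedness).
import numpy as np

rng = np.random.default_rng(2)
p, n, reps = 0.3, 40, 100_000
counts = rng.binomial(n, p, size=reps)   # X = number of "successes" in each sample
p_hats = counts / n

print(f"True p:            {p:.4f}")
print(f"Average of p-hat:  {p_hats.mean():.4f}")   # close to p
```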

Estimating Discrete Probabilities

When estimating the probability mass function for discrete random variables, we use relative frequencies. For a specific value \(x\), we construct an estimator using indicator functions:

\[\hat{p}_X(x) = \frac{1}{n}\sum_{i=1}^{n}I(X_i = x)\]

Where \(I(X_i = x)\) equals 1 if the observation equals \(x\), and 0 otherwise.

This essentially counts how many times the value \(x\) appears in our sample, divided by the sample size. We can prove this estimator is unbiased:

\[\begin{split}\mathbb{E}[\hat{p}_X(x)] &= \mathbb{E}\left[\frac{1}{n}\sum_{i=1}^{n}I(X_i = x)\right] \\ &= \frac{1}{n}\sum_{i=1}^{n}\mathbb{E}[I(X_i = x)] \\ &= \frac{1}{n}\sum_{i=1}^{n}p_X(x) \\ &= p_X(x)\end{split}\]
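In code, this estimator is simply a relative frequency. The sketch below (an illustrative example using a fair six-sided die; not from the text) computes \(\hat{p}_X(x)\) for each possible value and compares it with the true probability \(1/6\).

```python
# Estimating a discrete pmf with relative frequencies, i.e. the
# indicator-based estimator p-hat_X(x) = (1/n) * sum of I(X_i = x).
import numpy as np

rng = np.random.default_rng(3)
n = 10_000
rolls = rng.integers(1, 7, size=n)   # fair six-sided die; true pmf is 1/6 for each face

for x in range(1, 7):
    p_hat = np.mean(rolls == x)      # proportion of observations equal to x
    print(f"p_hat({x}) = {p_hat:.3f}   (true value 1/6 ~ 0.167)")
```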

Estimating Continuous Distribution Functions

Similarly, for continuous random variables, we can estimate the cumulative distribution function (CDF) at any point using:

\[\hat{F}_X(x) = \frac{1}{n}\sum_{i=1}^{n}I(X_i \leq x)\]

This counts the proportion of observations less than or equal to \(x\). This estimator is also unbiased:

\[\mathbb{E}[\hat{F}_X(x)] = F_X(x)\]
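The estimator \(\hat{F}_X(x)\) is the empirical CDF. As a quick illustration (a hedged sketch that assumes SciPy is available for the true normal CDF), the code below compares the empirical CDF of a simulated sample with the exact standard normal CDF at a few points.

```python
# The empirical CDF F-hat_X(x) = (1/n) * sum of I(X_i <= x),
# checked against the true standard normal CDF.
import numpy as np
from scipy.stats import norm   # assumes SciPy is available

rng = np.random.default_rng(4)
data = rng.normal(0, 1, size=5_000)

for x in (-1.0, 0.0, 1.0):
    F_hat = np.mean(data <= x)   # proportion of observations <= x
    print(f"F_hat({x:+.1f}) = {F_hat:.3f}   true F({x:+.1f}) = {norm.cdf(x):.3f}")
```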

9.1.4. The Special Case of Variance Estimation

The estimation of population variance presents an interesting case where intuition alone might lead us astray.

Population Variance with Known Mean

If we knew the population mean \(\mu\) (which we typically don’t), we could estimate the population variance as:

\[\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(X_i - \mu)^2\]

This would be an unbiased estimator: \(\mathbb{E}[\hat{\sigma}^2] = \sigma^2\).

Sample Variance with Estimated Mean

In practice, we must estimate the mean using \(\bar{X}\). This changes our variance estimator to:

\[S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2\]

Notice the crucial difference: we divide by \(n-1\) rather than \(n\). This adjustment corrects for the bias introduced by using \(\bar{X}\) instead of \(\mu\).

To understand why this correction is necessary, we can analyze the expected value:

  1. Add and subtract \(\mu\) inside the squared term: \((X_i - \bar{X})^2 = ((X_i - \mu) - (\bar{X} - \mu))^2\)

  2. Expand and sum over \(i\); the cross term simplifies, leaving \(\sum_{i=1}^{n}(X_i - \bar{X})^2 = \sum_{i=1}^{n}(X_i - \mu)^2 - n(\bar{X} - \mu)^2\)

  3. Take expectations: since \(\mathbb{E}[(X_i - \mu)^2] = \sigma^2\) and \(\mathbb{E}[(\bar{X} - \mu)^2] = \sigma^2/n\), we get \(\mathbb{E}\left[\sum_{i=1}^{n}(X_i - \bar{X})^2\right] = n\sigma^2 - \sigma^2 = (n-1)\sigma^2\), so dividing by \(n\) would give us \(\frac{n-1}{n}\sigma^2\), which is biased

  4. Dividing by \(n-1\) instead gives us an unbiased estimator: \(\mathbb{E}[S^2] = \sigma^2\)

This division by \(n-1\) rather than \(n\) accounts for the loss of one degree of freedom when estimating the mean.
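A simulation makes the size of this bias visible. The sketch below (hedged; the particular \(\sigma^2\) and \(n\) are illustrative) computes both versions of the variance estimator over many samples and shows that dividing by \(n\) underestimates \(\sigma^2\) by roughly the factor \(\frac{n-1}{n}\).

```python
# Comparing the divide-by-n and divide-by-(n-1) variance estimators.
import numpy as np

rng = np.random.default_rng(5)
sigma2, n, reps = 4.0, 10, 100_000
samples = rng.normal(0, np.sqrt(sigma2), size=(reps, n))

var_n = samples.var(axis=1, ddof=0)            # divide by n     (biased)
var_n_minus_1 = samples.var(axis=1, ddof=1)    # divide by n - 1 (unbiased)

print(f"True sigma^2:                 {sigma2:.3f}")
print(f"Mean, divide-by-n version:    {var_n.mean():.3f}   (~ (n-1)/n * sigma^2 = {sigma2 * (n - 1) / n:.3f})")
print(f"Mean, divide-by-(n-1):        {var_n_minus_1.mean():.3f}")
```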

The Biased Case of Sample Standard Deviation

While the sample variance \(S^2\) provides an unbiased estimate of \(\sigma^2\), the sample standard deviation \(S = \sqrt{S^2}\) is actually a biased estimator of the population standard deviation \(\sigma\).

This bias stems from Jensen’s inequality and the fact that the square root is a concave function. We can prove this bias exists:

  1. The variance of any non-constant random variable is positive: \(\text{Var}(S) > 0\)

  2. From the definition of variance: \(\text{Var}(S) = \mathbb{E}[S^2] - (\mathbb{E}[S])^2 > 0\)

  3. Rearranging: \(\mathbb{E}[S^2] > (\mathbb{E}[S])^2\)

  4. Since \(S^2\) is unbiased for \(\sigma^2\): \(\mathbb{E}[S^2] = \sigma^2\)

  5. Thus: \(\sigma^2 > (\mathbb{E}[S])^2\)

  6. Taking the square root: \(\sigma > \mathbb{E}[S]\)

This inequality shows that the expected value of \(S\) is less than \(\sigma\), meaning \(S\) systematically underestimates the population standard deviation.
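The underestimation can also be seen numerically. The hedged sketch below (illustrative values; a small \(n\) is chosen because the bias is most visible there) averages \(S\) over many samples and finds a value noticeably below \(\sigma\).

```python
# The bias of the sample standard deviation: E[S] < sigma.
import numpy as np

rng = np.random.default_rng(6)
sigma, n, reps = 3.0, 5, 200_000
samples = rng.normal(0, sigma, size=(reps, n))
S = samples.std(axis=1, ddof=1)   # sample standard deviation (divides by n - 1 inside the variance)

print(f"True sigma:    {sigma:.4f}")
print(f"Average of S:  {S.mean():.4f}")   # noticeably below sigma for small n
```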

Despite this bias, we still use \(S\) as our estimator for \(\sigma\) because:

  • The bias is typically small, especially for larger sample sizes

  • No unbiased estimator exists for \(\sigma\) that works for all distributions

  • The formula is straightforward and intuitive

9.1.5. Minimum Variance Unbiased Estimators (MVUE)

In the world of unbiased estimators, we often seek the most efficient option—the one with the smallest variance.

The Concept of MVUE

A minimum variance unbiased estimator (MVUE) achieves the lowest possible variance among all unbiased estimators for a given parameter. This represents the theoretical ideal for parameter estimation.

Finding MVUEs requires advanced statistical theory and depends on the specifics of the population distribution. However, some important results:

  • For normal populations, the sample mean \(\bar{X}\) is the MVUE for the population mean \(\mu\)

  • For normal populations, the sample variance \(S^2\) is the MVUE for the population variance \(\sigma^2\)

  • For Bernoulli populations, the sample proportion \(\hat{p}\) is the MVUE for the population proportion \(p\)

Why MVUEs Matter

MVUEs matter because they extract maximum information from our data. When we use an MVUE, we’re getting the most precise unbiased estimate possible given our sample size.

Even when we can’t achieve the theoretical MVUE (such as when the population distribution is unknown), understanding this concept helps us evaluate and compare different estimation procedures.

9.1.6. From Point Estimates to Interval Estimates

While estimators with desirable properties provide our best “point guesses” at parameters, they still suffer from a critical limitation: they don’t communicate uncertainty. A single number fails to convey how confident we should be in our estimate.

This limitation motivates our next major topic: confidence intervals. Rather than providing just a point estimate, a confidence interval gives a range of plausible values for the parameter, along with a measure of our confidence in that range.

A confidence interval for the population mean, assuming known population standard deviation \(\sigma\), takes the form:

\[\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\]

Where \(z_{\alpha/2}\) is a critical value from the standard normal distribution that corresponds to our desired confidence level.
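As a quick preview (a minimal sketch, assuming SciPy is available for the critical value; the numbers are purely illustrative and treat the standard deviation as a known \(\sigma\)), the code below computes such an interval.

```python
# A known-sigma confidence interval: x-bar +/- z_{alpha/2} * sigma / sqrt(n).
import numpy as np
from scipy.stats import norm   # assumes SciPy is available

x_bar, sigma, n = 42.7, 5.3, 25           # illustrative values; sigma treated as known
confidence = 0.95
z = norm.ppf(1 - (1 - confidence) / 2)    # z_{alpha/2} ~ 1.96 for 95% confidence

margin = z * sigma / np.sqrt(n)
print(f"95% CI: ({x_bar - margin:.2f}, {x_bar + margin:.2f})")
```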

This concept of confidence intervals forms the bridge to our next section, where we’ll explore how to quantify the precision of our estimates and communicate statistical uncertainty honestly and accurately.

9.1.7. Bringing It All Together

In this chapter, we’ve transitioned from probability theory to statistical inference by exploring the properties of estimators.

Key Takeaway 📝

  1. Parameters vs. Estimates: Population parameters are fixed but unknown, while sample estimates vary from sample to sample.

  2. Bias and Variance: Unbiased estimators target the correct parameter on average, while low-variance estimators provide more consistent results across samples.

  3. Common Estimators: The sample mean, sample proportion, relative frequencies, and empirical CDF are all unbiased estimators of the corresponding population quantities.

  4. Special Case of Variance: The necessity of dividing by n-1 in the sample variance formula, and the inherent bias in the sample standard deviation.

  5. Minimum Variance Unbiased Estimators (MVUEs): The theoretical ideal that minimizes estimation error while maintaining unbiasedness.

In subsequent chapters, we’ll build on these foundations to develop confidence intervals and hypothesis tests—powerful tools that form the backbone of modern statistical inference.

Exercises

  1. Proof Exercise: Show that the sample variance \(S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i-\bar{X})^2\) is an unbiased estimator of the population variance \(\sigma^2\). (Hint: Add and subtract \(\mu\) inside the parentheses and expand the squared term.)

  2. Conceptual Question: Explain why we divide by n-1 rather than n when calculating the sample variance. How does this relate to degrees of freedom?

  3. Applied Problem: A researcher collects a sample of 25 measurements and calculates a sample mean of 42.7 and a sample standard deviation of 5.3. What can the researcher conclude about the precision of the estimated mean? How would this precision change if the sample size were increased to 100?