9.1. Introduction to Statistical Inference

After developing the foundational tools of probability theory, exploring random variables, and understanding sampling distributions, we have finally arrived at the core of statistical practice: statistical inference. This exciting chapter marks our transition from describing uncertainty to making decisions under uncertainty—the essence of statistics as a discipline.

Road Map 🧭

  • Recall the relationship between a population parameter and its estimator. An estimator yields estimates that vary from sample to sample, and this variation is described by the sampling distribution.

  • Select an optimal estimator out of many candidates by evaluating their distributional properties. Know the definitions of bias and variance, and use them appropriately as selection criteria. Be able to compute bias for some simple estimators.

  • Know the definition of Minimum Variance Unbiased Estimator (MVUE).

9.1.1. From Population Parameters to Estimators

In statistical research, we aim to understand certain characteristics of a population that are fixed but unknown to us. These characteristics, called parameters, include:

  • The population mean (\(\mu\)),

  • The population variance (\(\sigma^2\)), and

  • Other quantities describing the population’s distribution.

The fundamental challenge we face is that examining every member of a population is typically impractical or impossible, even though doing so would be required to determine these characteristics with certainty. Instead, we rely on a representative sample to make inferences about the population and specify how uncertain we are about the result.

Fig. 9.1 Relationship between population parameters, estimators, and estimates

Point Estimators and Estimates

An estimator is a rule for using sample data to calculate an estimate of a population parameter; because its value depends on the random sample, an estimator is itself a random variable. When an estimator is designed to yield a single numerical value as an outcome, it is called a point estimator. The single-valued outcome is called a point estimate.

Example💡: \(\bar{X}\) is a Point Estimator of \(\mu\)

One possible point estimator of the population mean \(\mu\) is the sample mean \(\bar{X}\). Its definition contains instructions on the calculation procedure—adding all observed values and dividing by the sample size. A point estimate \(\bar{x}\) is obtained as a single concrete numerical value (e.g., \(\bar{x} = 42.7\)) by applying these instructions to an observed sample.
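As a minimal sketch (in Python with NumPy, using a made-up sample purely for illustration), applying these instructions to observed data produces a single point estimate:

```python
import numpy as np

# A hypothetical observed sample (made-up values, purely for illustration)
sample = np.array([41.2, 44.0, 39.8, 45.1, 43.4])

# Apply the estimator's instructions: add all observed values, divide by n
x_bar = sample.sum() / sample.size   # equivalent to sample.mean()
print(f"Point estimate of mu: x_bar = {x_bar:.2f}")
```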

A Parameter Has Many Point Estimators

There are many different ways to guess a parameter value using data. For example, we can choose to estimate the population mean \(\mu\) with

  • The sample mean \(\bar{X}\) (the typical choice),

  • The sample median \(\tilde{X}\) (for symmetric distributions whose true mean and true median are equal, this is reasonable),

  • The mean of all data points except the \(m\) most extreme values, and so on.

It is easy to generate many reasonable candidates. The key question, then, is: what objective criteria can we use to determine which estimator is better than the others?

Recall from Chapter 7 that each estimator has its own distribution, called the sampling distribution. To answer the question, we focus on two key properties of the sampling distribution: bias and variance. For the remainder of this section, we use \(\theta\) (Greek letter “theta”) to denote a population parameter, and \(\hat{\theta}\) (“theta hat”) for an estimator of \(\theta\).
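Before formalizing these criteria, it can help to simulate the three candidates above. The sketch below assumes a normal population with \(\mu = 10\) and an arbitrary trimming amount \(m = 1\), purely for illustration, and approximates each estimator's sampling distribution by repeated sampling:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, reps = 10.0, 3.0, 25, 10_000   # assumed population and sample size

means, medians, trimmed = [], [], []
for _ in range(reps):
    x = rng.normal(mu, sigma, size=n)          # one simulated sample
    means.append(x.mean())                     # sample mean
    medians.append(np.median(x))               # sample median
    x_sorted = np.sort(x)
    trimmed.append(x_sorted[1:-1].mean())      # drop the m = 1 smallest and largest values

for name, values in [("mean", means), ("median", medians), ("trimmed mean", trimmed)]:
    values = np.asarray(values)
    print(f"{name:>12}: center = {values.mean():.3f}, spread (sd) = {values.std():.3f}")
```

Under these assumptions all three estimators center near \(\mu = 10\), but their spreads differ; bias and variance are exactly the tools for comparing this behavior.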

9.1.2. Evaluating Estimators

Bias: Does the Estimator Target the Right Value?

The bias of an estimator measures whether it systematically overestimates or underestimates the parameter of interest. It is mathematically defined as

\[\text{Bias}(\hat{\theta}) = E[\hat{\theta}] - \theta.\]

An estimator \(\hat{\theta}\) is unbiased if \(\text{Bias}(\hat{\theta})=0\), or equivalently, if its expected value equals the parameter it aims to estimate:

\[\mathbb{E}[\hat{\theta}] = \theta.\]

Example 💡: Is the Sample Mean Unbiased?

We know that \(\mathbb{E}[\bar{X}] = \mu\). So the sample mean \(\bar{X}\) is an unbiased estimator of \(\mu\).
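A quick Monte Carlo check (assuming, just for illustration, an exponential population with mean \(\mu = 2\)) points to the same conclusion empirically: the average of \(\bar{X}\) over many repeated samples lands very close to \(\mu\):

```python
import numpy as np

rng = np.random.default_rng(1)
mu, n, reps = 2.0, 30, 50_000    # assumed population mean, sample size, repetitions

# Approximate the sampling distribution of X-bar by repeated sampling
x_bars = rng.exponential(scale=mu, size=(reps, n)).mean(axis=1)
print(f"Average of x_bar over {reps} samples: {x_bars.mean():.4f}  (mu = {mu})")
```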

Variance: How Precise is the Estimator?

The variance of an estimator quantifies the spread of its sampling distribution—essentially how much the estimator fluctuates from sample to sample. Lower variance indicates greater precision and reliability.

Minimum-Variance Unbiased Estimator (MVUE)

When choosing from a set of unbiased estimators, we typically prefer the one with the smallest variance, as this reduces the expected “distance” between the estimate and the true parameter. An ideal estimator of this kind is called a minimum-variance unbiased estimator (MVUE)—that is, an estimator with the smallest possible variance among all unbiased estimators of a given parameter.

Whether an estimator is MVUE depends on the population distribution, and proving this property generally requires advanced theoretical tools. Nonetheless, we will encounter several examples as we explore the properties of key estimators.

Bias-Variance Tradeoff

An unbiased estimator is not always a better choice than a biased one. If an estimator is slightly biased but has a substantially lower variance than its unbiased competitor, the former may be the more practical choice. In fact, bias and variance often exhibit a trade-off: reducing the bias of an estimator may increase its variance, and vice versa. We should therefore weigh both bias and variance when choosing an estimator. See Fig. 9.2 for a visual illustration.

Fig. 9.2 Comparison of biased and unbiased estimators
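To make the trade-off concrete, here is a toy simulation (an assumed normal population, and a hypothetical “shrunken” estimator \(0.9\bar{X}\)) that compares the two estimators by bias, variance, and mean squared error, a common summary that combines both:

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma, n, reps = 5.0, 10.0, 10, 100_000    # assumed population and sample size

x_bar = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)   # unbiased estimator
shrunk = 0.9 * x_bar                                         # biased, but less variable

for name, est in [("X-bar", x_bar), ("0.9 * X-bar", shrunk)]:
    bias = est.mean() - mu
    variance = est.var()
    mse = np.mean((est - mu) ** 2)            # approximately bias^2 + variance
    print(f"{name:>12}: bias = {bias:+.3f}, variance = {variance:.3f}, MSE = {mse:.3f}")
```

Under these assumptions the shrunken estimator has a clearly nonzero bias yet a smaller mean squared error, which illustrates why bias alone should not decide the choice.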

9.1.3. Important Estimators and Their Properties

To solidify the concepts of bias and variance in estimators, let’s examine several common estimators and their properties. In all cases below, suppose \(X_1, X_2, \cdots, X_n\) form an iid sample from the same population.

A. Sample Mean for Population Mean

The sample mean \(\bar{X} = \frac{1}{n}\sum_{i=1}^{n}X_i\) serves as an estimator for the population mean \(\mu\). We know that \(\mathbb{E}[\bar{X}] = \mu\), which makes it an unbiased estimator of \(\mu\).

When sampling from a normal distribution, the sample mean is also a minimum-variance unbiased estimator (MVUE).

B. Sample Proportion for True Probability

Suppose identical trials are performed \(n\) times, and whether an event \(A\) occurs or not in each trial is recorded using Bernoulli random variables of the following form:

\[\begin{split}I_i(A) = \begin{cases} 1, & \text{with probability } P(A) \\ 0, & \text{with probability } 1 - P(A) \end{cases}\end{split}\]

for \(i=1,2, \cdots, n\).

Define \(\hat{p} = \frac{1}{n}\sum_{i=1}^n I_i(A)\). Then \(\hat{p}\) is an unbiased estimator for \(P(A)\) because

\[\begin{split}E[\hat{p}] &= E\left[\frac{1}{n}\sum_{i=1}^n I_i(A)\right] = \frac{1}{n}\sum_{i=1}^n E[I_i(A)]\\ &=\frac{1}{n}\sum_{i=1}^n (1 \cdot P(A) + 0 \cdot (1-P(A))) \\ &= \frac{n}{n}P(A) = P(A).\end{split}\]
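A short simulation (with a hypothetical event \(A\) for which \(P(A) = 0.3\) is assumed) mirrors this calculation: the average of \(\hat{p}\) across many repeated experiments is close to \(P(A)\):

```python
import numpy as np

rng = np.random.default_rng(3)
p_true, n, reps = 0.3, 50, 20_000     # assumed P(A), trials per experiment, repetitions

# Each row holds the n Bernoulli indicators I_i(A); each row mean is one value of p-hat
indicators = rng.binomial(1, p_true, size=(reps, n))
p_hats = indicators.mean(axis=1)
print(f"Average of p_hat over {reps} experiments: {p_hats.mean():.4f}  (P(A) = {p_true})")
```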

This result can be used to define an unbiased estimator for an entire probability distribution.

(a) Estimating a PMF

Suppose \(X_1, X_2, \cdots, X_n\) form an iid sample of a discrete population \(X\). For each value \(x \in \text{supp}(X)\), define:

\[\hat{p}_X(x) = \frac{1}{n}\sum_{i=1}^{n}I(X_i = x),\]

where \(I(X_i = x)=1\) if the \(i\)-th sample point equals \(x\), and \(0\) otherwise. Then, by the same logic as the general case, \(\hat{p}_X(x)\) is an unbiased estimator of \(p_X(x)\) for each \(x \in \text{supp}(X)\):

\[E[\hat{p}_X(x)] = p_X(x) = P(X = x).\]
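In code, \(\hat{p}_X(x)\) is simply the relative frequency of each support value in the sample. The sketch below assumes a Binomial(4, 0.5) population purely for concreteness:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1_000
x_sample = rng.binomial(4, 0.5, size=n)   # iid sample from an assumed discrete population

support = np.arange(5)                    # supp(X) = {0, 1, 2, 3, 4}
for x in support:
    p_hat_x = np.mean(x_sample == x)      # average of the indicators I(X_i = x)
    print(f"p_hat_X({x}) = {p_hat_x:.3f}")
```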

(b) Estimating a CDF

For continuous random variables, we can estimate the cumulative distribution function (CDF) at any \(x \in \text{supp}(X)\) using:

\[\hat{F}_X(x) = \frac{1}{n}\sum_{i=1}^{n}I(X_i \leq x).\]

\(\hat{F}_X(x)\) represents the proportion of observations less than or equal to \(x\). As another special application of the general case, this is an unbiased estimator of the true CDF \(F_X(x)\). That is,

\[E[\hat{F}_X(x)] = P(X \leq x) = F_X(x).\]
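A minimal sketch of the empirical CDF (assuming a standard normal population for illustration) follows directly from the indicator-average form:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 500
x_sample = rng.normal(0.0, 1.0, size=n)   # iid sample from an assumed continuous population

def F_hat(x, sample=x_sample):
    """Empirical CDF: the proportion of observations less than or equal to x."""
    return np.mean(sample <= x)

for x in (-1.0, 0.0, 1.0):
    print(f"F_hat({x:+.1f}) = {F_hat(x):.3f}")   # compare with Phi(x): about 0.159, 0.500, 0.841
```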

C. Sample Variance

The sample variance:

\[S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2\]

is an unbiased estimator of \(\sigma^2\). To see why, let us compute its expected value.

  1. Add and subtract \(\mu\) inside each squared term: \((X_i - \bar{X})^2 = ((X_i - \mu) - (\bar{X} - \mu))^2\).

  2. Using Step 1,

    \[\begin{split}E[S^2]&=E\left[\frac{1}{n-1}\sum_{i=1}^n ((X_i - \mu) - (\bar{X} - \mu))^2\right]\\ &=\frac{1}{n-1}\sum_{i=1}^nE\left[((X_i - \mu) - (\bar{X} - \mu))^2\right]\\ &=\frac{1}{n-1}\sum_{i=1}^n E\left[(X_i - \mu)^2 -2(X_i - \mu)(\bar{X} - \mu) + (\bar{X} - \mu)^2\right]\\ &=\frac{1}{n-1}\sum_{i=1}^n \left(E[(X_i - \mu)^2] -2E[(X_i - \mu)(\bar{X} - \mu)] + E[(\bar{X} - \mu)^2]\right)\end{split}\]
  3. The first and third expectations are respectively the variances of \(X_i\) and \(\bar{X}\) by definition.

    \[\begin{split}&E[(X_i - \mu)^2] = Var(X_i) = \sigma^2\\ &E[(\bar{X} - \mu)^2] = Var(\bar{X}) = \frac{\sigma^2}{n}\end{split}\]
  4. The expectation \(E[(X_i - \mu)(\bar{X} - \mu)]\) can be simplified to

    \[\begin{split}&E[(X_i - \mu)\cdot \frac{1}{n}\sum_{j}(X_j - \mu)]= \frac{1}{n} \sum_{j=1}^n E[(X_i - \mu)(X_j - \mu)]\\ &= \frac{1}{n}E[(X_i - \mu)(X_i - \mu)] = \frac{1}{n}E[(X_i -\mu)^2] = \frac{\sigma^2}{n}\end{split}\]

    All terms involving indices \(j\neq i\) disappear in the final steps since \(Cov(X_i, X_j) = E[(X_i-\mu)(X_j - \mu)] =0\) due to their independence.

  5. Substituting the results from Steps 3 and 4 back into the final line of Step 2, each summand becomes \(\sigma^2 - 2\frac{\sigma^2}{n} + \frac{\sigma^2}{n} = \frac{n-1}{n}\sigma^2\), so the sum over all \(n\) terms equals \((n-1)\sigma^2\). Therefore,

    \[E[S^2] = \frac{1}{n-1} (n-1)\sigma^2 = \sigma^2.\]

This shows why we divide by \(n-1\) instead of \(n\) when computing the sample variance; this choice is precisely what makes the estimator unbiased.
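The role of the divisor can also be checked numerically. The sketch below (an assumed normal population with \(\sigma^2 = 9\) and a small sample size) averages both the \(n-1\)-divisor and \(n\)-divisor versions over many samples:

```python
import numpy as np

rng = np.random.default_rng(6)
sigma2, n, reps = 9.0, 5, 100_000          # assumed population variance, sample size, repetitions

samples = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
s2_unbiased = samples.var(axis=1, ddof=1)  # divide by n - 1
s2_biased = samples.var(axis=1, ddof=0)    # divide by n

print(f"Average of S^2 (divide by n-1): {s2_unbiased.mean():.3f}  (sigma^2 = {sigma2})")
print(f"Average with divisor n:         {s2_biased.mean():.3f}  (about {(n - 1) / n * sigma2:.1f} expected)")
```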

When sampling from normal populations, the sample variance \(S^2\) is also the MVUE for the population variance \(\sigma^2\).

Additional Exercise 💡: Sample Variance When \(\mu\) is Known

In an unrealistic situation where \(\sigma^2\) is not known but \(\mu\) is, we must incorporate the known information into our variance estimation. Show that

\[\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(X_i - \mu)^2\]

is an unbiased estimator of \(\sigma^2\).

The Biased Case of Sample Standard Deviation

While the sample variance \(S^2\) is an unbiased estimator of \(\sigma^2\), the sample standard deviation \(S = \sqrt{S^2}\) is a biased estimator of the population standard deviation \(\sigma\).

The bias can be shown using a result called Jensen’s inequality together with the fact that the square root is a concave function. The details are beyond the scope of this course, but you are encouraged to read about the topic independently.

We still use \(S\) as our estimator for \(\sigma\) because the formula is straightforward and intuitive, while the bias is typically small, especially for larger sample sizes.
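A simulation makes this small bias visible. The sketch below assumes a normal population with \(\sigma = 3\) and compares two sample sizes:

```python
import numpy as np

rng = np.random.default_rng(7)
sigma, reps = 3.0, 100_000                 # assumed population standard deviation, repetitions

for n in (5, 50):
    s = rng.normal(0.0, sigma, size=(reps, n)).std(axis=1, ddof=1)   # sample standard deviation S
    print(f"n = {n:2d}: average of S = {s.mean():.4f}  (sigma = {sigma})")
```

Under these assumptions the average of \(S\) falls slightly below \(\sigma\), and the gap shrinks as \(n\) grows, consistent with the claim above.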

9.1.4. Bringing It All Together

In this chapter, we’ve transitioned from probability theory to statistical inference by exploring the properties of point estimators.

Key Takeaway 📝

  1. Estimators yield estimates intended to approximate a fixed but unknown population parameter. Estimates vary from sample to sample according to their sampling distribution.

  2. The two most important distributional characteristics of an estimator are bias and variance. Unbiased estimators target the correct parameter on average, while low-variance estimators provide more consistent results across samples.

  3. Minimum Variance Unbiased Estimators (MVUEs) have the smallest variance among all unbiased estimators of a target parameter.

  4. We can show that \(\bar{X}\) and \(S^2\) are unbiased estimators of their respective targets, \(\mu\) and \(\sigma^2\). When the population is normal, they are also the MVUEs.