Worksheet 12: Point Estimators and Unbiased Estimation

Learning Objectives 🎯

Master the concept of point estimators and distinguish between parameters and statistics
Apply the definition of bias to evaluate whether an estimator is unbiased
Calculate expected values and variances of estimators mathematically and through simulation
Analyze the trade-offs between bias and variance in estimator selection
Implement R simulations to verify theoretical properties of estimators and compare their performance

Introduction

In previous lessons, we studied sampling distributions and how they describe the variability of statistics computed from samples. We focused on scenarios involving Normal distributions or approximate Normality via the Central Limit Theorem (CLT). Now, we turn our attention specifically to estimating unknown parameters using sample data.

When analyzing data, we often want to estimate unknown numerical characteristics (parameters) of a population. Examples of parameters include:

Population mean \(\mu\)
Population variance \(\sigma^2\)
Probability of success \(p\)
Rate parameter of a Poisson or Exponential distribution \(\lambda\)

A point estimator is a rule or formula that uses sample data to produce a single “best guess” for the unknown parameter.

Key Definitions 📚

Parameter \(\theta\): A fixed, unknown number describing a population characteristic

Point Estimator \(\hat{\theta}\): A statistic computed from a sample, intended to approximate \(\theta\)

Bias: The difference between the expected value of an estimator and the true parameter value:

\[\text{bias}(\hat{\theta}) = E[\hat{\theta}] - \theta\]

Unbiased Estimator: An estimator \(\hat{\theta}\) is unbiased if \(E[\hat{\theta}] = \theta\)

For example, if we wish to estimate the population mean \(\mu\), a natural estimator is the sample mean \(\overline{X}\). Similarly, to estimate a probability of success \(p\), the sample proportion \(\hat{p}\) can be used.

An unbiased estimator does not consistently underestimate or overestimate the parameter it targets. Instead, the estimation errors “balance out” across many samples.

Part 1: Estimating Parameters of the Exponential Distribution

The exponential distribution is commonly used to model waiting times and lifetimes. Understanding the properties of its estimators is crucial for reliable inference.

Note

Recall that for \(X \sim \text{Exponential}(\lambda)\), we have:

Population mean: \(\mu = E[X] = \frac{1}{\lambda}\)
Population variance: \(\sigma^2 = \text{Var}(X) = \frac{1}{\lambda^2}\)

Question 1: Consider a random variable \(X \sim \text{Exponential}(\lambda)\), where \(\lambda > 0\) is the rate parameter. Suppose we take \(n\) independent random samples from this distribution, i.e., \(X_1, X_2, \ldots, X_n \overset{\text{i.i.d.}}{\sim} \text{Exponential}(\lambda)\).

We wish to estimate the parameter \(\mu\) (the mean) and the parameter \(\lambda\) (the rate) using these samples.

a) Mathematical Proof of Unbiasedness

Show mathematically that the sample mean \(\overline{X}\) is an unbiased estimator for \(\mu = 1/\lambda\). In other words, show that \(\text{bias}(\overline{X}) = 0\).

Hint: Use the linearity of expectation and the fact that \(E[X_i] = \mu\) for each observation.

\[E[\overline{X}] = E\left[\frac{1}{n}\sum_{i=1}^{n} X_i\right] =\]

Conclusion about bias:

\[\text{bias}(\overline{X}) = E[\overline{X}] - \mu =\]

b) Simulation Verification

Show by simulation that the sample mean \(\overline{X}\) is an unbiased estimator for \(\mu = 1/\lambda\) when \(\lambda = 10\).

Generate 1500 random samples of size \(n = 25\) from \(\text{Exponential}(\lambda = 10)\) and compute 1500 sample means. Compute and report the following summary statistics:

The average of the sample means = ____
The standard deviation of the sample means = ____
The proportion of sample means that exceed the true mean \(\mu = 1/10\) = ____
The proportion of sample means that are at most \(\mu = 1/10\) = ____

# Set seed for reproducibility
set.seed(350)

# Parameters
lambda <- 10
true_mu <- 1/lambda
n_samples <- 25
n_simulations <- 1500

# Generate samples and compute sample means
sample_means <- replicate(n_simulations, {
  # Generate one sample of size n_samples
  sample_data <- rexp(n_samples, rate = lambda)
  # Compute and return the sample mean
  mean(sample_data)
})

# i. Average of sample means
avg_sample_means <- mean(sample_means)
cat("i. Average of sample means:", avg_sample_means, "\n")

# ii. Standard deviation of sample means
sd_sample_means <- sd(sample_means)
cat("ii. Standard deviation of sample means:", sd_sample_means, "\n")

# iii. Proportion exceeding true mean
prop_exceed <- mean(sample_means > true_mu)
cat("iii. Proportion exceeding mu:", prop_exceed, "\n")

# iv. Proportion at most true mean
prop_at_most <- mean(sample_means <= true_mu)
cat("iv. Proportion at most mu:", prop_at_most, "\n")

# Visualization
library(ggplot2)
ggplot(data.frame(sample_means), aes(x = sample_means)) +
  geom_histogram(aes(y = after_stat(density)), bins = 40,
                 fill = "skyblue", color = "black", alpha = 0.7) +
  geom_density(color = "blue", lwd = 1.2) +
  stat_function(fun = dnorm,
                args = list(mean = true_mu, sd = true_mu/sqrt(n_samples)),
                color = "red", lwd = 1.2, linetype = "dashed") +
  geom_vline(xintercept = true_mu, color = "darkred",
             linetype = "dashed", lwd = 1.2) +
  geom_vline(xintercept = avg_sample_means, color = "darkblue",
             linetype = "solid", lwd = 1.2) +
  labs(title = "Distribution of Sample Means",
       subtitle = paste("Red dashed = True μ, Blue solid = Avg of sample means\n",
                       "Red curve = Theoretical N(μ, σ²/n), Blue curve = Empirical density"),
       x = "Sample Mean", y = "Density") +
  theme_minimal()

c) Interpretation

Using your summary statistics, do you think that the sample means were close to the expected value \(\mu = 1/10\)? Does the sample mean appear to be unbiased? Provide evidence from your simulation to support your conclusion.

d) 🔍 Estimating the Rate Parameter

Next, suppose instead we want to estimate the rate parameter \(\lambda\). Consider the estimator \(\hat{\lambda} = \frac{1}{\overline{X}}\).

Using your simulation data from part b), compute \(\hat{\lambda} = \frac{1}{\overline{X}}\) for each of your 1500 samples. Based on your results, is \(\hat{\lambda}\) likely unbiased as an estimator of \(\lambda\)?

If you think the estimator is biased, approximate that bias using your simulated data. Repeat the bias approximation for different values of \(n\). Do you observe that the bias remains roughly constant, or does it change with \(n\)?

# Compute lambda_hat for each sample mean
lambda_hat <- 1 / sample_means

# True lambda
true_lambda <- 10

# Average of lambda_hat estimates
avg_lambda_hat <- mean(lambda_hat)
cat("Average of lambda_hat:", avg_lambda_hat, "\n")
cat("True lambda:", true_lambda, "\n")

# Approximate bias
approx_bias <- avg_lambda_hat - true_lambda
cat("Approximate bias:", approx_bias, "\n")

# Visualization
ggplot(data.frame(lambda_hat), aes(x = lambda_hat)) +
  geom_histogram(aes(y = after_stat(density)), bins = 40,
                 fill = "lightcoral", color = "black", alpha = 0.7) +
  geom_density(color = "blue", lwd = 1.2) +
  geom_vline(xintercept = true_lambda, color = "red",
             linetype = "dashed", lwd = 1.2) +
  geom_vline(xintercept = avg_lambda_hat, color = "darkblue",
             linetype = "solid", lwd = 1.2) +
  labs(title = "Distribution of Lambda Hat Estimates",
       subtitle = paste("Red dashed = True λ, Blue solid = Avg of estimates\n",
                       "Blue curve = Empirical density (note the bias)"),
       x = "Lambda Hat", y = "Density") +
  theme_minimal()

Extension: Investigating Sample Size Effects 🖥️

Explore how bias changes with sample size

Repeat the bias calculation for \(n = 5, 10, 25, 50, 100\). Create a plot showing how the bias of \(\hat{\lambda}\) changes as \(n\) increases.

# Sample sizes to investigate
sample_sizes <- c(5, 10, 25, 50, 100)
bias_results <- numeric(length(sample_sizes))

for (i in seq_along(sample_sizes)) {
  n <- sample_sizes[i]

  # Generate samples and compute lambda_hat
  lambda_hats <- replicate(1500, {
    sample_data <- rexp(n, rate = 10)
    1 / mean(sample_data)
  })

  # Compute bias
  bias_results[i] <- mean(lambda_hats) - 10
}

# Plot bias vs sample size
ggplot(data.frame(n = sample_sizes, bias = bias_results),
       aes(x = n, y = bias)) +
  geom_line(lwd = 1.2, color = "darkblue") +
  geom_point(lwd = 3, color = "darkred") +
  geom_hline(yintercept = 0, linetype = "dashed", color = "gray", lwd = 1) +
  labs(title = "Bias of Lambda Hat vs Sample Size",
       subtitle = "Gray dashed line = zero bias (unbiased estimator)",
       x = "Sample Size (n)", y = "Bias") +
  theme_minimal()

Your conclusions about :math:`hat{lambda}` as an estimator:

Part 2: Estimating the Maximum of a Uniform Distribution

When dealing with uniform distributions, estimating the upper bound presents unique challenges. This problem illustrates how intuitive estimators can be biased and how we can correct them.

Question 2: Suppose we have \(n\) i.i.d. random variables from a Uniform distribution:

\[X_1, X_2, \ldots, X_n \overset{\text{i.i.d.}}{\sim} \text{Uniform}(0, \theta),\]

where \(\theta > 0\) is an unknown parameter representing the largest value that the random variables can possibly take on. Suppose we are able to obtain samples from this population and wish to estimate \(\theta\).

A natural estimator is the sample maximum:

\[\mathcal{M} = \max\{X_1, X_2, \ldots, X_n\}.\]

In this exercise, you will show that this estimator is biased but can be corrected to produce an unbiased estimator.

a) Deriving the Distribution of the Sample Maximum

We will work through deriving the distribution of \(\mathcal{M}\) to determine whether it is unbiased.

Step 1: Cumulative Distribution Function

Determine the cumulative distribution function for \(\mathcal{M}\).

Note

For a Uniform(0, θ) random variable X, the CDF is:

\[\begin{split}F_X(x) = \begin{cases} 0 & \text{if } x < 0 \\ \frac{x}{\theta} & \text{if } 0 \leq x \leq \theta \\ 1 & \text{if } x > \theta \end{cases}\end{split}\]

The event \(\{\mathcal{M} \leq x\}\) means that the maximum of the sample is at most \(x\). In other words, if the largest observation is at most \(x\), then all observations must be at most \(x\):

\[P(\mathcal{M} \leq x) = P(\{X_1 \leq x\} \cap \{X_2 \leq x\} \cap \ldots \cap \{X_n \leq x\})\]

Using the independence of the random variables and the known CDF of the \(\text{Uniform}(0, \theta)\) distribution, write out the full piecewise function for the CDF of \(\mathcal{M}\):

\[\begin{split}F_{\mathcal{M}}(x) = P(\mathcal{M} \leq x) = \begin{cases} \text{____} & \text{if } x < 0 \\ \text{____} & \text{if } 0 \leq x \leq \theta \\ \text{____} & \text{if } x > \theta \end{cases}\end{split}\]

Step 2: Probability Density Function

After you have obtained the CDF for \(\mathcal{M}\), determine the corresponding probability density function \(f_{\mathcal{M}}(x)\).

The PDF can be obtained by taking the derivative of the CDF with respect to \(x\):

\[f_{\mathcal{M}}(x) = \frac{d}{dx}F_{\mathcal{M}}(x)\]

For the region \((0, \theta)\), compute the derivative:

\[\begin{split}f_{\mathcal{M}}(x) = \begin{cases} \text{____} & \text{if } 0 < x < \theta \\ \text{____} & \text{otherwise} \end{cases}\end{split}\]

Warning

Common mistake: Don’t forget to consider the full piecewise structure of the PDF. The function should be defined for all values of x.

Step 3: Expected Value

Using the probability density function, calculate the expected value of the estimator \(\mathcal{M}\):

\[E[\mathcal{M}] = \int_{-\infty}^{\infty} x \cdot f_{\mathcal{M}}(x) \, dx =\]

Step 4: Bias Calculation

Using your result from Step 3, determine the bias of your estimator \(\mathcal{M}\):

\[\text{bias}(\mathcal{M}) = E[\mathcal{M}] - \theta =\]

Is :math:`mathcal{M}` an unbiased estimator?

b) Constructing an Unbiased Estimator

Since the estimator \(\mathcal{M}\) is biased, propose a simple modification to \(\mathcal{M}\) that results in an unbiased estimator, and denote this estimator as \(\hat{\theta}\).

Hint: You want to find a constant \(c\) such that \(E[c \cdot \mathcal{M}] = \theta\).

\[\hat{\theta} = \text{____}\]

Verify that your proposed estimator is unbiased:

\[E[\hat{\theta}] =\]

# Simulation to verify the unbiased estimator
set.seed(350)

# True parameter
true_theta <- 20
n_samples <- 10
n_simulations <- 2000

# Generate samples and compute both M and theta_hat
M_values <- replicate(n_simulations, {
  sample_data <- runif(n_samples, min = 0, max = true_theta)
  max(sample_data)
})

# YOUR TURN: Define theta_hat based on your formula from part b
theta_hat <- ___  # Fill in your correction formula

# Compare results
cat("True theta:", true_theta, "\n")
cat("Average of M:", mean(M_values), "\n")
cat("Average of theta_hat:", mean(theta_hat), "\n")
cat("Bias of M:", mean(M_values) - true_theta, "\n")
cat("Bias of theta_hat:", mean(theta_hat) - true_theta, "\n")

# Check proportion exceeding theta
cat("\nProportion of theta_hat exceeding theta:",
    mean(theta_hat > true_theta), "\n")

# Visualization
# Create stacked data frame manually without tidyr
results_df <- data.frame(
  Value = c(M_values, theta_hat),
  Estimator = rep(c("M", "theta_hat"), each = length(M_values))
)

ggplot(results_df, aes(x = Value, fill = Estimator)) +
  geom_histogram(aes(y = after_stat(density)), alpha = 0.6,
                 position = "identity", bins = 40) +
  geom_density(aes(color = Estimator), lwd = 1.2, fill = NA) +
  geom_vline(xintercept = true_theta, color = "red",
             linetype = "dashed", lwd = 1.2) +
  labs(title = "Comparison of M and Corrected Estimator",
       subtitle = "Red line = True θ, Density curves show empirical distributions",
       x = "Estimated Value", y = "Density") +
  scale_fill_manual(values = c("M" = "lightblue", "theta_hat" = "lightgreen")) +
  scale_color_manual(values = c("M" = "darkblue", "theta_hat" = "darkgreen")) +
  theme_minimal()

Important: Unbiased ≠ Symmetric! 🎯

You may notice that more than 50% of your θ̂ values exceed the true θ. This is not evidence of bias!

Unbiased means: E[θ̂] = θ (the average equals θ) ✓
Symmetric would mean: P(θ̂ > θ) = 50%

These are different properties! The distribution of θ̂ is left-skewed because:

It’s derived from M, which has a hard boundary at θ
Multiplying by (n+1)/n shifts the mean to θ but doesn’t change the shape
In left-skewed distributions: mode > median > mean
Therefore: P(θ̂ > θ) ≈ 0.60-0.65 (varies with n)

Check your results: - Is mean(theta_hat) ≈ θ? → Then it’s unbiased! ✓ - Is proportion exceeding θ around 60-65%? → That’s the skewness! ✓

Part 3: Minimum Variance Unbiased Estimators (MVUE)

While unbiasedness is a desirable property, there could be multiple unbiased estimators for the same parameter. Among all unbiased estimators, we typically prefer the one with the smallest variability (variance).

Definition: MVUE 🎯

An estimator \(\hat{\theta}\) is a Minimum Variance Unbiased Estimator (MVUE) if:

It is unbiased: \(E[\hat{\theta}] = \theta\)
It has the smallest variance among all possible unbiased estimators for \(\theta\)

The MVUE is the “best” unbiased estimator in the sense that it provides estimates consistently closest to the true parameter, on average.

Question 3: Suppose you collect data from a population that follows a Normal distribution with mean \(\mu\) and variance \(\sigma^2\). Specifically, consider the population distribution:

\[X \sim N(\mu = 50, \sigma^2 = 25)\]

Two natural and commonly used estimators for the population mean \(\mu\) are:

Estimator A: The sample mean \(\overline{X}\)
Estimator B: The sample median \(\tilde{X}\)

Both estimators intuitively seem plausible, and indeed both are unbiased for the mean when the population is Normal.

a) Simulation Comparison

Generate 2000 independent samples, each of size \(n = 15\), from the distribution \(N(\mu = 50, \sigma^2 = 25)\). Compute the sample mean and median for each of the 2000 independent samples.

Compute and report:

Average of the 2000 sample means = ____
Standard deviation of the 2000 sample means = ____
Average of the 2000 sample medians = ____
Standard deviation of the 2000 sample medians = ____

Were they both close to the true mean :math:`mu = 50` on average? Which estimator had less variability?

# Set seed for reproducibility
set.seed(350)

# Population parameters
true_mu <- 50
true_sigma <- sqrt(25)
sample_size <- 15
n_simulations <- 2000

# Initialize vectors to store estimates
sample_means <- numeric(n_simulations)
sample_medians <- numeric(n_simulations)

# Generate samples and compute both estimators
for (i in 1:n_simulations) {
  sample_data <- rnorm(sample_size, mean = true_mu, sd = true_sigma)
  sample_means[i] <- mean(sample_data)
  sample_medians[i] <- median(sample_data)
}

# Compute summary statistics
cat("=== Sample Mean ===\n")
cat("i. Average of sample means:", mean(sample_means), "\n")
cat("ii. SD of sample means:", sd(sample_means), "\n")

cat("\n=== Sample Median ===\n")
cat("iii. Average of sample medians:", mean(sample_medians), "\n")
cat("iv. SD of sample medians:", sd(sample_medians), "\n")

# Comparison
cat("\n=== Comparison ===\n")
cat("Variance of sample means:", var(sample_means), "\n")
cat("Variance of sample medians:", var(sample_medians), "\n")
cat("Ratio (Var(Median)/Var(Mean)):",
    var(sample_medians) / var(sample_means), "\n")

# Visualization
# Create stacked data frame manually without tidyr
estimator_df <- data.frame(
  Value = c(sample_means, sample_medians),
  Estimator = rep(c("Mean", "Median"), each = length(sample_means))
)

# Calculate theoretical SDs for overlay
theoretical_sd_mean <- true_sigma / sqrt(sample_size)
theoretical_sd_median <- 1.253 * true_sigma / sqrt(sample_size)  # Median efficiency factor

ggplot(estimator_df, aes(x = Value, fill = Estimator)) +
  geom_density(alpha = 0.5, lwd= 1.2) +
  stat_function(fun = dnorm,
                args = list(mean = true_mu, sd = theoretical_sd_mean),
                aes(color = "Mean (theoretical)"), lwd = 1, linetype = "dashed") +
  stat_function(fun = dnorm,
                args = list(mean = true_mu, sd = theoretical_sd_median),
                aes(color = "Median (theoretical)"), lwd = 1, linetype = "dashed") +
  geom_vline(xintercept = true_mu, color = "red",
             linetype = "dashed", lwd = 1.2) +
  labs(title = "Distribution of Sample Mean vs Sample Median",
       subtitle = paste("Red line = True μ =", true_mu,
                       "\nSolid curves = Empirical, Dashed curves = Theoretical N(μ, σ²/n)"),
       x = "Estimated Value", y = "Density") +
  scale_fill_manual(values = c("Mean" = "skyblue", "Median" = "lightcoral"),
                   name = "Estimator") +
  scale_color_manual(values = c("Mean (theoretical)" = "darkblue",
                                "Median (theoretical)" = "darkred"),
                    name = "Theoretical") +
  theme_minimal()

Your observations:

What to Look For 🔍

When comparing the two estimators, pay attention to:

Centering: Are both distributions centered near the true μ = 50? (This checks unbiasedness)
Spread: Which distribution is narrower? (This indicates lower variance)
Theoretical curves: Do the dashed lines match the empirical densities? (This validates theory)

The estimator with lower variance gives more precise estimates - this is the MVUE!

b) Practical Implications

In practice, we will never have access to 2000 samples; instead, we will only have a single sample of size \(n\). If you have a good reason to assume that the population you sampled from is Normal or nearly Normal, which estimator would you prefer for estimating the central tendency? Use your exploration in part a) to justify your answer.

Your recommendation and justification:

Note

Important Consideration: While unbiased estimators are generally desirable, sometimes an estimator with slight bias but much smaller variability can provide better estimates overall. In practice, small bias may be an acceptable tradeoff for reduced uncertainty. This leads to concepts like Mean Squared Error (MSE) which we may explore in future lessons.

Key Takeaways

Summary 📝

Unbiased estimators satisfy \(E[\hat{\theta}] = \theta\), meaning they hit the true parameter value on average across many samples
Bias can sometimes be corrected through algebraic adjustments, as seen with the sample maximum for Uniform distributions
Simulation provides powerful verification of theoretical properties and helps build intuition about estimator behavior
Among unbiased estimators, prefer lower variance - the MVUE balances unbiasedness with minimum variability
For Normal populations, the sample mean is MVUE and outperforms the sample median in terms of efficiency
R simulations enable exploration of estimator properties across different sample sizes and parameter values, revealing insights that may be difficult to derive theoretically