9.3. Precision of a Confidence Interval and Sample Size Calculation

In statistics, it’s not enough to simply estimate a parameter—we must also quantify our uncertainty. In Chapter 9.1, we established that confidence intervals provide this crucial perspective by offering a range of plausible values for population parameters. Now we’ll focus on making these intervals as informative as possible through thoughtful planning.

A confidence interval is only as informative as its width. Consider the single-sample confidence interval for a population mean when the population standard deviation σ is known:

\[\bar{x}\;\pm\;z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}\]

The term that appears after the ± symbol is called the margin of error (ME):

\[\text{ME}=z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}\]

This margin of error directly affects the precision and practical utility of our interval:

  • A wide interval signals low precision—we know the parameter lies somewhere in a broad range, but we can’t pinpoint it well.

  • A narrow interval signals high precision—we’ve effectively narrowed down the plausible values.

Our goal in this chapter is to develop a quantitative approach to this relationship: given a desired level of precision (margin of error), how large must our sample size be?

Road Map 🧭

Fill Content

9.3.1. Controlling the Margin of Error

The margin of error consists of three components, each affecting the width of our confidence interval:

  1. Confidence level (reflected in \(z_{\alpha/2}\)): Higher confidence requires wider intervals. For example, a 99% confidence interval will be wider than a 95% interval for the same data.

  2. Population variability (σ): Greater natural variability in the population necessitates wider intervals. When individual observations vary widely, we need more data to estimate the center with precision.

  3. Sample size (n): Larger samples provide narrower intervals according to the “square-root law”—doubling the sample size reduces the margin of error by a factor of \(\sqrt{2}\) (approximately 1.41).

Relationship between margin of error and sample size

Fig. 9.5 Figure 9.3: The inverse square-root relationship between sample size and margin of error. As n increases, ME decreases, but with diminishing returns.

This relationship between sample size and margin of error is crucial for research planning. The square-root relationship means that to halve your margin of error, you must quadruple your sample size—an important consideration when balancing precision against the cost and effort of data collection.

9.3.2. Determining Required Sample Size

Suppose a researcher wants to estimate a population mean μ with a specified level of precision. Specifically, they want the estimate \(\bar{x}\) to be within E units of the true mean μ with \((1-\alpha)100\%\) confidence.

Setting \(\text{ME} \leq E\) and solving for n gives us:

\[n \geq \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2\]

Since sample size must be an integer, we always round up to the next whole number to ensure we meet or exceed our precision requirement.

Practical Considerations for Sample Size Planning

When planning a study, several practical factors affect sample size determination:

  • Unknown population standard deviation: If σ is unknown (the most common case), use a planning value. This could be: - A standard deviation from a pilot study or similar previous research - An educated guess based on the expected range (approximately range/4) - A conservative upper bound when uncertainty is high

  • Confidence level selection: Higher confidence (e.g., 99% vs. 95%) substantially increases the required sample size. Choose a level that balances confidence with practical constraints.

  • Non-response and data quality: Always build in a small cushion (5-10%) for potential data loss due to non-response, measurement errors, or outliers.

  • Resource constraints: Balance statistical ideals against practical limitations of time, budget, and participant availability.

Example 💡 - Planning a Nutrition Study

A nutritionist wants to estimate the mean daily sodium intake of adults with a margin of error of 50 mg at 95% confidence. Previous studies suggest σ ≈ 230 mg.

To determine the required sample size:

\[n = \left(\frac{z_{0.025}\,\cdot\,230}{50}\right)^2 = \left(\frac{1.96\,\cdot\,230}{50}\right)^2 \approx 81.7\]

Rounding up, the nutritionist should plan to sample 82 adults. See below for an example R implementation of the solution:

# Sample size calculation in R
z   <- qnorm(0.975)      # 95% two-sided confidence
sig <- 230               # planning standard deviation
E   <- 50                # desired margin of error

n_required <- ceiling((z*sig/E)^2)
n_required  # 82

9.3.3. The Square-Root Law: Balancing Precision and Resources

The sample size formula reveals a quadratic relationship between precision and sample size—doubling precision (halving E) requires quadrupling the sample size. This has profound implications for research planning:

  • Going from a margin of error of 50 mg to 25 mg would require not 82 but 328 participants

  • Increasing precision from ±5% to ±1% requires 25 times more data

  • The law of diminishing returns applies: each additional increment of precision costs progressively more

When planning studies, researchers must carefully consider whether the benefit of increased precision justifies the additional resource expenditure. Sometimes, a slightly wider confidence interval is an acceptable compromise when balanced against practical constraints.

Cost of precision curve showing the nonlinear relationship

Fig. 9.6 Figure 9.4: The nonlinear relationship between precision and required sample size. As the desired margin of error shrinks, the required sample size grows exponentially.

9.3.4. Assumptions Underlying the Formula

The sample size formula rests on three key assumptions:

  1. Simple Random Sample: The data must represent a simple random sample from the target population, ensuring independence and representativeness.

  2. Known Population Standard Deviation: The formula assumes σ is known. In practice, we typically use an estimate or planning value.

  3. Normality Condition: Either the population follows a normal distribution, or the sample size is large enough for the Central Limit Theorem to ensure that \(\bar{X}\) is approximately normally distributed.

If these conditions are not met, the actual coverage probability of the resulting confidence interval may differ from the nominal level, potentially requiring larger samples or different approaches.

9.3.5. The Role of Sample Size in Statistical Inference

Sample size plays a pivotal role in statistical inference beyond just controlling the margin of error:

  • Statistical significance: Larger samples provide greater power to detect effects

  • Practical significance: Narrow confidence intervals help distinguish between statistically significant but practically meaningless effects

  • Study reliability: Adequate sample sizes reduce the risk of both false positives and false negatives

By carefully planning sample size, researchers ensure their studies are neither underpowered (wasting resources and ethical goodwill) nor overpowered (detecting trivial effects).

9.3.6. Looking Ahead: Next Steps in Confidence Interval Construction

In this chapter, we’ve focused on determining sample size when planning a study. This knowledge forms the foundation for more sophisticated inference techniques:

  • Lecture 9.3 will derive the full confidence interval expression when σ is known, including validity checks.

  • Lecture 9.4 will cover one-sided confidence bounds for situations where we care only about an upper or lower limit.

  • Lecture 9.5 will introduce the Student’s t distribution for the more realistic case where σ is unknown and must be estimated from the data.

Each of these advances builds upon the fundamental relationship between sample size, confidence level, and precision explored in this chapter.

Key Takeaways 📝

  • The margin of error quantifies the precision of a confidence interval.

  • Three factors control precision: confidence level, population variability, and sample size.

  • Required sample size can be calculated using: \(n \geq \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2\)

  • The square-root law means that doubling precision requires quadrupling the sample size.

  • Sample size planning is a critical step in research design, ensuring studies are both statistically sound and resource-efficient.

Exercises

  1. A battery manufacturer wants to estimate the mean battery voltage with a margin of error of 0.05 V at 99% confidence. Past data suggest σ = 0.21 V. Calculate the required sample size.

  2. A researcher originally planned to estimate a population mean with a margin of error of 5 units. Due to budget constraints, they can only collect half the planned sample size. How will this affect their margin of error?

  3. A study measuring adult blood pressure reported a 95% confidence interval of 120 ± 3 mmHg based on a sample of 100 adults. If the researchers had wanted to reduce the margin of error to 2 mmHg while maintaining the same confidence level, how many participants would they have needed?

  4. In practice, σ is almost never known in advance. Describe two approaches for obtaining a planning value for σ, and discuss the strengths and limitations of each approach.

  5. Why is it generally not cost-effective to seek extremely narrow confidence intervals (e.g., reducing margin of error below 1% of the estimate)? Provide both statistical and practical justifications for your answer.