Key Concepts Review 📚

  • Type II Error: Occurs when the null hypothesis is not rejected, even though it is false

  • Statistical Power: The probability that a statistical test will correctly reject a false null hypothesis

We defined the idea of a meaningful alternative hypothesis representing the specific effect or difference that a study needs to detect. We illustrated these concepts visually, exploring how changes in:

  • Significance level \((\alpha)\)

  • Sample size \((n)\)

  • Variability \((\sigma)\)

affect power and errors. Furthermore, we calculated statistical power explicitly and learned to determine the sample size required to achieve a desired level of power for a specified alternative.

Building upon these foundational concepts, we now proceed to perform an actual test of significance for a single unknown population mean.

The Hypothesis Testing Framework

Recall that a hypothesis is a clearly stated mathematical claim about a population parameter. A hypothesis test formally assesses whether observed sample data provide sufficient evidence to support or contradict this claim. Specifically, a test evaluates two competing statements:

  • Null hypothesis \((H_0)\): Represents the existing belief or status quo

  • Alternative hypothesis \((H_a)\): Represents a claim we suspect to be true or wish to establish

The Test Statistic (Known σ)

To objectively assess evidence against the null hypothesis, we rely on a carefully constructed numerical measure called a test statistic, which quantifies how closely our observed sample aligns with the claim made in the null hypothesis \((H_0)\).

Specifically, for a test concerning the unknown population mean \((\mu)\), when the population standard deviation \((\sigma)\) is known, we use the following standardized test statistic, denoted \(Z_{TS}\):

\[Z_{TS} = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}\]

This test statistic has several important elements:

  • \(\bar{X}\): The sample mean, serving as the point estimator of the unknown population mean

  • \(\mu_0\): The hypothesized value under the null hypothesis, representing the status quo or baseline assumption we are testing against

  • \(\sigma/\sqrt{n}\): The standard error, which quantifies the typical amount we expect our sample mean to vary from sample to sample due to randomness alone

The computed value of the test statistic measures how far, in standard error units, our observed sample mean \((\bar{x})\) lies from the hypothesized mean \((\mu_0)\).
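As a quick numeric illustration with hypothetical values (\(\bar{x} = 104\), \(\mu_0 = 100\), \(\sigma = 15\), \(n = 25\)):

```r
# Hypothetical numbers: xbar = 104, mu0 = 100, sigma = 15, n = 25
z_ts <- (104 - 100) / (15 / sqrt(25))
z_ts  # approximately 1.33: the sample mean lies about 1.33 standard errors above mu0
```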

Distribution of the Test Statistic

The test statistic itself is a random variable that would change from sample to sample. To objectively assess the observed magnitude of the test statistic, we must understand its probability distribution under the assumption that the null hypothesis is true.

If the following conditions are satisfied:

  1. The sample data are collected as a simple random sample (SRS)

  2. Either the underlying population distribution is approximately Normal, or the Central Limit Theorem (CLT) justifies normality due to sufficiently large sample size

Then, under these conditions, the test statistic \(Z_{TS}\) follows a standard Normal distribution.

The p-value

This known distribution enables us to calculate the probability (the p-value) of observing a test statistic at least as extreme as ours, thus providing a clear, consistent method to interpret the strength of evidence against the null hypothesis.

Understanding the p-value 🎯

The p-value is a direct measure of how strongly the data contradict or support the null hypothesis:

  • If the p-value is small (less than or equal to our significance level \(\alpha\)), we conclude that the data provide strong evidence against the null hypothesis and thus reject \(H_0\)

  • If the p-value is large (greater than \(\alpha\)), we conclude that there is insufficient evidence to reject \(H_0\)

We now proceed to carefully explore each step of conducting a hypothesis test, demonstrating clearly how to interpret and communicate our statistical findings.

Part 1: Simulating Test Statistics and P-values

In this question, you will simulate and explore the behavior of the z-test statistic \((Z_{TS})\) and the p-value for testing a claim about the population mean.

Question 1a: Simulation When Null Hypothesis is True

Suppose you are conducting a hypothesis test for the population mean, where the hypotheses are stated as follows:

\[H_0: \mu \leq 100\]
\[H_a: \mu > 100\]

Assume the following conditions:

  • The true population mean is exactly 100 (i.e., the null hypothesis \(H_0\) is true)

  • The population is Normal with known standard deviation \(\sigma = 15\)

  • Your sample size is \(n = 25\)

  • The probability of Type I error is fixed at \(\alpha = 0.05\)

To explore the distribution of your test statistic under these conditions, you will run a simulation in R. You will repeat your experiment 1500 times. Each repetition involves drawing a simple random sample (SRS) of size \(n = 25\) from the specified population and computing the test statistic.

Task (i): Generate Sample Means

Obtain 1500 simple random samples from \(N(\mu = 100, \sigma = 15)\) and compute 1500 sample means.

# Set the number of simulations (SRS) and sample size for each simulation
SRS   <- 1500      # number of simulated experiments (repeats)
n     <- 25        # sample size in each simulation
mu0   <- 100       # hypothesized population mean under H0
sigma <- 15        # known population standard deviation

# True population mean for this scenario (H0 is true, so mu equals mu0)
mu <- 100

# Generate SRS*n random draws from N(mu, sigma^2) under H0
data.vec <- rnorm(SRS * n, mean = mu, sd = sigma)
data.mat <- matrix(data.vec, nrow = SRS)  # each row corresponds to one simulation

# Calculate the sample mean for each simulation (each row)
avg <- apply(data.mat, 1, mean)

Task (ii): Calculate the z-Test Statistic

For each of the 1500 simulations, compute the test statistic:

\[z_{TS} = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}\]
z_stat <- (avg - mu0) / (sigma / sqrt(n))

Task (iii): Create Histogram with Critical Value

Construct a histogram of the 1500 \(z_{TS}\) values. Superimpose the smooth kernel density and the theoretical standard Normal distribution density curve. Add a vertical dashed line marking the critical cutoff (rejection threshold) for a one-tailed test at significance level \(\alpha = 0.05\).

# Calculate summary statistics for the test statistics
xbar_z <- mean(z_stat)
s_z    <- sd(z_stat)

cat("Sample Size (per simulation):", n, "\n")
cat("Number of Simulations:", SRS, "\n")
cat("Mean of Test Statistics:", xbar_z, "\n")
cat("Standard Deviation of Test Statistics:", s_z, "\n")

# Compute the cutoff value for a one-tailed test at alpha = 0.05
alpha <- 0.05
cutoff <- qnorm(1 - alpha)   # approximately 1.645

cat("Cutoff value for alpha =", alpha, ":", cutoff, "\n")

# Plot the histogram of the simulated test statistics using ggplot2
library(ggplot2)

ggplot(data.frame(z_stat = z_stat), aes(x = z_stat)) +
geom_histogram(aes(y = after_stat(density)), bins = 30,
                fill = "grey", color = "black") +
geom_density(color = "red", linewidth = 1) +
# Overlay the theoretical standard Normal density
stat_function(fun = dnorm, args = list(mean = 0, sd = 1),
                color = "blue", linewidth = 1) +
# Add a vertical line for the cutoff point
geom_vline(xintercept = cutoff, linetype = "dashed",
            color = "darkgreen", linewidth = 1) +
ggtitle("Distribution of z-Test Statistics Under H0") +
xlab("Test Statistic (Z)") +
ylab("Density")

Task (iv): Calculate Proportion Exceeding Critical Value

Calculate the proportion of times that the test statistic exceeds the critical value \(z_{0.05}\).
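A one-line sketch using the z_stat and cutoff objects from the earlier tasks:

```r
# Empirical proportion of test statistics beyond the critical value
# (should be close to alpha = 0.05 when H0 is true)
mean(z_stat > cutoff)
```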

Your Answer:

Proportion (Null True scenario):

Task (v): Compute and Visualize P-values

Compute the p-values for each of the 1500 test statistics. For this upper-tailed test, the p-value is:

\[\text{p-value} = P(Z > z_{TS})\]

Create a histogram of the p-values and calculate the proportion of simulated experiments where the p-value is less than 0.05.

p_values <- pnorm(z_stat, lower.tail = FALSE)
# Summarize the p-values
mean_p_value <- mean(p_values)
cat("Average p-value across simulations:", mean_p_value, "\n")

# Compute cutoff for significance (alpha = 0.05)
alpha <- 0.05

# Plot the histogram of the simulated p-values using ggplot2
ggplot(data.frame(p_values = p_values), aes(x = p_values)) +
geom_histogram(aes(y = after_stat(density)), bins = 30,
                fill = "grey", color = "black") +
geom_density(color = "red", linewidth = 1) +
# Add vertical dashed line for alpha = 0.05 cutoff
geom_vline(xintercept = alpha, linetype = "dashed",
            color = "darkgreen", linewidth = 1) +
ggtitle("Distribution of p-values Under H0 (true mean = 100, H0: mu = 100)") +
xlab("p-value") +
ylab("Density")

# Determine the proportion of p-values below alpha (rejecting H0)
mean(p_values < alpha)

Your Answer:

Proportion of p-values < 0.05 (Null True scenario):

Interpretation Question:

What do you notice about the distribution of p-values when the null hypothesis is true? Why does the proportion of p-values less than 0.05 approximately equal \(\alpha\)?

Question 1b: Simulation When Null Hypothesis is False

Now repeat the entire simulation process, but this time assume the true population mean is \(\mu = 105\) (so the null hypothesis \(H_0: \mu \leq 100\) is false).
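A minimal sketch of the rerun, reusing the settings from Question 1a (the objects with a 2 suffix are illustrative names):

```r
# Redraw the samples with true mean 105, so H0: mu <= 100 is false
mu_true   <- 105
data.mat2 <- matrix(rnorm(SRS * n, mean = mu_true, sd = sigma), nrow = SRS)
avg2      <- apply(data.mat2, 1, mean)

# The test statistic is still standardized against the null value mu0 = 100
z_stat2 <- (avg2 - mu0) / (sigma / sqrt(n))

# Proportion of simulations rejecting H0 (an estimate of power)
mean(z_stat2 > cutoff)
```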

Your Answer:

Proportion (Null False scenario):

Task (ii): P-values When Null is False

Compute the p-values for each of the 1500 test statistics and create a histogram. Calculate and report the proportion of simulated experiments where the p-value is less than 0.05 (i.e., the proportion of experiments that correctly reject the null hypothesis).

Your Answer:

Proportion of p-values < 0.05 (Null False scenario):

Question 1c: Comparing the Two Scenarios

a) Distribution of Test Statistics:

What did you observe about the distribution of test statistics under each scenario? What differences did you observe in the distribution of the test statistics when the null hypothesis was true \((\mu = 100)\) versus when it was false \((\mu = 105)\)? How did changing the true mean affect the proportion of test statistics exceeding the cutoff value?

b) Distribution of P-values:

Describe and clearly explain how the shape and pattern of simulated p-values differ when the null hypothesis is true (mean = 100) versus when the null hypothesis is false (mean = 105). Why do you observe these differences, and what does this illustrate about the concepts of Type I error and statistical power in hypothesis testing?

Hint: Calculate the power if the alternative was \(\mu_a = 105\).
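One way to check the hint numerically, reusing the settings from the Question 1a code:

```r
# Theoretical power of the one-sided z test against mu_a = 105
se <- sigma / sqrt(n)  # 15 / sqrt(25) = 3
1 - pnorm(qnorm(1 - alpha) - (105 - mu0) / se)  # approximately 0.51
```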

Your Explanation:

The t-Test When σ is Unknown

In real-world scenarios, the population standard deviation \((\sigma)\) is typically not known. As we discussed previously when exploring confidence intervals, this means we must estimate the population standard deviation from our sample data, using the sample standard deviation \((s)\).

However, estimating the standard deviation introduces an additional layer of uncertainty. Even if our original data come from a population that is exactly normal, our test statistic no longer follows a standard Normal distribution. Instead, because we now rely on an estimated variability, the test statistic has greater variability and consequently follows a distribution with heavier tails: a Student’s t-distribution.

The t-Test Statistic

Specifically, when performing hypothesis tests for an unknown population mean \((\mu)\), and when estimating \(\sigma\) by \(s\), the correct form of the test statistic is:

\[t_{TS} = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}\]

This statistic follows a Student’s t-distribution with degrees of freedom \((df)\) equal to \(n - 1\):

\[t_{TS} \sim t_{df = n-1}\]

Properties of the t-Distribution

The Student’s t-distribution differs from the Normal distribution primarily in having heavier tails. This reflects the increased uncertainty from estimating \(\sigma\):

Key Properties of the t-Distribution 📊

  • Heavier tails: The probability of observing more extreme values is higher compared to the Normal distribution, appropriately accounting for the uncertainty introduced by estimating variability

  • Degrees of freedom (df): The shape of the t-distribution depends on the degrees of freedom, defined as \(df = n - 1\). As sample size \(n\) increases, the t-distribution approaches a standard Normal distribution because the estimate of the standard deviation becomes more precise

  • Symmetric and bell-shaped: Like the normal distribution, centered at 0
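The convergence toward the Normal can be seen directly from the critical values; for example:

```r
# 97.5th percentile of the t-distribution for increasing degrees of freedom
qt(0.975, df = c(4, 29, 99))  # approximately 2.78, 2.05, 1.98
qnorm(0.975)                  # approximately 1.96
```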

Relationship Between Confidence Intervals and Hypothesis Tests

Hypothesis testing and confidence intervals (or confidence bounds) are closely connected concepts. Both approaches use the same underlying principles and assumptions, and they complement each other in interpreting data.

Connection Between CIs and Hypothesis Tests 🔗

Two-sided test (\(H_0: \mu = \mu_0\)) corresponds to a two-sided confidence interval:

\[\left( \bar{x} - t_{\alpha/2, n-1} \frac{s}{\sqrt{n}}, \ \bar{x} + t_{\alpha/2, n-1} \frac{s}{\sqrt{n}} \right)\]

Upper-tailed test (\(H_a: \mu > \mu_0\)) corresponds to a lower confidence bound:

\[\left( \bar{x} - t_{\alpha, n-1} \frac{s}{\sqrt{n}}, \ \infty \right)\]

Lower-tailed test (\(H_a: \mu < \mu_0\)) corresponds to an upper confidence bound:

\[\left( -\infty, \ \bar{x} + t_{\alpha, n-1} \frac{s}{\sqrt{n}} \right)\]

Thus, confidence intervals or bounds can intuitively represent the plausible values for \(\mu\). If the null value \(\mu_0\) is not plausible (not included in the interval or lies beyond the bound), the hypothesis test naturally rejects the null hypothesis.

In practice, confidence intervals (or bounds) provide valuable context. They give a range of reasonable values for the parameter based on the data, enhancing the interpretability of hypothesis testing conclusions.
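In R, t.test() reports the matching interval or bound directly; a small sketch with hypothetical data:

```r
# Hypothetical sample; an upper-tailed test reports a lower confidence bound
x <- c(72, 85, 68, 90, 77)
t.test(x, mu = 70, alternative = "greater")$conf.int
# If the null value 70 falls below the reported lower bound,
# the test rejects H0 at alpha = 0.05
```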

Part 2: EPA Ozone Concentration Analysis

The Environmental Protection Agency (EPA) regulates ozone concentrations due to potential harmful health effects. Historically, an 8-hour ozone concentration above 70 parts per billion (ppb) is considered unhealthy.

The airquality dataset in base R includes a two-hour daily average (from 1:00 PM to 3:00 PM) rather than an 8-hour average. We’ll use 70 ppb as a benchmark to evaluate ozone concentrations recorded in New York during the summer of 1973.

Research Question: Conduct a hypothesis test to determine if the mean two-hour ozone concentration during the summer months exceeds the EPA threshold.

Question 2a: Assumptions and Exploratory Analysis

Task (i): Clean and Filter Data

Clean the airquality dataset and partition into summer months (June, July, August) and other.

data(airquality)
head(airquality)
airquality_clean <- airquality[complete.cases(airquality),]
airquality_clean$Season <- ifelse(airquality_clean$Month %in% c(6, 7, 8), "Summer", "Other")

Task (ii): Compute Descriptive Statistics

Compute and clearly report the mean, standard deviation, and sample size for summer ozone concentrations using tapply. Label them xbars, ns, and sds.
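A sketch of the tapply calls; the labels xbars, ns, and sds match those assumed by the plotting code below:

```r
# Group-wise mean, standard deviation, and sample size by Season
xbars <- tapply(airquality_clean$Ozone, airquality_clean$Season, mean)
sds   <- tapply(airquality_clean$Ozone, airquality_clean$Season, sd)
ns    <- tapply(airquality_clean$Ozone, airquality_clean$Season, length)
xbars; sds; ns
```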

Your Summary Table:

Table 11 Descriptive Statistics for Summer Ozone Concentrations

  • Sample Size (n):

  • Mean (\(\bar{x}\)):

  • Median (\(\tilde{x}\)):

  • Standard Deviation (s):

Task (iii): Create Diagnostic Plots

Construct a histogram with a normal density curve and a QQ-plot to visually check the normality assumption.

n_bins <- max(round(sqrt(ns["Summer"])+2),5)
# Compute the normal density for each observation based on its group
airquality_clean$normal.density <- ifelse(
airquality_clean$Season == "Summer",
dnorm(airquality_clean$Ozone, mean = xbars["Summer"], sd = sds["Summer"]),
dnorm(airquality_clean$Ozone, mean = xbars["Other"], sd = sds["Other"])
)

airquality_clean$intercept <- ifelse(airquality_clean$Season == "Summer",
                                    xbars["Summer"],
                                    xbars["Other"])
airquality_clean$slope <- ifelse(airquality_clean$Season == "Summer",
                                sds["Summer"],
                                sds["Other"])


ggplot(airquality_clean, aes(x = Ozone)) +
geom_histogram(aes(y = after_stat(density)),
                bins = n_bins, alpha = 0.6, position = "identity", color = "black") +
geom_density(col = "red", linewidth = 1) +
geom_line(aes(y = normal.density), col = "blue", linewidth = 1) +
facet_wrap(~Season) +
labs(title = "Histogram and Density of Ozone Levels by Season",
    x = "Ozone (ppb)",
    y = "Density") +
theme(legend.position = "none")
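One way to complete Task (iii)'s QQ-plot, reusing the intercept and slope columns computed above (a sketch):

```r
# Normal QQ-plot by Season; the per-group mean/sd give the reference line
ggplot(airquality_clean, aes(sample = Ozone)) +
  stat_qq() +
  geom_abline(aes(intercept = intercept, slope = slope), color = "blue") +
  facet_wrap(~Season) +
  labs(title = "Normal QQ-Plot of Ozone Levels by Season",
       x = "Theoretical Quantiles",
       y = "Sample Quantiles")
```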

Task (iv): Evaluate Normality Assumption

Clearly interpret the plots and comment on the appropriateness of using the t-distribution for your hypothesis test.

Your Assessment:

Histogram Analysis:

QQ-Plot Analysis:

Overall Conclusion about Normality:

Appropriateness of t-test:

Question 2b: Full Hypothesis Testing Procedure

Regardless of your conclusions regarding the assumptions, perform the full four-step hypothesis testing procedure utilizing the t.test function in R. Use \(\alpha = 0.05\).

Step 1: Identify the Parameter of Interest

Your Answer ✍️

Define the parameter clearly. For example: “Let \(\mu\) represent…”

Step 2: State the Null and Alternative Hypotheses

State the hypotheses symbolically:

\[H_0:\]
\[H_a:\]

Your hypotheses in words:

Step 3: Calculate the Test Statistic and P-value

Use R’s t.test() function to perform the hypothesis test.
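A sketch of the call, assuming the cleaned data from Question 2a:

```r
# Upper-tailed one-sample t test of H0: mu <= 70 vs Ha: mu > 70
summer_ozone <- airquality_clean$Ozone[airquality_clean$Season == "Summer"]
t.test(summer_ozone, mu = 70, alternative = "greater")
```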

Your Results:

Table 12 Hypothesis Test Results

  • Test Statistic (\(t_{TS}\)):

  • Degrees of Freedom (df):

  • P-value:

  • Significance Level (\(\alpha\)): 0.05

Step 4: Formal Conclusion

Provide a formal conclusion using the template from class.

Conclusion Template 📝

“At the \(\alpha = 0.05\) significance level, we [reject / fail to reject] the null hypothesis. There [is / is not] sufficient evidence to conclude that [state conclusion in context].”

Your Formal Conclusion:

Question 2c: Manual Verification

Verify that the t.test agrees with the formulas by computing the test statistic manually. Compute the p-value by utilizing the pt function in R. Show your work for computing the test statistic below and write the p-value as a probability statement.
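A sketch of the manual computation, reusing the xbars, sds, and ns summaries from Question 2a:

```r
# Test statistic and upper-tail p-value computed by hand
t_ts  <- (xbars["Summer"] - 70) / (sds["Summer"] / sqrt(ns["Summer"]))
p_val <- pt(t_ts, df = ns["Summer"] - 1, lower.tail = FALSE)
t_ts; p_val
```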

Manual Calculation of Test Statistic:

\[t_{TS} = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}\]

Your work:

\[t_{TS} =\]

P-value as a Probability Statement:

Question 2d: Confidence Bound

Compute the appropriate 95% one-sided confidence bound related to your stated hypothesis. Determine clearly if the null hypothesis value (70 ppb) is above or below this bound and explain explicitly how this relates to your hypothesis test conclusion.

Which type of confidence bound is appropriate?

For an upper-tailed test \((H_a: \mu > \mu_0)\), we need a lower confidence bound.

Formula:

\[\left( \bar{x} - t_{\alpha, df} \frac{s}{\sqrt{n}}, \ \infty \right)\]
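Evaluating the bound in R (a sketch, reusing the Question 2a summaries):

```r
# 95% lower confidence bound for the mean summer ozone concentration
xbars["Summer"] - qt(0.95, df = ns["Summer"] - 1) * sds["Summer"] / sqrt(ns["Summer"])
```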

Your Answer:

95% Lower Confidence Bound:

Relationship to Hypothesis Test:

Is 70 ppb above or below this bound?

How does this relate to your hypothesis test conclusion?

Question 2e: Power Calculation

Calculate the power associated with an alternative of \(\mu_a = 120\) ppb, which is the EPA standard for being considered very unhealthy.
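One approach is base R's power.t.test(); the delta below is the gap between the alternative (120 ppb) and the null value (70 ppb), with the sample standard deviation standing in for \(\sigma\):

```r
# Power to detect mu_a = 120 when testing H0: mu <= 70 (one-sided, alpha = 0.05)
power.t.test(n = ns["Summer"], delta = 120 - 70, sd = sds["Summer"],
             sig.level = 0.05, type = "one.sample", alternative = "one.sided")
```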

Power Calculation:

Your Answer:

Power for \(\mu_a = 120\) ppb:

Interpretation:

What does this power value tell you about the test’s ability to detect very unhealthy ozone levels?

Key Takeaways

Summary 📝

  • Test statistics measure how far the sample evidence is from the null hypothesis, in units of standard error

  • P-values quantify the probability of observing data as extreme as (or more extreme than) what we observed, assuming the null hypothesis is true

  • When the null hypothesis is true, p-values follow a uniform distribution between 0 and 1, and approximately \(\alpha\) proportion will be less than \(\alpha\) (Type I error rate)

  • When the null hypothesis is false, p-values tend to be small (concentrated near 0), with the proportion less than \(\alpha\) representing the statistical power

  • The z-test is used when \(\sigma\) is known; the test statistic follows a standard normal distribution

  • The t-test is used when \(\sigma\) is unknown and estimated by \(s\); the test statistic follows a t-distribution with \(df = n-1\)

  • Confidence bounds provide an alternative perspective on hypothesis tests: if the null value falls outside the confidence bound/interval, we reject the null hypothesis

  • Statistical power depends on sample size, effect size, significance level, and variability. Larger samples and larger effect sizes lead to higher power

  • Always check assumptions (randomness, normality/CLT) before conducting inference procedures

Four-Step Hypothesis Testing Procedure 🎯

Step 1: Clearly identify the parameter of interest

Step 2: State null and alternative hypotheses symbolically

Step 3: Calculate test statistic and p-value (report df if using t-test)

Step 4: State formal conclusion in context at the specified \(\alpha\) level

Computational Skills Developed 💻

In this worksheet, you practiced:

  • Simulating sampling distributions of test statistics and p-values

  • Using rnorm(), matrix(), and apply() for Monte Carlo simulations

  • Calculating test statistics and p-values manually and with R functions

  • Using t.test() for one-sample hypothesis tests

  • Using pt() and qt() for t-distribution calculations

  • Creating diagnostic plots for assumption checking

  • Computing one-sided confidence bounds

  • Calculating statistical power for detecting specific alternatives

  • Visualizing distributions with ggplot2