STAT 350 — Exam 2 — Spring 2025
Exam Information
| Problem | Total Possible | Topic |
|---|---|---|
| Problem 1 (True/False, 2 pts each) | 12 | CLT, Experimental Design, Confidence Intervals, Power |
| Problem 2 (Multiple Choice, 3 pts each) | 15 | Experimental Design, CLT, Sampling Distributions, Power |
| Problem 3 | 24 | One-Sample t-test, Confidence Bound |
| Problem 4 | 23 | Paired t-test |
| Problem 5 | 31 | Two-Sample Independent t-test, Confidence Interval |
| Total | 105 | |
Problem 1 — True/False (12 points, 2 points each)
Question 1.1 (2 pts)
Consider a sequence \(X_1, X_2, \ldots, X_n\) of independent and identically distributed random variables drawn from a population \(f_X(x)\) with finite mean \(\mu\) and finite variance \(\sigma^2 > 0\). Define the sample sum as

\[T_n = X_1 + X_2 + \cdots + X_n,\]

where the subscript \(n\) explicitly indicates that the sum is over \(n\) random variables.

True or False: According to the central limit theorem (CLT), the standardized sample sum

\[Z_n = \frac{T_n - n\mu}{\sigma\sqrt{n}}\]

is exactly distributed as a standard Normal distribution regardless of the shape of \(f_X(x)\), provided \(n \geq 30\).
Solution
Answer: FALSE
The CLT states that the standardized sample sum converges in distribution to a standard Normal as \(n \to \infty\) — it is an approximation that improves with larger \(n\), not an exact result for any finite \(n\). For the distribution to be exactly normal for finite \(n\), the population \(f_X(x)\) would itself need to be normal. The condition \(n \geq 30\) is a common practical guideline, but it does not make the CLT exact.
Question 1.2 (2 pts)
In a randomized block design (RBD), treatments are randomly assigned to experimental units within distinct blocks.
True or False: This is done to balance rather than mitigate or remove the impact of extraneous variables on experimental results.
Solution
Answer: FALSE
The primary purpose of blocking in a randomized block design is to mitigate or remove (not merely balance) the impact of extraneous variables. By grouping experimental units that are similar on a known source of variability into the same block, blocking reduces the unexplained variation in the response — effectively removing the block effect from the error. This is a stronger action than simply balancing: it actively controls the nuisance variable.
Question 1.3 (2 pts)
Denote \(\tau_n = \dfrac{1}{\sqrt{n}} \, SD(X_1 + X_2 + \cdots + X_n)\), where the \(X_i\)’s are independent and identically distributed with finite variance \(\sigma^2\).
True or False: Then it follows that \(\tau_3 > \tau_4\).
Solution
Answer: FALSE
Since the \(X_i\)’s are i.i.d. with variance \(\sigma^2\):

\[SD(X_1 + X_2 + \cdots + X_n) = \sqrt{n\sigma^2} = \sigma\sqrt{n}.\]

Therefore:

\[\tau_n = \frac{1}{\sqrt{n}}\,\sigma\sqrt{n} = \sigma\]

for all \(n\). So \(\tau_3 = \tau_4 = \sigma\) — they are equal, not \(\tau_3 > \tau_4\).
Question 1.4 (2 pts)
A researcher is calculating the sample size needed for a confidence interval with a fixed margin of error. The population standard deviation is known.
True or False: The required sample size increases as the confidence coefficient decreases.
Solution
Answer: FALSE
The sample size formula for a CI with known \(\sigma\) is \(n = \left(\dfrac{z_{\alpha/2}\,\sigma}{m}\right)^2\), where \(m\) is the desired margin of error. As the confidence coefficient decreases (e.g., from 99% to 90%), the critical value \(z_{\alpha/2}\) decreases, so the required sample size decreases, not increases. A lower confidence coefficient requires a smaller sample to achieve the same margin of error.
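As a quick R sketch of this formula (the values \(\sigma = 10\) and \(m = 2\) are hypothetical, chosen only to illustrate the behavior):

```r
# Required sample size for a CI with known sigma and fixed margin of error m.
# Always round UP to the next whole unit.
sample_size <- function(sigma, m, conf) {
  z <- qnorm((1 - conf) / 2, lower.tail = FALSE)  # critical value z_{alpha/2}
  ceiling((z * sigma / m)^2)
}
sample_size(sigma = 10, m = 2, conf = 0.99)  # 166
sample_size(sigma = 10, m = 2, conf = 0.90)  # 68: lower confidence, smaller n
```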
Question 1.5 (2 pts)
The functions pnorm() and pt() are built-in functions in R (RStudio).
True or False: they directly provide critical values for constructing confidence intervals and bounds.
Solution
Answer: FALSE
pnorm() and pt() are cumulative distribution functions (CDFs) — they
compute probabilities \(P(Z \leq z)\) and \(P(T \leq t)\), respectively.
They do not directly provide critical values. To obtain critical values for
constructing confidence intervals, one uses the quantile functions qnorm()
and qt() instead.
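For example:

```r
# Quantile functions return critical values; CDFs return probabilities.
qnorm(0.025, lower.tail = FALSE)        # z_{0.025} ~ 1.96, for a 95% z-interval
qt(0.025, df = 24, lower.tail = FALSE)  # t_{0.025, 24} ~ 2.064
pnorm(1.96)                             # ~0.975: a probability, not a critical value
```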
Question 1.6 (2 pts)
When conducting a hypothesis test in which the alternative hypothesis is true,
True or False: the probability of rejecting the null hypothesis increases by taking a larger sample size.
Solution
Answer: TRUE
When \(H_a\) is true, the probability of rejecting \(H_0\) is the power of the test (\(1 - \beta\)). Increasing the sample size \(n\) reduces the standard error \(\sigma/\sqrt{n}\), which tightens the sampling distribution under \(H_a\) relative to the rejection region, directly increasing the power of the test.
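As an illustration, base R's power.t.test shows power rising with \(n\); the delta, sd, and significance level below are hypothetical values chosen only for this sketch:

```r
# Power of a one-sided, one-sample t-test at two sample sizes.
# delta = 5 and sd = 18 are illustrative values, not from the exam.
p25  <- power.t.test(n = 25,  delta = 5, sd = 18, sig.level = 0.05,
                     type = "one.sample", alternative = "one.sided")$power
p100 <- power.t.test(n = 100, delta = 5, sd = 18, sig.level = 0.05,
                     type = "one.sample", alternative = "one.sided")$power
p100 > p25  # TRUE: larger n gives higher power when H_a is true
```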
Problem 2 — Multiple Choice (15 points, 3 points each)
Question 2.1 (3 pts)
Which of the following techniques is primarily used to control or reduce variability arising from extraneous factors in experimental design?
(A) Blocking (B) Replication (C) Randomization (D) Realism (E) All of the above
Solution
Answer: (A)
Blocking is the technique specifically designed to control or reduce variability from known extraneous factors by grouping homogeneous experimental units together. Replication increases precision but does not directly control extraneous variability. Randomization distributes unknown extraneous effects evenly but does not reduce their magnitude. “Realism” is not a standard experimental design principle. Blocking is the answer that best matches “control or reduce variability from extraneous factors.”
Question 2.2 (3 pts)
Time, resources, and practicality influence statistical studies. Which of the following is essential for drawing generalizable conclusions from a statistical investigation?
(A) Selecting a representative sample
(B) Having a sufficiently large sample size
(C) Using appropriate, well-established statistical methods
(D) Clearly defining the population of interest
(E) All of the above are essential for drawing generalizable conclusions
Solution
Answer: (E)
All four elements are essential. Without a representative sample (A), results cannot be extended to the population. Without adequate sample size (B), estimates are too imprecise and tests too underpowered to support generalizable conclusions. Without appropriate methods (C), the analysis may be invalid. Without a clearly defined population (D), it is unclear to whom the conclusions apply. Each element is necessary; none is sufficient alone.
Question 2.3 (3 pts)
You used Glassgate, a platform where salaries are posted anonymously, to look up salary information for a small-sized consulting firm. You found salary data from 13 verified junior data scientists.
Which of the following statements is NOT always true regarding the sampling distribution of the average salary computed from these 13 junior data scientists?
(A) If the population distribution has a strong positive skew, the Central Limit Theorem cannot be applied.
(B) Without additional information about the population distribution, the exact sampling distribution cannot be determined.
(C) The mean of the sampling distribution, \(\mu_{\bar{X}}\), is equal to the population mean \(\mu_X\), regardless of sample size.
(D) The standard deviation of the sampling distribution \(\sigma_{\bar{X}}\), is always smaller than the population standard deviation \(\sigma_X\) when the sample size exceeds 2.
(E) The central limit theorem ensures that the sampling distribution of \(\bar{X}\) is approximately Normal.
Solution
Answer: (E)
With only \(n = 13\) observations from a potentially skewed salary distribution, the CLT does not guarantee approximate normality. The CLT requires \(n\) to be sufficiently large, and for heavily skewed populations, \(n = 13\) is generally not sufficient. Therefore statement (E) is not always true.
(A) TRUE in this context — with only \(n = 13\) observations from a strongly skewed population, the CLT cannot be safely applied, so the statement holds here.
(B) TRUE — the exact sampling distribution depends on the unknown population shape, which is not specified.
(C) TRUE — \(E[\bar{X}] = \mu_X\) always holds for any sample size and any population with finite mean.
(D) TRUE — \(\sigma_{\bar{X}} = \sigma_X/\sqrt{n} < \sigma_X\) for \(n > 1\).
(E) NOT always true — CLT requires sufficiently large \(n\); \(n = 13\) is not sufficient for a strongly skewed population.
Question 2.4 (3 pts)
A delivery company, CargoSwift Logistics, operates small vans that regularly transport packages between warehouses. Each trip includes a fixed set of loading equipment (metal securing racks, crates, and straps) weighing exactly 30 lbs. The remaining cargo consists of sixteen individual packages, with weights independently and identically distributed as \(X_i \sim \text{Uniform}(a = 44,\ b = 56)\) lbs.
Recall that \(E[X_i] = \dfrac{a+b}{2}\) and \(\text{Var}(X_i) = \dfrac{(b-a)^2}{12}\).
The total weight \(T\) for a typical truckload of 16 packages is given by:

\[T = 30 + \sum_{i=1}^{16} X_i\]
CargoSwift’s delivery vans have a maximum safe weight load capacity of 850 lbs. Select the correct code to calculate the approximate probability that a randomly selected van containing 16 packages would exceed the safe weight load.
(A) pt(1.4434, df = 15, lower.tail = FALSE)
(B) pnorm(1.4434, lower.tail = FALSE)
(C) pt(3.6084, df = 15, lower.tail = FALSE)
(D) pnorm(3.6084, lower.tail = FALSE)
(E) pt(5.7735, df = 15, lower.tail = FALSE)
(F) pnorm(5.7735, lower.tail = FALSE)
(G) punif(850, min = 704, max = 896, lower.tail = FALSE)
Solution
Answer: (B)
Step 1 — Find the distribution of \(T\):

\[E[T] = 30 + 16\,E[X_i] = 30 + 16(50) = 830\]

Since the fixed weight is a constant, it adds no variance:

\[\text{Var}(T) = 16\,\text{Var}(X_i) = 16(12) = 192, \qquad SD(T) = \sqrt{192} \approx 13.8564\]

Step 2 — Apply the CLT:

With \(n = 16\) independent Uniform random variables, the CLT gives \(T \approx \text{Normal}(\mu = 830,\ \sigma = 13.8564)\). Standardize:

\[P(T > 850) = P\!\left(Z > \frac{850 - 830}{13.8564}\right) = P(Z > 1.4434)\]

Step 3 — Select the code:

We need \(P(T > 850) \approx P(Z > 1.4434)\). Since the CLT gives a normal approximation (not a t-distribution), and \(\sigma\) is known from the Uniform distribution, the correct code is:
pnorm(1.4434, lower.tail = FALSE)
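The whole calculation can be reproduced in R; the Uniform(44, 56) parameters below are inferred from the solution's mean of 830 and SD of 13.8564:

```r
# Question 2.4 check, assuming X_i ~ Uniform(44, 56) lbs.
a <- 44; b <- 56; n <- 16; fixed <- 30
mu_T <- fixed + n * (a + b) / 2       # 830
sd_T <- sqrt(n * (b - a)^2 / 12)      # ~13.8564
z <- (850 - mu_T) / sd_T              # ~1.4434
pnorm(z, lower.tail = FALSE)          # ~0.074
```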
Question 2.5 (3 pts)
A nutritionist conducts a study to test whether a new dietary program significantly reduces cholesterol levels from a known baseline of 200 mg/dL. They collect data from 25 participants and observe a sample standard deviation of 18 mg/dL. They plan to perform a lower-tailed hypothesis test at significance level \(\alpha = 0.05\).
Select the correct lines of R code to calculate the approximate power of this test for detecting a reduction in the mean cholesterol to 190 mg/dL:
# (A)
cutoff <- 200 + qt(0.05, df=24, lower.tail=FALSE)*(18/sqrt(25))
pt((cutoff - 190)/(18/sqrt(25)), df=24, lower.tail=TRUE)
# (B)
cutoff <- 190 - qt(0.95, df=24, lower.tail=FALSE)*(18/sqrt(25))
pt((cutoff - 200)/(18/sqrt(25)), df=24, lower.tail=TRUE)
# (C)
cutoff <- 200 - qnorm(0.95, lower.tail=FALSE)*(18/sqrt(25))
pnorm((cutoff - 190)/(18/sqrt(25)), lower.tail=FALSE)
# (D)
cutoff <- 200 + qnorm(0.05, lower.tail=FALSE)*(18/sqrt(25))
pnorm((cutoff - 190)/(18/sqrt(25)), lower.tail=FALSE)
# (E)
cutoff <- 200 - qt(0.05, df=24, lower.tail=FALSE)*(18/sqrt(25))
pt((cutoff - 190)/(18/sqrt(25)), df=24, lower.tail=TRUE)
Solution
Answer: (E)
For a lower-tailed t-test (\(H_a: \mu < 200\)), \(\sigma\) is unknown, so we use the t-distribution with \(df = n - 1 = 24\).
The cutoff is the value \(\bar{x}_\text{cutoff}\) below which we reject \(H_0\):

\[\bar{x}_\text{cutoff} = \mu_0 - t_{0.05,\,24}\,\frac{s}{\sqrt{n}} = 200 - 1.7109 \times \frac{18}{\sqrt{25}} \approx 193.84\]

In R: cutoff <- 200 - qt(0.05, df=24, lower.tail=FALSE) * (18/sqrt(25))

Note: qt(0.05, df=24, lower.tail=FALSE) returns the positive value \(t_{0.05,24} = 1.7109\), so subtracting gives a cutoff below 200.

Power is \(P(\bar{X} < \bar{x}_\text{cutoff} \mid \mu = 190)\):

\[\text{power} \approx P\!\left(T_{24} < \frac{\bar{x}_\text{cutoff} - 190}{18/\sqrt{25}}\right)\]

In R: pt((cutoff - 190)/(18/sqrt(25)), df=24, lower.tail=TRUE)
(A) Wrong: adds instead of subtracts, giving a cutoff above 200.
(B) Wrong: centers cutoff on 190 instead of 200.
(C), (D) Wrong: use pnorm instead of pt; \(\sigma\) is unknown.
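Putting option (E) together as a runnable check:

```r
# Approximate power of the lower-tailed t-test (option E, assembled).
cutoff <- 200 - qt(0.05, df = 24, lower.tail = FALSE) * (18 / sqrt(25))
power  <- pt((cutoff - 190) / (18 / sqrt(25)), df = 24, lower.tail = TRUE)
cutoff  # ~193.84: reject H0 when the sample mean falls below this value
power
```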
Problem 3 (24 points) — SUV Fuel Efficiency
Problem 3 Setup
A car manufacturer advertises that its new compact SUV averages 40 miles per gallon (mpg). A consumer advocacy group wants to evaluate this claim and believes the true average may be lower. They obtain 54 SUVs from the same model year and measure their fuel efficiency under combined city/highway driving, simulating what an average driver might encounter. After recording each vehicle’s mileage under these controlled yet representative conditions, the group finds a sample mean of 37.8 mpg. The population’s standard deviation is unknown. They plan to conduct a hypothesis test at a 3% significance level to assess the manufacturer’s claim.
Question 3a (3 pts)
Identify the parameter of interest. Clearly state its symbolic notation and define it briefly in the context of this scenario.
Solution
Let \(\mu_\text{MPG}\) denote the true average miles per gallon of the new compact SUV.
Question 3b (4 pts)
Write the null and alternative hypotheses clearly in symbolic notation.
Solution

\[H_0: \mu_\text{MPG} = 40 \qquad \text{vs.} \qquad H_a: \mu_\text{MPG} < 40\]
Question 3c (8 pts)
The consumer advocacy group reports the corresponding confidence bound at 38.43. Using this confidence bound and the provided critical values, deduce the value of the sample standard deviation. Provide your answer rounded to four decimal places.
qnorm(0.03, lower.tail=FALSE) # = 1.88
qt(0.03, df=53, lower.tail=FALSE) # = 1.92
qnorm(0.015, lower.tail=FALSE) # = 2.17
qt(0.015, df=53, lower.tail=FALSE) # = 2.23
Solution
Since \(\sigma\) is unknown and this is a one-sided test at \(\alpha = 0.03\) with \(df = n - 1 = 53\), the 97% upper confidence bound takes the form:

\[\mu_\text{MPG} < \bar{x} + t_{0.03,\,53}\,\frac{s}{\sqrt{n}}\]

The critical value is qt(0.03, df=53, lower.tail=FALSE) \(= 1.92\).

Substituting the known values:

\[38.43 = 37.8 + 1.92\,\frac{s}{\sqrt{54}}\]

Solving for \(s\):

\[s = \frac{(38.43 - 37.8)\sqrt{54}}{1.92} = \frac{0.63 \times 7.3485}{1.92} \approx 2.4112\]
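The back-solve can be verified in R, using the rounded critical value 1.92 given above:

```r
# Recover the sample standard deviation from the reported upper bound.
s <- (38.43 - 37.8) * sqrt(54) / 1.92
round(s, 4)  # 2.4112
```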
Question 3d (2 pts)
Is the reported confidence bound of 38.43 in part (c) an upper bound or a lower bound? Please mark the correct option.
(A) Upper Bound (B) Lower Bound
Solution
Answer: (A) Upper Bound
The hypothesis test is left-tailed (\(H_a: \mu_\text{MPG} < 40\)), and duality between hypothesis tests and confidence regions requires a one-sided upper confidence bound — providing the largest plausible value for \(\mu_\text{MPG}\). The bound of 38.43 mpg lies above \(\bar{x} = 37.8\) mpg, consistent with an upper bound.
Question 3e (7 pts)
Using the results of parts (c) and (d), obtain the result of the hypothesis test and write the formal contextual conclusion.
Solution
The 97% upper confidence bound of 38.43 mpg is below the null value \(\mu_0 = 40\). By the duality between confidence bounds and hypothesis tests, this means the null value falls outside the 97% upper confidence bound, so we reject \(H_0\) at \(\alpha = 0.03\).
The data give support (\(p\text{-value} < 0.03\)) to the claim that the true average MPG of the new compact SUVs is less than 40 MPG.
Problem 4 (23 points) — Special Agent Gibbs: Veterans Mental Stability
Problem 4 Setup
Special Agent Gibbs decided to pursue his career in academia specializing in national security, post-traumatic stress, and investigation strategies. As part of his research, Gibbs requested access to a sensitive dataset containing information on veterans. Due to privacy and security considerations, the custodians of the dataset could not release it directly to Gibbs. Instead, they provided Gibbs with detailed descriptions of the available variables and asked him to submit clearly defined research questions. Their analyst team would then run analyses on the secure data and provide Gibbs with appropriate statistical summaries, test statistics, and supporting details. Assume none of the population standard deviations are known.
Question 4a (3 pts)
Five of Gibbs’ research questions happen to be lower-tailed hypotheses, \(H_a: \mu < \mu_0\). Which one of the following test statistics would be most likely to reject \(H_0\)? Assume the same degrees of freedom for all five test statistics.
(A) \(t_{ts} = -2.25\) (B) \(t_{ts} = -1.02\) (C) \(t_{ts} = 0.02\) (D) \(t_{ts} = 1.56\) (E) \(t_{ts} = 3.02\)
Solution
Answer: (A)
For a lower-tailed test, we reject \(H_0\) when the test statistic is sufficiently negative (far into the left tail). The \(p\)-value equals \(P(T \leq t_{ts})\), which is smallest when \(t_{ts}\) is most negative. Among the five options, \(t_{ts} = -2.25\) is the most negative, giving the smallest \(p\)-value and the greatest evidence to reject \(H_0\).
Question 4b (3 pts)
Gibbs believes that veterans’ financial status may vary by different factors, such as location, household composition, and health status, so he asked the analyst team if they can control these extraneous variables. Without direct access to the dataset himself, which of the following strategies is most appropriate for the analyst team to handle these extraneous variables based on Gibbs’ request?
(A) Discard any cases which are deemed extreme by Gibbs to obtain a consistent dataset.
(B) Partition the data into blocks (or strata) aligned with these extraneous variables, thereby reducing their confounding influence during analysis.
(C) Randomly select a small number of veterans broadly representative of the entire population, simplifying the analysis.
(D) Ignore extraneous variables since a small bias is acceptable if it results in reduced variance.
(E) Control all extraneous variables from the beginning by requiring veterans to live according to randomly assigned conditions.
Solution
Answer: (B)
Since the analyst team is working with an existing dataset (not a new experiment), experimental interventions like random assignment (E) are not possible. The most appropriate observational approach is to partition (block/stratify) the data by the known extraneous variables — location, household composition, and health status — which controls their confounding influence during analysis. Discarding extreme cases (A) would introduce bias; a small subsample (C) would reduce precision; ignoring confounders (D) would bias results.
Question 4c (3 pts)
Gibbs identified two variables: social isolation level (SIL, categorical) and mental stability score (MSS, numerical). SIL has four categories: socially active, somewhat socially active, somewhat isolated, and completely isolated. Specifically, Gibbs wants to see if the mean difference in MSS between somewhat isolated and completely isolated groups is greater than 20 points.
The analysts, acting on Gibbs’ request, paired 50 veterans from the somewhat isolated group with 50 veterans from the completely isolated group. Pairing was done based on demographics and veteran history to control possible extraneous factors. After matching, the difference in mental stability scores was computed for each pair as \(D = \text{Somewhat Isolated} - \text{Completely Isolated}\).
Which of the following hypothesis testing procedures is appropriate to answer Gibbs’ question?
(A) One-sample \(t\)-test
(B) Two-sample Independent \(t\)-test
(C) Two-sample Matched Pairs \(t\)-test
(D) None of the above
Solution
Answer: (C) Two-sample Matched Pairs \(t\)-test
The analysts explicitly paired veterans across the two groups based on demographics and veteran history, creating dependent pairs. With the difference \(D\) computed for each pair, the appropriate procedure is a matched pairs (paired) t-test. Note that while a paired t-test is mathematically equivalent to a one-sample t-test on the differences, the context here is explicitly a two-sample matched pairs design.
Question 4d (4 pts)
State clearly the null and alternative hypotheses using the appropriate mathematical notations.
Solution
With \(D = \text{Somewhat Isolated} - \text{Completely Isolated}\):

\[H_0: \mu_D = 20 \qquad \text{vs.} \qquad H_a: \mu_D > 20\]
Question 4e (3 pts)
Gibbs wants to find the \(p\)-value of a test statistic, \(t_{ts} = 2.14\). Which of the following R code statements returns the correct \(p\)-value?
# (A) pt(2.14, df = 50, lower.tail = FALSE)
# (B) pt(2.14, df = 50, lower.tail = TRUE)
# (C) pt(2.14, df = 49, lower.tail = FALSE)
# (D) pt(2.14, df = 49, lower.tail = TRUE)
# (E) pt(2.14, df = 99, lower.tail = FALSE)
# (F) pt(2.14, df = 99, lower.tail = TRUE)
Solution
Answer: (C) pt(2.14, df = 49, lower.tail = FALSE)
This is a right-tailed test (\(H_a: \mu_D > 20\)), so the \(p\)-value
is the upper-tail probability, requiring lower.tail = FALSE. The degrees of
freedom for a paired t-test on \(n = 50\) pairs is \(df = n - 1 = 49\).
(A) and (B) use df = 50 — off by one; a paired test on \(n = 50\) pairs has \(df = 49\).
(B), (D), and (F) use lower.tail = TRUE — wrong tail direction for a right-tailed test.
(E) and (F) use df = 99 — that would apply to an unpaired two-sample test.
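As a quick check of option (C) in R:

```r
# Right-tailed paired t-test p-value, n = 50 pairs => df = 49.
p <- pt(2.14, df = 49, lower.tail = FALSE)
round(p, 5)  # should match the 0.01868 quoted in part (f)
```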
Question 4f (7 pts)
The calculated \(p\)-value is 0.01868. At a significance level of \(\alpha = 0.02\), state your formal decision and conclusion in the context of the problem.
Solution
Decision: The \(p\)-value \(= 0.01868 \leq 0.02 = \alpha\), therefore we have sufficient evidence to reject \(H_0\).
Conclusion: The data give some support (\(p\)-value \(= 0.01868\)) to the claim that the true average mental stability score (MSS) is higher for veterans classified as Somewhat Isolated versus Completely Isolated — specifically, that the mean difference exceeds 20 points.
Problem 5 (31 points) — Delivery Courier Mileage: California vs. Non-California
Problem 5 Setup
The rapid growth of food delivery services has dramatically increased the number of two-wheeled couriers on city streets. While this surge has benefited local businesses and consumers, it has also led to increased traffic congestion and safety concerns, prompting some cities to consider stricter delivery regulations (MassLive, 2025).
California cities, in particular, are often at the forefront of adopting innovative transportation policies and stricter environmental and safety regulations. To better understand the potential impact of these regulations and inform future policy decisions, a food delivery analytics firm seeks to compare the total monthly mileage traveled by two-wheeled delivery couriers employed by a major third-party platform. Mileage is measured as the combined total distance traveled by all couriers within each city over a four-week period in late summer 2024. Using the summary statistics below, the firm aims to determine if there is a significant difference in the average total monthly mileage between California and non-California cities.
| Statistic | California cities | Non-California cities |
|---|---|---|
| \(n\) | 11 | 11 |
| \(\bar{x}\) | 752,962 | 824,387.6 |
| \(s\) | 697,033.6 | 918,850.2 |
Question 5a (4 pts)
State the assumptions necessary for the hypothesis test on the average total monthly mileage between California and non-California cities.
Solution
As this is a two-sample independent procedure, we have the following assumptions:
Independence:
The cities in each group (California and non-California) constitute independent, random samples.
Total monthly mileage measurements within each city are independent of one another.
Normality: The total monthly mileage in each group is approximately normally distributed. This is especially important given the small sample sizes (\(n = 11\) per group).
Question 5b (4 pts)
As part of your summer internship with the food delivery analytics firm, you have been asked to determine if there is a significant difference in average total monthly mileage between California and non-California cities. Using \(\alpha = 0.03\), perform the first two steps of the four-step hypothesis test procedure.
Solution
Step 1 — Parameters of interest:
Let \(\mu_\text{California}\) and \(\mu_\text{non-California}\) be the parameters of interest, representing the population mean total monthly mileage of two-wheeled couriers across California and non-California cities, respectively.
Step 2 — Hypotheses:

\[H_0: \mu_\text{California} - \mu_\text{non-California} = 0 \qquad \text{vs.} \qquad H_a: \mu_\text{California} - \mu_\text{non-California} \neq 0\]
Question 5c (8 pts)
Calculate the appropriate test statistic for comparing the mean total mileage between California and non-California cities. Clearly show the steps and formula used.
Solution
Since the population standard deviations are unknown and not assumed equal, we use the Welch two-sample t-test:

\[t_{ts} = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}\]

Substituting values:

\[t_{ts} = \frac{752{,}962 - 824{,}387.6}{\sqrt{\dfrac{697{,}033.6^2}{11} + \dfrac{918{,}850.2^2}{11}}} = \frac{-71{,}425.6}{347{,}738.4} \approx -0.2054\]
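As a check, the Welch statistic can be computed directly from the summary table in R:

```r
# Welch two-sample test statistic from the summary statistics.
xbar1 <- 752962; xbar2 <- 824387.6
s1 <- 697033.6; s2 <- 918850.2; n1 <- 11; n2 <- 11
se <- sqrt(s1^2 / n1 + s2^2 / n2)   # ~347,738.4
t_ts <- (xbar1 - xbar2) / se
round(t_ts, 4)  # -0.2054
```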
Question 5d (3 pts)
Without additional assumptions, which of the following represents the most appropriate degrees of freedom for the test statistic found in part (c)?
(A) 10 (B) 20 (C) 18.646 (D) None of the above
Solution
Answer: (C) 18.646
The Welch–Satterthwaite approximation gives:

\[\nu = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}} \approx 18.646\]

(A) 10 would be the degrees of freedom for one group alone (\(n - 1\)).
(B) 20 would be the pooled degrees of freedom, which assumes equal variances.
(C) 18.646 is the Welch–Satterthwaite result — correct for unequal variances.
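The Welch–Satterthwaite degrees of freedom can be reproduced in R from the same summary statistics:

```r
# Welch-Satterthwaite approximate degrees of freedom.
s1 <- 697033.6; s2 <- 918850.2; n1 <- 11; n2 <- 11
v1 <- s1^2 / n1; v2 <- s2^2 / n2
nu <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
round(nu, 3)  # 18.646
```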
Question 5e (3 pts)
Select the correct R code and output that provides the correct critical value for constructing a 97% confidence interval. Assume \(\nu\) represents the correct degrees of freedom if appropriate.
# (A) qnorm(p=0.03/2, lower.tail = FALSE) -> 2.17009
# (B) qt(p=0.03/2, df=nu, lower.tail = FALSE) -> 2.349237
# (C) 2*qnorm(p=0.03, lower.tail = FALSE) -> 3.761587
# (D) 2*qt(p=0.03, df=nu, lower.tail = FALSE) -> 4.004843
Solution
Answer: (B) qt(p=0.03/2, df=nu, lower.tail = FALSE) → 2.349237
Since \(\sigma\) is unknown we use a t-distribution, not the normal. For a 97%
confidence interval, the two-sided error is \(\alpha = 0.03\), so we need the
upper \(\alpha/2 = 0.015\) critical value:
qt(p=0.03/2, df=nu, lower.tail=FALSE) with \(\nu = 18.646\).
(A) uses the z-distribution — incorrect since \(\sigma\) is unknown.
(C) and (D) double the quantile, which is not how critical values for CIs are computed.
Question 5f (6 pts)
Using the summary statistics provided and the critical value, calculate the 97% confidence interval for the difference in mean total monthly mileage (California minus non-California). Clearly show all necessary formulas and steps.
Solution
Point estimate:

\[\bar{x}_1 - \bar{x}_2 = 752{,}962 - 824{,}387.6 = -71{,}425.6\]

Standard error (from part c):

\[SE = \sqrt{\frac{697{,}033.6^2}{11} + \frac{918{,}850.2^2}{11}} \approx 347{,}738.4\]

97% CI:

\[-71{,}425.6 \pm 2.349237 \times 347{,}738.4 \approx -71{,}425.6 \pm 816{,}919.7 \;\Rightarrow\; (-888{,}345.3,\ 745{,}494.1)\]
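The interval can be reproduced in R (tiny differences in the last digit come from rounding intermediate values by hand):

```r
# 97% Welch confidence interval for the difference in means.
xbar1 <- 752962; xbar2 <- 824387.6
s1 <- 697033.6; s2 <- 918850.2; n1 <- 11; n2 <- 11
se <- sqrt(s1^2 / n1 + s2^2 / n2)
tcrit <- 2.349237   # qt(0.03/2, df = 18.646, lower.tail = FALSE)
ci <- (xbar1 - xbar2) + c(-1, 1) * tcrit * se
ci  # approximately (-888345, 745494)
```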
Question 5g (3 pts)
Provide an accurate interpretation of the confidence interval in context, explaining what it indicates about the difference in average total monthly mileage between California and non-California cities.
Solution
We are 97% confident that the true difference in average total monthly mileage (California minus non-California) is captured by the interval \((-888{,}345.3,\; 745{,}494.1)\).
Since this interval includes both negative and positive values (including zero), it indicates that there is no clear evidence of a significant difference in the average monthly mileage between California and non-California cities at the \(\alpha = 0.03\) significance level.