STAT 350 — Exam 2 — Fall 2025

Exam Information

Course: STAT 350 — Introduction to Statistics
Semester: Fall 2025
Total Points: 105
Time Allowed: 60 minutes

Problem	Total Possible	Topic
Problem 1 (True/False, 2 pts each)	12	CLT, Sampling Distributions, Experimental Design, Inference
Problem 2 (Multiple Choice, 3 pts each)	15	Estimation Bias, Estimator Properties, Two-Sample Procedures, Power
Problem 3	27	Two-Sample Paired t-test
Problem 4	28	One-Sample z-test, Power Analysis, Sample Size
Problem 5	23	Sampling Distributions, Normal Population
Total	105

Problem 1 — True/False (12 points, 2 points each)

Question 1.1 (2 pts)

Assume that \(X_1, X_2, \cdots, X_n\) is a random sample from a Cauchy distribution, which has an undefined expected value and variance (i.e., \(E[X_i]\) and \(\text{Var}(X_i)\) are not finite).

True or False: Performing a one-sample hypothesis test for the population mean (\(\mu\)) is valid if the sample size (\(n\)) is sufficiently large.

Question 1.2 (2 pts)

The lengths of pregnancies are normally distributed with a mean of 268 days and a standard deviation of 15 days.

True or False: The probability that a randomly selected pregnant woman’s pregnancy length is less than 265 days is larger than the probability that the mean pregnancy length of a random sample of 40 pregnant women is less than 265 days.

Question 1.3 (2 pts)

In experimental design, researchers often encounter extraneous variables that may influence the response variable alongside the factor of interest.

True or False: In a Randomized Block Design (RBD), blocks are used to control the factor of interest, while randomization controls extraneous variables.

Question 1.4 (2 pts)

A researcher conducts a hypothesis test and obtains a \(p\)-value of 0.03.

True or False: This means there is a 3% probability that the null hypothesis is true.

Question 1.5 (2 pts)

A 95% confidence interval for the population mean \(\mu\) is constructed from sample data.

True or False: A sample mean \(\bar{x}\) falls within the 95% confidence interval for all possible samples from the population.

Question 1.6 (2 pts)

A researcher wants to estimate the average income in a city with diverse neighborhoods.

True or False: In a stratified random sample, the city is divided into neighborhoods (strata), and then a few complete neighborhoods are randomly selected and all residents within those neighborhoods are surveyed.

Problem 2 — Multiple Choice (15 points, 3 points each)

Question 2.1 (3 pts)

A student proposes an estimator for an unknown parameter \(\theta\). The estimator \(\hat{\theta}\) has expected value \(E[\hat{\theta}] = \left(\dfrac{n}{n-1}\right)\theta + 5\), where \(n\) is the sample size. What is the exact bias (for finite \(n\)) and the asymptotic bias as \(n \to \infty\) of this estimator?

(A) Exact bias \(= 5\); Asymptotic bias \(= 0\)
(B) Exact bias \(= \left(\dfrac{1}{n-1}\right)\theta\); Asymptotic bias \(= 5\)
(C) Exact bias \(= \left(\dfrac{1}{n-1}\right)\theta + 5\); Asymptotic bias \(= 5\)
(D) Exact bias \(= \left(\dfrac{1}{n-1}\right)\theta + 5\); Asymptotic bias \(= 0\)
(E) Exact bias \(= \left(\dfrac{n}{n-1}\right)\theta + 5\); Asymptotic bias \(= \theta\)

Question 2.2 (3 pts)

Three estimators, \(\hat{\theta}_A\), \(\hat{\theta}_B\), \(\hat{\theta}_C\), are constructed for an unknown target parameter \(\theta\), and their sampling distributions are visualized in the graph below. Which of the following statements is TRUE about the estimators?

Graph showing sampling distributions of three estimators A, B, and C for a target parameter theta marked by a dashed vertical line. Curve A (red, wide and flat) is centered well to the left of theta, indicating high variance and negative bias. Curve B (green, tall and narrow) is centered to the right of theta, indicating low variance but positive bias. Curve C (blue, moderate width) is centered at theta, indicating unbiased with moderate variance.

(A) \(\hat{\theta}_B\) is preferred over \(\hat{\theta}_A\) because \(\hat{\theta}_B\) has a smaller variance.

(B) \(\hat{\theta}_A\) is preferred over \(\hat{\theta}_C\) if \(\hat{\theta}_A\) has a smaller bias.

(C) \(\hat{\theta}_C\) is preferred over \(\hat{\theta}_A\) even if \(\hat{\theta}_A\) has a smaller bias.

(D) On repeated samples, \(\hat{\theta}_A\) values hardly vary around the true parameter.

(E) The best estimate can be determined only after obtaining realized values.

Question 2.3 (3 pts)

Two fertilizers are tested on different plots to compare their effects on crop yield. Summary statistics: \(n_1 = 22\), \(s_1 = 19.5\) kg/hectare and \(n_2 = 20\), \(s_2 = 4.7\) kg/hectare. Which statistical inference procedure is most appropriate for comparing mean yields?

(A) One-sample \(t\)-procedures
(B) Two-sample paired \(t\)-procedures
(C) Pooled two-sample independent \(t\)-procedures
(D) Welch two-sample independent \(t\)-procedures

Question 2.4 (3 pts)

A researcher wants to test the effect of a new type of feed on the weight gain of chickens. The researcher has 100 chickens, housed in 10 different coops (10 chickens per coop). The researcher knows that conditions (like temperature and lighting) vary slightly between coops, which might affect weight gain. To account for this, the researcher randomly assigns 5 chickens within each coop to the new feed and the other 5 to the standard feed.

Which experimental design technique is demonstrated by separating the chickens by coop before assigning the feed?

(A) Completely Randomized Design

(B) Randomized Block Design

(C) Simple Random Sample

(D) Matched Pairs Design

(E) Stratified Random Sampling

Question 2.5 (3 pts)

Chloride deposits are markers for early Mars’ aqueous past with important implications for our understanding of Mars’ climate and habitability. Purdue scientists are in the process of investigating high-resolution image surfaces of 33 chloride deposits from the southern highlands of Mars. Researchers from a different university have claimed that the mean diameter of chloride deposits is 1650 m with a standard deviation of 779.42 m. Studies of geological features suggest that the diameters of natural deposits tend to follow approximately symmetric distributions. Based on the Central Limit Theorem and assuming the other researchers’ claim correctly describes the population, which of the following statements is incorrect?

(A) The standard deviation of the sampling distribution of \(\bar{X}\) for Purdue investigations should be 779.42 m.

(B) The mean of the sampling distribution of \(\bar{X}\) for Purdue investigations is 1650 m.

(C) The sampling distribution of \(\bar{X}\) for Purdue investigations is approximately normal.

(D) We cannot assume the population distribution of deposit diameters is exactly normal.

Problem 3 (27 points) — Coffee Roasting Machine Moisture Test

Problem 3 Setup

A coffee company is testing a new, faster roasting machine. The old machine roasts beans to a target mean moisture level of 8.0%. The company suspects the new machine (N) produces a different mean moisture level than the old machine (O).

They conduct an experiment. They take 16 batches of the same type of green coffee bean. For each batch, they split it in half, roasting one half with the new machine and the other half with the old machine. The moisture level for each roasted half is recorded.

The company calculates the difference for each batch: \(D = \text{Moisture}_N - \text{Moisture}_O\). The 16 differences have a sample mean of 0.25% and a sample standard deviation of 0.60%. The researchers have verified that the distribution of differences is approximately normal.

Question 3a (2 pts)

Which testing procedure is appropriate for this experiment?

(A) Two-sample independent \(t\)-test
(B) Two-sample paired \(t\)-test

Question 3b (4 pts)

Explain what characteristic(s) in the experimental design motivated your choice of testing procedure in part (a).

Question 3c (2 pts)

Provide the first two steps of the four-step hypothesis testing procedure.

Question 3d (10 pts)

Calculate the test statistic for this test. Show your work.

Question 3e (3 pts)

Select the appropriate R code below to compute the \(p\)-value.

# (A) pt(test_statistic, df=15, lower.tail=TRUE)
# (B) 2*pt(abs(test_statistic), df=15, lower.tail=TRUE)
# (C) 2*pt(abs(test_statistic), df=15, lower.tail=FALSE)
# (D) pt(test_statistic, df=15, lower.tail=FALSE)
# (E) pt(test_statistic, df=25.8734, lower.tail=TRUE)
# (F) 2*pt(abs(test_statistic), df=25.8734, lower.tail=TRUE)
# (G) 2*pt(abs(test_statistic), df=25.8734, lower.tail=FALSE)
# (H) pt(test_statistic, df=25.8734, lower.tail=FALSE)

Question 3f (6 pts)

The \(p\)-value for the correct test was found to be 0.1805. Using a significance level of \(\alpha = 0.1\), state your formal decision and write a conclusion in the context of the problem.

Note on p-value ⚠️

The \(p\)-value reported in the solution key (0.1805) does not match the computed test statistic. With \(t_\text{TS} = 1.6667\) and \(df = 15\), the correct \(p\)-value is \(2 \times P(T_{15} > 1.6667) = \mathbf{0.1163}\). Both values lead to the same decision (fail to reject at \(\alpha = 0.10\)), so the conclusion is unaffected. The solution below uses the key’s reported value of 0.1805.
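
The arithmetic in this note can be checked with a short R sketch. It uses only the summary statistics from the Problem 3 setup (mean difference 0.25, standard deviation 0.60, 16 batches) and assumes a null mean difference of 0. The variable names are illustrative and not part of the exam.

```r
# Summary statistics from the Problem 3 setup
d_bar <- 0.25   # sample mean of the differences (%)
s_d   <- 0.60   # sample standard deviation of the differences (%)
n     <- 16     # number of paired batches

# Paired t statistic, assuming a null mean difference of 0
t_ts <- (d_bar - 0) / (s_d / sqrt(n))
df   <- n - 1

# Two-sided p-value: twice the upper-tail area beyond |t_ts|
p_value <- 2 * pt(abs(t_ts), df = df, lower.tail = FALSE)

print(round(t_ts, 4))
print(p_value)
```

Comparing the printed `p_value` with 0.10 reproduces the decision described in the note.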

Problem 4 (28 points) — Cranberry Farm pH Test

Problem 4 Setup

Jamie owns a small cranberry farm that primarily grows the Stevens variety, which is known for being sweeter and less tart than Early Black. Recently, he planted a small patch of Early Black cranberries for his daughter, who loves tart berries. After a few years, some regular customers have claimed that the Stevens cranberries have become more tart.

Jamie wants to test whether the presence of Early Black cranberries has caused an increase in tartness of his cranberries. Industry standards indicate that pH measurements from Stevens cranberry batches have an average pH of 2.6 with a standard deviation of 0.3. Research indicates that most people can detect a pH change of 0.2 (lower pH = more tart).

Question 4a (2 pts)

Provide the first two steps of the four-step hypothesis testing procedure.

Question 4b (2 pts)

Identify the mean and standard deviation of the sampling distribution of \(\bar{X}\) under the null and alternative hypotheses to detect a pH change of 0.2. Since the sample size is currently unknown, you may use \(n\) to represent it.

Question 4c (10 pts)

Calculate the minimum sample size required to achieve 90% statistical power for detecting a pH difference of 0.2 in the direction that would indicate increased tartness at \(\alpha = 0.04\). Show your work and clearly identify both: (i) the critical \(z\)-value for your significance level, and (ii) the critical \(z\)-value for your desired power.

qnorm(0.01, lower.tail = FALSE)  # 2.326348
qnorm(0.02, lower.tail = FALSE)  # 2.053749
qnorm(0.04, lower.tail = FALSE)  # 1.750686
qnorm(0.05, lower.tail = FALSE)  # 1.644854
qnorm(0.1,  lower.tail = FALSE)  # 1.281552
qnorm(0.2,  lower.tail = FALSE)  # 0.8416212

Question 4d (8 pts)

Assume Jamie collects a random sample of 25 cranberry batches and measures the average pH of each batch. Determine the cutoff pH value (\(\bar{x}_\text{cutoff}\)) corresponding to a significance level \(\alpha = 0.04\). Assume the population standard deviation remains \(\sigma = 0.3\) (unchanged from the industry standard). Show your work.

qnorm(0.04, lower.tail = FALSE)  # 1.750686

Question 4e (3 pts)

Which of the following lines of R code correctly computes the statistical power of the test?

# (A) pnorm((cutoff-2.6)/0.3,  lower.tail=TRUE)
# (B) pnorm((cutoff-2.6)/0.06, lower.tail=TRUE)
# (C) pnorm((cutoff-2.6)/0.06, lower.tail=FALSE)
# (D) pnorm((cutoff-2.4)/0.3,  lower.tail=FALSE)
# (E) pnorm((cutoff-2.4)/0.06, lower.tail=TRUE)
# (F) pnorm((cutoff-2.4)/0.06, lower.tail=FALSE)

Question 4f (3 pts)

Which of the following interventions will improve the statistical power of the test, assuming all other factors remain constant? Select all that apply.

(A) Plant more Early Black cranberries on the farm.
(B) Randomly sample more bags of cranberries from the farm.
(C) Move Early Black bushes to a greenhouse with its own beehive.
(D) Use a more precise pH measuring device.

Problem 5 (23 points) — South African Dairy Production

Problem 5 Setup

South Africa contributes significantly to the global dairy market. The most productive provinces are Limpopo and Mpumalanga. Limpopo cows average 27.3 liters of milk a day, with a standard deviation of 2.4 liters. For Mpumalanga cows, the mean daily production is 25.0 liters, with a standard deviation of 3.2 liters. Assume that daily milk production in each province follows a normal distribution.

Question 5a (2 pts)

True or False: We require sample sizes of at least 30 for the sampling distribution of the average daily milk production in both provinces to be approximately Normal.

Question 5b (8 pts)

A random sample of 20 Limpopo cows was selected for a study and an average of 31 liters of milk per day was recorded. How many standard deviations is this average away from its population mean?

Question 5c (10 pts)

A Mpumalanga farmer has 20 cows. There is a 50% chance each day that the total daily production from this herd is at most how many liters? Justify your answer.

Question 5d (3 pts)

A Limpopo farmer has 20 cows. What is the probability that the average milk production for this herd exceeds 38 liters a day?

# (A) pnorm((38-27.3)/2.4,      lower.tail=FALSE)
# (B) 1-pnorm((38-25)/3.2,      lower.tail=TRUE)
# (C) 1-pnorm((38-27.3)/0.5367, lower.tail=TRUE)
# (D) pt((38-27.3)/0.5367, df=19, lower.tail=FALSE)