STAT 350 — Exam 2 — Spring 2024

Exam Information

Course: STAT 350 — Introduction to Statistics
Semester: Spring 2024
Total Points: 105
Time Allowed: 60 minutes

Problem	Total Possible	Topic
Problem 1 (True/False, 2 pts each)	12	Sampling Distributions, Inference, Experimental Design
Problem 2 (Multiple Choice, 3 pts each)	9	Experimental Design, CLT, Two-Sample Inference
Problem 3	32	Paired t-test, Power Analysis
Problem 4	20	CLT, Sampling Distribution of \(\bar{X}\)
Problem 5	32	One-Sample t-test, Confidence Bound
Total	105

Problem 1 — True/False (12 points, 2 points each)

Question 1.1 (2 pts)

If a simple random sample is taken from a normally distributed population,

True or False: then the distribution of the sample means follows a normal distribution, regardless of the sample size.

Question 1.2 (2 pts)

If a simple random sample of size 2 or greater is taken from a normally distributed population,

True or False: then the variance of the sample mean is always greater than the population variance.

Question 1.3 (2 pts)

In a simulation run where differences arise from a normal distribution in a paired sample procedure, and 92% confidence intervals are constructed for the mean difference across 1000 independent sets of paired samples,

True or False: exactly 80 of these intervals will not contain the true mean difference.

Question 1.4 (2 pts)

When the significance level (\(\alpha\)) of a statistical test is reduced while holding all other factors constant,

True or False: the power of the test increases.

Question 1.5 (2 pts)

In a two-sample independent t-test using the Welch procedure to account for unequal variances between groups,

True or False: the test statistic is assumed to adhere to an exact t-distribution, provided the assumption of normality holds.

Question 1.6 (2 pts)

In a completely randomized experimental design,

True or False: random assignment of experimental units to treatments helps to minimize potential biases by helping to distribute extraneous variables more evenly across treatment groups.

Problem 2 — Multiple Choice (9 points, 3 points each)

Question 2.1 (3 pts)

In a randomized block design, when we block experimental units based on a specific characteristic, the primary objective is to:

(A) Increase the variability arising from extraneous variables by grouping similar experimental units into blocks, thereby enhancing the detection of treatment effects.

(B) Decrease the variability arising from extraneous variables by grouping similar experimental units into blocks, thereby enhancing the detection of treatment effects.

(C) Allocate treatments to experimental units across blocks in a manner that conceals the treatment identities from both the participants and the researchers.

(D) Equalize the allocation of treatments to experimental units within each block to facilitate the administrative convenience of the experiment.

(E) Balance the number of experimental units across blocks to primarily focus on the uniformity of treatment application without direct concern for extraneous or confounding variables.

Question 2.2 (3 pts)

Suppose a simple random sample of size 400 is taken from a skewed population with a known population mean of 200 units and a population standard deviation of 50 units. Which of the following statements is TRUE regarding the standard deviation of the sample mean?

(A) The standard deviation of the sample mean is equal to the population standard deviation, which is 50 units.

(B) The standard deviation of the sample mean cannot be accurately determined from the given information due to the population’s skewed distribution.

(C) The standard deviation of the sample mean, indicative of the sampling distribution’s variability, amounts to 0.125 units.

(D) The standard deviation of the sample mean is as large as 2500 units, indicative of the sampling distribution’s total variability.

(E) The standard deviation of the sample mean, indicative of the sampling distribution’s variability, amounts to 2.5 units.

Question 2.3 (3 pts)

When estimating the difference between two population means using confidence intervals, if a researcher incorrectly uses a pooled variance estimator under the false assumption of equal variances, despite the populations having unequal variances, how does this affect the margin of error for the confidence interval?

(A) The margin of error is unaffected, as the pooled estimator adjusts for variance differences.

(B) The margin of error decreases, reflecting an underestimated standard error due to the assumption violation.

(C) The margin of error increases, reflecting an overestimated standard error due to the assumption violation.

(D) The margin of error may inaccurately reflect the true variability, underestimating or overestimating it based on the sample sizes and actual variances.

(E) The margin of error becomes zero, indicating a failure of the pooled estimator to account for variance differences.

Problem 3 (32 points) — Marshmallow Study: Paired Design and Power

Problem 3 Setup

According to the famous Stanford marshmallow study, children’s ability to wait longer for rewards is positively correlated with their educational achievements. However, subsequent research has identified several factors, such as family background, home environment, and cultural differences, that might influence both a child’s waiting time for rewards and their educational achievements.

Question 3a (3 pts)

Which of the following statements is false?

(A) The positive correlation ensures causation because the marshmallow experiment is conducted before observing their educational achievements.

(B) The positive correlation might be insignificant after considering lurking variables.

(C) The underlying factors are lurking variables in the famous marshmallow study.

(D) None of the above

Question 3b (3 pts)

A researcher is investigating the connection between the birth order of children and their ability to wait longer for rewards. The study will involve 15 randomly chosen households within the Greater Lafayette area, each with at least two children. For each household, the first and second child will be included in the experiment.

In the study, which of the following variables is NOT indirectly controlled by the paired design?

Parent’s occupation
Child’s birth order
Cultural background
Home environment

Question 3c (3 pts)

The researcher believes that the first child will wait longer for the rewards than the second child. Previous research establishes that the standard deviation of the waiting time difference between the first and second children is 37 seconds, and consistently indicates that these differences follow a normal distribution. The researcher sets a significance level of \(\alpha = 0.05\) and considers a true mean difference of 15 seconds between the first and second child's waiting times to be substantively meaningful.

Select the researcher’s hypotheses. Define \(D = \text{First Child Wait Time} - \text{Second Child Wait Time}\).

\(H_0: \mu_D \leq 0;\ H_a: \mu_D > 0\)
\(H_0: \mu_D \geq 0;\ H_a: \mu_D < 0\)
\(H_0: \mu_D = 0;\ H_a: \mu_D \neq 0\)
None of the above

Question 3d (8 pts)

In the power graph below, clearly label and shade the region that represents the probability of a Type II error, \(\beta\). Additionally, provide the values of \(\Delta_0\) and \(\Delta_a\), representing the mean difference under the null hypothesis and the meaningful difference under the alternative hypothesis, respectively.

Blank power analysis graph showing two overlapping normal curves centered at unlabeled Delta_0 and Delta_a values. The cutoff value d-bar_cutoff is marked with a dashed vertical line. Students must label Delta_0, Delta_a, and shade the Type II error beta region.

Question 3e (8 pts)

Select the appropriate critical value for determining the cutoff value and calculate the cutoff value \(\bar{d}_\text{cutoff}\). Your answer must indicate both the critical value and the cutoff value.

qnorm(0.01, lower.tail = FALSE)
[1] 2.326348

qnorm(0.025, lower.tail = FALSE)
[1] 1.959964

qnorm(0.05, lower.tail = FALSE)
[1] 1.644854

qnorm(0.95, lower.tail = FALSE)
[1] -1.644854
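
Note on the R convention used above: the following is a minimal sketch, assuming an illustrative tail probability of 0.10 that is not a value from this problem. `qnorm(p, lower.tail = FALSE)` returns the point with area p to its right under the standard normal curve, which by symmetry is the negative of `qnorm(p)`.

```r
p <- 0.10  # illustrative upper-tail area (assumption, not taken from this exam)

upper_crit <- qnorm(p, lower.tail = FALSE)  # point with area p to its right
lower_crit <- qnorm(p)                      # point with area p to its left

print(upper_crit)
print(lower_crit)

# By symmetry of the standard normal, the two quantiles differ only in sign
print(isTRUE(all.equal(upper_crit, -lower_crit)))
```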

Question 3f (7 pts)

Use the cutoff value calculated in part (e) to calculate the power of the test. Clearly set up the probability to be calculated and show the mathematical steps required to obtain the power. Then select the correct code and output for computing the power of this test from the table below.

pnorm(1.239448,  lower.tail = FALSE)   # [1] 0.1075898
pnorm(1.239448,  lower.tail = TRUE)    # [1] 0.8924102
pnorm(0.07472524, lower.tail = FALSE)  # [1] 0.4702167
pnorm(0.07472524, lower.tail = TRUE)   # [1] 0.5297833
pnorm(0.3898356, lower.tail = FALSE)   # [1] 0.3483291
pnorm(0.3898356, lower.tail = TRUE)    # [1] 0.6516709
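
Note on reading the paired output above: the following is a minimal sketch, assuming an illustrative value z = 1 that is not taken from this problem. For any z, `lower.tail = TRUE` gives the area to the left of z, `lower.tail = FALSE` gives the area to the right, and the two areas sum to 1.

```r
z <- 1  # illustrative z-value (assumption, not taken from this exam)

left_area  <- pnorm(z, lower.tail = TRUE)   # P(Z <= z)
right_area <- pnorm(z, lower.tail = FALSE)  # P(Z > z)

print(c(left_area, right_area))

# The two tail areas are complements of each other
print(isTRUE(all.equal(left_area + right_area, 1)))
```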

Problem 4 (20 points) — Sampling Distribution of Undergraduate Weights

Problem 4 Setup

Suppose the weights of undergraduate students at Lumia University come from a slightly positively skewed population with an average weight of 180 lb and a standard deviation of 20 lb. A researcher randomly selects a sample of 40 undergraduate students from this population.

Question 4a (4 pts)

What is the distribution of the mean weight of the 40 undergraduate students? Clearly specify the name of the distribution and its parameters.

Question 4b (8 pts)

What is the probability of the mean weight of the 40 undergraduate students being less than 175 lb? Clearly set up the probability to be calculated and show the mathematical steps required to obtain the probability. You may use the following R output in your calculations.

pnorm(-0.25, lower.tail = TRUE)    # 0.4012937
pnorm(-0.25, lower.tail = FALSE)   # 0.5987063
pnorm(-1.58, lower.tail = TRUE)    # 0.05705343
pnorm(-1.58, lower.tail = FALSE)   # 0.9429466
pnorm(-2.53, lower.tail = TRUE)    # 0.005703126
pnorm(-2.53, lower.tail = FALSE)   # 0.9942969

Question 4c (8 pts)

What is the 99th percentile of the mean weight of the 40 undergraduate students? You may use the following R output in your calculations. Show your work.

qnorm(0.01/2, lower.tail = FALSE)   # 2.575829
qnorm(0.01,   lower.tail = FALSE)   # 2.326348
qnorm(0.04/2, lower.tail = FALSE)   # 2.053749
qnorm(0.04,   lower.tail = FALSE)   # 1.750686

Problem 5 (32 points) — Pharmaceutical Regulation Hypothesis Test

Problem 5 Setup

For all pharmaceutical companies producing pills of Medicine A, federal regulations mandate that the ratio of the average weight to a fixed field standard value must not exceed 1.2. For Medicine A, this fixed field standard is 210 mg.

Question 5a (4 pts)

Formulate the government regulation requirement as an inequality using the average weight (\(\mu\)) of the pills. The inequality should be structured to place the average weight of the pills on one side by itself and a numerical value on the other side. How does this value relate to the hypothesis we formulate regarding the average weight of the pills?

Question 5b (4 pts)

Placebo-Potion Pharmaceuticals was selected during the initial screening for a formal hypothesis test, with \(n = 150\) and \(\alpha = 0.01\), of whether the average pill weight exceeds the regulatory limit, which would indicate failure to meet the regulation. State the appropriate null and alternative hypotheses.

Question 5c (18 pts)

Placebo-Potion Pharmaceuticals collected an SRS from their production lines accordingly, and found that the sample mean was 256.5 mg and the sample standard deviation was 25 mg.

Part i (6 pts): Compute the appropriate test statistic. Show all work.

Part ii (3 pts): Select the appropriate code to compute the \(p\)-value from the table below.

# (A) pnorm(test_statistic)
# (B) pt(test_statistic, df = 150)
# (C) pt(test_statistic, df = 149)
# (D) pnorm(test_statistic, lower.tail = FALSE)
# (E) pt(test_statistic, df = 150, lower.tail = FALSE)
# (F) pt(test_statistic, df = 149, lower.tail = FALSE)
# (G) pnorm(abs(test_statistic), lower.tail = FALSE)
# (H) pt(abs(test_statistic), df = 150, lower.tail = FALSE)
# (I) pt(abs(test_statistic), df = 149, lower.tail = FALSE)

Part iii (9 pts): The \(p\)-value was found to be 0.0145 using the appropriate test statistic and code from above. State the decision and provide a formal conclusion about Placebo-Potion Pharmaceuticals’ compliance with the regulation based on this hypothesis test.

Question 5d (6 pts)

In addition to the hypothesis test, the company was instructed to establish a confidence region for the true mean weight of the pills. Based on the results of the hypothesis test, which confidence region selection aligns most closely with those findings?

Part i (3 pts) — Type:

Confidence interval
Lower confidence bound
Upper confidence bound

Part ii (3 pts) — Interval/Bound Values:

(251.17, 261.83)
(253.17, 259.83)
\((251.6997,\ \infty)\)
\((253.6997,\ \infty)\)
\((-\infty,\ 261.3003)\)