Final Exam — Spring 2026: Worked Solutions

Exam Information

Course: STAT 350 — Introduction to Statistics
Semester: Spring 2026
Total Points: 150 + 15 (Extra Credit) = 165
Time Allowed: 120 minutes
Coverage: Cumulative (Chapters 1–13); primary emphasis on Chapters 12–13, with Chapters 1–7 weighted more heavily than Chapters 8–11 among the earlier material

Problem	Total Possible	Topic
Problem 1 (True/False, 2 pts each)	20	Linear Transformations, Independence, Poisson, Uniform, PDF Interpretation, ANOVA Methods, F-test, Regression Units, CI Duality, Normality Assumption
Problem 2 (Multiple Choice, 3 pts each)	18	Venn Diagrams, Exponential Memoryless, Conditional Normal, Bonferroni, ANOVA Assumptions, Scatter Plots/Correlation
Problem 3	24	Piecewise PDF/CDF, Variance of Transformation
Problem 4	26	Total Probability, Bayes’ Rule, Binomial Distribution
Problem 5	40	One-Way ANOVA, Tukey HSD
Problem 6	37	Simple Linear Regression, LINE Assumptions, Prediction, Confidence Interval
Total	150 (+ 15 Extra Credit)

—

Problem 1 — True/False (20 points, 2 pts each)

Question 1.1 (2 pts)

A sensor records temperatures in Celsius. A data analyst converts every observation to Fahrenheit using \(F = 1.8C + 32\).

The sample standard deviation of the Fahrenheit data is exactly 1.8 times the sample standard deviation of the Celsius data.
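
A quick numerical check of this scaling property, as a minimal sketch in R. The Celsius readings below are hypothetical values made up for illustration; they are not part of the exam.

```r
# Hypothetical Celsius readings (illustrative only, not exam data)
celsius <- c(18.2, 21.5, 19.9, 23.1, 20.4, 22.8)

# Linear transformation F = 1.8C + 32
fahrenheit <- 1.8 * celsius + 32

# Adding 32 shifts every value equally and leaves the spread unchanged;
# multiplying by 1.8 scales the standard deviation by |1.8|.
sd_ratio <- sd(fahrenheit) / sd(celsius)
print(sd_ratio)
```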

Question 1.2 (2 pts)

A mechanical engineer classifies each manufactured part as exactly one of three grades: \(A\) (premium), \(B\) (standard), or \(C\) (substandard). The historical probabilities of classification into each grade are \(P(A) = 0.3\), \(P(B) = 0.6\), and \(P(C) = 0.1\).

Events \(A\) and \(C\) are dependent.

Question 1.3 (2 pts)

During an NFL season, a sports analytics team tracks all reported injuries sustained by a single team per game quarter, including those not immediately apparent to viewers such as minor strains and aggravations recorded on the official injury report. It has been historically observed that the team averages approximately 1.4 reported injuries per quarter across all games. However, detailed records reveal that injury rates in the 4th quarter are consistently higher than in the 1st quarter, as cumulative fatigue increases injury risk throughout a game.

A single \(\text{Poisson}(\lambda = 1.4)\) model applied uniformly across all four quarters would violate an assumption of the Poisson process.

Question 1.4 (2 pts)

Two CNC machines produce bolts whose diameters (in mm) follow continuous uniform distributions. Machine A produces bolts with diameters following \(\text{Uniform}(9.5, 10.5)\) and Machine B produces bolts with diameters following \(\text{Uniform}(9.8, 10.8)\).

The probability that Machine A produces a bolt with a diameter between 9.8 and 10.0 is the same as the probability that Machine B does.
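
The two probabilities can be compared directly with `punif`. This is a small sketch using only the distributions stated in the question; the variable names are illustrative.

```r
# Machine A: Uniform(9.5, 10.5); Machine B: Uniform(9.8, 10.8)
# P(9.8 < X < 10.0) = F(10.0) - F(9.8) for each machine
p_a <- punif(10.0, min = 9.5, max = 10.5) - punif(9.8, min = 9.5, max = 10.5)
p_b <- punif(10.0, min = 9.8, max = 10.8) - punif(9.8, min = 9.8, max = 10.8)

# Both supports have width 1 and both contain the whole interval [9.8, 10.0],
# so each probability is (10.0 - 9.8) / 1.
print(c(machine_a = p_a, machine_b = p_b))
```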

Question 1.5 (2 pts)

A quality engineer models the lifespan of a sensor component using a continuous distribution. She computes \(f_X(500) = 0.003\), where \(f_X\) is the PDF of the lifespan in hours.

Therefore the engineer is certain that a lifespan of 500 hours is very rare.

Question 1.6 (2 pts)

A one-way ANOVA with 5 groups produces a significant \(F\)-test.

If the researcher wants to compare all 10 possible pairs of means, Dunnett’s method is more appropriate than Tukey’s method.

Question 1.7 (2 pts)

In a one-way ANOVA, if the between-group variability is large relative to the within-group variability,

then the \(F\)-test statistic will tend to be large, giving more evidence against the null hypothesis that all population means are equal.

Question 1.8 (2 pts)

A biostatistician plans to fit a simple linear regression line to predict male adults’ height from their femur bone length. Both variables are measured in inches in the original data. Before the regression line is fitted, both variables are converted to millimeters for wider use in medical fields.

The \(p\)-value of a regression slope remains constant even after the unit change.
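
One way to see the effect of a unit change is to fit the same model in both units. The sketch below uses simulated data; the sample size, coefficients, and seed are assumptions chosen for illustration and do not come from the exam.

```r
set.seed(350)  # arbitrary seed for reproducibility

# Hypothetical data in inches (simulated, not from the exam)
femur_in  <- rnorm(30, mean = 19, sd = 1)
height_in <- 25 + 2.3 * femur_in + rnorm(30, mean = 0, sd = 2)

# Same data converted to millimeters (1 inch = 25.4 mm)
femur_mm  <- 25.4 * femur_in
height_mm <- 25.4 * height_in

fit_in <- lm(height_in ~ femur_in)
fit_mm <- lm(height_mm ~ femur_mm)

# Row 2 of the coefficient table is the slope; column 4 is its p-value
p_in <- summary(fit_in)$coefficients[2, 4]
p_mm <- summary(fit_mm)$coefficients[2, 4]
print(c(inches = p_in, millimeters = p_mm))
```

Rescaling a variable by a positive constant rescales the slope estimate and its standard error by the same factor, so the \(t\) statistic for the slope is unchanged.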

Question 1.9 (2 pts)

Suppose that \((-3, 4)\) is a 95% confidence interval for \(\beta_0\) in a simple linear regression. For some constant \(c\), we test \(H_0\colon \beta_0 = c\) against \(H_a\colon \beta_0 \neq c\) at \(\alpha = 0.05\).

We reject the null hypothesis if \(c\) is within the confidence interval.

Question 1.10 (2 pts)

A materials engineer fits a simple linear regression to predict tensile strength (\(Y\)) from carbon content (\(X\)) in steel alloys. Before checking residuals, the engineer examines the distribution of \(Y\) values alone and finds them to be strongly right-skewed.

The normality assumption of the simple linear regression model is violated.

—

Problem 2 — Multiple Choice (18 points, 3 pts each)

Question 2.1 (3 pts)

A software quality team reviews 200 applications for two categories of defects before deployment. Let \(M = \{\text{has memory leak issues}\}\) and \(B = \{\text{has performance bottleneck issues}\}\).

The Venn diagram shows the number of applications in each region.

Venn diagram of software defect classification (n = 200). M (Memory leaks) circle contains 40 in M-only, 20 in the intersection with B. B (Bottlenecks) circle contains 30 in B-only. Outside both circles: 110 (Neither).

Which of the following counts is computed correctly? (\(|\cdot|\) denotes size of set)

1. \(|M' \cap B'| = 110\)
1. \(|M' \cap B| = 180\)
1. \(|M \cap B'| = 30\)
1. \(|M \cup B'| = 150\)
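
Each count can be rebuilt from the four disjoint regions of the Venn diagram. The sketch below uses the region counts from the diagram description; the variable names are illustrative.

```r
# Disjoint regions of the Venn diagram
m_only  <- 40   # memory leak only
both    <- 20   # memory leak and bottleneck
b_only  <- 30   # bottleneck only
neither <- 110  # no defect

total <- m_only + both + b_only + neither  # should equal 200

not_m_and_not_b <- neither                  # |M' and B'|
not_m_and_b     <- b_only                   # |M' and B|
m_and_not_b     <- m_only                   # |M and B'|
m_or_not_b      <- m_only + both + neither  # |M or B'|: everything except B-only

print(c(not_m_and_not_b, not_m_and_b, m_and_not_b, m_or_not_b))
```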

Question 2.2 (3 pts)

A semiconductor fabrication line experiences random equipment faults. The time (in hours) between faults follows an Exponential distribution with rate \(\lambda = 0.5\) per hour. The line has been running fault-free for at least 6 hours.

What is the probability it continues to run fault-free for at least two more hours?

1. 0.0183
1. 0.0498
1. 0.1353
1. 0.3679
1. 0.6321
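
The memoryless property can be checked numerically with `pexp`. This sketch computes the conditional probability from its definition and compares it with the unconditional probability of surviving 2 hours.

```r
rate <- 0.5  # faults per hour

# P(T > 8 | T > 6) = P(T > 8) / P(T > 6)
p_cond <- pexp(8, rate = rate, lower.tail = FALSE) /
  pexp(6, rate = rate, lower.tail = FALSE)

# Memoryless property: this should match P(T > 2) = exp(-0.5 * 2)
p_fresh <- pexp(2, rate = rate, lower.tail = FALSE)

print(round(c(conditional = p_cond, unconditional = p_fresh), 4))
```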

Question 2.3 (3 pts)

The response time (in milliseconds) of a web application follows a Normal distribution with \(\mu = 250\) and \(\sigma = 20\). A request is classified as “slow” if it takes more than 230 ms.

Given that a request is slow, what is the probability it takes more than 280 ms? Each option below is an R expression, with ratios written using the / operator.

1. pnorm(280, mean = 250, sd = 20, lower.tail = FALSE)
1. pnorm(280, mean = 250, sd = 20, lower.tail = FALSE) / pnorm(230, mean = 250, sd = 20, lower.tail = TRUE)
1. (pnorm(280, mean = 250, sd = 20, lower.tail = TRUE) - pnorm(230, mean = 250, sd = 20, lower.tail = TRUE)) / pnorm(230, mean = 250, sd = 20, lower.tail = FALSE)
1. pnorm(280, mean = 250, sd = 20, lower.tail = FALSE) / pnorm(230, mean = 250, sd = 20, lower.tail = FALSE)
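
Since a request over 280 ms is automatically over 230 ms, the conditional probability reduces to a ratio of two upper-tail probabilities. The sketch below evaluates that ratio; the variable names are illustrative.

```r
mu    <- 250
sigma <- 20

# P(X > 280 | X > 230) = P(X > 280 and X > 230) / P(X > 230)
#                      = P(X > 280) / P(X > 230)
p_gt_280 <- pnorm(280, mean = mu, sd = sigma, lower.tail = FALSE)
p_gt_230 <- pnorm(230, mean = mu, sd = sigma, lower.tail = FALSE)

p_cond <- p_gt_280 / p_gt_230
print(round(p_cond, 4))
```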

Question 2.4 (3 pts)

A one-way ANOVA with 4 groups is significant, and the researcher wants to compare all possible pairs of means using Bonferroni. What is the Bonferroni-adjusted significance level for each comparison if the family-wise error rate is set to 0.05?

1. 0.0500
1. 0.0250
1. 0.0125
1. 0.0083
1. 0.0050
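
The Bonferroni adjustment divides the family-wise error rate by the number of comparisons. A minimal sketch of the arithmetic:

```r
k          <- 4     # number of groups
alpha_fwer <- 0.05  # family-wise error rate

n_pairs   <- choose(k, 2)         # number of pairwise comparisons
alpha_per <- alpha_fwer / n_pairs  # per-comparison significance level

print(c(pairs = n_pairs, alpha_per_comparison = round(alpha_per, 4)))
```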

Question 2.5 (3 pts)

Which of the following is not required for a traditional one-way ANOVA?

1. Independent random samples from each population
1. Equal population variances across groups
1. Each of the \(k\) populations is normally distributed, or sample means are approximately normally distributed
1. Equal sample sizes in all groups

Question 2.6 (3 pts)

A dataset consists of one response variable and four explanatory variables (Variables 1–4). For each explanatory variable, a scatter plot is drawn against the response variable. Select the two explanatory variables whose sample correlation coefficient \(r\) with the response variable is closest to zero.

Four scatter plots of explanatory variables (X1–X4) against response Y. Variable 1 shows a strong positive linear trend. Variable 2 shows a strong positive linear trend with moderate spread. Variable 3 shows a U-shaped (quadratic) pattern with no linear trend. Variable 4 shows a scattered cloud with no clear linear pattern.

1. Variables 1 & 2
1. Variables 2 & 3
1. Variables 2 & 4
1. Variables 3 & 4
1. None — all four variables have strong sample correlations with the response variable.

—

Problem 3 Setup

A utility company, Earl Energy, is known for long customer service wait times. Let \(X\) denote the waiting time (in hours) until a customer is connected to the next available representative. The probability density function (pdf) and cumulative distribution function (cdf) of \(X\) are given below.

\[\begin{split}f_X(x) = \begin{cases} 0, & x < 0 \\[4pt] \dfrac{1}{2}\,x^2\,e^{-x}, & x \geq 0 \end{cases}\end{split}\]

\[\begin{split}F_X(x) = \begin{cases} 0, & x < 0 \\[4pt] 1 - \dfrac{1}{2}\,e^{-x}\,(x^2 + 2x + 2), & x \geq 0 \end{cases}\end{split}\]

Problem 3 — Piecewise PDF/CDF (24 points)

Question 3a (10 pts)

What is the probability that a customer waits for more than 30 minutes?
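
Because \(X\) is measured in hours, 30 minutes corresponds to \(x = 0.5\), and the required probability is \(1 - F_X(0.5)\). The sketch below evaluates it from the given CDF and cross-checks the result by integrating the given PDF numerically; the function names are illustrative.

```r
# CDF and PDF as given in the problem setup
cdf_x <- function(x) ifelse(x < 0, 0, 1 - 0.5 * exp(-x) * (x^2 + 2 * x + 2))
pdf_x <- function(x) ifelse(x < 0, 0, 0.5 * x^2 * exp(-x))

# P(X > 0.5) from the CDF
p_cdf <- 1 - cdf_x(0.5)

# Cross-check: integrate the PDF from 0.5 to infinity
p_int <- integrate(pdf_x, lower = 0.5, upper = Inf)$value

print(round(c(from_cdf = p_cdf, from_pdf = p_int), 4))
```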

Question 3b (14 pts)

Find the variance of a rate \(\dfrac{1}{X}\) given that \(E\!\left[\dfrac{1}{X}\right] = 0.5\).
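
The variance follows from \(\text{Var}(1/X) = E[1/X^2] - (E[1/X])^2\). The sketch below evaluates both expectations by numerical integration, using integrands simplified by hand so that the powers of \(x\) cancel; the variable names are illustrative.

```r
# E[1/X]   = integral of (1/x)   * (1/2) x^2 e^{-x} = integral of (1/2) x e^{-x}
# E[1/X^2] = integral of (1/x^2) * (1/2) x^2 e^{-x} = integral of (1/2) e^{-x}
e_inv    <- integrate(function(x) 0.5 * x * exp(-x), lower = 0, upper = Inf)$value
e_inv_sq <- integrate(function(x) 0.5 * exp(-x),     lower = 0, upper = Inf)$value

# Var(1/X) = E[1/X^2] - (E[1/X])^2
var_inv <- e_inv_sq - e_inv^2

print(round(c(E_inv = e_inv, E_inv_sq = e_inv_sq, Var_inv = var_inv), 4))
```

The first integral reproduces the value \(E[1/X] = 0.5\) given in the question, which serves as a check on the setup.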

—

Problem 4 Setup

A software team uses Claude Code to assist with code commits. Each commit is independently classified as either routine or novel. A routine commit is routed to Configuration A, and a novel commit is routed to Configuration B. Each commit independently has a 20% probability of being novel (Configuration B) and an 80% probability of being routine (Configuration A).

During a particular week, the team makes 25 commits. Each commit is automatically and independently tested for bugs. The probability that a Configuration A commit contains a bug is 0.05, and the probability that a Configuration B commit contains a bug is 0.30.

Problem 4 — Total Probability, Bayes’ Rule, Binomial (26 points)

Question 4a (10 pts)

A single commit is selected at random from the week’s 25 commits. What is the probability that it contains a bug?

Question 4b (10 pts)

A single commit from the week is found to contain a bug. What is the probability it was handled by Configuration B?
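
Parts (a) and (b) apply the law of total probability and Bayes’ rule to the probabilities in the setup. A minimal sketch of the arithmetic, with illustrative variable names:

```r
p_a <- 0.80  # P(routine commit, Configuration A)
p_b <- 0.20  # P(novel commit, Configuration B)

p_bug_given_a <- 0.05  # P(bug | Configuration A)
p_bug_given_b <- 0.30  # P(bug | Configuration B)

# Part (a): law of total probability
p_bug <- p_a * p_bug_given_a + p_b * p_bug_given_b

# Part (b): Bayes' rule
p_b_given_bug <- (p_b * p_bug_given_b) / p_bug

print(c(P_bug = p_bug, P_B_given_bug = p_b_given_bug))
```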

Question 4c (6 pts)

Find the expected number and standard deviation of bugs found in Configuration B during the week.
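
The sketch below rests on one reading of the question, stated here as an assumption: "bugs found in Configuration B" is taken to mean the number of the 25 commits that are both routed to Configuration B and contain a bug. Under that reading each commit independently counts with probability \(0.20 \times 0.30 = 0.06\), so the count is Binomial with \(n = 25\) and \(p = 0.06\).

```r
n_commits <- 25
p_b_bug   <- 0.20 * 0.30  # P(novel and buggy) for a single commit (assumed reading)

# Binomial mean and standard deviation
expected_bugs <- n_commits * p_b_bug
sd_bugs       <- sqrt(n_commits * p_b_bug * (1 - p_b_bug))

print(round(c(mean = expected_bugs, sd = sd_bugs), 4))
```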

—

Problem 5 Setup

An agronomist wants to compare the average plant height increase (in cm) produced by four fertilizers. A random sample of plants was assigned to each fertilizer treatment. The summary information is given below.

Group	\(n_i\)	\(\bar{x}_i\)	\(s_i\)
Fertilizer 1	10	17.80	2.05
Fertilizer 2	10	20.70	2.31
Fertilizer 3	10	19.25	2.18
Fertilizer 4	10	24.00	2.42

Problem 5 — One-Way ANOVA, Tukey HSD (40 points)

Question 5a (2 pts)

Using the summary statistics, assess whether the equal variance (homogeneity of variance) assumption appears reasonable. Show your work and state your conclusion clearly. For the rest of the problem, assume all other ANOVA assumptions are satisfied.
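
A common informal check compares the largest and smallest sample standard deviations. The threshold of 2 used below is an assumed rule of thumb; the exam text itself does not state which criterion is expected.

```r
s <- c(2.05, 2.31, 2.18, 2.42)  # sample standard deviations from the table

# Rule of thumb (assumed): equal variances are plausible if max(s) / min(s) <= 2
sd_ratio <- max(s) / min(s)
equal_var_plausible <- sd_ratio <= 2

print(round(sd_ratio, 4))
print(equal_var_plausible)
```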

Question 5b (14 pts)

Complete the ANOVA table below. The Factor Sum of Squares is 211.2687, the Error Sum of Squares is 181.3266, and the \(p\)-value is \(3.351353 \times 10^{-6}\).

Solution

With \(k = 4\) groups and \(N = 40\) total observations:

\[\bar{\bar{x}} = \frac{10(17.80) + 10(20.70) + 10(19.25) + 10(24.00)}{40} = \frac{817.5}{40} = 20.4375\]

Degrees of freedom:

Factor: \(k - 1 = 3\)
Error: \(N - k = 36\)
Total: \(N - 1 = 39\)

Mean Squares:

\[\text{MSA} = \frac{\text{SSA}}{k-1} = \frac{211.2687}{3} = 70.4229\]

\[\text{MSE} = \frac{\text{SSE}}{N-k} = \frac{181.3266}{36} = 5.0369\]

F-statistic:

\[F = \frac{\text{MSA}}{\text{MSE}} = \frac{70.4229}{5.0369} = 13.9815\]

Table 28 ANOVA Table
Source	df	Sum of Squares	Mean Square	\(F\)	\(\Pr(>F)\)
Factor	3	211.2687	70.4229	13.9815	\(3.35 \times 10^{-6}\)
Error	36	181.3266	5.0369
Total	39	392.5953
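
The table entries can be reproduced from the group summary statistics alone. The sketch below recomputes the sums of squares, mean squares, \(F\) statistic, and \(p\)-value; the variable names are illustrative.

```r
# Summary statistics from the problem setup
n    <- c(10, 10, 10, 10)
xbar <- c(17.80, 20.70, 19.25, 24.00)
s    <- c(2.05, 2.31, 2.18, 2.42)

k <- length(n)  # number of groups
N <- sum(n)     # total sample size

grand_mean <- sum(n * xbar) / N

ssa <- sum(n * (xbar - grand_mean)^2)  # factor (between-group) sum of squares
sse <- sum((n - 1) * s^2)              # error (within-group) sum of squares

msa  <- ssa / (k - 1)
mse  <- sse / (N - k)
f_ts <- msa / mse

# Upper-tail probability of the F distribution with (k - 1, N - k) df
p_value <- pf(f_ts, df1 = k - 1, df2 = N - k, lower.tail = FALSE)

print(round(c(SSA = ssa, SSE = sse, MSA = msa, MSE = mse, F = f_ts), 4))
print(p_value)
```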

Question 5c (4 pts)

Provide the first two steps of the four-step one-way ANOVA hypothesis testing procedure.

Question 5d (3 pts)

Which of the following R code statements returns the correct \(p\)-value?

1. pf(F_ts/2, df1=4, df2 = 36, lower.tail = FALSE)
1. pf(F_ts, df1=3, df2 = 36, lower.tail = FALSE)
1. pf(F_ts, df1=4, df2 = 37, lower.tail = TRUE)
1. pf(F_ts, df1=3, df2 = 36, lower.tail = TRUE)
1. 2*pf(F_ts, df1=4, df2 = 40, lower.tail = FALSE)

Question 5e (8 pts)

The calculated \(p\)-value is \(3.35 \times 10^{-6}\). At a significance level of \(\alpha = 0.05\), state your formal decision and conclusion in the context of the problem.

Question 5f (4 pts)

Based on your conclusion in part (e), is it appropriate to proceed to pairwise comparisons such as Tukey’s HSD? Briefly explain.

Question 5g (5 pts)

The following Tukey HSD results were obtained. Construct a graphical display based on these results, and briefly state which fertilizer appears to have the largest population mean plant height increase and provide justification.

Comparison	diff	lwr	upr	p adj
Fertilizer 2 − Fertilizer 1	2.9000	0.1969	5.6031	0.0306
Fertilizer 3 − Fertilizer 1	1.4500	−1.2531	4.1531	0.4807
Fertilizer 4 − Fertilizer 1	6.2000	3.4969	8.9031	0.0000
Fertilizer 3 − Fertilizer 2	−1.4500	−4.1531	1.2531	0.4807
Fertilizer 4 − Fertilizer 2	3.3000	0.5969	6.0031	0.0118
Fertilizer 4 − Fertilizer 3	4.7500	2.0469	7.4531	0.0002
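
The common half-width of the intervals in the table can be recovered from the error mean square with `qtukey`. This sketch assumes the table uses a 95% family-wise confidence level, which the exam text does not state explicitly.

```r
mse         <- 181.3266 / 36  # error mean square from the ANOVA table
n_per_group <- 10
k           <- 4
df_error    <- 36

# Studentized range critical value (95% family-wise level is an assumption)
q_crit <- qtukey(0.95, nmeans = k, df = df_error)

# Tukey HSD margin for two groups of equal size
margin <- (q_crit / sqrt(2)) * sqrt(mse * (1 / n_per_group + 1 / n_per_group))

# Example: interval for Fertilizer 4 minus Fertilizer 1
diff_41 <- 24.00 - 17.80
ci_41   <- c(lwr = diff_41 - margin, upr = diff_41 + margin)

print(round(margin, 4))
print(round(ci_41, 4))
```

An interval that excludes zero corresponds to a pair of fertilizers whose means differ significantly, which is the information needed to build the graphical display.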

—

Problem 6 Setup

A driving school wants to estimate the monthly car insurance premium for teenage drivers who are at least 16 but under 20 years old (all with minimum-coverage policies). They randomly selected 100 teen drivers and recorded each driver’s monthly premium (in dollars) and age (in years) at enrollment. Preliminary analysis indicates a linear relationship between monthly premium (\(y\)) and age (\(x\)). The school plans to fit a simple linear regression model to provide statistical estimates of monthly premiums based on age.

\(S_{xx} = 63.797\)	\(S_{xy} = -538.3375\)	\(S_{yy} = 24069\)
\(\bar{x} = 17.5535\)	\(\bar{y} = 204.2009\)	\(n = 100\)

Problem 6 — Simple Linear Regression (37 points)

Question 6a (10 pts)

The simple linear regression model requires four assumptions. Not all assumptions are needed at every stage of the analysis pipeline.

i. State the four assumptions.
ii. For each assumption, identify the stage at which it is first required: model fitting/estimation, statistical inference, or prediction intervals.
iii. Explain why prediction intervals are not robust to the violation of the assumption identified in ii.

Question 6b (8 pts)

Assuming all assumptions are met, compute the slope \(b_1\) and the intercept \(b_0\). Write the fitted regression line \(\hat{y}\).
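
The least-squares estimates follow from \(b_1 = S_{xy}/S_{xx}\) and \(b_0 = \bar{y} - b_1\bar{x}\). A minimal sketch using the summary statistics from the setup, with illustrative variable names:

```r
# Summary statistics from the problem setup
s_xx  <- 63.797
s_xy  <- -538.3375
x_bar <- 17.5535
y_bar <- 204.2009

b1 <- s_xy / s_xx          # slope
b0 <- y_bar - b1 * x_bar   # intercept

print(round(c(b0 = b0, b1 = b1), 4))
```

The negative slope indicates that the fitted monthly premium decreases as age increases over the sampled range.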

Question 6c (8 pts)

Predict the monthly premium for a 17-year-old teen and a 13-year-old teen, respectively. Discuss the statistical validity of these predictions.
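
The two predictions come from the same fitted line, but only one age lies inside the sampled range of 16 up to (not including) 20 years. The sketch below recomputes the coefficients from the summary statistics and flags the extrapolation; the helper name `predict_premium` is hypothetical.

```r
# Coefficients from the summary statistics in the setup
b1 <- -538.3375 / 63.797
b0 <- 204.2009 - b1 * 17.5535

# Hypothetical helper: fitted premium at a given age
predict_premium <- function(age) b0 + b1 * age

ages <- c(17, 13)
y_hat <- predict_premium(ages)

# The sample covers drivers at least 16 and under 20 years old
in_range <- ages >= 16 & ages < 20

print(data.frame(age = ages, y_hat = round(y_hat, 2), in_range = in_range))
```

A prediction at age 13 is an extrapolation, so the linear relationship observed in the data gives no support for it.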

Question 6d (8 pts)

Construct a 95% confidence interval for the mean monthly premium of all 17-year-old drivers. Use the R output below, along with the summary statistics from the problem introduction and your fitted regression model.

Residual standard error: 14.12 on 98 degrees of freedom
Multiple R-squared: 0.1887, Adjusted R-squared: 0.1804
F-statistic: 22.8 on 1 and 98 DF, p-value: 6.296e-06

`qt(0.025, 98, lower.tail=FALSE)` `[1] 1.984467`	`qf(0.025, 1, 98, lower.tail=FALSE)` `[1] 5.181823`
`qt(0.05, 98, lower.tail=FALSE)` `[1] 1.660551`	`qf(0.05, 1, 98, lower.tail=FALSE)` `[1] 3.938111`
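
The interval for the mean response at \(x^* = 17\) has the form \(\hat{y} \pm t_{\alpha/2,\,n-2}\; s\sqrt{1/n + (x^* - \bar{x})^2/S_{xx}}\). The sketch below assembles it from the summary statistics and the residual standard error in the R output; the variable names are illustrative.

```r
# Quantities from the problem setup and the R output
n     <- 100
s_xx  <- 63.797
s_xy  <- -538.3375
x_bar <- 17.5535
y_bar <- 204.2009
s_res <- 14.12     # residual standard error on 98 df

b1 <- s_xy / s_xx
b0 <- y_bar - b1 * x_bar

x_star <- 17
y_hat  <- b0 + b1 * x_star

# Standard error of the estimated mean response at x_star
se_mean <- s_res * sqrt(1 / n + (x_star - x_bar)^2 / s_xx)

# Two-sided 95% interval uses the upper 0.025 quantile of t with n - 2 df
t_crit <- qt(0.025, df = n - 2, lower.tail = FALSE)

ci <- c(lower = y_hat - t_crit * se_mean, upper = y_hat + t_crit * se_mean)
print(round(ci, 2))
```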

Question 6e (3 pts)

Which of the following statements is reasonable regarding an interval estimate for a new response \(y^*\) at a given value \(x^*\)?

1. The confidence interval for \(y^*\) becomes wider if \(x^*\) moves farther away from the sample mean \(\bar{x}\).
1. The confidence interval for \(y^*\) becomes narrower if \(x^*\) moves farther away from the sample mean \(\bar{x}\).
1. The prediction interval for \(y^*\) becomes wider if \(x^*\) moves farther away from the sample mean \(\bar{x}\).
1. The prediction interval for \(y^*\) becomes narrower if \(x^*\) moves farther away from the sample mean \(\bar{x}\).