STAT 350 — Exam 1 — Fall 2024

Exam Information

Course: STAT 350 — Introduction to Statistics
Semester: Fall 2024
Total Points: 105
Time Allowed: 60 minutes

Problem	Total Possible	Topic
Problem 1 (True/False, 2 pts each)	12	Data Types, Sampling, Poisson, CDF, z-Scores, Normal
Problem 2 (Multiple Choice, 3 pts each)	15	Probability, Random Variables, Distributions
Problem 3	20	Normal Distribution
Problem 4	26	Discrete Probability, Classification
Problem 5	32	Piecewise PDF, CDF, Expected Value
Total	105

Exam PDF

Solution PDF

The questions below reproduce the Fall 2024 Exam 1 in full accessible text. Each problem is followed by a complete worked solution. Point values reflect the actual exam.

Problem 1: True/False (12 points, 2 points each)

Indicate the correct answer by completely filling in the appropriate circle. If you indicate your answer in any other way, you may be marked incorrect.

Question 1.1 (2 pts)

Employees at a certain UPS branch recorded the type of mail each customer brought in over one month. They plan to present the data appropriately to the manager and discuss how to utilize empty space efficiently.

T or F: A histogram is appropriate to use because the variable is categorical.

Question 1.2 (2 pts)

A hardware manufacturer is about to ship 20,000 of its products to a client. To estimate the defect rate of this shipment, they randomly selected 100 products for a last-minute inspection. For each product, they assign a value of 0 if the product is good and 1 if it is defective. The defect rate is then calculated as the average of these 0’s and 1’s.

T or F: If the company had the resources to inspect all 20,000 products, the defect rate calculated using all 20,000 products would represent a sample statistic.

Question 1.3 (2 pts)

Suppose the number of visitors to a mall follows a Poisson distribution with an average rate of 45 visitors per 30 minutes.

T or F: In this mall, the variance in the number of visitors arriving between 2:00 PM and 3:00 PM is equal to the variance in the number of visitors arriving between 3:00 PM and 5:00 PM.
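A quick way to compare the two variances is sketched below. It assumes the standard Poisson-process property that the count over an interval has mean and variance both equal to the rate multiplied by the interval length. The helper name `poisson_variance` is illustrative and not part of the exam.

```python
# Rate from the problem statement: 45 visitors per 30 minutes.
RATE_PER_MINUTE = 45 / 30


def poisson_variance(minutes: float) -> float:
    """Variance of a Poisson count over an interval (equal to its mean)."""
    return RATE_PER_MINUTE * minutes


var_2_to_3 = poisson_variance(60)    # 2:00 PM to 3:00 PM (one hour)
var_3_to_5 = poisson_variance(120)   # 3:00 PM to 5:00 PM (two hours)
print(var_2_to_3, var_3_to_5)
```

Because the second interval is twice as long, its variance under this model is twice as large.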

Question 1.4 (2 pts)

Let \(V\) be a random variable with a probability density function \(f_V(v)\) that is nonzero only on the interval \([-5, -2)\). Let \(F_V(\cdot)\) denote the cumulative distribution function (CDF) of \(V\).

T or F: Then, \(F_V(c) = 1\) holds for any \(c > 0\).

Question 1.5 (2 pts)

A student scored 85 on two different math exams. For Exam 1, the mean score is 75 with a standard deviation of 5, and for Exam 2, the mean score is 70 with a standard deviation of 10.

T or F: The student performed better on Exam 1 compared to Exam 2.
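One way to compare the two results is to standardize each score, as in the sketch below. It assumes "performed better" means a higher z-score relative to the class; the function name `z_score` is illustrative.

```python
def z_score(score: float, mean: float, sd: float) -> float:
    """Number of standard deviations a score lies above the mean."""
    return (score - mean) / sd


z_exam1 = z_score(85, mean=75, sd=5)
z_exam2 = z_score(85, mean=70, sd=10)
print(z_exam1, z_exam2)
```

The larger z-score marks the exam on which the same raw score stands further above the class average.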

Question 1.6 (2 pts)

For the figure below,

Two overlapping normal distribution curves on the same axis spanning 0 to 100. The blue curve is tall and narrow, centered near x = 50 with a small standard deviation. The red curve is short and wide, also centered near x = 50 but with a much larger standard deviation. Both curves are symmetric and unimodal.

T or F: the blue normal distribution has more area underneath its curve than the red normal distribution does.

Problem 2: Multiple Choice (15 points, 3 points each)

Indicate the correct answer by completely filling in the appropriate circle. If you indicate your answer in any other way, you may be marked incorrect. Each question has exactly one correct option.

Question 2.1 (3 pts)

The number of customers arriving at a UPS branch during working hours follows a Poisson distribution with an average rate of 4 customers per hour. Let \(X\) denote the number of customers arriving between 9:00 AM and 10:00 AM and let \(Y\) denote the number of customers arriving between 10:30 AM and 12:00 PM.

What is the conditional probability that exactly 3 customers arrive between 10:30 AM and 12:00 PM, given that 6 customers arrived between 9:00 AM and 10:00 AM?

(A) \(P(Y = 3 \mid X = 6) = 0\)
(B) \(P(Y = 3 \mid X = 6) = 0.0093\)
(C) \(P(Y = 3 \mid X = 6) = 0.0892\)
(D) \(P(Y = 3 \mid X = 6) = 0.1954\)
(E) \(P(Y = 3 \mid X = 6) = 0.8564\)
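The sketch below shows one way to evaluate this probability. It assumes the usual Poisson-process property that counts in non-overlapping time intervals are independent, so conditioning on \(X\) does not change the distribution of \(Y\). The helper `poisson_pmf` is an illustrative name, not part of the exam.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """P(N = k) for a Poisson random variable with the given mean."""
    return math.exp(-mean) * mean**k / math.factorial(k)


# 10:30 AM to 12:00 PM is 1.5 hours at 4 customers per hour.
mean_y = 4 * 1.5

# The intervals for X and Y do not overlap, so P(Y = 3 | X = 6) = P(Y = 3).
p_y_equals_3 = poisson_pmf(3, mean_y)
print(round(p_y_equals_3, 4))
```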

Question 2.2 (3 pts)

The time between customer arrivals at the same UPS facility follows an exponential distribution with an average of 15 minutes between customer arrivals. Let \(T\) denote the time between customer arrivals. If no customer has arrived in the last 20 minutes, what is the probability that the next customer arrives after waiting more than 15 additional minutes?

(A) \(P(T > 35 \mid T > 20) = 0\)
(B) \(P(T > 35 \mid T > 20) = 0.097\)
(C) \(P(T > 35 \mid T > 20) = 0.2636\)
(D) \(P(T > 35 \mid T > 20) = 0.3679\)
(E) \(P(T > 35 \mid T > 20) = 0.6321\)
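The memoryless property of the exponential distribution can be checked numerically. The sketch below assumes the mean of 15 minutes from the problem and uses an illustrative helper named `survival` for \(P(T > t)\).

```python
import math

MEAN_MINUTES = 15  # average time between arrivals


def survival(t: float) -> float:
    """P(T > t) for an exponential waiting time with mean MEAN_MINUTES."""
    return math.exp(-t / MEAN_MINUTES)


# Conditional probability from the definition P(A and B) / P(B).
conditional = survival(35) / survival(20)

# Memorylessness says this equals the unconditional P(T > 15).
unconditional = survival(15)
print(round(conditional, 4), round(unconditional, 4))
```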

Question 2.3 (3 pts)

Suppose \(X \sim \text{Binomial}(n = 10,\; p = 0.1)\) and \(Y \sim \text{Binomial}(n = 10,\; p = 0.9)\).

Which statement is not always true about \(X\) and \(Y\)?

(A) The mode of \(X\) is less than the mode of \(Y\).
(B) \(\text{SD}(X) - |\sqrt{\text{Var}(Y)}| = 0\)
(C) \(P(X = 1 \cap Y = 8) = 0.1943\)
(D) \(E[X^2] = (10)(0.1)(0.9) + [(10)(0.1)]^2\)
(E) \(P(X = 1) = P(Y = 9)\)
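The quantities in these options can be computed directly, as in the sketch below. The helper `binom_pmf` is an illustrative name. Note that the joint probability in option (C) depends on how \(X\) and \(Y\) are related, which the problem does not specify; the product computed at the end holds only under an added independence assumption.

```python
import math


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(K = k) for a Binomial(n, p) random variable."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


n = 10
mode_x = max(range(n + 1), key=lambda k: binom_pmf(k, n, 0.1))
mode_y = max(range(n + 1), key=lambda k: binom_pmf(k, n, 0.9))

sd_x = math.sqrt(n * 0.1 * 0.9)
sd_y = math.sqrt(n * 0.9 * 0.1)

# E[X^2] = Var(X) + (E[X])^2
second_moment_x = n * 0.1 * 0.9 + (n * 0.1) ** 2

p_x1 = binom_pmf(1, n, 0.1)
p_y9 = binom_pmf(9, n, 0.9)

# Only valid if X and Y are independent (an assumption, not given).
joint_if_independent = p_x1 * binom_pmf(8, n, 0.9)
print(mode_x, mode_y, sd_x, sd_y, second_moment_x, p_x1, p_y9, joint_if_independent)
```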

Question 2.4 (3 pts)

Suppose \(X\) is a random variable with \(E[e^X] = 2\) and \(\text{Var}(e^X) = 5\), and \(Y\) is a random variable independent of \(X\), satisfying \(E[Y] = -10\), \(\text{Var}(Y) = 3\). What is \(E\!\left[(e^X - 3Y)^2\right]\)?

(A) 1056
(B) 240
(C) 1024
(D) -752
(E) None of the above
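One way to organize the calculation is to expand the square and use \(E[W^2] = \text{Var}(W) + (E[W])^2\) together with the stated independence of \(X\) and \(Y\). The following is a sketch of that expansion, not the official solution:

\[\begin{aligned}
E\!\left[(e^X - 3Y)^2\right] &= E[e^{2X}] - 6\,E[e^X Y] + 9\,E[Y^2] \\
E[e^{2X}] &= \text{Var}(e^X) + \left(E[e^X]\right)^2 = 5 + 2^2 = 9 \\
E[e^X Y] &= E[e^X]\,E[Y] = (2)(-10) = -20 \\
E[Y^2] &= \text{Var}(Y) + \left(E[Y]\right)^2 = 3 + (-10)^2 = 103 \\
E\!\left[(e^X - 3Y)^2\right] &= 9 - 6(-20) + 9(103) = 9 + 120 + 927 = 1056
\end{aligned}\]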

Question 2.5 (3 pts)

The figure below shows the shape of the distribution for two continuous random variables \(X\) and \(Y\).

Two side-by-side histograms. Left panel titled "Distribution of X": right-skewed histogram with the tallest bars on the left side and a long right tail; y-axis labeled Probability up to 1.2. Right panel titled "Distribution of Y": roughly symmetric, bell-shaped histogram; y-axis labeled Probability up to 0.08.

Which of the following statements is TRUE about the random variable \(X\)?

(A) The mean is a better measure of central tendency than median.
(B) The distance between \(Q_3\) and the median is narrower than the distance between \(Q_1\) and the median.
(C) IQR is a robust (resistant) measure of the spread.
(D) The distribution is negatively skewed with one peak.
(E) The mode will have the largest value among all the measures of central tendency.

Free Response Questions 3–5

Show all work, clearly label your answers, and round final answers to four decimal places.

Problem 3 (20 points)

Problem 3 Setup

The stated speed limit on I-65 is 65 mph. The speeds of vehicles along a certain stretch of I-65 follow an approximately normal distribution with a mean of 71 mph and a standard deviation of 8 mph.

Let \(V\) denote the speed of a random vehicle on I-65.

\[V \sim \text{Normal}(\mu = 71,\; \sigma = 8).\]

Question 3a (2 pts)

What is the probability that the speed of a vehicle on this stretch of I-65 is below \(\mu + 3\sigma\)?

Question 3b (2 pts)

Calculate the z-score for the stated speed limit of 65 mph.

Question 3c (8 pts)

What is the probability that a vehicle’s speed is between 61 mph and 71 mph on this stretch of I-65?
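Questions 3a through 3c can be checked with Python's standard-library `statistics.NormalDist`. This is a sketch for verifying hand calculations; on the exam itself the values would come from a standard normal table, so the last digit may differ slightly.

```python
from statistics import NormalDist

MU, SIGMA = 71, 8
speed = NormalDist(mu=MU, sigma=SIGMA)

# 3a: P(V < mu + 3 sigma)
p_below_3sd = speed.cdf(MU + 3 * SIGMA)

# 3b: z-score of the 65 mph speed limit
z_limit = (65 - MU) / SIGMA

# 3c: P(61 < V < 71) as a difference of CDF values
p_61_to_71 = speed.cdf(71) - speed.cdf(61)

print(round(p_below_3sd, 4), z_limit, round(p_61_to_71, 4))
```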

Question 3d (8 pts)

State patrol officers will issue radar tickets to vehicles whose speeds are in the top 4% of this distribution. What is the speed cutoff for issuing tickets?
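The cutoff for the top 4% is the 96th percentile of the speed distribution. The sketch below finds it with `statistics.NormalDist.inv_cdf`; a table lookup with a rounded z-value of about 1.75 gives a slightly different last digit.

```python
from statistics import NormalDist

speed = NormalDist(mu=71, sigma=8)

# Top 4% of speeds means 96% of the distribution lies below the cutoff.
z_cutoff = NormalDist().inv_cdf(0.96)
speed_cutoff = speed.inv_cdf(0.96)

# Same value via x = mu + z * sigma
speed_cutoff_from_z = 71 + z_cutoff * 8
print(round(z_cutoff, 4), round(speed_cutoff, 4))
```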

Problem 4 (26 points)

Problem 4 Setup

Kristin, a data science major, is working on a term project to build a predictive model that can classify images of handwritten digits (0–4).

Six example images of handwritten digits. Top row: label=0 showing a handwritten zero, label=4 showing a handwritten four, label=1 showing a handwritten one. Bottom row: label=1, label=3, label=1.

She has a dataset containing 1600 images, each displaying a single digit. Kristin divided the dataset into a training set of 1000 images and a test set of 600 images. The training set is used to teach the model, while the test set is used to evaluate its performance.

After training, Kristin used the test set to create a confusion matrix, which shows the number of correctly and incorrectly classified images. In the matrix below, rows indicate the actual labels (ground truth), and columns represent the predicted labels made by the model:

Table 6 Confusion Matrix (rows: true label; columns: predicted label)
True Label	0	1	2	3	4	Total
0	107	0	0	1	8	116
1	0	117	1	0	4	122
2	0	4	92	11	1	108
3	3	1	15	112	1	132
4	4	0	0	4	114	122
Total	114	122	108	128	128	600

Reading the Table: The cell with the value 117 (row for true label 1, column for predicted label 1) indicates that the model correctly predicted the digit ‘1’ for 117 images whose true label is ‘1’. This number represents the model’s accurate classifications for the digit ‘1’ in the test set.

All questions below refer to the data presented in the confusion matrix (table).

Question 4a (3 pts)

Define the events:

\(E_1 = \{\text{true label is 4}\}\)
\(E_2 = \{\text{true label is 1 or 2}\}\)
\(E_3 = \{\text{predicted label is 0}\}\)

Which of the following statements is TRUE?

(A) Two events \(E_1\) and \(E_3\) are mutually exclusive.
(B) \(P(E_1 \cap E_3) = P(E_1)\,P(E_3)\).
(C) Two events \(E_1\) and \(E_2\) are disjoint.
(D) \(P(E_2 \cup E_3) > P(E_2) + P(E_3)\).
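The statements above can be checked against the confusion matrix counts. The sketch below assumes an image is drawn uniformly at random from the 600 test images; the variable names are illustrative.

```python
# Rows are true labels 0-4, columns are predicted labels 0-4.
confusion = [
    [107, 0, 0, 1, 8],
    [0, 117, 1, 0, 4],
    [0, 4, 92, 11, 1],
    [3, 1, 15, 112, 1],
    [4, 0, 0, 4, 114],
]
total = sum(sum(row) for row in confusion)

p_e1 = sum(confusion[4]) / total                         # true label is 4
p_e2 = (sum(confusion[1]) + sum(confusion[2])) / total   # true label is 1 or 2
p_e3 = sum(row[0] for row in confusion) / total          # predicted label is 0

p_e1_and_e3 = confusion[4][0] / total                    # true 4, predicted 0
p_e2_and_e3 = (confusion[1][0] + confusion[2][0]) / total
p_e1_and_e2 = 0.0  # an image has exactly one true label

p_e2_or_e3 = p_e2 + p_e3 - p_e2_and_e3
print(p_e1_and_e3, p_e1 * p_e3, p_e1_and_e2, p_e2_or_e3, p_e2 + p_e3)
```

Two events are mutually exclusive (disjoint) exactly when their intersection has probability zero in this table, and a union can never have larger probability than the sum of the individual probabilities.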

The following events are used in Questions 4b–4e. Kristin wants to know if the model performs better than random guessing at classifying images of the digit three.

\(T_3 = \{\text{true label is 3}\}\)
\(P_3 = \{\text{predicted label is 3}\}\)

Question 4b (5 pts)

What is the probability that a randomly selected image has the true label three?

Question 4c (5 pts)

What is the probability that a randomly selected image is predicted to be three?

Question 4d (8 pts)

What is the probability that an image of digit three is correctly predicted to be three?

Question 4e (5 pts)

Are the events \(T_3\) and \(P_3\) independent? State your answer and provide a mathematical justification.
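Questions 4b through 4e reduce to ratios of counts from the confusion matrix. The sketch below assumes a uniformly random draw from the 600 test images and reads Question 4d as the conditional probability \(P(P_3 \mid T_3)\); the variable names are illustrative.

```python
# Rows are true labels 0-4, columns are predicted labels 0-4.
confusion = [
    [107, 0, 0, 1, 8],
    [0, 117, 1, 0, 4],
    [0, 4, 92, 11, 1],
    [3, 1, 15, 112, 1],
    [4, 0, 0, 4, 114],
]
total = sum(sum(row) for row in confusion)

true_3 = sum(confusion[3])                    # row total for true label 3
pred_3 = sum(row[3] for row in confusion)     # column total for predicted 3
both_3 = confusion[3][3]                      # true 3 and predicted 3

p_t3 = true_3 / total                         # 4b
p_p3 = pred_3 / total                         # 4c
p_p3_given_t3 = both_3 / true_3               # 4d
p_joint = both_3 / total

# 4e: independence would require P(T3 and P3) = P(T3) * P(P3).
independent = abs(p_joint - p_t3 * p_p3) < 1e-9
print(round(p_t3, 4), round(p_p3, 4), round(p_p3_given_t3, 4), independent)
```

An equivalent independence check compares \(P(P_3 \mid T_3)\) with \(P(P_3)\).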

Problem 5 (32 points)

Problem 5 Setup

Robust-ish Devices Inc. manufactures devices whose lifetimes are divided into three distinct phases: early failure, stable operation, and wear-out.

Phase 1 (Early Failure): During the first year (\(0 \leq x \leq 1\)), the device has a constant likelihood of failing due to manufacturing defects, meaning the probability density function (pdf) for the device’s lifetime is constant in this interval.
Phase 2 (Stable Operation): After surviving the early failure phase, the device operates reliably with virtually no chance of failure for the next 4 years (\(1 < x \leq 5\)), meaning the pdf is zero during this phase, as the device is highly reliable.
Phase 3 (Wear Out): Beyond 5 years (\(x > 5\)), the device enters a wear-out phase where the likelihood of failure increases over time. The lifetime is modeled by an exponentially decaying function, meaning the chance of the device surviving much longer decreases, and the risk of failure increases as the device ages.

The probability density function for \(X\) (the lifetime of the device) is given by the following piecewise function:

\[\begin{split}f_X(x) = \begin{cases} 1 - e^{-5/16} & 0 \leq x \leq 1 \\[4pt] \dfrac{1}{16}\,e^{-x/16} & x \geq 5 \\[4pt] 0 & \text{otherwise} \end{cases}\end{split}\]

Question 5a (10 pts)

Verify that \(f_X(x)\) is a valid probability density function.
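A valid pdf must be nonnegative everywhere and integrate to 1. The sketch below checks the second condition two ways: with the closed-form areas of the two nonzero pieces, and with a simple midpoint-rule approximation. The helper `midpoint_integral` and the truncation of the infinite tail at \(x = 405\) are assumptions made for illustration.

```python
import math

C = 1 - math.exp(-5 / 16)  # constant density on [0, 1]


def pdf(x: float) -> float:
    """Piecewise density of the device lifetime X."""
    if 0 <= x <= 1:
        return C
    if x >= 5:
        return math.exp(-x / 16) / 16
    return 0.0


# Closed form: rectangle on [0, 1] plus exponential tail from 5 to infinity.
area_phase1 = C * 1
area_phase3 = math.exp(-5 / 16)
total_area = area_phase1 + area_phase3


def midpoint_integral(f, a: float, b: float, steps: int) -> float:
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width


# Numerical check; the tail beyond 405 is negligibly small.
numeric_area = midpoint_integral(pdf, 0, 1, 1000) + midpoint_integral(pdf, 5, 405, 40_000)
print(total_area, numeric_area)
```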

The cumulative distribution function is partially given below:

\[\begin{split}F_X(x) = \begin{cases} 0 & x \leq 0 \\[4pt] \left(1 - e^{-5/16}\right) x & 0 \leq x \leq 1 \\[4pt] 1 - e^{-5/16} & 1 \leq x \leq 5 \\[4pt] [\text{Unknown}] & x \geq 5 \end{cases}\end{split}\]

Question 5b (10 pts)

Determine the missing piece of the cumulative distribution function (CDF) \(F_X(x)\) for \(x \geq 5\), which is partially given above.
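One way to obtain the missing piece is to add the probability accumulated up to \(x = 5\) to the integral of the Phase 3 density from 5 to \(x\). A sketch of that calculation, for \(x \geq 5\):

\[\begin{aligned}
F_X(x) &= F_X(5) + \int_5^x \frac{1}{16}\,e^{-t/16}\,dt \\
&= \left(1 - e^{-5/16}\right) + \left[-e^{-t/16}\right]_5^x \\
&= \left(1 - e^{-5/16}\right) + \left(e^{-5/16} - e^{-x/16}\right) \\
&= 1 - e^{-x/16}
\end{aligned}\]

As a sanity check, this expression equals \(1 - e^{-5/16}\) at \(x = 5\), matching the previous piece, and tends to 1 as \(x \to \infty\).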

Question 5c (4 pts)

Determine the probability that the device lasts longer than 1 year.

Question 5d (8 pts)

Find the 25th percentile for the lifetime of devices manufactured by Robust-ish Devices Inc.
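Questions 5c and 5d can both be checked from the CDF. The sketch below assumes the piece for \(x \geq 5\) is \(1 - e^{-x/16}\), which follows from integrating the given pdf. Since \(F_X(1) = 1 - e^{-5/16}\) exceeds 0.25, the 25th percentile falls in Phase 1, where the CDF is linear.

```python
import math

C = 1 - math.exp(-5 / 16)  # constant density on [0, 1], also F(1)


def cdf(x: float) -> float:
    """Piecewise CDF of the device lifetime X."""
    if x < 0:
        return 0.0
    if x <= 1:
        return C * x
    if x < 5:
        return C
    return 1 - math.exp(-x / 16)


# 5c: P(X > 1) = 1 - F(1)
p_longer_than_1_year = 1 - cdf(1)

# 5d: solve C * x = 0.25 on the linear piece, valid because C > 0.25
percentile_25 = 0.25 / C

print(round(p_longer_than_1_year, 4), round(percentile_25, 4))
```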