.. _exam1-fall2024: Exam 1 — Fall 2024: Fully Worked Solutions ============================================================ - `Exam PDF `_ - `Solution PDF `_ The questions below reproduce the Fall 2024 Exam 1 in full accessible text. Each problem is followed by a complete worked solution. Point values reflect the actual exam. .. list-table:: Point Summary :widths: 40 30 30 :header-rows: 1 * - Section - Format - Points * - Problem 1 — True/False - 6 questions × 2 pts - 12 * - Problem 2 — Multiple Choice - 5 questions × 3 pts - 15 * - Problem 3 — Free Response - 4 parts - 20 * - Problem 4 — Free Response - 5 parts - 26 * - Problem 5 — Free Response - 4 parts - 32 * - **Total** - - **105** ---- Problem 1: True/False (12 points, 2 points each) ------------------------------------------------------------------ Indicate the correct answer by completely filling in the appropriate circle. If you indicate your answer by any other way, you may be marked incorrect. ---- .. admonition:: Question 1.1 (2 pts) :class: note Employees in a certain UPS branch collected the types of mail customers brought for a month. They plan to present the data appropriately to the manager and discuss how to utilize empty space efficiently. **T or F:** A histogram is appropriate to use because the variable is categorical. .. dropdown:: Solution :class-container: sd-border-success **Answer: FALSE** The variable "type of mail" is **categorical** (qualitative). Histograms are designed for **quantitative** (numerical) data — they display the distribution of a numerical variable by grouping values into bins. For categorical data, the appropriate displays are bar charts or pie charts, which show frequencies or proportions for each category. The statement is **FALSE**. ---- .. admonition:: Question 1.2 (2 pts) :class: note A hardware manufacturer is about to ship 20,000 of its products to a client. 
To estimate the defect rate of this shipment, they randomly selected 100 products for a last-minute inspection. For each product, they assign a value of 0 if the product is good and 1 if it is defective. The defect rate is then calculated as the average of these 0's and 1's. **T or F:** If the company had the resources to inspect all 20,000 products, the defect rate calculated using all 20,000 products would represent a sample statistic. .. dropdown:: Solution :class-container: sd-border-success **Answer: FALSE** If all 20,000 products in the shipment were inspected, the defect rate would be computed from the **entire population** of interest (the shipment). A quantity computed from the entire population is a **population parameter**, not a sample statistic. A sample statistic is computed from a subset (sample) of the population. The statement is **FALSE**. ---- .. admonition:: Question 1.3 (2 pts) :class: note Suppose the number of visitors to a mall follows a Poisson distribution with an average rate of 45 visitors per 30 minutes. **T or F:** In this mall, the variance in the number of visitors arriving between 2:00 PM and 3:00 PM is equal to the variance in the number of visitors arriving between 3:00 PM and 5:00 PM. .. dropdown:: Solution :class-container: sd-border-success **Answer: FALSE** For a Poisson process with rate :math:`\lambda` visitors per 30 minutes, the number of visitors in a time interval of length :math:`t` (in 30-minute units) follows :math:`\text{Poisson}(\lambda t)`, which has variance :math:`\lambda t`. - **2:00 PM to 3:00 PM** is 1 hour = 2 thirty-minute periods: :math:`\text{Var} = 45 \times 2 = 90` visitors². - **3:00 PM to 5:00 PM** is 2 hours = 4 thirty-minute periods: :math:`\text{Var} = 45 \times 4 = 180` visitors². The variances are not equal. The statement is **FALSE**. ---- .. 
admonition:: Question 1.4 (2 pts) :class: note Let :math:`V` be a random variable with a probability density function :math:`f_V(v)` that is nonzero only on the interval :math:`[-5, -2)`. Let :math:`F_V(\cdot)` denote the cumulative distribution function (CDF) of :math:`V`. **T or F:** Then, :math:`F_V(c) = 1` holds for any :math:`c > 0`. .. dropdown:: Solution :class-container: sd-border-success **Answer: TRUE** The support of :math:`V` is entirely contained in :math:`[-5, -2)`. For any :math:`c > 0`, since :math:`0 > -2`, the value :math:`c` lies strictly to the right of the entire support. The CDF at :math:`c` accumulates all probability mass in :math:`(-\infty, c]`, which includes the entire support :math:`[-5, -2)`. Therefore :math:`F_V(c) = 1` for all :math:`c > 0`. The statement is **TRUE**. ---- .. admonition:: Question 1.5 (2 pts) :class: note A student scored 85 on two different math exams. For Exam 1, the mean score is 75 with a standard deviation of 5, and for Exam 2, the mean score is 70 with a standard deviation of 10. **T or F:** The student performed better on Exam 1 compared to Exam 2. .. dropdown:: Solution :class-container: sd-border-success **Answer: TRUE** Compute the z-score for each exam to compare relative performance: .. math:: z_{\text{Exam 1}} = \frac{85 - 75}{5} = \frac{10}{5} = 2.00, .. math:: z_{\text{Exam 2}} = \frac{85 - 70}{10} = \frac{15}{10} = 1.50. Since :math:`z_{\text{Exam 1}} = 2.00 > 1.50 = z_{\text{Exam 2}}`, the student scored 2 standard deviations above the mean on Exam 1 but only 1.5 standard deviations above the mean on Exam 2. The student performed relatively better on Exam 1. The statement is **TRUE**. ---- .. admonition:: Question 1.6 (2 pts) :class: note For the figure below, .. figure:: https://yjjpfnblgtrogqvcjaon.supabase.co/storage/v1/object/public/stat-350-assets/images/Exams/Exam1/FALL2024/image1.png :alt: Two overlapping normal distribution curves on the same axis spanning 0 to 100. 
The blue curve is tall and narrow, centered near x = 50 with a small standard deviation. The red curve is short and wide, also centered near x = 50 but with a much larger standard deviation. Both curves are symmetric and unimodal. :align: center :width: 55% **T or F:** the blue normal distribution has more area underneath its curve than the red normal distribution does. .. dropdown:: Solution :class-container: sd-border-success **Answer: FALSE** Every normal distribution, regardless of its mean :math:`\mu` or standard deviation :math:`\sigma`, has **total area equal to 1** under its curve. This is a fundamental property of all probability density functions. The blue curve is taller and narrower (smaller :math:`\sigma`) while the red curve is shorter and wider (larger :math:`\sigma`), but both enclose exactly the same total area of 1. The statement is **FALSE**. ---- Problem 2: Multiple Choice (15 points, 3 points each) ------------------------------------------------------------------ Indicate the correct answer by completely filling in the appropriate circle. If you indicate your answer by any other way, you may be marked incorrect. For each question, there is only one correct option letter choice. ---- .. admonition:: Question 2.1 (3 pts) :class: note The number of customers arriving at a UPS branch during working hours follows a Poisson distribution with an average rate of **4 customers per hour**. Let :math:`X` denote the number of customers arriving between **9:00 AM and 10:00 AM** and let :math:`Y` denote the number of customers arriving between **10:30 AM and 12:00 PM**. What is the **conditional probability** that exactly **3 customers** arrive between **10:30 AM and 12:00 PM**, given that **6 customers** arrived between **9:00 AM and 10:00 AM**? 
- **(A)** :math:`P(Y = 3 \mid X = 6) = 0` - **(B)** :math:`P(Y = 3 \mid X = 6) = 0.0093` - **(C)** :math:`P(Y = 3 \mid X = 6) = 0.0892` - **(D)** :math:`P(Y = 3 \mid X = 6) = 0.1954` - **(E)** :math:`P(Y = 3 \mid X = 6) = 0.8564` .. dropdown:: Solution :class-container: sd-border-success **Answer: (C)** The intervals 9:00–10:00 AM and 10:30 AM–12:00 PM do not overlap. For a Poisson process, arrivals in non-overlapping intervals are **independent**. Therefore: .. math:: P(Y = 3 \mid X = 6) = P(Y = 3). The interval 10:30 AM–12:00 PM is **1.5 hours** long. With a rate of 4 customers per hour: .. math:: Y \sim \text{Poisson}(\lambda = 4 \times 1.5) = \text{Poisson}(6). .. math:: P(Y = 3) = \frac{e^{-6} \cdot 6^3}{3!} = \frac{e^{-6} \cdot 216}{6} = 36\,e^{-6} \approx 36 \times 0.002479 \approx \boxed{0.0892}. The answer is **(C)**. ---- .. admonition:: Question 2.2 (3 pts) :class: note The time between customer arrivals at the same UPS facility follows an exponential distribution with an average of 15 minutes between customer arrivals. Let :math:`T` denote the time between customer arrivals. If no customer has arrived in the last 20 minutes, what is the probability that the next customer arrives after waiting more than 15 additional minutes? - **(A)** :math:`P(T > 35 \mid T > 20) = 0` - **(B)** :math:`P(T > 35 \mid T > 20) = 0.097` - **(C)** :math:`P(T > 35 \mid T > 20) = 0.2636` - **(D)** :math:`P(T > 35 \mid T > 20) = 0.3679` - **(E)** :math:`P(T > 35 \mid T > 20) = 0.6321` .. dropdown:: Solution :class-container: sd-border-success **Answer: (D)** The exponential distribution has the **memoryless property**: .. math:: P(T > s + t \mid T > s) = P(T > t) \quad \text{for all } s, t > 0. Applying this with :math:`s = 20` and :math:`t = 15`: .. math:: P(T > 35 \mid T > 20) = P(T > 15). Since :math:`T \sim \text{Exponential}\!\left(\lambda = \dfrac{1}{15}\right)`: .. math:: P(T > 15) = e^{-15/15} = e^{-1} \approx \boxed{0.3679}. The answer is **(D)**. ---- .. 
admonition:: Question 2.3 (3 pts) :class: note Suppose :math:`X \sim \text{Binomial}(n = 10,\; p = 0.1)` and :math:`Y \sim \text{Binomial}(n = 10,\; p = 0.9)`. Which statement is **not always true** about :math:`X` and :math:`Y`? - **(A)** The mode of :math:`X` is less than the mode of :math:`Y`. - **(B)** :math:`\text{SD}(X) - |\sqrt{\text{Var}(Y)}| = 0` - **(C)** :math:`P(X = 1 \cap Y = 8) = 0.1943` - **(D)** :math:`E[X^2] = (10)(0.1)(0.9) + [(10)(0.1)]^2` - **(E)** :math:`P(X = 1) = P(Y = 9)` .. dropdown:: Solution :class-container: sd-border-success **Answer: (C)** Evaluate each option: **(A) Always true.** For :math:`X \sim \text{Bin}(10, 0.1)`, the mode is :math:`\lfloor(n+1)p\rfloor = \lfloor 1.1 \rfloor = 1`. For :math:`Y \sim \text{Bin}(10, 0.9)`, the mode is :math:`\lfloor(n+1)(0.9)\rfloor = \lfloor 9.9 \rfloor = 9`. Both modes are unique and fixed by the stated parameters, and :math:`1 < 9`. **(B) Always true.** :math:`\text{SD}(X) = \sqrt{np(1-p)} = \sqrt{10(0.1)(0.9)} = \sqrt{0.9}`. Similarly :math:`\sqrt{\text{Var}(Y)} = \sqrt{10(0.9)(0.1)} = \sqrt{0.9}`. Their difference is 0. **(C) Not always true.** The problem does not state that :math:`X` and :math:`Y` are independent, so the joint probability is not determined by the two marginal distributions. If :math:`X` and :math:`Y` are independent, :math:`P(X=1 \cap Y=8) = P(X=1) \cdot P(Y=8) \approx 0.3874 \times 0.1937 \approx 0.0750 \neq 0.1943`. More generally, :math:`P(X=1 \cap Y=8) \leq P(Y=8) \approx 0.1937 < 0.1943` for any joint distribution, so the stated value cannot hold. **(D) Always true.** Using :math:`E[X^2] = \text{Var}(X) + (E[X])^2 = np(1-p) + (np)^2 = (10)(0.1)(0.9) + [(10)(0.1)]^2`. **(E) Always true.** By symmetry of the Binomial: :math:`P(X = k) = P(Y = n-k)`, so :math:`P(X=1) = P(Y=9)`. The answer is **(C)**. ---- .. 
admonition:: Question 2.4 (3 pts) :class: note Suppose :math:`X` is a random variable with :math:`E[e^X] = 2` and :math:`\text{Var}(e^X) = 5`, and :math:`Y` is a random variable independent of :math:`X`, satisfying :math:`E(Y) = -10`, :math:`\text{Var}(Y) = 3`. What is :math:`E\!\left[(e^X - 3Y)^2\right]`? - **(A)** 1056 - **(B)** 240 - **(C)** 1024 - **(D)** -752 - **(E)** None of the above .. dropdown:: Solution :class-container: sd-border-success **Answer: (A)** Expand the square: .. math:: E\!\left[(e^X - 3Y)^2\right] = E[e^{2X}] - 6\,E[e^X Y] + 9\,E[Y^2]. **Find each term:** :math:`E[e^{2X}]`: using :math:`\text{Var}(e^X) = E[e^{2X}] - (E[e^X])^2`: .. math:: E[e^{2X}] = \text{Var}(e^X) + (E[e^X])^2 = 5 + 4 = 9. :math:`E[e^X Y]`: since :math:`X` and :math:`Y` are independent: .. math:: E[e^X Y] = E[e^X] \cdot E[Y] = 2 \times (-10) = -20. :math:`E[Y^2]`: using :math:`\text{Var}(Y) = E[Y^2] - (E[Y])^2`: .. math:: E[Y^2] = \text{Var}(Y) + (E[Y])^2 = 3 + 100 = 103. **Combine:** .. math:: E\!\left[(e^X - 3Y)^2\right] = 9 - 6(-20) + 9(103) = 9 + 120 + 927 = \boxed{1056}. The answer is **(A)**. ---- .. admonition:: Question 2.5 (3 pts) :class: note The figure below shows the shape of the distribution for two continuous random variables :math:`X` and :math:`Y`. .. figure:: https://yjjpfnblgtrogqvcjaon.supabase.co/storage/v1/object/public/stat-350-assets/images/Exams/Exam1/FALL2024/image2.png :alt: Two side-by-side histograms. Left panel titled "Distribution of X": right-skewed histogram with the tallest bars on the left side and a long right tail; y-axis labeled Probability up to 1.2. Right panel titled "Distribution of Y": roughly symmetric, bell-shaped histogram; y-axis labeled Probability up to 0.08. :align: center :width: 80% Which of the following statements is TRUE about the random variable :math:`X`? - **(A)** The mean is a better measure of central tendency than median. 
- **(B)** The distance between :math:`Q_3` and the median is narrower than the distance between :math:`Q_1` and the median. - **(C)** IQR is a robust (resistant) measure of the spread. - **(D)** The distribution is negatively skewed with one peak. - **(E)** The mode will have the largest value among all the measures of central tendency. .. dropdown:: Solution :class-container: sd-border-success **Answer: (C)** The distribution of :math:`X` is **right-skewed** (positively skewed) with a long right tail. Evaluate each option: **(A) FALSE.** For a right-skewed distribution, the mean is pulled toward the long right tail and is not resistant to extreme values. The **median** is a better (more resistant) measure of central tendency than the mean for skewed data. **(B) FALSE.** For a right-skewed distribution, the bulk of the data is concentrated on the left, so the right half of the box (Q₃ to median) is typically *wider* than the left half (Q₁ to median). The distance from :math:`Q_3` to the median is **not** narrower. **(C) TRUE.** The IQR is based on the middle 50% of the data and is not affected by extreme values or outliers in the tails. It is a **robust (resistant) measure of spread**, regardless of the shape of the distribution. **(D) FALSE.** The histogram of :math:`X` shows a right tail (positively skewed), not negatively skewed. **(E) FALSE.** For a right-skewed distribution, the ordering of measures of central tendency is Mode < Median < Mean. The mode has the **smallest** value, not the largest. The answer is **(C)**. ---- Free Response Questions 3–5 ------------------------------------------------------------------ Show all work, clearly label your answers, and use four decimal places. Problem 3 (20 points) ------------------------------------------------------------------ .. admonition:: Problem 3 Setup :class: important The stated speed limit on I-65 is 65 mph. 
The speeds of vehicles along a certain stretch of I-65 follow an approximately normal distribution with a mean of 71 mph and a standard deviation of 8 mph. Let :math:`V` denote the speed of a random vehicle on I-65. .. math:: V \sim \text{Normal}(\mu = 71,\; \sigma = 8). ---- .. admonition:: Question 3a (2 pts) :class: note What is the probability that the speed of a vehicle on this stretch of I-65 is below :math:`\mu + 3\sigma`? .. dropdown:: Solution :class-container: sd-border-success By the Empirical Rule, about 99.7% of values fall within three standard deviations of the mean, which leaves about 0.15% in each tail. Only the upper tail lies above :math:`\mu + 3\sigma`: .. math:: P(V < \mu + 3\sigma) \approx 1 - 0.0015 = \boxed{0.9985}. ---- .. admonition:: Question 3b (2 pts) :class: note Calculate the z-score for the stated speed limit of 65 mph. .. dropdown:: Solution :class-container: sd-border-success .. math:: z = \frac{x - \mu}{\sigma} = \frac{65 - 71}{8} = \boxed{-0.75}. ---- .. admonition:: Question 3c (8 pts) :class: note What is the probability that a vehicle's speed is between 61 mph and 71 mph on this stretch of I-65? .. dropdown:: Solution :class-container: sd-border-success .. math:: P(61 < V < 71) = P\!\left(\frac{61 - 71}{8} < Z < \frac{71 - 71}{8}\right) = P(-1.25 < Z < 0). Using the z-table and symmetry: .. math:: P(-1.25 < Z < 0) = 0.5 - \Phi(-1.25) = 0.5 - 0.1056 = \boxed{0.3944}. ---- .. admonition:: Question 3d (8 pts) :class: note State patrol officers will issue radar tickets to vehicles whose speeds are in the top 4% of this distribution. What is the speed cutoff for issuing tickets? .. dropdown:: Solution :class-container: sd-border-success The top 4% corresponds to the **96th percentile**. From the z-table: :math:`\Phi(1.75) = 0.9599 \approx 0.96`, so :math:`z = 1.75`. Transform to the distribution of car speeds on I-65: .. math:: v_{0.96} = \mu + z \times \sigma = 71 + 1.75 \times 8 = 71 + 14 = \boxed{85 \text{ mph}}. The cutoff for the top 4% of vehicle speeds on I-65 is **85 miles per hour**. ---- Problem 4 (26 points) ------------------------------------------------------------------ .. 
admonition:: Problem 4 Setup :class: important Kristin, a data science major, is working on a term project to build a predictive model that can classify images of handwritten digits (0–4). .. figure:: https://yjjpfnblgtrogqvcjaon.supabase.co/storage/v1/object/public/stat-350-assets/images/Exams/Exam1/FALL2024/image3.png :alt: Six example images of handwritten digits. Top row: label=0 showing a handwritten zero, label=4 showing a handwritten four, label=1 showing a handwritten one. Bottom row: label=1, label=3, label=1. :align: center :width: 65% She has a dataset containing 1600 images, each displaying a single digit. Kristin divided the dataset into a **training set** of **1000 images** and a **test set** of **600 images**. The training set is used to teach the model, while the test set is used to evaluate its performance. After training, Kristin used the test set to create a **confusion matrix**, which shows the number of **correctly** and **incorrectly classified images**. In the matrix below, **rows** indicate the **actual labels (ground truth)**, and **columns represent** the **predicted labels** made by the **model**: .. flat-table:: Confusion Matrix :header-rows: 2 :widths: 15 13 12 12 12 12 12 12 * - :rspan:`1` True Label - :cspan:`6` **Predicted Label** * - **Digits** - **0** - **1** - **2** - **3** - **4** - **Total** * - **0** - 107 - 0 - 0 - 1 - 8 - 116 * - **1** - 0 - 117 - 1 - 0 - 4 - 122 * - **2** - 0 - 4 - 92 - 11 - 1 - 108 * - **3** - 3 - 1 - 15 - 112 - 1 - 132 * - **4** - 4 - 0 - 0 - 4 - 114 - 122 * - **Total** - 114 - 122 - 108 - 128 - 128 - **600** **Reading the Table:** The highlighted cell with the value 117 indicates that the model correctly predicted the digit '1' for 117 images that had True Label as '1'. This number represents the model's accurate classifications for the digit '1' in the test set. All questions below refer to the data presented in the confusion matrix (table). ---- .. 
admonition:: Question 4a (3 pts) :class: note Define the events: - :math:`E_1 = \{\text{true label is 4}\}` - :math:`E_2 = \{\text{true label is 1 or 2}\}` - :math:`E_3 = \{\text{predicted label is 0}\}` Which of the following statements is TRUE? - **(A)** Two events :math:`E_1` and :math:`E_3` are mutually exclusive. - **(B)** :math:`P(E_1 \cap E_3) = P(E_1)\,P(E_3)`. - **(C)** Two events :math:`E_1` and :math:`E_2` are disjoint. - **(D)** :math:`P(E_2 \cup E_3) > P(E_2) + P(E_3)`. .. dropdown:: Solution :class-container: sd-border-success **Answer: (C)** Check each statement using the confusion matrix: **(A) FALSE.** :math:`E_1 \cap E_3` = {true label is 4 AND predicted label is 0}. From the matrix, 4 images have true label 4 and were predicted as 0. So :math:`P(E_1 \cap E_3) = 4/600 \neq 0`. They are **not** mutually exclusive. **(B) FALSE.** :math:`P(E_1) = 122/600`, :math:`P(E_3) = 114/600`, :math:`P(E_1 \cap E_3) = 4/600`. Check: :math:`P(E_1)P(E_3) = (122/600)(114/600) = 13908/360000 \approx 0.0386`, but :math:`P(E_1 \cap E_3) = 4/600 \approx 0.0067`. Not equal. **(C) TRUE.** :math:`E_1` = {true label is 4} and :math:`E_2` = {true label is 1 or 2}. An image cannot simultaneously have true label 4 and true label 1 or 2. Therefore :math:`E_1 \cap E_2 = \emptyset` and the events are **disjoint**. **(D) FALSE.** By the inclusion-exclusion principle: :math:`P(E_2 \cup E_3) = P(E_2) + P(E_3) - P(E_2 \cap E_3)`. Since :math:`P(E_2 \cap E_3) \geq 0`, we always have :math:`P(E_2 \cup E_3) \leq P(E_2) + P(E_3)`. ---- The following events are used in Questions 4b–4e. Kristin wants to know if the model performs better than random guessing at classifying images of the digit three. - :math:`T_3 = \{\text{true label is 3}\}` - :math:`P_3 = \{\text{predicted label is 3}\}` ---- .. admonition:: Question 4b (5 pts) :class: note What is the probability that a randomly selected image has the true label three? .. 
dropdown:: Solution :class-container: sd-border-success True labels are the rows of the matrix, so read the row for digit 3 in the Total column: 132 of the 600 images have true label 3. .. math:: P(T_3) = \frac{132}{600} = \boxed{0.22}. ---- .. admonition:: Question 4c (5 pts) :class: note What is the probability that a randomly selected image is predicted to be three? .. dropdown:: Solution :class-container: sd-border-success Predicted labels are the columns of the matrix, so read the column for digit 3 in the Total row: 128 of the 600 images were predicted as 3. .. math:: P(P_3) = \frac{128}{600} = \boxed{0.2133}. ---- .. admonition:: Question 4d (8 pts) :class: note What is the probability that an image of digit three is correctly predicted to be three? .. dropdown:: Solution :class-container: sd-border-success This is the conditional probability :math:`P(P_3 \mid T_3)`. Of the 132 images with true label 3, the model correctly predicted 112 as 3: .. math:: P(P_3 \mid T_3) = \frac{112}{132} = \boxed{0.8485}. ---- .. admonition:: Question 4e (5 pts) :class: note Are the events :math:`T_3` and :math:`P_3` independent? State your answer and provide a mathematical justification. .. dropdown:: Solution :class-container: sd-border-success **No, they are not independent**, as the conditional probability does not equal the unconditional probability: .. math:: P(P_3 \mid T_3) = \frac{112}{132} = 0.8485 \neq 0.2133 = \frac{128}{600} = P(P_3). Since :math:`P(P_3 \mid T_3) \neq P(P_3)`, the events :math:`T_3` and :math:`P_3` are **not independent**. ---- Problem 5 (32 points) ------------------------------------------------------------------ .. admonition:: Problem 5 Setup :class: important Robust-ish Devices Inc. manufactures devices whose lifetimes are divided into three distinct phases: early failure, stable operation, and wear-out. - **Phase 1 (Early Failure):** During the first year (:math:`0 \leq x \leq 1`), the device has a constant likelihood of failing due to manufacturing defects, meaning the probability density function (pdf) for the device's **lifetime** is **constant** in this interval. 
- **Phase 2 (Stable Operation):** After surviving the early failure phase, the device operates reliably with virtually no chance of failure for the next 4 years (:math:`1 < x \leq 5`), meaning the pdf is zero during this phase, as the device is highly reliable. - **Phase 3 (Wear Out):** Beyond 5 years (:math:`x > 5`), the device enters a wear-out phase where the likelihood of failure increases over time. The **lifetime** is modeled by an exponentially decaying function, meaning the chance of the device surviving much longer decreases, and the risk of failure increases as the device ages. The probability density function for :math:`X` (the lifetime of the device) is given by the following piecewise function: .. math:: f_X(x) = \begin{cases} 1 - e^{-5/16} & 0 \leq x \leq 1 \\[4pt] \dfrac{1}{16}\,e^{-x/16} & x \geq 5 \\[4pt] 0 & \text{otherwise} \end{cases} ---- .. admonition:: Question 5a (10 pts) :class: note Verify that :math:`f_X(x)` is a valid probability density function. .. dropdown:: Solution :class-container: sd-border-success **Axiom 1:** :math:`f_X(x) \geq 0` clearly by the graph of the pdf or because it is a positive constant over :math:`0 \leq x \leq 1`, an exponentially decaying function over :math:`x \geq 5`, and 0 everywhere else. **Axiom 2:** .. math:: \int_{-\infty}^{\infty} f_X(x)\,dx = \int_0^1 \!\left(1 - e^{-5/16}\right)dx + \int_5^{\infty} \frac{1}{16}\,e^{-x/16}\,dx. .. math:: = \left(1 - e^{-5/16}\right) - e^{-x/16}\Big|_5^{\infty} = \left(1 - e^{-5/16}\right) + e^{-5/16} = \boxed{1}. \checkmark ---- The cumulative distribution function is partially given below: .. math:: F_X(x) = \begin{cases} 0 & x \leq 0 \\[4pt] \left(1 - e^{-5/16}\right) x & 0 \leq x \leq 1 \\[4pt] 1 - e^{-5/16} & 1 \leq x \leq 5 \\[4pt] [\text{Unknown}] & x \geq 5 \end{cases} ---- .. admonition:: Question 5b (10 pts) :class: note Determine the missing value of the cumulative distribution function (CDF) :math:`F_X(x)`, which is partially given above. .. 
dropdown:: Solution :class-container: sd-border-success For :math:`x \geq 5`, integrate from 5 to :math:`x` and add the accumulated area through the stable phase: .. math:: F_X(x) = \left(1 - e^{-5/16}\right) + \int_5^x \frac{1}{16}\,e^{-t/16}\,dt. .. math:: = \left(1 - e^{-5/16}\right) - e^{-t/16}\Big|_5^x. .. math:: = \left(1 - e^{-5/16}\right) - e^{-x/16} + e^{-5/16} = \boxed{1 - e^{-x/16}}. ---- .. admonition:: Question 5c (4 pts) :class: note Determine the probability that the device lasts longer than 1 year. .. dropdown:: Solution :class-container: sd-border-success .. math:: P(X > 1) = 1 - F_X(1) = 1 - \left(1 - e^{-5/16}\right) = e^{-5/16} = \boxed{0.7316}. ---- .. admonition:: Question 5d (8 pts) :class: note Find the 25th percentile for the lifetime of devices manufactured by Robust-ish Devices Inc. .. dropdown:: Solution :class-container: sd-border-success The 25th percentile :math:`x^*` satisfies :math:`F_X(x^*) = 0.25`. First determine which region contains :math:`x^*`. The CDF reaches :math:`F_X(1) = 1 - e^{-5/16} \approx 0.2684` at :math:`x = 1`, and remains at 0.2684 until :math:`x = 5`. Since :math:`0.25 < 0.2684`, the 25th percentile falls in the region :math:`[0, 1)`. Solve :math:`F_X(x^*) = 0.25` for the region :math:`[0, 1)`: .. math:: \left(1 - e^{-5/16}\right) x^* = 0.25. .. math:: x^* = \frac{0.25}{1 - e^{-5/16}} \approx \frac{0.25}{0.2684} = \boxed{0.9315 \text{ years}}. The 25th percentile of lifetime is **0.9315 years**.
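The Problem 5 results can also be checked numerically. The sketch below is one possible check using only the Python standard library. The names ``pdf``, ``cdf``, ``percentile``, and ``integrate`` are illustrative helpers rather than part of the exam, and the upper integration limit of 400 is an assumed stand-in for infinity (the neglected tail is about :math:`e^{-25}`).

```python
import math

# Height of the constant piece of the pdf on [0, 1].
C = 1 - math.exp(-5 / 16)


def pdf(x):
    """Piecewise pdf of the device lifetime X from the Problem 5 setup."""
    if 0 <= x <= 1:
        return C
    if x >= 5:
        return math.exp(-x / 16) / 16
    return 0.0


def cdf(x):
    """Piecewise CDF, using 1 - exp(-x/16) for x >= 5 (Question 5b)."""
    if x <= 0:
        return 0.0
    if x <= 1:
        return C * x
    if x <= 5:
        return C
    return 1 - math.exp(-x / 16)


def percentile(p):
    """Smallest x with F_X(x) >= p, for 0 < p < 1."""
    if p <= C:
        return p / C                 # linear piece on [0, 1]
    return -16 * math.log(1 - p)     # invert 1 - exp(-x/16) on x >= 5


def integrate(f, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


# Question 5a: total area under the pdf (upper limit 400 approximates infinity).
total_area = integrate(pdf, 0, 1) + integrate(pdf, 5, 400)
print(round(total_area, 4))          # prints 1.0

# Question 5c: P(X > 1) = 1 - F_X(1).
print(round(1 - cdf(1), 4))          # prints 0.7316

# Question 5d: 25th percentile.
print(round(percentile(0.25), 4))    # prints 0.9315
```

Because the CDF is flat on :math:`1 \leq x \leq 5`, ``percentile`` returns the smallest :math:`x` that reaches the requested probability. That convention matters only when :math:`p` equals :math:`1 - e^{-5/16}` exactly.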