.. _5-7-poisson-distribution: .. raw:: html
Poisson Distribution ====================================== While the binomial distribution helps us count successes in a fixed number of trials, many real-world situations involve counting rare events that occur randomly over time or space. Think about counting phone calls to a help desk during an hour, defects in a length of computer tape, or radioactive particles emitted from a sample. These scenarios share a different structure that leads us to another fundamental distribution in statistics: the Poisson distribution. .. admonition:: Road Map 🧭 :class: important • Identify situations where the **Poisson distribution** applies. • Master the **Poisson probability mass function** and its single parameter :math:`\lambda`. • Derive **expected value and variance** using infinite series techniques. • Understand how **interval length affects** the parameter :math:`\lambda`. From Counting Events to Modeling Rates -------------------------------------------- The Poisson distribution emerges when we count events that occur randomly over continuous intervals of time, space, or volume. Unlike binomial experiments where we have a fixed number of trials, Poisson processes involve counting events within fixed intervals where the number of possible events is theoretically unlimited. What Makes a Process Poisson? ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ A Poisson process has three essential properties: **1. Stationary and Proportional** * The probability that an event occurs in any interval depends only on the length of that interval, not on where the interval occurs (**stationarity**). * Equal-sized intervals have equal probabilities, and longer intervals have **proportionally** higher probabilities of containing events. For example, suppose we're counting phone calls to a help desk that always has a constant call rate. The probability of receiving exactly one call should be the same whether we look at 9-10 AM or 2-3 PM. Furthermore, if we expect 3 calls per hour on average, we should expect 6 calls per two-hour period. **2. Independent Events** Individual events occur independently of each other. The occurrence of one event doesn't influence when the next event will happen. Additionally, the number of events in non-overlapping intervals are independent of each other. In our phone call example, receiving three calls between 9-10 AM doesn't affect the number of calls received between 10-11 AM. **3.Orderliness (no bunching)** Events cannot occur simultaneously. In sufficiently small intervals, the chance of two or more events occurring is negligible, so events effectively arrive **one at a time**. Examples of Poisson Distributions ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Several real-world scenarios approximately follow Poisson distributions: - **Radioactive decay**: The number of alpha particles emitted from uranium-238 in one minute - **Call centers**: The number of calls received during busy hours on any given day - **Quality control**: The number of flaws on a computer tape of fixed length - **Ecology**: The number of dead trees in a square mile of forest - **Traffic**: The number of accidents at an intersection per month The Poisson Distribution ------------------------------------------------------ When events follow a Poisson process, we model the count of events using a Poisson distribution. Definition ~~~~~~~~~~~~~~ A Poisson random variable :math:`X` counts the number of independently occurring events in a fixed interval, where events occur at some **average rate** :math:`\lambda` (lambda) **per interval**. When :math:`X` has a Poisson distribution, we write :math:`X \sim Poisson(\lambda)`. Probability Mass Function ~~~~~~~~~~~~~~~~~~~~~~~~~~~ The Poisson PMF gives the probability of observing exactly x events: .. math:: p_X(x) = \frac{e^{-\lambda} \lambda^x}{x!} for :math:`x \in \text{supp}(X) = \{0, 1, 2, 3, ...\}`. Notice several key features of this distribution: - **Single parameter**: Unlike the binomial distribution which has two parameters (:math:`n` and :math:`p`), Poisson has only one parameter :math:`\lambda`. - **Unbounded support**: A Poisson random variable can theoretically take any non-negative integer value, unlike binomial whose support is bounded by :math:`n`. Expected Value and Variance: The Power of :math:`\lambda` ------------------------------------------------------------ One of the most elegant features of the Poisson distribution is that its parameter λ completely determines both the center and spread of the distribution. Expected Value ~~~~~~~~~~~~~~~~~~ To find :math:`E[X]` for a Poisson random variable, we use the definition of expected value: .. math:: E[X] = \sum_{x=0}^{\infty} x \cdot p_X(x) = \sum_{x=0}^{\infty} x \cdot \frac{e^{-\lambda} \lambda^x}{x!} Since the first term (:math:`x = 0`) equals zero, we can start the sum at :math:`x = 1` and cancel one :math:`x` with the factorial: .. math:: E[X] = \sum_{x=1}^{\infty} x \cdot \frac{e^{-\lambda} \lambda^x}{x!} = e^{-\lambda} \sum_{x=1}^{\infty} \frac{\lambda^x}{(x-1)!} Using the substitution :math:`u = x - 1`, we get: .. math:: E[X] = e^{-\lambda} \lambda \sum_{u=0}^{\infty} \frac{\lambda^u}{u!} The sum in brackets is the Taylor series for :math:`e^λ`, which equals 1 when we sum over all terms. Therefore: .. math:: E[X] = e^{-\lambda} \lambda \cdot e^{\lambda} = \lambda Variance ~~~~~~~~~~~ For the variance, we use Var(X) = E[X²] - (E[X])². We already know E[X] = λ, so we need E[X²]: .. math:: E[X^2] = \sum_{x=0}^{\infty} x^2 \cdot \frac{e^{-\lambda} \lambda^x}{x!} Through similar algebraic manipulation (starting at x = 1, canceling factorials, and using substitutions), we can show: .. math:: E[X^2] = \lambda^2 + \lambda Therefore: .. math:: \text{Var}(X) = E[X^2] - (E[X])^2 = (\lambda^2 + \lambda) - \lambda^2 = \lambda Summary of Poisson Distribution Properties ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ For X ~ Poisson(λ): .. math:: &\mu_X = E[X] = \lambda \\ &\sigma_X^2 = \text{Var}(X) = \lambda \\ &\sigma_X = \sqrt{\lambda} The summary reveals a remarkable property of Poisson distribution; knowing λ tells us everything about the distribution's location and spread. Visualizing Poisson Distributions ---------------------------------- The shape of a Poisson distribution changes dramatically with λ: **Small λ (λ = 1)** When events are rare, most probability concentrates at 0 and 1, with a long right tail. The distribution is highly right-skewed. .. figure:: https://yjjpfnblgtrogqvcjaon.supabase.co/storage/v1/object/public/stat-350-assets/images/chapter5/lambda1.png :alt: Poisson distributions with λ=1 :align: center :width: 80% λ=1 **Moderate λ (λ = 2.5)** As λ increases, the mode shifts right and the distribution becomes less skewed. More probability spreads across multiple values. .. figure:: https://yjjpfnblgtrogqvcjaon.supabase.co/storage/v1/object/public/stat-350-assets/images/chapter5/lambda2.5.png :alt: Poisson distributions with λ=2.5 :align: center :width: 80% λ=2.5 **Large λ (λ = 10)** For larger λ values, the distribution approaches a symmetric, bell-shaped curve centered around λ. The resemblance to a normal distribution becomes quite strong. .. figure:: https://yjjpfnblgtrogqvcjaon.supabase.co/storage/v1/object/public/stat-350-assets/images/chapter5/lambda10.png :alt: Poisson distributions with different λ=10 :align: center :width: 80% λ=10 .. admonition:: Example💡: IT Consultant Call Analysis :class: note An IT consultant receives **an average of 3 calls per hour**. We want to model the number of calls using a Poisson distribution and answer several probability questions. **Setting Up the Model** Let X = number of calls the consultant receives in the next hour. - **Stationary and Proportional**: Call rate is constant over time. ✓ - **Independent**: One call doesn't influence when the next occurs. ✓ - **Orderliness**: Calls don't occur simultaneously. ✓ Therefore: :math:`X \sim Poisson(\lambda = 3)`. **Solving Probability Problems** #. Find the probability of exactly one call. .. math:: P(X = 1) = \frac{e^{-3} \cdot 3^1}{1!} = \frac{3e^{-3}}{1} ≈ 0.1494 #. Find the probability of more than one call. Using the complement rule: .. math:: P(X > 1) = 1 - [P(X = 0) + P(X = 1)] = 1 - [e^{-3} + 3e^{-3}] ≈ 0.8009 #. Find the probability of exactly 5 calls in the next **two hours**. Since the rate is 3 calls per hour, over two hours we expect 2 × 3 = 6 calls on average. Let :math:`Y` = number of calls in the next two hours. Then :math:`Y \sim Poisson(\lambda = 6)`. .. math:: P(Y = 5) = \frac{e^{-6} \cdot 6^5}{5!} ≈ 0.1606 #. Find the probability of 1 call in the next hour and 4 calls in the following hour. Let :math:`X_1` count the calls in the first hour and :math:`X_2` the calls in the second hour. Both have an average rate of 3, and since the two intervals are non-overlapping, :math:`X_1` and :math:`X_2` are **indepdendent**. This allows us to **use the special multiplication rule**: .. math:: P(X_1=1 \cap X_2=4) &= P(X_1=1)P(X_2=4)\\ &=(0.8009)\left(\frac{e^{-3}3^4}{4!}\right) \\ &= (0.8009)(0.1680) \approx 0.1346 When to Use Poisson vs. Binomial ----------------------------------------------- Understanding when to use Poisson versus binomial distributions is crucial for proper statistical modeling: **Use Poisson When:** - Counting events over continuous intervals (time, space, volume). - Events are rare relative to opportunity. - The number of potential events is very large or unlimited. - You know the average rate but not the total number of trials. **Use Binomial When:** - Fixed number of independent trials. - Each trial has exactly two outcomes. - Probability of success is constant across trials. - You're counting successes among a known number of attempts. **Poisson as Binomial Approximation** Interestingly, Poisson can approximate binomial when n is large and p is small, with np ≈ λ. This connection highlights how these distributions relate to different aspects of the same underlying counting process. Bringing It All Together --------------------------- The Poisson distribution serves as a fundamental model for understanding random processes in fields ranging from telecommunications and quality control to epidemiology and reliability engineering. Its elegant mathematical properties—particularly the equality of mean and variance—make it both theoretically interesting and practically useful. As we've seen, the key to successful application lies in recognizing when events satisfy the Poisson assumptions and correctly interpreting the rate parameter λ in context. When these conditions are met, the Poisson distribution provides powerful tools for probability calculation and process understanding. .. admonition:: Key Takeaways 📝 :class: important 1. The **Poisson distribution** models counts of rare events occurring over fixed intervals of time, space, or volume. 2. **Three key properties** define Poisson processes: stationary/proportional rates, independence of events, and unique event occurrences. 3. The **single parameter λ** represents both the mean and variance: E[X] = Var(X) = λ. 4. The **PMF formula** :math:`p_X(x) = \frac{e^{-\lambda}\lambda^x}{x!}` applies to unbounded, non-negative integer support. 5. **Parameter scaling**: When changing interval length, multiply λ by the scaling factor (e.g., if λ = 3 per hour, then λ = 6 for two hours). 6. **Distribution shape** evolves from highly right-skewed (small λ) to approximately symmetric (large λ). Exercises ~~~~~~~~~~~~~ 1. **Identifying Poisson Processes**: For each scenario, determine whether it could be modeled using a Poisson distribution. Explain your reasoning. a) The number of typos in a 10-page document b) The number of students who arrive late to a 50-student class c) The number of car accidents at a busy intersection per week d) The number of defective items in a batch of 100 manufactured products 2. **Basic Probability Calculations**: Emails arrive at a server at an average rate of 2.5 per minute. Assume this follows a Poisson distribution. a) What is the probability of receiving exactly 3 emails in the next minute? b) What is the probability of receiving no emails in the next minute? c) What is the probability of receiving more than 4 emails in the next minute? d) What are the mean and standard deviation for the number of emails per minute? 3. **Parameter Scaling**: A call center receives an average of 4 calls per hour during peak times. a) What is the probability of receiving exactly 2 calls in 30 minutes? b) What is the probability of receiving at least 10 calls in 3 hours? c) In how long an interval would you expect to receive exactly 1 call on average? 4. **R Programming**: Use R to analyze a Poisson process with λ = 7. a) Generate 500 random samples and compare the sample mean and variance to theoretical values. b) Find the probability that X is within one standard deviation of its mean. c) Find the 90th percentile of this distribution. 5. **Comparing Distributions**: A quality control inspector examines products for defects. Historical data shows an average of 1.5 defects per product. a) Using the Poisson distribution, what's the probability a product has no defects? b) What's the probability a product has more than 3 defects? c) If you inspect 10 products, what's the expected total number of defects? d) How would your analysis change if defects were extremely rare (λ = 0.1) versus common (λ = 5)? .. Working with Poisson Distributions in R ---------------------------------------- R provides comprehensive functions for working with Poisson distributions, following the same naming convention as other distributions. **The Four Essential R Functions** - **rpois()**: Generates random samples from a Poisson distribution - **dpois()**: Calculates the probability mass function (exact probabilities) - **ppois()**: Calculates the cumulative distribution function (cumulative probabilities) - **qpois()**: Finds quantiles (the inverse of ppois) **Generating Random Samples with rpois()** .. code-block:: r # Generate 10 random values from Poisson(λ = 3) # Each value represents count of events in one interval set.seed(123) random_counts <- rpois(n = 10, lambda = 3) print(random_counts) # Output: 1 4 5 3 1 1 4 5 4 2 # Generate 1000 samples to see the distribution pattern large_sample <- rpois(n = 1000, lambda = 3) # Check that sample mean approximates theoretical mean sample_mean <- mean(large_sample) sample_var <- var(large_sample) theoretical_mean <- 3 # λ = 3 theoretical_var <- 3 # λ = 3 print(paste("Sample mean:", round(sample_mean, 2))) print(paste("Theoretical mean:", theoretical_mean)) print(paste("Sample variance:", round(sample_var, 2))) print(paste("Theoretical variance:", theoretical_var)) **Calculating Exact Probabilities with dpois()** .. code-block:: r # Calculate P(X = 2) when X ~ Poisson(λ = 3) prob_exactly_2 <- dpois(x = 2, lambda = 3) print(paste("P(X = 2) =", round(prob_exactly_2, 4))) # Calculate probabilities for multiple values x_values <- 0:10 probabilities <- dpois(x = x_values, lambda = 3) # Create a probability table prob_table <- data.frame( x = x_values, probability = round(probabilities, 4) ) print(prob_table) # Verify probabilities sum close to 1 (they won't equal exactly 1 # because we're truncating the infinite support) total_prob <- sum(dpois(x = 0:20, lambda = 3)) print(paste("Total probability (x = 0 to 20):", round(total_prob, 6))) **Calculating Cumulative Probabilities with ppois()** .. code-block:: r # Calculate P(X ≤ 5) when X ~ Poisson(λ = 3) prob_at_most_5 <- ppois(q = 5, lambda = 3) print(paste("P(X ≤ 5) =", round(prob_at_most_5, 4))) # Calculate P(X > 5) = 1 - P(X ≤ 5) prob_more_than_5 <- 1 - ppois(q = 5, lambda = 3) # Alternative: use lower.tail = FALSE prob_more_than_5_alt <- ppois(q = 5, lambda = 3, lower.tail = FALSE) print(paste("P(X > 5) =", round(prob_more_than_5, 4))) # Calculate P(2 ≤ X ≤ 6) = P(X ≤ 6) - P(X ≤ 1) prob_between <- ppois(q = 6, lambda = 3) - ppois(q = 1, lambda = 3) print(paste("P(2 ≤ X ≤ 6) =", round(prob_between, 4))) .. code-block:: r # Method 1: Using complement rule with dpois prob_zero <- dpois(x = 0, lambda = 3) prob_one <- dpois(x = 1, lambda = 3) prob_more_than_one <- 1 - (prob_zero + prob_one) print(paste("P(X > 1) =", round(prob_more_than_one, 4))) # Method 2: Using ppois with lower.tail = FALSE prob_more_than_one_alt <- ppois(q = 1, lambda = 3, lower.tail = FALSE) print(paste("P(X > 1) using ppois =", round(prob_more_than_one_alt, 4))) # Method 3: Direct calculation prob_at_most_one <- ppois(q = 1, lambda = 3) prob_more_than_one_direct <- 1 - prob_at_most_one print(paste("P(X > 1) direct =", round(prob_more_than_one_direct, 4))) .. code-block:: r # New parameter for 2-hour interval lambda_2hours <- 2 * 3 # 2 hours × 3 calls/hour # Calculate P(Y = 5) for the 2-hour interval prob_five_calls_2hrs <- dpois(x = 5, lambda = lambda_2hours) print(paste("P(5 calls in 2 hours) =", round(prob_five_calls_2hrs, 4))) # Expected values for comparison print(paste("Expected calls in 1 hour:", 3)) print(paste("Expected calls in 2 hours:", lambda_2hours)) print(paste("Standard deviation in 1 hour:", round(sqrt(3), 3))) print(paste("Standard deviation in 2 hours:", round(sqrt(lambda_2hours), 3))) **Comprehensive Analysis with Visualization** .. code-block:: r # Create comprehensive analysis # 1-hour analysis lambda_1hr <- 3 x_1hr <- 0:12 probs_1hr <- dpois(x_1hr, lambda = lambda_1hr) # 2-hour analysis lambda_2hr <- 6 x_2hr <- 0:18 probs_2hr <- dpois(x_2hr, lambda = lambda_2hr) par(mfrow = c(1, 1)) # Summary statistics cat("\nSummary Statistics:\n") cat("1-hour interval: Mean =", lambda_1hr, ", SD =", round(sqrt(lambda_1hr), 3), "\n") cat("2-hour interval: Mean =", lambda_2hr, ", SD =", round(sqrt(lambda_2hr), 3), "\n") cat("P(exactly 1 call in 1 hour) =", round(dpois(1, lambda_1hr), 4), "\n") cat("P(exactly 5 calls in 2 hours) =", round(dpois(5, lambda_2hr), 4), "\n") cat("P(more than 3 calls in 1 hour) =", round(1 - ppois(3, lambda_1hr), 4), "\n") .. code-block:: r # Calculate P(X = 1) when λ = 3 prob_one_call <- dpois(x = 1, lambda = 3) print(paste("P(X = 1) =", round(prob_one_call, 4))) # Manual calculation for verification manual_calc <- exp(-3) * 3^1 / factorial(1) print(paste("Manual calculation =", round(manual_calc, 4)))