.. _ch2.2-uniform-random-variates:

=========================
Uniform Random Variates
=========================

Every random number you have ever used in a computer program began as a uniform random variate. When you call ``np.random.normal()`` to generate Gaussian samples, the library first generates uniform numbers on :math:`[0, 1)` and then transforms them. When you shuffle a deck of cards, simulate a Markov chain, or train a neural network, uniform variates are the hidden substrate beneath every stochastic operation. The uniform distribution is the computational atom of randomness—the irreducible building block from which all other random variables are constructed.

Yet this foundation rests on a paradox. Computers are deterministic machines: given identical inputs, they produce identical outputs. How can a deterministic device generate numbers that behave randomly? The answer is that it cannot—not truly. What computers produce instead are *pseudo-random* numbers: sequences generated by deterministic algorithms that, while entirely predictable given the algorithm and starting state, pass stringent statistical tests for randomness. Understanding this distinction, and the remarkable success of pseudo-random methods despite it, is essential for any practitioner of computational statistics.

This section explores the theory and practice of uniform random variate generation. We begin with the theoretical result that makes uniform variates so fundamental—the Probability Integral Transform—and then examine the surprisingly deep question of how to generate sequences that *behave* uniformly even when produced by deterministic algorithms. Along the way, we will encounter chaotic dynamical systems that seem random but fail statistical tests, historical generators whose flaws caused years of corrupted research, and the modern algorithms that power today's scientific computing.

.. admonition:: Road Map 🧭
   :class: important

   • **Understand**: Why the uniform distribution is the universal source for all random generation, through the Probability Integral Transform
   • **Explore**: The paradox of computational randomness and what "pseudo-random" really means
   • **Analyze**: Why some deterministic systems (chaotic maps, naive generators) fail to produce acceptable randomness
   • **Master**: The mathematical structure of linear congruential and shift-register generators, including their failure modes
   • **Apply**: Modern generators (Mersenne Twister, PCG64) and best practices for seeds, reproducibility, and parallelism
   • **Connect**: How uniform variates feed into the inverse CDF method of the next section


The Probability Integral Transform
----------------------------------

Before examining how uniform variates are generated, we must understand *why* they occupy such a privileged position. The answer lies in a beautiful theoretical result that establishes a universal correspondence between the uniform distribution and all other continuous distributions.

The Theorem and Its Proof
~~~~~~~~~~~~~~~~~~~~~~~~~

.. admonition:: Theorem: Probability Integral Transform
   :class: note

   Let :math:`X` be a continuous random variable with cumulative distribution function :math:`F_X`. Then:

   1. **Forward direction**: :math:`U = F_X(X) \sim \text{Uniform}(0, 1)`
   2. **Inverse direction**: If :math:`U \sim \text{Uniform}(0, 1)`, then :math:`X = F_X^{-1}(U) \sim F_X`

   where :math:`F_X^{-1}(u) = \inf\{x : F_X(x) \geq u\}` is the generalized inverse (quantile function).

**Proof of Part 1**: Let :math:`U = F_X(X)`. We want to show that :math:`U` is uniformly distributed on :math:`(0, 1)`. For any :math:`u \in (0, 1)`:

.. math::

   P(U \leq u) = P(F_X(X) \leq u)

Since :math:`F_X` is continuous and strictly increasing (on the support of :math:`X`), applying :math:`F_X^{-1}` to both sides of the inequality preserves it:

.. math::

   P(F_X(X) \leq u) = P(X \leq F_X^{-1}(u)) = F_X(F_X^{-1}(u)) = u

Thus :math:`P(U \leq u) = u` for all :math:`u \in (0, 1)`, which is precisely the CDF of :math:`\text{Uniform}(0, 1)`. ∎

**Proof of Part 2**: Let :math:`U \sim \text{Uniform}(0, 1)` and define :math:`X = F_X^{-1}(U)`. For any :math:`x`:

.. math::

   P(X \leq x) = P(F_X^{-1}(U) \leq x) = P(U \leq F_X(x)) = F_X(x)

The second equality uses the fact that :math:`F_X^{-1}(u) \leq x` if and only if :math:`u \leq F_X(x)` (property of generalized inverses). The third equality follows because :math:`U` is uniform. Thus :math:`X` has CDF :math:`F_X`. ∎

Universal Currency of Randomness
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The Probability Integral Transform has a profound practical consequence: **if we can generate uniform random numbers, we can generate any random variable**. The uniform distribution is the "universal currency" of randomness—any desired distribution can be "purchased" by applying the appropriate inverse CDF to uniform variates.

Consider what this means algorithmically:

- To generate :math:`\text{Exponential}(\lambda)`: compute :math:`-\ln(U)/\lambda`
- To generate :math:`\text{Cauchy}(\mu, \sigma)`: compute :math:`\mu + \sigma \tan(\pi(U - 1/2))`
- To generate any continuous :math:`F`: compute :math:`F^{-1}(U)`

This universality explains why so much effort has been devoted to generating high-quality uniform variates. Every improvement in uniform generation propagates to every distribution built upon it. Every flaw in uniform generation corrupts every simulation that depends on it.

For discrete distributions, the same principle applies with minor modification. If :math:`X` takes values :math:`x_1 < x_2 < \cdots` with probabilities :math:`p_1, p_2, \ldots`, we set :math:`X = x_k` whenever :math:`U \in [F(x_{k-1}), F(x_k))`, where :math:`F(x_0) = 0`. The generalized inverse :math:`F^{-1}(u) = \inf\{x : F(x) \geq u\}` handles both continuous and discrete cases seamlessly.


The Paradox of Computational Randomness
---------------------------------------

With the importance of uniform variates established, we face a fundamental question: how do computers generate them? The honest answer is unsettling—computers cannot generate truly random numbers through algorithmic means. They produce *pseudo-random* numbers: sequences that are entirely deterministic yet pass statistical tests for randomness.

Von Neumann's Confession
~~~~~~~~~~~~~~~~~~~~~~~~

John von Neumann, one of the founding figures of both computer science and Monte Carlo methods, captured this paradox memorably:

   "Anyone who considers arithmetical methods of producing random digits is, of course, in a state of sin. For, as has been pointed out several times, there is no such thing as a random number—there are only methods of producing random numbers, and a strict arithmetic procedure of course is not such a method."

Yet von Neumann himself used such "sinful" methods extensively in his computational work. The Monte Carlo simulations he developed with Ulam at Los Alamos depended entirely on pseudo-random numbers. How could a method built on a logical contradiction prove so useful?

The resolution lies in recognizing that we do not need *true* randomness for most purposes—we need *statistical* randomness. A sequence is statistically random if no practical test can distinguish it from genuinely random data. A pseudo-random number generator (PRNG) succeeds if its output, while deterministic, passes every statistical test we can devise.

This is a functional definition: we accept a PRNG if it is not rejected by our tests. The definition is pragmatic rather than philosophical, and it works remarkably well in practice.

Von Neumann's Middle-Square Method
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Von Neumann proposed one of the earliest PRNGs in 1946: the **middle-square method**. The algorithm is simple:

1. Start with an :math:`n`-digit number (the seed)
2. Square it to obtain a :math:`2n`-digit number
3. Extract the middle :math:`n` digits as the next number
4. Repeat

For example, with :math:`n = 4` and seed 1234:

.. math::

   1234^2 &= 01522756 \to \text{middle 4 digits} = 5227 \\
   5227^2 &= 27321529 \to \text{middle 4 digits} = 3215 \\
   3215^2 &= 10336225 \to \text{middle 4 digits} = 3362 \\
   &\vdots

The method is elegant but deeply flawed. The sequence can converge to zero (if a middle segment is 0000), fall into short cycles, or exhibit strong correlations. Von Neumann knew these limitations—he called the method a "very crude" approach suitable only for quick calculations. But it established the paradigm: use arithmetic operations to generate sequences that *appear* random.

.. code-block:: python

   def middle_square(seed, n_digits=4, n_samples=20):
       """
       Von Neumann's middle-square method (historical; do not use in practice).

       Parameters
       ----------
       seed : int
           Starting value (n_digits digits).
       n_digits : int
           Number of digits in each generated number.
       n_samples : int
           Number of values to generate.

       Returns
       -------
       list
           Sequence of pseudo-random integers.
       """
       sequence = [seed]
       x = seed

       for _ in range(n_samples - 1):
           x_squared = x ** 2
           # Pad to 2*n_digits, extract middle n_digits
           x_str = str(x_squared).zfill(2 * n_digits)
           start = n_digits // 2
           x = int(x_str[start:start + n_digits])
           sequence.append(x)

           if x == 0:  # Degenerate: will stay at 0
               break

       return sequence

   # Demonstrate the method
   seq = middle_square(1234, n_digits=4, n_samples=15)
   print("Middle-square sequence:", seq)

   # Show it can degenerate
   bad_seq = middle_square(540, n_digits=4, n_samples=15)
   print("Degenerating sequence:", bad_seq)

Running this code reveals the method's instability. Some seeds produce reasonable-looking sequences; others quickly degenerate to zero or short cycles.


Chaotic Dynamical Systems: An Instructive Failure
-------------------------------------------------

Given the limitations of simple arithmetic methods, researchers explored whether *chaotic dynamical systems*—deterministic systems exhibiting sensitive dependence on initial conditions—might serve as random number generators. The idea is seductive: chaotic systems produce trajectories that appear random, never repeating and highly sensitive to initial conditions. Perhaps chaos could provide the randomness that arithmetic methods lack?

The answer, disappointingly, is no. Chaotic systems fail as PRNGs for subtle but important reasons.

The Logistic Map
~~~~~~~~~~~~~~~~

The **logistic map** is perhaps the most famous chaotic system:

.. math::

   X_{n+1} = \alpha X_n (1 - X_n)

For certain values of :math:`\alpha`, particularly :math:`\alpha = 4`, this simple recurrence produces chaotic behavior. Starting from almost any :math:`X_0 \in (0, 1)`, the sequence :math:`X_1, X_2, \ldots` bounces unpredictably around the interval, never settling into a pattern.

Remarkably, for :math:`\alpha = 4`, the stationary distribution of the logistic map is known analytically: it is the **arcsine distribution** with density:

.. math::

   f(x) = \frac{1}{\pi\sqrt{x(1-x)}}, \quad 0 < x < 1

This density concentrates mass near 0 and 1, not uniformly across :math:`[0, 1]`. However, applying the probability integral transform :math:`Y_n = F(X_n) = \frac{1}{2} + \frac{\arcsin(2X_n - 1)}{\pi}` should yield uniformly distributed :math:`Y_n`.

.. code-block:: python

   import numpy as np
   import matplotlib.pyplot as plt

   def logistic_map_generator(x0, alpha=4.0, n_samples=10000):
       """
       Generate pseudo-random numbers using the logistic map.

       Parameters
       ----------
       x0 : float
           Initial value in (0, 1).
       alpha : float
           Logistic map parameter (use 4.0 for chaos).
       n_samples : int
           Number of samples to generate.

       Returns
       -------
       X : ndarray
           Raw logistic map values (arcsine distributed).
       Y : ndarray
           Transformed values (should be uniform).
       """
       X = np.zeros(n_samples)
       X[0] = x0

       for i in range(1, n_samples):
           X[i] = alpha * X[i-1] * (1 - X[i-1])

       # Transform to uniform via arcsine CDF
       Y = 0.5 + np.arcsin(2 * X - 1) / np.pi

       return X, Y

   # Generate samples
   X, Y = logistic_map_generator(0.1, alpha=4.0, n_samples=10000)

   # Visualize
   fig, axes = plt.subplots(1, 3, figsize=(14, 4))

   # Histogram of transformed values
   axes[0].hist(Y, bins=50, density=True, alpha=0.7, edgecolor='black')
   axes[0].axhline(1, color='red', linestyle='--', label='Uniform density')
   axes[0].set_xlabel('Y (transformed)')
   axes[0].set_ylabel('Density')
   axes[0].set_title('Marginal Distribution of Y')
   axes[0].legend()

   # Lag-1 plot
   axes[1].scatter(Y[:-1], Y[1:], alpha=0.3, s=1)
   axes[1].set_xlabel('$Y_n$')
   axes[1].set_ylabel('$Y_{n+1}$')
   axes[1].set_title('Lag-1 Dependence (should fill square)')

   # Lag-100 plot
   axes[2].scatter(Y[:-100], Y[100:], alpha=0.3, s=1)
   axes[2].set_xlabel('$Y_n$')
   axes[2].set_ylabel('$Y_{n+100}$')
   axes[2].set_title('Lag-100 Dependence')

   plt.tight_layout()
   plt.show()

The Failure of Chaos
~~~~~~~~~~~~~~~~~~~~

Running the code above reveals the problem. While the marginal histogram of :math:`Y_n` looks roughly uniform, the lag-1 scatter plot :math:`(Y_n, Y_{n+1})` shows strong structure—the points do *not* fill the unit square uniformly. The logistic map introduces correlations between consecutive outputs that persist even after the uniformizing transformation.

The lag-100 plot :math:`(Y_n, Y_{n+100})` looks better, suggesting that values separated by many iterations are approximately independent. But this comes at an unacceptable cost: to generate :math:`n` independent-looking values, we must compute :math:`100n` iterations of the map. This defeats the purpose of using a fast deterministic generator.

The deeper issue is that chaotic dynamics and statistical independence are different properties. A chaotic system exhibits *sensitive dependence on initial conditions*—nearby starting points diverge exponentially over time. But this says nothing about the joint distribution of :math:`(X_n, X_{n+1})`. The deterministic relationship :math:`X_{n+1} = 4X_n(1-X_n)` creates strong dependencies that no transformation can remove.

The Tent Map: Another Cautionary Tale
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The **tent map** provides another example:

.. math::

   D(x) = \begin{cases} 2x & \text{if } x \leq 1/2 \\ 2(1-x) & \text{if } x > 1/2 \end{cases}

Theoretically, the tent map preserves the uniform distribution: if :math:`X_n \sim \text{Uniform}(0,1)`, then :math:`X_{n+1} = D(X_n) \sim \text{Uniform}(0,1)`. This seems ideal for a PRNG.

In practice, the tent map fails catastrophically on computers. The problem is finite-precision arithmetic. Each application of :math:`D` effectively shifts the binary representation of :math:`x` by one bit—multiplying by 2 and taking the fractional part. After :math:`k` iterations, the :math:`k` least significant bits of the original :math:`x` have been shifted away. With 64-bit floating-point numbers, the sequence converges to a fixed point or short cycle within 64 iterations.

.. code-block:: python

   def tent_map(x0, n_samples=100):
       """Demonstrate tent map degeneration."""
       X = [x0]
       x = x0
       for _ in range(n_samples - 1):
           if x <= 0.5:
               x = 2 * x
           else:
               x = 2 * (1 - x)
           X.append(x)
           if x == 0 or x == 1:  # Fixed point
               break
       return X

   # Watch it degenerate
   seq = tent_map(0.3, n_samples=70)
   print(f"Sequence length before degeneration: {len(seq)}")
   print(f"Final values: {seq[-5:]}")

The lesson is clear: **theoretical properties of continuous dynamical systems do not translate to discrete computer arithmetic**. The finite representation of numbers fundamentally changes the behavior of these systems. Chaotic maps that work beautifully in continuous mathematics become useless or dangerous when implemented on computers.

.. admonition:: Common Pitfall ⚠️
   :class: warning

   **Chaotic maps are not PRNGs**: Despite their apparent randomness, chaotic dynamical systems like the logistic map or tent map fail as pseudo-random number generators. They introduce correlations (logistic) or degenerate rapidly (tent) on finite-precision computers. Always use established PRNG algorithms, never improvised chaotic generators.


Linear Congruential Generators
------------------------------

The first successful class of PRNGs was the **linear congruential generator** (LCG), introduced by D.H. Lehmer in 1948. LCGs dominated random number generation for decades and remain important today for understanding PRNG design—both its successes and its pitfalls.

The Algorithm
~~~~~~~~~~~~~

An LCG produces integers :math:`X_0, X_1, X_2, \ldots` via the recurrence:

.. math::
   :label: lcg-recurrence

   X_{n+1} = (a X_n + c) \mod m

where:

- :math:`m > 0` is the **modulus** (often :math:`2^{32}` or :math:`2^{64}`)
- :math:`a` with :math:`0 < a < m` is the **multiplier**
- :math:`c` with :math:`0 \leq c < m` is the **increment**
- :math:`X_0` is the **seed** (initial state)

The output uniform variates are :math:`U_n = X_n / m \in [0, 1)`.

The appeal of LCGs is computational efficiency: each iteration requires only a multiplication, addition, and modulo operation. On modern hardware, this executes in a few clock cycles. The entire state fits in a single integer.

Period and the Hull-Dobell Theorem
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A critical property of any PRNG is its **period**—the number of values generated before the sequence repeats. Since an LCG has state :math:`X_n \in \{0, 1, \ldots, m-1\}`, its period cannot exceed :math:`m`. The question is: when does it achieve the maximum period?

.. admonition:: Theorem: Hull-Dobell (1962)
   :class: note

   The LCG with modulus :math:`m`, multiplier :math:`a`, and increment :math:`c` has period :math:`m` (the maximum) if and only if:

   1. :math:`\gcd(c, m) = 1` (c and m are coprime)
   2. :math:`a - 1` is divisible by all prime factors of :math:`m`
   3. If :math:`m` is divisible by 4, then :math:`a - 1` is divisible by 4

For the common choice :math:`m = 2^{32}`:

- Condition 1 requires :math:`c` to be odd
- Condition 2 requires :math:`a \equiv 1 \pmod{2}`, i.e., :math:`a` odd
- Condition 3 requires :math:`a \equiv 1 \pmod{4}`

Thus any :math:`a \equiv 1 \pmod{4}` with odd :math:`c` achieves full period :math:`2^{32}`.

The Lattice Structure Problem
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Achieving full period is necessary but not sufficient for a good generator. LCGs have a fundamental flaw that limits their usefulness: **outputs fall on a lattice in high-dimensional space**.

Consider plotting consecutive pairs :math:`(U_n, U_{n+1})`. From the recurrence:

.. math::

   U_{n+1} = \frac{X_{n+1}}{m} = \frac{aX_n + c}{m} \mod 1 = \frac{a}{m} \cdot m U_n + \frac{c}{m} \mod 1

The points :math:`(U_n, U_{n+1})` lie on lines of the form :math:`y = (a/m) x + c/m \mod 1`. In fact, all such points fall on at most :math:`\sqrt{m}` parallel lines in :math:`[0,1]^2`. In three dimensions, consecutive triples :math:`(U_n, U_{n+1}, U_{n+2})` fall on parallel planes; in :math:`d` dimensions, on parallel hyperplanes.

This lattice structure is not random at all—it is a highly regular geometric pattern masquerading as uniform coverage.

.. figure:: https://pqyjaywwccbnqpwgeiuv.supabase.co/storage/v1/object/public/STAT%20418%20Images/assets/PartII/Chapter2/Monte%20Carlo%20Methods_slide005_img003_0008b89d.png
   :alt: NIST logo representing standards for random number testing
   :align: center
   :width: 40%

   **Standards for Randomness.** The National Institute of Standards and Technology (NIST) maintains rigorous test suites for evaluating random number generators. These tests can detect the lattice structure in LCGs and other statistical anomalies. Production generators must pass hundreds of such tests.

The RANDU Disaster
~~~~~~~~~~~~~~~~~~

The most infamous example of LCG failure is **RANDU**, developed by IBM in the 1960s and widely used on IBM mainframes:

.. math::

   X_{n+1} = 65539 \cdot X_n \mod 2^{31}

RANDU satisfies Hull-Dobell conditions and achieves full period :math:`2^{31}`. It was fast and convenient. It was also catastrophically bad.

The problem lies in an algebraic relationship. Note that :math:`65539 = 2^{16} + 3`. A little algebra shows:

.. math::

   X_{n+2} = 6 X_{n+1} - 9 X_n \mod 2^{31}

This means consecutive triples :math:`(X_n, X_{n+1}, X_{n+2})` satisfy a *linear* relationship. When plotted in 3D, all points fall on exactly **15 parallel planes**—not remotely uniform.

.. code-block:: python

   import numpy as np
   import matplotlib.pyplot as plt
   from mpl_toolkits.mplot3d import Axes3D

   def randu_generator(seed, n_samples):
       """
       The infamous RANDU generator (DO NOT USE for real work).

       Parameters
       ----------
       seed : int
           Odd initial value.
       n_samples : int
           Number of values to generate.

       Returns
       -------
       ndarray
           Pseudo-random values in [0, 1).
       """
       X = np.zeros(n_samples, dtype=np.int64)
       X[0] = seed

       for i in range(1, n_samples):
           X[i] = (65539 * X[i-1]) % (2**31)

       return X / (2**31)

   # Generate samples and plot in 3D
   U = randu_generator(seed=1, n_samples=10000)

   fig = plt.figure(figsize=(12, 5))

   # RANDU: the 15 planes are visible
   ax1 = fig.add_subplot(121, projection='3d')
   ax1.scatter(U[:-2], U[1:-1], U[2:], alpha=0.3, s=1)
   ax1.set_xlabel('$U_n$')
   ax1.set_ylabel('$U_{n+1}$')
   ax1.set_zlabel('$U_{n+2}$')
   ax1.set_title('RANDU: Points Fall on 15 Planes')
   ax1.view_init(elev=10, azim=30)  # Angle to see planes

   # Good generator for comparison
   rng = np.random.default_rng(42)
   V = rng.random(10000)

   ax2 = fig.add_subplot(122, projection='3d')
   ax2.scatter(V[:-2], V[1:-1], V[2:], alpha=0.3, s=1)
   ax2.set_xlabel('$U_n$')
   ax2.set_ylabel('$U_{n+1}$')
   ax2.set_zlabel('$U_{n+2}$')
   ax2.set_title('PCG64: Uniform Coverage')
   ax2.view_init(elev=10, azim=30)

   plt.tight_layout()
   plt.show()

   # Verify the algebraic relationship
   X = (randu_generator(1, 1000) * 2**31).astype(np.int64)
   check = (6 * X[1:-1] - 9 * X[:-2]) % (2**31)
   print(f"X[n+2] = 6*X[n+1] - 9*X[n] mod 2^31: {np.all(check == X[2:])}")

Research conducted with RANDU produced spurious results for years. Any Monte Carlo simulation using three or more consecutive uniform values—integration in 3D, simulating 3D random walks, sampling three-dimensional distributions—was corrupted by this hidden structure. The lesson: **statistical tests matter**, and a generator is only as good as the tests it has passed.

.. admonition:: Example 💡 Visualizing LCG Lattice Structure
   :class: note

   **Given**: The LCG with :math:`X_{n+1} = 69069 X_n \mod 2^{32}` (a better generator than RANDU, but still showing lattice structure).

   **Task**: Visualize the lattice structure in 2D.

   **Analysis**: The multiplier 69069 produces lines with slope :math:`69069/2^{32}`. When we plot :math:`(U_n, U_{n+1})`, points cluster on approximately :math:`\sqrt{2^{32}} \approx 65536` parallel lines—far more than RANDU's 15 planes, but still a lattice.

   **Code**:

   .. code-block:: python

      import numpy as np
      import matplotlib.pyplot as plt

      def lcg_69069(seed, n_samples):
          """LCG with multiplier 69069."""
          X = np.zeros(n_samples, dtype=np.uint64)
          X[0] = seed
          for i in range(1, n_samples):
              X[i] = (69069 * X[i-1]) % (2**32)
          return X / (2**32)

      U = lcg_69069(12345, 10000)

      fig, axes = plt.subplots(1, 2, figsize=(12, 5))

      # Full view: looks OK
      axes[0].scatter(U[:-1], U[1:], alpha=0.5, s=1)
      axes[0].set_xlabel('$U_n$')
      axes[0].set_ylabel('$U_{n+1}$')
      axes[0].set_title('LCG 69069: Full View')

      # Zoomed view: lattice visible
      mask = (U[:-1] < 0.01) & (U[1:] < 0.01)
      axes[1].scatter(U[:-1][mask], U[1:][mask], alpha=0.7, s=10)
      axes[1].set_xlabel('$U_n$')
      axes[1].set_ylabel('$U_{n+1}$')
      axes[1].set_title('LCG 69069: Zoomed (Lattice Visible)')

      plt.tight_layout()
      plt.show()

   **Interpretation**: At full scale, the LCG output looks uniform. But zooming into a small region reveals the underlying lattice structure—points fall on discrete lines rather than filling the region continuously.


Shift-Register Generators
-------------------------

To overcome the limitations of LCGs, researchers developed generators based on different mathematical structures. **Shift-register generators** exploit the binary representation of integers, using bitwise operations rather than arithmetic.

Binary Representation and Bit Operations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In a computer, an integer :math:`X` is represented as a sequence of binary digits (bits):

.. math::

   X = \sum_{i=0}^{k-1} e_i \cdot 2^i = (e_{k-1}, e_{k-2}, \ldots, e_1, e_0)_2

where each :math:`e_i \in \{0, 1\}`. Shift-register generators manipulate this representation directly using operations like:

- **Left shift** :math:`L`: Shift bits left, inserting 0 on the right. :math:`L(e_1, \ldots, e_k) = (e_2, \ldots, e_k, 0)`
- **Right shift** :math:`R`: Shift bits right, inserting 0 on the left. :math:`R(e_1, \ldots, e_k) = (0, e_1, \ldots, e_{k-1})`
- **XOR** :math:`\oplus`: Bitwise exclusive-or. :math:`e_i \oplus f_i = (e_i + f_i) \mod 2`

These operations are extremely fast on modern processors—often single clock cycles.

The Xorshift Family
~~~~~~~~~~~~~~~~~~~

A simple shift-register generator is the **xorshift**, which combines shifts and XOR:

.. code-block:: python

   def xorshift32(state):
       """One step of xorshift32."""
       state ^= (state << 13) & 0xFFFFFFFF
       state ^= (state >> 17)
       state ^= (state << 5) & 0xFFFFFFFF
       return state

The sequence generated by repeated application has period :math:`2^{32} - 1` (all nonzero 32-bit patterns). More sophisticated variants like **xorshift128+** and **xoshiro256++** achieve better statistical properties and longer periods.

Linear Feedback Shift Registers
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The mathematical framework for shift-register generators involves **linear feedback shift registers** (LFSRs). An LFSR of length :math:`k` is defined by a :math:`k \times k` binary matrix :math:`T` operating on :math:`k`-bit states:

.. math::

   X_{n+1} = T \cdot X_n \mod 2

where all operations are in :math:`\mathbb{F}_2` (the field with two elements, where :math:`1 + 1 = 0`).

The matrices used in practice have special structure. For example, the matrix:

.. math::

   T_R = I + R = \begin{pmatrix}
   1 & 1 & 0 & \cdots & 0 \\
   0 & 1 & 1 & \cdots & 0 \\
   \vdots & & \ddots & \ddots & \vdots \\
   0 & 0 & \cdots & 1 & 1 \\
   0 & 0 & \cdots & 0 & 1
   \end{pmatrix}

combines the identity with a right-shift, implementing :math:`X_{n+1} = X_n \oplus (X_n \gg 1)`. Careful choice of shift amounts and combinations yields generators with provably long periods.


The KISS Generator: Combining Strategies
----------------------------------------

Rather than relying on a single generation technique, modern practice often combines multiple generators. The **KISS** generator ("Keep It Simple, Stupid") of Marsaglia and Zaman (1993) exemplifies this approach, combining congruential and shift-register methods.

Components of KISS
~~~~~~~~~~~~~~~~~~

KISS runs three sequences in parallel:

**1. Congruential component** :math:`(I_n)`:

.. math::

   I_{n+1} = (69069 \times I_n + 23606797) \mod 2^{32}

This is a standard LCG with full period :math:`2^{32}`.

**2. First shift-register component** :math:`(J_n)`:

.. math::

   J_{n+1} = (I + L^{15})(I + R^{17}) J_n \mod 2^{32}

where :math:`L^{15}` is a left shift by 15 bits and :math:`R^{17}` is a right shift by 17 bits. Period: :math:`2^{32} - 2^{21} - 2^{11} + 1`.

**3. Second shift-register component** :math:`(K_n)`:

.. math::

   K_{n+1} = (I + L^{13})(I + R^{18}) K_n \mod 2^{31}

Period: :math:`2^{31} - 1`.

**Combined output**:

.. math::

   X_{n+1} = (I_{n+1} + J_{n+1} + K_{n+1}) \mod 2^{32}

Why Combination Works
~~~~~~~~~~~~~~~~~~~~~

The three components have periods that are pairwise coprime (no common factors). By the Chinese Remainder Theorem, the combined generator has period equal to the product:

.. math::

   \text{Period} \approx 2^{32} \times (2^{32} - 2^{21} - 2^{11}) \times (2^{31} - 1) \approx 2^{95}

This exceeds :math:`10^{28}`—far beyond any practical simulation length.

More importantly, combination breaks the lattice structure of the individual components. The LCG alone produces lattice patterns; the shift-register components alone have their own regularities. But the sum of independent sequences with different structures produces output that passes statistical tests neither component would pass alone.

.. code-block:: python

   import numpy as np

   class KISSGenerator:
       """
       The KISS pseudo-random number generator.

       Combines congruential and shift-register generators
       for excellent statistical properties.
       """

       def __init__(self, seed_i=12345, seed_j=34567, seed_k=56789):
           """Initialize with three seeds."""
           self.i = np.uint32(seed_i)
           self.j = np.uint32(seed_j)
           self.k = np.uint32(seed_k) & np.uint32(0x7FFFFFFF)  # 31-bit

       def _step(self):
           """Advance the generator by one step."""
           # Congruential component
           self.i = np.uint32(69069 * self.i + 23606797)

           # First shift-register (xorshift)
           self.j ^= (self.j << 17)
           self.j ^= (self.j >> 15)

           # Second shift-register
           self.k ^= (self.k << 18) & np.uint32(0x7FFFFFFF)
           self.k ^= (self.k >> 13)

           # Combine
           return np.uint32(self.i + self.j + self.k)

       def random(self, size=None):
           """Generate uniform random values in [0, 1)."""
           if size is None:
               return self._step() / 2**32
           else:
               return np.array([self._step() / 2**32 for _ in range(size)])

   # Demonstrate
   kiss = KISSGenerator(seed_i=1, seed_j=2, seed_k=3)
   samples = kiss.random(10)
   print("KISS samples:", samples.round(6))


Modern Generators: Mersenne Twister and PCG
-------------------------------------------

Contemporary statistical software uses generators that have been rigorously tested against extensive test suites. The two most important are the **Mersenne Twister** and the **Permuted Congruential Generator** (PCG).

Mersenne Twister (MT19937)
~~~~~~~~~~~~~~~~~~~~~~~~~~

Developed by Matsumoto and Nishimura in 1997, the Mersenne Twister became the de facto standard for statistical computing. Python's ``random`` module uses it by default.

**Key properties**:

- **Period**: :math:`2^{19937} - 1`, a Mersenne prime—hence the name. This is approximately :math:`10^{6001}`, inconceivably larger than any simulation could exhaust.
- **State size**: 624 × 32-bit integers (19,968 bits). The large state enables the enormous period.
- **Equidistribution**: 623-dimensional equidistribution. Consecutive 623-tuples of outputs are guaranteed to be uniformly distributed in :math:`[0,1)^{623}`.
- **Speed**: Fast, using only bitwise operations and array lookups.

**Limitations**:

- Large state makes it memory-intensive for applications requiring many independent streams.
- Not cryptographically secure: given 624 consecutive outputs, the entire state can be reconstructed, allowing prediction of all future values.
- Fails some newer, more stringent statistical tests (though these failures are rarely significant in practice).

Permuted Congruential Generator (PCG64)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

PCG, developed by Melissa O'Neill in 2014, has become NumPy's default generator. It addresses Mersenne Twister's limitations while maintaining excellent statistical properties.

**Key properties**:

- **Period**: :math:`2^{128}`. While smaller than MT19937's period, this is still :math:`10^{38}`—enough for any practical simulation.
- **State size**: 128 bits. Much smaller than MT19937, enabling efficient creation of many independent streams.
- **Statistical quality**: Passes BigCrush and PractRand test suites. Better than MT19937 on several metrics.
- **Speed**: Faster than MT19937 on modern 64-bit processors.
- **Jumpability**: Can efficiently skip ahead by any number of steps, enabling parallel stream creation without generating intervening values.

**The PCG algorithm** combines a simple LCG with a permutation function that destroys the LCG's lattice structure:

1. Advance the internal state using an LCG
2. Apply a carefully designed permutation to produce the output

The permutation "scrambles" the regularities of the LCG, producing output that passes tests the raw LCG would fail.

.. code-block:: python

   import numpy as np
   from numpy.random import default_rng, Generator, MT19937, PCG64

   # NumPy's default is PCG64
   rng_default = default_rng(42)
   print(f"Default generator: {rng_default.bit_generator}")

   # Explicitly use Mersenne Twister
   rng_mt = Generator(MT19937(42))
   print(f"Mersenne Twister: {rng_mt.bit_generator}")

   # Explicitly use PCG64
   rng_pcg = Generator(PCG64(42))
   print(f"PCG64: {rng_pcg.bit_generator}")

   # All produce high-quality uniform variates
   print("\nSample outputs:")
   print(f"Default: {rng_default.random(5).round(6)}")
   print(f"MT:      {rng_mt.random(5).round(6)}")
   print(f"PCG64:   {rng_pcg.random(5).round(6)}")


Statistical Testing of Random Number Generators
-----------------------------------------------

How do we know a generator is "good"? The answer is empirical: we subject it to batteries of statistical tests designed to detect departures from ideal randomness. A generator is acceptable if it passes all tests in the battery.

Fundamental Tests
~~~~~~~~~~~~~~~~~

**Chi-square test for uniformity**: Divide :math:`[0, 1)` into :math:`k` equal bins. Count the number of samples :math:`O_i` in each bin. Under uniformity, the expected count is :math:`E_i = n/k`. The test statistic:

.. math::

   \chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}

follows approximately :math:`\chi^2_{k-1}` under the null hypothesis of uniformity.

.. code-block:: python

   import numpy as np
   from scipy import stats

   def chi_square_uniformity_test(samples, n_bins=100):
       """
       Test uniformity using chi-square goodness-of-fit.

       Parameters
       ----------
       samples : ndarray
           Uniform samples in [0, 1).
       n_bins : int
           Number of bins.

       Returns
       -------
       dict
           Test statistic, p-value, and pass/fail at α=0.01.
       """
       observed, _ = np.histogram(samples, bins=n_bins, range=(0, 1))
       expected = len(samples) / n_bins

       chi2_stat = np.sum((observed - expected)**2 / expected)
       p_value = 1 - stats.chi2.cdf(chi2_stat, df=n_bins - 1)

       return {
           'chi2_statistic': chi2_stat,
           'p_value': p_value,
           'pass': p_value > 0.01
       }

   # Test NumPy's generator
   rng = np.random.default_rng(42)
   samples = rng.random(100_000)
   result = chi_square_uniformity_test(samples)
   print(f"Chi-square statistic: {result['chi2_statistic']:.2f}")
   print(f"P-value: {result['p_value']:.4f}")
   print(f"Pass: {result['pass']}")

**Kolmogorov-Smirnov test**: Compares the empirical CDF to the theoretical uniform CDF. More powerful than chi-square for detecting certain departures.

**Serial correlation test**: Computes the correlation between :math:`U_t` and :math:`U_{t+k}` for various lags :math:`k`. For independent samples, correlations should be approximately zero.

**Runs test**: Counts runs of consecutive increasing or decreasing values. Too few runs suggests correlation; too many suggests alternation.

Test Suites
~~~~~~~~~~~

Individual tests can miss problems that batteries of tests catch. Major test suites include:

**Diehard** (Marsaglia, 1995): A collection of 15 tests, once the standard for PRNG evaluation. Now considered somewhat dated.

**TestU01** (L'Ecuyer and Simard, 2007): Comprehensive suite with three batteries:
- SmallCrush: 10 tests, quick screening
- Crush: 96 tests, thorough evaluation
- BigCrush: 160 tests, the gold standard

**PractRand** (Doty-Humphrey): Extremely stringent, running tests at exponentially increasing sample sizes until failure.

Modern generators like PCG64 pass all tests in these suites. Legacy generators like RANDU fail catastrophically.

Comprehensive Diagnostic Visualization
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   import numpy as np
   import matplotlib.pyplot as plt
   from scipy import stats

   def comprehensive_prng_diagnostics(rng, n_samples=100_000, generator_name="Generator"):
       """
       Create comprehensive diagnostic plots for a PRNG.

       Parameters
       ----------
       rng : Generator
           NumPy random generator object.
       n_samples : int
           Number of samples to generate.
       generator_name : str
           Name for plot titles.

       Returns
       -------
       fig : Figure
           Matplotlib figure with diagnostic plots.
       """
       samples = rng.random(n_samples)

       fig, axes = plt.subplots(2, 3, figsize=(15, 10))
       fig.suptitle(f'PRNG Diagnostics: {generator_name}', fontsize=14)

       # 1. Histogram (uniformity)
       ax = axes[0, 0]
       ax.hist(samples, bins=50, density=True, alpha=0.7, edgecolor='black')
       ax.axhline(1, color='red', linestyle='--', linewidth=2, label='Uniform')
       ax.set_xlabel('Value')
       ax.set_ylabel('Density')
       ax.set_title('Histogram (should be flat)')
       ax.legend()

       # 2. Lag-1 scatter plot (independence)
       ax = axes[0, 1]
       ax.scatter(samples[:-1], samples[1:], alpha=0.1, s=1)
       ax.set_xlabel('$U_n$')
       ax.set_ylabel('$U_{n+1}$')
       ax.set_title('Lag-1 Plot (should fill square)')
       ax.set_aspect('equal')

       # 3. Running mean (stationarity)
       ax = axes[0, 2]
       running_mean = np.cumsum(samples) / np.arange(1, n_samples + 1)
       ax.plot(running_mean, linewidth=0.5)
       ax.axhline(0.5, color='red', linestyle='--', linewidth=2)
       ax.set_xlabel('Sample')
       ax.set_ylabel('Running Mean')
       ax.set_title('Running Mean → 0.5')
       ax.set_xscale('log')

       # 4. Autocorrelation
       ax = axes[1, 0]
       max_lag = 50
       autocorr = [np.corrcoef(samples[:-k], samples[k:])[0, 1]
                   for k in range(1, max_lag + 1)]
       ax.bar(range(1, max_lag + 1), autocorr, color='steelblue')
       ax.axhline(1.96/np.sqrt(n_samples), color='red', linestyle='--')
       ax.axhline(-1.96/np.sqrt(n_samples), color='red', linestyle='--')
       ax.set_xlabel('Lag')
       ax.set_ylabel('Autocorrelation')
       ax.set_title('Autocorrelation (should be ~0)')

       # 5. P-P plot
       ax = axes[1, 1]
       sorted_samples = np.sort(samples)
       theoretical = np.linspace(0, 1, n_samples)
       empirical = np.arange(1, n_samples + 1) / n_samples
       ax.plot(theoretical, empirical, 'b-', linewidth=1)
       ax.plot([0, 1], [0, 1], 'r--', linewidth=2)
       ax.set_xlabel('Theoretical Quantiles')
       ax.set_ylabel('Empirical Quantiles')
       ax.set_title('P-P Plot (should follow diagonal)')

       # 6. Distribution of gaps
       ax = axes[1, 2]
       gaps = np.diff(np.sort(samples))
       # Gaps between sorted uniforms ~ Exponential(n)
       ax.hist(gaps * n_samples, bins=50, density=True, alpha=0.7, edgecolor='black')
       x = np.linspace(0, 5, 100)
       ax.plot(x, np.exp(-x), 'r-', linewidth=2, label='Exp(1)')
       ax.set_xlabel('Normalized Gap')
       ax.set_ylabel('Density')
       ax.set_title('Gap Distribution')
       ax.legend()

       plt.tight_layout()
       return fig

   # Generate diagnostics
   rng = np.random.default_rng(42)
   fig = comprehensive_prng_diagnostics(rng, n_samples=100_000, generator_name="PCG64")
   plt.show()


Practical Considerations
------------------------

With the theory established, we turn to practical guidance for using random number generators in statistical computing.

Seeds and Reproducibility
~~~~~~~~~~~~~~~~~~~~~~~~~

A **seed** is the initial state of a PRNG. Given the same seed, a PRNG produces exactly the same sequence. This determinism is essential for scientific reproducibility.

**Best practices**:

1. **Always set seeds explicitly**: Don't rely on arbitrary initialization.

   .. code-block:: python

      # Good: explicit seed
      rng = np.random.default_rng(42)

      # Bad: implicit/random initialization
      rng = np.random.default_rng()  # Different each run

2. **Document seeds**: Record the seed used in any analysis.

3. **Use descriptive seeds**: Consider using meaningful numbers (date codes, experiment IDs) that aid documentation.

4. **Separate concerns**: Use different seeds for different parts of a simulation.

Parallel Computing
~~~~~~~~~~~~~~~~~~

Parallel Monte Carlo requires independent random streams for each worker. Using the same seed for all workers produces identical sequences—useless. Using sequential seeds (1, 2, 3, ...) is slightly better but can produce correlated streams with some generators.

The correct approach uses **SeedSequence** to derive independent streams:

.. code-block:: python

   from numpy.random import default_rng, SeedSequence

   def create_parallel_generators(master_seed, n_workers):
       """
       Create independent generators for parallel workers.

       Parameters
       ----------
       master_seed : int
           Master seed for reproducibility.
       n_workers : int
           Number of parallel workers.

       Returns
       -------
       list
           List of independent Generator objects.
       """
       # Create master SeedSequence
       ss = SeedSequence(master_seed)

       # Spawn independent child sequences
       child_seeds = ss.spawn(n_workers)

       # Create generators from child seeds
       return [default_rng(s) for s in child_seeds]

   # Example: 8 workers
   generators = create_parallel_generators(master_seed=42, n_workers=8)

   # Each worker uses its own generator
   for i, rng in enumerate(generators):
       sample = rng.random(3)
       print(f"Worker {i}: {sample.round(4)}")

**PCG64's jumpability** offers an alternative: start all workers from the same seed, but "jump" each worker's generator ahead by a different amount. Since PCG64 can efficiently skip :math:`2^{64}` steps, workers can be spaced far apart in the sequence without computing intervening values.

When Default Generators Suffice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For nearly all statistical applications—Monte Carlo integration, bootstrapping, MCMC, cross-validation, neural network training—NumPy's default PCG64 is excellent. Its :math:`2^{128}` period exceeds any practical simulation, and it passes all standard statistical tests.

**You can trust the default when**:

- Running standard statistical analyses
- Simulation lengths are below :math:`10^{15}` (far beyond typical)
- Results don't require cryptographic security
- Using single-threaded or properly parallelized code

When to Use Specialized Approaches
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Cryptographic applications**: PRNGs are predictable given enough output. For security (encryption keys, authentication tokens), use Python's ``secrets`` module or hardware random number generators.

.. code-block:: python

   import secrets

   # Cryptographically secure random integer
   secure_int = secrets.randbelow(1000)

   # Secure random bytes
   secure_bytes = secrets.token_bytes(32)

   # Secure random URL-safe string
   secure_token = secrets.token_urlsafe(32)

**Hardware random number generators**: For true randomness (not pseudo-random), some applications use physical phenomena:
- Thermal noise in resistors
- Radioactive decay timing
- Quantum vacuum fluctuations

Linux provides ``/dev/random`` and ``/dev/urandom`` interfaces to hardware entropy sources.

**Quasi-random sequences**: For some integration problems, **quasi-random** (low-discrepancy) sequences like Sobol or Halton outperform pseudo-random sequences. These are deterministic sequences designed to fill space uniformly, achieving :math:`O(n^{-1} \log^d n)` convergence vs Monte Carlo's :math:`O(n^{-1/2})`.

.. admonition:: Common Pitfall ⚠️
   :class: warning

   **Sharing generators across threads**: Never let multiple threads access the same Generator without synchronization. Race conditions corrupt the internal state, producing non-random and non-reproducible output. Always create one generator per thread using ``SeedSequence.spawn()``.


Bringing It All Together
------------------------

Uniform random variates are the foundation of computational randomness. Every random sample, every Monte Carlo estimate, every stochastic simulation ultimately depends on sequences of uniform values that behave as if they were independent draws from :math:`[0, 1)`.

The generation of such sequences is both theoretically subtle and practically well-solved. Theoretical subtlety comes from the paradox of computational randomness—deterministic algorithms cannot produce true randomness—and from the many ways generators can fail (chaotic maps, LCG lattice structure, RANDU's planes). Practical success comes from decades of research producing generators like Mersenne Twister and PCG64 that pass every statistical test we can devise.

The key insights from this section:

1. **The Probability Integral Transform** establishes uniform variates as the universal source: transform uniforms via :math:`F^{-1}` to generate any distribution.

2. **Pseudo-randomness works** despite its logical impossibility, because statistical randomness—passing all practical tests—suffices for simulation.

3. **Generator design matters**: Naive approaches (chaotic maps, simple LCGs) fail in ways that can corrupt research. Always use established generators.

4. **Seeds enable reproducibility**: Set them explicitly, document them, and use ``SeedSequence.spawn()`` for parallel work.

Transition to What Follows
--------------------------

With a supply of high-quality uniform variates in hand, we are ready for the central technique of random variable generation: the **inverse CDF method**. The next section (:ref:`ch2.3-inverse-cdf-method`) shows how to transform uniform variates into samples from any distribution whose quantile function we can compute.

For distributions with closed-form inverse CDFs—exponential, Weibull, Cauchy, Pareto—the method is elegant and efficient. For discrete distributions, we will develop efficient algorithms (binary search, the alias method) to locate the correct outcome quickly. And for distributions without tractable inverse CDFs—most notably the normal—we will see specialized alternatives in subsequent sections.

The uniform variates we now know how to generate are the raw material; the inverse CDF method is the machinery that shapes them into any distributional form we need.


.. admonition:: Key Takeaways 📝
   :class: important

   1. **Core concept**: Uniform variates are the universal source of randomness via the Probability Integral Transform. Any distribution can be generated by applying the inverse CDF to uniform samples.

   2. **Paradox resolved**: Computers generate pseudo-random numbers—deterministic sequences that pass statistical tests for randomness. This suffices for virtually all statistical applications.

   3. **Historical lessons**: Chaotic dynamical systems fail as PRNGs (correlation, degeneration). Linear congruential generators produce lattice structure in high dimensions. RANDU's 15 planes corrupted research for years.

   4. **Modern solutions**: Mersenne Twister (period :math:`2^{19937}-1`, 623-dim equidistribution) and PCG64 (period :math:`2^{128}`, small state, jumpable) pass all standard tests. Use NumPy's defaults confidently.

   5. **Practical wisdom**: Set seeds for reproducibility. Use ``SeedSequence.spawn()`` for parallel computing. Reserve ``secrets`` for cryptographic needs.

   6. **Outcome alignment**: Understanding uniform generation (Learning Outcome 1) provides the foundation for all subsequent simulation techniques—inverse CDF, rejection sampling, MCMC—that transform uniform variates into complex distributions.