.. _homework:

====================
Homework Assignments
====================

.. contents:: Contents
   :local:
   :depth: 1

Overview
========

Homework assignments in STAT 418 are designed to reinforce theoretical understanding through rigorous mathematical derivation and build computational fluency through Python implementation. Each assignment is worth **100 points** and integrates material from the corresponding textbook chapters.

Assignments typically involve a combination of:

* **Hand derivations**: Proving distributional identities, deriving estimator properties, computing moments and MGFs
* **Code implementations**: Sampling algorithms, Monte Carlo estimators, statistical tests
* **Graphing and visualization**: Distribution comparisons, convergence diagnostics, Q-Q plots, histograms
* **Written explanations**: Interpreting results, comparing methods, explaining why certain phenomena occur

Assignment Policies
-------------------

* **Weighting**: Homework constitutes 40% of the final grade
* **Points**: Each assignment is worth 100 points
* **Submissions**: 6–7 assignments throughout the semester; lowest score dropped
* **Cadence**: Approximately 2-week cycles
* **Late Policy**: Up to 3 days late with a 20% penalty; no credit after 3 days
* **Collaboration**: Discussion encouraged; all submitted work must be your own
* **AI Tools**: Permitted for debugging, study, converting handwritten work to LaTeX/Word, and resource discovery; prohibited for generating solutions directly; all AI assistance must be disclosed

Recommended Workflow
--------------------

I strongly encourage the following workflow for completing assignments:

.. admonition:: Step-by-Step Approach
   :class: tip

   **1. Do All Hand Derivations on Paper or iPad First**

   Work through all mathematical derivations using pencil and paper or a tablet before touching a keyboard. This forces you to think through each step carefully and builds mathematical intuition that typing directly into LaTeX does not provide.
   There's no substitute for working by hand first.

   **2. Use an LLM to Convert Your Handwritten Work to LaTeX or Word**

   Once your derivations are complete and you're confident they're correct, use an LLM like ChatGPT or Claude to convert your handwritten work into a nicely formatted document. Simply photograph your work and ask the LLM to typeset it in LaTeX or format it for Word. Review the output carefully—LLMs occasionally misread handwriting or introduce transcription errors.

   **3. Add Images and Additional Explanations**

   Insert any figures generated by your Python code into the appropriate locations in your document. Add written explanations, interpretations of results, and responses to conceptual questions. Make sure figures have descriptive captions.

   **4. Submit Your Python Code as a Separate File**

   Your computational work should be in a standalone ``.py`` or ``.ipynb`` file that runs independently and reproduces all results and figures in your PDF.

Submission Requirements
-----------------------

Each homework submission consists of **two separate files**:

1. **Written Work** (PDF): A single PDF containing your derivations, explanations, embedded figures, and responses to all problems. This can be created using:

   - LaTeX (recommended for complex math)
   - Microsoft Word or Google Docs with equation editor
   - Handwritten work converted via LLM to LaTeX/Word
   - Any combination of the above

   Upload to Gradescope and match pages to problems.

2. **Python Code** (separate file): A single ``.py`` or ``.ipynb`` file containing all your computational work. Your code should:

   - Run without errors from top to bottom
   - Be organized by problem number with clear comments
   - Use explicit random seeds for reproducibility
   - Generate all figures that appear in your PDF

.. admonition:: Reproducibility Requirement
   :class: important

   All stochastic code must use explicit seeds (e.g., ``np.random.default_rng(42)``) so that results can be exactly reproduced.
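A minimal sketch of this seeding pattern, assuming NumPy (the specific seed and distribution here are illustrative):

.. code-block:: python

   import numpy as np

   rng = np.random.default_rng(42)        # one explicit seed per script
   draws = rng.normal(size=10_000)

   # Re-creating the generator with the same seed reproduces every draw
   rng_again = np.random.default_rng(42)
   assert np.array_equal(draws, rng_again.normal(size=10_000))

Passing the ``rng`` object into every function that samples, rather than calling the legacy ``np.random.*`` globals, keeps a single seed in control of the entire script.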
When we run your code, it should generate the same figures and numerical results shown in your PDF.

Assignments by Chapter
======================

Part I: Foundations
-------------------

.. toctree::
   :maxdepth: 1
   :caption: Foundations & Simulation

   hw1_distributional_relationships

**Homework 1: Distributional Relationships and Computational Foundations**

Covers Chapters 1.1–1.3. Students prove distributional identities using MGFs and transformation methods, then verify results computationally. Part III explores the CLT empirically, quantifies Monte Carlo error, and demonstrates proper random number generation practices.

*Key topics*: MGF multiplication property, Jacobian transformations, CLT convergence rates, Monte Carlo standard error, reproducibility and parallel streams.

.. Future assignments - uncomment as they are created

Part II: Simulation & Resampling
--------------------------------

.. toctree::
   :maxdepth: 1
   :caption: Simulation & Resampling

   hw2_monte_carlo_methods
   hw3_bootstrap_resampling

**Homework 2: Monte Carlo Methods**

Covers Chapter 2. Students implement inverse CDF sampling, rejection sampling, and variance reduction techniques.

**Homework 3: Bootstrap and Resampling**

Covers Chapters 3–4. Students implement nonparametric and parametric bootstrap, construct confidence intervals, and apply cross-validation.

Part III: Bayesian Methods
--------------------------

.. toctree::
   :maxdepth: 1
   :caption: Bayesian Methods

   hw4_bayesian_inference
   hw5_mcmc_methods

**Homework 4: Bayesian Inference**

Covers Chapters 5.1–5.4. Students derive posteriors for conjugate families, implement numerical integration, and compare credible vs. confidence intervals.

**Homework 5: MCMC Methods**

Covers Chapters 5.5–5.8. Students implement Metropolis-Hastings and Gibbs sampling, diagnose convergence, and fit hierarchical models.

Part IV: Modern Methods
-----------------------
.. toctree::
   :maxdepth: 1
   :caption: Modern Methods

   hw6_llm_integration

**Homework 6: LLMs in Data Science Workflows**

Covers Chapters 6–7. Students integrate LLMs for text preprocessing, feature extraction, and responsible AI practices.

Tips for Success
================

Mathematical Derivations
------------------------

* **Always start on paper**: I cannot emphasize this enough—do your derivations by hand first. The physical act of writing helps you think through each step carefully and catch errors early.
* **Start early**: Proofs often require multiple attempts and fresh perspectives; don't leave them for the night before
* **Define notation**: Clearly state what each symbol represents before using it
* **Justify each step**: Cite theorems, state assumptions, explain non-obvious equalities
* **Check special cases**: Verify your result gives sensible answers for known parameter values (e.g., does your MGF equal 1 when ``t = 0``?)
* **Work forwards and backwards**: Sometimes it helps to start from the desired result and work backwards
* **Use LLMs for typesetting only**: Have the LLM convert your completed handwritten work to LaTeX—don't ask it to solve the problem for you

Computational Verification
--------------------------

* **Read tips carefully**: Each problem includes specific guidance on SciPy parameterizations and visualization approaches
* **Use the provided utilities**: The ``compare_distributions()`` function handles common visualization patterns
* **Start with small samples**: Debug with n = 100 before scaling to n = 100,000
* **Print intermediate results**: Verify each step before combining into larger computations
* **Compare to theory**: Always check that sample means/variances match theoretical values
* **Save your figures**: Use ``plt.savefig('problem1.png', dpi=150)`` to save figures for your PDF

Common Pitfalls
---------------

.. admonition:: Watch Out For
   :class: warning

   * **SciPy parameterizations**: Different conventions for scale vs.
     rate, shape parameters, etc.
   * **Off-by-one errors**: Geometric and Negative Binomial have multiple conventions
   * **Numerical precision**: Use ``np.log1p(x)`` instead of ``np.log(1 + x)`` for small x
   * **Random state**: Forgetting to set seeds makes debugging nearly impossible
   * **Vectorization**: Loops over samples are 50–100× slower than vectorized operations
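A short sketch of the first three pitfalls, assuming SciPy is available (the specific distributions and values are illustrative):

.. code-block:: python

   import numpy as np
   from scipy import stats

   # Pitfall 1: SciPy's Exponential takes scale = 1/rate, not the rate itself.
   lam = 2.0                                # rate parameter
   dist = stats.expon(scale=1 / lam)        # NOT stats.expon(lam)
   assert np.isclose(dist.mean(), 1 / lam)  # E[X] = 1/lambda

   # Pitfall 2: SciPy's Geometric counts trials (support starts at 1),
   # while some textbooks count failures (support starts at 0).
   p = 0.25
   geom = stats.geom(p)
   assert np.isclose(geom.mean(), 1 / p)    # trial convention: E[X] = 1/p
   assert geom.pmf(0) == 0.0                # k = 0 is outside the support

   # Pitfall 3: log1p stays accurate where log(1 + x) loses precision.
   x = 1e-16
   print(np.log(1 + x))    # 1 + x rounds to exactly 1, so this is 0.0
   print(np.log1p(x))      # correct to full precision

Checking a distribution's ``.mean()`` and ``.var()`` against the textbook formulas, as in the first two blocks, is a quick way to confirm you have the parameterization right before running a full simulation.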