.. _r_datasets:

Course Datasets
-------------------------------------------------

Primary Course Dataset - AppRating
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The **AppRating dataset** is the central dataset for all computer assignments. It contains app ratings and related metrics, which you will analyze with different statistical techniques as the course progresses.

**Fall 2025 Session**

- Dataset: `AppRatingFALL2025.csv `_
- Description: `AppRatingDescription.pdf `_

**Winter 2025 Session**

- Dataset: `AppRatingWINTER2025.csv `_
- Description: `AppRatingDescription.pdf `_

.. important::

   The AppRating dataset is used in **all six computer assignments**. Download the version for your session at the beginning of the course and use it throughout.

Tutorial Support Datasets
~~~~~~~~~~~~~~~~~~~~~~~~~~

These datasets are in the Computer Assignment Tutorials Data folder and are used for demonstrations and additional practice.

**CSV Format Datasets**

- `Bikedata_clean.csv `_ - Cleaned bicycle data
- `DMS.csv `_ - DMS measurements
- `eduproduct.csv `_ - Educational product data
- `eg01-23time24.csv `_ - Time series example
- `ex07-39mpgdiff.csv `_ - MPG difference data
- `furnace.csv `_ - Furnace efficiency data
- `helicon_cleaned.csv `_ - Cleaned helicon measurements
- `helicon_m.csv `_ - Helicon measurement data
- `linebackers.csv `_ - Football linebacker statistics
- `loc.csv `_ - Location data
- `movies.csv `_ - Movie ratings and information
- `studyhabits.csv `_ - Student study habits survey

**Text Format Datasets**

- `ANOVA paxil.txt `_ - ANOVA example with Paxil data
- `linebackers.txt `_ - Text version of the linebacker data
- `singer1.txt `_ - Singer height data

Loading Datasets in R
~~~~~~~~~~~~~~~~~~~~~~

**Loading CSV files:**
.. code-block:: r

   # From a local file (after downloading it into data/)
   d <- read.csv("data/helicon_m.csv")

   # Directly from the course URL
   d <- read.csv("https://treese41528.github.io/STAT350/Computer_Assignment_Tutorials/Data/helicon_m.csv")

**Loading text files:**

.. code-block:: r

   # Whitespace-separated text file (the default sep = "" splits on spaces or tabs)
   d <- read.table("data/linebackers.txt", header = TRUE)

   # Or, if the file is tab-separated (sep = "\t" keeps fields containing spaces intact)
   d <- read.table("data/ANOVA paxil.txt", header = TRUE, sep = "\t")

   # From a URL (%20 encodes the space in the filename)
   d <- read.table("https://treese41528.github.io/STAT350/Computer_Assignment_Tutorials/Data/ANOVA%20paxil.txt",
                   header = TRUE)

Built-in R Datasets Used in Course
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The course also uses several built-in R datasets for examples and demonstrations.

**Primary Built-in Datasets**

- ``iris`` - Fisher's iris flower measurements (150 obs, 5 variables)
- ``mtcars`` - Motor Trend car statistics (32 cars, 11 variables)
- ``sleep`` - Student's sleep data: extra hours of sleep for 10 patients under two drugs, used for paired t-tests (20 obs, 3 variables)
- ``CO2`` - Carbon dioxide uptake in grass plants (84 obs, 5 variables)
- ``AirPassengers`` - Monthly international airline passenger totals, 1949-1960 (time series)

**Additional Built-in Datasets for Practice**

- ``chickwts`` - Chicken weights by feed type (ANOVA examples)
- ``PlantGrowth`` - Plant growth under a control and two treatments
- ``InsectSprays`` - Effectiveness of insect sprays
- ``ToothGrowth`` - Tooth growth in guinea pigs
- ``faithful`` - Old Faithful geyser eruption data

**Loading Built-in Datasets**

.. code-block:: r

   # Copy a dataset into your workspace (optional: datasets in the
   # datasets package are lazy-loaded, so iris works without this call)
   data(iris)

   # List the datasets available in attached packages
   data()

   # Open the help page for a dataset
   ?iris

   # Inspect the structure and the first rows
   str(iris)
   head(iris)

Data Download and Organization
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Recommended Folder Structure:**

.. code-block:: text

   STAT350_Project/
   ├── data/
   │   ├── AppRating.csv      # Your session's AppRating file
   │   ├── helicon_m.csv      # Tutorial datasets
   │   └── ...other datasets
   ├── scripts/
   │   ├── CA1.R
   │   ├── CA2.R
   │   └── ...
   └── output/
       ├── figures/
       └── tables/

**Download Instructions:**

1. **Create the project structure:** Set up the folders shown above.
2. **Download AppRating:** Save your session's version to the ``data/`` folder.
3. **Download tutorial data:** Save tutorial datasets as needed for each assignment.
4. **Set the working directory:** Use an RStudio Project, or call ``setwd()`` with the path to your project folder.

**Verification After Loading:**

Always verify your data after loading:

.. code-block:: r

   # Check structure (variable names and types)
   str(d)

   # Check dimensions (rows, columns)
   dim(d)

   # Count missing values across the whole data frame
   sum(is.na(d))

   # Summary statistics for each column
   summary(d)

   # First and last few rows
   head(d)
   tail(d)
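To avoid retyping these checks at the top of every assignment script, you can wrap them in a small function. The sketch below is an illustration, not part of the course materials: the name ``check_data`` and its ``n_show`` argument are hypothetical. It also reports missing values per column, and it is demonstrated on the built-in ``airquality`` data frame because that dataset contains missing values.

.. code-block:: r

   # Hypothetical helper (not provided by the course): runs the standard
   # post-load checks and returns key facts invisibly for later use.
   check_data <- function(d, n_show = 6) {
     stopifnot(is.data.frame(d))

     cat("Dimensions:", nrow(d), "rows x", ncol(d), "columns\n\n")
     str(d)

     # Missing values per column; list only the columns that have any
     na_by_col <- colSums(is.na(d))
     cat("\nTotal missing values:", sum(na_by_col), "\n")
     if (any(na_by_col > 0)) {
       print(na_by_col[na_by_col > 0])
     }

     cat("\nSummary statistics:\n")
     print(summary(d))

     cat("\nFirst rows:\n")
     print(head(d, n_show))
     cat("\nLast rows:\n")
     print(tail(d, n_show))

     invisible(list(dim = dim(d),
                    na_total = sum(na_by_col),
                    na_by_col = na_by_col))
   }

   # Usage on a built-in dataset that contains missing values
   res <- check_data(airquality)

After loading your own file, the same call is simply ``check_data(d)``. Returning the results invisibly keeps the console output readable while still letting a script test, for example, ``res$na_total > 0`` before deciding how to handle missing values.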