Course Datasets

Primary Course Dataset - AppRating

The AppRating dataset is used in every computer assignment. It contains app ratings and related metrics that students analyze with different statistical techniques as the course progresses.

SPRING 2026 Session

Dataset: AppRatingSPRING2026.csv
Description: AppRatingDescription.pdf

WINTER 2025 Session

Dataset: AppRatingWINTER2025.csv
Description: AppRatingDescription.pdf

Important

The AppRating dataset is used in all five computer assignments. Download the appropriate version for your session at the beginning of the course and use it throughout.
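One way to keep every assignment script pointed at the same session file is to build the path in one place and fail early when the file is missing. The sketch below is illustrative only: the helper names `app_rating_path()` and `load_app_rating()` are hypothetical, and it assumes the file was saved under its original name in a `data/` folder.

```r
# Hypothetical helpers; they assume the session file keeps its original name
# (for example AppRatingSPRING2026.csv) and lives in a data/ folder.
app_rating_path <- function(session, data_dir = "data") {
  file.path(data_dir, paste0("AppRating", session, ".csv"))
}

load_app_rating <- function(session, data_dir = "data") {
  path <- app_rating_path(session, data_dir)
  # Stop with a clear message instead of a cryptic read.csv() error
  if (!file.exists(path)) {
    stop("AppRating file not found: ", path, call. = FALSE)
  }
  read.csv(path)
}

# Usage (uncomment after downloading your session's file):
# app <- load_app_rating("SPRING2026")
```

Using the same call at the top of each assignment script avoids accidentally mixing the two session versions.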

Tutorial Support Datasets

These datasets are available in the Computer Assignment Tutorials Data folder and are used for demonstrations and additional practice:

CSV Format Datasets

Bikedata_clean.csv - Cleaned bicycle data
DMS.csv - DMS measurements
eduproduct.csv - Educational product data
eg01-23time24.csv - Time series example
ex07-39mpgdiff.csv - MPG difference data
furnace.csv - Furnace efficiency data
helicon_cleaned.csv - Cleaned helicon measurements
helicon_m.csv - Helicon measurement data
linebackers.csv - Football linebacker statistics
loc.csv - Location data
movies.csv - Movie ratings and information
studyhabits.csv - Student study habits survey

Text Format Datasets

ANOVA paxil.txt - ANOVA example with Paxil data
linebackers.txt - Text version of linebacker data
singer1.txt - Singer height data

Loading Datasets in R

Loading CSV files:

# From local file (after downloading)
d <- read.csv("data/helicon_m.csv")

# Directly from URL
d <- read.csv("https://treese41528.github.io/STAT350/Computer_Assignment_Tutorials/Data/helicon_m.csv")
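The two approaches above can be combined: download a tutorial file once, then read the local copy on later runs. The following is a minimal sketch under stated assumptions. The helper name `fetch_tutorial_csv()` and its arguments are hypothetical, and the base URL is simply the folder portion of the URL shown above.

```r
# Folder portion of the URL used in the example above
base_url <- "https://treese41528.github.io/STAT350/Computer_Assignment_Tutorials/Data"

# Hypothetical helper: download file_name into data_dir only if it is not
# already there, then read the local copy.
fetch_tutorial_csv <- function(file_name, data_dir = "data") {
  dest <- file.path(data_dir, file_name)
  if (!file.exists(dest)) {
    dir.create(data_dir, recursive = TRUE, showWarnings = FALSE)
    # URLencode() replaces spaces in the file name with %20
    url <- paste0(base_url, "/", utils::URLencode(file_name))
    download.file(url, destfile = dest, mode = "wb")
  }
  read.csv(dest)
}

# Usage (needs an internet connection the first time):
# d <- fetch_tutorial_csv("helicon_m.csv")
```

Reading from the local copy keeps scripts working offline and guarantees that every run uses the same file.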

Loading text files:

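# Whitespace-separated text file (the default sep = "" splits on any run of spaces or tabs)
d <- read.table("data/linebackers.txt", header = TRUE)

# Tab-separated file: set sep = "\t" so values containing spaces stay in one field
d <- read.table("data/ANOVA paxil.txt", header = TRUE, sep = "\t")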

# From URL (note the %20 for space in filename)
d <- read.table("https://treese41528.github.io/STAT350/Computer_Assignment_Tutorials/Data/ANOVA%20paxil.txt",
                header = TRUE)
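Choosing the wrong separator can fail silently, so it may help to see the difference on a small file. The sketch below writes a made-up tab-separated file to a temporary location (the values are invented for illustration and are not from any course dataset) and compares the two ways of reading it.

```r
# Write a tiny tab-separated file whose first column contains a space.
# The values are invented for this illustration.
tmp <- tempfile(fileext = ".txt")
writeLines(c("group\tscore", "low dose\t12.5", "high dose\t15.0"), tmp)

# Count the fields on each line under each separator. Unequal counts across
# lines suggest that the separator does not match the file.
count.fields(tmp, sep = "\t")
count.fields(tmp)

# With sep = "\t", "low dose" stays together in the group column.
d_tab <- read.table(tmp, header = TRUE, sep = "\t")
print(d_tab)

# With the default separator, "low dose" is split on the space and the
# columns no longer line up with the header, without any error message.
d_ws <- read.table(tmp, header = TRUE)
print(d_ws)
```

Running `count.fields()` on an unfamiliar text file before `read.table()` is a quick way to decide whether `sep = "\t"` is needed.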

Built-in R Datasets Used in Course

The course also utilizes several built-in R datasets for examples and demonstrations:

Primary Built-in Datasets

iris - Fisher’s iris flower measurements (150 obs, 5 variables)
mtcars - Motor Trend car statistics (32 cars, 11 variables)
sleep - Student’s sleep data on the effect of two soporific drugs, used for paired t-tests (20 obs, 3 variables)
CO2 - Carbon dioxide uptake in grass plants (84 obs, 5 variables)
AirPassengers - Monthly international airline passenger totals, 1949-1960 (time series, 144 obs)

Additional Built-in Datasets for Practice

chickwts - Chick weights by feed type (ANOVA examples)
PlantGrowth - Plant growth under different treatments
InsectSprays - Effectiveness of insect sprays
ToothGrowth - Tooth growth in guinea pigs
faithful - Old Faithful geyser eruption data

Loading Built-in Datasets

# Load a specific dataset
data(iris)

# View available datasets
data()

# Get help on a dataset
?iris

# View structure
str(iris)
head(iris)
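As a quick check, the sizes listed for the primary built-in datasets can be confirmed directly in R. This is a small sketch; the object names `datasets_to_check` and `dims` are arbitrary.

```r
# Collect the data-frame datasets in a named list and tabulate their dimensions
datasets_to_check <- list(iris = iris, mtcars = mtcars, sleep = sleep, CO2 = CO2)
dims <- t(sapply(datasets_to_check, dim))
colnames(dims) <- c("obs", "variables")
print(dims)

# AirPassengers is a time series (ts) object rather than a data frame,
# so inspect it with ts-specific functions instead of dim()
class(AirPassengers)
length(AirPassengers)     # number of monthly observations
frequency(AirPassengers)  # observations per year
```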

Data Download and Organization

Recommended Folder Structure:

STAT350_Project/
├── data/
│   ├── AppRatingSPRING2026.csv  # Your session's AppRating file
│   ├── helicon_m.csv            # Tutorial datasets
│   └── ...other datasets
├── scripts/
│   ├── CA1.R
│   ├── CA2.R
│   └── ...
└── output/
    ├── figures/
    └── tables/

Download Instructions:

Create project structure: Set up the folders as shown above
Download AppRating: Save your session’s version to the data/ folder
Download tutorial data: Save tutorial datasets to the data/ folder as needed for each assignment
Set working directory: Open the folder as an RStudio Project, or call setwd() with the path to your project folder
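The folder layout can also be created from R instead of by hand. The sketch below is one possible approach: `create_project()` is a hypothetical helper, and the example builds the structure under a temporary directory so it is safe to run as is. Replace `root` with your own location.

```r
# Hypothetical helper: create the recommended subfolders under root.
# recursive = TRUE also creates root and output/ when they do not exist yet.
create_project <- function(root) {
  subdirs <- c("data", "scripts",
               file.path("output", "figures"),
               file.path("output", "tables"))
  paths <- file.path(root, subdirs)
  for (p in paths) {
    dir.create(p, recursive = TRUE, showWarnings = FALSE)
  }
  invisible(paths)
}

# Demonstration in a temporary directory; use a real path for your project
root <- file.path(tempdir(), "STAT350_Project")
create_project(root)
list.dirs(root, full.names = FALSE)
```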

Verification After Loading:

Run these checks each time you load a dataset:

# Check structure
str(d)

# Check dimensions
dim(d)

# Count missing values across all columns
sum(is.na(d))

# Summary statistics
summary(d)

# First/last few rows
head(d)
tail(d)
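The total from sum(is.na(d)) does not show where the missing values are. A short sketch of a per-column breakdown follows. The data frame here is made up for illustration, and its column names are not the AppRating variable names.

```r
# Made-up example data; the column names are illustrative only
d <- data.frame(
  rating   = c(4.5, NA, 3.8, 4.1),
  installs = c(1000, 2500, NA, NA),
  category = c("Game", "Tool", NA, "Game")
)

# Missing values per column
na_per_column <- colSums(is.na(d))
print(na_per_column)

# Number of rows with no missing values in any column
complete_rows <- sum(complete.cases(d))
print(complete_rows)
```

Columns with many missing values are worth checking against the dataset description before running any analysis.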