Course Datasets
Primary Course Dataset - AppRating
The AppRating dataset is the central dataset for the computer assignments. It contains app ratings and related metrics, which students analyze with different statistical techniques as the course progresses.
Fall 2025 Session
Dataset: AppRatingFALL2025.csv
Description: AppRatingDescription.pdf
Winter 2025 Session
Dataset: AppRatingWINTER2025.csv
Description: AppRatingDescription.pdf
Important
The AppRating dataset is used in all six computer assignments. Download the appropriate version for your session at the beginning of the course and use it throughout.
Tutorial Support Datasets
These datasets are available in the Computer Assignment Tutorials Data folder and are used for demonstrations and additional practice:
CSV Format Datasets
Bikedata_clean.csv - Cleaned bicycle data
DMS.csv - DMS measurements
eduproduct.csv - Educational product data
eg01-23time24.csv - Time series example
ex07-39mpgdiff.csv - MPG difference data
furnace.csv - Furnace efficiency data
helicon_cleaned.csv - Cleaned helicon measurements
helicon_m.csv - Helicon measurement data
linebackers.csv - Football linebacker statistics
loc.csv - Location data
movies.csv - Movie ratings and information
studyhabits.csv - Student study habits survey
Text Format Datasets
ANOVA paxil.txt - ANOVA example with Paxil data
linebackers.txt - Text version of linebacker data
singer1.txt - Singer height data
Loading Datasets in R
Loading CSV files:
# From local file (after downloading)
d <- read.csv("data/helicon_m.csv")
# Directly from URL
d <- read.csv("https://treese41528.github.io/STAT350/Computer_Assignment_Tutorials/Data/helicon_m.csv")
Loading text files:
# Space-separated text file
d <- read.table("data/linebackers.txt", header = TRUE)
# Or if tab-separated
d <- read.table("data/ANOVA paxil.txt", header = TRUE, sep = "\t")
# From URL (note the %20 for space in filename)
d <- read.table("https://treese41528.github.io/STAT350/Computer_Assignment_Tutorials/Data/ANOVA%20paxil.txt",
                header = TRUE)
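When it is unclear whether a text file is space- or tab-separated, one option is to look at the first few raw lines before choosing `sep`. The sketch below is only an illustration: it writes a small, made-up tab-separated file to a temporary location (the file contents and column names are hypothetical, not course data) and then picks the separator from what it sees.

```r
# Hypothetical two-column file, written to a temp location for illustration only
path <- tempfile(fileext = ".txt")
writeLines(c("group\tscore", "A\t1.5", "B\t2.0"), path)

# Peek at the raw text before parsing
first_lines <- readLines(path, n = 2)
print(first_lines)

# Use a tab separator if a tab is present; otherwise fall back to
# read.table's default (sep = ""), which splits on any whitespace
sep <- if (any(grepl("\t", first_lines, fixed = TRUE))) "\t" else ""
d <- read.table(path, header = TRUE, sep = sep)
str(d)
```

For a real file, replace `path` with the downloaded file's location, for example `"data/linebackers.txt"`.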
Built-in R Datasets Used in Course
The course also utilizes several built-in R datasets for examples and demonstrations:
Primary Built-in Datasets
iris - Fisher’s iris flower measurements (150 obs, 5 variables)
mtcars - Motor Trend car statistics (32 cars, 11 variables)
sleep - Student’s sleep data, used for paired t-tests (20 obs, 3 variables)
CO2 - Carbon dioxide uptake in grass plants (84 obs, 5 variables)
AirPassengers - Monthly airline passenger numbers (time series)
Additional Built-in Datasets for Practice
chickwts - Chicken weights by feed type (ANOVA examples)
PlantGrowth - Plant growth under different treatments
InsectSprays - Effectiveness of insect sprays
ToothGrowth - Tooth growth in guinea pigs
faithful - Old Faithful geyser eruption data
Loading Built-in Datasets
# Load a specific dataset
data(iris)
# View available datasets
data()
# Get help on a dataset
?iris
# View structure
str(iris)
head(iris)
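The sizes quoted for the primary built-in datasets can be confirmed with `dim()`. The following is a minimal sketch; the variable names `datasets` and `sizes` are chosen here for illustration.

```r
# Collect the dimensions of the four data-frame datasets
datasets <- list(iris = iris, mtcars = mtcars, sleep = sleep, CO2 = CO2)
sizes <- t(sapply(datasets, dim))
colnames(sizes) <- c("rows", "columns")
print(sizes)

# AirPassengers is a time series (class "ts"), not a data frame,
# so inspect it with class() and frequency() instead of dim()
class(AirPassengers)
frequency(AirPassengers)   # observations per year
```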
Data Download and Organization
Recommended Folder Structure:
STAT350_Project/
├── data/
│   ├── AppRating.csv        # Your main dataset
│   ├── helicon_m.csv        # Tutorial datasets
│   └── ...other datasets
├── scripts/
│   ├── CA1.R
│   ├── CA2.R
│   └── ...
└── output/
    ├── figures/
    └── tables/
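The recommended layout can also be created from R rather than by hand. This is a sketch under assumptions: `create_project()` is a hypothetical helper name, and it builds the folders under whatever root directory it is given (the current working directory by default).

```r
# Hypothetical helper: create the recommended folder layout under `root`
create_project <- function(root = ".", project = "STAT350_Project") {
  folders <- file.path(
    root, project,
    c("data", "scripts", file.path("output", "figures"), file.path("output", "tables"))
  )
  for (f in folders) {
    # recursive = TRUE also creates missing parent folders;
    # showWarnings = FALSE keeps reruns quiet if a folder already exists
    dir.create(f, recursive = TRUE, showWarnings = FALSE)
  }
  invisible(folders)
}
```

Calling `create_project()` once from the directory where the project should live is enough; rerunning it does not overwrite existing files.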
Download Instructions:
Create project structure: Set up folders as shown above
Download AppRating: Save your session’s version to the data/ folder
Download tutorial data: Save tutorial datasets as needed for each assignment
Set working directory: Use RStudio Projects, or call setwd() with the path to your project folder
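Tutorial datasets can also be saved into `data/` from R. The sketch below assumes every tutorial file sits under the same base URL as the `helicon_m.csv` and `ANOVA paxil.txt` examples shown earlier; `tutorial_url()` and `download_tutorial()` are hypothetical helper names, not part of the course materials.

```r
# Base URL taken from the loading examples; assumed to hold all tutorial files
base_url <- "https://treese41528.github.io/STAT350/Computer_Assignment_Tutorials/Data"

# Build the download URL, encoding spaces in the filename as %20
tutorial_url <- function(filename) {
  paste(base_url, utils::URLencode(filename), sep = "/")
}

# Download one tutorial file into dest_dir (created if missing)
download_tutorial <- function(filename, dest_dir = "data") {
  dir.create(dest_dir, recursive = TRUE, showWarnings = FALSE)
  dest <- file.path(dest_dir, filename)
  utils::download.file(tutorial_url(filename), destfile = dest, mode = "wb")
  invisible(dest)
}
```

For example, `download_tutorial("helicon_m.csv")` would save the file as `data/helicon_m.csv`, which can then be read with `read.csv()`.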
Verification After Loading:
Always verify your data after loading:
# Check structure
str(d)
# Check dimensions
dim(d)
# Look for missing values
sum(is.na(d))
# Summary statistics
summary(d)
# First/last few rows
head(d)
tail(d)
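`sum(is.na(d))` reports only the total number of missing values. A per-column count is often more useful for finding where they are. The sketch below uses the built-in iris data with one missing value inserted artificially for demonstration; that NA is not present in the real dataset.

```r
# Demonstration copy of iris with one artificial missing value
d <- iris
d[3, "Sepal.Length"] <- NA

# Missing values per column
na_by_column <- colSums(is.na(d))
print(na_by_column)

# Number of rows with no missing values in any column
complete_rows <- sum(complete.cases(d))
print(complete_rows)
```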