t-Test of Different Types
In this course, we’ve covered three distinct types of t-tests: one-sample, two-sample independent, and two-sample paired tests. Each serves a unique purpose in hypothesis testing, though they all operate under the same guiding principles and use the t.test() function. However, the data format, the diagnostic plots, and the function arguments depend on the specific inference procedure.
We previously discussed the one-sample t-test in Computer Assignment 6. In this tutorial, we focus on the two-sample independent test and the two-sample paired test.
In a two-sample independent t-test, we aim to compare the means of a quantitative variable between two distinct groups. These groups must be independent, implying that the measurements in one group do not affect the measurements in the other group. For conducting such an analysis, you have two main approaches:
Two Vectors of Data: Each vector represents the measurements from one of the groups. This method requires manually separating your data based on group membership before conducting the test.
A Data Frame with a Factor Variable: This approach leverages a factor variable in your data frame that identifies the group membership for each observation, effectively categorizing your continuous variable of interest into two groups based on this factor.
Given its simplicity and direct alignment with R’s formula interface used in various other statistical functions, we will focus on the second approach. This method enhances readability and efficiency, particularly when your data is already organized within a single data frame.
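To make the two input styles concrete, here is a minimal sketch using made-up numbers. The vectors group_a and group_b and the data frame df are hypothetical and are not part of the course data. Under these assumptions, both calls should lead to the same Welch test.

```r
# Hypothetical measurements for two independent groups (illustrative only)
group_a <- c(5.1, 4.8, 6.0, 5.5, 5.2)
group_b <- c(6.3, 6.8, 5.9, 7.1, 6.5, 6.0)

# Approach 1: two separate vectors
res_vectors <- t.test(group_a, group_b)

# Approach 2: one data frame with a factor identifying the group
df <- data.frame(
  value = c(group_a, group_b),
  group = factor(rep(c("A", "B"), times = c(length(group_a), length(group_b))))
)
res_formula <- t.test(value ~ group, data = df)

# Both approaches should report the same test statistic and p-value
c(vectors = unname(res_vectors$statistic), formula = unname(res_formula$statistic))
c(vectors = res_vectors$p.value, formula = res_formula$p.value)
```

With the formula interface, the difference is taken in the order of the factor levels (here A minus B), which matches the order of the two vectors in the first call.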
When a grouping variable has more categories than your analysis needs, as can happen in a two-sample procedure or ANOVA, you may find it useful to filter out or combine categories before testing. This is especially helpful when working with datasets containing numerous categories. R provides several ways to accomplish this task, with ifelse being one of the most straightforward approaches.
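The ifelse function in R is a vectorized conditional function that returns a value for each element of a vector based on a condition. Its general form is ifelse(test, yes, no): test is a logical condition, yes gives the values used where test is TRUE, and no gives the values used where test is FALSE.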
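A minimal sketch of how ifelse(test, yes, no) behaves element by element, using a made-up vector (scores is hypothetical and only for illustration):

```r
# Hypothetical scores (illustrative only)
scores <- c(55, 72, 90, 48, 66)

# The condition is checked for each element; "pass" is returned where it is TRUE,
# "fail" where it is FALSE
result <- ifelse(scores >= 60, "pass", "fail")
result
```

The result has the same length as the input, which is what makes ifelse convenient for creating a new column in a data frame.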
We’ll create an illustrative example to show how ifelse can be used in practical scenarios.
(Data Set: movies.csv) In the film industry, understanding the financial performance of movies through different lenses, such as audience ratings, is crucial for stakeholders. This understanding helps in tailoring future productions to meet audience expectations and optimize profitability. The movies.csv dataset provides a snapshot of various movies’ profitability metrics, including LOpening, which represents the log-transformed revenue from the opening weekend per theater. The log transformation reduces the right skew typical of revenue data, making it better suited to t procedures.
Our analysis focuses on exploring how movies rated for different audiences—‘R’ for adults and a combined ‘Family’ category including ‘PG’ and ‘PG-13’ ratings—fare in terms of their opening weekend revenue per theater.
To compare these groups effectively, we first redefine our movie ratings into two distinct categories: ‘Adult’ for ‘R’ rated movies and ‘Family’ for movies rated ‘PG’ and ‘PG-13’. This re-categorization is stored in a new variable within our dataset, MergedRating. The data before recoding look like this:
Title | Rating | Genre | Budget | USRevenue | Opening | LOpening | Theaters | Opinion | Profit |
---|---|---|---|---|---|---|---|---|---|
Madagascar: Escape 2 Africa | PG | Animation | 150.0 | 180.0 | 63.1 | 4.145 | 4056 | 6.9 | 1 |
Sex and the City | R | Comedy | 65.0 | 152.6 | 56.8 | 4.040 | 3285 | 5.4 | 1 |
The Ruins | R | Horror | 8.0 | 17.4 | 8.0 | 2.079 | 2812 | 6.0 | 1 |
Stop-Loss | R | Drama | 25.0 | 10.9 | 4.6 | 1.526 | 1291 | 6.5 | 0 |
The Curious Case of Benjamin Button | PG-13 | Drama | 150.0 | 127.5 | 26.9 | 3.292 | 2988 | 8.0 | 0 |
Redbelt | R | Action | 7.0 | 2.3 | 1.1 | 0.095 | 1379 | 6.9 | 0 |
The Secret Life of Bees | PG-13 | Drama | 11.0 | 37.8 | 10.5 | 2.351 | 1591 | 7.0 | 1 |
Kung Fu Panda | PG | Animation | 130.0 | 215.4 | 60.2 | 4.098 | 4114 | 7.7 | 1 |
The Happening | R | Drama | 60.0 | 64.5 | 30.5 | 3.418 | 2986 | 5.2 | 1 |
Zach and Miri Make a Porno | R | Comedy | 24.0 | 31.5 | 10.1 | 2.313 | 2735 | 7.1 | 1 |
The Strangers | R | Horror | 10.0 | 52.5 | 21.0 | 3.045 | 2466 | 6.0 | 1 |
Prom Night | PG-13 | Horror | 20.0 | 43.8 | 20.8 | 3.035 | 2700 | 3.6 | 1 |
The Dark Knight | PG-13 | Action | 185.0 | 533.3 | 158.4 | 5.065 | 4366 | 8.9 | 1 |
Baby Mama | PG-13 | Comedy | 30.0 | 60.3 | 17.4 | 2.856 | 2543 | 6.1 | 1 |
Wanted | R | Action | 75.0 | 134.3 | 50.9 | 3.930 | 3175 | 6.8 | 1 |
Changeling | R | Drama | 55.0 | 35.7 | 10.0 | 2.303 | 1850 | 8.0 | 0 |
Yes Man | PG-13 | Comedy | 70.0 | 97.7 | 18.3 | 2.907 | 3434 | 7.0 | 1 |
The Express | PG | Drama | 40.0 | 9.6 | 4.6 | 1.526 | 2808 | 7.1 | 0 |
W. | PG-13 | Drama | 25.1 | 25.5 | 10.5 | 2.351 | 2030 | 6.6 | 1 |
The Mummy: Tomb of the Dragon Emporer | PG-13 | Action | 145.0 | 102.2 | 40.5 | 3.701 | 3760 | 5.1 | 0 |
Eagle Eye | PG-13 | Action | 80.0 | 101.1 | 29.2 | 3.374 | 3510 | 6.6 | 1 |
Burn After Reading | R | Comedy | 37.0 | 60.3 | 19.1 | 2.950 | 2651 | 7.2 | 1 |
Saw V | R | Horror | 10.8 | 56.7 | 30.1 | 3.405 | 3060 | 5.8 | 1 |
Miracle and St Anna | R | Action | 45.0 | 7.9 | 3.5 | 1.253 | 1185 | 5.9 | 0 |
The Day the Earth Stood Still | PG-13 | Drama | 80.0 | 79.4 | 30.5 | 3.418 | 3560 | 5.5 | 0 |
Be Kind Rewind | PG-13 | Comedy | 20.0 | 11.2 | 4.1 | 1.411 | 808 | 6.6 | 0 |
Jumper | PG-13 | Action | 85.0 | 80.2 | 32.1 | 3.469 | 3428 | 5.9 | 0 |
Hancock | PG-13 | Action | 150.0 | 227.9 | 62.6 | 4.137 | 3965 | 6.5 | 1 |
Speed Racer | PG | Action | 120.0 | 43.9 | 18.6 | 2.923 | 3606 | 6.3 | 0 |
The Eye | R | Drama | 12.0 | 31.4 | 12.4 | 2.518 | 2436 | 5.3 | 1 |
Death Race | R | Action | 45.0 | 36.1 | 12.6 | 2.534 | 2532 | 6.6 | 0 |
College | R | Comedy | 6.5 | 4.7 | 2.6 | 0.956 | 2123 | 4.3 | 0 |
Blindness | R | Drama | 25.0 | 3.1 | 2.0 | 0.693 | 1690 | 6.7 | 0 |
Iron Man | PG-13 | Action | 140.0 | 318.3 | 102.1 | 4.626 | 4105 | 8.0 | 1 |
Lakeview Terrace | PG-13 | Drama | 22.0 | 39.3 | 15.0 | 2.708 | 2464 | 6.3 | 1 |
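# Recode Rating: PG and PG-13 become "Family"; every other rating (here, R) becomes "Adult"
movies$MergedRating <- ifelse(movies$Rating == "PG" | movies$Rating == "PG-13", "Family", "Adult")
# Show the first six rows with the new column (kable() comes from the knitr package)
kable(head(movies), caption = "Movie Profitability Data")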
Title | Rating | Genre | Budget | USRevenue | Opening | LOpening | Theaters | Opinion | Profit | MergedRating |
---|---|---|---|---|---|---|---|---|---|---|
Madagascar: Escape 2 Africa | PG | Animation | 150 | 180.0 | 63.1 | 4.145 | 4056 | 6.9 | 1 | Family |
Sex and the City | R | Comedy | 65 | 152.6 | 56.8 | 4.040 | 3285 | 5.4 | 1 | Adult |
The Ruins | R | Horror | 8 | 17.4 | 8.0 | 2.079 | 2812 | 6.0 | 1 | Adult |
Stop-Loss | R | Drama | 25 | 10.9 | 4.6 | 1.526 | 1291 | 6.5 | 0 | Adult |
The Curious Case of Benjamin Button | PG-13 | Drama | 150 | 127.5 | 26.9 | 3.292 | 2988 | 8.0 | 0 | Family |
Redbelt | R | Action | 7 | 2.3 | 1.1 | 0.095 | 1379 | 6.9 | 0 | Adult |
Refer back to Computer Assignment #6 Tutorial for information regarding logical operators.
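As a quick sanity check on a recoding like the one above, a cross-tabulation of the original and recoded values can confirm that each rating landed in the intended group. The sketch below uses a short hypothetical vector rather than the full movies.csv column, and it writes the condition with %in%, which is an alternative to joining two == comparisons with |.

```r
# Hypothetical ratings (illustrative only; not the full movies.csv column)
rating <- c("PG", "R", "R", "PG-13", "R", "PG-13", "PG")

# Same recoding logic, written with %in% instead of two == comparisons joined by |
merged <- ifelse(rating %in% c("PG", "PG-13"), "Family", "Adult")

# Cross-tabulate original against recoded values to confirm the mapping
check <- table(rating, merged)
check
```

Every original rating should have non-zero counts in exactly one column. Note that with this kind of ifelse, any rating not listed in the condition falls into the "no" category, so it is worth checking for unexpected values.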
Hypothesis Testing Framework
Test Selection: For our purpose, a two-sample independent t-test is appropriate as it compares means between two distinct groups that are not related or paired. This test suits our scenario since each movie is unique and falls into one of two independent categories, ‘Adult’ or ‘Family’.
Alternative Hypothesis: We aim to determine if there’s a significant difference in profitability (as measured by log opening revenue, LOpening) between ‘Adult’ and ‘Family’ movies. Hence, our alternative hypothesis could be that the mean LOpening for ‘Adult’ movies is different from ‘Family’ movies.
Data Visualization: To understand the distribution of LOpening for each category, we generate histograms, boxplots, and Q-Q plots.
First, calculate the group-level statistics and the fitted normal density for each observation.
# Calculate the sample mean and standard deviation for each group
xbar <- tapply(movies$LOpening, movies$MergedRating, mean)
s <- tapply(movies$LOpening, movies$MergedRating, sd)
# Create estimated normal density curves for each group
movies$normal.density <- ifelse(movies$MergedRating == "Family",
                                dnorm(movies$LOpening, xbar["Family"], s["Family"]),
                                dnorm(movies$LOpening, xbar["Adult"], s["Adult"]))
To ensure an accurate comparison between the two groups in the histogram, we use the facet_grid() function from the ggplot2 package, which creates a grid of plots based on the levels of our factor. Because the panels share the same axes, subsets of the data can be viewed side by side, making differences or patterns between the categories easier to see.
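library(ggplot2)  # provides ggplot(), geom_histogram(), facet_grid(), and related functions
# Number of bins: square-root rule based on the larger group, with at least 5 bins
binLen <- as.numeric(max(tapply(movies$LOpening, movies$MergedRating, length)))
n_bins <- round(max(sqrt(binLen) + 2, 5))
# Density-scaled histograms by group, with a kernel density (red) and the fitted normal curve (blue)
ggplot(movies, aes(x = LOpening)) +
  geom_histogram(aes(y = after_stat(density)), bins = n_bins, fill = "grey", col = "black") +
  facet_grid(. ~ MergedRating) +
  geom_density(col = "red", lwd = 1) +
  geom_line(aes(y = normal.density), col = "blue", lwd = 1) +
  labs(title = "Distribution of Log Opening Revenue by Rating Category")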
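As a worked example of the bin-count rule used in the histogram code, the sketch below plugs in the group sizes counted from the movies table shown earlier (16 rows rated R and 19 rows rated PG or PG-13). The vector group_sizes is a stand-in for the tapply() result and is an assumption of this sketch.

```r
# Group sizes counted from the movies table (16 Adult, 19 Family)
group_sizes <- c(Adult = 16, Family = 19)

# Size of the largest group
binLen <- max(group_sizes)

# Square-root rule plus 2, with a floor of 5 bins
n_bins <- round(max(sqrt(binLen) + 2, 5))
n_bins
```

Here sqrt(19) + 2 is about 6.36, so each facet is drawn with 6 bins.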
Next, create boxplots for the ‘Family’ and ‘Adult’ rating categories. Boxplots show the center and spread of the data. Placing the categorical variable on the x-axis produces side-by-side boxplots, which makes the two groups easy to compare. This comparison can highlight differences in the distribution of log opening weekend revenue per theater across rating categories and suggest how movie ratings may relate to financial performance.
# Side-by-side boxplots with whisker caps and a point marking each group mean
ggplot(movies, aes(x = MergedRating, y = LOpening)) +
  geom_boxplot() +
  stat_boxplot(geom = "errorbar") +
  stat_summary(fun = mean, colour = "black", geom = "point", size = 3) +
  ggtitle("Boxplots of Log Opening Revenue by Rating Category")
Calculating Slope and Intercept for Reference Lines
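For each rating category, we calculate the intercept and slope of the reference line that the points would follow if the data were exactly normal. With standard normal quantiles on the x-axis, that line has an intercept equal to the group mean and a slope equal to the group standard deviation. Storing these values in the data frame lets ggplot2 draw the correct reference line for each category in the Q-Q plots: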
movies$intercept <- ifelse(movies$MergedRating == "Family", xbar["Family"], xbar["Adult"])
movies$slope <- ifelse(movies$MergedRating == "Family", s["Family"], s["Adult"])
With the intercept and slope prepared, we proceed to construct Q-Q plots for LOpening within the ‘Family’ and ‘Adult’ groups, facilitating a comparison of their distributions to a normal distribution:
ggplot(movies, aes(sample = LOpening)) +
  stat_qq() +
  facet_grid(MergedRating ~ .) +
  geom_abline(aes(intercept = intercept, slope = slope), color = "blue", linetype = "dashed") +
  ggtitle("Q-Q Plots of Log Opening Revenue by Rating Category")
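To see why the mean and standard deviation work as intercept and slope, recall that a normal quantile can be written as mean + sd × z, where z is the matching standard normal quantile. The following base R sketch illustrates this with simulated data. The sample x and its parameters are hypothetical, and ppoints() is used here as one common choice of plotting positions.

```r
set.seed(1)  # makes the simulated, hypothetical sample reproducible
x <- rnorm(200, mean = 3, sd = 1.2)

# Theoretical standard normal quantiles paired with the sorted sample
z <- qnorm(ppoints(length(x)))
x_sorted <- sort(x)

# Reference line: intercept = sample mean, slope = sample standard deviation
fitted_line <- mean(x) + sd(x) * z

# For roughly normal data the sorted sample should track the line closely
cor(x_sorted, fitted_line)
```

A correlation close to 1 suggests the points fall near the reference line. Clear curvature away from the line in a Q-Q plot would point to skewness or heavy tails.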
For this analysis, we use the formula interface of the t.test() function, which allows for a concise specification of the groups being compared:
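# Welch two-sample t-test of LOpening by rating group, with a 99% confidence interval.
# 'paired' is not passed: the formula interface compares independent groups by default,
# and newer versions of R (4.4.0 onward) raise an error if 'paired' is given to the formula method.
t.test(LOpening ~ MergedRating, data = movies,
       mu = 0, conf.level = 0.99,
       alternative = "two.sided",
       var.equal = FALSE)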
##
## Welch Two Sample t-test
##
## data: LOpening by MergedRating
## t = -2.5144, df = 29.12, p-value = 0.0177
## alternative hypothesis: true difference in means between group Adult and group Family is not equal to 0
## 99 percent confidence interval:
## -1.917939 0.087768
## sample estimates:
## mean in group Adult mean in group Family
## 2.316125 3.231211
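To connect this output to the underlying formulas, the sketch below recomputes the Welch statistic, degrees of freedom, and p-value by hand. The two vectors are the LOpening values transcribed from the movies table shown earlier, split by rating. The variable names are chosen for this sketch only.

```r
# LOpening values transcribed from the movies table (R-rated rows, then PG and PG-13 rows)
adult <- c(4.040, 2.079, 1.526, 0.095, 3.418, 2.313, 3.045, 3.930,
           2.303, 2.950, 3.405, 1.253, 2.518, 2.534, 0.956, 0.693)
family <- c(4.145, 3.292, 2.351, 4.098, 3.035, 5.065, 2.856, 2.907, 1.526, 2.351,
            3.701, 3.374, 3.418, 1.411, 3.469, 4.137, 2.923, 4.626, 2.708)

# Squared standard error contributed by each group
v_adult <- var(adult) / length(adult)
v_family <- var(family) / length(family)

# Welch t statistic (Adult minus Family, matching the alphabetical factor order)
t_stat <- (mean(adult) - mean(family)) / sqrt(v_adult + v_family)

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (v_adult + v_family)^2 /
  (v_adult^2 / (length(adult) - 1) + v_family^2 / (length(family) - 1))

# Two-sided p-value
p_value <- 2 * pt(-abs(t_stat), df = df_welch)

round(c(t = t_stat, df = df_welch, p = p_value), 4)
```

These values should agree with the t.test() output above. A p-value of about 0.0177 is below 0.05 but above 0.01, which is consistent with the 99 percent confidence interval containing 0.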
In situations where you’re comparing two related groups (such as before-and-after measurements in a controlled experiment or matched pairs in observational studies), a two-sample paired t-test is the appropriate tool. This test works on the differences between paired observations, so you will either create a new variable holding these differences or let t.test() compute them with paired = TRUE.
Creating a Difference Variable
To conduct a paired t-test, the first step involves calculating the differences between each pair of matched observations. This new variable, let’s call it diff, captures the essence of the paired design by isolating and emphasizing the change or effect of interest.
The direction in which you calculate these differences (i.e., variable1 - variable2 vs. variable2 - variable1) is a matter of context or convention and does not influence the statistical validity of the test. However, it’s essential to be consistent with the hypothesized direction of the effect. For example, if you’re expected to estimate the mean difference of a - b, then your difference calculation should reflect this order.
Code for Creating the Difference Variable
While we won’t repeat the specifics here, remember that the difference variable can be created with simple subtraction. It typically looks something like this:
# Assuming 'data' is your dataframe, and 'before' and 'after' are the paired observations
data$diff <- data$after - data$before
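The effect of the subtraction order can be checked directly. In the sketch below, before and after are made-up paired measurements (hypothetical, for illustration only). Reversing the direction should flip the sign of the t statistic while leaving the two-sided p-value unchanged.

```r
# Hypothetical paired measurements (illustrative only)
before <- c(12.1, 11.4, 13.0, 12.7, 11.9, 12.5)
after  <- c(12.9, 11.8, 13.1, 13.5, 12.4, 12.6)

# The two possible directions of subtraction
diff_ab <- after - before
diff_ba <- before - after

res_ab <- t.test(diff_ab, mu = 0)
res_ba <- t.test(diff_ba, mu = 0)

# The t statistic changes sign; the two-sided p-value is the same
c(t_ab = unname(res_ab$statistic), t_ba = unname(res_ba$statistic))
c(p_ab = res_ab$p.value, p_ba = res_ba$p.value)
```

The confidence interval endpoints also change sign and swap places. For a one-sided alternative the direction matters more, because "greater" for one ordering corresponds to "less" for the other.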
(Data Set: ex07-39mpgdiff.csv) Fuel efficiency comparison. A researcher records the mpg (miles per gallon, a measure of fuel economy) of his car each time he fills the tank. He computes it by dividing the miles driven since the last fill-up by the number of gallons pumped at fill-up. He wants to determine whether these calculations differ from what his car’s computer estimates.
For the paired t-test, we focus on the Diff variable, which represents the difference between computer estimates and driver measurements (Computer - Driver). This variable highlights the discrepancy of interest, serving as the basis for our analysis. If this variable were not already calculated, we would need to obtain it as described above.
Fill.up | Computer | Driver | Diff |
---|---|---|---|
1 | 41.5 | 36.5 | 5.0 |
2 | 50.7 | 44.2 | 6.5 |
3 | 36.6 | 37.2 | -0.6 |
4 | 37.3 | 35.6 | 1.7 |
5 | 34.2 | 30.5 | 3.7 |
6 | 45.0 | 40.5 | 4.5 |
7 | 48.0 | 40.0 | 8.0 |
8 | 43.2 | 41.0 | 2.2 |
9 | 47.7 | 42.8 | 4.9 |
10 | 42.2 | 39.2 | 3.0 |
11 | 43.2 | 38.8 | 4.4 |
12 | 44.6 | 44.5 | 0.1 |
13 | 48.4 | 45.4 | 3.0 |
14 | 46.4 | 45.3 | 1.1 |
15 | 46.8 | 45.7 | 1.1 |
16 | 39.2 | 34.2 | 5.0 |
17 | 37.3 | 35.2 | 2.1 |
18 | 43.5 | 39.8 | 3.7 |
19 | 44.3 | 44.9 | -0.6 |
20 | 43.3 | 47.5 | -4.2 |
Test Selection: For our analysis, a two-sample paired t-test is ideal since it compares the means of related observations. Here, each pair of observations consists of the MPG as calculated by the car’s computer and as measured by the driver for the same fill-up, making them inherently paired. This test allows us to assess if there’s a statistically significant difference between the computer’s estimates and the driver’s measurements.
Alternative Hypothesis: We aim to determine whether there’s a significant discrepancy between the car’s computer MPG estimates and the driver’s MPG measurements. Thus, our alternative hypothesis posits that the mean difference between the computer’s estimates and the driver’s measurements is not equal to zero, indicating a systematic bias in either the computer’s or the driver’s favor.
Data Visualization: To visualize the distribution of MPG differences (Computer MPG - Driver MPG), histograms and boxplots can be informative. These plots will help us understand the spread and central tendency of the MPG differences, alongside any potential outliers or skewness in the data. The code is similar to one-sample procedures and will not be repeated.
For this analysis, we can call the t.test() function in either of two equivalent ways: apply a one-sample procedure to the ‘Diff’ variable, or pass the two variables ‘Computer’ and ‘Driver’ with paired = TRUE. Both give the same results:
One-Sample Approach Using the ‘Diff’ Variable: If we choose to focus on the already calculated differences between the car’s computer estimates and the driver’s measurements (Diff), we can apply a one-sample t-test. This approach treats the set of differences as a single sample being tested against a hypothesized mean difference of zero.
t.test.results <- t.test(mpg$Diff, mu = 0, conf.level = 0.95, alternative = "two.sided")
t.test.results
##
## One Sample t-test
##
## data: mpg$Diff
## t = 4.358, df = 19, p-value = 0.0003386
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 1.418847 4.041153
## sample estimates:
## mean of x
## 2.73
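The one-sample output can be reproduced from the standard formulas. The sketch below uses the Diff values transcribed from the table shown earlier. The variable names are chosen for this sketch only.

```r
# Diff values transcribed from the table above (Computer minus Driver)
diff_mpg <- c(5.0, 6.5, -0.6, 1.7, 3.7, 4.5, 8.0, 2.2, 4.9, 3.0,
              4.4, 0.1, 3.0, 1.1, 1.1, 5.0, 2.1, 3.7, -0.6, -4.2)

n <- length(diff_mpg)
se <- sd(diff_mpg) / sqrt(n)          # standard error of the mean difference
t_stat <- (mean(diff_mpg) - 0) / se   # test statistic for mu = 0
p_value <- 2 * pt(-abs(t_stat), df = n - 1)

# 95% confidence interval: estimate plus or minus t* times the standard error
ci <- mean(diff_mpg) + c(-1, 1) * qt(0.975, df = n - 1) * se

round(c(t = t_stat, p = p_value, lower = ci[1], upper = ci[2]), 4)
```

These values should agree with the t.test() output above, with n - 1 = 19 degrees of freedom.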
Paired Two-Sample Approach Using ‘Computer’ and ‘Driver’ Variables: Alternatively, we can directly compare the Computer and Driver variables using a paired two-sample t-test. This method implicitly calculates the differences between each pair of corresponding observations, aligning closely with the nature of our data as paired measurements from the same fill-up events.
t.test(mpg$Computer, mpg$Driver, mu = 0, conf.level = 0.95, alternative = "two.sided", paired = TRUE)
##
## Paired t-test
##
## data: mpg$Computer and mpg$Driver
## t = 4.358, df = 19, p-value = 0.0003386
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
## 1.418847 4.041153
## sample estimates:
## mean difference
## 2.73
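The agreement between the two approaches can also be checked programmatically. The sketch below uses the Computer and Driver columns transcribed from the table shown earlier and compares the key quantities from both calls. The object names are chosen for this sketch only.

```r
# Computer and Driver columns transcribed from the table above
computer <- c(41.5, 50.7, 36.6, 37.3, 34.2, 45.0, 48.0, 43.2, 47.7, 42.2,
              43.2, 44.6, 48.4, 46.4, 46.8, 39.2, 37.3, 43.5, 44.3, 43.3)
driver <- c(36.5, 44.2, 37.2, 35.6, 30.5, 40.5, 40.0, 41.0, 42.8, 39.2,
            38.8, 44.5, 45.4, 45.3, 45.7, 34.2, 35.2, 39.8, 44.9, 47.5)

paired_res <- t.test(computer, driver, paired = TRUE)
one_sample_res <- t.test(computer - driver, mu = 0)

# Compare the key quantities from both calls
same_t <- isTRUE(all.equal(unname(paired_res$statistic), unname(one_sample_res$statistic)))
same_p <- isTRUE(all.equal(paired_res$p.value, one_sample_res$p.value))
same_ci <- isTRUE(all.equal(as.numeric(paired_res$conf.int), as.numeric(one_sample_res$conf.int)))
c(same_t = same_t, same_p = same_p, same_ci = same_ci)
```

Leaving out paired = TRUE in the two-vector call would instead run a Welch test that treats the columns as independent samples, which ignores the pairing and generally gives a different result.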