2.3. Tools for Numerical (Quantitative) Data
Numerical data offers richer visualization possibilities than categorical data because its values carry information about the distance between them. We are now interested in the overall shape of the distribution, where the values cluster, and how far they spread. A histogram answers all three questions at a glance. In this section, we'll focus on building good histograms.
Road Map 🧭
Visualize numerical variables with histograms.
Understand the impact of choosing different numbers of classes/bins for a histogram.
2.3.1. Building your first histogram
A histogram divides the number line into adjacent bins of equal width and draws a bar whose height equals the count (or relative frequency) falling in each bin.
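To see exactly what a histogram counts, here is a minimal base-R sketch that builds equal-width bins by hand with `cut()` and tallies them with `table()`. The helper name `equal_width_counts()` is hypothetical, and ggplot2 chooses its own bin edges, so its counts may differ slightly from these; the point is only the mechanics of binning.

```r
# Hypothetical helper: split the range of x into k equal-width bins
# and count how many observations fall in each one.
equal_width_counts <- function(x, k) {
  breaks <- seq(min(x), max(x), length.out = k + 1)        # k + 1 edges give k bins
  bins <- cut(x, breaks = breaks, include.lowest = TRUE)   # close the lowest edge
  table(bins)                                              # keeps empty bins as 0
}

# Built-in insect counts from the datasets package
counts <- equal_width_counts(InsectSprays$count, k = 10)
counts
```

Each entry of `counts` is the height of one bar, and the bins share edges, which is why histogram bars touch.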
Example 1 – insect counts (discrete numeric)

This histogram displays insect counts from an agricultural study that tested six insecticide sprays (labelled A to F). Each bar represents the number of observations whose insect count falls within that bar's range. The twin peaks reflect two groups of sprays that differ in effectiveness, a detail we will dig into in Chapter 3 when we learn how to separate a quantitative variable by group.
library(ggplot2)
data(InsectSprays)   # built-in data from the datasets package

# Rule of thumb: round(sqrt(n)) + 2 bins
n_obs  <- nrow(InsectSprays)       # 72 observations
n_bins <- round(sqrt(n_obs)) + 2   # 8 + 2 = 10 bins

# linewidth requires ggplot2 3.4.0 or later
ggplot(InsectSprays, aes(x = count)) +
  geom_histogram(bins = n_bins, colour = "black", fill = "skyblue",
                 linewidth = 1.2) +
  labs(title = "Distribution of insect counts (Beall, 1942)",
       x = "Number of insects", y = "Frequency")
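A quick numerical check supports reading the two peaks as two groups of sprays: the sketch below computes the mean insect count for each spray with `tapply()`. It is only a preview of the grouped comparisons in Chapter 3, not a formal analysis.

```r
# Mean insect count for each of the six spray types (A to F)
spray_means <- tapply(InsectSprays$count, InsectSprays$spray, mean)
round(spray_means, 2)
# Sprays C, D and E average fewer than 5 insects,
# while A, B and F average more than 14: two clusters, two peaks.
```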
2.3.2. Choosing the number of bins
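Bin width is a Goldilocks choice: too wide hides detail, too narrow buries the pattern in noise. For sample sizes up to a few hundred, the square-root rule works well:
\[ k = \operatorname{round}(\sqrt{n}) + 2, \]
where \(k\) is the number of bins and \(n\) is the number of observations.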
For larger datasets (n > 400), experiment with 20-30 bins until the main pattern is clear without a saw-tooth fringe. The goal is to reveal the data’s fundamental structure while filtering out noise.
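The rule can be wrapped in a small function. This is a minimal sketch: the name `suggest_bins()` is hypothetical, and clamping the result to the 20 to 30 range for \(n > 400\) is an assumed way of encoding "experiment with 20-30 bins". Treat the output as a first guess to refine by eye.

```r
# Hypothetical helper: starting number of bins for a sample of size n.
suggest_bins <- function(n) {
  k <- round(sqrt(n)) + 2                   # square-root rule
  if (n > 400) k <- min(max(k, 20), 30)     # assumption: keep large samples in 20-30
  k
}

suggest_bins(nrow(InsectSprays))   # 72 observations
suggest_bins(1000)                 # large sample, clamped
```

At \(n = 400\) the rule already gives \(20 + 2 = 22\) bins, so the two guidelines join up smoothly.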
Example 2 – furnace energy use (continuous numeric)
We illustrate the choice of bin width by plotting the same furnace energy consumption data four times, each with a different number of bins.

These four histograms display the same furnace energy consumption data with different bin counts:
6 bins (top left): Oversmooths the data, hiding important features
10 bins (top right): Balances detail and clarity, revealing the general right-skewed shape
15 bins (bottom left): Shows more granular structure but begins to display some potentially random fluctuation
30 bins (bottom right): Too detailed, resulting in a jagged appearance dominated by sampling variability
Ten to fifteen bins reveal the slight right skew without drowning the eye in high-frequency jitter.
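library(ggplot2)
furnace <- read.csv("furnace.txt")   # replace with your path

# Plot the same variable with four different bin counts.
# Each plot is printed separately; arrange them with a layout
# package if you want to reproduce the 2 x 2 figure.
for (bins in c(6, 10, 15, 30)) {
  p <- ggplot(furnace, aes(x = Consumption)) +
    geom_histogram(bins = bins, colour = "black", fill = "grey70") +
    labs(title = paste(bins, "bins"), x = "BTU", y = "Frequency")
  print(p)   # explicit print() is needed inside a loop
}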
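The trade-off can also be seen numerically. The furnace file is not bundled with R, so this sketch uses the built-in insect counts as a stand-in: for each bin count it reports the bin width and how many equal-width bins end up empty. Counting empty bins is only a rough, assumed proxy for a jagged look, not a standard criterion, and `bin_summary()` is a hypothetical helper.

```r
# Stand-in data: the built-in insect counts (furnace.txt is not bundled with R)
x <- InsectSprays$count

# Hypothetical helper: bin width and number of empty bins for k equal-width bins
bin_summary <- function(x, k) {
  breaks <- seq(min(x), max(x), length.out = k + 1)
  counts <- table(cut(x, breaks = breaks, include.lowest = TRUE))
  c(bins = k, width = diff(range(x)) / k, empty = sum(counts == 0))
}

# One row per candidate bin count
bin_tab <- t(sapply(c(6, 10, 15, 30), function(k) bin_summary(x, k)))
bin_tab
```

As the number of bins grows, each bin narrows and more of them come up empty, which is what produces the saw-tooth fringe.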
2.3.3. Discrete vs. continuous data visualization
For discrete numerical data with only a few distinct values (such as counts from 0 to 7), a bar graph may be more appropriate than a histogram. A bar graph places one bar at each distinct value and leaves space between the bars to signal that each value is treated as its own category, even though the categories are numbers.
The key difference between histograms and bar graphs for quantitative data lies in how they handle the range of possible values:
Bar graphs treat each value as a distinct category.
Histograms group ranges of values into bins, with no gaps between bars.
For discrete data with many possible values, histograms become the preferred choice to avoid cluttered displays.
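The insect counts from Example 1 are discrete but take many different values. A short base-R sketch contrasts the two groupings: `table()` on the raw values gives one count per distinct value, as a bar graph would draw, while `cut()` with 10 equal-width breaks gives the grouped counts a histogram would draw.

```r
x <- InsectSprays$count

# Bar graph view: one bar per distinct observed value
bar_heights <- table(x)
length(bar_heights)    # number of bars a bar graph would need

# Histogram view: values grouped into 10 equal-width ranges
hist_breaks <- seq(min(x), max(x), length.out = 11)
hist_counts <- table(cut(x, breaks = hist_breaks, include.lowest = TRUE))
length(hist_counts)    # number of bars a histogram would draw
```

Both tables account for the same 72 observations; the histogram simply needs far fewer bars to show the shape.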
Example – Number of children per family

This bar graph displays the number of children per family from a survey of 100 couples aged 30-40. Unlike a histogram, which would group values into bins, this graph shows the exact count for each possible value (0 through 7 children). Since there are only a few possible values, this bar graph displays the discrete data more effectively than a histogram would.
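The survey data behind this figure are not included here, so the sketch below uses a small made-up vector purely to show the mechanics. In ggplot2, `geom_bar()` counts each distinct value and draws one bar per value, leaving gaps between bars.

```r
library(ggplot2)

# Hypothetical stand-in data: NOT the survey of 100 couples described above
families <- data.frame(children = c(0, 1, 1, 2, 2, 2, 2, 3, 3, 4, 5, 7))

# geom_bar() counts each distinct value; the bars are narrower than the
# spacing between values, so gaps separate them
p <- ggplot(families, aes(x = children)) +
  geom_bar(fill = "skyblue", colour = "black") +
  scale_x_continuous(breaks = 0:7) +
  labs(x = "Number of children", y = "Number of families")
print(p)
```

Because no family in this toy data has 6 children, no bar appears at 6; a histogram with wide bins would hide that kind of detail.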
2.3.4. Enhancing histograms – density & normal overlay
A quick visual check for normality is to overlay a normal (bell) curve with the same mean and standard deviation as the data and compare the two shapes. We can also add a density curve, a smoothed version of the histogram, to get a better sense of the shape of the underlying distribution.
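The overlay has a simple numerical counterpart: compare the observed count in each bin with the count a normal distribution, fitted with the sample mean and standard deviation, would place there. This is a minimal sketch on simulated right-skewed data (a gamma sample chosen for illustration, not the furnace measurements).

```r
# Simulated right-skewed stand-in data (not the furnace measurements)
set.seed(1)
x <- rgamma(200, shape = 4, rate = 0.5)

k <- round(sqrt(length(x))) + 2                    # square-root rule
breaks <- seq(min(x), max(x), length.out = k + 1)
observed <- hist(x, breaks = breaks, plot = FALSE)$counts

# Count a normal curve with the sample mean and SD would put in each bin
expected <- length(x) * diff(pnorm(breaks, mean = mean(x), sd = sd(x)))

round(data.frame(mid = (head(breaks, -1) + tail(breaks, -1)) / 2,
                 observed, expected), 1)
```

Systematic gaps between `observed` and `expected`, for example a long run of excess counts in the right tail, are the numerical version of the mismatch the eye picks up in the overlay.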
Example 1 – Residential furnace energy use

This enhanced histogram of furnace energy consumption includes:
Light blue bars showing the distribution of BTU measurements
A red density curve that smooths the data
A blue normal distribution curve for comparison
The gap between the red density curve and the blue normal curve shows that the data are slightly right-skewed rather than normally distributed.
library(ggplot2)
furnace <- read.csv("furnace.txt")   # replace with your path

# Sample statistics for the fitted normal curve
xbar <- mean(furnace$Consumption)
s    <- sd(furnace$Consumption)
bins <- round(sqrt(nrow(furnace))) + 2   # square-root rule

ggplot(furnace, aes(x = Consumption)) +
  # Density scale, so the bars share a vertical scale with the curves
  geom_histogram(aes(y = after_stat(density)), bins = bins,
                 fill = "lightblue", colour = "black") +
  geom_density(colour = "red", linewidth = 1.2) +     # smoothed density
  stat_function(fun = dnorm, args = list(mean = xbar, sd = s),
                colour = "blue", linewidth = 1.2) +   # normal overlay
  labs(title = "Residential furnace energy consumption",
       x = "BTU", y = "Density")
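The `after_stat(density)` mapping matters because the ggplot2 documentation describes the computed `density` variable as scaled to integrate to 1, the same total area as the normal curve drawn by `dnorm()`. Base R's `hist()` with `plot = FALSE` reports both counts and densities, so a short sketch can confirm the relationship density = count / (n × bin width). The insect counts serve as stand-in data.

```r
# Stand-in data: the built-in insect counts (any numeric vector works)
x <- InsectSprays$count
h <- hist(x, breaks = seq(0, 26, length.out = 11), plot = FALSE)

widths <- diff(h$breaks)

# Density height = count / (n * bin width)
manual_density <- h$counts / (length(x) * widths)
all.equal(manual_density, h$density)   # TRUE

# Total bar area on the density scale (1 up to rounding)
sum(h$density * widths)
```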
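Example 2 – Insect counts by spray type
Splitting a histogram into one panel per group shows how the distribution changes from group to group. Here each spray type gets its own panel; with only 12 observations per spray, a small number of bins keeps each panel readable.
library(ggplot2)
ggplot(InsectSprays, aes(x = count, fill = spray)) +
  geom_histogram(bins = 5, colour = "black", linewidth = 0.8) +
  facet_wrap(~ spray, scales = "free_y") +   # one panel per spray
  theme_minimal() +
  theme(legend.position = "none") +          # panel strips already name the sprays
  labs(title = "Insect count distribution by spray type",
       x = "Number of insects", y = "Frequency")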
2.3.5. Bringing It All Together
Visualizing numerical data effectively requires choosing the right technique for your specific data type and research question. Histograms serve as the foundation for understanding numerical distributions, revealing patterns that guide further analysis. As we progress, we’ll build on these visual insights with formal numerical summaries and statistical tests.
Key Takeaways 📝
Histograms turn walls of numbers into shape, center, spread, and possible outliers.
Use \(\operatorname{round}(\sqrt{n}) + 2\) bins as a starting point, then adjust by eye.
Overlay normal curves to assess skewness and normality.