A bar chart displays a categorical variable on the x-axis, with one bar per category. The y-axis can represent:

  1. A count of observations in each group.
  2. A summary value (e.g., mean, median, min/max) computed for each group.

In this lesson we use the mtcars dataset and focus on:

  • cyl: number of cylinders (numeric, but used as a category)
  • am: transmission (0 = automatic, 1 = manual)
  • mpg: miles per gallon (numeric)
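Before plotting, it can help to look at these three columns directly. The following is a minimal sketch using only base R and the built-in mtcars data; the name cyl_counts is introduced here for illustration.

```r
# Inspect the three columns used in this lesson
str(mtcars[, c("cyl", "am", "mpg")])

# Number of cars per cylinder group: these are the heights
# that a count bar chart of cyl will draw
cyl_counts <- table(mtcars$cyl)
cyl_counts
```

Although cyl is stored as a number, it takes only three distinct values, which is why it is treated as a category throughout the lesson.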

How to create a bar chart

With ggplot2, the general pattern is:

ggplot(data, aes(...)) + geom_*()

In this tutorial we will mainly use:

  • geom_bar() for counts (default behavior: stat = "count")
  • geom_col() for precomputed values (equivalent to geom_bar(stat = "identity"))
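The difference between the two geoms can be checked with a small sketch: counting inside geom_bar() and plotting precomputed counts with geom_col() should give the same bar heights. The names cyl_counts, p_bar, p_col, bar_heights, and col_heights are illustrative, and layer_data() is used only to read back the values ggplot2 would draw.

```r
library(ggplot2)

# Precompute the counts that geom_bar() would otherwise compute itself
cyl_counts <- as.data.frame(table(cyl = mtcars$cyl))  # columns: cyl, Freq

# geom_bar() counts the rows; geom_col() takes the heights as given
p_bar <- ggplot(mtcars, aes(x = factor(cyl))) + geom_bar()
p_col <- ggplot(cyl_counts, aes(x = cyl, y = Freq)) + geom_col()

# layer_data() returns the data ggplot2 computes for a layer
bar_heights <- layer_data(p_bar)$y
col_heights <- layer_data(p_col)$y

all.equal(as.numeric(bar_heights), as.numeric(col_heights))
```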

Bar chart (counts)

Basic count bar chart

library(ggplot2)

ggplot(mtcars, aes(x = factor(cyl))) +
  geom_bar() +
  theme_classic() +
  labs(x = "Cylinders", y = "Count")

Change the bar color

ggplot(mtcars, aes(x = factor(cyl))) +
  geom_bar(fill = "coral") +
  theme_classic() +
  labs(x = "Cylinders", y = "Count")

Change transparency (alpha)

ggplot(mtcars, aes(x = factor(cyl))) +
  geom_bar(fill = "coral", alpha = 0.5) +
  theme_classic() +
  labs(x = "Cylinders", y = "Count")

Color bars by group

ggplot(mtcars, aes(x = factor(cyl), fill = factor(cyl))) +
  geom_bar() +
  theme_classic() +
  labs(x = "Cylinders", y = "Count", fill = "cyl")

Grouped bar charts

A common use case is to split each bar by a second categorical variable, so the chart shows the count for every combination of the two variables.

Prepare the dataset

library(dplyr)

cars <- mtcars |>
  mutate(
    cyl = factor(cyl),
    am = factor(am, labels = c("auto", "manual"))
  )
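To see the numbers behind the grouped charts, the counts for each combination of cylinders and transmission can be tabulated first. This is a sketch that repeats the preparation step so it runs on its own; am_counts is a name chosen here for illustration.

```r
library(dplyr)

cars <- mtcars |>
  mutate(
    cyl = factor(cyl),
    am = factor(am, labels = c("auto", "manual"))
  )

# One row per cyl/am combination; n is the height of each bar segment
am_counts <- cars |>
  count(cyl, am)

am_counts
```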

Stacked bars (default)

ggplot(cars, aes(x = cyl, fill = am)) +
  geom_bar() +
  theme_classic() +
  labs(x = "Cylinders", y = "Count", fill = "Transmission")

Percent stacked bars

Use position = "fill" to show proportions instead of raw counts.

ggplot(cars, aes(x = cyl, fill = am)) +
  geom_bar(position = "fill") +
  theme_classic() +
  labs(x = "Cylinders", y = "Proportion", fill = "Transmission")
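The proportions drawn by position = "fill" can also be computed by hand, which is useful when the exact values are needed. The sketch below assumes the same recoding of cyl and am as in the preparation step; am_props and prop are illustrative names.

```r
library(dplyr)

am_props <- mtcars |>
  mutate(
    cyl = factor(cyl),
    am = factor(am, labels = c("auto", "manual"))
  ) |>
  count(cyl, am) |>
  group_by(cyl) |>
  # Share of each transmission type within its cylinder group
  mutate(prop = n / sum(n)) |>
  ungroup()

am_props
```

Within each cylinder group the proportions add up to 1, which is why every bar in the percent stacked chart has the same height.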

Side-by-side bars

ggplot(cars, aes(x = cyl, fill = am)) +
  geom_bar(position = position_dodge()) +
  theme_classic() +
  labs(x = "Cylinders", y = "Count", fill = "Transmission")

Bar chart (values)

Sometimes you want bars to represent a numeric value (e.g., mean mpg) rather than counts. In this case, compute the summary first and then use geom_col().

Step 1) Compute mean mpg by cylinders

data_bar <- cars |>
  group_by(cyl) |>
  summarise(mean_mpg = mean(mpg), .groups = "drop") |>
  mutate(mean_mpg = round(mean_mpg, 2))

data_bar

cyl   mean_mpg
4     26.66
6     19.74
8     15.10

Step 2) Plot the bars

ggplot(data_bar, aes(x = cyl, y = mean_mpg)) +
  geom_col() +
  theme_classic() +
  labs(x = "Cylinders", y = "Mean mpg")
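As an alternative to computing the summary beforehand, ggplot2 can compute it while plotting. This is a sketch of a different technique from the two steps above, assuming ggplot2 3.3.0 or later (where the argument is named fun); p_mean is an illustrative name, and the means are not rounded here.

```r
library(ggplot2)

# stat_summary() applies `fun` to mpg within each cylinder group,
# then draws the result as columns
p_mean <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  stat_summary(fun = mean, geom = "col") +
  theme_classic() +
  labs(x = "Cylinders", y = "Mean mpg")

p_mean
```

Precomputing with summarise() keeps the summary table available for inspection and labeling, while stat_summary() is shorter when only the plot is needed.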

Step 3) Horizontal bars

ggplot(data_bar, aes(x = cyl, y = mean_mpg)) +
  geom_col() +
  coord_flip() +
  theme_classic() +
  labs(x = "Cylinders", y = "Mean mpg")
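Horizontal bars can also be obtained by swapping the aesthetics instead of flipping the coordinates. The sketch below assumes ggplot2 3.3.0 or later, where geom_col() accepts the categorical variable on the y-axis; it rebuilds data_bar so it runs on its own, and p_h is an illustrative name.

```r
library(dplyr)
library(ggplot2)

data_bar <- mtcars |>
  mutate(cyl = factor(cyl)) |>
  group_by(cyl) |>
  summarise(mean_mpg = round(mean(mpg), 2), .groups = "drop")

# Categories on y, values on x: bars are drawn horizontally
# without coord_flip()
p_h <- ggplot(data_bar, aes(x = mean_mpg, y = cyl)) +
  geom_col() +
  theme_classic() +
  labs(x = "Mean mpg", y = "Cylinders")

p_h
```

With this approach the axis labels in labs() refer to the axes as they appear on screen, whereas with coord_flip() they refer to the axes before flipping.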

Step 4) Color by group + adjust width

p <- ggplot(data_bar, aes(x = cyl, y = mean_mpg, fill = cyl)) +
  geom_col(width = 0.6) +
  coord_flip() +
  theme_classic() +
  guides(fill = "none") +
  labs(x = "Cylinders", y = "Mean mpg")

p

Step 5) Add labels

p +
  geom_text(aes(label = mean_mpg), hjust = -0.15, color = "grey20", size = 3.5) +
  expand_limits(y = max(data_bar$mean_mpg) * 1.10)
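When a numeric column is used as a label, it is converted to text and trailing zeros are dropped, so 15.10 would appear as 15.1. A possible way to keep two decimals on every label is to format the text first with sprintf(). This is a sketch, with the values of data_bar re-entered by hand so it runs on its own; the label column and p_lab are illustrative names.

```r
library(ggplot2)

# Values from the data_bar table, re-entered so the sketch is self-contained
data_bar <- data.frame(
  cyl = factor(c(4, 6, 8)),
  mean_mpg = c(26.66, 19.74, 15.10)
)

# sprintf("%.2f", x) returns text with exactly two decimals
data_bar$label <- sprintf("%.2f", data_bar$mean_mpg)

p_lab <- ggplot(data_bar, aes(x = cyl, y = mean_mpg, fill = cyl)) +
  geom_col(width = 0.6) +
  coord_flip() +
  geom_text(aes(label = label), hjust = -0.15, color = "grey20", size = 3.5) +
  # Leave room on the value axis so the labels are not clipped
  expand_limits(y = max(data_bar$mean_mpg) * 1.10) +
  theme_classic() +
  guides(fill = "none") +
  labs(x = "Cylinders", y = "Mean mpg")

p_lab
```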

Histogram

A histogram is the counterpart of a bar chart for continuous variables: it groups the values into bins and draws one bar per bin, showing how the values are distributed. For example, here is the distribution of mpg:

ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(bins = 12, fill = "#4AA4DE", color = "white") +
  theme_classic() +
  labs(x = "mpg", y = "Count")
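Instead of fixing the number of bins, the bin width can be set directly. The sketch below uses binwidth = 2.5 and boundary = 10, which are illustrative choices rather than recommended values, and reads the bin counts back with layer_data(); p_hist is an illustrative name.

```r
library(ggplot2)

# Bins are 2.5 mpg wide and aligned so that one bin edge falls at 10
p_hist <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(binwidth = 2.5, boundary = 10,
                 fill = "#4AA4DE", color = "white") +
  theme_classic() +
  labs(x = "mpg", y = "Count")

p_hist

# Every car falls in exactly one bin, so the bin counts
# add up to the number of rows in mtcars
sum(layer_data(p_hist)$count)
```

Changing the bin width or the number of bins can change the apparent shape of the distribution, so it is worth trying a few values.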

Summary

Objective                                  Example
Count bars                                 ggplot(df, aes(x)) + geom_bar()
Count bars (grouped, stacked)              ggplot(df, aes(x, fill = g)) + geom_bar()
Count bars (grouped, side-by-side)         ggplot(df, aes(x, fill = g)) + geom_bar(position = position_dodge())
Percent stacked bars                       ggplot(df, aes(x, fill = g)) + geom_bar(position = "fill")
Bars representing values (precomputed y)   ggplot(df_sum, aes(x, y)) + geom_col()
 

A work by Gianluca Sottile

gianluca.sottile@unipa.it