The apply family

The apply family lets you iterate over common data structures without writing explicit for loops. Use:
  • apply() for matrices (or data frames you treat as matrices)
  • lapply() for lists, vectors, and data frames (always returns a list)
  • sapply() like lapply(), but it tries to simplify the result
  • vapply() like sapply(), but safer because you pre-specify the output type
  • tapply() to apply a function by group(s) defined by factor(s)

apply()

apply() works on arrays/matrices. It takes three main arguments:

apply(X, MARGIN, FUN, ...)
  • X: an array or matrix
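  • MARGIN: the dimension(s) over which the function is applied
    • 1 = rows (FUN receives one row at a time)
    • 2 = columns (FUN receives one column at a time)
    • c(1, 2) = rows and columns (FUN receives one cell at a time)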
  • FUN: function to apply (e.g., sum, mean, custom functions)

Example: sum columns of a matrix

m1 <- matrix(1:30, nrow = 5, ncol = 6)
m1
##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]    1    6   11   16   21   26
## [2,]    2    7   12   17   22   27
## [3,]    3    8   13   18   23   28
## [4,]    4    9   14   19   24   29
## [5,]    5   10   15   20   25   30
col_sums_apply <- apply(m1, 2, sum)
col_sums_apply
## [1]  15  40  65  90 115 140
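
The other MARGIN values and the `...` argument can be illustrated with a short sketch. It rebuilds the same m1 matrix so that it runs on its own; the names row_max, m_na, and doubled are introduced here for illustration only.

```r
m1 <- matrix(1:30, nrow = 5, ncol = 6)

# MARGIN = 1: FUN is called once per row
row_max <- apply(m1, 1, max)
row_max
## [1] 26 27 28 29 30

# Arguments after FUN are forwarded to it through `...`
m_na <- m1
m_na[1, 1] <- NA
apply(m_na, 2, sum, na.rm = TRUE)
## [1]  14  40  65  90 115 140

# MARGIN = c(1, 2): FUN is called on every cell, so the shape is preserved
doubled <- apply(m1, c(1, 2), function(v) v * 2)
identical(dim(doubled), dim(m1))
## [1] TRUE
```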

Tip: prefer dedicated helpers when available

For simple row/column sums, rowSums() and colSums() are clearer and faster than the equivalent apply() call:

colSums(m1)
## [1]  15  40  65  90 115 140
rowSums(m1)
## [1]  81  87  93  99 105
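
As a quick check, the helpers and apply() should agree on the values. The sketch below rebuilds m1 so it is self-contained, and also shows the matching helpers for means.

```r
m1 <- matrix(1:30, nrow = 5, ncol = 6)

# colSums() returns doubles and apply(..., sum) returns integers here,
# so compare with all.equal() rather than identical()
all.equal(colSums(m1), apply(m1, 2, sum))
## [1] TRUE

# Means have dedicated helpers too
colMeans(m1)
## [1]  3  8 13 18 23 28
rowMeans(m1)
## [1] 13.5 14.5 15.5 16.5 17.5
```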

lapply()

lapply() applies a function to each element of a list/vector (or each column of a data frame) and always returns a list.

lapply(X, FUN, ...)

Example: convert movie titles to lower case

movies <- c("SPYDERMAN", "BATMAN", "VERTIGO", "CHINATOWN")

movies_lower_list <- lapply(movies, tolower)
str(movies_lower_list)
## List of 4
##  $ : chr "spyderman"
##  $ : chr "batman"
##  $ : chr "vertigo"
##  $ : chr "chinatown"

If you want a character vector instead of a list:

movies_lower_vec <- unlist(movies_lower_list, use.names = FALSE)
str(movies_lower_vec)
##  chr [1:4] "spyderman" "batman" "vertigo" "chinatown"
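
lapply() is not limited to character vectors. The sketch below uses a small made-up list (scores is a hypothetical name, not part of the original example) to show how extra arguments reach FUN, and the built-in cars data frame to show that each column is treated as one element.

```r
# Hypothetical data for illustration
scores <- list(a = c(1, 2, NA), b = c(4, 5, 6))

# Arguments after FUN are passed on to it; here na.rm goes to mean()
score_means <- lapply(scores, mean, na.rm = TRUE)
score_means
## $a
## [1] 1.5
##
## $b
## [1] 5

# A data frame is a list of columns, so FUN sees one column at a time
col_classes <- lapply(cars, class)
str(col_classes)
## List of 2
##  $ speed: chr "numeric"
##  $ dist : chr "numeric"
```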

sapply() (and vapply())

sapply() is a “user-friendly” wrapper around lapply() that tries to simplify the result into a vector/matrix/array when possible.

sapply(X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE)

vapply() is similar but safer: you specify the expected type and length of each result, and it raises an error if any result does not match.
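
To make the safety difference concrete, here is a minimal sketch using a made-up mixed list (values is a hypothetical name). sapply() silently coerces the results, while vapply() stops because one result is not numeric. The error is caught with tryCatch() only so the snippet runs to the end.

```r
# Hypothetical input: one numeric element, one character element
values <- list(a = 1:3, b = letters[1:3])

# sapply() quietly returns whatever it gets, coerced to a common type
first_s <- sapply(values, function(v) v[1])
first_s
##   a   b 
## "1" "a"

# vapply() stops because the result for "b" is not numeric
first_v <- tryCatch(
  vapply(values, function(v) v[1], numeric(1)),
  error = function(e) "vapply stopped with an error"
)
first_v
## [1] "vapply stopped with an error"

# With empty input, sapply() returns an empty list; vapply() keeps the type
empty_s <- sapply(list(), length)
empty_v <- vapply(list(), length, integer(1))
class(empty_s)
## [1] "list"
class(empty_v)
## [1] "integer"
```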

Example: min/max on the cars dataset

dt <- cars

min_l <- lapply(dt, min)
min_s <- sapply(dt, min)
min_v <- vapply(dt, min, numeric(1))  # type-stable

min_l
## $speed
## [1] 4
## 
## $dist
## [1] 2
min_s
## speed  dist 
##     4     2
min_v
## speed  dist 
##     4     2
max_l <- lapply(dt, max)
max_s <- sapply(dt, max)
max_v <- vapply(dt, max, numeric(1))

max_l
## $speed
## [1] 25
## 
## $dist
## [1] 120
max_s
## speed  dist 
##    25   120
max_v
## speed  dist 
##    25   120

Using a custom function

avg_range <- function(x) {
  (min(x) + max(x)) / 2
}

avg_s <- sapply(dt, avg_range)
avg_v <- vapply(dt, avg_range, numeric(1))

avg_s
## speed  dist 
##  14.5  61.0
avg_v
## speed  dist 
##  14.5  61.0

When sapply() does NOT simplify

If the function returns vectors of different lengths, sapply() cannot build a matrix and returns a list instead. To get a list regardless of the result lengths, pass simplify = FALSE.

Example: keep values above the mean (variable length output)

above_mean <- function(x) {
  m <- mean(x)
  x[x > m]
}

res_l <- lapply(dt, above_mean)
res_s1 <- sapply(dt, above_mean)                 # may return a list if lengths differ
res_s2 <- sapply(dt, above_mean, simplify = FALSE)

identical(res_l, res_s2)
## [1] TRUE
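
For contrast, when every result has the same length greater than one, sapply() does simplify, and the result is a matrix with one column per input element. A minimal sketch on the same cars data (rng is a name introduced here):

```r
# range() always returns two values, so the results bind into a 2 x 2 matrix
rng <- sapply(cars, range)
rng
##      speed dist
## [1,]     4    2
## [2,]    25  120

is.matrix(rng)
## [1] TRUE
```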

tapply()

tapply() splits a vector into groups defined by a factor (or list of factors) and applies a function to each group.

tapply(X, INDEX, FUN, ...)
  • X: typically a vector
  • INDEX: factor (or list of factors) defining groups
  • FUN: function to apply

Example: median Sepal.Width by species

tapply(iris$Sepal.Width, iris$Species, median)
##     setosa versicolor  virginica 
##        3.4        2.8        3.0
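
When INDEX is a list of two factors, tapply() returns a matrix with one row per level of the first factor and one column per level of the second. The sketch below uses a small made-up data frame (sales and its columns are hypothetical) so the result is easy to verify by hand.

```r
# Hypothetical data for illustration
sales <- data.frame(
  amount = c(10, 20, 30, 40, 50),
  region = c("north", "north", "south", "south", "south"),
  year   = c("2023", "2024", "2023", "2023", "2024")
)

# Group by region (rows) and year (columns), then sum within each cell
by_region_year <- tapply(sales$amount, list(sales$region, sales$year), sum)
by_region_year
##       2023 2024
## north   10   20
## south   70   50
```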

Modern tidy alternative (optional)

library(dplyr)  # the native pipe |> requires R >= 4.1

# Group by species, then compute one median per group
iris |>
  group_by(Species) |>
  summarise(median_sepal_width = median(Sepal.Width), .groups = "drop")
 

A work by Gianluca Sottile

gianluca.sottile@unipa.it