The apply family helps you iterate over common data
structures without writing explicit for loops. Use:
- apply() for matrices (or data frames you treat as matrices)
- lapply() for lists/vectors/data frames (returns a list)
- sapply() like lapply(), but it tries to simplify the result
- vapply() like sapply(), but safer because you pre-specify the output type
- tapply() to apply a function by group(s) defined by factor(s)
apply() works on arrays/matrices. It takes three main
arguments:
- X: an array or matrix
- MARGIN: where to apply the function
  - 1 = rows
  - 2 = columns
  - c(1, 2) = rows and columns (i.e., each individual element)
- FUN: function to apply (e.g., sum, mean, custom functions)
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 1 6 11 16 21 26
## [2,] 2 7 12 17 22 27
## [3,] 3 8 13 18 23 28
## [4,] 4 9 14 19 24 29
## [5,] 5 10 15 20 25 30
## [1] 15 40 65 90 115 140
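The code that produced the matrix and the sums above is not shown. A minimal sketch that reproduces the same output, assuming the matrix was filled column-wise from 1:30 and summed by column (the names m, col_sums, and row_sums are illustrative):

```r
# Assumption: a 5 x 6 matrix filled column by column with 1..30.
m <- matrix(1:30, nrow = 5, ncol = 6)
m

# MARGIN = 2 applies sum() to each column.
col_sums <- apply(m, 2, sum)
col_sums

# MARGIN = 1 applies sum() to each row instead.
row_sums <- apply(m, 1, sum)
row_sums
```

For plain sums, colSums() and rowSums() give the same values; apply() is the more general tool because FUN can be any function.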
lapply() applies a function to each element of a
list/vector (or each column of a data frame) and always returns
a list.
movies <- c("SPYDERMAN", "BATMAN", "VERTIGO", "CHINATOWN")
movies_lower_list <- lapply(movies, tolower)
str(movies_lower_list)
## List of 4
## $ : chr "spyderman"
## $ : chr "batman"
## $ : chr "vertigo"
## $ : chr "chinatown"
If you want a character vector instead of a list:
## chr [1:4] "spyderman" "batman" "vertigo" "chinatown"
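The call that produced the character vector above is not shown. One common way to get it, assuming the result of lapply() is flattened with unlist() (the name movies_lower is illustrative):

```r
movies <- c("SPYDERMAN", "BATMAN", "VERTIGO", "CHINATOWN")

# lapply() returns a list; unlist() flattens it into a character vector.
movies_lower <- unlist(lapply(movies, tolower))
str(movies_lower)
```

Because tolower() is already vectorized, tolower(movies) would give the same vector directly; the lapply() form is only useful here as an illustration.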
sapply() is a “user-friendly” wrapper around
lapply() that tries to simplify the result into a
vector/matrix/array when possible.
vapply() is similar but safer because you specify the
expected return type/length.
dt <- cars
min_l <- lapply(dt, min)
min_s <- sapply(dt, min)
min_v <- vapply(dt, min, numeric(1)) # type-stable
min_l
## $speed
## [1] 4
##
## $dist
## [1] 2
min_s
## speed dist
## 4 2
min_v
## speed dist
## 4 2
## $speed
## [1] 25
##
## $dist
## [1] 120
## speed dist
## 25 120
## speed dist
## 25 120
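The second group of outputs above (25 and 120) matches the column maxima of cars, but the code that produced it is not shown. A sketch, assuming it mirrors the min example with max. The last part is an added illustration of why vapply() is considered safer; range_check and the tryCatch() wrapper are not from the original.

```r
dt <- cars  # built-in data set with numeric columns speed and dist

max_l <- lapply(dt, max)              # list
max_s <- sapply(dt, max)              # simplified to a named numeric vector
max_v <- vapply(dt, max, numeric(1))  # same values, result type is checked

max_l
max_s
max_v

# vapply() raises an error when a result does not match FUN.VALUE.
# range() returns two values per column, so numeric(1) is violated here.
range_check <- tryCatch(
  vapply(dt, range, numeric(1)),
  error = function(e) "vapply error"
)
range_check
```

With sapply(dt, range) the same call would quietly return a 2 x 2 matrix, which is the kind of shape change that vapply() is meant to catch.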
If the function returns vectors of different
lengths, sapply() cannot build a matrix and returns a
list instead. You can also request a list explicitly with
simplify = FALSE.
above_mean <- function(x) {
  m <- mean(x)
  x[x > m]
}
res_l <- lapply(dt, above_mean)
res_s1 <- sapply(dt, above_mean) # may return a list if lengths differ
res_s2 <- sapply(dt, above_mean, simplify = FALSE)
identical(res_l, res_s2)
## [1] TRUE
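To make the simplification rule easier to see, here is a small sketch on toy lists (the names equal_len and ragged and the doubling function are illustrative, not from the original):

```r
# Every result has length 3, so sapply() can bind them into a 3 x 2 matrix.
equal_len <- sapply(list(a = 1:3, b = 4:6), function(x) x * 2)

# Results have lengths 3 and 2, so sapply() returns a list.
ragged <- sapply(list(a = 1:3, b = 4:5), function(x) x * 2)

is.matrix(equal_len)
is.list(ragged)
```

When the shape of the result matters downstream, checking it with is.list() or switching to vapply() avoids surprises.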
tapply() splits a vector into groups defined by a factor
(or list of factors) and applies a function to each group.
- X: typically a vector
- INDEX: factor (or list of factors) defining groups
- FUN: function to apply
## setosa versicolor virginica
## 3.4 2.8 3.0
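The call behind the output above is not shown. The values are consistent with the median sepal width per species in the built-in iris data, so a sketch under that assumption (the name med_width is illustrative) is:

```r
# Assumption: X is iris$Sepal.Width, INDEX is iris$Species, FUN is median.
med_width <- tapply(iris$Sepal.Width, iris$Species, median)
med_width
```

The result is a named vector with one entry per level of the grouping factor.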
A work by Gianluca Sottile
gianluca.sottile@unipa.it