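In data analysis you often need to sort rows by one or more variables. In base R, this is commonly done with order(), which returns the permutation of indices that puts its arguments in sorted order. Indexing a data frame's rows with those indices reorders the rows.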

order() vs sort()

  • sort(x) returns the sorted values.
  • order(x) returns the integer indices that would sort x; x[order(x)] then gives the sorted values.

x <- c(5, 1, 10)
sort(x)
## [1]  1  5 10
order(x)
## [1] 2 1 3
x[order(x)]
## [1]  1  5 10
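
One practical difference between the two functions is how they treat missing values. As a small sketch (the vector y is made up for illustration): by default sort() drops NA, while order() keeps it and places its index last, so y[order(y)] has the same length as y. This matters for data frames, because rows with a missing sort key are kept rather than silently dropped.

```r
# Hypothetical vector with a missing value, for illustration only
y <- c(3, NA, 1)

sort(y)                  # NA is dropped by default (na.last = NA)
order(y)                 # NA is kept; its index comes last (na.last = TRUE)
y[order(y)]              # same length as y, with NA at the end
sort(y, na.last = TRUE)  # ask sort() to keep NA at the end as well
```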

Example data

library(tibble)
set.seed(1234)

df <- tibble(
  c1 = rnorm(50, 5, 1.5),
  c2 = rnorm(50, 5, 1.5),
  c3 = rnorm(50, 5, 1.5),
  c4 = rnorm(50, 5, 1.5),
  c5 = rnorm(50, 5, 1.5)
)
df
c1 c2 c3 c4 c5
3.189401 2.290953 5.621785 4.4341435 5.7278402
5.416144 4.126886 4.287922 5.1464292 6.0451532
6.626662 3.336666 5.098990 7.4581170 5.2782709
1.481453 3.477557 4.246283 3.6866113 6.0511003
5.643687 4.756536 3.761002 5.1826400 5.4675215
5.759084 5.844584 5.250484 7.0431960 6.1406935
4.137890 7.471726 3.655603 4.6480684 7.7636954
4.180052 3.839970 5.252278 3.4199258 6.6685443
4.153322 7.408864 5.532452 3.6953246 5.0489959
3.664943 3.263287 4.921842 4.4148095 3.3283266
4.284211 5.984883 4.706098 3.7289749 5.6270867
3.502420 8.823487 4.026395 4.6090409 4.3996471
3.835619 4.947859 3.335349 4.3783704 7.2402397
5.096688 3.995550 6.273911 4.7254238 2.5893786
6.439241 4.988593 5.033544 5.6105841 4.3763723
4.834572 7.665627 6.246711 5.9369497 5.6330126
4.233486 3.292088 3.133568 7.5173086 4.7723952
3.633207 7.051741 5.253540 4.8969595 4.0907733
3.744242 6.994347 6.009749 4.5187401 4.5429184
8.623753 5.504709 4.960585 7.2065086 5.9443041
5.201132 5.010339 4.712912 7.5564941 6.3427580
4.263971 4.316797 3.827140 5.0648661 5.9903189
4.339178 4.450214 8.087243 4.5010140 8.4102253
5.689384 5.972430 6.125752 2.2666469 6.7602464
3.959420 8.105406 7.736313 7.1168936 5.4315646
2.827693 4.769902 5.120089 3.7436263 4.0103449
5.862134 2.913949 4.052886 3.3143558 9.3787102
3.464516 3.914627 2.730068 9.5656488 6.0161233
4.977292 5.387393 4.045850 5.3525320 3.9735195
3.596077 4.524411 5.339452 4.9501121 5.2797381
6.653446 4.733315 6.520536 0.9016707 4.5134100
4.286610 4.745009 5.379125 4.8503141 4.5879437
3.935840 2.941547 3.242078 6.4640476 3.5997450
4.248113 4.739319 6.003071 5.6208034 5.1752680
2.556360 6.275348 2.524849 6.3684832 5.4787404
3.248571 6.046413 4.451222 7.9755983 3.3836868
1.729941 5.824996 4.525823 6.7536628 0.1502718
2.988510 4.395902 2.077631 4.2368945 4.6176880
4.558559 4.712609 6.380086 6.0562703 5.0442767
4.301154 3.208208 4.065693 4.7023756 5.8914107
7.174244 4.920262 4.498945 4.1928938 5.0887028
3.397036 5.382794 7.092722 0.7163620 5.6200983
3.716953 7.558946 5.955012 3.8155297 3.3533417
4.579065 6.502270 4.837352 5.7317220 6.0667629
3.508490 4.256625 5.770644 8.2520488 6.0783331
3.547229 5.533325 5.598908 5.7510419 5.3774766
3.339023 3.298088 7.494285 5.9303153 7.0359117
3.122021 6.317305 5.413840 3.5511452 5.6067027
4.214258 6.459375 5.759409 5.2439821 5.3965464
4.254725 8.181676 5.521328 1.8826437 5.4020659

Sort with base R (order)

Example 1: Sort by one column (ascending)

df_c1 <- df[order(df$c1), ]
head(df_c1)
c1 c2 c3 c4 c5
1.481453 3.477557 4.246283 3.686611 6.0511003
1.729941 5.824996 4.525823 6.753663 0.1502718
2.556360 6.275348 2.524849 6.368483 5.4787404
2.827693 4.769902 5.120089 3.743626 4.0103449
2.988510 4.395902 2.077631 4.236895 4.6176880
3.122021 6.317305 5.413840 3.551145 5.6067027
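
For a single sort key, the direction can be reversed with order()'s decreasing argument. The sketch below uses a made-up data frame (scores, with hypothetical columns id and value) rather than df, so that it runs on its own.

```r
# Hypothetical toy data, for illustration only
scores <- data.frame(
  id    = c("p1", "p2", "p3"),
  value = c(5, 1, 10)
)

# decreasing = TRUE sorts from largest to smallest
scores_desc <- scores[order(scores$value, decreasing = TRUE), ]
scores_desc
```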

Example 2: Sort by multiple columns (ascending, ascending)

df_c3_c4 <- df[order(df$c3, df$c4), ]
head(df_c3_c4)
c1 c2 c3 c4 c5
2.988510 4.395902 2.077631 4.236895 4.617688
2.556360 6.275348 2.524849 6.368483 5.478740
3.464516 3.914627 2.730068 9.565649 6.016123
4.233486 3.292088 3.133568 7.517309 4.772395
3.935840 2.941547 3.242078 6.464048 3.599745
3.835619 4.947859 3.335349 4.378370 7.240240
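
Because c3 is drawn from a continuous distribution, it is very unlikely to contain tied values, so c4 has essentially no effect on the result above. The second key only matters when the first key has ties. A minimal sketch with a made-up data frame (toy, with hypothetical columns g and v) shows this:

```r
# Hypothetical toy data with repeated values in the first key
toy <- data.frame(
  g = c(2, 1, 2, 1),
  v = c(0.5, 0.9, 0.1, 0.3)
)

# Sorting by g alone: tied rows keep their original relative order
idx_g <- order(toy$g)
toy[idx_g, ]

# Sorting by g, then v: ties in g are broken by v
idx_gv <- order(toy$g, toy$v)
toy[idx_gv, ]
```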

Example 3: Mixed order (descending then ascending)

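A common base R trick is to negate a numeric column (-x) to sort it in descending order. This works only for numeric columns: negating a character vector is an error, and negation is not meaningful for factors.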

df_desc <- df[order(-df$c3, df$c4), ]
head(df_desc)
c1 c2 c3 c4 c5
4.339178 4.450214 8.087243 4.5010140 8.410225
3.959420 8.105406 7.736313 7.1168936 5.431565
3.339023 3.298088 7.494285 5.9303153 7.035912
3.397036 5.382794 7.092722 0.7163620 5.620098
6.653446 4.733315 6.520536 0.9016707 4.513410
4.558559 4.712609 6.380086 6.0562703 5.044277
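
For character columns, where negation is not available, one option is order()'s decreasing argument combined with method = "radix", which accepts one TRUE/FALSE value per sort key. Another is to negate xtfrm(x), which maps the values to numbers that sort the same way. A minimal sketch with a made-up data frame (people, name, and score are hypothetical):

```r
# Hypothetical data with a character key, for illustration only
people <- data.frame(
  name  = c("b", "a", "c", "a"),
  score = c(10, 30, 20, 5)
)

# name descending, score ascending;
# a vector-valued `decreasing` requires method = "radix"
idx_radix <- order(people$name, people$score,
                   decreasing = c(TRUE, FALSE), method = "radix")
people[idx_radix, ]

# Alternative: convert the character key to sortable numbers, then negate
idx_xtfrm <- order(-xtfrm(people$name), people$score)
people[idx_xtfrm, ]
```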

Sort with dplyr (arrange)

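dplyr::arrange() is often easier to read than order() when sorting by multiple columns: columns are referred to by name without the df$ prefix, and descending order is written with desc().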

library(dplyr)

df |>
  arrange(c1) |>
  head()
c1 c2 c3 c4 c5
1.481453 3.477557 4.246283 3.686611 6.0511003
1.729941 5.824996 4.525823 6.753663 0.1502718
2.556360 6.275348 2.524849 6.368483 5.4787404
2.827693 4.769902 5.120089 3.743626 4.0103449
2.988510 4.395902 2.077631 4.236895 4.6176880
3.122021 6.317305 5.413840 3.551145 5.6067027

Multiple columns:

df |>
  arrange(c3, c4) |>
  head()
c1 c2 c3 c4 c5
2.988510 4.395902 2.077631 4.236895 4.617688
2.556360 6.275348 2.524849 6.368483 5.478740
3.464516 3.914627 2.730068 9.565649 6.016123
4.233486 3.292088 3.133568 7.517309 4.772395
3.935840 2.941547 3.242078 6.464048 3.599745
3.835619 4.947859 3.335349 4.378370 7.240240

Mixed directions:

df |>
  arrange(desc(c3), c4) |>
  head()
c1 c2 c3 c4 c5
4.339178 4.450214 8.087243 4.5010140 8.410225
3.959420 8.105406 7.736313 7.1168936 5.431565
3.339023 3.298088 7.494285 5.9303153 7.035912
3.397036 5.382794 7.092722 0.7163620 5.620098
6.653446 4.733315 6.520536 0.9016707 4.513410
4.558559 4.712609 6.380086 6.0562703 5.044277
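
desc() is not limited to numeric columns, which is one reason the dplyr version can be more convenient than negating with -x. A brief sketch on a made-up data frame (people, name, and score are hypothetical and not part of the example data above); it also shows that arrange() places missing values last, even inside desc():

```r
library(dplyr)

# Hypothetical data with a character key and a missing score
people <- data.frame(
  name  = c("b", "a", "c", "a"),
  score = c(10, NA, 20, 50)
)

# name descending, then score ascending within each name
by_name <- people |>
  arrange(desc(name), score)
by_name

# score descending: the row with NA still ends up last
by_score <- people |>
  arrange(desc(score))
by_score
```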
 

A work by Gianluca Sottile

gianluca.sottile@unipa.it