dplyr arrange: Sort rows by one or more variables

In this tutorial, we will learn how to sort a dataframe by one or more columns using dplyr’s arrange() function. arrange() is one of the core dplyr verbs, and it is the one used to sort rows. By sorting, we mean that arrange() changes the order of the rows based on the values of one or more columns, without changing the contents of the dataframe. It affects only the row order and leaves the columns unchanged.

The basic syntax of arrange() is simple: it takes a data frame and one or more column names to order by. When we use arrange() with multiple columns, each additional column is used to break ties in the values of the preceding columns.
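As a minimal sketch of this tie-breaking behaviour, consider a small made-up tibble. The object toy and its columns group and score are hypothetical names used only for illustration; they are not part of the mpg data used in the rest of this tutorial.

```r
library(dplyr)

# Hypothetical toy data, made up for illustration only
toy <- tibble(
  group = c("b", "a", "b", "a"),
  score = c(2, 3, 1, 1)
)

# General shape: arrange(data, column1, column2, ...)
# Rows are ordered by group first; score only decides the order
# among rows that share the same group value.
sorted <- toy %>%
  arrange(group, score)

sorted
```

Both "a" rows come before both "b" rows, and within each group the rows are ordered by score.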

Let us get started by loading tidyverse.

library(tidyverse)

We will use the fuel economy data available as mpg in the ggplot2 package, which is part of tidyverse.

mpg %>% 
   head()

## # A tibble: 6 × 11
##   manufacturer model displ  year   cyl trans      drv     cty   hwy fl    class 
##   <chr>        <chr> <dbl> <int> <int> <chr>      <chr> <int> <int> <chr> <chr> 
## 1 audi         a4      1.8  1999     4 auto(l5)   f        18    29 p     compa…
## 2 audi         a4      1.8  1999     4 manual(m5) f        21    29 p     compa…
## 3 audi         a4      2    2008     4 manual(m6) f        20    31 p     compa…
## 4 audi         a4      2    2008     4 auto(av)   f        21    30 p     compa…
## 5 audi         a4      2.8  1999     6 auto(l5)   f        16    26 p     compa…
## 6 audi         a4      2.8  1999     6 manual(m5) f        18    26 p     compa…

dplyr arrange(): Sort by a column

To arrange the rows of a dataframe by the values of one column, we provide the column name as an argument to arrange() (in addition to the dataframe). In the example below we use the pipe operator %>% to feed the data to arrange() and sort by the column cty.

The column cty contains the city driving mileage for each car, so we are sorting the rows by city mileage.

dplyr’s arrange() function rearranges the rows in ascending order of mileage value.

mpg %>%
  arrange(cty) %>%
  head()
## # A tibble: 6 × 11
##   manufacturer model       displ  year   cyl trans drv     cty   hwy fl    class
##   <chr>        <chr>       <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
## 1 dodge        dakota pic…   4.7  2008     8 auto… 4         9    12 e     pick…
## 2 dodge        durango 4wd   4.7  2008     8 auto… 4         9    12 e     suv  
## 3 dodge        ram 1500 p…   4.7  2008     8 auto… 4         9    12 e     pick…
## 4 dodge        ram 1500 p…   4.7  2008     8 manu… 4         9    12 e     pick…
## 5 jeep         grand cher…   4.7  2008     8 auto… 4         9    12 e     suv  
## 6 chevrolet    c1500 subu…   5.3  2008     8 auto… r        11    15 e     suv

Because the sort is ascending, the cars with the largest city mileage end up in the last rows of the dataframe. We can check this with tail().

mpg %>%
  arrange(cty) %>%
  tail()

## # A tibble: 6 × 11
##   manufacturer model      displ  year   cyl trans  drv     cty   hwy fl    class
##   <chr>        <chr>      <dbl> <int> <int> <chr>  <chr> <int> <int> <chr> <chr>
## 1 toyota       corolla      1.8  2008     4 auto(… f        26    35 r     comp…
## 2 honda        civic        1.6  1999     4 manua… f        28    33 r     subc…
## 3 toyota       corolla      1.8  2008     4 manua… f        28    37 r     comp…
## 4 volkswagen   new beetle   1.9  1999     4 auto(… f        29    41 d     subc…
## 5 volkswagen   jetta        1.9  1999     4 manua… f        33    44 d     comp…
## 6 volkswagen   new beetle   1.9  1999     4 manua… f        35    44 d     subc…

dplyr arrange(): Sort by a column in descending order

As we saw, by default dplyr’s arrange() reorders rows by a column in ascending order. We can wrap the column in the desc() function to sort in descending order instead.

For example, the code below sorts by the cty column in descending order, so it shows the cars with the highest city mileage first.

mpg %>%
  arrange(desc(cty)) %>%
  head()

## # A tibble: 6 × 11
##   manufacturer model      displ  year   cyl trans  drv     cty   hwy fl    class
##   <chr>        <chr>      <dbl> <int> <int> <chr>  <chr> <int> <int> <chr> <chr>
## 1 volkswagen   new beetle   1.9  1999     4 manua… f        35    44 d     subc…
## 2 volkswagen   jetta        1.9  1999     4 manua… f        33    44 d     comp…
## 3 volkswagen   new beetle   1.9  1999     4 auto(… f        29    41 d     subc…
## 4 honda        civic        1.6  1999     4 manua… f        28    33 r     subc…
## 5 toyota       corolla      1.8  2008     4 manua… f        28    37 r     comp…
## 6 honda        civic        1.8  2008     4 manua… f        26    34 r     subc…
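One detail worth knowing when switching between ascending and descending order is how missing values are handled: for local data frames, arrange() places NA values at the end in both directions, even when the column is wrapped in desc(). The sketch below uses a small made-up tibble (toy and its column x are hypothetical names, since mpg has no missing city mileage values to demonstrate with).

```r
library(dplyr)

# Hypothetical toy data with one missing value
toy <- tibble(x = c(2, NA, 5, 1))

# Ascending order: the NA row comes last
asc <- toy %>%
  arrange(x)

# Descending order: the NA row still comes last
dsc <- toy %>%
  arrange(desc(x))

asc
dsc
```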

dplyr arrange(): Sort by multiple columns

To sort by multiple columns, we specify the column names as arguments to dplyr’s arrange() function. In the example below, we use arrange() to sort rows by the values of two columns, cyl and cty, which hold the number of cylinders in the car and the city mileage.

mpg %>%
  arrange(cyl, cty)

## # A tibble: 234 × 11
##    manufacturer model      displ  year   cyl trans drv     cty   hwy fl    class
##    <chr>        <chr>      <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
##  1 toyota       4runner 4…   2.7  1999     4 manu… 4        15    20 r     suv  
##  2 toyota       toyota ta…   2.7  1999     4 manu… 4        15    20 r     pick…
##  3 audi         a4 quattro   1.8  1999     4 auto… 4        16    25 p     comp…
##  4 toyota       4runner 4…   2.7  1999     4 auto… 4        16    20 r     suv  
##  5 toyota       toyota ta…   2.7  1999     4 auto… 4        16    20 r     pick…
##  6 toyota       toyota ta…   2.7  2008     4 manu… 4        17    22 r     pick…
##  7 audi         a4           1.8  1999     4 auto… f        18    29 p     comp…
##  8 audi         a4 quattro   1.8  1999     4 manu… 4        18    26 p     comp…
##  9 dodge        caravan 2…   2.4  1999     4 auto… f        18    24 r     mini…
## 10 hyundai      sonata       2.4  1999     4 auto… f        18    26 r     mids…
## # … with 224 more rows
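The two ideas covered so far can be combined: desc() applies to a single column, so one column can be sorted in ascending order while another is sorted in descending order. As a minimal sketch (the object name mixed is just an illustrative choice), the code below sorts by number of cylinders in ascending order and, within each cylinder count, by city mileage from highest to lowest.

```r
library(dplyr)
library(ggplot2)  # provides the mpg data

# cyl ascending, then cty descending within each cyl value
mixed <- mpg %>%
  arrange(cyl, desc(cty))

mixed %>%
  head()
```

With this ordering, the four-cylinder cars with the best city mileage appear at the top of the result.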

Here is an image visualizing the result of sorting a dataframe by two columns with dplyr’s arrange() (thanks to TidyDataTutor.com). It first highlights the two columns we are sorting by and then shows how the rows are reordered after applying arrange(). Notice that the rows are ordered by the first column, and rows that share a value in the first column are ordered by the second column. In other words, the columns are sorted in a hierarchy.

Figure: dplyr arrange(): Sort by multiple columns example

An important feature of dplyr’s arrange() to note is how it treats grouped data:

Unlike other dplyr verbs, arrange() largely ignores grouping; you need to explicitly mention grouping variables (or use .by_group = TRUE) in order to group by them, and functions of variables are evaluated once per data frame, not once per group.
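To make the grouping behaviour concrete, here is a small sketch using mpg grouped by drv. The object names by_drv, ignored, and grouped are illustrative choices, not anything defined by dplyr.

```r
library(dplyr)
library(ggplot2)  # provides the mpg data

by_drv <- mpg %>%
  group_by(drv)

# Grouping is ignored here: rows are sorted by cty across the whole data frame
ignored <- by_drv %>%
  arrange(cty)

# With .by_group = TRUE, rows are sorted by the grouping variable drv first,
# and then by cty within each drv value
grouped <- by_drv %>%
  arrange(cty, .by_group = TRUE)

ignored %>% head()
grouped %>% head()
```

The second call gives the same row order as naming the grouping variable explicitly with arrange(drv, cty).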
