tidyverse all_of(): select columns from a vector

In this tutorial, we will learn how to select multiple columns from a dataframe at once by supplying the column names as a character vector.

The tidyverse’s tidyselect package has numerous options for selecting columns from a dataframe. all_of() is one of the tidyselect functions; it helps us select multiple columns using a character vector.

Let us see an example of why we should use all_of() to select columns from a vector. First we will load tidyverse, the meta R package, and take a look at the starwars dataset that comes with dplyr.

library(tidyverse)

starwars %>% head()

# A tibble: 6 × 14
  name      height  mass hair_color skin_color eye_color birth_year sex   gender
  <chr>      <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr> 
1 Luke Sky…    172    77 blond      fair       blue            19   male  mascu…
2 C-3PO        167    75 <NA>       gold       yellow         112   none  mascu…
3 R2-D2         96    32 <NA>       white, bl… red             33   none  mascu…
4 Darth Va…    202   136 none       white      yellow          41.9 male  mascu…
5 Leia Org…    150    49 brown      light      brown           19   fema… femin…
6 Owen Lars    178   120 brown, gr… light      blue            52   male  mascu…
# … with 5 more variables: homeworld <chr>, species <chr>, films <list>,
#   vehicles <list>, starships <list>

The names of the columns that we want to select are in a vector.

column_name_vector <- c("name", "height", "skin_color", "gender")

By default, one might try to pass the vector directly as an argument to select().

starwars %>% select(column_name_vector)

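The code does get executed, but the result may not be correct: if the dataframe happened to contain a column named column_name_vector, select() would pick that column instead of the names stored in the vector. We also get the following message.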

Note: Using an external vector in selections is ambiguous.
ℹ Use `all_of(column_name_vector)` instead of `column_name_vector` to silence this message.
ℹ See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.
This message is displayed once per session

In our example, it gives what we needed.

# A tibble: 87 × 4
   name               height skin_color  gender   
   <chr>               <int> <chr>       <chr>    
 1 Luke Skywalker        172 fair        masculine
 2 C-3PO                 167 gold        masculine
 3 R2-D2                  96 white, blue masculine
 4 Darth Vader           202 white       masculine
 5 Leia Organa           150 light       feminine 
 6 Owen Lars             178 light       masculine
 7 Beru Whitesun lars    165 light       feminine 
 8 R5-D4                  97 white, red  masculine
 9 Biggs Darklighter     183 light       masculine
10 Obi-Wan Kenobi        182 fair        masculine
# … with 77 more rows
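
The ambiguity mentioned in the message can be sketched with a small made-up dataframe. The names df and cols below are hypothetical and chosen only for illustration; the point is that a bare name inside select() is matched against the columns of the dataframe first, so a column that shares its name with the external vector is selected instead of the vector's contents. Only dplyr is loaded here, to keep the sketch self-contained.

```r
library(dplyr)

# Hypothetical dataframe that has a column called `cols`
df <- tibble(x = 1:3, y = 4:6, cols = 7:9)

# External character vector with the same name as that column
cols <- c("x", "y")

# The bare name is resolved to the column `cols`, not to the vector
names(select(df, cols))
# [1] "cols"

# all_of() makes it explicit that `cols` is an external vector
names(select(df, all_of(cols)))
# [1] "x" "y"
```

With all_of(), the selection no longer depends on which column names happen to exist in the dataframe.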

tidyselect’s all_of(): select columns from a vector

However, the right approach is to pass all_of(column_name_vector) as the argument to the select() function. Now we get the same result without the message.

starwars %>% 
  select(all_of(column_name_vector))

# A tibble: 87 × 4
   name               height skin_color  gender   
   <chr>               <int> <chr>       <chr>    
 1 Luke Skywalker        172 fair        masculine
 2 C-3PO                 167 gold        masculine
 3 R2-D2                  96 white, blue masculine
 4 Darth Vader           202 white       masculine
 5 Leia Organa           150 light       feminine 
 6 Owen Lars             178 light       masculine
 7 Beru Whitesun lars    165 light       feminine 
 8 R5-D4                  97 white, red  masculine
 9 Biggs Darklighter     183 light       masculine
10 Obi-Wan Kenobi        182 fair        masculine
# … with 77 more rows

Note that the all_of() function is for strict selection. If any of the variables in the character vector is missing, an error is thrown.

# a vector containing a name that is not present in the dataframe
column_name_vector <- c("name", "height", "skin_color", "actor")

# selecting columns using a character vector
starwars %>% 
  select(all_of(column_name_vector))

Since the column actor is not present in the dataframe, all_of() will throw the following error and quit.

Quitting from lines 37-41 (select_columns_from_vectors.qmd) 
Error in `select()`:
! Can't subset columns that don't exist.
✖ Column `actor` doesn't exist.
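
If you would rather find out which names are missing before calling select(), one possible check is to compare the vector against the column names of the dataframe with base R's setdiff(). This is only a sketch; the variable missing_columns is a name introduced here for illustration.

```r
library(dplyr)

# Same vector as above, containing one name that is not in starwars
column_name_vector <- c("name", "height", "skin_color", "actor")

# Names in the vector that are not columns of the dataframe
missing_columns <- setdiff(column_name_vector, names(starwars))
missing_columns
# [1] "actor"
```

An empty result would mean that all_of(column_name_vector) can be used without triggering the error.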

In situations where you do not need every column in the vector to be present, and want whichever of them exist in the dataframe, use the any_of() function instead of all_of().
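
As a minimal sketch of the difference, here is the same vector with the non-existent actor column, passed to any_of() instead. The variable selected is introduced here only to hold the result.

```r
library(dplyr)

# Vector containing a name that is not present in the dataframe
column_name_vector <- c("name", "height", "skin_color", "actor")

# any_of() keeps the columns that exist and silently skips the rest
selected <- starwars %>%
  select(any_of(column_name_vector))

names(selected)
# [1] "name"       "height"     "skin_color"
```

No error is raised here; the missing column is simply left out of the result.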

