In this tutorial, we will learn how to select multiple columns from a dataframe at once, using a vector of column names.
The tidyverse’s tidyselect package has numerous options for selecting columns from a dataframe. all_of() is one of the tidyselect functions that helps us select multiple columns using a character vector.
Let us see an example of why we should use all_of() to select columns from a vector. First, we will load tidyverse, the meta R package. We will use the starwars dataset that comes with it.
library(tidyverse)
starwars %>% head()
# A tibble: 6 × 14
  name      height  mass hair_color skin_color eye_color birth_year sex   gender
  <chr>      <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr>
1 Luke Sky…    172    77 blond      fair       blue            19   male  mascu…
2 C-3PO        167    75 <NA>       gold       yellow         112   none  mascu…
3 R2-D2         96    32 <NA>       white, bl… red             33   none  mascu…
4 Darth Va…    202   136 none       white      yellow          41.9 male  mascu…
5 Leia Org…    150    49 brown      light      brown           19   fema… femin…
6 Owen Lars    178   120 brown, gr… light      blue            52   male  mascu…
# … with 5 more variables: homeworld <chr>, species <chr>, films <list>,
#   vehicles <list>, starships <list>
The names of the columns that we want to select are in a vector.
column_name_vector <- c("name", "height", "skin_color", "gender")
By default, one might try to call select() with the vector as its argument.
starwars %>% select(column_name_vector)
The code runs and returns a result, but the result may not be what we intended: if the dataframe itself had a column named column_name_vector, select() would pick that column instead of the names stored in the vector. We also get the following message.
Note: Using an external vector in selections is ambiguous.
ℹ Use `all_of(column_name_vector)` instead of `column_name_vector` to silence this message.
ℹ See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.
This message is displayed once per session
In our example, it gives what we needed.
# A tibble: 87 × 4
   name               height skin_color  gender
   <chr>               <int> <chr>       <chr>
 1 Luke Skywalker        172 fair        masculine
 2 C-3PO                 167 gold        masculine
 3 R2-D2                  96 white, blue masculine
 4 Darth Vader           202 white       masculine
 5 Leia Organa           150 light       feminine
 6 Owen Lars             178 light       masculine
 7 Beru Whitesun lars    165 light       feminine
 8 R5-D4                  97 white, red  masculine
 9 Biggs Darklighter     183 light       masculine
10 Obi-Wan Kenobi        182 fair        masculine
# … with 77 more rows
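To make the ambiguity concrete, here is a minimal sketch using a hypothetical toy dataframe (df, with a column that happens to be called vars) and an external vector with the same name. Neither name comes from the example above; they are assumptions made purely for illustration.

```r
library(dplyr)

# Hypothetical toy data: one column is itself named `vars`
df <- tibble(x = 1:3, y = 4:6, vars = 7:9)

# External character vector that shares its name with that column
vars <- c("x", "y")

# Bare name: select() finds the column `vars` in the data first
bare <- df %>% select(vars)
names(bare)

# all_of(): the external vector is used explicitly
strict <- df %>% select(all_of(vars))
names(strict)
```

The first call returns only the vars column, while the second returns x and y, which is why wrapping the vector in all_of() removes the ambiguity.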
tidyselect’s all_of(): selecting columns from a vector
However, the right approach is to pass all_of(vector_name) as the argument to select(). This gives the same result, without the message.
starwars %>% select(all_of(column_name_vector))
# A tibble: 87 × 4
   name               height skin_color  gender
   <chr>               <int> <chr>       <chr>
 1 Luke Skywalker        172 fair        masculine
 2 C-3PO                 167 gold        masculine
 3 R2-D2                  96 white, blue masculine
 4 Darth Vader           202 white       masculine
 5 Leia Organa           150 light       feminine
 6 Owen Lars             178 light       masculine
 7 Beru Whitesun lars    165 light       feminine
 8 R5-D4                  97 white, red  masculine
 9 Biggs Darklighter     183 light       masculine
10 Obi-Wan Kenobi        182 fair        masculine
# … with 77 more rows
Note that all_of() is for strict selection: if any of the names in the character vector is not a column of the dataframe, an error is thrown.
# a vector containing a name that is not present in the dataframe
column_name_vector <- c("name", "height", "skin_color", "actor")
# selecting columns using a character vector
starwars %>% select(all_of(column_name_vector))
Since the column actor is not present in the dataframe, all_of() throws the following error and the code stops.
Error in `select()`:
! Can't subset columns that don't exist.
✖ Column `actor` doesn't exist.
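If you would rather find out which names are missing before the strict selection fails, one possible approach is to compare the vector against the column names first. This is only a sketch; the helper variable missing_columns is a name introduced here for illustration.

```r
library(dplyr)

column_name_vector <- c("name", "height", "skin_color", "actor")

# Names in the vector that are not columns of starwars
missing_columns <- setdiff(column_name_vector, names(starwars))
missing_columns

# Select strictly only when nothing is missing; otherwise report the problem
if (length(missing_columns) == 0) {
  starwars %>% select(all_of(column_name_vector))
} else {
  message("Missing columns: ", paste(missing_columns, collapse = ", "))
}
```

Here missing_columns contains "actor", so the sketch prints a message instead of raising the error.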
In situations where you do not need every column in the vector to be present, and only want whichever of them exist in the dataframe, use any_of() instead of all_of().
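As a short sketch of that alternative, the same vector that made all_of() fail can be passed to any_of(). The variable name selected is introduced here only for illustration.

```r
library(dplyr)

# Same vector as before; "actor" is not a column of starwars
column_name_vector <- c("name", "height", "skin_color", "actor")

# any_of() keeps the names that exist and silently skips the rest
selected <- starwars %>% select(any_of(column_name_vector))
names(selected)
```

The result has the three existing columns (name, height, skin_color) and no error is raised for actor.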