In this tutorial, we will learn how to select columns whose names contain a string using dplyr’s contains() function. dplyr’s contains() function belongs to a family of helper functions for selecting columns, which also includes starts_with() and ends_with(). First we will see a simple example that uses a single string and selects all columns whose names contain it. Then we will learn how to use contains() with multiple strings and select the columns containing any of them.
To get started with some examples, let us load the tidyverse and palmerpenguins packages.
library(tidyverse)
library(palmerpenguins)
packageVersion("dplyr")
## [1] '1.0.9'
Our penguin data has 8 columns.
penguins %>% head(5)
## # A tibble: 5 × 8
##   species island bill_length_mm bill_depth_mm flipper_length_… body_mass_g sex
##   <fct>   <fct>           <dbl>         <dbl>            <int>       <int> <fct>
## 1 Adelie  Torge…           39.1          18.7              181        3750 male
## 2 Adelie  Torge…           39.5          17.4              186        3800 fema…
## 3 Adelie  Torge…           40.3          18                195        3250 fema…
## 4 Adelie  Torge…           NA            NA                 NA          NA <NA>
## 5 Adelie  Torge…           36.7          19.3              193        3450 fema…
## # … with 1 more variable: year <int>
dplyr’s contains() to select columns matching a string
To select columns whose names contain a string, we use the contains() function in combination with dplyr’s select() function. In this example, we select columns whose names contain the string “gth”. Two column names contain it, so the result has two columns.
penguins %>% select(contains("gth"))
## # A tibble: 344 × 2
##    bill_length_mm flipper_length_mm
##             <dbl>             <int>
##  1           39.1               181
##  2           39.5               186
##  3           40.3               195
##  4           NA                  NA
##  5           36.7               193
##  6           39.3               190
##  7           38.9               181
##  8           39.2               195
##  9           34.1               193
## 10           42                 190
## # … with 334 more rows
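By default, contains() ignores case when matching, which is easy to overlook. The sketch below illustrates this with the ignore.case argument; the variable names cols_default and cols_strict are only illustrative, and the block loads dplyr directly rather than the full tidyverse.

```r
library(dplyr)
library(palmerpenguins)

# Default behaviour: ignore.case = TRUE, so an upper-case pattern
# still matches the lower-case column names.
cols_default <- penguins %>%
  select(contains("GTH")) %>%
  names()

# Case-sensitive matching: "GTH" no longer matches any column name,
# so select() returns a tibble with zero columns.
cols_strict <- penguins %>%
  select(contains("GTH", ignore.case = FALSE)) %>%
  names()

print(cols_default)
print(cols_strict)
```

Setting ignore.case = FALSE is worth considering when column names differ only by case.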
dplyr’s contains() to select columns with multiple matching strings
We can also use multiple strings and select columns whose names contain any of them. To do that, we provide the strings as a character vector to contains(), as shown below. In this example, we select columns whose names contain either “gth” or “pth”.
penguins %>% select(contains(c("gth", "pth")))
## # A tibble: 344 × 3
##    bill_length_mm flipper_length_mm bill_depth_mm
##             <dbl>             <int>         <dbl>
##  1           39.1               181          18.7
##  2           39.5               186          17.4
##  3           40.3               195          18  
##  4           NA                  NA          NA  
##  5           36.7               193          19.3
##  6           39.3               190          20.6
##  7           38.9               181          17.8
##  8           39.2               195          19.6
##  9           34.1               193          18.1
## 10           42                 190          20.2
## # … with 334 more rows
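Passing a vector to contains() selects the union of the matches for each string. As a rough check, the sketch below compares it with combining two separate contains() calls using the | operator inside select(); the names cols_vector and cols_union are illustrative only.

```r
library(dplyr)
library(palmerpenguins)

# Vector form: columns whose names contain "gth" or "pth"
cols_vector <- penguins %>%
  select(contains(c("gth", "pth"))) %>%
  names()

# Explicit union of two single-string selections
cols_union <- penguins %>%
  select(contains("gth") | contains("pth")) %>%
  names()

# Both approaches should pick the same set of columns
print(setequal(cols_vector, cols_union))
```

The vector form is shorter, while the | form makes it easier to mix contains() with other helpers such as starts_with() or ends_with().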
dplyr’s contains() does not support regular expressions
Note that dplyr’s contains() matches literal strings only; it does not interpret regular expressions. For example, if we try to select all columns containing “pth” or “gth” with the pattern “[pg]th”, the pattern is treated as a literal string. No column name contains it, so we get a tibble with zero columns.
penguins %>% select(contains("[pg]th"))
## # A tibble: 344 × 0
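When a regular expression is needed, one option is the matches() helper from the same family, which interprets its argument as a regular expression. The sketch below applies it to the same pattern; the variable name cols_regex is illustrative only.

```r
library(dplyr)
library(palmerpenguins)

# matches() treats "[pg]th" as a regular expression:
# "p" or "g", followed by "th"
cols_regex <- penguins %>%
  select(matches("[pg]th")) %>%
  names()

print(cols_regex)
```

Here the selected columns come back in the order in which they appear in the data.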