Rstats 101

Learn R Programming Tips & Tricks for Statistics and Data Science

dplyr contains(): select columns whose names contain a string

rstats101 · August 5, 2022 ·

In this tutorial, we will learn how to select columns whose names contain a given string using dplyr’s contains() function. contains() belongs to a family of selection helpers, along with starts_with() and ends_with(), that pick columns by name. First we will see a simple example that uses a single string to select all columns whose names contain it. Then we will learn how to pass multiple strings to contains() and select the columns whose names contain any of them.

To get started with some examples, let us load the tidyverse and palmerpenguins packages and check the dplyr version.

library(tidyverse)
library(palmerpenguins)
packageVersion("dplyr")

## [1] '1.0.9'

Our penguin data has 8 columns.

penguins %>% head(5)

## # A tibble: 5 × 8
##   species island bill_length_mm bill_depth_mm flipper_length_… body_mass_g sex  
##   <fct>   <fct>           <dbl>         <dbl>            <int>       <int> <fct>
## 1 Adelie  Torge…           39.1          18.7              181        3750 male 
## 2 Adelie  Torge…           39.5          17.4              186        3800 fema…
## 3 Adelie  Torge…           40.3          18                195        3250 fema…
## 4 Adelie  Torge…           NA            NA                 NA          NA <NA> 
## 5 Adelie  Torge…           36.7          19.3              193        3450 fema…
## # … with 1 more variable: year <int>

dplyr’s contains() to select columns matching a string

To select columns whose names contain a string, we use contains() inside dplyr’s select(). In this example, we select columns whose names contain the string “gth”, which gives us two columns: bill_length_mm and flipper_length_mm.

penguins %>%
  select(contains("gth"))

## # A tibble: 344 × 2
##    bill_length_mm flipper_length_mm
##             <dbl>             <int>
##  1           39.1               181
##  2           39.5               186
##  3           40.3               195
##  4           NA                  NA
##  5           36.7               193
##  6           39.3               190
##  7           38.9               181
##  8           39.2               195
##  9           34.1               193
## 10           42                 190
## # … with 334 more rows
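
One detail worth knowing is that contains() ignores case by default. The sketch below illustrates this with the ignore.case argument of contains(); the object names upper_match and strict_match are made up for this illustration and are not part of the original example.

```r
library(dplyr)
library(palmerpenguins)

# Default behaviour (ignore.case = TRUE): an upper-case pattern
# still matches the lower-case column names
upper_match <- penguins %>%
  select(contains("GTH"))
names(upper_match)

# Case-sensitive matching: "GTH" no longer matches any column name,
# so the result keeps all rows but has zero columns
strict_match <- penguins %>%
  select(contains("GTH", ignore.case = FALSE))
ncol(strict_match)
```

Setting ignore.case = FALSE is useful when column names differ only by case and you want to pick one variant.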

dplyr’s contains() to select columns with multiple matching strings

We can also select columns whose names contain any of several strings. To do that, we pass the strings as a character vector to contains(), as shown below. In this example, we select columns whose names contain either “gth” or “pth”.

penguins %>%
  select(contains(c("gth", "pth")))

## # A tibble: 344 × 3
##    bill_length_mm flipper_length_mm bill_depth_mm
##             <dbl>             <int>         <dbl>
##  1           39.1               181          18.7
##  2           39.5               186          17.4
##  3           40.3               195          18  
##  4           NA                  NA          NA  
##  5           36.7               193          19.3
##  6           39.3               190          20.6
##  7           38.9               181          17.8
##  8           39.2               195          19.6
##  9           34.1               193          18.1
## 10           42                 190          20.2
## # … with 334 more rows
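
Notice that bill_depth_mm comes last in the output above, even though it sits between the two length columns in the original data. The columns appear to be returned in the order of the strings we supply rather than in data order. A minimal sketch to check this, with the strings reversed (the object name reordered is made up for this illustration):

```r
library(dplyr)
library(palmerpenguins)

# Put "pth" first: the column matching "pth" should now come
# before the columns matching "gth"
reordered <- penguins %>%
  select(contains(c("pth", "gth")))
names(reordered)
```

This can be handy when you want to control column order while selecting, without a separate relocate() step.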

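dplyr’s contains() does not support regular expressions

Note that dplyr’s contains() only does literal string matching; it does not interpret regular expressions. For example, if we try to select all columns containing “pth” or “gth” with the pattern “[pg]th”, the pattern is treated as literal text. No column name contains those exact characters, so we get a tibble with no columns.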

penguins %>% 
  select(contains("[pg]th"))

## # A tibble: 344 × 0
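
If we do want regular expression matching, one option is the matches() selection helper, which treats its argument as a regular expression. This goes beyond the contains() examples above, so take it as a sketch of an alternative; the object name regex_match is made up for this illustration.

```r
library(dplyr)
library(palmerpenguins)

# matches() interprets the pattern as a regular expression,
# so "[pg]th" matches both "pth" and "gth"
regex_match <- penguins %>%
  select(matches("[pg]th"))
names(regex_match)
```

Unlike the multi-string contains() call, a single regular expression should return the matching columns in the order they appear in the data.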
