How to select columns that start with a prefix/string in R

In this tutorial, we will learn how to select columns that start with a prefix or string using dplyr’s starts_with() function and base R’s startsWith() function. The tidyverse R package dplyr has a number of helper functions to select columns of interest under different conditions. dplyr’s starts_with() function is one of these select helper functions, and it selects columns whose names start with a string. Similarly, we will show how to use base R’s startsWith() function to select columns with a prefix.

To get started with some examples, let us load the tidyverse and palmerpenguins packages.

library(tidyverse)
library(palmerpenguins)
packageVersion("dplyr")

## [1] '1.0.9'

Taking a quick look at the data and the column names of the dataframe, we can see that a few columns share a common prefix.

penguins %>% head(5)

## # A tibble: 5 × 8
##   species island bill_length_mm bill_depth_mm flipper_length_… body_mass_g sex  
##   <fct>   <fct>           <dbl>         <dbl>            <int>       <int> <fct>
## 1 Adelie  Torge…           39.1          18.7              181        3750 male 
## 2 Adelie  Torge…           39.5          17.4              186        3800 fema…
## 3 Adelie  Torge…           40.3          18                195        3250 fema…
## 4 Adelie  Torge…           NA            NA                 NA          NA <NA> 
## 5 Adelie  Torge…           36.7          19.3              193        3450 fema…
## # … with 1 more variable: year <int>

dplyr starts_with(): select columns starting with a character: Example 1

dplyr’s starts_with() function is one of the select helper functions that can select columns using a prefix. First, let us select columns whose names start with a single character using starts_with(). We get three columns that start with the character “b”.

penguins %>%
  select(starts_with("b"))

## # A tibble: 344 × 3
##    bill_length_mm bill_depth_mm body_mass_g
##             <dbl>         <dbl>       <int>
##  1           39.1          18.7        3750
##  2           39.5          17.4        3800
##  3           40.3          18          3250
##  4           NA            NA            NA
##  5           36.7          19.3        3450
##  6           39.3          20.6        3650
##  7           38.9          17.8        3625
##  8           39.2          19.6        4675
##  9           34.1          18.1        3475
## 10           42            20.2        4250
## # … with 334 more rows

dplyr starts_with(): select columns starting with a string: Example 2

In this example, we select columns whose names start with a longer string using starts_with(). We get two matching columns that start with the string “bill”.

penguins %>%
  select(starts_with("bill"))

## # A tibble: 344 × 2
##    bill_length_mm bill_depth_mm
##             <dbl>         <dbl>
##  1           39.1          18.7
##  2           39.5          17.4
##  3           40.3          18  
##  4           NA            NA  
##  5           36.7          19.3
##  6           39.3          20.6
##  7           38.9          17.8
##  8           39.2          19.6
##  9           34.1          18.1
## 10           42            20.2
## # … with 334 more rows
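The two examples above pass a single prefix. As a minimal sketch of two further options, starts_with() also accepts a character vector of prefixes and has an ignore.case argument that defaults to TRUE. The toy tibble below and the capitalised column name Body_mass_g are made up for illustration and are not part of the penguins data.

```r
library(dplyr)

# Toy tibble (hypothetical) reusing penguins-style column names;
# "Body_mass_g" is deliberately capitalised to show case handling
toy <- tibble(
  bill_length_mm = c(39.1, 39.5),
  bill_depth_mm = c(18.7, 17.4),
  flipper_length_mm = c(181L, 186L),
  Body_mass_g = c(3750L, 3800L)
)

# A character vector of prefixes selects the union of the matches
bill_or_flipper <- toy %>%
  select(starts_with(c("bill", "flipper")))
names(bill_or_flipper)

# Matching ignores case by default, so "b" also picks up "Body_mass_g"
b_any_case <- toy %>%
  select(starts_with("b"))
names(b_any_case)

# ignore.case = FALSE makes the match case-sensitive
b_exact_case <- toy %>%
  select(starts_with("b", ignore.case = FALSE))
names(b_exact_case)
```

The default case-insensitive matching is the main behavioural difference to keep in mind when comparing starts_with() with base R’s startsWith().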

base R startsWith(): select columns starting with a string

Similarly, we can use base R’s startsWith() function to determine whether each element of a character vector starts with a prefix string. It returns a logical vector (TRUE/FALSE).

For example, we can determine whether the column names of a dataframe start with a character using startsWith() as shown below.

startsWith(colnames(penguins), "b")

## [1] FALSE FALSE  TRUE  TRUE FALSE  TRUE FALSE FALSE
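A logical vector on its own can be hard to read, so it may help to map it back to names or positions. The sketch below types out the eight penguins column names by hand (an assumption made so the snippet runs without loading the package); cols and is_b are illustrative variable names.

```r
# Column names of the penguins data, typed out so the snippet is self-contained
cols <- c("species", "island", "bill_length_mm", "bill_depth_mm",
          "flipper_length_mm", "body_mass_g", "sex", "year")

# One TRUE/FALSE per column name
is_b <- startsWith(cols, "b")

# Names of the columns that start with "b"
cols[is_b]

# Integer positions of those columns
which(is_b)
```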

In order to select the columns that start with a character, we use the logical vector to subset the columns of the dataframe. Here, we select the columns starting with the character “b”:

penguins[,startsWith(colnames(penguins), "b")]

## # A tibble: 344 × 3
##    bill_length_mm bill_depth_mm body_mass_g
##             <dbl>         <dbl>       <int>
##  1           39.1          18.7        3750
##  2           39.5          17.4        3800
##  3           40.3          18          3250
##  4           NA            NA            NA
##  5           36.7          19.3        3450
##  6           39.3          20.6        3650
##  7           38.9          17.8        3625
##  8           39.2          19.6        4675
##  9           34.1          18.1        3475
## 10           42            20.2        4250
## # … with 334 more rows

If we are interested in selecting columns that start with a string, not just a character, the use case is very similar. We use the startsWith() function to select columns whose names begin with the string “bill”.

penguins[, startsWith(colnames(penguins), "bill")]

## # A tibble: 344 × 2
##    bill_length_mm bill_depth_mm
##             <dbl>         <dbl>
##  1           39.1          18.7
##  2           39.5          17.4
##  3           40.3          18  
##  4           NA            NA  
##  5           36.7          19.3
##  6           39.3          20.6
##  7           38.9          17.8
##  8           39.2          19.6
##  9           34.1          18.1
## 10           42            20.2
## # … with 334 more rows
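Two details of the base R approach are worth a quick sketch. startsWith() is case-sensitive, and subsetting a plain data.frame (as opposed to a tibble such as penguins) with a single matching column returns a vector unless drop = FALSE is given. The data frame df and its column names below are made up for illustration.

```r
# Plain data.frame (hypothetical), not a tibble
df <- data.frame(
  Bill_length_mm = c(39.1, 39.5),
  bill_depth_mm = c(18.7, 17.4),
  year = c(2007L, 2007L)
)

# startsWith() is case-sensitive: "Bill_length_mm" is not matched by "bill"
startsWith(colnames(df), "bill")

# Lower-casing the names first gives a case-insensitive match
keep <- startsWith(tolower(colnames(df)), "bill")
bill_cols <- df[, keep, drop = FALSE]
colnames(bill_cols)

# With a single matching column, drop = FALSE keeps the result a data frame
year_col <- df[, startsWith(colnames(df), "y"), drop = FALSE]
class(year_col)
```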
