In this tutorial, we will learn how to select columns whose names start with a prefix or string, using dplyr’s starts_with() function and base R’s startsWith() function. The tidyverse R package dplyr has a number of helper functions for selecting columns of interest under different conditions. dplyr’s starts_with() function is one of these select helpers; it selects columns whose names start with a given string. Similarly, we will show how to use base R’s startsWith() function to select columns with a prefix.
To get started with some examples, let us load the tidyverse and palmerpenguins packages.
library(tidyverse)
library(palmerpenguins)
packageVersion("dplyr")
## [1] '1.0.9'
Taking a quick look at the data and the column names of the dataframe, we can see that a few columns share a common prefix.
penguins %>% head(5)
## # A tibble: 5 × 8
##   species island bill_length_mm bill_depth_mm flipper_length_… body_mass_g sex
##   <fct>   <fct>           <dbl>         <dbl>            <int>       <int> <fct>
## 1 Adelie  Torge…           39.1          18.7              181        3750 male
## 2 Adelie  Torge…           39.5          17.4              186        3800 fema…
## 3 Adelie  Torge…           40.3          18                195        3250 fema…
## 4 Adelie  Torge…           NA            NA                 NA          NA <NA>
## 5 Adelie  Torge…           36.7          19.3              193        3450 fema…
## # … with 1 more variable: year <int>
dplyr starts_with(): select columns starting with a character: Example 1
dplyr’s starts_with() function is one of the select helper functions that can select columns using a prefix. First, let us select columns whose names start with a single character using starts_with(). We get three columns that start with the character “b”.
penguins %>% select(starts_with("b"))
## # A tibble: 344 × 3
##    bill_length_mm bill_depth_mm body_mass_g
##             <dbl>         <dbl>       <int>
##  1           39.1          18.7        3750
##  2           39.5          17.4        3800
##  3           40.3          18          3250
##  4           NA            NA            NA
##  5           36.7          19.3        3450
##  6           39.3          20.6        3650
##  7           38.9          17.8        3625
##  8           39.2          19.6        4675
##  9           34.1          18.1        3475
## 10           42            20.2        4250
## # … with 334 more rows
dplyr starts_with(): select columns starting with a string: Example 2
In this example, we select columns whose names start with a longer string using starts_with(). We get two columns that start with the string “bill”.
penguins %>% select(starts_with("bill"))
## # A tibble: 344 × 2
##    bill_length_mm bill_depth_mm
##             <dbl>         <dbl>
##  1           39.1          18.7
##  2           39.5          17.4
##  3           40.3          18
##  4           NA            NA
##  5           36.7          19.3
##  6           39.3          20.6
##  7           38.9          17.8
##  8           39.2          19.6
##  9           34.1          18.1
## 10           42            20.2
## # … with 334 more rows
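As a small extension of the two examples above, here is a sketch of two other ways starts_with() can be called: with a character vector of prefixes, and with its ignore.case argument. To keep the snippet self-contained, it uses a tiny made-up data frame named toy (an assumption for illustration, not part of palmerpenguins) that reuses some of the penguins column names.

```r
library(dplyr)

# Toy data frame (hypothetical) reusing some penguins column names
toy <- data.frame(
  species = c("Adelie", "Gentoo"),
  bill_length_mm = c(39.1, 46.1),
  bill_depth_mm = c(18.7, 13.2),
  flipper_length_mm = c(181L, 211L),
  body_mass_g = c(3750L, 4500L)
)

# A character vector of prefixes selects the union of the matches
bill_or_flipper <- toy %>% select(starts_with(c("bill", "flipper")))
colnames(bill_or_flipper)

# Matching ignores case by default, so "BILL" still finds the bill columns
upper_default <- toy %>% select(starts_with("BILL"))
colnames(upper_default)

# ignore.case = FALSE makes the match case sensitive; nothing matches here
upper_strict <- toy %>% select(starts_with("BILL", ignore.case = FALSE))
ncol(upper_strict)
```

The same calls should work on penguins directly; the toy data frame is only there so the snippet runs without the palmerpenguins package.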
base R startsWith(): select columns starting with a string
Similarly, we can use base R’s startsWith() function to determine whether each element of the input starts with a prefix string. It returns a logical vector (TRUE/FALSE).
For example, we can determine which column names of a dataframe start with a character using startsWith() as shown below.
startsWith(colnames(penguins), "b")
## [1] FALSE FALSE  TRUE  TRUE FALSE  TRUE FALSE FALSE
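The logical vector can also be turned into the matching names or their positions. The sketch below uses a plain character vector, cols, holding the penguins column names copied from the output above, so it runs without loading any package; the variable names are chosen here for illustration.

```r
# Character vector standing in for colnames(penguins)
cols <- c("species", "island", "bill_length_mm", "bill_depth_mm",
          "flipper_length_mm", "body_mass_g", "sex", "year")

# TRUE for every name that begins with "b"
is_b <- startsWith(cols, "b")

# Names of the matching columns
b_names <- cols[is_b]

# Integer positions of the matching columns
b_positions <- which(is_b)
```

Note that startsWith() is case sensitive, so the prefix "B" would match none of these names.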
To select the columns that start with a character, we use the logical vector to subset the columns of the dataframe. Here, we select the columns whose names start with “b”:
penguins[, startsWith(colnames(penguins), "b")]
## # A tibble: 344 × 3
##    bill_length_mm bill_depth_mm body_mass_g
##             <dbl>         <dbl>       <int>
##  1           39.1          18.7        3750
##  2           39.5          17.4        3800
##  3           40.3          18          3250
##  4           NA            NA            NA
##  5           36.7          19.3        3450
##  6           39.3          20.6        3650
##  7           38.9          17.8        3625
##  8           39.2          19.6        4675
##  9           34.1          18.1        3475
## 10           42            20.2        4250
## # … with 334 more rows
If we are interested in selecting columns that start with a longer string, not just a single character, the approach is the same. We use startsWith() to select columns whose names begin with the string “bill”.
penguins[, startsWith(colnames(penguins), "bill")]
## # A tibble: 344 × 2
##    bill_length_mm bill_depth_mm
##             <dbl>         <dbl>
##  1           39.1          18.7
##  2           39.5          17.4
##  3           40.3          18
##  4           NA            NA
##  5           36.7          19.3
##  6           39.3          20.6
##  7           38.9          17.8
##  8           39.2          19.6
##  9           34.1          18.1
## 10           42            20.2
## # … with 334 more rows
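One caveat with the base R approach: penguins is a tibble, so the subsetting above always returns a tibble. With a plain data.frame, the same code returns a bare vector when only one column matches. The sketch below illustrates this with a small made-up data frame, df (hypothetical, used here only for illustration), and shows that drop = FALSE keeps the result a data frame.

```r
# Hypothetical plain data.frame (not a tibble) with a single "body" column
df <- data.frame(
  bill_length_mm = c(39.1, 39.5),
  bill_depth_mm = c(18.7, 17.4),
  body_mass_g = c(3750L, 3800L)
)

# Only one column matches, so the default drop = TRUE returns a bare vector
as_vector <- df[, startsWith(colnames(df), "body")]
class(as_vector)

# drop = FALSE keeps the result as a one-column data.frame
as_df <- df[, startsWith(colnames(df), "body"), drop = FALSE]
class(as_df)
```

Adding drop = FALSE is a reasonable default when the number of matching columns is not known in advance.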