Rstats 101

Learn R Programming Tips & Tricks for Statistics and Data Science

dplyr filter(): How to select rows with partially matching string

rstats101 · July 11, 2022 ·

In this tutorial, we will learn how to select or filter rows of a dataframe based on a partially matching string. dplyr’s filter() function keeps the rows for which a logical condition is true, and a comparison such as country == "Germany" is true only when the column value matches the string exactly. To select rows where a column only partially matches a string, we can combine filter() with a helper function that returns TRUE or FALSE for each value. In this post, we will learn two approaches: one using the grepl() function from base R and the other using the str_detect() function from the stringr package.

To get started, let us load tidyverse, a suite of R packages that includes dplyr, tidyr, and stringr.

library(tidyverse)

We will use the world population data that comes built in with the tidyr package, which is part of tidyverse, to learn how to use the grepl() and str_detect() functions to select partially matching rows.

population %>% head()
## # A tibble: 6 × 3
##   country      year population
##   <chr>       <int>      <int>
## 1 Afghanistan  1995   17586073
## 2 Afghanistan  1996   18415307
## 3 Afghanistan  1997   19021226
## 4 Afghanistan  1998   19496836
## 5 Afghanistan  1999   19987071
## 6 Afghanistan  2000   20595360

Filtering rows with partial match using grepl()

The grepl() function from base R is a close relative of grep(). It takes a pattern and a character vector and returns a logical vector that is TRUE where the pattern matches and FALSE where it does not. By default, grepl() is case sensitive, but with ignore.case = TRUE we can make it ignore case while matching.


grepl(pattern, 
      x,
      ignore.case = FALSE)
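
To see what grepl() returns before using it inside filter(), here is a minimal sketch on a small made-up character vector. The vector countries is hypothetical and is not taken from the population data.

```r
# Hypothetical example vector; not part of the population dataset
countries <- c("Germany", "germany", "Algeria", "France")

# Case-sensitive by default: only the element containing "Germ" matches
case_sensitive <- grepl("Germ", countries)
print(case_sensitive)
## [1]  TRUE FALSE FALSE FALSE

# With ignore.case = TRUE, the lowercase "germany" matches as well
case_insensitive <- grepl("germ", countries, ignore.case = TRUE)
print(case_insensitive)
## [1]  TRUE  TRUE FALSE FALSE
```

The result has one element per input string, which is the form filter() expects as a row condition.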

To filter rows with a partial match, we use the filter() function as before, but this time with grepl() as its argument. In the example below, “Germ” is the pattern and the country column is the vector to search. Here, grepl() returns TRUE for every row whose country value contains “Germ”, so any country name containing that substring would be kept, not only Germany.

population %>% 
  filter(grepl("Germ", country)) %>%
  head()

## # A tibble: 6 × 3
##   country  year population
##   <chr>   <int>      <int>
## 1 Germany  1995   83147770
## 2 Germany  1996   83388930
## 3 Germany  1997   83490697
## 4 Germany  1998   83500716
## 5 Germany  1999   83490881
## 6 Germany  2000   83512459
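
Because head() shows only the first six rows, the output above does not tell us whether other countries matched the pattern too. One possible way to check, assuming dplyr and tidyr are installed, is to list the distinct country names that survive the filter. The name matched_countries is just an illustrative variable.

```r
library(dplyr)
library(tidyr)  # provides the built-in population dataset

# Keep rows whose country contains "Germ", then reduce to unique names
matched_countries <- population %>%
  filter(grepl("Germ", country)) %>%
  distinct(country)

matched_countries
```

If more than one name appears here, the pattern is broader than intended and may need an anchor or a longer substring.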

With grepl() we can also use a regular expression to describe the pattern. For example, to select countries whose names end with “any”, we anchor the pattern with $:

population %>% 
  filter(grepl("any$", country))

## # A tibble: 19 × 3
##    country  year population
##    <chr>   <int>      <int>
##  1 Germany  1995   83147770
##  2 Germany  1996   83388930
##  3 Germany  1997   83490697
##  4 Germany  1998   83500716
##  5 Germany  1999   83490881
##  6 Germany  2000   83512459
...
...

Here is another example using a simple regex, this time selecting countries whose names start with “Ger”:

population %>% 
  filter(grepl("^Ger", country))

## # A tibble: 19 × 3
##    country  year population
##    <chr>   <int>      <int>
##  1 Germany  1995   83147770
##  2 Germany  1996   83388930
##  3 Germany  1997   83490697
##  4 Germany  1998   83500716
##  5 Germany  1999   83490881
##  6 Germany  2000   83512459
...
...
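
The effect of the ^ and $ anchors is easier to see on a small vector where an unanchored pattern matches more than we want. The sketch below uses a hypothetical vector of country names and only base R.

```r
# Hypothetical example vector; not part of the population dataset
countries <- c("Germany", "Algeria", "Nigeria", "Romania")

# Unanchored and case-insensitive: "ger" may appear anywhere in the name
anywhere <- grepl("ger", countries, ignore.case = TRUE)

# "^" anchors the pattern to the start of the string
at_start <- grepl("^ger", countries, ignore.case = TRUE)

# "$" anchors the pattern to the end of the string
at_end <- grepl("ia$", countries)

data.frame(countries, anywhere, at_start, at_end)
```

Without the anchor, “Algeria” and “Nigeria” also match because they contain “ger” in the middle of the name. With ^ only “Germany” remains.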

Filtering rows with partial match using str_detect()

Another function for filtering rows with a partial match is str_detect() from the stringr package. As the name suggests, str_detect() “detects the presence or absence of a pattern in a string”, so it plays the same role as grepl(). Note that the argument order is reversed: with str_detect() the vector to search comes first and the pattern second, whereas grepl() takes the pattern first.

population %>% 
  filter(str_detect(country, "Ger"))

## # A tibble: 19 × 3
##    country  year population
##    <chr>   <int>      <int>
##  1 Germany  1995   83147770
##  2 Germany  1996   83388930
##  3 Germany  1997   83490697
##  4 Germany  1998   83500716
##  5 Germany  1999   83490881
##  6 Germany  2000   83512459
...
...
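
str_detect() has no ignore.case argument. A case-insensitive match is instead requested by wrapping the pattern in stringr’s regex() helper. The sketch below, on a hypothetical vector that includes a missing value, also shows one place where the two functions differ.

```r
library(stringr)

# Hypothetical example vector, including a missing value
countries <- c("Germany", "germany", "France", NA)

# regex(ignore_case = TRUE) is stringr's counterpart to ignore.case = TRUE
detected <- str_detect(countries, regex("ger", ignore_case = TRUE))
print(detected)
## [1]  TRUE  TRUE FALSE    NA

# grepl() reports FALSE for a missing value, while str_detect() keeps it as NA
grepl_result <- grepl("ger", countries, ignore.case = TRUE)
print(grepl_result)
## [1]  TRUE  TRUE FALSE FALSE
```

Inside filter() the difference rarely matters, because filter() drops rows where the condition is NA as well as rows where it is FALSE.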

Like grepl(), str_detect() accepts regular expressions. In the example below, we select rows whose country value starts with a given prefix.

population %>% 
  filter(str_detect(country, "^Ger"))

## # A tibble: 19 × 3
##    country  year population
##    <chr>   <int>      <int>
##  1 Germany  1995   83147770
##  2 Germany  1996   83388930
##  3 Germany  1997   83490697
##  4 Germany  1998   83500716
##  5 Germany  1999   83490881
##  6 Germany  2000   83512459
...
...
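
For the common prefix and suffix cases, stringr also offers str_starts() and str_ends(), which make the anchor implicit. This is an alternative sketch and not part of the approach shown above. It assumes stringr 1.4.0 or later, and the result names are illustrative.

```r
library(dplyr)
library(stringr)
library(tidyr)  # provides the built-in population dataset

# Rows whose country begins with "Ger" (same idea as "^Ger")
starts_with_ger <- population %>%
  filter(str_starts(country, "Ger"))

# Rows whose country ends with "any" (same idea as "any$")
ends_with_any <- population %>%
  filter(str_ends(country, "any"))

# Compare row counts against the anchored str_detect() version
nrow(starts_with_ger) == nrow(filter(population, str_detect(country, "^Ger")))
```

Both helpers still treat the pattern as a regular expression by default, so special characters in the pattern keep their regex meaning.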
