• Skip to primary navigation
  • Skip to main content
  • Skip to primary sidebar

Rstats 101

Learn R Programming Tips & Tricks for Statistics and Data Science

  • Home
  • About
    • Privacy Policy
  • Show Search
Hide Search

How to select rows with multiple partial matching strings

rstats101 · July 3, 2024 ·

In this tutorial, we will learn how to select or filter rows of a dataframe with multiple partially matching strings. dplyr’s filter() function selects/filters rows based on values of one or more columns when it completely matches. However, if you want to select rows with partially matching strings in a column, we use filter() function in combination with str_detect() in R.

In this post, we will learn how to use filter() and str_detect() functions to detect multiple partial matrching string.

library(tidyverse)

We will use world population data built-in with tidyr package in tidyverse to learn how can we use grepl() and str_detect() functions to select partially matching rows.

population %>% head()
## # A tibble: 6 × 3
##   country      year population
##   <chr>       <int>      <int>
## 1 Afghanistan  1995   17586073
## 2 Afghanistan  1996   18415307
## 3 Afghanistan  1997   19021226
## 4 Afghanistan  1998   19496836
## 5 Afghanistan  1999   19987071
## 6 Afghanistan  2000   20595360

Filtering rows with partial match using str_detect()

Heee is an example of using stringr’s str_detect() function to detect “the presence or absence of a pattern in a string”. Note that while using str_detect() the variable or column name is the first argument and then the pattern of interest.

population %>% 
  filter(str_detect(country,"Ger"))

## # A tibble: 19 × 3
##    country  year population
##    <chr>   <int>      <int>
##  1 Germany  1995   83147770
##  2 Germany  1996   83388930
##  3 Germany  1997   83490697
##  4 Germany  1998   83500716
##  5 Germany  1999   83490881
##  6 Germany  2000   83512459
....
....

If we want to filter using multiple patterns, we can use str_detect() function in the following way. In this example using str_detect() function below, we filter rows if either “Ger” or “Aus” matches in country column. And this has given us rows corresponding the countries Australia and Germany.

population %>% 
  filter(str_detect(country,"Ger|Aus"))

## # A tibble: 57 × 3
##    country    year population
##    <chr>     <int>      <int>
##  1 Australia  1995   18124234
##  2 Australia  1996   18339037
##  3 Australia  1997   18563442
##  4 Australia  1998   18794552
##  5 Australia  1999   19027438
##  6 Australia  2000   19259377
##  7 Australia  2001   19487257
##  8 Australia  2002   19714625
##  9 Australia  2003   19953121
## 10 Australia  2004   20218481
## # … with 47 more rows

Related

Filed Under: rstats101, str_detect(), stringr Tagged With: filter rows based on multiple partial matching strings

Primary Sidebar

Recent Posts

  • How to create a nested dataframe with lists
  • How to compute proportion with tidyverse
  • How to Compute Z-Score of Multiple Columns
  • How to drop unused level of factor variable in R
  • How to compute Z-score

Categories

%in% arrange() as.data.frame as_tibble built-in data R colSums() R cor() in R data.frame dplyr dplyr across() dplyr group_by() dplyr rename() dplyr rowwise() dplyr row_number() dplyr select() dplyr slice_max() dplyr slice_sample() drop_na R duplicated() gsub head() impute with mean values is.element() linear regression matrix() function na.omit R NAs in R near() R openxlsx pivot_longer() prod() R.version replace NA replace NAs tidyverse R Function rstats rstats101 R version scale() sessionInfo() t.test() tidyr tidyselect tidyverse write.xlsx

Copyright © 2025 · Daily Dish Pro on Genesis Framework · WordPress · Log in

Go to mobile version