How to filter rows in a dataframe: dplyr’s filter()

With dplyr’s filter() function, one can filter, or subset, the rows of a dataframe. In this post, we will learn how to select a subset of rows based on the values of one or more columns in a dataframe using dplyr’s filter() function.

First, we load the tidyverse suite of R packages.

library(tidyverse)

We will use the world population dataset that ships with the tidyr package, which is loaded as part of the tidyverse.

The dataset records each country’s population by year in three columns: country, year, and population.

population %>% head()

## # A tibble: 6 × 3
##   country      year population
##   <chr>       <int>      <int>
## 1 Afghanistan  1995   17586073
## 2 Afghanistan  1996   18415307
## 3 Afghanistan  1997   19021226
## 4 Afghanistan  1998   19496836
## 5 Afghanistan  1999   19987071
## 6 Afghanistan  2000   20595360

dplyr filter() Example 1: Select based on a value of a column

Let us filter, or subset, the dataframe based on the value of one of its columns. In this example, we select the rows whose value of country equals “Germany”. This filter() operation gives us a smaller dataframe containing only Germany’s population data.

population %>% 
  filter(country == "Germany")

## # A tibble: 19 × 3
##    country  year population
##    <chr>   <int>      <int>
##  1 Germany  1995   83147770
##  2 Germany  1996   83388930
##  3 Germany  1997   83490697
##  4 Germany  1998   83500716
##  5 Germany  1999   83490881
##  6 Germany  2000   83512459
##  7 Germany  2001   83583461
##  8 Germany  2002   83685160
##  9 Germany  2003   83788480
## 10 Germany  2004   83848844
## 11 Germany  2005   83835978
## 12 Germany  2006   83740302
## 13 Germany  2007   83578794
## 14 Germany  2008   83379538
## 15 Germany  2009   83182774
## 16 Germany  2010   83017404
## 17 Germany  2011   82892904
## 18 Germany  2012   82800121
## 19 Germany  2013   82726626

Although this example uses the equality operator, “==”, to make the comparison, the other common comparison operators work with dplyr’s filter() function in the same way:

  • != inequality sign
  • < less than sign
  • > greater than sign
  • <= less than or equal to sign
  • >= greater than or equal to sign
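As a minimal sketch of how these operators behave, the code below first keeps the Germany rows shown in Example 1 and then applies a few of the comparisons to them. The variable names (germany, germany_not_2013, and so on) and the thresholds are chosen purely for illustration.

```r
library(dplyr)   # filter() and the %>% pipe
library(tidyr)   # ships the population dataset

# Start from the 19 Germany rows shown in Example 1
germany <- tidyr::population %>%
  filter(country == "Germany")

# != keeps every row except those with the given value
germany_not_2013 <- germany %>%
  filter(year != 2013)

# >= keeps rows at or above the threshold
germany_recent <- germany %>%
  filter(year >= 2012)

# < keeps rows strictly below the threshold;
# inside filter(), population refers to the column of that name
germany_below_83m <- germany %>%
  filter(population < 83000000)
```

Each call returns a new dataframe and leaves germany itself unchanged.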

dplyr filter() Example 2: Select based on values of two columns

In the second example, we show how to select or filter rows based on the values of more than one column. We will also use two types of comparison: the equality operator and the greater-than operator.

Here we select rows where country is equal to Germany and the year is greater than 2010. We combine the two conditions with the & operator because we want both to be satisfied.

population %>% 
  filter(country == "Germany" & year > 2010)

## # A tibble: 3 × 3
##   country  year population
##   <chr>   <int>      <int>
## 1 Germany  2011   82892904
## 2 Germany  2012   82800121
## 3 Germany  2013   82726626
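filter() also accepts several conditions separated by commas, which it combines in the same way as &. The sketch below checks that the two spellings give the same result and shows the | operator for keeping rows that satisfy either condition. The variable names are illustrative only.

```r
library(dplyr)   # filter() and the %>% pipe
library(tidyr)   # ships the population dataset

# Both conditions joined explicitly with &
both_amp <- tidyr::population %>%
  filter(country == "Germany" & year > 2010)

# The same conditions passed as separate arguments;
# filter() keeps only rows where all of them are TRUE
both_comma <- tidyr::population %>%
  filter(country == "Germany", year > 2010)

# TRUE when the two results are exactly the same
same_result <- identical(both_amp, both_comma)

# | keeps rows where at least one condition is TRUE
either <- tidyr::population %>%
  filter(country == "Germany" | year == 2013)
```

Which spelling to use is mostly a matter of taste; the comma form can be easier to read when there are many conditions.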

dplyr filter() Example 3: Select based on multiple values of a single column

In the third example, we show how to select or filter rows of a dataframe for multiple values of a single column. Here we use the %in% operator instead of the equality operator to select two countries.

population %>% 
  filter(country %in% c("Germany", "Australia"))

## # A tibble: 38 × 3
##    country    year population
##    <chr>     <int>      <int>
##  1 Australia  1995   18124234
##  2 Australia  1996   18339037
##  3 Australia  1997   18563442
##  4 Australia  1998   18794552
##  5 Australia  1999   19027438
##  6 Australia  2000   19259377
##  7 Australia  2001   19487257
##  8 Australia  2002   19714625
##  9 Australia  2003   19953121
## 10 Australia  2004   20218481
## # … with 28 more rows
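The same %in% test can be negated with ! to drop the listed values instead of keeping them. This is a minimal sketch; the name others is chosen for illustration.

```r
library(dplyr)   # filter() and the %>% pipe
library(tidyr)   # ships the population dataset

# ! negates the whole %in% test, so every row whose country
# is NOT Germany or Australia is kept
others <- tidyr::population %>%
  filter(!country %in% c("Germany", "Australia"))
```

The result contains every row of the dataset except the 38 rows selected above.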


	