• Skip to primary navigation
  • Skip to main content
  • Skip to primary sidebar

Rstats 101

Learn R Programming Tips & Tricks for Statistics and Data Science

  • Home
  • About
    • Privacy Policy
  • Show Search
Hide Search

dplyr’s anti_join() to find rows based on presence or absence in a dataframe

rstats101 · July 5, 2024 ·

In this tutorial, we will learn how to use dplyr’s anti_join() function to filter rows from one dataframe based on the presence or absence of matches in another dataframe. dplyr’s anti_join() function is extremely useful for cases when we want to find what row is present or missing in a dataframe when compared to another dataframe.

library(tidyverse)

dplyr’s anti_join() function takes two dataframes df1 and df2 as arguments and return all rows from df1 without a match in df2. Let us create two toy dataframes to illustrate how to use dplyr’s anti_join().

Here is our first dataframe df1, with three rows.

set.seed(123)
id = sample(1:5,3, replace = FALSE)
df1 <- tibble(id ,
              packages = c("dplyr", "tidyr","tibble"))
df1

# A tibble: 3 × 2
     id packages
  <int> <chr>   
1     3 dplyr   
2     2 tidyr   
3     5 tibble  

The second dataframe also has three rows.

set.seed(123)
id = sample(1:6,3, replace = FALSE)
df2 <- tibble(id,
              packages= c("dplyr","ggplot2","tidyr"))
df2

# A tibble: 3 × 2
     id packages
  <int> <chr>   
1     3 dplyr   
2     6 ggplot2 
3     2 tidyr   

Note that not all rows are the same in the two dataframes. The third row in df1 is not present in df2. Similarly second row in df2 is not present in df1.

dplyr’s anti_join() to get rows present in one dataframe but not in another

By using anti_join() on df1 and df2, we will get a new dataframe with one row that is present in df1, but not in df2. Also note that in the simple use case of anti_join(), it performed join using both the columns.

anti_join(df1,df2)

Joining with `by = join_by(id, packages)`

# A tibble: 1 × 2
     id packages
  <int> <chr>   
1     5 tibble  

If we reverse the order of input to anti_join() df2 first and df1 next, we will get a dataframe with rows that is present in df2, but not in df1.

anti_join(df2, df1)

Joining with `by = join_by(id, packages)`
# A tibble: 1 × 2
     id packages
  <int> <chr>   
1     6 ggplot2 

Related

Filed Under: dplyr anti_join(), rstats101 Tagged With: dplyr anti_join() to find unmatched rows

Primary Sidebar

Recent Posts

  • How to create a nested dataframe with lists
  • How to compute proportion with tidyverse
  • How to Compute Z-Score of Multiple Columns
  • How to drop unused level of factor variable in R
  • How to compute Z-score

Categories

%in% arrange() as.data.frame as_tibble built-in data R colSums() R cor() in R data.frame dplyr dplyr across() dplyr group_by() dplyr rename() dplyr rowwise() dplyr row_number() dplyr select() dplyr slice_max() dplyr slice_sample() drop_na R duplicated() gsub head() impute with mean values is.element() linear regression matrix() function na.omit R NAs in R near() R openxlsx pivot_longer() prod() R.version replace NA replace NAs tidyverse R Function rstats rstats101 R version scale() sessionInfo() t.test() tidyr tidyselect tidyverse write.xlsx

Copyright © 2025 · Daily Dish Pro on Genesis Framework · WordPress · Log in

Go to mobile version