
Rstats 101

Learn R Programming Tips & Tricks for Statistics and Data Science

duplicated() function in R: Find duplicated elements in a vector or dataframe

rstats101 · June 30, 2021 ·

In this tutorial, we will learn about the base R function duplicated() and how to use it to find out whether an element in a vector, or a row in a dataframe, is duplicated. duplicated() can take a vector, matrix or dataframe as input and returns a logical (boolean) vector telling us, for each element or row, whether it is a duplicate of an earlier one.

Find Duplicate elements in a vector with duplicated()

Let us create a data vector with duplicates. Here we use the sample() function to draw 10 values from 1 to 10 with replacement, so some values are likely to repeat.

set.seed(123)
x <- sample(10,10, replace=TRUE)

We can see that our data vector contains multiple duplicates. The first and second elements are the same, the fifth and eighth elements are the same, and the third and tenth elements are the same.

x
##  [1]  3  3 10  2  6  5  4  6  9 10

We can use the duplicated() function to identify duplicated elements. Applied to a vector, duplicated() returns a logical vector of the same length, with TRUE wherever an element repeats a value that has already appeared earlier. The first occurrence of each value is FALSE.

duplicated(x)

##  [1] FALSE  TRUE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE  TRUE

Here we use the which() function to find the positions where the logical vector is TRUE. This gives us the indices of the duplicated elements.

which(duplicated(x))
## [1]  2  8 10
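The same logical vector can also be used directly for subsetting. Here is a minimal sketch, assuming the same values as the vector x printed above (typed in by hand so the example does not depend on the random seed); the names dup_values and x_unique are ours, for illustration only.

```r
# Same values as the vector x shown above, entered directly
x <- c(3, 3, 10, 2, 6, 5, 4, 6, 9, 10)

# Values that repeat an earlier element
dup_values <- x[duplicated(x)]
dup_values
## [1]  3  6 10

# Keep only the first occurrence of each value
x_unique <- x[!duplicated(x)]
x_unique
## [1]  3 10  2  6  5  4  9
```

Negating duplicated() keeps the first copy of every value, which is the same result that unique(x) gives for a vector.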

By default, duplicated() scans from the first element, so the first occurrence of a value is FALSE and every later repeat is TRUE. With the argument fromLast = TRUE, it scans from the last element instead, so the last occurrence is treated as the original and the earlier copies are flagged. Here is an example with fromLast = TRUE on the same data.

We can see that different elements are now identified as duplicates.

duplicated(x, fromLast = TRUE)
##  [1]  TRUE FALSE  TRUE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE
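If we want to flag every copy of a repeated value, including the first one, one common approach is to combine the forward and backward scans with a logical OR. This is a small sketch on the same values as x, entered by hand; all_copies is just an illustrative name.

```r
# Same values as the vector x shown above, entered directly
x <- c(3, 3, 10, 2, 6, 5, 4, 6, 9, 10)

# TRUE for every element whose value occurs more than once
all_copies <- duplicated(x) | duplicated(x, fromLast = TRUE)
which(all_copies)
## [1]  1  2  3  5  8 10
```

The forward scan contributes positions 2, 8 and 10, and the backward scan contributes positions 1, 3 and 5.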

Find the index of first duplicate elements in a vector

anyDuplicated() is a related base R function that is useful for finding the index of the first duplicated element. It returns the index i of the first duplicated entry x[i] if there is one, and 0 otherwise.

anyDuplicated(x)
## [1] 2

Note that it stops after finding the first duplicate. In our example, more elements are duplicated after that first one, but anyDuplicated() does not report them.
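Because anyDuplicated() returns 0 when nothing repeats, it also works as a quick yes/no check. The sketch below assumes a small helper, has_dups(), and a vector y with no repeats; both are hypothetical names made up for this example.

```r
# Same values as the vector x shown above, entered directly
x <- c(3, 3, 10, 2, 6, 5, 4, 6, 9, 10)
# A hypothetical vector with no repeated values
y <- c(1, 2, 3)

anyDuplicated(y)
## [1] 0

# Illustrative helper: TRUE if the vector has at least one duplicate
has_dups <- function(v) anyDuplicated(v) > 0

has_dups(x)
## [1] TRUE
has_dups(y)
## [1] FALSE
```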

Find Duplicated rows in a dataframe with duplicated()

The duplicated() function is also useful for identifying duplicated rows in a dataframe. Let us create a dataframe with duplicated rows using sample() and the tibble() function from the tibble package, which is part of the tidyverse.

# tibble() comes from the tibble package (also loaded by library(tidyverse))
library(tibble)

set.seed(123)
df <- tibble(
  a = sample(3, 10, replace = TRUE),
  b = sample(3, 10, replace = TRUE)
)

Our dataframe looks like this with two columns and a few duplicated rows.

df

## # A tibble: 10 x 2
##        a     b
##    <int> <int>
##  1     3     2
##  2     3     2
##  3     3     1
##  4     2     2
##  5     3     3
##  6     2     1
##  7     2     3
##  8     2     3
##  9     3     1
## 10     1     1

By using the duplicated() function on the dataframe, we get a logical vector with one value per row, telling us whether that row repeats an earlier row.

duplicated(df)
##  [1] FALSE  TRUE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE FALSE

In this example, we can see that the second, eighth and ninth rows are duplicated, as they are TRUE in the logical vector. The second row repeats the first, the eighth repeats the seventh, and the ninth repeats the third.
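The logical vector can be used to pull out the duplicated rows or to drop them. This is a minimal sketch that uses a base R data.frame with the same values as the tibble printed above, typed in by hand so it runs without extra packages; dup_rows and df_unique are illustrative names.

```r
# Same values as the tibble shown above, as a base R data.frame
df <- data.frame(
  a = c(3, 3, 3, 2, 3, 2, 2, 2, 3, 1),
  b = c(2, 2, 1, 2, 3, 1, 3, 3, 1, 1)
)

# Rows that repeat an earlier row
dup_rows <- df[duplicated(df), ]
dup_rows
##   a b
## 2 3 2
## 8 2 3
## 9 3 1

# Keep only the first occurrence of each row
df_unique <- df[!duplicated(df), ]
nrow(df_unique)
## [1] 7
```

The row names 2, 8 and 9 in dup_rows are the original row positions, which match the TRUE values returned by duplicated(df).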

Find Duplicated rows in a matrix with duplicated()

We can also use the duplicated() function on a matrix to find the rows that are duplicated. Let us convert the dataframe we created above into a matrix using the as.matrix() function.

mat <- as.matrix(df)

Now that we have our data as a matrix, we can call duplicated() on it to identify the rows that are duplicated. The result matches what we got for the dataframe.

duplicated(mat)
##  [1] FALSE  TRUE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE FALSE
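For matrices, duplicated() compares rows by default, but its matrix method also accepts a MARGIN argument, and MARGIN = 2 compares columns instead. Here is a small sketch on a made-up matrix (the matrix m and its column names are hypothetical and not part of the example above).

```r
# A small made-up matrix whose third column repeats the first
m <- cbind(a = c(1, 2, 3), b = c(4, 5, 6), c = c(1, 2, 3))

# MARGIN = 2 checks columns rather than rows
dup_cols <- duplicated(m, MARGIN = 2)
which(dup_cols)
## [1] 3

# Drop the duplicated columns
m_unique <- m[, !dup_cols, drop = FALSE]
colnames(m_unique)
## [1] "a" "b"
```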

