dplyr n_distinct(): count unique elements or rows

rstats101 · September 14, 2024

In this post, we will learn how to use dplyr’s n_distinct() function to count the number of unique or distinct values in one or more vectors or columns of a dataframe.

dplyr’s n_distinct() is very useful when you are working with a dataframe and need to know how many unique or distinct values or combinations there are. The dplyr documentation describes it as a faster and more concise equivalent to nrow(unique(data.frame(...))) in base R.
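
As a small sketch of the "combinations" case, n_distinct() can take more than one vector and then counts the distinct combinations across them. The vectors x and y below are made up for illustration and are not part of the data used in the rest of this post; the last line is the base R counterpart.

```r
library(dplyr)

# Hypothetical vectors, for illustration only
x <- c("a", "a", "b", "b", "b")
y <- c(1, 1, 1, 2, 2)

# Distinct values in a single vector: "a" and "b"
n_distinct(x)
# [1] 2

# Distinct (x, y) combinations: (a, 1), (b, 1), (b, 2)
n_distinct(x, y)
# [1] 3

# Base R counterpart
nrow(unique(data.frame(x, y)))
# [1] 3
```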

We will first apply n_distinct() to a column of a dataframe, then see how to handle missing values (NAs) with the na.rm argument, and finally count the number of unique rows in a dataframe.

library(tidyverse)
packageVersion("dplyr")

[1] '1.1.4'

Let us create a sample dataframe with a column whose unique values we want to count. The column also contains an NA, so that we can illustrate how n_distinct() handles missing values.

# Create a data frame
df <- tibble(
  id = c(2, 4, 1, 2, 3, 4, NA),
  amount = c(250, 200, 150, 250, 300, 120, 200)
)

df

# A tibble: 7 × 2
     id amount
  <dbl>  <dbl>
1     2    250
2     4    200
3     1    150
4     2    250
5     3    300
6     4    120
7    NA    200

To count the number of unique values in a column, we can use dplyr’s n_distinct() function as shown below. The column “id” has 5 unique values: 1, 2, 3, 4, and NA.

Note that by default n_distinct() does not remove NAs; it counts NA as one of the distinct values.

n_distinct(df$id)

[1] 5

We can also use a tidyverse-style pipeline to get the number of unique/distinct elements in a column, as shown below.

df |>
  pull(id) |> 
  n_distinct()

[1] 5
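
When the count should stay inside the data frame rather than come back as a bare number, n_distinct() can also be called inside summarise(). This is a minimal sketch that rebuilds only the id column from above; the names n_id and res are chosen here for illustration.

```r
library(dplyr)

# Same id values as in the example data frame
df <- tibble(id = c(2, 4, 1, 2, 3, 4, NA))

# One-row summary holding the number of distinct ids
res <- df |>
  summarise(n_id = n_distinct(id))

res$n_id
# [1] 5
```

The result is still 5 because NA is counted as a distinct value unless na.rm = TRUE is set.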

If we want to ignore missing values while counting the number of unique elements, we need to pass na.rm = TRUE as an argument to n_distinct(), as shown below.

df |>
  pull(id) |> 
  n_distinct(na.rm=TRUE)

[1] 4
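
As a cross-check, the same two counts can be reproduced in base R with unique(). This sketch uses a plain vector with the same values as the id column; base R's unique() also keeps NA as its own value, so NA has to be dropped explicitly to match na.rm = TRUE.

```r
library(dplyr)

# Same values as the id column
id <- c(2, 4, 1, 2, 3, 4, NA)

# NA is kept as a distinct value by default
length(unique(id))
# [1] 5
n_distinct(id)
# [1] 5

# Dropping NA first matches n_distinct(id, na.rm = TRUE)
length(unique(id[!is.na(id)]))
# [1] 4
n_distinct(id, na.rm = TRUE)
# [1] 4
```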

n_distinct() to count number of unique rows in a dataframe

Note that our data frame contains a duplicated row: rows 1 and 4 both have id 2 and amount 250. We can use n_distinct() to compute the number of unique rows.

df

# A tibble: 7 × 2
     id amount
  <dbl>  <dbl>
1     2    250
2     4    200
3     1    150
4     2    250
5     3    300
6     4    120
7    NA    200
 

Counting the row with the missing id, we have 6 distinct rows in the dataframe.

df |>
   n_distinct()

[1] 6
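
The row count can be cross-checked in two other ways, sketched here with the data frame rebuilt from the values printed in the table above: keeping the unique rows with distinct() and counting them, or passing the columns to n_distinct() as separate vectors.

```r
library(dplyr)

# Data frame rebuilt from the printed table above
df <- tibble(
  id     = c(2, 4, 1, 2, 3, 4, NA),
  amount = c(250, 200, 150, 250, 300, 120, 200)
)

# Keep only the unique rows, then count them
df |>
  distinct() |>
  nrow()
# [1] 6

# Passing the columns as separate vectors counts the same combinations
n_distinct(df$id, df$amount)
# [1] 6
```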
