3 ways to rank numbers with tidyverse

In this tutorial, we will learn 3 ways to assign integer ranks in tidyverse. Tidyverse’s dplyr has three integer ranking functions inspired by SQL: row_number(), min_rank(), and dense_rank(). These functions differ only in how they handle ties.

library(tidyverse)
packageVersion("dplyr")
[1] '1.1.2'

Let us jump into simple examples, following the ones in the dplyr documentation, and create a tibble with a sorted column that contains ties.

df <- tibble(x = c(10,20,20,60))
print(df)

# A tibble: 4 × 1
      x
  <dbl>
1    10
2    20
3    20
4    60

unique rank with row_number()

row_number() gives every input a unique rank, so that c(10, 20, 20, 30) would get ranks c(1, 2, 3, 4). Tied values are ranked in order of appearance. It’s equivalent to rank(ties.method = "first").

df %>%
  mutate(row_no =  row_number(x))

# A tibble: 4 × 2
      x row_no
  <dbl>  <int>
1    10      1
2    20      2
3    20      3
4    60      4
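
The equivalence with base R's rank() can be checked directly. The following is a minimal sketch, assuming only that dplyr is installed; the vector x_shuffled is a hypothetical reordering of the values in df, chosen so that the tie-breaking order is visible.

```r
library(dplyr)

# Same values as df$x, reordered for illustration (hypothetical input)
x_shuffled <- c(20, 10, 20, 60)

# row_number() breaks ties by order of appearance
rn <- row_number(x_shuffled)

# Base R's rank() with ties.method = "first" does the same
base_first <- rank(x_shuffled, ties.method = "first")

print(rn)
print(base_first)
print(all(rn == base_first))
```

Both calls should rank the first 20 ahead of the second 20, so no two elements share a rank.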

min_rank(): lowest rank for all tied elements

min_rank() handles ties by assigning the lowest rank to all tied elements, which leaves a gap in the ranks after them. In the example below, both 20s get rank 2, and 60 gets rank 4.

df %>%
  mutate(min_rank =  min_rank(x))

# A tibble: 4 × 2
      x min_rank
  <dbl>    <int>
1    10        1
2    20        2
3    20        2
4    60        4
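
According to the dplyr documentation, min_rank() corresponds to base R's rank() with ties.method = "min". A minimal sketch of that comparison, assuming dplyr is installed and reusing the values from df:

```r
library(dplyr)

# Same values as the x column of df
x <- c(10, 20, 20, 60)

# min_rank(): tied values share the lowest rank, leaving a gap afterwards
mr <- min_rank(x)

# Base R counterpart
base_min <- rank(x, ties.method = "min")

print(mr)
print(base_min)
print(all(mr == base_min))
```

Rank 3 is skipped in both results because two elements are tied at rank 2.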

dense_rank(): ranking with no gaps

dense_rank() is similar to min_rank() in that it gives all tied elements the same smallest rank, but unlike min_rank() it does not leave any gaps in the ranks. In the example below, 60 gets rank 3 rather than 4.

df %>%
  mutate(dense_rank =  dense_rank(x))

# A tibble: 4 × 2
      x dense_rank
  <dbl>      <int>
1    10          1
2    20          2
3    20          2
4    60          3
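
One way to think about dense ranks is as positions in the sorted list of distinct values. The sketch below illustrates that idea with base R's match(), sort() and unique(); it is an illustration for vectors without missing values, not a claim about how dplyr computes the result internally.

```r
library(dplyr)

# Same values as the x column of df
x <- c(10, 20, 20, 60)

# dense_rank(): tied values share a rank and no rank is skipped
dr <- dense_rank(x)

# Position of each value among the sorted distinct values
by_position <- match(x, sort(unique(x)))

print(dr)
print(by_position)
print(all(dr == by_position))
```

Because there are three distinct values, the largest dense rank is 3.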

3 ranking functions in action

The previous examples showed how the three ranking functions work and how they differ. Now let us see another example where the original column is not sorted.

Our data looks like this.

df2 <- tibble( y = c(8,5,4,4,6))
print(df2)

# A tibble: 5 × 1
      y
  <dbl>
1     8
2     5
3     4
4     4
5     6

row_number() gives the two 4s distinct ranks, 1 and 2, in order of appearance.

df2 %>%
  mutate(row_no =  row_number(y))

# A tibble: 5 × 2
      y row_no
  <dbl>  <int>
1     8      5
2     5      3
3     4      1
4     4      2
5     6      4

min_rank() gives both 4s rank 1 and skips rank 2, so 5 gets rank 3.

df2 %>%
  mutate(min_rank =  min_rank(y))

# A tibble: 5 × 2
      y min_rank
  <dbl>    <int>
1     8        5
2     5        3
3     4        1
4     4        1
5     6        4

dense_rank() also gives both 4s rank 1, but it leaves no gap, so 5 gets rank 2.

df2 %>%
  mutate(dense_rank =  dense_rank(y))

# A tibble: 5 × 2
      y dense_rank
  <dbl>      <int>
1     8          4
2     5          2
3     4          1
4     4          1
5     6          3
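
To compare the three functions side by side, they can be computed in a single mutate() call. This is a minimal sketch assuming dplyr is installed; the column names rank_first, rank_min and rank_dense are chosen here for illustration.

```r
library(dplyr)

# Unsorted column with a tie (the two 4s), as in df2
df2 <- tibble(y = c(8, 5, 4, 4, 6))

# One column per ranking function
ranked <- df2 %>%
  mutate(
    rank_first = row_number(y),  # unique ranks, ties broken by position
    rank_min   = min_rank(y),    # ties share the lowest rank, gaps remain
    rank_dense = dense_rank(y)   # ties share a rank, no gaps
  )

print(ranked)
```

The three columns only disagree from the tied rows onward, which makes the tie-handling rules easy to see in one table.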
