How to add row number within each group in dplyr

In this tutorial, we will learn how to add row numbers within each group of a dataframe in R. We will use a combination of the group_by() and row_number() functions from dplyr to add row numbers by group.

To get started, let us load the tidyverse and the palmerpenguins package, which provides the penguins dataset.

library(tidyverse)
library(palmerpenguins)

Here we use dplyr version 1.0.8.

packageVersion("dplyr")
## [1] '1.0.8'

For this example, we will sample 10 rows and select the first two columns of the penguins dataset.

set.seed(42)
df <- penguins %>% 
  sample_n(10) %>%
  select(1:2)

# Print the sampled data
df

## # A tibble: 10 × 2
##    species   island   
##    <fct>     <fct>    
##  1 Adelie    Dream    
##  2 Chinstrap Dream    
##  3 Gentoo    Biscoe   
##  4 Adelie    Torgersen
##  5 Gentoo    Biscoe   
##  6 Adelie    Dream    
##  7 Adelie    Torgersen
##  8 Chinstrap Dream    
##  9 Adelie    Torgersen
## 10 Chinstrap Dream

The key idea for adding row numbers within each group is to first group the data by the variable of interest. For example, if we want to add row numbers within the groups defined by one variable, we group by that variable and then use the row_number() function inside mutate(), as shown below. Here, we add row numbers for each “species”.

df %>% 
  group_by(species) %>%
  mutate(num = row_number()) 

## # A tibble: 10 × 3
## # Groups:   species [3]
##    species   island      num
##    <fct>     <fct>     <int>
##  1 Adelie    Dream         1
##  2 Chinstrap Dream         1
##  3 Gentoo    Biscoe        1
##  4 Adelie    Torgersen     2
##  5 Gentoo    Biscoe        2
##  6 Adelie    Dream         3
##  7 Adelie    Torgersen     4
##  8 Chinstrap Dream         2
##  9 Adelie    Torgersen     5
## 10 Chinstrap Dream         3
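To make the per-group counter easier to check by hand, here is a minimal sketch on a small made-up tibble. The names toy, grp, val and toy_numbered are hypothetical and are not part of the penguins example. The sketch also calls ungroup() at the end, which is optional but keeps later steps from silently operating group by group.

```r
library(dplyr)

# Hypothetical toy data, not taken from the penguins dataset
toy <- tibble(
  grp = c("a", "b", "a", "c", "b", "a"),
  val = c(10, 20, 30, 40, 50, 60)
)

toy_numbered <- toy %>%
  group_by(grp) %>%            # one counter per value of grp
  mutate(num = row_number()) %>%
  ungroup()                    # drop the grouping once the numbers are added

toy_numbered$num
## [1] 1 1 2 1 2 3
```

The counter restarts at 1 the first time each value of grp appears and increases in the order the rows occur in the data.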

Because the rows in our data are not sorted by the grouping variable, the row numbers of a group are scattered across the output. We can sort the data by the grouping variable to see the row numbers of each group together.

df %>% 
  group_by(species) %>%
  mutate(num = row_number()) %>%
  arrange(species)

## # A tibble: 10 × 3
## # Groups:   species [3]
##    species   island      num
##    <fct>     <fct>     <int>
##  1 Adelie    Dream         1
##  2 Adelie    Torgersen     2
##  3 Adelie    Dream         3
##  4 Adelie    Torgersen     4
##  5 Adelie    Torgersen     5
##  6 Chinstrap Dream         1
##  7 Chinstrap Dream         2
##  8 Chinstrap Dream         3
##  9 Gentoo    Biscoe        1
## 10 Gentoo    Biscoe        2
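Note that in the code above arrange() runs after mutate(), so the row numbers still reflect the order of the rows in the original data. If the numbering should instead follow some other order within each group, one option is to sort first and number afterwards. The following is a minimal sketch under that assumption, using a hypothetical toy tibble (toy, grp, val and toy_ranked are made-up names).

```r
library(dplyr)

# Hypothetical toy data, not taken from the penguins dataset
toy <- tibble(
  grp = c("a", "b", "a", "b", "a"),
  val = c(30, 20, 10, 5, 20)
)

toy_ranked <- toy %>%
  arrange(grp, val) %>%        # sort first, so numbering follows val within grp
  group_by(grp) %>%
  mutate(num = row_number()) %>%
  ungroup()

toy_ranked$val
## [1] 10 20 30  5 20
toy_ranked$num
## [1] 1 2 3 1 2
```

Here the row with the smallest val in each group gets number 1, because the sort happens before row_number() is evaluated.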

In the above example, we showed how to use the group_by() and row_number() functions to add row numbers by a single grouping variable. We can use the same approach if we want to group the dataframe by multiple variables and add row numbers for each unique combination of the grouping variables.

Here is an example where we group by two variables and add row numbers within each combination.

df %>% 
  group_by(species, island) %>%
  mutate(num = row_number()) 

## # A tibble: 10 × 3
## # Groups:   species, island [4]
##    species   island      num
##    <fct>     <fct>     <int>
##  1 Adelie    Dream         1
##  2 Chinstrap Dream         1
##  3 Gentoo    Biscoe        1
##  4 Adelie    Torgersen     1
##  5 Gentoo    Biscoe        2
##  6 Adelie    Dream         2
##  7 Adelie    Torgersen     2
##  8 Chinstrap Dream         2
##  9 Adelie    Torgersen     3
## 10 Chinstrap Dream         3

As before, sorting by the two grouping variables makes it easy to see the row numbers within each group in order.

df %>% 
  group_by(species, island) %>%
  mutate(num = row_number()) %>%
  arrange(species, island)

## # A tibble: 10 × 3
## # Groups:   species, island [4]
##    species   island      num
##    <fct>     <fct>     <int>
##  1 Adelie    Dream         1
##  2 Adelie    Dream         2
##  3 Adelie    Torgersen     1
##  4 Adelie    Torgersen     2
##  5 Adelie    Torgersen     3
##  6 Chinstrap Dream         1
##  7 Chinstrap Dream         2
##  8 Chinstrap Dream         3
##  9 Gentoo    Biscoe        1
## 10 Gentoo    Biscoe        2
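
As a quick sanity check on grouping by two variables, here is a minimal sketch on a hypothetical toy tibble (toy, g1, g2, group_size and toy_numbered are made-up names). It adds the group size with n() next to the row number, so the largest row number in each combination can be compared with the number of rows in that combination.

```r
library(dplyr)

# Hypothetical toy data, not taken from the penguins dataset
toy <- tibble(
  g1 = c("x", "x", "y", "x", "y", "x", "x"),
  g2 = c("p", "q", "p", "p", "p", "q", "p")
)

toy_numbered <- toy %>%
  group_by(g1, g2) %>%                 # one counter per (g1, g2) combination
  mutate(
    num = row_number(),                # position of the row within its combination
    group_size = n()                   # total rows in that combination
  ) %>%
  ungroup()

toy_numbered$num
## [1] 1 1 1 2 2 2 3
toy_numbered$group_size
## [1] 3 2 2 3 2 2 3
```

In every combination the largest value of num equals group_size, which is what we expect from a counter that runs from 1 to the number of rows in the group.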