In this tutorial, we will learn how to add row numbers within each group of a dataframe in R. We will use a combination of the group_by() and row_number() functions from dplyr to add row numbers by group.
To get started, let us load the tidyverse and the palmerpenguins package, which provides the penguins dataset.
library(tidyverse)
library(palmerpenguins)
Here we use dplyr version 1.0.8.
packageVersion("dplyr")
## [1] '1.0.8'
For this example, we randomly sample 10 rows of the penguins dataset and keep its first two columns, species and island.
set.seed(42)
df <- penguins %>%
  sample_n(10) %>%
  select(1:2)
df
## # A tibble: 10 × 2
##    species   island
##    <fct>     <fct>
##  1 Adelie    Dream
##  2 Chinstrap Dream
##  3 Gentoo    Biscoe
##  4 Adelie    Torgersen
##  5 Gentoo    Biscoe
##  6 Adelie    Dream
##  7 Adelie    Torgersen
##  8 Chinstrap Dream
##  9 Adelie    Torgersen
## 10 Chinstrap Dream
The key idea for adding row numbers within each group is to first group the data by the variable of interest. For example, to number the rows within each level of a single variable, we group by that variable and then call row_number() inside mutate(), as shown below. Here, we add a row number within each “species”.
df %>%
  group_by(species) %>%
  mutate(num = row_number())
## # A tibble: 10 × 3
## # Groups:   species [3]
##    species   island      num
##    <fct>     <fct>     <int>
##  1 Adelie    Dream         1
##  2 Chinstrap Dream         1
##  3 Gentoo    Biscoe        1
##  4 Adelie    Torgersen     2
##  5 Gentoo    Biscoe        2
##  6 Adelie    Dream         3
##  7 Adelie    Torgersen     4
##  8 Chinstrap Dream         2
##  9 Adelie    Torgersen     5
## 10 Chinstrap Dream         3
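To see the mechanics without depending on the random penguins sample, here is a minimal sketch on a small hand-made data frame; the names `toy`, `grp`, `value` and `numbered` are hypothetical. It also ends with ungroup(), an optional step that is not part of the original example: the grouped result otherwise carries its grouping into any later verbs.

```r
library(dplyr)

# Hypothetical toy data: the group labels are interleaved, not sorted
toy <- data.frame(
  grp   = c("a", "b", "a", "c", "b", "a"),
  value = c(10, 20, 30, 40, 50, 60)
)

numbered <- toy %>%
  group_by(grp) %>%
  mutate(num = row_number()) %>%  # counter restarts at 1 in each group
  ungroup()                       # drop the grouping for later steps

numbered$num
# [1] 1 1 2 1 2 3
```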
Because our data are not sorted by the grouping variable, the per-group numbering is hard to follow. Sorting the result by species makes the row numbers within each group easy to see.
df %>%
  group_by(species) %>%
  mutate(num = row_number()) %>%
  arrange(species)
## # A tibble: 10 × 3
## # Groups:   species [3]
##    species   island      num
##    <fct>     <fct>     <int>
##  1 Adelie    Dream         1
##  2 Adelie    Torgersen     2
##  3 Adelie    Dream         3
##  4 Adelie    Torgersen     4
##  5 Adelie    Torgersen     5
##  6 Chinstrap Dream         1
##  7 Chinstrap Dream         2
##  8 Chinstrap Dream         3
##  9 Gentoo    Biscoe        1
## 10 Gentoo    Biscoe        2
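Note that row_number() numbers rows in the order they appear when mutate() runs; sorting afterwards, as above, only changes how the result is displayed. If the numbering should follow some column (for example, smallest value first), one option is to sort before numbering. A minimal sketch, assuming hypothetical toy data (`toy`, `grp`, `value` and `ranked` are illustrative names):

```r
library(dplyr)

# Hypothetical toy data
toy <- data.frame(
  grp   = c("a", "b", "a", "b", "a"),
  value = c(30, 5, 10, 15, 20)
)

ranked <- toy %>%
  arrange(grp, value) %>%         # sort first, so numbering follows `value`
  group_by(grp) %>%
  mutate(num = row_number()) %>%  # 1 = smallest value within each group
  ungroup()

ranked
```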
In the example above, we used group_by() and row_number() to add row numbers by a single grouping variable. The same approach works when we group the dataframe by multiple variables: the row numbers then restart for each unique combination of the grouping variables.
Here is an example where we group by two variables and add row numbers within each combination.
df %>%
  group_by(species, island) %>%
  mutate(num = row_number())
## # A tibble: 10 × 3
## # Groups:   species, island [4]
##    species   island      num
##    <fct>     <fct>     <int>
##  1 Adelie    Dream         1
##  2 Chinstrap Dream         1
##  3 Gentoo    Biscoe        1
##  4 Adelie    Torgersen     1
##  5 Gentoo    Biscoe        2
##  6 Adelie    Dream         2
##  7 Adelie    Torgersen     2
##  8 Chinstrap Dream         2
##  9 Adelie    Torgersen     3
## 10 Chinstrap Dream         3
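The same toy-data check works with two grouping columns; again, `toy`, `g1`, `g2` and `by_two` are hypothetical names. Each distinct (g1, g2) pair gets its own counter, so the number of groups equals the number of distinct pairs.

```r
library(dplyr)

# Hypothetical toy data with two grouping columns
toy <- data.frame(
  g1 = c("x", "x", "y", "x", "y", "x"),
  g2 = c("p", "q", "p", "p", "p", "q")
)

by_two <- toy %>%
  group_by(g1, g2) %>%            # one group per observed (g1, g2) pair
  mutate(num = row_number()) %>%
  ungroup()

by_two$num
# [1] 1 1 1 2 2 2

n_distinct(toy$g1, toy$g2)        # number of distinct pairs, i.e. groups
# [1] 3
```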
As before, sorting by the two grouping variables makes the row numbers within each group easy to read.
df %>%
  group_by(species, island) %>%
  mutate(num = row_number()) %>%
  arrange(species, island)
## # A tibble: 10 × 3
## # Groups:   species, island [4]
##    species   island      num
##    <fct>     <fct>     <int>
##  1 Adelie    Dream         1
##  2 Adelie    Dream         2
##  3 Adelie    Torgersen     1
##  4 Adelie    Torgersen     2
##  5 Adelie    Torgersen     3
##  6 Chinstrap Dream         1
##  7 Chinstrap Dream         2
##  8 Chinstrap Dream         3
##  9 Gentoo    Biscoe        1
## 10 Gentoo    Biscoe        2
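For comparison, base R can produce the same per-combination counters without dplyr. This is a swapped-in technique, not the one this tutorial uses: stats::ave() applies `FUN` within each combination of the grouping vectors passed to it, and using seq_along as `FUN` yields 1, 2, 3, ... inside every group. The toy data and column names are hypothetical.

```r
# Hypothetical toy data with two grouping columns
toy <- data.frame(
  g1 = c("x", "x", "y", "x", "y", "x"),
  g2 = c("p", "q", "p", "p", "p", "q")
)

# ave() splits the first argument by every (g1, g2) combination and
# replaces each piece with seq_along(piece), i.e. 1, 2, 3, ...
toy$num <- ave(seq_len(nrow(toy)), toy$g1, toy$g2, FUN = seq_along)

toy$num
# [1] 1 1 1 2 2 2

# Sort by both grouping columns to read the counters group by group
toy[order(toy$g1, toy$g2), ]
```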