In this tutorial, we will learn how to create a nested dataframe in the tidyverse, building the list column with group_by() and summarize(). A nested dataframe is a dataframe in which one or more columns are list columns. In a regular dataframe, each column is an atomic vector (character, numeric, logical, and so on). However, a column can also hold other data structures, such as lists or dataframes. Such columns are called list columns.
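To make the idea of a list column concrete, here is a minimal sketch that builds one by hand with tibble::tibble(). The names df, id and contents are hypothetical and used only for illustration. Each cell of contents holds a whole R object: a character vector in the first row and a small data frame in the second.

```r
library(tibble)

# A tibble whose second column is a list column:
# each cell stores an arbitrary R object instead of a single value.
df <- tibble(
  id = c(1, 2),
  contents = list(
    c("a", "b", "c"),             # row 1: a character vector
    data.frame(x = 1:2, y = 3:4)  # row 2: a whole data frame
  )
)

df
class(df$contents)  # the column itself is a plain list
df$contents[[2]]    # extract the data frame stored in row 2
```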
library(tidyverse)
packageVersion("dplyr")
[1] '1.1.4'
Let us create a dataframe with group id and group members as two columns.
data <- data.frame(
  group_id = c("A", "A", "A", "B", "B", "C", "C"),
  member = c("John", "Paul", "Stella", "Paul", "Jake", "John", "Mary")
)
data
  group_id member
1        A   John
2        A   Paul
3        A Stella
4        B   Paul
5        B   Jake
6        C   John
7        C   Mary
Next, we create a nested dataframe with one row per group: each row holds the group ID and a list column containing that group's members. We first group the rows by group ID with group_by() and then use summarize() with list() to collect each group's members into a single list element.
nested <- data |>
  group_by(group_id) |>
  summarize(members = list(member))
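Since the dplyr version shown above is 1.1.4, the same list column can also be built with the per-operation `.by` argument (available since dplyr 1.1.0), which groups for a single call without a separate group_by(). This is a sketch of an alternative, not the approach used in the rest of the tutorial; the name nested_by is hypothetical.

```r
library(dplyr)

data <- data.frame(
  group_id = c("A", "A", "A", "B", "B", "C", "C"),
  member = c("John", "Paul", "Stella", "Paul", "Jake", "John", "Mary")
)

# .by groups only for this summarize() call, so no group_by()/ungroup() pair is needed.
# With .by, groups are returned in order of first appearance rather than sorted.
nested_by <- data |>
  summarize(members = list(member), .by = group_id)

nested_by$members[[1]]
```

Here the groups already appear in alphabetical order in data, so the result matches the group_by() version row for row.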
Our nested dataframe looks like this.
nested
# A tibble: 3 × 2
  group_id members
  <chr>    <list>
1 A        <chr [3]>
2 B        <chr [2]>
3 C        <chr [2]>
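Here is how to access the values in the list column. Each element of nested$members is a character vector, so we extract it by position with double brackets: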
nested$members[[1]]
[1] "John" "Paul" "Stella"

nested$members[[2]]
[1] "Paul" "Jake"
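Indexing by position works, but it is often clearer to look a group up by its ID, and a list column can also feed ordinary columns. The sketch below does both: it selects group B's members by matching group_id, and uses base R's lengths() to count the members in each group. The names members_b and n_members are hypothetical.

```r
library(dplyr)

data <- data.frame(
  group_id = c("A", "A", "A", "B", "B", "C", "C"),
  member = c("John", "Paul", "Stella", "Paul", "Jake", "John", "Mary")
)

nested <- data |>
  group_by(group_id) |>
  summarize(members = list(member))

# Look up a group's members by its ID instead of by row position
members_b <- nested$members[[which(nested$group_id == "B")]]

# lengths() returns the length of each list element, i.e. the group size
nested <- nested |>
  mutate(n_members = lengths(members))

members_b
nested
```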
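We can reverse the operation with tidyr's unnest() function, which expands each list element back into its own row, giving one row per group member as in the original dataframe. If we call unnest() without saying which column to expand, it still works but emits a warning: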
nested |> unnest()
Warning: `cols` is now required when using `unnest()`.
ℹ Please use `cols = c(members)`.
# A tibble: 7 × 2
  group_id members
  <chr>    <chr>
1 A        John
2 A        Paul
3 A        Stella
4 B        Paul
5 B        Jake
6 C        John
7 C        Mary
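To avoid the warning, name the list column explicitly; it is passed to unnest()'s cols argument. The result has the same rows as the original dataframe, although it is a tibble and the column keeps the name members.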
nested |> unnest(members)
# A tibble: 7 × 2
  group_id members
  <chr>    <chr>
1 A        John
2 A        Paul
3 A        Stella
4 B        Paul
5 B        Jake
6 C        John
7 C        Mary
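If you need the exact shape of the starting dataframe back, one option is to rename the unnested column and convert the tibble to a plain data.frame. This is a minimal sketch under that assumption; the name restored is hypothetical.

```r
library(dplyr)
library(tidyr)

data <- data.frame(
  group_id = c("A", "A", "A", "B", "B", "C", "C"),
  member = c("John", "Paul", "Stella", "Paul", "Jake", "John", "Mary")
)

nested <- data |>
  group_by(group_id) |>
  summarize(members = list(member))

# Expand the list column, restore the original column name,
# and convert the tibble back to a plain data.frame
restored <- nested |>
  unnest(cols = members) |>
  rename(member = members) |>
  as.data.frame()

restored
```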
Note that we did not use the nest() function to create the nested dataframe. tidyr's nest() function offers another way to create list columns, in which each element is a small dataframe (tibble) rather than a vector.
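For comparison, here is a sketch of the same grouping done with tidyr's nest(). The call nest(members = member) packs the member column into one dataframe per group; the columns not named (here group_id) become the grouping columns. The names nested_df and back are hypothetical.

```r
library(tidyr)

data <- data.frame(
  group_id = c("A", "A", "A", "B", "B", "C", "C"),
  member = c("John", "Paul", "Stella", "Paul", "Jake", "John", "Mary")
)

# Each element of the new `members` column is a dataframe with a `member` column
nested_df <- data |>
  nest(members = member)

nested_df
nested_df$members[[1]]

# unnest() reverses nest(), giving one row per member again
back <- nested_df |>
  unnest(cols = members)
```

Unlike the summarize(list(...)) approach, the inner elements here are dataframes, so they can carry several columns per group if more columns are passed to nest().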