How to create a nested dataframe with lists

In this tutorial, we will learn how to create a nested dataframe with list columns using group_by() and summarize() from the tidyverse. A nested dataframe is a dataframe in which one or more columns are list columns. In a simple dataframe, every column is an atomic vector. However, a column can also hold other data structures, such as lists or dataframes. Such columns are called list columns.
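
To make the difference between an atomic column and a list column concrete, here is a minimal sketch that builds a small tibble by hand. The object name toy and its values are made up for illustration and are not part of the example used in the rest of this post.

```r
library(tibble)

# Hypothetical toy data: one atomic column and one list column
toy <- tibble(
  group_id = c("A", "B"),
  members  = list(c("John", "Paul"), "Jake")
)

is.atomic(toy$group_id)  # the id column is an ordinary character vector
is.list(toy$members)     # the members column is a list
lengths(toy$members)     # list elements may differ in length
```

Because each element of a list column is its own object, one row can hold two names while the next holds only one.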

library(tidyverse)
packageVersion("dplyr")
[1] '1.1.4'

Let us create a dataframe with group id and group members as two columns.

data <- data.frame(
  group_id = c("A", "A", "A", "B", "B", "C", "C"),
  member = c("John", "Paul", "Stella", "Paul", "Jake", "John", "Mary")
)
data

  group_id member
1        A   John
2        A   Paul
3        A Stella
4        B   Paul
5        B   Jake
6        C   John
7        C   Mary

We will show how to create a nested dataframe in which each row holds a group id and, in a list column, the members of that group. We first group by the group id with group_by() and then use summarize() to collect each group's members into a list.

nested <- data |>
  group_by(group_id) |>
  summarize(members = list(member))

Our nested dataframe looks like this.

nested

# A tibble: 3 × 2
  group_id members  
  <chr>    <list>   
1 A        <chr [3]>
2 B        <chr [2]>
3 C        <chr [2]>
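
As a variation on the step above, dplyr 1.1.0 and later also accept a .by argument in summarize(), so the separate group_by() call can be dropped. The sketch below assumes the same data as above; the name nested_by is only chosen here for illustration.

```r
library(dplyr)

data <- data.frame(
  group_id = c("A", "A", "A", "B", "B", "C", "C"),
  member = c("John", "Paul", "Stella", "Paul", "Jake", "John", "Mary")
)

# Per-operation grouping: collect each group's members into one list element
nested_by <- data |>
  summarize(members = list(member), .by = group_id)

nested_by
```

One difference to keep in mind is that .by keeps groups in their order of first appearance, whereas group_by() sorts them. For this data the two orders happen to coincide.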

Here is a way to access the values in the list column.

nested$members[[1]]

[1] "John"   "Paul"   "Stella"

nested$members[[2]]

[1] "Paul" "Jake"
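
Beyond pulling out one element at a time, a list column can also be processed for every row at once. The following is a small sketch, assuming the same data and nested object as above; the column names n_members and label and the object name with_counts are introduced here only for illustration.

```r
library(dplyr)

data <- data.frame(
  group_id = c("A", "A", "A", "B", "B", "C", "C"),
  member = c("John", "Paul", "Stella", "Paul", "Jake", "John", "Mary")
)

nested <- data |>
  group_by(group_id) |>
  summarize(members = list(member))

# Apply a function to each element of the list column:
# lengths() counts the members, vapply() pastes them into one string per row
with_counts <- nested |>
  mutate(
    n_members = lengths(members),
    label = vapply(members, paste, character(1), collapse = ", ")
  )

with_counts
```

Here vapply() is used rather than sapply() so that the result is guaranteed to be a character vector with one string per group.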

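We can unnest the nested dataframe and recover the original rows using the unnest() function. Calling unnest() without arguments still works, but it raises a warning because the cols argument is now required.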

nested |> unnest()

Warning: `cols` is now required when using `unnest()`.
ℹ Please use `cols = c(members)`.
# A tibble: 7 × 2
  group_id members
  <chr>    <chr>  
1 A        John   
2 A        Paul   
3 A        Stella 
4 B        Paul   
5 B        Jake   
6 C        John   
7 C        Mary   

Here we explicitly name the column to unnest, which avoids the warning.

nested |> unnest(members)

# A tibble: 7 × 2
  group_id members
  <chr>    <chr>  
1 A        John   
2 A        Paul   
3 A        Stella 
4 B        Paul   
5 B        Jake   
6 C        John   
7 C        Mary   

Note that we have not used the nest() function to create the nested dataframe. With tidyr’s nest() function, we can just as easily create list columns whose elements are tibbles rather than plain vectors.
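
For comparison, here is a minimal sketch of the nest() approach on the same data. The new column name members and the object name nested_tbl are choices made for this illustration.

```r
library(tidyr)

data <- data.frame(
  group_id = c("A", "A", "A", "B", "B", "C", "C"),
  member = c("John", "Paul", "Stella", "Paul", "Jake", "John", "Mary")
)

# Nest the member column into a list column named members;
# rows are grouped by the remaining column, group_id
nested_tbl <- data |>
  nest(members = member)

# Each element is now a one-column data frame, not a character vector
nested_tbl$members[[1]]

# unnest() reverses the operation
nested_tbl |> unnest(members)
```

The main difference from the summarize() version is the type of each element: nest() stores a small data frame per group, which is convenient when several columns need to be kept together.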
