How to Split a Dataframe into a list of Dataframes by groups in R

In this tutorial, we will learn how to split a dataframe into a list of dataframes by groups in R. We will first use the base R function split() to divide a dataframe into a list of smaller dataframes, one per group. Then we will use dplyr’s group_split() function to do the same.

To get started, we will load tidyverse, a suite of R packages, and palmerpenguins, which provides the penguins data.

library(tidyverse)
# check the version of loaded package dplyr
packageVersion("dplyr")
## [1] '1.0.8'
library(palmerpenguins)

How to Split a Dataframe into a list of Dataframes by groups using split() in base R

The split() function in base R divides the data in a vector or a dataframe into a list of groups. Here we show how to split a dataframe by group, using the species column as the grouping variable.

list_of_dataframes_by_split <- split(penguins, penguins$species)

Looking at the structure of the object returned by split(), we can see that it is a list of 3 elements, each of which is a dataframe named after its species.

str(list_of_dataframes_by_split)

## List of 3
##  $ Adelie   : tibble [152 × 8] (S3: tbl_df/tbl/data.frame)
##   ..$ species          : Factor w/ 3 levels "Adelie","Chinstrap",..: 1 1 1 1 1 1 1 1 1 1 ...
##   ..$ island           : Factor w/ 3 levels "Biscoe","Dream",..: 3 3 3 3 3 3 3 3 3 3 ...
##   ..$ bill_length_mm   : num [1:152] 39.1 39.5 40.3 NA 36.7 39.3 38.9 39.2 34.1 42 ...
##   ..$ bill_depth_mm    : num [1:152] 18.7 17.4 18 NA 19.3 20.6 17.8 19.6 18.1 20.2 ...
##   ..$ flipper_length_mm: int [1:152] 181 186 195 NA 193 190 181 195 193 190 ...
##   ..$ body_mass_g      : int [1:152] 3750 3800 3250 NA 3450 3650 3625 4675 3475 4250 ...
##   ..$ sex              : Factor w/ 2 levels "female","male": 2 1 1 NA 1 2 1 2 NA NA ...
##   ..$ year             : int [1:152] 2007 2007 2007 2007 2007 2007 2007 2007 2007 2007 ...
##  $ Chinstrap: tibble [68 × 8] (S3: tbl_df/tbl/data.frame)
##   ..$ species          : Factor w/ 3 levels "Adelie","Chinstrap",..: 2 2 2 2 2 2 2 2 2 2 ...
##   ..$ island           : Factor w/ 3 levels "Biscoe","Dream",..: 2 2 2 2 2 2 2 2 2 2 ...
##   ..$ bill_length_mm   : num [1:68] 46.5 50 51.3 45.4 52.7 45.2 46.1 51.3 46 51.3 ...
##   ..$ bill_depth_mm    : num [1:68] 17.9 19.5 19.2 18.7 19.8 17.8 18.2 18.2 18.9 19.9 ...
##   ..$ flipper_length_mm: int [1:68] 192 196 193 188 197 198 178 197 195 198 ...
##   ..$ body_mass_g      : int [1:68] 3500 3900 3650 3525 3725 3950 3250 3750 4150 3700 ...
##   ..$ sex              : Factor w/ 2 levels "female","male": 1 2 2 1 2 1 1 2 1 2 ...
##   ..$ year             : int [1:68] 2007 2007 2007 2007 2007 2007 2007 2007 2007 2007 ...
##  $ Gentoo   : tibble [124 × 8] (S3: tbl_df/tbl/data.frame)
##   ..$ species          : Factor w/ 3 levels "Adelie","Chinstrap",..: 3 3 3 3 3 3 3 3 3 3 ...
##   ..$ island           : Factor w/ 3 levels "Biscoe","Dream",..: 1 1 1 1 1 1 1 1 1 1 ...
##   ..$ bill_length_mm   : num [1:124] 46.1 50 48.7 50 47.6 46.5 45.4 46.7 43.3 46.8 ...
##   ..$ bill_depth_mm    : num [1:124] 13.2 16.3 14.1 15.2 14.5 13.5 14.6 15.3 13.4 15.4 ...
##   ..$ flipper_length_mm: int [1:124] 211 230 210 218 215 210 211 219 209 215 ...
##   ..$ body_mass_g      : int [1:124] 4500 5700 4450 5700 5400 4550 4800 5200 4400 5150 ...
##   ..$ sex              : Factor w/ 2 levels "female","male": 1 2 1 2 2 1 1 2 1 2 ...
##   ..$ year             : int [1:124] 2007 2007 2007 2007 2007 2007 2007 2007 2007 2007 ...
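
Because split() names each list element after a level of the grouping factor, the pieces can be pulled out by name or processed together. The following is a minimal sketch of that idea; the variable names adelie and rows_per_species are chosen here for illustration only.

```r
library(palmerpenguins)

# Split the penguins data by species, as in the example above
list_of_dataframes_by_split <- split(penguins, penguins$species)

# The list elements are named after the levels of the grouping factor
names(list_of_dataframes_by_split)

# Pull out a single group by name ([["Adelie"]] works as well)
adelie <- list_of_dataframes_by_split$Adelie

# Count the rows in each group; the counts match the str() output above
# (152 Adelie, 68 Chinstrap, 124 Gentoo)
rows_per_species <- sapply(list_of_dataframes_by_split, nrow)
rows_per_species
```

The same pattern works with lapply() when each group needs a more involved computation than a row count.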

How to Split a Dataframe into a list of Dataframes by groups using group_split() in dplyr

dplyr has an experimental function, group_split(), that behaves much like base R’s split() function. One difference is that the list it returns is not named. We can use group_split() in two ways. In the example below, we provide the dataframe and the grouping variable to split the dataframe into a list of smaller dataframes.

penguins %>% 
  group_split(species)

## <list_of<
##   tbl_df<
##     species          : factor<b22a0>
##     island           : factor<ccf33>
##     bill_length_mm   : double
##     bill_depth_mm    : double
##     flipper_length_mm: integer
##     body_mass_g      : integer
##     sex              : factor<8f119>
##     year             : integer
##   >
## >[3]>
## [[1]]
## # A tibble: 152 × 8
##    species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
##    <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
##  1 Adelie  Torgersen           39.1          18.7               181        3750
##  2 Adelie  Torgersen           39.5          17.4               186        3800
##  3 Adelie  Torgersen           40.3          18                 195        3250
##  4 Adelie  Torgersen           NA            NA                  NA          NA
##  5 Adelie  Torgersen           36.7          19.3               193        3450
##  6 Adelie  Torgersen           39.3          20.6               190        3650
##  7 Adelie  Torgersen           38.9          17.8               181        3625
##  8 Adelie  Torgersen           39.2          19.6               195        4675
##  9 Adelie  Torgersen           34.1          18.1               193        3475
## 10 Adelie  Torgersen           42            20.2               190        4250
## # … with 142 more rows, and 2 more variables: sex <fct>, year <int>
## 
## [[2]]
## # A tibble: 68 × 8
##    species   island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
##    <fct>     <fct>           <dbl>         <dbl>             <int>       <int>
##  1 Chinstrap Dream            46.5          17.9               192        3500
##  2 Chinstrap Dream            50            19.5               196        3900
##  3 Chinstrap Dream            51.3          19.2               193        3650
##  4 Chinstrap Dream            45.4          18.7               188        3525
##  5 Chinstrap Dream            52.7          19.8               197        3725
##  6 Chinstrap Dream            45.2          17.8               198        3950
##  7 Chinstrap Dream            46.1          18.2               178        3250
##  8 Chinstrap Dream            51.3          18.2               197        3750
##  9 Chinstrap Dream            46            18.9               195        4150
## 10 Chinstrap Dream            51.3          19.9               198        3700
## # … with 58 more rows, and 2 more variables: sex <fct>, year <int>
## 
## [[3]]
## # A tibble: 124 × 8
##    species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
##    <fct>   <fct>           <dbl>         <dbl>             <int>       <int>
##  1 Gentoo  Biscoe           46.1          13.2               211        4500
##  2 Gentoo  Biscoe           50            16.3               230        5700
##  3 Gentoo  Biscoe           48.7          14.1               210        4450
##  4 Gentoo  Biscoe           50            15.2               218        5700
##  5 Gentoo  Biscoe           47.6          14.5               215        5400
##  6 Gentoo  Biscoe           46.5          13.5               210        4550
##  7 Gentoo  Biscoe           45.4          14.6               211        4800
##  8 Gentoo  Biscoe           46.7          15.3               219        5200
##  9 Gentoo  Biscoe           43.3          13.4               209        4400
## 10 Gentoo  Biscoe           46.8          15.4               215        5150
## # … with 114 more rows, and 2 more variables: sex <fct>, year <int>
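
Note that the elements in the output above are labelled [[1]], [[2]] and [[3]] rather than by species, because group_split() returns an unnamed list. If names are needed, one possible approach is to take them from group_keys(), as sketched below. The variable names by_species and keys are illustrative, and the sketch relies on group_keys() listing the groups in the same order that group_split() returns them.

```r
library(dplyr)
library(palmerpenguins)

# Split by species; the resulting list has no names
by_species <- penguins %>%
  group_split(species)
names(by_species)

# group_keys() returns one row per group
keys <- penguins %>%
  group_by(species) %>%
  group_keys()

# Use the species labels as list names
names(by_species) <- as.character(keys$species)
names(by_species)
```

After this step, by_species$Gentoo can be used in the same way as with the list returned by split().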

dplyr’s group_split() function also works on a grouped object, i.e. the result of dplyr’s group_by() function. For example, here we create a grouped object by applying group_by() to the dataframe.

grp_obj <- penguins %>% 
  group_by(species) 

Then we can split it into a list of dataframes using group_split(), as shown here, and we get the same results as before.

grp_obj %>%
  group_split()
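
To check that the two ways of calling group_split() agree, we can compare the group sizes they produce. The sketch below does that and also shows the .keep argument of group_split(), which controls whether the grouping column is kept in each piece. The variable names are illustrative, and comparing row counts is only a quick sanity check, not a full equality test.

```r
library(dplyr)
library(palmerpenguins)

# Way 1: pass the grouping variable directly to group_split()
from_ungrouped <- penguins %>%
  group_split(species)

# Way 2: group first, then split
from_grouped <- penguins %>%
  group_by(species) %>%
  group_split()

# Compare the number of rows per group from both approaches
rows_ungrouped <- vapply(from_ungrouped, nrow, integer(1))
rows_grouped <- vapply(from_grouped, nrow, integer(1))
identical(rows_ungrouped, rows_grouped)

# .keep = FALSE drops the grouping column (species) from each piece
without_species <- penguins %>%
  group_split(species, .keep = FALSE)
names(without_species[[1]])
```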