In this tutorial, we will learn how to separate a column with multiple elements separated by a delimiter into different rows. We will use tidy’s separate_rows() function first to separate a column with comma separated elements. And then show how to use separate_rows() on multiple columns. We will also see how to make separate_rows() function to automatically infer the type of updated column.
Let us get started by loading tidyverse and checking the tidyr package version.
library(tidyvrerse) packageVersion("tidyr") ## [1] '1.2.0'
To illustrate tidyr’s separate_rows(), we create a toy dataframe using tibble() function with a column containing elements that are separated by comma.
df <- tibble( continent = c("America","Asia","Europe"), country = c("USA,Canada", "Singapore","England, France, Germany") )
Our dataframe with collapsed column looks like this.
df ## # A tibble: 3 × 2 ## continent country ## <chr> <chr> ## 1 America USA,Canada ## 2 Asia Singapore ## 3 Europe England, France, Germany
tidyr’s separate_rows(): Example with one column
When a column contains elements with one or more delimited values, separate_rows() function separates the values in the column and places each one in its own row.
seprate_rows() function takes one or more columns that have elements separated by a delimiter. When we don’t specify a column, it does nothing as expected.
df %>% separate_rows() ## # A tibble: 3 × 2 ## continent country ## <chr> <chr> ## 1 America USA,Canada ## 2 Asia Singapore ## 3 Europe England, France, Germany
In our example, the country column contain elements separated by comma. By using the collapsed column as argument to separate_rows() function we get expanded dataframe with separte row for each collapsed element.
df %>% separate_rows(country) ## # A tibble: 6 × 2 ## continent country ## <chr> <chr> ## 1 America USA ## 2 America Canada ## 3 Asia Singapore ## 4 Europe England ## 5 Europe France ## 6 Europe Germany
tidyr’s separate_rows(): Example with multiple columns
In the previous example, we had just one column with collapsed elements. To illustrate separate_rows() can work with multiple columns, we create an updated tibble with two columns that have comma separated elements.
df <- tibble( continent = c("America","Asia","Eurpoe"), country = c("USA,Canada", "Singapore","England, France, Germany"), lifeExp = c("70,72", "68", "75,76,77") ) df ## # A tibble: 3 × 3 ## continent country lifeExp ## <chr> <chr> <chr> ## 1 America USA,Canada 70,72 ## 2 Asia Singapore 68 ## 3 Eurpoe England, France, Germany 75,76,77
By providing both the columns that needed to be separated into rows as arguments, we get a tibble like this
df %>% separate_rows(country, lifeExp) ## # A tibble: 6 × 3 ## continent country lifeExp ## <chr> <chr> <chr> ## 1 America USA 70 ## 2 America Canada 72 ## 3 Asia Singapore 68 ## 4 Eurpoe England 75 ## 5 Eurpoe France 76 ## 6 Eurpoe Germany 77
tidyr’s separate_rows() Example with inferring datatype
Note that in the above example, the second collapsed column is of numerical type. We can instruct separate_rows() function to convert column to appropriate data type after separating by using convert=TRUE>. Now our lifeExp column is int instead of character.
df %>% separate_rows(country, lifeExp, convert=TRUE) ## # A tibble: 6 × 3 ## continent country lifeExp ## <chr> <chr> <int> ## 1 America USA 70 ## 2 America Canada 72 ## 3 Asia Singapore 68 ## 4 Eurpoe England 75 ## 5 Eurpoe France 76 ## 6 Eurpoe Germany 77
tidyr’s separate_rows() Example sep argument
separate_rows() function is pretty decent in detecting the commonly used delimiters. Here is example where each element is separated by two under scores, an uncommon delimiter.
df <- tibble( continent = c("America","Asia","Eurpoe"), country = c("USA__Canada", "Singapore","England__France__Germany"), lifeExp = c("70__72", "68", "75__76__77") ) df ## # A tibble: 3 × 3 ## continent country lifeExp ## <chr> <chr> <chr> ## 1 America USA__Canada 70__72 ## 2 Asia Singapore 68 ## 3 Eurpoe England__France__Germany 75__76__77
Even the the delimiter that we used is not common, tidyr’s separate_rows() nicely detected them and separated into rows.
df %>% separate_rows(country, lifeExp,convert=TRUE) ## # A tibble: 6 × 3 ## continent country lifeExp ## <chr> <chr> <int> ## 1 America USA 70 ## 2 America Canada 72 ## 3 Asia Singapore 68 ## 4 Eurpoe England 75 ## 5 Eurpoe France 76 ## 6 Eurpoe Germany 77
With “sep” argument, we can specify a delimiter to separate into rows with separate_rows().