How to Separate a Column into Multiple Rows in R: Hint tidyr’s spearate_row()

In this tutorial, we will learn how to separate a column with multiple elements separated by a delimiter into different rows. We will use tidy’s separate_rows() function first to separate a column with comma separated elements. And then show how to use separate_rows() on multiple columns. We will also see how to make separate_rows() function to automatically infer the type of updated column.

Let us get started by loading tidyverse and checking the tidyr package version.

library(tidyvrerse)
packageVersion("tidyr")

## [1] '1.2.0'

To illustrate tidyr’s separate_rows(), we create a toy dataframe using tibble() function with a column containing elements that are separated by comma.

df <- tibble(
  continent = c("America","Asia","Europe"),
  country = c("USA,Canada", "Singapore","England, France, Germany")
)

Our dataframe with collapsed column looks like this.

df

## # A tibble: 3 × 2
##   continent country                 
##   <chr>     <chr>                   
## 1 America   USA,Canada              
## 2 Asia      Singapore               
## 3 Europe    England, France, Germany

tidyr’s separate_rows(): Example with one column

When a column contains elements with one or more delimited values, separate_rows() function separates the values in the column and places each one in its own row.

seprate_rows() function takes one or more columns that have elements separated by a delimiter. When we don’t specify a column, it does nothing as expected.

df %>% 
   separate_rows()

## # A tibble: 3 × 2
##   continent country                 
##   <chr>     <chr>                   
## 1 America   USA,Canada              
## 2 Asia      Singapore               
## 3 Europe    England, France, Germany

In our example, the country column contain elements separated by comma. By using the collapsed column as argument to separate_rows() function we get expanded dataframe with separte row for each collapsed element.

df %>% 
   separate_rows(country)

## # A tibble: 6 × 2
##   continent country  
##   <chr>     <chr>    
## 1 America   USA      
## 2 America   Canada   
## 3 Asia      Singapore
## 4 Europe    England  
## 5 Europe    France   
## 6 Europe    Germany

tidyr’s separate_rows(): Example with multiple columns

In the previous example, we had just one column with collapsed elements. To illustrate separate_rows() can work with multiple columns, we create an updated tibble with two columns that have comma separated elements.

df <- tibble(
  continent = c("America","Asia","Eurpoe"),
  country = c("USA,Canada", "Singapore","England, France, Germany"),
  lifeExp = c("70,72", "68", "75,76,77")
)

df
## # A tibble: 3 × 3
##   continent country                  lifeExp 
##   <chr>     <chr>                    <chr>   
## 1 America   USA,Canada               70,72   
## 2 Asia      Singapore                68      
## 3 Eurpoe    England, France, Germany 75,76,77

By providing both the columns that needed to be separated into rows as arguments, we get a tibble like this

df %>% 
  separate_rows(country, lifeExp)

## # A tibble: 6 × 3
##   continent country   lifeExp
##   <chr>     <chr>     <chr>  
## 1 America   USA       70     
## 2 America   Canada    72     
## 3 Asia      Singapore 68     
## 4 Eurpoe    England   75     
## 5 Eurpoe    France    76     
## 6 Eurpoe    Germany   77

tidyr’s separate_rows() Example with inferring datatype

Note that in the above example, the second collapsed column is of numerical type. We can instruct separate_rows() function to convert column to appropriate data type after separating by using convert=TRUE>. Now our lifeExp column is int instead of character.

df %>% 
  separate_rows(country, lifeExp, convert=TRUE)

## # A tibble: 6 × 3
##   continent country   lifeExp
##   <chr>     <chr>       <int>
## 1 America   USA            70
## 2 America   Canada         72
## 3 Asia      Singapore      68
## 4 Eurpoe    England        75
## 5 Eurpoe    France         76
## 6 Eurpoe    Germany        77

tidyr’s separate_rows() Example sep argument

separate_rows() function is pretty decent in detecting the commonly used delimiters. Here is example where each element is separated by two under scores, an uncommon delimiter.

df <- tibble(
  continent = c("America","Asia","Eurpoe"),
  country = c("USA__Canada", "Singapore","England__France__Germany"),
  lifeExp = c("70__72", "68", "75__76__77")
)

df
## # A tibble: 3 × 3
##   continent country                  lifeExp   
##   <chr>     <chr>                    <chr>     
## 1 America   USA__Canada              70__72    
## 2 Asia      Singapore                68        
## 3 Eurpoe    England__France__Germany 75__76__77

Even the the delimiter that we used is not common, tidyr’s separate_rows() nicely detected them and separated into rows.

df %>% 
  separate_rows(country, lifeExp,convert=TRUE)

## # A tibble: 6 × 3
##   continent country   lifeExp
##   <chr>     <chr>       <int>
## 1 America   USA            70
## 2 America   Canada         72
## 3 Asia      Singapore      68
## 4 Eurpoe    England        75
## 5 Eurpoe    France         76
## 6 Eurpoe    Germany        77

With “sep” argument, we can specify a delimiter to separate into rows with separate_rows().

Exit mobile version