tidyr’s separate_delim_wider(): Split a string into columns

In this tutorial, we will learn how to split a string in a column into multiple columns using a delimiter. We will use tidyr’s separate_delim_wider() function, that separates a string in a column into multiple columns using delimiter. After spliting the dataframe is wider with more columns. We will start with a simple example of spliting a column into two columns using delimiter and see multiple examples of spliting a string into columns.

library(tidyverse)

Let us create a toy dataframe with two columns. The second column of the dataframe contains full names and we would like to split it into two columns with last name and first name.

df1 <- tibble(id = 1:3, 
              Name = c("John-Nash", "Hadley-Wickahm",
                       "Richard-Feynman"))
df1

# A tibble: 3 × 2
     id Name           
  <int> <chr>          
1     1 John-Nash      
2     2 Hadley-Wickahm 
3     3 Richard-Feynman

split a string into two columns using delimiter

Note that the first and last names are separated by the delimiter “_”. We can use tidyr’s separate_wider_delim() function to separte the string by the delimiter into two columns. In the example below we also specify the names of the two new columns we created using names argument.

df1 %>%
  separate_wider_delim(Name, delim = "-", 
                       names = c("First_Name", "Last_Name"))

# A tibble: 3 × 3
     id First_Name Last_Name
  <int> <chr>      <chr>    
1     1 John       Nash     
2     2 Hadley     Wickahm  
3     3 Richard    Feynman  

The same approach can be used to split a string into multiple columns. In the example below we show how to use separate_wider_delim() to split a string into three columns by delimiter “_”.

df2 <- tibble(id = 1:3, 
              Name = c("John-Nash-Math", "Hadley-Wickahm-Stat", 
                       "Richard-Feynman-Physics"))
df2

# A tibble: 3 × 2
     id Name                   
  <int> <chr>                  
1     1 John-Nash-Math         
2     2 Hadley-Wickahm-Stat    
3     3 Richard-Feynman-Physics
df2 %>%
  separate_wider_delim(Name, 
                       delim = "-", 
                       names = c("First_Name", "Last_Name", "Field"))

# A tibble: 3 × 4
     id First_Name Last_Name Field  
  <int> <chr>      <chr>     <chr>  
1     1 John       Nash      Math   
2     2 Hadley     Wickahm   Stat   
3     3 Richard    Feynman   Physics

Sometimes you may want to automatically assign names of the new columns created after splitting a string. In the example below we use names_sep argument to create column names automatically. We can see that we have used original column name with automatic prefix separated by name separator we specified.

df2 %>%
  separate_wider_delim(Name, delim = "-",names_sep = "_")

# A tibble: 3 × 4
     id Name_1  Name_2  Name_3 
  <int> <chr>   <chr>   <chr>  
1     1 John    Nash    Math   
2     2 Hadley  Wickahm Stat   
3     3 Richard Feynman Physics

Here is an example, where we show how to deal with multiple delimiters. We first separate a string in a column using one delimiter and then by the second delimiter.

df3 <- tibble(id = 1:3,
              Name = c("John-Nash_Math", "Hadley-Wickahm_Stat", 
              "Richard-Feynman_Physics"))
df3 %>%
  separate_wider_delim(Name, 
                       delim = "-",
                       names=c("First_Name", "Name")) %>%
  separate_wider_delim(Name, 
                       delim = "_",
                       names=c("Last_Name", "Field"))
Exit mobile version