In this tutorial, we will learn how to split a string in a column into multiple columns using a delimiter. We will use tidyr’s separate_delim_wider() function, that separates a string in a column into multiple columns using delimiter. After spliting the dataframe is wider with more columns. We will start with a simple example of spliting a column into two columns using delimiter and see multiple examples of spliting a string into columns.
library(tidyverse)
Let us create a toy dataframe with two columns. The second column of the dataframe contains full names and we would like to split it into two columns with last name and first name.
df1 <- tibble(id = 1:3, Name = c("John-Nash", "Hadley-Wickahm", "Richard-Feynman")) df1 # A tibble: 3 × 2 id Name <int> <chr> 1 1 John-Nash 2 2 Hadley-Wickahm 3 3 Richard-Feynman
split a string into two columns using delimiter
Note that the first and last names are separated by the delimiter “_”. We can use tidyr’s separate_wider_delim() function to separte the string by the delimiter into two columns. In the example below we also specify the names of the two new columns we created using names argument.
df1 %>% separate_wider_delim(Name, delim = "-", names = c("First_Name", "Last_Name")) # A tibble: 3 × 3 id First_Name Last_Name <int> <chr> <chr> 1 1 John Nash 2 2 Hadley Wickahm 3 3 Richard Feynman
The same approach can be used to split a string into multiple columns. In the example below we show how to use separate_wider_delim() to split a string into three columns by delimiter “_”.
df2 <- tibble(id = 1:3, Name = c("John-Nash-Math", "Hadley-Wickahm-Stat", "Richard-Feynman-Physics")) df2 # A tibble: 3 × 2 id Name <int> <chr> 1 1 John-Nash-Math 2 2 Hadley-Wickahm-Stat 3 3 Richard-Feynman-Physics
df2 %>% separate_wider_delim(Name, delim = "-", names = c("First_Name", "Last_Name", "Field")) # A tibble: 3 × 4 id First_Name Last_Name Field <int> <chr> <chr> <chr> 1 1 John Nash Math 2 2 Hadley Wickahm Stat 3 3 Richard Feynman Physics
Sometimes you may want to automatically assign names of the new columns created after splitting a string. In the example below we use names_sep argument to create column names automatically. We can see that we have used original column name with automatic prefix separated by name separator we specified.
df2 %>% separate_wider_delim(Name, delim = "-",names_sep = "_") # A tibble: 3 × 4 id Name_1 Name_2 Name_3 <int> <chr> <chr> <chr> 1 1 John Nash Math 2 2 Hadley Wickahm Stat 3 3 Richard Feynman Physics
Here is an example, where we show how to deal with multiple delimiters. We first separate a string in a column using one delimiter and then by the second delimiter.
df3 <- tibble(id = 1:3, Name = c("John-Nash_Math", "Hadley-Wickahm_Stat", "Richard-Feynman_Physics")) df3 %>% separate_wider_delim(Name, delim = "-", names=c("First_Name", "Name")) %>% separate_wider_delim(Name, delim = "_", names=c("Last_Name", "Field"))