tidyr unite(): combine multiple columns into one

In this tutorial, we will learn how to use the unite() function from the tidyr package to combine multiple columns into a single column. By combining, we mean concatenating the values of two or more columns, separated by a delimiter such as an underscore. We start with three examples that combine two columns into one, and then show two examples that combine more than two columns into a single column.

Let us first load the packages needed.

library(tidyverse)       # attaches tidyr, dplyr and the %>% pipe
library(palmerpenguins)  # provides the penguins data set
packageVersion("tidyr")

## [1] '1.2.0'

We will use the penguins data set from the palmerpenguins package to show how to use the unite() function.

penguins %>% 
  head()
## # A tibble: 6 × 8
##   species island bill_length_mm bill_depth_mm flipper_length_… body_mass_g sex  
##   <fct>   <fct>           <dbl>         <dbl>            <int>       <int> <fct>
## 1 Adelie  Torge…           39.1          18.7              181        3750 male 
## 2 Adelie  Torge…           39.5          17.4              186        3800 fema…
## 3 Adelie  Torge…           40.3          18                195        3250 fema…
## 4 Adelie  Torge…           NA            NA                 NA          NA <NA> 
## 5 Adelie  Torge…           36.7          19.3              193        3450 fema…
## 6 Adelie  Torge…           39.3          20.6              190        3650 male 
## # … with 1 more variable: year <int>

Instead of using all the columns in the penguins data set, we select four columns to illustrate how tidyr’s unite() function combines multiple columns into a single column.

Below we create a new dataframe with just four columns.

df <- penguins %>% 
  select(species, island, sex, body_mass_g)

df %>% head()

## # A tibble: 6 × 4
##   species island    sex    body_mass_g
##   <fct>   <fct>     <fct>        <int>
## 1 Adelie  Torgersen male          3750
## 2 Adelie  Torgersen female        3800
## 3 Adelie  Torgersen female        3250
## 4 Adelie  Torgersen <NA>            NA
## 5 Adelie  Torgersen female        3450
## 6 Adelie  Torgersen male          3650

Combine two columns into one column with tidyr’s unite()

To combine two columns into a single column, we first give the name of the new column that will hold the combined values, and then list the columns to be combined.

In the example below, we combine the species and island columns and name the combined column ‘species_island’.

df %>% 
  unite(col="species_island", 
        c(species, island)) %>%
  head()

We get a new data frame with the combined column. Note that by default unite() joins the values with an underscore (“_”), and it also removes the original columns that were combined.


## # A tibble: 6 × 3
##   species_island   sex    body_mass_g
##   <chr>            <fct>        <int>
## 1 Adelie_Torgersen male          3750
## 2 Adelie_Torgersen female        3800
## 3 Adelie_Torgersen female        3250
## 4 Adelie_Torgersen <NA>            NA
## 5 Adelie_Torgersen female        3450
## 6 Adelie_Torgersen male          3650
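As a quick check of these defaults, here is a minimal sketch on a small made-up tibble (toy and its values are hypothetical, not taken from penguins). It writes sep = "_" and remove = TRUE out explicitly, passes the new column name as a bare symbol instead of a string (unite() accepts either), and compares the result with base R’s paste() using the same separator.

```r
library(tidyr)
library(tibble)

# Hypothetical toy data using the same column names as df
toy <- tibble(
  species = c("Adelie", "Gentoo"),
  island  = c("Torgersen", "Biscoe"),
  sex     = c("male", "female")
)

# The defaults spelled out: underscore separator, input columns dropped
united <- toy %>%
  unite(col = species_island, species, island, sep = "_", remove = TRUE)

# The same strings built with base R's paste() for comparison
pasted <- paste(toy$species, toy$island, sep = "_")

print(united)
print(identical(united$species_island, pasted))
```

For columns without missing values both approaches give the same strings; unite() additionally places the new column and drops the inputs for you.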

Keep original columns while combining two columns into a single column

We can keep the original columns by passing the argument remove=FALSE to unite(). The combined column is then added alongside species and island instead of replacing them.

df %>% 
  unite(col="species_island",
        c(species, island), 
        remove=FALSE) %>%
  head()

## # A tibble: 6 × 5
##   species_island   species island    sex    body_mass_g
##   <chr>            <fct>   <fct>     <fct>        <int>
## 1 Adelie_Torgersen Adelie  Torgersen male          3750
## 2 Adelie_Torgersen Adelie  Torgersen female        3800
## 3 Adelie_Torgersen Adelie  Torgersen female        3250
## 4 Adelie_Torgersen Adelie  Torgersen <NA>            NA
## 5 Adelie_Torgersen Adelie  Torgersen female        3450
## 6 Adelie_Torgersen Adelie  Torgersen male          3650
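A minimal sketch of the same idea on a hypothetical toy tibble: with remove = FALSE the input columns survive unchanged next to the new one, which is handy for spot-checking the combined values.

```r
library(tidyr)
library(tibble)

# Hypothetical toy data (not from penguins)
toy <- tibble(
  species = c("Adelie", "Gentoo"),
  island  = c("Torgersen", "Biscoe"),
  sex     = c("male", "female")
)

# remove = FALSE keeps species and island alongside the combined column
kept <- unite(toy, col = "species_island", species, island, remove = FALSE)

print(names(kept))
# The original columns are left untouched
print(identical(kept$species, toy$species))
```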

Combine two columns into a single column using a specific delimiter

We can also specify a delimiter other than the default underscore. In the example below, we combine the two columns with unite() using the delimiter “--”, set with the argument sep="--".

df %>% 
  unite(col="species_island",
        c(species, island), 
        sep="--") %>%
  head()

## # A tibble: 6 × 3
##   species_island    sex    body_mass_g
##   <chr>             <fct>        <int>
## 1 Adelie--Torgersen male          3750
## 2 Adelie--Torgersen female        3800
## 3 Adelie--Torgersen female        3250
## 4 Adelie--Torgersen <NA>            NA
## 5 Adelie--Torgersen female        3450
## 6 Adelie--Torgersen male          3650
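Any string works as sep. As a minimal sketch on a hypothetical toy tibble, sep = "" pastes the values together with no delimiter at all, while a longer string such as " / " is inserted verbatim between the values.

```r
library(tidyr)
library(tibble)

# Hypothetical toy data (not from penguins)
toy <- tibble(
  species = c("Adelie", "Gentoo"),
  island  = c("Torgersen", "Biscoe")
)

# No delimiter: values are concatenated directly
no_sep <- unite(toy, col = "species_island", species, island, sep = "")

# Multi-character delimiter, inserted as-is between the values
slash_sep <- unite(toy, col = "species_island", species, island, sep = " / ")

print(no_sep$species_island)
print(slash_sep$species_island)
```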

Combine more than two columns into a single column

To combine more than two columns (here, three), we only need to select them with tidyselect syntax. In the example below, we give the three column names as a vector with c().

df %>% 
  unite(col="species_island_sex", 
        c(species, island, sex)) 

## # A tibble: 344 × 2
##    species_island_sex      body_mass_g
##    <chr>                         <int>
##  1 Adelie_Torgersen_male          3750
##  2 Adelie_Torgersen_female        3800
##  3 Adelie_Torgersen_female        3250
##  4 Adelie_Torgersen_NA              NA
##  5 Adelie_Torgersen_female        3450
##  6 Adelie_Torgersen_male          3650
##  7 Adelie_Torgersen_female        3625
##  8 Adelie_Torgersen_male          4675
##  9 Adelie_Torgersen_NA            3475
## 10 Adelie_Torgersen_NA            4250
## # … with 334 more rows
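Because unite() accepts tidyselect expressions, c(species, island, sex) is only one way to pick the columns. As a sketch on a hypothetical toy tibble with the same column order as df, a range (species:sex) or a negative selection (-body_mass_g) picks the same three columns and gives the same result.

```r
library(tidyr)
library(tibble)

# Hypothetical toy data with df's column order (values are made up)
toy <- tibble(
  species     = c("Adelie", "Gentoo"),
  island      = c("Torgersen", "Biscoe"),
  sex         = c("male", "female"),
  body_mass_g = c(3750L, 5000L)
)

by_vector <- unite(toy, col = "species_island_sex", c(species, island, sex))
by_range  <- unite(toy, col = "species_island_sex", species:sex)   # contiguous range
by_negate <- unite(toy, col = "species_island_sex", -body_mass_g)  # all but body_mass_g

print(by_range)
print(identical(by_vector, by_range) && identical(by_vector, by_negate))
```

A range is compact but depends on column order; naming the columns explicitly is more robust if the data frame is later rearranged.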

Dealing with missing values while combining multiple columns into a single column

If you look at the output of the previous example, where we combined three columns into one, you can see that missing values (NA) in the third column appear as the string “NA” in the combined column. By default (na.rm = FALSE), unite() does not remove NAs.

We can drop missing values in any of the columns being combined by specifying na.rm=TRUE. Here is an example that removes NAs.

Note that the fourth value of the combined column no longer contains NA.

df %>% 
  unite(col="species_island_sex", 
        c(species, island, sex),
        na.rm=TRUE) %>%
  head()

## # A tibble: 6 × 2
##   species_island_sex      body_mass_g
##   <chr>                         <int>
## 1 Adelie_Torgersen_male          3750
## 2 Adelie_Torgersen_female        3800
## 3 Adelie_Torgersen_female        3250
## 4 Adelie_Torgersen                 NA
## 5 Adelie_Torgersen_female        3450
## 6 Adelie_Torgersen_male          3650
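To see both behaviours side by side, here is a minimal sketch on a hypothetical two-row toy tibble in which sex is missing in the second row. Without na.rm, the missing value becomes the literal string "NA" (so is.na() does not flag it); with na.rm = TRUE it is dropped together with its separator.

```r
library(tidyr)
library(tibble)

# Hypothetical toy data: sex is missing in the second row
toy <- tibble(
  species = c("Adelie", "Adelie"),
  island  = c("Torgersen", "Torgersen"),
  sex     = c("male", NA)
)

# Default na.rm = FALSE: NA is pasted in as the text "NA"
with_na <- unite(toy, col = "species_island_sex", species, island, sex)

# na.rm = TRUE: missing values are skipped before pasting
without_na <- unite(toy, col = "species_island_sex", species, island, sex,
                    na.rm = TRUE)

print(with_na$species_island_sex)
print(is.na(with_na$species_island_sex))  # "NA" here is a string, not a missing value
print(without_na$species_island_sex)
```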