Rstats 101

Learn R Programming Tips & Tricks for Statistics and Data Science

tidyr unite(): combine multiple columns into one

rstats101 · January 20, 2023 ·

In this tutorial, we will learn how to use the unite() function in the tidyr package to combine multiple columns into a single column. By combining, we mean concatenating the values of two or more columns, separated by a delimiter such as an underscore. We will start with three examples of combining two columns into one. Then we will show two examples of combining more than two columns into a single column.

Let us first load the packages needed.

library(tidyverse)
library(palmerpenguins)
packageVersion("tidyr")

## [1] '1.2.0'

We will be using the Palmer penguins dataset to show how to use the unite() function.

penguins %>% 
  head()
## # A tibble: 6 × 8
##   species island bill_length_mm bill_depth_mm flipper_length_… body_mass_g sex  
##   <fct>   <fct>           <dbl>         <dbl>            <int>       <int> <fct>
## 1 Adelie  Torge…           39.1          18.7              181        3750 male 
## 2 Adelie  Torge…           39.5          17.4              186        3800 fema…
## 3 Adelie  Torge…           40.3          18                195        3250 fema…
## 4 Adelie  Torge…           NA            NA                 NA          NA <NA> 
## 5 Adelie  Torge…           36.7          19.3              193        3450 fema…
## 6 Adelie  Torge…           39.3          20.6              190        3650 male 
## # … with 1 more variable: year <int>

Instead of using all the columns in the penguins dataset, we select four columns to illustrate how tidyr’s unite() function combines multiple columns into a single column.

Below we create a new dataframe with just four columns.

df <-  penguins %>% 
  select(species, island, sex, body_mass_g) 

df %>%  head()

## # A tibble: 6 × 4
##   species island    sex    body_mass_g
##   <fct>   <fct>     <fct>        <int>
## 1 Adelie  Torgersen male          3750
## 2 Adelie  Torgersen female        3800
## 3 Adelie  Torgersen female        3250
## 4 Adelie  Torgersen <NA>            NA
## 5 Adelie  Torgersen female        3450
## 6 Adelie  Torgersen male          3650

Combine two columns into one column with tidyr’s unite()

To combine two columns into a single column, we first specify the name of the new column that will hold the combined values, and then the names of the columns to be combined.

In the example below, we combine the species and island columns and name the combined column ‘species_island’.

df %>% 
  unite(col="species_island", 
        c(species, island)) %>%
  head()

We get a new dataframe with the new combined column. Note that by default unite() uses an underscore, “_”, to join the values of the columns. Also by default, unite() removes the original two columns that we combined.


## # A tibble: 6 × 3
##   species_island   sex    body_mass_g
##   <chr>            <fct>        <int>
## 1 Adelie_Torgersen male          3750
## 2 Adelie_Torgersen female        3800
## 3 Adelie_Torgersen female        3250
## 4 Adelie_Torgersen <NA>            NA
## 5 Adelie_Torgersen female        3450
## 6 Adelie_Torgersen male          3650
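To make the default behaviour concrete, here is a minimal sketch comparing unite() with a hand-built paste() call. The small data frame `toy` and its values are hypothetical and used only for illustration; they are not part of the penguins data.

```r
library(tidyr)
library(dplyr)

# Hypothetical toy data (an assumption for illustration, not from penguins)
toy <- data.frame(
  species = c("Adelie", "Gentoo"),
  island  = c("Torgersen", "Biscoe"),
  stringsAsFactors = FALSE
)

# unite() with its default separator, the underscore
united <- toy %>%
  unite(col = "species_island", c(species, island))

# The same strings built by hand with base R's paste()
by_hand <- paste(toy$species, toy$island, sep = "_")

united$species_island
by_hand
```

Both vectors should hold the same underscore-joined strings, which suggests that for columns without missing values, unite() can be thought of as a paste() that also drops the input columns.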

Keep original columns while combining two columns into a single column

We can keep the original two columns we are combining by passing the argument ‘remove=FALSE’ to unite(). The unite() function will then leave the original two columns in place.

df %>% 
  unite(col="species_island",
        c(species, island), 
        remove=FALSE) %>%
  head()

## # A tibble: 6 × 5
##   species_island   species island    sex    body_mass_g
##   <chr>            <fct>   <fct>     <fct>        <int>
## 1 Adelie_Torgersen Adelie  Torgersen male          3750
## 2 Adelie_Torgersen Adelie  Torgersen female        3800
## 3 Adelie_Torgersen Adelie  Torgersen female        3250
## 4 Adelie_Torgersen Adelie  Torgersen <NA>            NA
## 5 Adelie_Torgersen Adelie  Torgersen female        3450
## 6 Adelie_Torgersen Adelie  Torgersen male          3650
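One reason to keep the originals is that they remain available for later steps, such as filtering or grouping. The sketch below checks this on a small hypothetical data frame; `toy` and the `id` column are assumptions made for illustration.

```r
library(tidyr)
library(dplyr)

# Hypothetical toy data (an assumption for illustration)
toy <- data.frame(
  id      = 1:2,
  species = c("Adelie", "Gentoo"),
  island  = c("Torgersen", "Biscoe"),
  stringsAsFactors = FALSE
)

kept <- toy %>%
  unite(col = "species_island",
        c(species, island),
        remove = FALSE)

# The combined column is added and the inputs are still present
names(kept)

# The original column can still be used, e.g. for filtering
kept %>% filter(species == "Adelie")
```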

Combine two columns into a single column using a specific delimiter

We can also specify the delimiter used to combine the two columns instead of the default underscore. In the example below, we combine two columns with the unite() function, but with the delimiter "--" set by the sep="--" argument.

df %>% 
  unite(col="species_island",
        c(species, island), 
        sep="--") %>%
  head()

## # A tibble: 6 × 3
##   species_island    sex    body_mass_g
##   <chr>             <fct>        <int>
## 1 Adelie--Torgersen male          3750
## 2 Adelie--Torgersen female        3800
## 3 Adelie--Torgersen female        3250
## 4 Adelie--Torgersen <NA>            NA
## 5 Adelie--Torgersen female        3450
## 6 Adelie--Torgersen male          3650
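A practical reason to choose the delimiter carefully is that a distinctive one makes the operation easy to reverse. The following sketch, on hypothetical toy data, splits the combined column back apart with tidyr's separate(); note that separate() treats `sep` as a regular expression, and that newer tidyr releases mark it as superseded, although it is still available.

```r
library(tidyr)
library(dplyr)

# Hypothetical toy data (an assumption for illustration)
toy <- data.frame(
  species = c("Adelie", "Gentoo"),
  island  = c("Torgersen", "Biscoe"),
  stringsAsFactors = FALSE
)

# Combine with a delimiter that does not occur in the values
combined <- toy %>%
  unite(col = "species_island", c(species, island), sep = "--")

# Split the combined column back into its two parts
restored <- combined %>%
  separate(col = species_island,
           into = c("species", "island"),
           sep = "--")

restored
```

If the delimiter could also appear inside the values themselves, the split would be ambiguous, so a delimiter that is absent from the data is the safer choice.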

Combine more than two columns into a single column

To combine more than two columns, here three columns, we just need to select them using tidyselect syntax. In the example below, we give the three column names as a vector.

df %>% 
  unite(col="species_island_sex", 
        c(species, island, sex)) 

## # A tibble: 344 × 2
##    species_island_sex      body_mass_g
##    <chr>                         <int>
##  1 Adelie_Torgersen_male          3750
##  2 Adelie_Torgersen_female        3800
##  3 Adelie_Torgersen_female        3250
##  4 Adelie_Torgersen_NA              NA
##  5 Adelie_Torgersen_female        3450
##  6 Adelie_Torgersen_male          3650
##  7 Adelie_Torgersen_female        3625
##  8 Adelie_Torgersen_male          4675
##  9 Adelie_Torgersen_NA            3475
## 10 Adelie_Torgersen_NA            4250
## # … with 334 more rows
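Because unite() accepts tidyselect syntax, adjacent columns can also be selected as a range instead of being listed one by one. The sketch below, using a hypothetical toy data frame with the same column names as `df`, checks that both forms give the same result.

```r
library(tidyr)
library(dplyr)

# Hypothetical toy data mirroring the column names of df (values are made up)
toy <- data.frame(
  species     = c("Adelie", "Gentoo"),
  island      = c("Torgersen", "Biscoe"),
  sex         = c("male", "female"),
  body_mass_g = c(3750L, 5000L),
  stringsAsFactors = FALSE
)

# Columns listed explicitly as a vector
by_vector <- toy %>%
  unite(col = "species_island_sex", c(species, island, sex))

# The same adjacent columns selected as a range
by_range <- toy %>%
  unite(col = "species_island_sex", species:sex)

by_range$species_island_sex
```

The range form depends on column order, so listing the names explicitly is the more robust choice when the layout of the data frame may change.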

Dealing with missing values while combining multiple columns into a single column

If we look at the output of the previous example, where we combined three columns into one, we can see that NAs in the third column are represented as “NA” in the combined column. By default, the unite() function does not remove NAs.

We can drop missing values (NA) from the columns we are combining by specifying na.rm=TRUE. Here is an example of removing NAs.

Note that the fourth value of the combined column no longer contains NA.

df %>% 
  unite(col="species_island_sex", 
        c(species, island, sex),
        na.rm=TRUE) %>%
  head()

## # A tibble: 6 × 2
##   species_island_sex      body_mass_g
##   <chr>                         <int>
## 1 Adelie_Torgersen_male          3750
## 2 Adelie_Torgersen_female        3800
## 3 Adelie_Torgersen_female        3250
## 4 Adelie_Torgersen                 NA
## 5 Adelie_Torgersen_female        3450
## 6 Adelie_Torgersen_male          3650
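The difference between the two settings is easiest to see side by side. Here is a minimal sketch on a hypothetical toy data frame with one missing value; the data are made up for illustration.

```r
library(tidyr)
library(dplyr)

# Hypothetical toy data with a missing value in sex
toy <- data.frame(
  species = c("Adelie", "Adelie"),
  island  = c("Torgersen", "Torgersen"),
  sex     = c("male", NA),
  stringsAsFactors = FALSE
)

# Default: the missing value shows up as the text "NA"
with_na <- toy %>%
  unite(col = "species_island_sex", c(species, island, sex))

# na.rm = TRUE: the missing value and its delimiter are dropped
without_na <- toy %>%
  unite(col = "species_island_sex", c(species, island, sex),
        na.rm = TRUE)

with_na$species_island_sex[2]     # "Adelie_Torgersen_NA"
without_na$species_island_sex[2]  # "Adelie_Torgersen"
```

Note that with the default setting the combined value is an ordinary string containing "NA", so functions such as is.na() will no longer flag it as missing.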
