Join dataframes by different column names with dplyr

Join dataframes by different column names with dplyr
Join dataframes by different column names with dplyr

In this tutorial, we will learn how to join dataframes by columns with different names in the two dataframes.

dplyr offers a great set of tools to join or merge dataframes. We will use two approaches to merge two dataframe by different names.

Join dataframes by different column names with dplyr

We will start with loading tidyverse.

library(tidyverse)

Then we will create two toy dataframes such that the columns we would like to join them have different names.

Our first toy dataframe has two columns.

penguins1 <- tibble(species = c("Gentoo","Adelie", "Chinstrap"),
                    sex = c("male", "female", "male"))
penguins1

## # A tibble: 3 × 2
##   species   sex   
##   <chr>     <chr> 
## 1 Gentoo    male  
## 2 Adelie    female
## 3 Chinstrap male

And the toy dataframe has two columns. And the first column in the second dataframe refers to the same variable in the first dataframe, but with different name.

penguins2 <- tibble(penguin = c("Gentoo","Adelie", "Chinstrap"),
                    body_mass = c(5700, 4500, 3900))
penguins2

## # A tibble: 3 × 2
##   penguin   body_mass
##   <chr>         <dbl>
## 1 Gentoo         5700
## 2 Adelie         4500
## 3 Chinstrap      3900

Joining dataframes with different column names with “by =c()”

One of the ways to join or merge two dataframes using columns that have different names is to use by argument with c(). Within c(), we specify the name in the first dataframe equals the name in the second dataframe. Note that the column names are provided as a string, i.e. within double quotes.

penguins1 %>%
  left_join(penguins2, by = c("species"="penguin"))

## # A tibble: 3 × 3
##   species   sex    body_mass
##   <chr>     <chr>      <dbl>
## 1 Gentoo    male        5700
## 2 Adelie    female      4500
## 3 Chinstrap male        3900

Joining dataframes with different column names using “join_by”

Another way to join or merge two dataframes using columns that have different names is to use join_by() by. Within join_by(), we specify the name in the first dataframe and the name in the second dataframe with equality == symbol. Note that the column names are provided as a variable i.e. without double quotes.

penguins1 %>%
  full_join(penguins2, by = join_by(species==penguin))

## # A tibble: 3 × 3
##   species   sex    body_mass
##   <chr>     <chr>      <dbl>
## 1 Gentoo    male        5700
## 2 Adelie    female      4500
## 3 Chinstrap male        3900

In the above examples we showed how to join dataframes by different column names using left_join()/full_join(). We could also use other joins in dplyr with different names.

1 comment

Comments are closed.

Exit mobile version