In this tutorial, we will learn how to join dataframes by columns with different names in the two dataframes.
dplyr offers a great set of tools to join or merge dataframes. We will use two approaches to merge two dataframe by different names.
We will start with loading tidyverse.
library(tidyverse)
Then we will create two toy dataframes such that the columns we would like to join them have different names.
Our first toy dataframe has two columns.
penguins1 <- tibble(species = c("Gentoo","Adelie", "Chinstrap"), sex = c("male", "female", "male"))
penguins1 ## # A tibble: 3 × 2 ## species sex ## <chr> <chr> ## 1 Gentoo male ## 2 Adelie female ## 3 Chinstrap male
And the toy dataframe has two columns. And the first column in the second dataframe refers to the same variable in the first dataframe, but with different name.
penguins2 <- tibble(penguin = c("Gentoo","Adelie", "Chinstrap"), body_mass = c(5700, 4500, 3900))
penguins2 ## # A tibble: 3 × 2 ## penguin body_mass ## <chr> <dbl> ## 1 Gentoo 5700 ## 2 Adelie 4500 ## 3 Chinstrap 3900
Joining dataframes with different column names with “by =c()”
One of the ways to join or merge two dataframes using columns that have different names is to use by argument with c(). Within c(), we specify the name in the first dataframe equals the name in the second dataframe. Note that the column names are provided as a string, i.e. within double quotes.
penguins1 %>% left_join(penguins2, by = c("species"="penguin")) ## # A tibble: 3 × 3 ## species sex body_mass ## <chr> <chr> <dbl> ## 1 Gentoo male 5700 ## 2 Adelie female 4500 ## 3 Chinstrap male 3900
Joining dataframes with different column names using “join_by”
Another way to join or merge two dataframes using columns that have different names is to use join_by() by. Within join_by(), we specify the name in the first dataframe and the name in the second dataframe with equality == symbol. Note that the column names are provided as a variable i.e. without double quotes.
penguins1 %>% full_join(penguins2, by = join_by(species==penguin)) ## # A tibble: 3 × 3 ## species sex body_mass ## <chr> <chr> <dbl> ## 1 Gentoo male 5700 ## 2 Adelie female 4500 ## 3 Chinstrap male 3900
In the above examples we showed how to join dataframes by different column names using left_join()/full_join(). We could also use other joins in dplyr with different names.
[…] are joining. Often you might want to inner join two dataframes with two different column names. See this post to learn how to join two dataframes with different column […]