In this tutorial, we will learn how to add one or more columns to a dataframe using dplyr’s mutate() function. We will first see an example of creating a single new column in a dataframe and then see an example of adding multiple columns using the mutate() function.
First, we load tidyverse, the suite of R packages that includes dplyr.
library(tidyverse)
packageVersion("dplyr")
## [1] '1.0.9'
We will use a toy dataframe to illustrate creating columns using dplyr’s mutate().
df <- tibble(species = c("Adelie", "Chinstrap", "Gentoo"),
             body_mass = c(3700, 3733, 5076),
             bill_length = c(39, 49, 48))
df
## # A tibble: 3 × 3
##   species   body_mass bill_length
##   <chr>         <dbl>       <dbl>
## 1 Adelie         3700          39
## 2 Chinstrap      3733          49
## 3 Gentoo         5076          48
Adding a new column to a dataframe with dplyr’s mutate()
We can create a new column and add it to the dataframe using dplyr’s mutate(). In this example, we create a new column, body_mass_kg, from the existing body_mass column by dividing it by 1000.
df %>%
  mutate(body_mass_kg = body_mass/1000)
## # A tibble: 3 × 4
##   species   body_mass bill_length body_mass_kg
##   <chr>         <dbl>       <dbl>        <dbl>
## 1 Adelie         3700          39         3.7
## 2 Chinstrap      3733          49         3.73
## 3 Gentoo         5076          48         5.08
Here is a second example of creating a new column using dplyr’s mutate(), but this time the new column is not computed from one of the existing columns. Here we add a unique row number using dplyr’s row_number() function.
df %>%
  mutate(row_id = row_number())
## # A tibble: 3 × 4
##   species   body_mass bill_length row_id
##   <chr>         <dbl>       <dbl>  <int>
## 1 Adelie         3700          39      1
## 2 Chinstrap      3733          49      2
## 3 Gentoo         5076          48      3
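By default, mutate() appends a new column at the right edge of the dataframe. For an identifier such as a row number, it can be handy to have it come first. The sketch below assumes dplyr 1.0.0 or later, where mutate() accepts a .before argument; the name df_with_id is just an illustrative choice.

```r
library(dplyr)

df <- tibble(
  species = c("Adelie", "Chinstrap", "Gentoo"),
  body_mass = c(3700, 3733, 5076),
  bill_length = c(39, 49, 48)
)

# .before (assumes dplyr >= 1.0.0) controls where the new column lands;
# here row_id is placed ahead of the species column
df_with_id <- df %>%
  mutate(row_id = row_number(), .before = species)

# Inspect the resulting column order
names(df_with_id)
```

A matching .after argument places the new column after a given column instead.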
Adding multiple columns to a dataframe with dplyr’s mutate()
With the mutate() function, we can create multiple columns and add them to the dataframe. In this example, we create two new columns from two existing columns.
df %>%
  mutate(body_mass_kg = body_mass/1000) %>%
  mutate(bill_length_m = bill_length/1000)
## # A tibble: 3 × 5
##   species   body_mass bill_length body_mass_kg bill_length_m
##   <chr>         <dbl>       <dbl>        <dbl>         <dbl>
## 1 Adelie         3700          39         3.7          0.039
## 2 Chinstrap      3733          49         3.73         0.049
## 3 Gentoo         5076          48         5.08         0.048
Note that in the above example, we created the two new columns with two separate mutate() calls. However, we can combine them into a single mutate() call by separating the column definitions with commas. In the example below, we create both columns with a single mutate() call.
df %>%
  mutate(body_mass_kg = body_mass/1000,
         bill_length_m = bill_length/1000)
## # A tibble: 3 × 5
##   species   body_mass bill_length body_mass_kg bill_length_m
##   <chr>         <dbl>       <dbl>        <dbl>         <dbl>
## 1 Adelie         3700          39         3.7          0.039
## 2 Chinstrap      3733          49         3.73         0.049
## 3 Gentoo         5076          48         5.08         0.048
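To check that the chained and the combined forms really give the same result, we can compare them directly. The sketch below also illustrates that the expressions inside one mutate() call are evaluated in order, so a later expression can use a column created earlier in the same call. The intermediate column bill_length_cm is a hypothetical addition for illustration, and it assumes bill_length is recorded in millimeters.

```r
library(dplyr)

df <- tibble(
  species = c("Adelie", "Chinstrap", "Gentoo"),
  body_mass = c(3700, 3733, 5076),
  bill_length = c(39, 49, 48)
)

# Two chained mutate() calls versus one combined call
two_calls <- df %>%
  mutate(body_mass_kg = body_mass / 1000) %>%
  mutate(bill_length_m = bill_length / 1000)

one_call <- df %>%
  mutate(body_mass_kg = body_mass / 1000,
         bill_length_m = bill_length / 1000)

# Compare the two results as plain data frames
same_result <- isTRUE(all.equal(as.data.frame(two_calls),
                                as.data.frame(one_call)))
same_result

# A later expression can refer to a column defined earlier in the same call
# (bill_length_cm is a hypothetical intermediate column)
chained <- df %>%
  mutate(bill_length_cm = bill_length / 10,
         bill_length_m = bill_length_cm / 100)
chained
```

Which form to use is largely a matter of style; a single call keeps related column definitions together.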
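We can also update or change existing columns with the mutate() function. When the name on the left-hand side matches an existing column, mutate() overwrites that column instead of adding a new one. In the example below, we convert body mass to kilograms and bill length to meters by overwriting the existing body_mass and bill_length columns, so the result still has three columns.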
df %>%
  mutate(body_mass = body_mass/1000,
         bill_length = bill_length/1000)
## # A tibble: 3 × 3
##   species   body_mass bill_length
##   <chr>         <dbl>       <dbl>
## 1 Adelie         3.7        0.039
## 2 Chinstrap      3.73       0.049
## 3 Gentoo         5.08       0.048
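One point worth spelling out: mutate() returns a new dataframe and leaves df itself untouched, so the updated columns are lost unless the result is assigned to a name. The following is a minimal sketch of that behavior; df_converted is a hypothetical name chosen for illustration.

```r
library(dplyr)

df <- tibble(
  species = c("Adelie", "Chinstrap", "Gentoo"),
  body_mass = c(3700, 3733, 5076),
  bill_length = c(39, 49, 48)
)

# Assign the result to keep the overwritten columns
df_converted <- df %>%
  mutate(body_mass = body_mass / 1000,
         bill_length = bill_length / 1000)

# The original dataframe still holds the unconverted values
df$body_mass

# The new dataframe holds the converted values
df_converted$body_mass
```

To replace the original instead, assign the result back to the same name, for example df <- df %>% mutate(...).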