dplyr’s mutate(): How to create new columns

In this tutorial, we will learn how to add one or more columns to a dataframe using dplyr’s mutate() function. We will first create a single new column and then add multiple columns with mutate().

First, we load tidyverse, the suite of R packages that includes dplyr.

library(tidyverse)
packageVersion("dplyr")

## [1] '1.0.9'

We will use a toy dataframe to illustrate creating columns with dplyr’s mutate().

df <- tibble(species = c("Adelie", "Chinstrap", "Gentoo"),
             body_mass = c(3700, 3733, 5076),
             bill_length = c(39, 49, 48))
df

## # A tibble: 3 × 3
##   species   body_mass bill_length
##   <chr>         <dbl>       <dbl>
## 1 Adelie         3700          39
## 2 Chinstrap      3733          49
## 3 Gentoo         5076          48

Adding a new column to a dataframe with dplyr’s mutate()

We can create a new column and add it to the dataframe using dplyr’s mutate(). In this example, we compute the new column from an existing column: we divide body_mass by 1000 to get the body mass in kilograms.

df %>%
  mutate(body_mass_kg = body_mass/1000)

## # A tibble: 3 × 4
##   species   body_mass bill_length body_mass_kg
##   <chr>         <dbl>       <dbl>        <dbl>
## 1 Adelie         3700          39         3.7 
## 2 Chinstrap      3733          49         3.73
## 3 Gentoo         5076          48         5.08
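
One detail worth making explicit: mutate() returns a modified copy and does not change df itself. The sketch below assigns the result to a new object to keep the column. The name df_kg is only an illustrative choice, not something used elsewhere in this tutorial.

```r
library(dplyr)

# Same toy data as above
df <- tibble(species = c("Adelie", "Chinstrap", "Gentoo"),
             body_mass = c(3700, 3733, 5076),
             bill_length = c(39, 49, 48))

# mutate() returns a new dataframe; df is left untouched
df_kg <- df %>%
  mutate(body_mass_kg = body_mass / 1000)

names(df)     # original three columns only
names(df_kg)  # original columns plus body_mass_kg
```

To update df itself, assign the result back to the same name, for example df <- df %>% mutate(...).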

Here is a second example of creating a new column using dplyr’s mutate(), but this time the new column is not computed from any of the existing columns. Here we add a unique row number using dplyr’s row_number() function.

df %>%
  mutate(row_id = row_number())

## # A tibble: 3 × 4
##   species   body_mass bill_length row_id
##   <chr>         <dbl>       <dbl>  <int>
## 1 Adelie         3700          39      1
## 2 Chinstrap      3733          49      2
## 3 Gentoo         5076          48      3
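
Along the same lines, a new column can also come from values we supply directly. The sketch below assumes two made-up columns, mass_unit and measured, purely for illustration. A single value is recycled to every row, while a longer vector needs exactly one value per row.

```r
library(dplyr)

# Same toy data as above
df <- tibble(species = c("Adelie", "Chinstrap", "Gentoo"),
             body_mass = c(3700, 3733, 5076),
             bill_length = c(39, 49, 48))

# mass_unit and measured are hypothetical columns for illustration only
df_extra <- df %>%
  mutate(mass_unit = "g",                  # length-1 value, recycled to all rows
         measured = c(TRUE, FALSE, TRUE))  # made-up flags, one value per row
df_extra
```

Supplying a vector whose length is neither 1 nor the number of rows makes mutate() stop with an error.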

Adding multiple columns to a dataframe with dplyr’s mutate()

With the mutate() function, we can create multiple columns and add them to the dataframe. In this example, we create two new columns from two existing columns.

df %>%
  mutate(body_mass_kg = body_mass/1000) %>%
  mutate(bill_length_m = bill_length/1000)

## # A tibble: 3 × 5
##   species   body_mass bill_length body_mass_kg bill_length_m
##   <chr>         <dbl>       <dbl>        <dbl>         <dbl>
## 1 Adelie         3700          39         3.7          0.039
## 2 Chinstrap      3733          49         3.73         0.049
## 3 Gentoo         5076          48         5.08         0.048

Note that in the above example, we created the two new columns with two separate mutate() calls. However, we can combine them into a single mutate() call by separating the new columns with commas. In the example below, we create the same two columns using a single mutate() call.

df %>%
  mutate(body_mass_kg = body_mass/1000,
         bill_length_m = bill_length/1000)

## # A tibble: 3 × 5
##   species   body_mass bill_length body_mass_kg bill_length_m
##   <chr>         <dbl>       <dbl>        <dbl>         <dbl>
## 1 Adelie         3700          39         3.7          0.039
## 2 Chinstrap      3733          49         3.73         0.049
## 3 Gentoo         5076          48         5.08         0.048
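
Within a single mutate() call, the columns are created in order, so a later expression can use a column defined earlier in the same call. As a minimal sketch, the hypothetical body_mass_lb column below is computed from body_mass_kg. The column name and the rounded conversion factor are illustrative choices.

```r
library(dplyr)

# Same toy data as above
df <- tibble(species = c("Adelie", "Chinstrap", "Gentoo"),
             body_mass = c(3700, 3733, 5076),
             bill_length = c(39, 49, 48))

df_units <- df %>%
  mutate(body_mass_kg = body_mass / 1000,
         # body_mass_kg already exists at this point in the call;
         # 2.20462 is the approximate number of pounds per kilogram
         body_mass_lb = body_mass_kg * 2.20462)
df_units
```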

We can also update an existing column with mutate(). If the name on the left-hand side already exists in the dataframe, mutate() overwrites that column instead of adding a new one. In the example below, we convert body mass to kilograms and bill length to meters in place, so the result still has three columns.

df %>%
  mutate(body_mass = body_mass/1000,
         bill_length = bill_length/1000)

## # A tibble: 3 × 3
##   species   body_mass bill_length
##   <chr>         <dbl>       <dbl>
## 1 Adelie         3.7        0.039
## 2 Chinstrap      3.73       0.049
## 3 Gentoo         5.08       0.048
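
When the same transformation applies to several columns, dplyr’s across() (available since dplyr 1.0.0) can express it once instead of repeating it per column. This is an alternative sketch and not part of the example above. It should give the same values as overwriting both columns by hand.

```r
library(dplyr)

# Same toy data as above
df <- tibble(species = c("Adelie", "Chinstrap", "Gentoo"),
             body_mass = c(3700, 3733, 5076),
             bill_length = c(39, 49, 48))

# Divide both numeric columns by 1000, overwriting them in the result
df_scaled <- df %>%
  mutate(across(c(body_mass, bill_length), ~ .x / 1000))
df_scaled
```

Because across() keeps the original column names by default, the result has the same three columns as df.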