In this tutorial we will learn how to replace missing values (NA) in a column with a specific value, using two approaches. First, we will use the dplyr function mutate() together with base R's ifelse() to identify the element(s) with NA and replace them with a specific value. Next, we will use a base R approach to replace missing value(s) in a column.
To get started, let us load the packages we need.
```r
library(tidyverse)
```
Let us also create a simple data frame from scratch. Here we create a tibble, the tidyverse variant of a data frame, using the tribble() function.
```r
sales <- tibble::tribble(
  ~quarter, ~year, ~sales,
  "Q1",     2000,  6603,
  "Q2",     2000,  7182,
  "Q3",     2000,  8175,
  NA,       2000,  9001
)
```
The data frame has three columns, and the first column, quarter, has one missing value (NA) in the last row.
```r
sales
## # A tibble: 4 × 3
##   quarter  year sales
##   <chr>   <dbl> <dbl>
## 1 Q1       2000  6603
## 2 Q2       2000  7182
## 3 Q3       2000  8175
## 4 <NA>     2000  9001
```
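To check which columns contain missing values without reading the printout, you can count the NAs in each column. The sketch below is a minimal base R version; sales_df is a hypothetical data.frame copy of the sales tibble, built here only so the snippet runs on its own without the tidyverse.

```r
# Hypothetical base R copy of the sales tibble (the tutorial uses tribble())
sales_df <- data.frame(
  quarter = c("Q1", "Q2", "Q3", NA),
  year    = c(2000, 2000, 2000, 2000),
  sales   = c(6603, 7182, 8175, 9001)
)

# is.na() on a data frame returns a logical matrix of the same shape;
# colSums() adds up the TRUE values, i.e. the missing entries per column
na_counts <- colSums(is.na(sales_df))
print(na_counts)  # counts: quarter = 1, year = 0, sales = 0
```

The same call should also work directly on the tibble, since a tibble is a data frame.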
Replace NA in a column with a specific value using the tidyverse
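Let us say we want to replace the missing value with the specific value “Q4”. We can use the mutate() function to overwrite the quarter column with an updated version. Inside mutate(), the ifelse() function uses is.na() to find the missing element and returns “Q4” there, keeping the original value everywhere else. Note that the result below is only printed; assign it back (sales <- sales %>% mutate(...)) if you want to keep the change.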
```r
sales %>%
  mutate(quarter = ifelse(is.na(quarter), "Q4", quarter))
## # A tibble: 4 × 3
##   quarter  year sales
##   <chr>   <dbl> <dbl>
## 1 Q1       2000  6603
## 2 Q2       2000  7182
## 3 Q3       2000  8175
## 4 Q4       2000  9001
```
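The is.na()/ifelse() logic is easier to see on a plain character vector, outside of mutate(). This base R sketch uses a standalone vector named quarter, an illustrative stand-in for the tibble column, and shows the two steps separately.

```r
# Illustrative stand-in for the quarter column
quarter <- c("Q1", "Q2", "Q3", NA)

# Step 1: is.na() returns one logical value per element
missing <- is.na(quarter)
print(missing)  # FALSE FALSE FALSE TRUE

# Step 2: ifelse(test, yes, no) takes "Q4" where test is TRUE
# and the original element where test is FALSE
filled <- ifelse(missing, "Q4", quarter)
print(filled)  # "Q1" "Q2" "Q3" "Q4"
```

Inside mutate(), the name quarter refers to the tibble's column, so the call in the tutorial performs these same two steps on that column.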
Replace NA in a column with a specific value using base R
If we were to use base R to replace a missing value in a column, we would first identify the positions holding NA with the is.na() function and then assign the value of interest to those positions, as shown below. Unlike the mutate() example, this modifies sales in place.
```r
sales$quarter[is.na(sales$quarter)] <- "Q4"
sales
## # A tibble: 4 × 3
##   quarter  year sales
##   <chr>   <dbl> <dbl>
## 1 Q1       2000  6603
## 2 Q2       2000  7182
## 3 Q3       2000  8175
## 4 Q4       2000  9001
```
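If you need the base R approach for several columns or data frames, the indexing assignment can be wrapped in a small function that takes the column name as a string. fill_na_column() below is a hypothetical helper written for this sketch, not a function from base R or the tidyverse, and sales_df is a plain data.frame copy of the sales data.

```r
# Hypothetical helper (not part of base R or the tidyverse):
# replace NA in the column named `col` of data frame `df` with `value`
fill_na_column <- function(df, col, value) {
  missing <- is.na(df[[col]])   # logical index of the NA entries
  df[[col]][missing] <- value   # assign only at those positions
  df                            # return the modified copy
}

# Hypothetical base R copy of the sales data
sales_df <- data.frame(
  quarter = c("Q1", "Q2", "Q3", NA),
  year    = c(2000, 2000, 2000, 2000),
  sales   = c(6603, 7182, 8175, 9001)
)

sales_df <- fill_na_column(sales_df, "quarter", "Q4")
print(sales_df$quarter)  # "Q1" "Q2" "Q3" "Q4"
```

Because R copies on modify, the function leaves its input untouched; reassigning the result, as above, is what updates sales_df.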