In this post we will learn how to replace NAs, i.e. missing values, with zeros in a data frame in R. With tidyr’s replace_na() function, we can replace NAs in specific columns of a data frame with zero or any other specific value.
We will start by learning how to use replace_na() to replace NAs in a single column with zeros. Then we will see an example of replacing NAs with any specific value, followed by replacing NAs in multiple columns with a different value for each column. Finally, we will see how to use replace_na() to replace NAs in a vector.
Let us load tidyr, which is part of the tidyverse meta-package. We also check the version of tidyr used in the examples with the packageVersion() function.
library(tidyverse)
packageVersion("tidyr")
## [1] '1.2.0'
First, let us create a small data frame with two columns that contain missing values. We use the sample() function to create columns with random NAs.
set.seed(2022)
df <- tibble(group = sample(c("A", "B", NA_character_), size = 6, replace = TRUE),
             count = sample(c(rep(NA, 4), 18:20), size = 6, replace = TRUE))
Our data frame with missing values, or NAs, looks like this. The first column is of character type with four NAs and the second column is of integer type with three NAs.
df
## # A tibble: 6 × 2
##   group count
##   <chr> <int>
## 1 <NA>     19
## 2 B        NA
## 3 <NA>     NA
## 4 <NA>     NA
## 5 B        19
## 6 <NA>     20
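A quick way to check how many NAs each column holds is to tally them. The snippet below is a minimal sketch: it rebuilds the printed data frame by hand so that it does not depend on the random sample, and it uses base R's is.na() and colSums(). The name df_check is ours and is not part of the original example.

```r
library(tibble)

# Hand-built copy of the data frame printed above (df_check is an illustrative name)
df_check <- tibble(
  group = c(NA, "B", NA, NA, "B", NA),
  count = c(19L, NA, NA, NA, 19L, 20L)
)

# is.na() returns a logical matrix; colSums() counts the TRUE values per column
na_counts <- colSums(is.na(df_check))
na_counts
```

Running the same tally after a replacement is an easy way to confirm that no NAs remain in the column of interest.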
Replacing NAs in a column with Zeros
In the first example, we replace missing values in one column of a data frame with zeros using tidyr’s replace_na() function. We pass a named list() as the argument, where the name is the column and the value is the replacement.
df %>%
  replace_na(list(count = 0))
## # A tibble: 6 × 2
##   group count
##   <chr> <int>
## 1 <NA>     19
## 2 B         0
## 3 <NA>      0
## 4 <NA>      0
## 5 B        19
## 6 <NA>     20
Replacing NAs in a column with a specific value
Using tidyr’s replace_na() function, we can replace NAs in a column with any specific value. Here we again pass list() as the argument to replace_na(), this time to replace NAs with -1.
df %>%
  replace_na(list(count = -1))
## # A tibble: 6 × 2
##   group count
##   <chr> <int>
## 1 <NA>     19
## 2 B        -1
## 3 <NA>     -1
## 4 <NA>     -1
## 5 B        19
## 6 <NA>     20
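One thing to keep in mind is the type of the replacement value. As far as we know, starting with tidyr 1.2.0 (the version used here), replace_na() casts the replacement to the type of the column, so a value that cannot be cast, such as a character string for an integer column, results in an error rather than silently changing the column type. The sketch below illustrates this on a hand-built copy of the data frame. The name df_demo and the string "missing" are illustrative, and older tidyr versions may behave differently.

```r
library(tibble)
library(tidyr)

# Hand-built copy of the example data frame (df_demo is an illustrative name)
df_demo <- tibble(
  group = c(NA, "B", NA, NA, "B", NA),
  count = c(19L, NA, NA, NA, 19L, 20L)
)

# Try an incompatible replacement and capture the error instead of stopping
result <- tryCatch(
  replace_na(df_demo, list(count = "missing")),
  error = function(e) "error: replacement could not be cast to integer"
)
result
```

If a text label is really what you want, one option is to convert the column to character first, for example with as.character(), and then replace the NAs.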
Replacing NAs in multiple columns with specific value for each column
We can use tidyr’s replace_na() function with the list() argument to replace NAs in more than one column, each with its own value. In the example below, we replace the first column’s NAs with "unknown" and the second column’s NAs with -1.
df %>%
  replace_na(list(group = "unknown", count = -1))
## # A tibble: 6 × 2
##   group   count
##   <chr>   <int>
## 1 unknown    19
## 2 B          -1
## 3 unknown    -1
## 4 unknown    -1
## 5 B          19
## 6 unknown    20
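When many columns need the same replacement, listing each one in list() gets tedious. One alternative, sketched below and not part of the original example, is to combine replace_na() with dplyr's across() and where() to replace NAs in every numeric column at once. This assumes dplyr 1.0.0 or later. The data frame is rebuilt by hand so the snippet runs on its own, and df_demo and df_filled are illustrative names.

```r
library(dplyr)
library(tidyr)

# Hand-built copy of the example data frame (df_demo is an illustrative name)
df_demo <- tibble(
  group = c(NA, "B", NA, NA, "B", NA),
  count = c(19L, NA, NA, NA, 19L, 20L)
)

# Apply replace_na() to every numeric column; character columns are left untouched
df_filled <- df_demo %>%
  mutate(across(where(is.numeric), ~ replace_na(.x, 0)))
df_filled
```

With only one numeric column this gives the same result as replace_na(list(count = 0)); the approach pays off mainly on wider data frames.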
tidyr’s replace_na() to replace NAs in a vector
To replace NAs in a vector with zero or any other specific value, we pass the vector and the replacement value directly to replace_na(), without wrapping the replacement in list(). Here we use it in combination with the mutate() function to replace NAs in the count column.
df %>%
  mutate(count = replace_na(count, 0))
## # A tibble: 6 × 2
##   group count
##   <chr> <int>
## 1 <NA>     19
## 2 B         0
## 3 <NA>      0
## 4 <NA>      0
## 5 B        19
## 6 <NA>     20
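The same call also works on a standalone vector outside of a data frame. Here is a minimal sketch with a made-up integer vector x:

```r
library(tidyr)

# A small integer vector with two missing values (made up for illustration)
x <- c(1L, NA, 3L, NA)

# Replace each NA in the vector with zero
x_filled <- replace_na(x, 0L)
x_filled
## [1] 1 0 3 0
```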