How to replace NAs with zero in a dataframe

rstats101 · February 17, 2023

In this post we will learn how to replace NAs, i.e. missing values, with zeros in a data frame in R. With tidyr’s replace_na() function, we can replace NAs in specific columns of a data frame with zero or any other value.

We will start by using replace_na() to replace NAs in a single column with zeros. Then we will see examples of replacing NAs with another specific value, in one column and in several columns at once. Finally, we will see how to use replace_na() to replace NAs in a vector.

Let us load tidyr, which is part of the tidyverse meta-package. We also check the version of tidyr used in the examples with the packageVersion() function.

library(tidyverse)
packageVersion("tidyr")

## [1] '1.2.0'

First, let us create a small data frame with two columns that contain missing values. We use the sample() function to create columns with random NAs.

set.seed(2022)
df <- tibble(group=sample(c("A","B", NA_character_),
                          size = 6,
                          replace=TRUE ),
             count = sample(c(rep(NA, 4), 18:20),
                            size=6,
                            replace=TRUE))

Our data frame with missing values looks like this. The first column is of character type with four NAs, and the second column is of integer type with three NAs.

df
## # A tibble: 6 × 2
##   group count
##   <chr> <int>
## 1 <NA>     19
## 2 B        NA
## 3 <NA>     NA
## 4 <NA>     NA
## 5 B        19
## 6 <NA>     20
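
As a quick check on the NA counts described above, here is a minimal sketch that rebuilds the same data frame by hand from the printed output (rather than relying on sample()) and counts the missing values per column. The name na_counts is only for illustration.

```r
library(tibble)

# Data frame typed in by hand to match the printed output above
df <- tibble(
  group = c(NA, "B", NA, NA, "B", NA),
  count = c(19L, NA, NA, NA, 19L, 20L)
)

# is.na() returns a logical matrix; colSums() counts the TRUEs per column
na_counts <- colSums(is.na(df))
na_counts
```

With this hand-built copy, group should report four NAs and count three.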

Replacing NAs in a column with Zeros

In the first example, we replace missing values in one column of a data frame with zeros using tidyr’s replace_na() function. Its argument is a named list: the name is the column to fill and the value is the replacement.

df %>%
  replace_na(list(count=0))

## # A tibble: 6 × 2
##   group count
##   <chr> <int>
## 1 <NA>     19
## 2 B         0
## 3 <NA>      0
## 4 <NA>      0
## 5 B        19
## 6 <NA>     20
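
Note that replace_na() returns a new data frame and leaves the original untouched, so the result has to be assigned to keep it. The sketch below illustrates this with a hand-built copy of df; the name df_zero is an assumption made for the example.

```r
library(dplyr)
library(tidyr)

# Hand-built copy of the example data frame
df <- tibble(
  group = c(NA, "B", NA, NA, "B", NA),
  count = c(19L, NA, NA, NA, 19L, 20L)
)

# Assign the result; df itself is not modified
df_zero <- df %>%
  replace_na(list(count = 0))

anyNA(df_zero$count)  # no NAs left in the new data frame
anyNA(df$count)       # the original still has its NAs
```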

Replacing NAs in a column with a specific value

Using tidyr’s replace_na() function, we can replace NAs in a column with any specific value, not just zero. Here we again pass a named list to replace_na(), this time to replace the NAs in count with -1.

df %>%
  replace_na(list(count = -1))


## # A tibble: 6 × 2
##   group count
##   <chr> <int>
## 1 <NA>     19
## 2 B        -1
## 3 <NA>     -1
## 4 <NA>     -1
## 5 B        19
## 6 <NA>     20

Replacing NAs in multiple columns with specific value for each column

We can also use tidyr’s replace_na() function to replace NAs in more than one column, with a different value for each. In the example below, the list has one entry per column: NAs in the first column become "unknown" and NAs in the second column become -1.

df %>%
  replace_na(list(group="unknown",
                  count = -1))
## # A tibble: 6 × 2
##   group   count
##   <chr>   <int>
## 1 unknown    19
## 2 B          -1
## 3 unknown    -1
## 4 unknown    -1
## 5 B          19
## 6 unknown    20
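
When many columns need the same replacement, listing each one by name can get tedious. One possible alternative, which goes beyond what this post covers, is to combine replace_na() with dplyr’s across() and where(). The sketch below assumes dplyr 1.0.0 or later and uses a hand-built copy of df; the name df_num is only for illustration.

```r
library(dplyr)
library(tidyr)

# Hand-built copy of the example data frame
df <- tibble(
  group = c(NA, "B", NA, NA, "B", NA),
  count = c(19L, NA, NA, NA, 19L, 20L)
)

# Apply replace_na() to every numeric column; other columns are left alone
df_num <- df %>%
  mutate(across(where(is.numeric), ~ replace_na(.x, 0)))

df_num
```

Here only count is numeric, so the NAs in group are kept as they are.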

tidyr’s replace_na() to replace NAs in a vector

To replace NAs in a vector with zero or any other specific value, we call replace_na() on the vector itself and give the replacement as a single value rather than a list. Here we do this inside the mutate() function, where the column count is passed to replace_na() as a vector.

df %>%
  mutate(count=replace_na(count,0))

## # A tibble: 6 × 2
##   group count
##   <chr> <int>
## 1 <NA>     19
## 2 B         0
## 3 <NA>      0
## 4 <NA>      0
## 5 B        19
## 6 <NA>     20
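
The same call also works on a standalone vector outside of a data frame. A minimal sketch, with the vector x made up for illustration:

```r
library(tidyr)

# A plain integer vector with one missing value
x <- c(18L, NA, 20L)

# Replace the NA with an integer zero
x_filled <- replace_na(x, 0L)
x_filled
```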
