How to count the number of missing values per row in a dataframe

In this tutorial, we will learn how to count the number of missing values (NAs) in each row of a dataframe in R. We will see examples of counting NAs per row using four different approaches. The first two solutions use the tidyverse function rowwise() from dplyr. The next two use base R functions: rowSums(), called inside dplyr's mutate(), and apply().


To get started, let us load tidyverse, the meta-package that loads the core tidyverse packages.

library(tidyverse)

Create dataframe with NAs using simulated data

First, we will create a data frame that contains NAs. We use a vector with some NAs as the basis to sample data for each column.

Our vector has 20 elements, half of which are NAs.

x  <- c(1:10, rep(NA,10)) 
x

[1]  1  2  3  4  5  6  7  8  9 10 NA NA NA NA NA NA NA NA NA NA
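As a quick check, the share of NAs in this vector can be confirmed with is.na(), the same building block that every approach in this tutorial relies on. This is a minimal base R sketch; the names n_missing and share_missing are only illustrative.

```r
# Same vector as above: ten integers followed by ten NAs
x <- c(1:10, rep(NA, 10))

# is.na() returns a logical vector; TRUE counts as 1 when summed
n_missing <- sum(is.na(x))

# The mean of a logical vector is the proportion of TRUE values
share_missing <- mean(is.na(x))

n_missing       # 10
share_missing   # 0.5
```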

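We will sample five values from the vector for each column to create a toy dataframe with NAs. Note that sample() draws without replacement by default, and because the draw is random, your values will differ from the ones shown here. In this particular draw, every row of the dataframe has at least one missing value.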

df <- tibble(C1= sample(x,5),
             C2= sample(x,5),
             C3= sample(x,5))
df

# A tibble: 5 × 3
     C1    C2    C3
  <int> <int> <int>
1    NA    NA    NA
2    NA    NA     1
3    NA    NA    NA
4     8    NA     6
5     9     3    NA
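Because sample() is random, re-running the code above will give a different tibble. To follow along with exactly the values printed here, one option is to type the tibble in by hand. The sketch below does that; the name df_fixed is hypothetical and is used only to avoid overwriting df.

```r
library(tibble)

# Hard-coded copy of the tibble printed above, so results are reproducible.
# The L suffix keeps the columns integer, matching the <int> column type.
df_fixed <- tibble(
  C1 = c(NA, NA, NA, 8L, 9L),
  C2 = c(NA, NA, NA, NA, 3L),
  C3 = c(NA, 1L, NA, 6L, NA)
)

df_fixed

# Total number of NAs in the whole tibble
sum(is.na(df_fixed))   # 10
```

Assigning df <- df_fixed before running the later examples should reproduce the NA counts shown in this tutorial.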

Counting missing values (NA) per row using dplyr rowwise() function

The rowwise() function in dplyr lets you perform row-wise operations. By using rowwise(), we are basically grouping the dataframe by row. After grouping with rowwise(), we count the number of NAs in each row and add the count as a new column using mutate().

Since we want to check every column of our toy dataframe, we select all columns in a row using across(everything()).

df %>% 
  rowwise() %>%
  mutate(n_NAs = sum(is.na(across(everything()))))

# A tibble: 5 × 4
# Rowwise: 
     C1    C2    C3 n_NAs
  <int> <int> <int> <int>
1    NA    NA    NA     3
2    NA    NA     1     2
3    NA    NA    NA     3
4     8    NA     6     1
5     9     3    NA     1

Counting missing values (NA) per row using dplyr rowwise() function with cur_data()

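Another way to count the number of NAs in each row with rowwise() is to use dplyr's cur_data() function, which returns the data for the current group, here the current row. Note that cur_data() is deprecated as of dplyr 1.1.0 in favor of pick(), so newer versions will show a deprecation warning.

We count the number of NAs in each row and add it to the dataframe as a column.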

df %>% 
  rowwise() %>%
  mutate(n_NAs = sum(is.na(cur_data())))

# A tibble: 5 × 4
# Rowwise: 
     C1    C2    C3 n_NAs
  <int> <int> <int> <int>
1    NA    NA    NA     3
2    NA    NA     1     2
3    NA    NA    NA     3
4     8    NA     6     1
5     9     3    NA     1
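On dplyr 1.1.0 or later, the same row-wise count can be written with pick(everything()) in place of cur_data(). The sketch below assumes that dplyr version and uses a hand-typed copy of the toy tibble (df_fixed, a hypothetical name) so that it runs on its own. The trailing ungroup() is an optional extra that drops the row-wise grouping once the count is added.

```r
library(dplyr)
library(tibble)

# Hand-typed copy of the toy tibble, so the example is self-contained
df_fixed <- tibble(
  C1 = c(NA, NA, NA, 8L, 9L),
  C2 = c(NA, NA, NA, NA, 3L),
  C3 = c(NA, 1L, NA, 6L, NA)
)

# pick(everything()) returns the current row as a one-row tibble,
# so sum(is.na(...)) counts the NAs in that row
result <- df_fixed %>%
  rowwise() %>%
  mutate(n_NAs = sum(is.na(pick(everything())))) %>%
  ungroup()

result$n_NAs   # 3 2 3 1 1
```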

Counting missing values (NA) per row using rowSums() function in base R

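We can use the rowSums() function in base R in combination with mutate() to count the number of missing values (NAs) in each row of a dataframe.

Here “.” refers to the whole dataframe passed in by the pipe, not to a single row. is.na(.) returns a logical matrix with the same shape as the dataframe, and rowSums() adds up the TRUE values in each row. Because rowSums() returns a double, the new column is shown as <dbl> rather than <int>.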

df %>% 
  mutate(n_NAs = rowSums(is.na(.)))

# A tibble: 5 × 4
     C1    C2    C3 n_NAs
  <int> <int> <int> <dbl>
1    NA    NA    NA     3
2    NA    NA     1     2
3    NA    NA    NA     3
4     8    NA     6     1
5     9     3    NA     1
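To see why this works, it can help to look at the two steps separately. The sketch below uses only base R and a hand-typed data.frame with the same values as the toy tibble; the names df_base, na_flags and n_NAs are illustrative.

```r
# Base R copy of the toy data
df_base <- data.frame(
  C1 = c(NA, NA, NA, 8L, 9L),
  C2 = c(NA, NA, NA, NA, 3L),
  C3 = c(NA, 1L, NA, 6L, NA)
)

# Step 1: is.na() on a data frame returns a logical matrix of the same shape
na_flags <- is.na(df_base)
dim(na_flags)      # 5 3

# Step 2: rowSums() treats TRUE as 1 and FALSE as 0 and sums across each row
n_NAs <- rowSums(na_flags)
unname(n_NAs)      # 3 2 3 1 1

# rowSums() returns doubles; wrap in as.integer() if an integer column is wanted
is.double(n_NAs)   # TRUE
```

Since both steps operate on the whole matrix at once, no row-wise grouping is needed here.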

Counting missing values (NA) per row using apply() function in base R

Another base R solution is to call apply() on each row and count the NAs, as shown below.

df$n_NAs <- apply(df,1, function(x){sum(is.na(x))})
df

# A tibble: 5 × 4
     C1    C2    C3 n_NAs
  <int> <int> <int> <int>
1    NA    NA    NA     3
2    NA    NA     1     2
3    NA    NA    NA     3
4     8    NA     6     1
5     9     3    NA     1

Note that all four solutions share a common feature: they count NAs per row without an explicit for-loop.
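For comparison, here is roughly what an explicit for-loop version might look like, checked against the two base R approaches. This is only a sketch on a hand-typed copy of the toy data; df_base, n_loop, n_rowsums and n_apply are illustrative names.

```r
# Base R copy of the toy data
df_base <- data.frame(
  C1 = c(NA, NA, NA, 8L, 9L),
  C2 = c(NA, NA, NA, NA, 3L),
  C3 = c(NA, 1L, NA, 6L, NA)
)

# Explicit loop: visit each row and count its NAs
n_loop <- integer(nrow(df_base))
for (i in seq_len(nrow(df_base))) {
  n_loop[i] <- sum(is.na(df_base[i, ]))
}

# The loop-free base R versions from this tutorial
n_rowsums <- rowSums(is.na(df_base))
n_apply   <- apply(df_base, 1, function(row) sum(is.na(row)))

# All three give the same counts
all(n_loop == unname(n_rowsums))   # TRUE
all(n_loop == unname(n_apply))     # TRUE
```

The loop gives the same answer but needs more code and manual bookkeeping, which is why the loop-free versions are usually preferred.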
