Rstats 101

Learn R Programming Tips & Tricks for Statistics and Data Science


Remove rows with missing values using drop_na() in R

rstats101 · September 16, 2021 ·

In this tutorial, we will learn how to remove rows containing missing values using the drop_na() function from the tidyr package in R. drop_na(), part of the tidyverse, is a versatile function. First, we will see how to remove all rows that have at least one missing value. Then we will inspect a specific column and remove only the rows that have missing values in that column.
tidyr drop_na(): remove rows with missing values

First, let us load the tidyverse suite of R packages, which includes tidyr.

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
## ✓ tibble  3.1.2     ✓ dplyr   1.0.7
## ✓ tidyr   1.1.3     ✓ stringr 1.4.0
## ✓ readr   1.4.0     ✓ forcats 0.5.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

Next, let us create a sample data frame with missing values in several columns using the tibble() function from the tidyverse.

df <- tibble(col1 = letters[1:5], 
             col2 = c(10,20,NA,40,50), 
             col3 = c(10,NA,30,40,NA), 
             col4 = c(1:4,NA))

Our example dataframe contains four missing values, denoted as NA, spread across three columns (col2, col3 and col4) and three rows (rows 2, 3 and 5). Printing it shows where they are:

df
## # A tibble: 5 x 4
##   col1   col2  col3  col4
##   <chr> <dbl> <dbl> <int>
## 1 a        10    10     1
## 2 b        20    NA     2
## 3 c        NA    30     3
## 4 d        40    40     4
## 5 e        50    NA    NA
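To check the claim about where the NAs sit without reading the printout by eye, we can count them with base R's is.na(), which returns a logical matrix for a data frame. This is a minimal sketch that re-creates the same df so it runs on its own.

```r
library(tibble)

# Same example data frame as above
df <- tibble(col1 = letters[1:5],
             col2 = c(10, 20, NA, 40, 50),
             col3 = c(10, NA, 30, 40, NA),
             col4 = c(1:4, NA))

# Total number of missing values in the data frame
n_missing <- sum(is.na(df))
n_missing                     # 4

# Missing values per column
na_per_col <- colSums(is.na(df))
na_per_col                    # col1 0, col2 1, col3 2, col4 1

# Row indices that contain at least one NA
rows_with_na <- which(rowSums(is.na(df)) > 0)
rows_with_na                  # 2 3 5
```

These are exactly the three rows that drop_na() will remove when called without any column arguments.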

We can use tidyr’s drop_na() function to drop all rows with missing values. The result is a dataframe containing only the two rows with no missing values.

df %>% 
  tidyr::drop_na()

## # A tibble: 2 x 4
##   col1   col2  col3  col4
##   <chr> <dbl> <dbl> <int>
## 1 a        10    10     1
## 2 d        40    40     4
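Called without column arguments, drop_na() keeps the rows that are complete in every column. As a sketch for comparison, the base R function complete.cases() (from the stats package) flags those same rows, so subsetting with it should give the same two rows. The df below re-creates the example data frame.

```r
library(tibble)
library(tidyr)

df <- tibble(col1 = letters[1:5],
             col2 = c(10, 20, NA, 40, 50),
             col3 = c(10, NA, 30, 40, NA),
             col4 = c(1:4, NA))

# TRUE for rows with no NA in any column
keep <- complete.cases(df)
keep                          # TRUE FALSE FALSE TRUE FALSE

# Base R subsetting versus tidyr's drop_na()
base_result <- df[keep, ]
tidy_result <- drop_na(df)

# Both keep rows "a" and "d"
base_result$col1
tidy_result$col1
```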

In the above example, we used magrittr’s pipe operator %>% to feed the dataframe to the drop_na() function. We can also provide the data frame directly as an argument to drop_na() to get the same result.

tidyr::drop_na(df)

## # A tibble: 2 x 4
##   col1   col2  col3  col4
##   <chr> <dbl> <dbl> <int>
## 1 a        10    10     1
## 2 d        40    40     4

If the tidyr package is loaded (here through library(tidyverse)), we can call drop_na() directly without the tidyr:: prefix.

df %>% 
   drop_na()

## # A tibble: 2 x 4
##   col1   col2  col3  col4
##   <chr> <dbl> <dbl> <int>
## 1 a        10    10     1
## 2 d        40    40     4

Remove rows based on a column’s missing values using drop_na() in R

By default, the drop_na() function removes every row that has an NA in any column. Sometimes you might want to remove rows based only on the missing values in a particular column.

tidyr’s drop_na() can take one or more columns as input and drops only the rows that have missing values in the specified columns. For example, here we remove rows based on missing values in the third column, col3. Note that the resulting dataframe still has a missing value in its second row, in col2.

df %>% 
  drop_na(col3)

## # A tibble: 3 x 4
##   col1   col2  col3  col4
##   <chr> <dbl> <dbl> <int>
## 1 a        10    10     1
## 2 c        NA    30     3
## 3 d        40    40     4
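drop_na() also accepts several columns at once, and because it uses tidyselect for its column arguments, helpers such as starts_with() work too. The sketch below re-creates the example df; the choice of col2 and col4 is just for illustration.

```r
library(tibble)
library(tidyr)

df <- tibble(col1 = letters[1:5],
             col2 = c(10, 20, NA, 40, 50),
             col3 = c(10, NA, 30, 40, NA),
             col4 = c(1:4, NA))

# Drop rows with an NA in col2 or col4; NAs in col3 are ignored
two_cols <- drop_na(df, col2, col4)
two_cols                      # keeps rows a, b and d (row b still has NA in col3)

# Select columns with a tidyselect helper: every column starting with "col",
# which here covers all columns and so matches drop_na(df)
all_cols <- drop_na(df, starts_with("col"))
all_cols                      # keeps rows a and d
```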

There is always more than one solution to a problem. We can also remove rows with missing values using the base R function na.omit() from the stats package, which ships with base R.

Check this post to learn how to use na.omit() to remove rows with missing values in a data frame or a matrix.
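As a quick sketch, na.omit() applied to the example data frame (re-created below) keeps the same two complete rows as drop_na(). Unlike drop_na(), it also records the indices of the dropped rows in an "na.action" attribute.

```r
library(tibble)

df <- tibble(col1 = letters[1:5],
             col2 = c(10, 20, NA, 40, 50),
             col3 = c(10, NA, 30, 40, NA),
             col4 = c(1:4, NA))

# Remove every row containing at least one NA
clean <- na.omit(df)
clean                         # rows a and d

# Indices of the rows that were dropped
dropped <- attr(clean, "na.action")
as.integer(dropped)           # 2 3 5
```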
