Remove rows with missing values using na.omit() in R

Dealing with missing values is a common task in data cleaning and analysis. In R, missing values are typically represented as NA. Often you might want to remove rows containing missing values from a dataframe or a matrix. In this tutorial we will learn how to remove rows containing missing values using the na.omit() function from the stats package, which ships with base R.


First we will learn how to remove rows with missing values from a dataframe, and then we will learn how to use the na.omit() function to remove rows with NA from a matrix.

Create Data with missing values

Let us create a sample dataframe with some missing values. We will use the data.frame() function available in base R to create a simple dataframe from scratch.

df <- data.frame(col1 = letters[1:5], 
                 col2 = c(1,2,NA,4,5), 
                 col3 = c(1:4,NA), 
                 col4 = 1:5)

In this example we have created a data frame with five rows, two of which (rows 3 and 5) contain a missing value NA.

df
##   col1 col2 col3 col4
## 1    a    1    1    1
## 2    b    2    2    2
## 3    c   NA    3    3
## 4    d    4    4    4
## 5    e    5   NA    5
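
Before removing anything, it can help to check where the missing values are. The following is a minimal sketch using is.na(), colSums() and complete.cases() from base R on the same sample data. The variable names na_per_column and rows_with_na are only illustrative.

```r
# Recreate the sample data frame from above
df <- data.frame(col1 = letters[1:5],
                 col2 = c(1, 2, NA, 4, 5),
                 col3 = c(1:4, NA),
                 col4 = 1:5)

# Count the NA values in each column
# (col2 and col3 have one each, col1 and col4 have none)
na_per_column <- colSums(is.na(df))
na_per_column

# Positions of the rows that contain at least one NA (rows 3 and 5)
rows_with_na <- which(!complete.cases(df))
rows_with_na
```

These are the rows that na.omit() will drop.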

Removing rows with missing values in a data frame

We can remove rows containing one or more missing values NA using the na.omit() function in R. Applying na.omit() to the data frame gives a new dataframe with three rows, after removing the two rows with missing values. Note that the remaining rows keep their original row names (1, 2 and 4).

na.omit(df)

##   col1 col2 col3 col4
## 1    a    1    1    1
## 2    b    2    2    2
## 4    d    4    4    4
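
As a small sketch of what we can do with the result, the code below stores the cleaned data frame, looks up which rows were dropped, and renumbers the remaining rows. It also shows that subsetting with complete.cases() keeps the same rows. The names cleaned, dropped_rows, renumbered and same_rows are only illustrative.

```r
# Recreate the sample data frame from above
df <- data.frame(col1 = letters[1:5],
                 col2 = c(1, 2, NA, 4, 5),
                 col3 = c(1:4, NA),
                 col4 = 1:5)

# Keep only the rows without any NA
cleaned <- na.omit(df)
nrow(cleaned)

# na.omit() records the dropped rows in the "na.action" attribute
dropped_rows <- as.integer(attr(cleaned, "na.action"))
dropped_rows

# Optionally renumber the remaining rows as 1, 2, 3
renumbered <- cleaned
rownames(renumbered) <- NULL
renumbered

# Subsetting with complete.cases() keeps the same rows
same_rows <- df[complete.cases(df), ]
identical(rownames(cleaned), rownames(same_rows))
```

The complete.cases() version does not attach the na.action attribute, so the two results contain the same rows but are not strictly identical objects.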

Removing rows with missing values in a matrix

The na.omit() function in R can also be used to remove rows containing missing values NA from a matrix object. Here we create a matrix using the numerical columns of the above dataframe.

data_matrix <- as.matrix(df[,2:4])
data_matrix
##      col2 col3 col4
## [1,]    1    1    1
## [2,]    2    2    2
## [3,]   NA    3    3
## [4,]    4    4    4
## [5,]    5   NA    5

Our matrix has three columns and five rows, but two of the rows have missing values NA. By applying na.omit() to the matrix we get a new matrix with no missing values in any of the rows. In other words, na.omit() removes the two rows containing missing values. The result also carries an na.action attribute that records the positions of the removed rows (3 and 5).

na.omit(data_matrix)
##      col2 col3 col4
## [1,]    1    1    1
## [2,]    2    2    2
## [3,]    4    4    4
## attr(,"na.action")
## [1] 3 5
## attr(,"class")
## [1] "omit"
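
The extra attr lines in the output above come from the na.action attribute. If we only want the plain matrix, one option is to save the removed row positions and then drop the attribute. This is a minimal sketch, and the names clean_matrix and dropped are only illustrative.

```r
# Recreate the sample data and the matrix from above
df <- data.frame(col1 = letters[1:5],
                 col2 = c(1, 2, NA, 4, 5),
                 col3 = c(1:4, NA),
                 col4 = 1:5)
data_matrix <- as.matrix(df[, 2:4])

# Remove the rows with NA
clean_matrix <- na.omit(data_matrix)

# Save the positions of the removed rows before dropping the attribute
dropped <- as.integer(attr(clean_matrix, "na.action"))
dropped

# Drop the attribute so the matrix prints without the attr lines
attr(clean_matrix, "na.action") <- NULL
clean_matrix
```

Keeping the attribute does no harm, so dropping it is purely a matter of tidier printed output.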


	