Dealing with missing values is a common task in data cleaning and analysis. In R, missing values are typically represented as NA. Often you will want to remove rows containing missing values from a data frame or a matrix. In this tutorial we will learn how to remove rows containing missing values using the na.omit() function, available in the stats package that ships with base R.
First we will learn how to remove rows with missing values from a data frame, and then we will use na.omit() to remove rows with NA from a matrix.
Creating data with missing values
Let us create a sample data frame with some missing values. We will use the data.frame() function, available in base R, to create a simple data frame from scratch.
df <- data.frame(col1 = letters[1:5], col2 = c(1,2,NA,4,5), col3 = c(1:4,NA), col4 = 1:5)
In this example we have created a data frame with five rows, two of which (rows 3 and 5) contain a missing value NA.
df
##   col1 col2 col3 col4
## 1    a    1    1    1
## 2    b    2    2    2
## 3    c   NA    3    3
## 4    d    4    4    4
## 5    e    5   NA    5
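Before removing anything, it can help to confirm where the missing values are. The following is a minimal sketch, assuming the same sample data frame as above; the variable names na_per_column and rows_with_na are only illustrative. It uses is.na() and colSums() from base R and complete.cases() from the stats package.

```r
# Rebuild the sample data frame so this snippet runs on its own
df <- data.frame(col1 = letters[1:5], col2 = c(1, 2, NA, 4, 5),
                 col3 = c(1:4, NA), col4 = 1:5)

# Count the missing values in each column
# (is.na() on a data frame returns a logical matrix)
na_per_column <- colSums(is.na(df))
na_per_column

# Indices of the rows that contain at least one NA
rows_with_na <- which(!complete.cases(df))
rows_with_na
```

For this data, col2 and col3 each hold one NA, and the affected rows are 3 and 5.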
Removing rows with missing values in a data frame
We can remove rows containing one or more missing values using the na.omit() function in R. Applying na.omit() to the data frame returns a new data frame with three rows, after dropping the two rows with missing values. Note that the remaining rows keep their original row names (1, 2 and 4).
na.omit(df)
##   col1 col2 col3 col4
## 1    a    1    1    1
## 2    b    2    2    2
## 4    d    4    4    4
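As a cross-check, the same rows can be selected by subsetting with complete.cases(). This is a small sketch rather than a recommended replacement; the names cleaned and cleaned_subset are illustrative, and the data frame is rebuilt so the snippet runs on its own.

```r
# Rebuild the sample data frame so this snippet runs on its own
df <- data.frame(col1 = letters[1:5], col2 = c(1, 2, NA, 4, 5),
                 col3 = c(1:4, NA), col4 = 1:5)

# Drop every row that has at least one NA
cleaned <- na.omit(df)

# Keep only the complete rows via logical subsetting
cleaned_subset <- df[complete.cases(df), ]

# Both approaches keep the same three rows, with their original row names
nrow(cleaned)
rownames(cleaned)
identical(rownames(cleaned), rownames(cleaned_subset))

# na.omit() additionally records which rows it dropped
dropped_rows <- as.integer(attr(cleaned, "na.action"))
dropped_rows
```

The main practical difference is that the result of na.omit() carries an "na.action" attribute listing the dropped rows, while plain subsetting does not.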
Removing rows with missing values in a matrix
na.omit() can also be used to remove rows containing missing values from a matrix. Here we create a matrix from the numeric columns of the data frame above.
data_matrix <- as.matrix(df[,2:4])
data_matrix
##      col2 col3 col4
## [1,]    1    1    1
## [2,]    2    2    2
## [3,]   NA    3    3
## [4,]    4    4    4
## [5,]    5   NA    5
Our matrix has three columns and five rows, and two of the rows contain a missing value NA. Applying na.omit() to the matrix returns a new matrix in which those two rows have been removed, so no row contains a missing value. The result also carries an "na.action" attribute that records the indices of the removed rows (3 and 5).
na.omit(data_matrix)
##      col2 col3 col4
## [1,]    1    1    1
## [2,]    2    2    2
## [3,]    4    4    4
## attr(,"na.action")
## [1] 3 5
## attr(,"class")
## [1] "omit"
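The extra attr(,"na.action") lines in the printed output are bookkeeping that na.omit() attaches to its result. One way to work with them is sketched below, assuming the same sample data as above: read the dropped row indices from the attribute, then clear the attribute if a plain matrix is preferred. The names clean_matrix and dropped_rows are illustrative.

```r
# Rebuild the sample data so this snippet runs on its own
df <- data.frame(col1 = letters[1:5], col2 = c(1, 2, NA, 4, 5),
                 col3 = c(1:4, NA), col4 = 1:5)
data_matrix <- as.matrix(df[, 2:4])

# Remove the rows that contain NA
clean_matrix <- na.omit(data_matrix)

# Indices of the rows that were dropped, read from the attribute
dropped_rows <- as.integer(attr(clean_matrix, "na.action"))
dropped_rows

# Clear the attribute to get a plain matrix that prints without the extra lines
attr(clean_matrix, "na.action") <- NULL
clean_matrix
```

Clearing the attribute is optional; it only affects how the matrix prints and whether the record of dropped rows travels with it.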