First let us load tidyverse meta package.
library(tidyverse)
Create a dataframe with a column of all NAs
To create a dataframe with missing values we use a vector with more missing values than non-missing values.
x <- c(1:5,rep(NA,10)) x [1] 1 2 3 4 5 NA NA NA NA NA NA NA NA NA NA
And create a toy dataframe with 3 columns and 5 rows with one of the columns is all NAs.
set.seed(2022) df <- tibble(C1= sample(x,5), C2= sample(x,5), C3= sample(x,5)) df # A tibble: 5 × 3 C1 C2 C3 <int> <int> <int> 1 4 NA NA 2 3 NA NA 3 NA NA NA 4 NA NA 2 5 NA NA NA
Removing columns with all NAs with tidyverse
Using tidyverse approach we remove one or more columns with all NAs using select() function. Here instead of selecting columns by names, we select columns that are all NAs. We use an anonymous function to find if a column is all NAs.
df %>% select(where(function(x) any(!is.na(x)))) # A tibble: 5 × 2 C1 C3 <int> <int> 1 4 NA 2 3 NA 3 NA NA 4 NA 2 5 NA NA
In the above example we have one column with all NAs. Here is the second example where we remove multiple columns with all NAs
set.seed(2202) x <- c(1:3,rep(NA,10)) df2 <- tibble(C1= sample(x,5), C2= sample(x,5), C3= sample(x,5), C4= sample(x,5), C5= sample(x,5)) df2 # A tibble: 5 × 5 C1 C2 C3 C4 C5 <int> <int> <int> <int> <int> 1 3 NA NA NA NA 2 NA NA NA NA 3 3 NA NA NA NA NA 4 NA NA NA NA 1 5 NA NA 1 NA NA
Our dataframe has two columns with all NAs.
df2 %>% select(where(function(x) any(!is.na(x)))) # A tibble: 5 × 3 C1 C3 C5 <int> <int> <int> 1 3 NA NA 2 NA NA 3 3 NA NA NA 4 NA NA 1 5 NA 1 NA
Removing columns with all NAs use base R
To remove columns with all NAs using base R approach, we first compute the number of missing values per column using apply() function.
n_NAs <- apply(df, 2, function(x){sum(is.na(x))}) n_NAs C1 C2 C3 3 5 4
Then we select columns with fewer NAs by checking if the number of NAs is smaller than the number of rows.
df[ ,n_NAs < nrow(df)] # A tibble: 5 × 2 C1 C3 <int> <int> 1 4 NA 2 3 NA 3 NA NA 4 NA 2 5 NA NA
As before now we see an example of using base R approach to remove multiple columns with all NAs. In this example, we use the dataframe with two columns of all NAs and remove them both using base R approach.
n_NAs <- apply(df2, 2, function(x){sum(is.na(x))}) n_NAs C1 C2 C3 C4 C5 4 5 4 5 3
df2[ ,n_NAs < nrow(df2)] # A tibble: 5 × 3 C1 C3 C5 <int> <int> <int> 1 3 NA NA 2 NA NA 3 3 NA NA NA 4 NA NA 1 5 NA 1 NA