
Rstats 101

Learn R Programming Tips & Tricks for Statistics and Data Science


How to remove columns with all NAs

rstats101 · October 14, 2022

In this tutorial, we will learn how to drop columns whose values are all NAs. We will use two approaches. First, we will use a tidyverse approach, where we perform a column-wise check of whether all values are NA and select only the columns that are not all NAs. Next, we will use a base R approach: we count the number of NAs per column with the apply() function and keep only the columns that are not all NAs.

Remove columns with all NAs

First, let us load the tidyverse meta-package.

library(tidyverse)

Create a dataframe with a column of all NAs

To create a dataframe with missing values, we first create a vector with more missing values than non-missing values, so that values sampled from it are mostly NAs.

x <- c(1:5, rep(NA, 10))
x

[1]  1  2  3  4  5 NA NA NA NA NA NA NA NA NA NA
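Why does this vector produce all-NA columns so often? By default, sample() draws without replacement, so a sampled column of 5 values is all NA only when all 5 draws land on the 10 NA positions out of 15. As a rough worked check (assuming the default replace = FALSE), that probability is choose(10, 5) / choose(15, 5), and a quick simulation should land close to it.

```r
# Vector from above: 5 numbers followed by 10 NAs
x <- c(1:5, rep(NA, 10))

# Exact probability that sample(x, 5) (without replacement) is all NA:
# ways to pick 5 of the 10 NA positions / ways to pick 5 of all 15 positions
p_exact <- choose(10, 5) / choose(15, 5)
p_exact
#> [1] 0.08391608

# Monte Carlo check: draw many 5-value columns and count how often they are all NA
set.seed(1)
all_na <- replicate(10000, all(is.na(sample(x, 5))))
mean(all_na)
```

So each sampled column has roughly an 8% chance of being entirely NA; which columns end up all NA in the toy data depends on the seed.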

Next, we create a toy dataframe with 3 columns and 5 rows by sampling 5 values from x for each column. With this seed, one of the columns (C2) is all NAs.

set.seed(2022)
df <- tibble(C1 = sample(x, 5),
             C2 = sample(x, 5),
             C3 = sample(x, 5))
df

# A tibble: 5 × 3
     C1    C2    C3
  <int> <int> <int>
1     4    NA    NA
2     3    NA    NA
3    NA    NA    NA
4    NA    NA     2
5    NA    NA    NA
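Before dropping anything, it can help to confirm which columns are entirely missing. A minimal base R sketch: is.na() on a data frame returns a logical matrix, colSums() counts the TRUE values per column, and a column is all NA when that count equals nrow(). The data frame below is an assumption for illustration: a hard-coded copy of the printed df, rebuilt with data.frame() so the snippet runs without the seed or the tidyverse.

```r
# Hard-coded copy of the toy df printed above (self-contained, no seed needed)
df <- data.frame(C1 = c(4L, 3L, NA, NA, NA),
                 C2 = rep(NA_integer_, 5),
                 C3 = c(NA, NA, NA, 2L, NA))

# TRUE for columns in which every value is NA
is_all_na <- colSums(is.na(df)) == nrow(df)

# Names of the all-NA columns
names(df)[is_all_na]
#> [1] "C2"
```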

Removing columns with all NAs using tidyverse

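With the tidyverse approach, we remove one or more columns with all NAs using the select() function together with the where() helper. Instead of selecting columns by name, we select the columns that are not all NAs, i.e. the columns with at least one non-missing value. We use the anonymous function function(x) any(!is.na(x)), which returns TRUE for a column that contains at least one non-NA value and FALSE for a column that is all NAs.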

df %>%
  select(where(function(x) any(!is.na(x))))

# A tibble: 5 × 2
     C1    C3
  <int> <int>
1     4    NA
2     3    NA
3    NA    NA
4    NA     2
5    NA    NA

In the above example, we had one column with all NAs. In the second example, we remove multiple columns with all NAs at once.

set.seed(2202)
x <- c(1:3, rep(NA, 10))
df2 <- tibble(C1 = sample(x, 5),
              C2 = sample(x, 5),
              C3 = sample(x, 5),
              C4 = sample(x, 5),
              C5 = sample(x, 5))
df2

# A tibble: 5 × 5
     C1    C2    C3    C4    C5
  <int> <int> <int> <int> <int>
1     3    NA    NA    NA    NA
2    NA    NA    NA    NA     3
3    NA    NA    NA    NA    NA
4    NA    NA    NA    NA     1
5    NA    NA     1    NA    NA

This dataframe has two columns with all NAs, C2 and C4. The same select() call removes both of them.

df2 %>%
  select(where(function(x) any(!is.na(x))))

# A tibble: 5 × 3
     C1    C3    C5
  <int> <int> <int>
1     3    NA    NA
2    NA    NA     3
3    NA    NA    NA
4    NA    NA     1
5    NA     1    NA
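The function passed to where() is simply a predicate that returns one TRUE or FALSE per column. any(!is.na(x)) is equivalent to !all(is.na(x)), and where() also accepts a purrr-style formula, so the same selection can be written more compactly. A sketch, assuming dplyr (1.0 or later, which re-exports where()) is installed, and using a hard-coded copy of the printed df2 rather than the sampled one:

```r
library(dplyr)

# Hard-coded copy of df2 printed above (C2 and C4 are all NA)
df2 <- tibble(C1 = c(3L, NA, NA, NA, NA),
              C2 = NA_integer_,          # length-1 value is recycled to 5 rows
              C3 = c(NA, NA, NA, NA, 1L),
              C4 = NA_integer_,
              C5 = c(NA, 3L, NA, 1L, NA))

# The predicate on single vectors: TRUE if at least one value is present
any(!is.na(c(NA, NA)))  # FALSE: such a column would be dropped
any(!is.na(c(NA, 2)))   # TRUE: such a column would be kept

# Equivalent formula-style predicate: keep columns that are not all NA
kept <- df2 %>%
  select(where(~ !all(is.na(.x))))
names(kept)
#> [1] "C1" "C3" "C5"
```

On R 4.1 or later, the shorthand lambda \(x) !all(is.na(x)) works in the same place.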

Removing columns with all NAs using base R

To remove columns with all NAs using base R, we first compute the number of missing values in each column with the apply() function, where the second argument (MARGIN = 2) applies the function over columns.

n_NAs <- apply(df, 2, function(x){sum(is.na(x))})
n_NAs

C1 C2 C3 
 3  5  4 
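apply() works here, but it first converts the data frame to a matrix. A vectorised base R alternative that should give the same counts is colSums(is.na(df)). A minimal sketch, again on a hard-coded copy of the toy df (an assumption so the snippet is self-contained):

```r
# Hard-coded copy of the toy df printed above
df <- data.frame(C1 = c(4L, 3L, NA, NA, NA),
                 C2 = rep(NA_integer_, 5),
                 C3 = c(NA, NA, NA, 2L, NA))

# is.na() on a data frame gives a logical matrix; colSums() counts TRUEs per column
n_NAs <- colSums(is.na(df))
n_NAs
#> C1 C2 C3 
#>  3  5  4 
```

Unlike the apply() version, colSums() returns doubles rather than integers, which makes no difference for the comparison with nrow() that follows.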

Then we keep only the columns whose number of NAs is smaller than the number of rows, i.e. the columns that contain at least one non-missing value.

df[, n_NAs < nrow(df)]

# A tibble: 5 × 2
     C1    C3
  <int> <int>
1     4    NA
2     3    NA
3    NA    NA
4    NA     2
5    NA    NA
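One caveat: df here is a tibble, and tibbles never collapse to a vector when you index columns. With a plain data.frame, df[, idx] returns a bare vector if exactly one column survives, because the drop argument of [ defaults to TRUE. A minimal sketch with a hypothetical data frame d, showing why drop = FALSE is the safer choice:

```r
# Hypothetical data frame where only one column is not entirely NA
d <- data.frame(a = c(1, NA, 3),
                b = c(NA, NA, NA),
                c = c(NA, NA, NA))

keep <- colSums(is.na(d)) < nrow(d)

# Default drop = TRUE: the single surviving column collapses to a numeric vector
str(d[, keep])

# drop = FALSE keeps the result a one-column data frame
res <- d[, keep, drop = FALSE]
str(res)
```

List-style indexing, d[keep], also always returns a data frame.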

As before, let us now use the base R approach to remove multiple columns with all NAs. In this example, we use df2, the dataframe with two columns of all NAs, and remove both of them.

n_NAs <- apply(df2, 2,
               function(x){sum(is.na(x))})
n_NAs

C1 C2 C3 C4 C5 
 4  5  4  5  3 

df2[, n_NAs < nrow(df2)]

# A tibble: 5 × 3
     C1    C3    C5
  <int> <int> <int>
1     3    NA    NA
2    NA    NA     3
3    NA    NA    NA
4    NA    NA     1
5    NA     1    NA
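Since the same two steps are repeated for df and df2, they can be wrapped into a small helper. drop_all_na_cols() below is a hypothetical name, not a function from base R or the tidyverse; it is a sketch that counts non-missing values per column and uses drop = FALSE so it also behaves well on plain data frames. The df2 used here is a hard-coded copy of the printed one.

```r
# Hypothetical helper: remove every column whose values are all NA
drop_all_na_cols <- function(data) {
  # Number of non-missing values in each column
  n_present <- colSums(!is.na(data))
  # Keep columns with at least one value; drop = FALSE preserves the data frame
  data[, n_present > 0, drop = FALSE]
}

# Hard-coded copy of df2 printed above (C2 and C4 are all NA)
df2 <- data.frame(C1 = c(3L, NA, NA, NA, NA),
                  C2 = rep(NA_integer_, 5),
                  C3 = c(NA, NA, NA, NA, 1L),
                  C4 = rep(NA_integer_, 5),
                  C5 = c(NA, 3L, NA, 1L, NA))

names(drop_all_na_cols(df2))
#> [1] "C1" "C3" "C5"
```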


Filed Under: apply(), dplyr select() Tagged With: remove columns with all NAs
