Rstats 101

Learn R Programming Tips & Tricks for Statistics and Data Science

How to select only numeric columns in a dataframe

rstats101 · May 12, 2022

In this tutorial, we will learn how to select only the numeric columns from a dataframe that contains columns of different data types. We will use dplyr’s select() function together with the where() selection helper and is.numeric() to select the numeric columns.
How to select all numerical columns from a dataframe

Let us first load the tidyverse and the palmerpenguins package, which provides the penguins dataset, to illustrate selecting numerical variables.
library(tidyverse)
library(palmerpenguins)

The select() function we will be using is from the dplyr package. Here we check the installed dplyr version.

packageVersion("dplyr")
## [1] '1.0.8'
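
If you want to check the version programmatically rather than by eye, a minimal sketch is shown below. It assumes that where() inside select() needs dplyr 1.0.0 or later; the names min_version and has_where are made up for this illustration.

```r
# Assumption: where() inside select() needs dplyr 1.0.0 or later
min_version <- "1.0.0"

# packageVersion() returns a version object that can be compared to a string
has_where <- packageVersion("dplyr") >= min_version
has_where
```

If has_where is FALSE, updating dplyr is likely the simplest fix.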

Looking at the penguins data, we can see that it has factor, integer, and double variables.

penguins

## # A tibble: 344 × 8
##    species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
##    <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
##  1 Adelie  Torgersen           39.1          18.7               181        3750
##  2 Adelie  Torgersen           39.5          17.4               186        3800
##  3 Adelie  Torgersen           40.3          18                 195        3250
##  4 Adelie  Torgersen           NA            NA                  NA          NA
##  5 Adelie  Torgersen           36.7          19.3               193        3450
##  6 Adelie  Torgersen           39.3          20.6               190        3650
##  7 Adelie  Torgersen           38.9          17.8               181        3625
##  8 Adelie  Torgersen           39.2          19.6               195        4675
##  9 Adelie  Torgersen           34.1          18.1               193        3475
## 10 Adelie  Torgersen           42            20.2               190        4250
## # … with 334 more rows, and 2 more variables: sex <fct>, year <int>

Select All Numerical Columns, without using the names

We can select all the numerical columns from the dataframe without specifying their names by using their data types instead. The is.numeric() function tells us whether a variable is numerical; it identifies both double and integer variables as numeric.
The where() function is a selection helper that selects the variables for which a function returns TRUE. In our example, is.numeric() returns TRUE for numerical variables and FALSE for all others.
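
To see what is.numeric() reports for each data type, here is a minimal sketch on a few small vectors. The vectors are made up for illustration and are not taken from the penguins data.

```r
# Toy vectors (hypothetical, for illustration only)
dbl_vec <- c(39.1, 39.5, NA)                        # double
int_vec <- c(181L, 186L, NA)                        # integer
fct_vec <- factor(c("Adelie", "Gentoo", "Adelie"))  # factor
chr_vec <- c("a", "b", "c")                         # character

is.numeric(dbl_vec)  # TRUE
is.numeric(int_vec)  # TRUE
is.numeric(fct_vec)  # FALSE, even though factors are stored as integer codes
is.numeric(chr_vec)  # FALSE
```

This is why the factor columns in penguins, such as species and island, are not picked up when we select with is.numeric().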

penguins %>%
  select(where(is.numeric))
## # A tibble: 344 × 5
##    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g  year
##             <dbl>         <dbl>             <int>       <int> <int>
##  1           39.1          18.7               181        3750  2007
##  2           39.5          17.4               186        3800  2007
##  3           40.3          18                 195        3250  2007
##  4           NA            NA                  NA          NA  2007
##  5           36.7          19.3               193        3450  2007
##  6           39.3          20.6               190        3650  2007
##  7           38.9          17.8               181        3625  2007
##  8           39.2          19.6               195        4675  2007
##  9           34.1          18.1               193        3475  2007
## 10           42            20.2               190        4250  2007
## # … with 334 more rows
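
The same pattern can be tried on a small dataframe that does not need the palmerpenguins package. The sketch below uses a made-up tibble called toy (its values are hypothetical) and also shows two variations we believe are useful: negating where() to keep the non-numeric columns, and passing a different predicate such as is.factor().

```r
library(dplyr)

# Toy dataframe (hypothetical values) mixing factor, double, integer and character columns
toy <- tibble(
  species = factor(c("Adelie", "Gentoo", "Chinstrap")),
  bill_length_mm = c(39.1, 46.1, 46.5),
  body_mass_g = c(3750L, 4500L, 3500L),
  note = c("a", "b", "c")
)

# Keep only the numeric columns (double and integer)
num_cols <- toy %>% select(where(is.numeric))
names(num_cols)

# Keep everything except the numeric columns
other_cols <- toy %>% select(!where(is.numeric))
names(other_cols)

# Any function returning a single TRUE or FALSE per column can be used
fct_cols <- toy %>% select(where(is.factor))
names(fct_cols)
```

Because where() only looks at what the predicate returns for each column, the column names never have to be typed out.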

Get this warning? “Predicate functions must be wrapped in `where()`”.

We might often forget to use the where() function when trying to select numerical columns, like this.

penguins %>%
  select(is.numeric)

However, this gives us the following warning, shown once per session, advising us to use where(). For now, it still returns the same result.

## Warning: Predicate functions must be wrapped in `where()`.
## 
##   # Bad
##   data %>% select(is.numeric)
## 
##   # Good
##   data %>% select(where(is.numeric))
## 
## ℹ Please update your code.
## This message is displayed once per session.
