In this tutorial, we will learn how to select the numeric columns from a dataframe that contains columns of different data types. We will use dplyr's select() function in combination with the where() and is.numeric() functions to select the numeric columns.
Let us first load tidyverse and the palmerpenguins package, whose penguins dataset we will use to illustrate selecting numerical variables.
library(tidyverse)
library(palmerpenguins)
The select() function we will be using is from the dplyr package. Here we check the installed dplyr version.
packageVersion("dplyr")
## [1] '1.0.8'
Printing the penguins data shows that it contains factor, integer, and double variables.
penguins
## # A tibble: 344 × 8
##    species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
##    <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
##  1 Adelie  Torgersen           39.1          18.7               181        3750
##  2 Adelie  Torgersen           39.5          17.4               186        3800
##  3 Adelie  Torgersen           40.3          18                 195        3250
##  4 Adelie  Torgersen           NA            NA                  NA          NA
##  5 Adelie  Torgersen           36.7          19.3               193        3450
##  6 Adelie  Torgersen           39.3          20.6               190        3650
##  7 Adelie  Torgersen           38.9          17.8               181        3625
##  8 Adelie  Torgersen           39.2          19.6               195        4675
##  9 Adelie  Torgersen           34.1          18.1               193        3475
## 10 Adelie  Torgersen           42            20.2               190        4250
## # … with 334 more rows, and 2 more variables: sex <fct>, year <int>
Select All Numerical Columns, without using the names
We can select all the numerical columns from the dataframe by their data type, without specifying their names. The is.numeric() function tells us whether a variable is numerical or not. It identifies both double and integer variables as numeric.
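As a quick check of that behaviour, here is a minimal base R sketch that calls is.numeric() on a few small vectors. The vectors and the `checks` name are made up for illustration.

```r
# is.numeric() returns a single TRUE/FALSE for a whole vector.
checks <- c(
  integer   = is.numeric(c(1L, 2L)),           # integer vector
  double    = is.numeric(c(1.5, 2.5)),         # double vector
  factor    = is.numeric(factor(c("a", "b"))), # factors are not numeric
  character = is.numeric(c("1", "2")),         # digits stored as text
  logical   = is.numeric(c(TRUE, FALSE))       # logicals are not numeric
)
print(checks)
```

Only the integer and double entries come back TRUE, which is why factor columns such as species are left out when we select by is.numeric().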
The where() function is a selection helper that selects the variables for which a function returns TRUE. In our example, is.numeric() returns TRUE for numerical variables and FALSE for all others.
penguins %>% select(where(is.numeric))
## # A tibble: 344 × 5
##    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g  year
##             <dbl>         <dbl>             <int>       <int> <int>
##  1           39.1          18.7               181        3750  2007
##  2           39.5          17.4               186        3800  2007
##  3           40.3          18                 195        3250  2007
##  4           NA            NA                  NA          NA  2007
##  5           36.7          19.3               193        3450  2007
##  6           39.3          20.6               190        3650  2007
##  7           38.9          17.8               181        3625  2007
##  8           39.2          19.6               195        4675  2007
##  9           34.1          18.1               193        3475  2007
## 10           42            20.2               190        4250  2007
## # … with 334 more rows
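To see how where() arrives at this selection, it can help to run the predicate on every column by hand. The sketch below does that on a small made-up data frame; `toy` and its column names are hypothetical and not part of palmerpenguins.

```r
library(dplyr)

# Hypothetical toy data frame with one column per common type
toy <- data.frame(
  name   = c("a", "b", "c"),          # character
  group  = factor(c("x", "y", "x")),  # factor
  count  = c(1L, 2L, 3L),             # integer
  weight = c(1.5, 2.5, 3.5),          # double
  flag   = c(TRUE, FALSE, TRUE)       # logical
)

# Apply the predicate to each column: one TRUE/FALSE per column
numeric_flags <- sapply(toy, is.numeric)
print(numeric_flags)

# where() uses the same predicate inside select() and keeps the TRUE columns
numeric_cols <- toy %>% select(where(is.numeric))
print(names(numeric_cols))
```

The columns kept by select(where(is.numeric)) are exactly those flagged TRUE by sapply(), here count and weight.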
Get this warning? “Predicate functions must be wrapped in `where()`”.
It is easy to forget the where() function when trying to select numerical columns, and write this instead.
penguins %>% select(is.numeric)
For now this still returns the same result, but dplyr prints the following warning, once per session, advising us to wrap the predicate in where().
## Warning: Predicate functions must be wrapped in `where()`.
##
##   # Bad
##   data %>% select(is.numeric)
##
##   # Good
##   data %>% select(where(is.numeric))
##
## ℹ Please update your code.
## This message is displayed once per session.
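The wrapped form recommended by the warning is not specific to is.numeric(). As a rough sketch, assuming a made-up data frame `toy` (hypothetical, for illustration only), the same pattern works with other predicate functions and can be negated with `!`.

```r
library(dplyr)

# Hypothetical toy data frame
toy <- data.frame(
  group  = factor(c("x", "y", "x")),
  count  = c(1L, 2L, 3L),
  weight = c(1.5, 2.5, 3.5)
)

# Wrapped predicate: the form the warning asks for
num_cols <- toy %>% select(where(is.numeric))

# The same wrapper works with other predicates, e.g. is.factor()
fct_cols <- toy %>% select(where(is.factor))

# Negating where() keeps the columns for which the predicate is FALSE
non_num_cols <- toy %>% select(!where(is.numeric))

print(names(num_cols))
print(names(fct_cols))
print(names(non_num_cols))
```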