With dplyr’s if_else() function, we can create a new variable based on the values of another variable. if_else() takes a condition as input and assigns one value where the condition is true and another where it is false.
In this example, we will first use dplyr’s if_else() to categorise, or dichotomise, a numerical variable into two groups. If the value of the existing variable is > 0 we assign “+ve”, otherwise “-ve”. Then we will see how to deal with missing values using the if_else() function.
First, let us load tidyverse and create an example toy dataframe with just one variable.
library(tidyverse)
We use the sample() function to create a random dataset with both positive and negative values. Our toy dataset contains one numerical variable with some missing values.
df <- tibble(x = sample(c(-3:3, NA), 10, replace=TRUE))
df
## # A tibble: 10 × 1
##        x
##    <int>
##  1     2
##  2    -2
##  3     1
##  4     1
##  5    -3
##  6    NA
##  7    NA
##  8    -2
##  9    NA
## 10     2
dplyr’s if_else() example: dichotomise a numerical variable
dplyr’s if_else() takes a condition as its first argument. Evaluating the condition gives TRUE or FALSE (or NA) for each element. if_else() selects the second argument where the condition is TRUE and the third argument where the condition is FALSE. As if_else() is vectorised, it applies the condition to every element of a variable.
In the example below, we dichotomise a numerical variable into “+ve” and “-ve” based on its value. Note how if_else() treats NAs in the input numerical variable: by default, it leaves them as NA in the result.
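To make the element-wise behaviour concrete, here is a minimal sketch on a bare vector. The vector v is a hypothetical input chosen for illustration, not the random data used in this post.

```r
library(dplyr)

# Hypothetical input: a negative value, a zero, a positive value and an NA
v <- c(-1L, 0L, 2L, NA)

# The condition v > 0 is evaluated once per element:
# FALSE, FALSE, TRUE, NA
res <- if_else(v > 0, "+ve", "-ve")
res
## [1] "-ve" "-ve" "+ve" NA
```

Note that zero lands in the “-ve” group, because 0 > 0 is FALSE.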
df %>%
  mutate(sign = if_else(x > 0, "+ve", "-ve"))
## # A tibble: 10 × 2
##        x sign
##    <int> <chr>
##  1     2 +ve
##  2    -2 -ve
##  3     1 +ve
##  4     1 +ve
##  5    -3 -ve
##  6    NA <NA>
##  7    NA <NA>
##  8    -2 -ve
##  9    NA <NA>
## 10     2 +ve
base R’s ifelse() function to dichotomise a numerical variable
df %>%
  mutate(sign = ifelse(x > 0, "+ve", "-ve"))
## # A tibble: 10 × 2
##        x sign
##    <int> <chr>
##  1     2 +ve
##  2    -2 -ve
##  3     1 +ve
##  4     1 +ve
##  5    -3 -ve
##  6    NA <NA>
##  7    NA <NA>
##  8    -2 -ve
##  9    NA <NA>
## 10     2 +ve
Dealing with missing values in dplyr if_else()
With dplyr’s if_else() function, we can specify how to deal with NAs in the input variable. It has an argument called “missing”, and here we specify that NAs should be coded as “missing”.
Note the new variable that we created now has “missing” wherever the input variable had NAs.
df %>%
  mutate(sign = if_else(x > 0, "+ve", "-ve", missing="missing"))
## # A tibble: 10 × 2
##        x sign
##    <int> <chr>
##  1     2 +ve
##  2    -2 -ve
##  3     1 +ve
##  4     1 +ve
##  5    -3 -ve
##  6    NA missing
##  7    NA missing
##  8    -2 -ve
##  9    NA missing
## 10     2 +ve
base R’s ifelse() function
We can also use base R’s ifelse() function to dichotomise a numerical variable. The base R example shown earlier in this post uses ifelse() in place of dplyr’s if_else() function and gives the same result.
One of the main differences between base R’s ifelse() and dplyr’s if_else() is that if_else() is stricter: it checks that the true and false values are of the same (or a compatible) type. This strictness makes the output type more predictable, and makes it somewhat faster.
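The difference in strictness can be seen in a small sketch where the two branches deliberately have different types. The vector v and the result names are hypothetical, and the exact wording of dplyr’s error message depends on the installed version, so the error is only caught here rather than printed.

```r
library(dplyr)

# Hypothetical input vector
v <- c(-1, 2, NA)

# Base R silently coerces the mixed branches (character and number)
# into a single character vector
base_res <- ifelse(v > 0, "+ve", 0)
class(base_res)
## [1] "character"

# dplyr refuses to combine a character value with a number;
# tryCatch() turns the error into a marker string so the script keeps running
strict_res <- tryCatch(
  if_else(v > 0, "+ve", 0),
  error = function(e) "error"
)
strict_res
## [1] "error"
```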
In addition, base R’s ifelse() cannot recode NAs the way dplyr’s if_else() function can, because it has no “missing” argument. For example, if we try the same code as above, but with base R’s ifelse(), we get the following error:
df %>%
  mutate(sign = ifelse(x > 0, "+ve", "-ve", "missing"))
Error in `mutate()`:
! Problem while computing `sign = ifelse(x > 0, "+ve", "-ve", "missing")`.
Caused by error in `ifelse()`:
! unused argument ("missing")
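If we need to stay in base R, one possible workaround (a sketch, not something taken from the ifelse() documentation) is to test for NA first with is.na() and nest a second ifelse() for the remaining values. The vector v below is a hypothetical input.

```r
# Hypothetical input with one missing value
v <- c(2L, -2L, NA)

# The outer ifelse() handles the NAs; the inner one splits the rest by sign
res <- ifelse(is.na(v), "missing", ifelse(v > 0, "+ve", "-ve"))
res
## [1] "+ve"     "-ve"     "missing"
```

The same expression can be used inside mutate() in place of the failing call above.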