Rstats 101

Learn R Programming Tips & Tricks for Statistics and Data Science

dplyr if_else(): Create new variable from existing variable

rstats101 · November 10, 2022 ·

With dplyr’s if_else() function, we can create a new variable based on the values of an existing variable. if_else() takes a condition as input and assigns one value where the condition is true and another where it is false.

In this example, we will first use dplyr’s if_else() to categorise, or dichotomise, a numerical variable into two groups: if the value of the existing variable is greater than 0 we assign “+ve”, otherwise “-ve”. We will then see how to deal with missing values using if_else().

First, let us load tidyverse and create an example toy dataframe with just one variable.

library(tidyverse)

We use the sample() function to create a random dataset with both positive and negative values. Our toy dataset contains one numerical variable with some missing values. Because sample() draws at random, the values you get will differ from the output shown here.

df <- tibble(x = sample(c(-3:3, NA),
                        10,
                        replace=TRUE)
             )
df

## # A tibble: 10 × 1
##        x
##    <int>
##  1     2
##  2    -2
##  3     1
##  4     1
##  5    -3
##  6    NA
##  7    NA
##  8    -2
##  9    NA
## 10     2
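
Since the toy data comes from sample(), one way to get the same rows on every run is to fix the random seed first. The sketch below assumes a seed value of 101, which is arbitrary and not taken from the original post, so it will not reproduce the exact output shown above. The names df1 and df2 are used only for illustration.

```r
library(dplyr)  # dplyr re-exports tibble()

# The seed value 101 is an arbitrary choice for illustration
set.seed(101)
df1 <- tibble(x = sample(c(-3:3, NA), 10, replace = TRUE))

# Resetting the same seed reproduces the same draw
set.seed(101)
df2 <- tibble(x = sample(c(-3:3, NA), 10, replace = TRUE))

# Both tibbles contain exactly the same values
identical(df1, df2)
```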

dplyr’s if_else() example: dichotomise a numerical variable

dplyr’s if_else() takes a condition as its first argument. Evaluating the condition gives either TRUE or FALSE for each element. if_else() selects the second argument where the condition is TRUE and the third argument where it is FALSE. As if_else() is vectorised, it applies the condition to all elements of a variable.

In the example below, we dichotomise a numerical variable into “+ve” and “-ve” based on its value. Because the condition is x > 0, a value of exactly 0 is also labelled “-ve”. Note how if_else() treats NAs in the input numerical variable: by default, an NA in the condition gives an NA in the result.
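
To see the element-by-element behaviour in isolation, here is a minimal sketch on a plain vector rather than a dataframe. The vector x and the name res are made up for illustration.

```r
library(dplyr)

# A small illustrative vector with one missing value
x <- c(-2, 3, NA, 1)

# The condition x > 0 evaluates to FALSE, TRUE, NA, TRUE,
# and if_else() picks "+ve" or "-ve" for each element in turn
res <- if_else(x > 0, "+ve", "-ve")
res
```

The result has the same length as x, and the NA in x stays NA in the result.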

df %>%
  mutate(sign = if_else(x > 0,
                       "+ve", 
                       "-ve"))

## # A tibble: 10 × 2
##        x sign 
##    <int> <chr>
##  1     2 +ve  
##  2    -2 -ve  
##  3     1 +ve  
##  4     1 +ve  
##  5    -3 -ve  
##  6    NA <NA> 
##  7    NA <NA> 
##  8    -2 -ve  
##  9    NA <NA> 
## 10     2 +ve

base R’s ifelse() function to dichotomize a numerical variable

df %>%
  mutate(sign = ifelse(x > 0,
                       "+ve",
                       "-ve"))

## # A tibble: 10 × 2
##        x sign 
##    <int> <chr>
##  1     2 +ve  
##  2    -2 -ve  
##  3     1 +ve  
##  4     1 +ve  
##  5    -3 -ve  
##  6    NA <NA> 
##  7    NA <NA> 
##  8    -2 -ve  
##  9    NA <NA> 
## 10     2 +ve

Dealing with missing values in dplyr if_else()

With dplyr’s if_else() function we can specify how to deal with NAs in the input variable. It has an argument called missing, and here we specify that NAs should be coded as “missing”.

Note the new variable that we created now has “missing” wherever the input variable had NAs.

df %>%
  mutate(sign = if_else(x > 0, "+ve", 
                       "-ve", 
                       missing="missing"))

## # A tibble: 10 × 2
##        x sign   
##    <int> <chr>  
##  1     2 +ve    
##  2    -2 -ve    
##  3     1 +ve    
##  4     1 +ve    
##  5    -3 -ve    
##  6    NA missing
##  7    NA missing
##  8    -2 -ve    
##  9    NA missing
## 10     2 +ve

base R’s ifelse() function

We can also use base R’s ifelse() function to dichotomise a numerical variable, as in the ifelse() example shown earlier, which gives the same result as dplyr’s if_else().

One of the main differences between base R’s ifelse() and dplyr’s if_else() is that if_else() is stricter: it checks that true and false are the same type. This strictness makes the output type more predictable, and makes it somewhat faster.

In addition, base R’s ifelse() has no equivalent of the missing argument in dplyr’s if_else(). For example, if we try the same code as above with base R’s ifelse(), passing “missing” as a fourth argument, we get the following error, because ifelse() accepts only three arguments.
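
The difference in strictness can be sketched with a deliberately mismatched pair of values, a character string and a number. The names cond, base_res and dplyr_res are made up for illustration, and tryCatch() is used only so that the script keeps running after the error.

```r
library(dplyr)

cond <- c(TRUE, FALSE)

# Base R silently coerces the number 1 to the string "1"
base_res <- ifelse(cond, "a", 1)
base_res

# dplyr refuses to mix character and numeric values;
# tryCatch() turns the error into a flag instead of stopping
dplyr_res <- tryCatch(
  if_else(cond, "a", 1),
  error = function(e) "error"
)
dplyr_res
```

With ifelse() the type mismatch goes unnoticed and the result is a character vector, while if_else() stops with an error.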


df %>%
  mutate(sign = ifelse(x > 0, "+ve", 
                       "-ve", "missing"))

Error in `mutate()`:
! Problem while computing `sign = ifelse(x > 0, "+ve", "-ve",
  "missing")`.
Caused by error in `ifelse()`:
! unused argument ("missing")
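
If you want to stay with base R, one possible workaround is to test for NAs explicitly with is.na() in an outer ifelse() call. This is a sketch of one approach, not something from the original example, and the vector x and the name label are made up for illustration.

```r
# A small illustrative vector with one missing value
x <- c(2, -2, NA, 1)

# Handle NAs first, then dichotomise the remaining values
label <- ifelse(is.na(x),
                "missing",
                ifelse(x > 0, "+ve", "-ve"))
label
```

The same nested call can be used inside mutate() in place of the failing ifelse() call above.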
