Rstats 101

Learn R Programming Tips & Tricks for Statistics and Data Science

dplyr case_when() to create new variable using multiple conditions

rstats101 · March 17, 2023

In this tutorial, we will learn how to use dplyr’s case_when() function to create a new variable based on multiple conditions. case_when() offers a general solution for cases where you would otherwise need multiple nested if_else() calls.

In this blog post, we’ll explore the case_when() function through several examples. Let us start by creating a simple data frame with one numerical column. We will then create a new variable from that numerical variable using multiple conditions.

First, let us load tidyverse and check the installed version of dplyr. In this example, we use dplyr version 1.1.0. As of dplyr 1.1.0, case_when() has gained a .default argument, which we will use in this post.

library(tidyverse)
packageVersion("dplyr")

[1] '1.1.0'

Our toy data frame with one numerical variable, exam scores, looks like this.

df <-  tibble(score = seq(10,100, by=20))
df

# A tibble: 5 × 1
  score
  <dbl>
1    10
2    30
3    50
4    70
5    90

In the first example, we convert exam scores into letter grades ranging from A to F. For example, a score below 35 gets grade F, and a score of at least 35 but below 50 gets grade D.

The basic syntax of case_when() is shown in the example below. Each argument is a two-sided formula: the condition goes on the left of ~ and the value to assign goes on the right.

df %>%
  mutate(grade = case_when(
    score < 35 ~ "F",
    score < 50 ~ "D",
    score < 70 ~ "C",
    score <= 80 ~ "B",
    score <= 100 ~ "A"  # <= so that a score of exactly 100 also gets a grade
    )
    )

This will give us a new dataframe with an additional column as shown below.

# A tibble: 5 × 2
  score grade
  <dbl> <chr>
1    10 F    
2    30 F    
3    50 C    
4    70 B    
5    90 A    
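
The grades above depend on the order of the conditions: case_when() checks them from top to bottom and uses the first one that is TRUE for each row. The sketch below illustrates this by comparing the ordering used above with a deliberately bad ordering. The names scores, ordered and shadowed are illustrative and not part of the original example.

```r
library(dplyr)

# Same toy scores as above, rebuilt so the block runs on its own
scores <- tibble(score = c(10, 30, 50, 70, 90))

# Narrowest condition first: each row gets the first grade whose condition is TRUE
ordered <- scores %>%
  mutate(grade = case_when(
    score < 35 ~ "F",
    score < 50 ~ "D",
    score < 70 ~ "C",
    score <= 80 ~ "B",
    score <= 100 ~ "A"
  ))

# Broadest condition first: it matches every row, so the second
# condition is never reached
shadowed <- scores %>%
  mutate(grade = case_when(
    score <= 100 ~ "A",
    score < 35 ~ "F"
  ))

ordered$grade   # "F" "F" "C" "B" "A"
shadowed$grade  # "A" "A" "A" "A" "A"
```

When the conditions overlap, as they do here, it is usually safest to list them from most specific to most general.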

dplyr case_when() example with default value

Sometimes you may want to cover specific cases and assign a default value to everything else. Here is a simple example using case_when() with a default value. In the example below, we use the .default argument to set the value for any row that matches none of the conditions (here, we assign grade A to scores above 80 by default).

df %>%
  mutate(grade = case_when(
    score <= 35 ~ "F",
    score < 50 ~ "D",
    score < 70 ~ "C",
    score <= 80 ~ "B",
    .default = "A"
    )
    )

# A tibble: 5 × 2
  score grade
  <dbl> <chr>
1    10 F    
2    30 F    
3    50 C    
4    70 B    
5    90 A

The above .default example is a bit contrived, as there is only one condition left to cover. Here is another simple example with the .default argument, in which any score greater than 35 gets the value specified by .default.

df %>%
  mutate(grade = case_when(
    score <= 35 ~ "Fail",
    .default = "Pass"
    )
    )

# A tibble: 5 × 2
  score grade
  <dbl> <chr>
1    10 Fail 
2    30 Fail 
3    50 Pass 
4    70 Pass 
5    90 Pass 
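
For comparison, here is a minimal sketch of two other ways to write the same Pass/Fail rule. Before .default was available, a common idiom was a final TRUE ~ value condition, and with only two outcomes a single if_else() call also works. The column names grade_default, grade_true and grade_if_else are made up for this illustration.

```r
library(dplyr)

scores <- tibble(score = c(10, 30, 50, 70, 90))

pass_fail <- scores %>%
  mutate(
    # dplyr >= 1.1.0: explicit fallback value
    grade_default = case_when(score <= 35 ~ "Fail", .default = "Pass"),
    # Older idiom: a last condition that is always TRUE acts as the fallback
    grade_true = case_when(score <= 35 ~ "Fail", TRUE ~ "Pass"),
    # With only two outcomes, if_else() expresses the same rule
    grade_if_else = if_else(score <= 35, "Fail", "Pass")
  )

pass_fail
```

All three columns should hold the same values for this data; .default mainly makes the intent easier to read once there are more than two outcomes.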

dplyr’s case_when() example with missing values

In the example below, we will learn how to deal with missing values (NAs) in the variable of interest. First, we will create a data frame with a column containing an NA.

df <-  tibble(score = c(seq(10,100, by=20), NA))
df

# A tibble: 6 × 1
  score
  <dbl>
1    10
2    30
3    50
4    70
5    90
6    NA

If we ignore NAs when using case_when() to create a new variable, we might inadvertently make a mistake. In the example below, the row with NA for score is assigned the Pass grade, because it matches no condition and therefore receives the .default value.

df %>%
  mutate(grade = case_when(
    score <= 35 ~ "Fail",
    .default = "Pass"
    )
    )

# A tibble: 6 × 2
  score grade
  <dbl> <chr>
1    10 Fail 
2    30 Fail 
3    50 Pass 
4    70 Pass 
5    90 Pass 
6    NA Pass 
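
To see why the NA row ends up with Pass, it can help to look at the condition itself. This is a small sketch in which the helper column is_low is an illustrative addition: comparing NA with a number gives NA rather than TRUE or FALSE, and case_when() treats an NA condition as no match.

```r
library(dplyr)

scores <- tibble(score = c(10, 30, 50, 70, 90, NA))

# A comparison involving NA is itself NA
NA <= 35

result <- scores %>%
  mutate(
    # The raw condition, kept as a column so it can be inspected
    is_low = score <= 35,
    # An NA condition counts as "no match", so that row falls through to .default
    grade = case_when(
      score <= 35 ~ "Fail",
      .default = "Pass"
    )
  )

result
```

In the last row, is_low is NA while grade is "Pass", which is the silent mistake described above.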

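We can handle NA values by adding a condition that checks for them with the is.na() function and assigns a specific value to those rows. Here we assign NA; as of dplyr 1.1.0, a bare NA is cast to the type of the other values, so the new column is still a character column.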

df %>%
  mutate(grade = case_when(
    score <= 35 ~ "Fail",
    is.na(score) ~ NA,
    .default = "Pass"
    )
    )

# A tibble: 6 × 2
  score grade
  <dbl> <chr>
1    10 Fail 
2    30 Fail 
3    50 Pass 
4    70 Pass 
5    90 Pass 
6    NA <NA> 
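
If you would rather flag missing scores than leave them as NA, the same pattern works with an explicit label. The sketch below assumes a hypothetical label "Missing" (not something from the example above) and puts the is.na() check first so that it is easy to spot.

```r
library(dplyr)

scores <- tibble(score = c(10, 30, 50, 70, 90, NA))

labelled <- scores %>%
  mutate(grade = case_when(
    is.na(score) ~ "Missing",  # hypothetical label for missing scores
    score <= 35 ~ "Fail",
    .default = "Pass"
  ))

labelled$grade  # "Fail" "Fail" "Pass" "Pass" "Pass" "Missing"
```

Whether to keep NA or use a label depends on what happens next: NA works with functions such as is.na() and tidyr's drop_na(), while a label shows up as its own group when counting or plotting.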
