In this tutorial, we will learn how to convert a numerical (continuous) variable into a categorical variable. We will start with a simple example that converts a numerical variable into a categorical variable with just two levels. Then we will see an example of converting a numerical variable into a categorical variable with multiple levels.
library(tidyverse)
Let us create a simple dataframe with a numerical column. Our numerical variable is exam scores ranging from 0 to 100.
set.seed(421)
df <- tibble(score = floor(runif(10, min = 0, max = 100)))
df
# A tibble: 10 × 1
   score
   <dbl>
 1    78
 2    14
 3    71
 4    31
 5    84
 6    69
 7    90
 8    68
 9    52
10    20
Convert a Numerical Variable into a Categorical Variable with Two Levels
If we want to convert the numerical variable into a categorical variable with just two levels, we can use the if_else() function from dplyr inside mutate(), as shown below. Scores above 40 are labeled "PASS" and all other scores are labeled "FAIL".
df %>%
  mutate(pass = if_else(score > 40, "PASS", "FAIL"))
# A tibble: 10 × 2
   score pass
   <dbl> <chr>
 1    78 PASS
 2    14 FAIL
 3    71 PASS
 4    31 FAIL
 5    84 PASS
 6    69 PASS
 7    90 PASS
 8    68 PASS
 9    52 PASS
10    20 FAIL
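Note that if_else() returns a character column (<chr>), not a factor. If a true factor with a fixed level order is needed, one option is to wrap the result in factor(). The following is a minimal sketch under that assumption; the scores tibble and its values are hypothetical and were chosen so that one score sits exactly on the cutoff.

```r
library(dplyr)

# Hypothetical scores, including one exactly at the cutoff (40)
scores <- tibble(score = c(78, 40, 14))

result <- scores %>%
  mutate(
    # if_else() returns character; factor() turns it into a categorical
    # variable with an explicit level order (FAIL first, then PASS)
    pass = factor(
      if_else(score > 40, "PASS", "FAIL"),
      levels = c("FAIL", "PASS")
    )
  )

result
```

Because the condition is score > 40, a score of exactly 40 is labeled "FAIL". Using score >= 40 instead would move that boundary value into "PASS".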
Convert a Numerical Variable into a Categorical Variable with Multiple Levels
To create a categorical variable with multiple levels, we use the cut() function in base R. The help page describes the basic use of cut() as follows:
cut divides the range of x into intervals and codes the values in x according to which interval they fall. The leftmost interval corresponds to level one, the next leftmost to level two and so on.
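To make the help-page description concrete, here is a small sketch of cut() used without labels. The vector x and the breaks are hypothetical values picked so that one value lands in each interval.

```r
# Hypothetical values, one per interval
x <- c(5, 25, 45)

# Without a labels argument, cut() names each level after its interval
binned <- cut(x, breaks = c(0, 20, 40, 60))

levels(binned)      # interval names, leftmost interval first
as.integer(binned)  # level codes: 1 for the leftmost interval, then 2, 3
```

The result is a factor whose levels are ordered from the leftmost interval to the rightmost one, which is what lets us attach our own labels in the same order.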
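For example, if we specify the breaks as 0, 20, 40, 60, 80, 100, we get five intervals and therefore a five-level categorical variable. By default the intervals are closed on the right, so a score of 20 falls in the first interval, (0, 20]. In the example below, we use cut() to create a five-level categorical variable from score, supplying one label per interval.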
df %>%
  mutate(grade = cut(score,
                     breaks = c(0, 20, 40, 60, 80, 100),
                     labels = c("F", "D", "C", "B", "A")))
# A tibble: 10 × 2
   score grade
   <dbl> <fct>
 1    78 B
 2    14 F
 3    71 B
 4    31 D
 5    84 A
 6    69 B
 7    90 A
 8    68 B
 9    52 C
10    20 F
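One detail worth checking is what happens to scores that sit exactly on a break point. With the default settings the lowest break is excluded, so a score of 0 would get no grade at all. The sketch below compares the default with the include.lowest and right arguments of cut(); the edge_scores vector is a hypothetical set of boundary values, not part of the data above.

```r
breaks <- c(0, 20, 40, 60, 80, 100)
grades <- c("F", "D", "C", "B", "A")

# Hypothetical scores sitting exactly on break points
edge_scores <- c(0, 20, 100)

# Default: intervals are (0,20], (20,40], ..., so 0 falls outside and becomes NA
default_cut <- cut(edge_scores, breaks = breaks, labels = grades)

# include.lowest = TRUE closes the first interval on the left: [0,20]
inclusive_cut <- cut(edge_scores, breaks = breaks, labels = grades,
                     include.lowest = TRUE)

# right = FALSE flips the closed side: [0,20), [20,40), ..., [80,100]
# (include.lowest = TRUE then keeps 100 inside the last interval)
left_closed_cut <- cut(edge_scores, breaks = breaks, labels = grades,
                       right = FALSE, include.lowest = TRUE)

data.frame(edge_scores, default_cut, inclusive_cut, left_closed_cut)
```

Which variant is appropriate depends on the grading rule: include.lowest = TRUE keeps a score of 0 in the "F" group, while right = FALSE moves a score of exactly 20 from "F" up to "D".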