How to drop unused level of factor variable in R

In this post, we will learn how to drop unused levels of a factor variable in R. After some data munging, we may end up with a factor variable that has unused levels, and these unused levels can sometimes cause issues when analyzing the data.

In this tutorial, we will show how to drop unused levels of a factor variable using two approaches: the droplevels() function available in base R, and the fct_drop() function from the forcats package in the tidyverse.

Let us load the packages needed.

library(tidyverse)
library(palmerpenguins)

We will use the Palmer Penguins data to show how to drop unused levels of a factor variable.

penguins |> head()

# A tibble: 6 × 8
  species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
  <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
1 Adelie  Torgersen           39.1          18.7               181        3750
2 Adelie  Torgersen           39.5          17.4               186        3800
3 Adelie  Torgersen           40.3          18                 195        3250
4 Adelie  Torgersen           NA            NA                  NA          NA
5 Adelie  Torgersen           36.7          19.3               193        3450
6 Adelie  Torgersen           39.3          20.6               190        3650

Levels of a factor variable

In R, the factor datatype is a useful way to represent categorical data, and its values can only take one of a fixed set of unique levels or categories. We can use the levels() function to see which distinct values/levels a factor variable has. In the example below, the factor variable species has three levels.
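To see how levels behave outside the penguins data, here is a minimal base R sketch on a small hand-made factor; the values in x are hypothetical. By default, factor() sorts the levels alphabetically, nlevels() returns the number of levels, and the levels argument lets us choose the order explicitly.

```r
# Hypothetical toy factor, not part of the penguins data
x <- factor(c("low", "high", "medium", "low"))

levels(x)   # "high" "low" "medium" (sorted alphabetically by default)
nlevels(x)  # 3

# Supplying levels explicitly fixes their order
x_ordered <- factor(c("low", "high", "medium", "low"),
                    levels = c("low", "medium", "high"))
levels(x_ordered)  # "low" "medium" "high"
```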

levels(penguins$species)

[1] "Adelie"    "Chinstrap" "Gentoo"

One thing to note about factor variables is that even after removing all the observations belonging to a specific level, the factor variable still keeps the removed level as one of its levels.

Let us see an example by filtering out one of the levels of a factor variable. In the example below, we remove the data for the species “Gentoo” to create a new dataframe.

df <- penguins |>
  filter(species != "Gentoo")

If we check the levels of the species variable in the new dataframe, it still includes the removed level. This can cause problems when analyzing the new dataframe, for example empty categories showing up in summaries and counts.

levels(df$species)

[1] "Adelie"    "Chinstrap" "Gentoo"
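As a sketch of the kind of problem an unused level can cause, the base R example below uses a small hypothetical factor, sp, as a stand-in for the species column. After subsetting, table() still reports the removed level, with a count of zero.

```r
# Hypothetical stand-in for penguins$species
sp <- factor(c("Adelie", "Chinstrap", "Gentoo", "Adelie"))

# Keep everything except "Gentoo"; the level itself survives the subset
sp_sub <- sp[sp != "Gentoo"]
levels(sp_sub)  # "Adelie" "Chinstrap" "Gentoo"

# The unused level shows up as an empty category in the counts
counts <- table(sp_sub)
counts
```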

droplevels() to Remove unused levels of a factor variable

One way to remove the unused levels of a factor variable is to use the droplevels() function available in base R.

We apply droplevels() to the species column and assign the result back to the same column.

df$species <- droplevels(df$species)

Now, if we check the levels of the factor variable, we see that the unused level has been removed.

levels(df$species)

[1] "Adelie"    "Chinstrap"
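droplevels() is not limited to a single column: called on a data frame, it drops unused levels from every factor column at once. Here is a minimal sketch with a hypothetical toy data frame d, made up for illustration.

```r
# Hypothetical toy data frame with two factor columns
d <- data.frame(
  grp  = factor(c("a", "b", "c")),
  site = factor(c("x", "y", "y")),
  n    = 1:3
)

# Dropping the first row leaves "a" (grp) and "x" (site) unused
d_sub <- d[d$grp != "a", ]
levels(d_sub$grp)   # "a" "b" "c"
levels(d_sub$site)  # "x" "y"

# One call cleans up all factor columns; the numeric column is untouched
d_clean <- droplevels(d_sub)
levels(d_clean$grp)   # "b" "c"
levels(d_clean$site)  # "y"
```

Another base R idiom is to rebuild the factor with factor(), for example df$species <- factor(df$species), which keeps only the levels present in the data; droplevels() states the intent more directly.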

forcats’ fct_drop() to Remove unused levels of a factor variable

Another way to remove unused levels of a factor variable is to use the fct_drop() function from the forcats package in the tidyverse.

Let us filter out the rows corresponding to one of the levels as before.

df2 <- penguins |>
  filter(species != "Chinstrap")

We can see that the species variable in the new dataframe still has the unused level.

levels(df2$species)

[1] "Adelie"    "Chinstrap" "Gentoo"

We can use fct_drop() to drop the unused levels of the factor variable and update the variable with the mutate() function.

df2 <- df2 |> 
  mutate(species = fct_drop(species))

If we check the levels now, only the levels present in the new dataframe remain, as we wanted.

levels(df2$species)

[1] "Adelie" "Gentoo"
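fct_drop() also takes an only argument that restricts which unused levels are dropped. The sketch below assumes forcats is installed and uses a hypothetical toy factor f: “d” is unused but not listed in only, so it is kept.

```r
library(forcats)

# Hypothetical factor with two unused levels, "c" and "d"
f <- factor(c("a", "b", "a"), levels = c("a", "b", "c", "d"))

# Drop all unused levels
levels(fct_drop(f))              # "a" "b"

# Drop only the unused level "c"; "d" stays
levels(fct_drop(f, only = "c"))  # "a" "b" "d"
```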
