Rstats 101

Learn R Programming Tips & Tricks for Statistics and Data Science

How to drop unused levels of a factor variable in R

rstats101 · October 8, 2024 ·

In this post, we will learn how to drop the unused level or levels of a factor variable in R. After some data munging, such as filtering rows, we may end up with a factor variable that still carries levels with no observations. Unused factor levels can sometimes create issues while analyzing the data.

In this tutorial, we will show how to drop unused levels of a factor variable using two approaches: the droplevels() function available in base R, and fct_drop() from the forcats package, which is part of the tidyverse.

Let us load the packages needed.

library(tidyverse)
library(palmerpenguins)

We will use Palmer Penguins data to show how to drop levels of a factor variable.

penguins |> head()

# A tibble: 6 × 8
  species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
  <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
1 Adelie  Torgersen           39.1          18.7               181        3750
2 Adelie  Torgersen           39.5          17.4               186        3800
3 Adelie  Torgersen           40.3          18                 195        3250
4 Adelie  Torgersen           NA            NA                  NA          NA
5 Adelie  Torgersen           36.7          19.3               193        3450
6 Adelie  Torgersen           39.3          20.6               190        3650

Levels of a factor variable

In R, the factor data type is a useful way to represent categorical data; its values are drawn from a fixed set of levels, or categories. We can use the levels() function to list the distinct levels of a factor variable. In the example below, the factor variable species has three levels.

levels(penguins$species)

[1] "Adelie"    "Chinstrap" "Gentoo" 
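
If only the number of levels is needed, base R's nlevels() returns it directly. The sketch below uses a small made-up factor (the `species` vector here is a hypothetical stand-in, not the penguins column) so that it runs without any packages.

```r
# Toy factor standing in for penguins$species (hypothetical data)
species <- factor(c("Adelie", "Gentoo", "Adelie", "Chinstrap"))

# levels() lists the level labels; factor() sorts them by default
lvls <- levels(species)
lvls

# nlevels() counts the levels, equivalent to length(levels(species))
n <- nlevels(species)
n
```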

A thing to notice about a factor variable is that even after removing all the values belonging to a specific level, the factor will still include the removed value as one of its levels.

Let us see an example by filtering out one of the levels from a factor variable. In the example below, we remove the data for the species “Gentoo” to create a new dataframe.

df <- penguins |>
  filter(species != "Gentoo")

If we check the levels of the species variable in the new dataframe, it will still include the removed level. This may cause problems while analyzing the new dataframe; for example, table() will report a zero count for the empty level.

levels(df$species)

[1] "Adelie"    "Chinstrap" "Gentoo" 
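
One place where a leftover level becomes visible is a frequency table. The following is a minimal sketch on a made-up factor (the names `species`, `kept`, and `counts` are illustrative, not from the penguins data): after all "Gentoo" values are removed, table() still lists the level, with a count of zero.

```r
# Toy factor with three levels (hypothetical data)
species <- factor(c("Adelie", "Chinstrap", "Gentoo", "Adelie"))

# Subsetting a factor keeps its full set of levels
kept <- species[species != "Gentoo"]

# table() counts every level, including the now-empty one
counts <- table(kept)
counts
```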

droplevels() to Remove unused levels of a factor variable

One of the ways to remove the unused levels in a factor variable is to use the droplevels() function available in base R.

We remove the unused levels using droplevels() and then assign the result back to the column, replacing the original factor variable.

df$species <- droplevels(df$species)

Now, if we check the levels of the factor variable, we will see that the unused level has been removed.

levels(df$species)

[1] "Adelie"    "Chinstrap"
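
droplevels() also has a method for data frames, which drops unused levels from every factor column in one call. The sketch below illustrates this on a small made-up data frame (`toy` and its values are hypothetical, chosen only to keep the example self-contained).

```r
# Toy data frame with two factor columns (hypothetical data)
toy <- data.frame(
  species = factor(c("Adelie", "Chinstrap", "Gentoo")),
  island  = factor(c("Torgersen", "Dream", "Biscoe"))
)

# Remove the Gentoo row; both columns now carry an unused level
toy_sub <- toy[toy$species != "Gentoo", ]

# Calling droplevels() on the data frame cleans all factor columns at once
toy_sub <- droplevels(toy_sub)

levels(toy_sub$species)
levels(toy_sub$island)
```

This can be convenient when several factor columns are affected by the same filter, since each column does not have to be reassigned separately.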

forcats’ fct_drop() to Remove unused levels of a factor variable

Another way to remove unused levels of a factor variable is to use the fct_drop() function from the forcats package in the tidyverse.

Let us filter out the rows corresponding to one of the levels as before.

df2 <- penguins |>
  filter(species != "Chinstrap")

We can see that the species variable in the new dataframe still has the unused level.

levels(df2$species)

[1] "Adelie"    "Chinstrap" "Gentoo"   

We can use fct_drop() to drop unused levels in the factor variable and update the factor variable using the mutate() function.

df2 <- df2 |> 
  mutate(species = fct_drop(species)) 

If we check the levels now, we see only the levels that are used in the new dataframe, as we wanted.

levels(df2$species)

[1] "Adelie" "Gentoo"
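
fct_drop() also accepts an `only` argument that restricts which unused levels are dropped. The following is a small sketch on a made-up factor; the extra "Unknown" level is a hypothetical placeholder added to show an unused level being kept on purpose.

```r
library(forcats)

# Toy factor with two unused levels: "Chinstrap" and "Unknown" (hypothetical data)
f <- factor(
  c("Adelie", "Gentoo"),
  levels = c("Adelie", "Chinstrap", "Gentoo", "Unknown")
)

# Drop "Chinstrap" only; "Unknown" is unused too but is left in place
g <- fct_drop(f, only = "Chinstrap")
levels(g)

# Without `only`, every unused level is dropped
h <- fct_drop(f)
levels(h)
```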
