In this tutorial, we will learn how to count the unique values of one or more variables (columns) using dplyr’s count() function. count() is a convenient function that combines a grouping operation with counting the number of observations in each group.
We will work through five examples of using dplyr’s count() function with different arguments.
First, let us load the tidyverse suite of R packages, including dplyr.
library(tidyverse)
To illustrate the use of dplyr’s count() function to count the unique values of one or more columns, we will use the storms dataset, which ships with the dplyr package. It is already available to us because we have loaded tidyverse.
storms %>% head()
## # A tibble: 6 × 13
##   name   year month   day  hour   lat  long status       category  wind pressure
##   <chr> <dbl> <dbl> <int> <dbl> <dbl> <dbl> <chr>        <ord>    <int>    <int>
## 1 Amy    1975     6    27     0  27.5 -79   tropical de… -1          25     1013
## 2 Amy    1975     6    27     6  28.5 -79   tropical de… -1          25     1013
## 3 Amy    1975     6    27    12  29.5 -79   tropical de… -1          25     1013
## 4 Amy    1975     6    27    18  30.5 -79   tropical de… -1          25     1013
## 5 Amy    1975     6    28     0  31.5 -78.8 tropical de… -1          25     1012
## 6 Amy    1975     6    28     6  32.4 -78.7 tropical de… -1          25     1012
## # … with 2 more variables: ts_diameter <dbl>, hu_diameter <dbl>
dplyr count(): Count the unique values of a single column or variable
If we are interested in counting the unique values of a single character or categorical variable, we provide the name of the variable or column as an argument to the count() function.
In the example below, we count the number of observations for each storm “status”. We see that the “status” column has three different values, and we get a new column named “n” containing the number of observations for each status value.
storms %>% count(status)
## # A tibble: 3 × 2
##   status                  n
##   <chr>               <int>
## 1 hurricane            3091
## 2 tropical depression  2545
## 3 tropical storm       4374
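Since count() combines grouping and counting, it can help to see the longer form it stands in for. Here is a minimal sketch, assuming a small made-up data frame called toy (hypothetical, not part of the storms data), that compares count() with an explicit group_by() followed by summarise().

```r
library(dplyr)

# Hypothetical toy data, used only for illustration
toy <- data.frame(
  status = c("hurricane", "tropical storm", "hurricane",
             "tropical depression", "tropical storm", "tropical storm"),
  stringsAsFactors = FALSE
)

# count() groups by status and counts the rows in each group
counted <- toy %>% count(status)

# The same counts written out as group_by() followed by summarise()
by_hand <- toy %>%
  group_by(status) %>%
  summarise(n = n())

print(counted)
print(by_hand)
```

Both versions should report 2 hurricanes, 1 tropical depression and 3 tropical storms; count() simply saves us a step.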
dplyr count(): Rename the new column in the output
By default, count() returns a dataframe with the counts in a column named “n”, as we saw in the example above. We can change the name of that column with the “name” argument to count().
storms %>% count(status, name="n_storms")
## # A tibble: 3 × 2
##   status              n_storms
##   <chr>                  <int>
## 1 hurricane               3091
## 2 tropical depression     2545
## 3 tropical storm          4374
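As a small sketch of what the “name” argument does, the example below uses a made-up data frame called toy (hypothetical, not the storms data) and checks that naming the column in count() gives the same result as counting first and renaming “n” afterwards with rename().

```r
library(dplyr)

# Hypothetical toy data, used only for illustration
toy <- data.frame(
  status = c("hurricane", "tropical storm", "tropical storm"),
  stringsAsFactors = FALSE
)

# Name the count column directly in count()
renamed <- toy %>% count(status, name = "n_storms")

# Equivalent two-step version: count, then rename the default "n" column
renamed_after <- toy %>% count(status) %>% rename(n_storms = n)

print(names(renamed))
print(identical(renamed, renamed_after))
```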
dplyr count(): Sort the counts in descending order, with the largest groups first
We can also sort the resulting dataframe so that the groups with the largest counts come first, using the sort=TRUE argument to count().
Here we see that the “tropical storm” status is the most frequent in the dataset.
storms %>% count(status, sort=TRUE)
## # A tibble: 3 × 2
##   status                  n
##   <chr>               <int>
## 1 tropical storm       4374
## 2 hurricane            3091
## 3 tropical depression  2545
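To make the effect of sort=TRUE concrete, here is a minimal sketch on a made-up data frame called toy (hypothetical, not the storms data). It compares count() with sort=TRUE against counting first and then ordering the rows with arrange(desc(n)). The toy counts have no ties, so the two orderings should agree.

```r
library(dplyr)

# Hypothetical toy data with counts of 3, 2 and 1 (no ties)
toy <- data.frame(
  status = c("tropical storm", "hurricane", "tropical storm",
             "tropical depression", "hurricane", "tropical storm"),
  stringsAsFactors = FALSE
)

# Let count() do the sorting
sorted <- toy %>% count(status, sort = TRUE)

# Count first, then sort by descending count ourselves
arranged <- toy %>% count(status) %>% arrange(desc(n))

print(sorted)
print(arranged)
```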
dplyr count(): Count the unique values of multiple columns or variables
dplyr’s count() function can take multiple variables or columns and count the number of observations for each unique combination of their values.
storms %>% count(status, category)
## # A tibble: 8 × 3
##   status              category     n
##   <chr>               <ord>    <int>
## 1 hurricane           1         1684
## 2 hurricane           2          628
## 3 hurricane           3          363
## 4 hurricane           4          348
## 5 hurricane           5           68
## 6 tropical depression -1        2545
## 7 tropical storm      0         4373
## 8 tropical storm      1            1
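The following sketch, again on a made-up data frame called toy (hypothetical, not the storms data), shows that counting by two columns returns one row per combination that actually occurs in the data, and that the counts add up to the total number of rows.

```r
library(dplyr)

# Hypothetical toy data with two grouping columns
toy <- data.frame(
  status   = c("hurricane", "hurricane", "hurricane",
               "tropical storm", "tropical storm"),
  category = c(1, 1, 2, 0, 0),
  stringsAsFactors = FALSE
)

# One row per observed (status, category) combination
combos <- toy %>% count(status, category)

print(combos)
print(nrow(combos))                 # number of distinct combinations
print(sum(combos$n) == nrow(toy))   # counts add up to the number of rows
```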
dplyr count(): Keep the empty group counts
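By default, dplyr’s count() function does not report groups with zero counts. We can use the argument “.drop=FALSE” to keep the groups with zero counts. This applies to factor columns, such as “category” here, where the unused levels are known in advance.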
storms %>% count(status, category, .drop=FALSE)
## # A tibble: 21 × 3
##    status              category     n
##    <chr>               <ord>    <int>
##  1 hurricane           -1           0
##  2 hurricane           0            0
##  3 hurricane           1         1684
##  4 hurricane           2          628
##  5 hurricane           3          363
##  6 hurricane           4          348
##  7 hurricane           5           68
##  8 tropical depression -1        2545
##  9 tropical depression 0            0
## 10 tropical depression 1            0
## # … with 11 more rows
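Here is a small sketch of how .drop=FALSE behaves, assuming a made-up data frame called toy (hypothetical, not the storms data) with a character “status” column and a factor “category” column, mirroring the column types in the output above. With the default settings only the observed combinations are returned, while .drop=FALSE should add a zero-count row for every factor level that a status never takes.

```r
library(dplyr)

# Hypothetical toy data: status is character, category is a factor with 3 levels
toy <- data.frame(
  status   = c("hurricane", "hurricane", "tropical storm"),
  category = factor(c("1", "2", "0"), levels = c("0", "1", "2")),
  stringsAsFactors = FALSE
)

# Default: only combinations that occur in the data
dropped <- toy %>% count(status, category)

# Keep empty groups: each status is paired with every category level
kept <- toy %>% count(status, category, .drop = FALSE)

print(dropped)
print(kept)
```

In this toy example we would expect 3 rows by default and 6 rows (2 status values times 3 category levels) with .drop=FALSE, three of them with a count of zero.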