dplyr count(): count unique values of a variable

In this tutorial, we will learn how to count the unique values of one or more variables (columns) using dplyr’s count() function. count() is a convenience function that combines a grouping operation with counting the number of observations in each group.

We will work through five examples of using dplyr’s count() function with different arguments.

First, let us load the tidyverse suite of R packages, which includes dplyr.

library(tidyverse)

To illustrate how dplyr’s count() function counts the unique values of one or more columns, we will use the storms dataset, which ships with the dplyr package. It is available to us as soon as we load tidyverse.

storms %>% 
  head()
## # A tibble: 6 × 13
##   name   year month   day  hour   lat  long status       category  wind pressure
##   <chr> <dbl> <dbl> <int> <dbl> <dbl> <dbl> <chr>        <ord>    <int>    <int>
## 1 Amy    1975     6    27     0  27.5 -79   tropical de… -1          25     1013
## 2 Amy    1975     6    27     6  28.5 -79   tropical de… -1          25     1013
## 3 Amy    1975     6    27    12  29.5 -79   tropical de… -1          25     1013
## 4 Amy    1975     6    27    18  30.5 -79   tropical de… -1          25     1013
## 5 Amy    1975     6    28     0  31.5 -78.8 tropical de… -1          25     1012
## 6 Amy    1975     6    28     6  32.4 -78.7 tropical de… -1          25     1012
## # … with 2 more variables: ts_diameter <dbl>, hu_diameter <dbl>

dplyr count(): Count the unique values of a single column or variable

If we are interested in counting the unique values of a single character or categorical variable, we pass the name of the variable or column as an argument to the count() function.

In the example below, we count the number of observations for each storm “status”. The “status” column has three distinct values, and the result has a new column named “n” containing the number of observations for each status.

storms %>% 
  count(status)

## # A tibble: 3 × 2
##   status                  n
##   <chr>               <int>
## 1 hurricane            3091
## 2 tropical depression  2545
## 3 tropical storm       4374
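To make the "grouping plus counting" idea concrete, here is a minimal sketch comparing count() with the longer group_by() and summarise() form. The `toy` tibble and its values are hypothetical and only stand in for the storms data.

```r
library(dplyr)

# Hypothetical toy data (an assumption for illustration, not the storms dataset)
toy <- tibble(
  status = c("hurricane", "tropical storm", "hurricane",
             "tropical storm", "tropical storm")
)

# count() groups by status and counts the rows in each group in one step
counted <- toy %>%
  count(status)

# The same result spelled out with group_by() and summarise()
grouped <- toy %>%
  group_by(status) %>%
  summarise(n = n())
```

Both `counted` and `grouped` should hold one row per status with the counts in a column named “n”.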

dplyr count(): Rename the new column in the output

By default, count() returns a dataframe with the counts in a column named “n”, as we saw in the example above. We can change the name of that column with the “name” argument to count().

storms %>% 
  count(status, name="n_storms")

## # A tibble: 3 × 2
##   status              n_storms
##   <chr>                  <int>
## 1 hurricane               3091
## 2 tropical depression     2545
## 3 tropical storm          4374

dplyr count(): Sort the counts in descending order so groups with the largest count come first

We can also sort the resulting dataframe so that the groups with the largest counts come first, using the sort=TRUE argument to count().

Here we see that “tropical storm” is the most frequent status in the dataset.

storms %>% 
  count(status, sort=TRUE)

## # A tibble: 3 × 2
##   status                  n
##   <chr>               <int>
## 1 tropical storm       4374
## 2 hurricane            3091
## 3 tropical depression  2545
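As a rough illustration of what sort=TRUE does, the sketch below compares it with sorting the counts by hand using arrange(desc(n)). The `toy` tibble is a hypothetical stand-in for the storms data.

```r
library(dplyr)

# Hypothetical toy data (an assumption for illustration)
toy <- tibble(
  status = c("tropical storm", "hurricane", "tropical storm",
             "tropical depression", "hurricane", "tropical storm")
)

# Let count() put the largest groups first
sorted_arg <- toy %>%
  count(status, sort = TRUE)

# Count first, then sort the "n" column in descending order by hand
sorted_manual <- toy %>%
  count(status) %>%
  arrange(desc(n))
```

In this toy example each status has a different count, so both approaches give the same row order.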

dplyr count(): Count the unique values of multiple columns or variables

dplyr’s count() function can take multiple variables or columns and compute the number of observations for each unique combination of their values.

storms %>% 
  count(status, category)

## # A tibble: 8 × 3
##   status              category     n
##   <chr>               <ord>    <int>
## 1 hurricane           1         1684
## 2 hurricane           2          628
## 3 hurricane           3          363
## 4 hurricane           4          348
## 5 hurricane           5           68
## 6 tropical depression -1        2545
## 7 tropical storm      0         4373
## 8 tropical storm      1            1
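A small sketch may help show what counting several columns means: each output row is one observed combination of values, and the counts add up to the number of input rows. The `toy` tibble below is hypothetical.

```r
library(dplyr)

# Hypothetical toy data (an assumption for illustration)
toy <- tibble(
  status   = c("hurricane", "hurricane", "hurricane", "tropical storm"),
  category = c(1, 1, 2, 0)
)

# One row per observed (status, category) pair
combo_counts <- toy %>%
  count(status, category)

# Every input row falls into exactly one combination,
# so the counts sum to the number of rows in the input
total <- sum(combo_counts$n)
```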

dplyr count(): Keep the empty group counts

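By default, dplyr’s count() function does not report groups with zero counts. We can use the argument “.drop=FALSE” to keep the groups with zero counts. This applies to factor levels that do not occur in the data, such as the levels of the ordered factor “category” that never appear for a given status.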

storms %>% 
  count(status, category, .drop=FALSE)

## # A tibble: 21 × 3
##    status              category     n
##    <chr>               <ord>    <int>
##  1 hurricane           -1           0
##  2 hurricane           0            0
##  3 hurricane           1         1684
##  4 hurricane           2          628
##  5 hurricane           3          363
##  6 hurricane           4          348
##  7 hurricane           5           68
##  8 tropical depression -1        2545
##  9 tropical depression 0            0
## 10 tropical depression 1            0
## # … with 11 more rows
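The effect of .drop=FALSE is easier to see on a tiny example. The sketch below uses a hypothetical `toy` tibble in which “category” is a factor with three levels, mirroring the character “status” and factor “category” columns in the output above.

```r
library(dplyr)

# Hypothetical toy data (an assumption for illustration);
# category is a factor so that unused levels are known to count()
toy <- tibble(
  status   = c("hurricane", "hurricane", "tropical storm"),
  category = factor(c("1", "2", "0"), levels = c("0", "1", "2"))
)

# Default: only combinations that occur in the data are reported
default_counts <- toy %>%
  count(status, category)

# .drop = FALSE: every factor level is reported for each status,
# with n = 0 where the combination never occurs
all_counts <- toy %>%
  count(status, category, .drop = FALSE)
```

Here `default_counts` has one row per observed combination, while `all_counts` has a row for each of the three category levels within each status.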