How to compute proportion with tidyverse

In this tutorial, we will learn how to compute proportions with tidyverse. We will see three examples of calculating proportions. In the first, we have counts of a single variable and show how to calculate the proportion of each category. The second example shows how to compute proportions for the combinations of two categorical variables. The third shows how to compute proportions within groups.

library(palmerpenguins)
library(tidyverse)
packageVersion("dplyr")
[1] '1.1.4'

We will use the Palmer penguins dataset to compute proportions. First, we remove the rows with missing values.

penguins <- 
  penguins |>
  drop_na()

First, we will show proportions for a single variable, species. Here we have the counts of the three species of penguins in the data.

penguins |>
  count(species)

# A tibble: 3 × 2
  species       n
  <fct>     <int>
1 Adelie      146
2 Chinstrap    68
3 Gentoo      119

To compute proportions, we first count the number of penguins in each species and then use the mutate() function to divide each count, n, by the total, sum(n).

penguins |>
  count(species) |>
  mutate(prop = n/sum(n))

# A tibble: 3 × 3
  species       n  prop
  <fct>     <int> <dbl>
1 Adelie      146 0.438
2 Chinstrap    68 0.204
3 Gentoo      119 0.357
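
As a quick check on the result above, the proportions of a single variable should add up to 1. The sketch below is a minimal illustration that rebuilds the species counts by hand (copied from the output above) so it runs without the penguins data. The names species_counts, species_props and percent are made up for this example.

```r
library(dplyr)

# Species counts copied from the count(species) output above.
species_counts <- tibble(
  species = c("Adelie", "Chinstrap", "Gentoo"),
  n = c(146L, 68L, 119L)
)

species_props <- species_counts |>
  mutate(
    prop = n / sum(n),              # share of all penguins
    percent = round(100 * prop, 1)  # the same share as a percentage
  )

species_props

# Proportions over the whole table should add up to 1.
sum(species_props$prop)
```

Reporting a rounded percentage next to the raw proportion is only a presentation choice. The unrounded prop column is the one to use in further calculations.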

To compute proportions for the combinations of two categorical variables, we first use count() on both variables to get the count for each combination and then use mutate() as before. Because the result of count() is not grouped here, sum(n) is the total number of penguins, so each proportion is a share of the whole dataset.

penguins |>
  count(species, sex) |>
  mutate(prop = n/sum(n))

# A tibble: 6 × 4
  species   sex        n  prop
  <fct>     <fct>  <int> <dbl>
1 Adelie    female    73 0.219
2 Adelie    male      73 0.219
3 Chinstrap female    34 0.102
4 Chinstrap male      34 0.102
5 Gentoo    female    58 0.174
6 Gentoo    male      61 0.183
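
One way to check these joint proportions is to sum them over sex, which should give back the species proportions from the first example. The sketch below assumes dplyr 1.1.0 or later for the .by argument of summarise() and rebuilds the counts by hand from the output above. The names combo_counts, combo_props and species_totals are made up for this example.

```r
library(dplyr)

# Counts copied from the count(species, sex) output above.
combo_counts <- tibble(
  species = rep(c("Adelie", "Chinstrap", "Gentoo"), each = 2),
  sex = rep(c("female", "male"), times = 3),
  n = c(73L, 73L, 34L, 34L, 58L, 61L)
)

# The tibble is not grouped, so sum(n) is the grand total (333).
combo_props <- combo_counts |>
  mutate(prop = n / sum(n))

# Adding the joint proportions over sex recovers the species proportions.
species_totals <- combo_props |>
  summarise(prop = sum(prop), .by = species)

species_totals
```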

Compute proportion within groups

In this example, we show how to compute proportions within groups, i.e. the proportion of male and female penguins within each species. We start with the counts for each combination of species and sex.

penguins |>
  count(species, sex)

# A tibble: 6 × 3
  species   sex        n
  <fct>     <fct>  <int>
1 Adelie    female    73
2 Adelie    male      73
3 Chinstrap female    34
4 Chinstrap male      34
5 Gentoo    female    58
6 Gentoo    male      61

To get the proportions within each species, we add group_by(species) before mutate(), so that sum(n) is computed separately for each species.

penguins |>
  count(species, sex) |>
  group_by(species) |>
  mutate(proportion = n / sum(n))

# A tibble: 6 × 4
# Groups:   species [3]
  species   sex        n proportion
  <fct>     <fct>  <int>      <dbl>
1 Adelie    female    73      0.5  
2 Adelie    male      73      0.5  
3 Chinstrap female    34      0.5  
4 Chinstrap male      34      0.5  
5 Gentoo    female    58      0.487
6 Gentoo    male      61      0.513
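
Note that the result above is still grouped by species (see the "# Groups" line), which affects any later mutate() or summarise() call unless we add ungroup(). As an alternative sketch, assuming dplyr 1.1.0 or later, the .by argument of mutate() does the grouping for that one step and returns an ungrouped tibble. The counts are rebuilt by hand from the output above, and the names combo_counts and within_species are made up for this example.

```r
library(dplyr)

# Counts copied from the count(species, sex) output above.
combo_counts <- tibble(
  species = rep(c("Adelie", "Chinstrap", "Gentoo"), each = 2),
  sex = rep(c("female", "male"), times = 3),
  n = c(73L, 73L, 34L, 34L, 58L, 61L)
)

# .by = species makes sum(n) the per-species total for this step only.
within_species <- combo_counts |>
  mutate(proportion = n / sum(n), .by = species)

within_species

# Unlike the group_by() version, the result carries no grouping.
is_grouped_df(within_species)
```

Both versions give the same proportions. The choice mostly depends on whether the following steps in the pipeline should keep working per species.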
