How to sum a column by group in R

In this tutorial, we will learn how to compute the sum of a column by groups in another column in a dataframe. Basic, idea is to group the dataframe by the grouping variable/column and then find the sum for each group.

Let us get started by loading tidyverse suite of R packages. We will be using dplyr’s group_by() function and summarize() functions to find the sum/total of a variable by group.

library(tidyverse)

We will create a simple dataframe with two columns, where one is a grouping variable and the other is numerical variable. We use tibble() to create the dataframe from scratch, mainly using sample() function to creating the two variables.

set.seed(41)
df <- tibble(grp = sample(c("g1","g2"), 8, 
                          replace=TRUE),
         counts = sample(1:20,8) )

Our dataframe looks like this. Our grouping variable “grp” has two unique values and we will be computing the sum for both the values using group_by() and summarize() functions from dplyr.

df

## # A tibble: 8 × 2
##   grp   counts
##   <chr>  <int>
## 1 g2         6
## 2 g1        18
## 3 g1         2
## 4 g2        13
## 5 g2        17
## 6 g2        19
## 7 g2         5
## 8 g1        20

Computing sum of column in a dataframe based on a grouping column in R

dplyr’s group_by() function allows use to split the dataframe into smaller dataframes based on a variable of interest. The result after group_by() has all the elements of original dataframe, but with grouping information.

df %>%
  group_by(grp)

## # A tibble: 8 × 2
## # Groups:   grp [2]
##   grp   counts
##   <chr>  <int>
## 1 g2         6
## 2 g1        18
## 3 g1         2
## 4 g2        13
## 5 g2        17
## 6 g2        19
## 7 g2         5
## 8 g1        20

Then, we can use summarize() function to compute the sum of each grouping variable.

df %>%
  group_by(grp) %>%
  summarize(total = sum(counts))

## # A tibble: 2 × 2
##   grp   total
##   <chr> <int>
## 1 g1       40
## 2 g2       60