How to compute row means with dplyr

In this tutorial, we will learn how to compute row means in the tidyverse using the dplyr package. We will start with several examples that compute row means using rowMeans() and dplyr’s row-wise operations on a dataframe with no missing values. Then we will see two examples using rowMeans() and a row-wise operation on a dataframe with missing values.

Here we load the tidyverse meta-package and check the version of dplyr in use.

library(tidyverse)
# check package version
packageVersion("dplyr")

## [1] '1.1.0'

Creating data for computing row means with dplyr

First, let us create a toy dataframe with no missing values using the sample() function. We create a random data vector, reshape it into a matrix, and convert the matrix to a dataframe using the as_tibble() function from the tidyverse.

set.seed(2023)
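# create random data
data <- sample(c(1:6), 20, replace = TRUE)
# create a matrix with explicit column names V1 to V4
# (as_tibble() warns when a matrix has no column names)
data_mat <- matrix(data, ncol = 4,
                   dimnames = list(NULL, paste0("V", 1:4)))
# convert the matrix to a dataframe
data_df <- as_tibble(data_mat)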

Our dataframe looks like this.

data_df %>% head()
## # A tibble: 5 × 4
##      V1    V2    V3    V4
##   <int> <int> <int> <int>
## 1     5     2     1     5
## 2     1     1     5     4
## 3     3     1     5     5
## 4     2     5     2     1
## 5     4     1     3     1

Row means with dplyr using rowMeans() and across() with tidy selection

We compute the mean of each row using the base R function rowMeans() in combination with across(), which lets us work on multiple columns at once. We select the columns of interest using the tidy-select helper starts_with().

data_df %>%
  mutate(rmean = rowMeans(across(starts_with("V"))))

## # A tibble: 5 × 5
##      V1    V2    V3    V4 rmean
##   <int> <int> <int> <int> <dbl>
## 1     5     2     1     5  3.25
## 2     1     1     5     4  2.75
## 3     3     1     5     5  3.5 
## 4     2     5     2     1  2.5 
## 5     4     1     3     1  2.25

Row means with dplyr using rowMeans() and pick() with tidy selection

In this example, we again compute the mean of each row using the base R function rowMeans(), this time in combination with pick() instead of across(). pick() is a new function in dplyr 1.1.0 that selects the columns of interest; here we use it with the tidy-select helper starts_with().

data_df %>%
  mutate(rmean = rowMeans(pick(starts_with("V"))))

## # A tibble: 5 × 5
##      V1    V2    V3    V4 rmean
##   <int> <int> <int> <int> <dbl>
## 1     5     2     1     5  3.25
## 2     1     1     5     4  2.75
## 3     3     1     5     5  3.5 
## 4     2     5     2     1  2.5 
## 5     4     1     3     1  2.25
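As a quick sanity check, the sketch below compares the across() and pick() approaches on a small hard-coded tibble. The name toy_df and its values are hypothetical, chosen only so the example does not depend on sample(); pick() needs dplyr 1.1.0 or later.

```r
library(dplyr)

# Hypothetical toy data, hard-coded for reproducibility
toy_df <- tibble(
  V1 = c(5, 1, 3),
  V2 = c(2, 1, 1),
  V3 = c(1, 5, 5),
  V4 = c(5, 4, 5)
)

# Row means via across()
with_across <- toy_df %>%
  mutate(rmean = rowMeans(across(starts_with("V"))))

# Row means via pick()
with_pick <- toy_df %>%
  mutate(rmean = rowMeans(pick(starts_with("V"))))

# Both approaches should give the same row means
identical(with_across$rmean, with_pick$rmean)
```

Both calls hand rowMeans() a dataframe of the selected columns, so for plain column selection like this the choice between them is mostly a matter of readability.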

Row means using the rowwise() function in dplyr

Another way to compute row means with dplyr is to use its row-wise operations. For this, dplyr provides the rowwise() function, which marks a dataframe so that later operations are applied one row at a time.

data_df %>%
  rowwise()

## # A tibble: 5 × 4
## # Rowwise: 
##      V1    V2    V3    V4
##   <int> <int> <int> <int>
## 1     5     2     1     5
## 2     1     1     5     4
## 3     3     1     5     5
## 4     2     5     2     1
## 5     4     1     3     1

First, we apply the rowwise() function and then use mutate() to compute the mean using c_across() with a tidy-select helper. In this example, we use starts_with() to select the columns of interest.

data_df %>%
  rowwise() %>%
  mutate(rmean = mean(c_across(starts_with("V"))))

## # A tibble: 5 × 5
## # Rowwise: 
##      V1    V2    V3    V4 rmean
##   <int> <int> <int> <int> <dbl>
## 1     5     2     1     5  3.25
## 2     1     1     5     4  2.75
## 3     3     1     5     5  3.5 
## 4     2     5     2     1  2.5 
## 5     4     1     3     1  2.25
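The "# Rowwise:" line in the output above shows that the result is still a row-wise dataframe, so later verbs would keep operating one row at a time. A minimal sketch, assuming a hypothetical toy_df, of returning to a regular tibble with ungroup() before further summaries:

```r
library(dplyr)

# Hypothetical toy data
toy_df <- tibble(V1 = c(5, 1, 3), V2 = c(2, 1, 1))

res <- toy_df %>%
  rowwise() %>%
  mutate(rmean = mean(c_across(starts_with("V")))) %>%
  ungroup()  # drop the row-wise structure

# After ungroup(), summarise() collapses all rows into a single value
res %>% summarise(overall = mean(rmean))
```

Without ungroup(), the summarise() call would return one value per row rather than one overall mean.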

Row means using the rowwise() function in dplyr: Example 2

In the example below, which also uses the rowwise() function, we give the first and last column names as a range to select the columns of interest.

data_df %>%
  rowwise() %>%
  mutate(rmean = mean(c_across(V1:V4)))

## # A tibble: 5 × 5
## # Rowwise: 
##      V1    V2    V3    V4 rmean
##   <int> <int> <int> <int> <dbl>
## 1     5     2     1     5  3.25
## 2     1     1     5     4  2.75
## 3     3     1     5     5  3.5 
## 4     2     5     2     1  2.5 
## 5     4     1     3     1  2.25

Row means using the rowwise() function in dplyr: Example 3

We can also use other tidy-select helpers to select the columns of interest. Here is an example where we use all numeric columns to compute the row-wise mean.

data_df %>%
  rowwise() %>%
  mutate(rmean = mean(c_across(where(is.numeric))))

## # A tibble: 5 × 5
## # Rowwise: 
##      V1    V2    V3    V4 rmean
##   <int> <int> <int> <int> <dbl>
## 1     5     2     1     5  3.25
## 2     1     1     5     4  2.75
## 3     3     1     5     5  3.5 
## 4     2     5     2     1  2.5 
## 5     4     1     3     1  2.25
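One thing to watch with where(is.numeric) is that it picks up every numeric column, including ones that are not measurements. The sketch below uses a hypothetical tibble with a numeric id column to show the difference; the column names id and label are assumptions made for illustration.

```r
library(dplyr)

# Hypothetical data with a numeric identifier and a character column
toy_df <- tibble(
  id    = c(101, 102),
  label = c("a", "b"),
  V1    = c(1, 3),
  V2    = c(3, 5)
)

res <- toy_df %>%
  rowwise() %>%
  mutate(
    # where(is.numeric) also selects id, which distorts the mean
    mean_all_numeric = mean(c_across(where(is.numeric))),
    # starts_with("V") restricts the mean to V1 and V2
    mean_v_only = mean(c_across(starts_with("V")))
  ) %>%
  ungroup()

res
```

When a dataframe mixes identifiers with measurements, a name-based helper such as starts_with() or an explicit range is usually the safer selection.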

Row means on a dataframe with NAs using rowMeans() and rowwise()

When rows contain NAs, i.e. missing values, both rowMeans() and mean() return NA as the mean by default, because they do not remove NAs before computing the mean.
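The propagation of NA can be seen on a single vector before moving to a dataframe. This is a small base R sketch; the vector x is a hypothetical stand-in for one row of data.

```r
# Hypothetical values standing in for one row with a missing entry
x <- c(NA, 5, 3, 5)

mean(x)                # NA: the missing value propagates
mean(x, na.rm = TRUE)  # (5 + 3 + 5) / 3 = 4.333333

# rowMeans() behaves the same way on a one-row matrix
m <- matrix(x, nrow = 1)
rowMeans(m)                # NA
rowMeans(m, na.rm = TRUE)  # 4.333333
```

With na.rm = TRUE the mean is taken over the non-missing values only, so the divisor is 3 here rather than 4.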

Here is an example showing the default behaviour of computing row means.

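# create random data that includes missing values
data <- sample(c(1:5, NA), 40, replace = TRUE)
# create a matrix with explicit column names V1 to V4
data_mat <- matrix(data, ncol = 4,
                   dimnames = list(NULL, paste0("V", 1:4)))
# convert the matrix to a dataframe
data_df <- as_tibble(data_mat)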

data_df %>% head()
## # A tibble: 6 × 4
##      V1    V2    V3    V4
##   <int> <int> <int> <int>
## 1    NA     5     3     5
## 2     2     4     4     2
## 3    NA    NA    NA     3
## 4    NA     1     1     1
## 5     5    NA     5     4
## 6     1     4    NA     2

data_df %>%
  mutate(rmean = rowMeans(across(starts_with("V"))))

## # A tibble: 10 × 5
##       V1    V2    V3    V4 rmean
##    <int> <int> <int> <int> <dbl>
##  1    NA     5     3     5    NA
##  2     2     4     4     2     3
##  3    NA    NA    NA     3    NA
##  4    NA     1     1     1    NA
##  5     5    NA     5     4    NA
##  6     1     4    NA     2    NA
##  7     2    NA     2     5    NA
##  8    NA    NA     4     2    NA
##  9    NA     2     4     4    NA
## 10     1     2     1     4     2

With na.rm=TRUE we get the row means we intended, computed over the non-missing values in each row. In the example below, we pass the na.rm argument to the rowMeans() function.

data_df %>%
  mutate(rmean = rowMeans(across(starts_with("V")), na.rm=TRUE))

## # A tibble: 10 × 5
##       V1    V2    V3    V4 rmean
##    <int> <int> <int> <int> <dbl>
##  1    NA     5     3     5  4.33
##  2     2     4     4     2  3   
##  3    NA    NA    NA     3  3   
##  4    NA     1     1     1  1   
##  5     5    NA     5     4  4.67
##  6     1     4    NA     2  2.33
##  7     2    NA     2     5  3   
##  8    NA    NA     4     2  3   
##  9    NA     2     4     4  3.33
## 10     1     2     1     4  2
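One edge case does not appear in the output above: a row in which every selected value is missing. With na.rm = TRUE there is nothing left to average, and rowMeans() returns NaN rather than NA. A minimal sketch with a hypothetical toy_df, including one possible way to turn the NaN back into NA:

```r
library(dplyr)

# Hypothetical data: the second row is entirely missing
toy_df <- tibble(V1 = c(1, NA), V2 = c(3, NA))

res <- toy_df %>%
  mutate(rmean = rowMeans(across(starts_with("V")), na.rm = TRUE))

# The all-NA row yields NaN
is.nan(res$rmean)

# Optional: report such rows as NA instead of NaN
res_clean <- res %>%
  mutate(rmean = if_else(is.nan(rmean), NA_real_, rmean))
```

Whether NaN or NA is the better signal for an empty row depends on the downstream analysis, so the replacement step is only a suggestion.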

In the example below, we pass na.rm=TRUE to the mean() function in a row-wise operation using rowwise().

data_df %>%
  rowwise() %>%
  mutate(rmean = mean(c_across(where(is.numeric)), na.rm=TRUE))
## # A tibble: 10 × 5
## # Rowwise: 
##       V1    V2    V3    V4 rmean
##    <int> <int> <int> <int> <dbl>
##  1    NA     5     3     5  4.33
##  2     2     4     4     2  3   
##  3    NA    NA    NA     3  3   
##  4    NA     1     1     1  1   
##  5     5    NA     5     4  4.67
##  6     1     4    NA     2  2.33
##  7     2    NA     2     5  3   
##  8    NA    NA     4     2  3   
##  9    NA     2     4     4  3.33
## 10     1     2     1     4  2
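The two approaches with na.rm=TRUE give the same row means, which can be checked with all.equal(). The sketch below uses a small hard-coded tibble; toy_df and its values are hypothetical.

```r
library(dplyr)

# Hypothetical data with missing values
toy_df <- tibble(
  V1 = c(NA, 2, NA),
  V2 = c(5, 4, NA),
  V3 = c(3, 4, NA),
  V4 = c(5, 2, 3)
)

# Vectorised: rowMeans() handles all rows in one call
vectorised <- toy_df %>%
  mutate(rmean = rowMeans(across(V1:V4), na.rm = TRUE))

# Row by row: mean() is called once per row
row_by_row <- toy_df %>%
  rowwise() %>%
  mutate(rmean = mean(c_across(V1:V4), na.rm = TRUE)) %>%
  ungroup()

all.equal(vectorised$rmean, row_by_row$rmean)
```

Since rowMeans() works on all rows at once while rowwise() evaluates mean() separately for each row, the rowMeans() version is generally the faster choice on large dataframes. rowwise() is more flexible when the per-row function has no vectorised equivalent.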