Compute rowwise mean and standard deviation

In this post, we will learn how to compute row-wise summary statistics like mean and standard deviation using dplyr’s rowwise() function.

First, let us load tidyverse and verify the version of dplyr.

library(tidyverse)
packageVersion("dplyr")
[1] '1.1.2'

Let us create a toy data frame with four numeric columns plus a row ID column, five columns in total. We use the sample() function to draw a random vector of 20 integers between 1 and 100.

set.seed(2027)
# create random data
data <- sample(c(1:100), 20, replace = TRUE)
data

 [1] 72 71 91 56 15 59 22 79 78 40 20 54  3  9 82 97 86 59  9 28

We then use the matrix() function to convert the vector to a matrix with four columns.

# create a matrix
data_mat <- matrix(data, ncol=4)
data_mat

     [,1] [,2] [,3] [,4]
[1,]   72   59   20   97
[2,]   71   22   54   86
[3,]   91   79    3   59
[4,]   56   78    9    9
[5,]   15   40   82   28
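
Note that matrix() fills its values column by column by default, which is why the first five sampled values (72, 71, 91, 56, 15) end up in the first column. The following is a minimal sketch of that behaviour, using a small hypothetical vector rather than the sampled data; the names v, by_col and by_row are only for illustration.

```r
v <- 1:6  # hypothetical vector, not the sampled data above

# Default: values fill down each column first
by_col <- matrix(v, ncol = 2)
by_col
#      [,1] [,2]
# [1,]    1    4
# [2,]    2    5
# [3,]    3    6

# byrow = TRUE: values fill across each row first
by_row <- matrix(v, ncol = 2, byrow = TRUE)
by_row
#      [,1] [,2]
# [1,]    1    2
# [2,]    3    4
# [3,]    5    6
```

If the data should be laid out row by row instead, byrow = TRUE is the argument to reach for.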

Finally, we convert the matrix to a tibble with the as_tibble() function and add a row ID column built with dplyr’s row_number() function.

# name the columns C1 to C4
colnames(data_mat) <- paste0("C", seq(4))
# convert the matrix to a tibble and add a row ID as the first column
data_df <- as_tibble(data_mat) %>%
  mutate(row_id = paste0("R", row_number())) %>%
  relocate(row_id)

data_df

# A tibble: 5 × 5
  row_id    C1    C2    C3    C4
  <chr>  <int> <int> <int> <int>
1 R1        72    59    20    97
2 R2        71    22    54    86
3 R3        91    79     3    59
4 R4        56    78     9     9
5 R5        15    40    82    28

Compute rowwise mean and SD

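Now we are ready to compute row-wise mean and standard deviation. We start by calling rowwise(), which treats each row as its own group so that later operations work one row at a time.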

data_df %>%
  rowwise()

# A tibble: 5 × 5
# Rowwise: 
  row_id    C1    C2    C3    C4
  <chr>  <int> <int> <int> <int>
1 R1        72    59    20    97
2 R2        71    22    54    86
3 R3        91    79     3    59
4 R4        56    78     9     9
5 R5        15    40    82    28
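
To see what the "# Rowwise:" marker changes, here is a minimal sketch contrasting the same mutate() call with and without rowwise(). The tibble toy and its columns a and b are hypothetical and are not the data used in this post.

```r
library(dplyr)

# Hypothetical two-row example, not the data from this post
toy <- tibble(a = c(1, 2), b = c(10, 20))

# Without rowwise(): sum() sees the whole columns,
# so every row gets the grand total 1 + 2 + 10 + 20 = 33
without_rowwise <- toy %>%
  mutate(total = sum(a, b))

# With rowwise(): sum() sees one row at a time,
# so the totals are 1 + 10 = 11 and 2 + 20 = 22
with_rowwise <- toy %>%
  rowwise() %>%
  mutate(total = sum(a, b)) %>%
  ungroup()

without_rowwise$total
with_rowwise$total
```

Calling ungroup() at the end drops the row-wise grouping so that later steps operate on whole columns again.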

After applying rowwise(), we use summarize() to compute the mean and SD of each row. The columns must be wrapped in c_across(). Inside a rowwise data frame, a bare C1:C4 is evaluated by R as the integer sequence running from the value in C1 to the value in C4 (72:97 for the first row), not as the four columns. c_across(C1:C4) instead uses tidy selection to collect the values of C1 through C4 for the current row.

data_df %>%
  rowwise() %>%
  summarize(m = mean(c_across(C1:C4)),
            std = sd(c_across(C1:C4)))

# A tibble: 5 × 2
      m   std
  <dbl> <dbl>
1  62    32.1
2  58.2  27.5
3  58    39.0
4  38    34.7
5  41.2  29.0

The tibble printout shows three significant digits, so the exact means of rows 2 and 5 (58.25 and 41.25) appear rounded.

In the previous example, we lost the row_id column when we computed the row-wise mean and SD. To keep an identifier column in the result, pass it to rowwise(). Here we use rowwise(row_id), which keeps the row_id column alongside the summaries.

data_df %>%
  rowwise(row_id) %>%
  summarize(m = mean(c_across(C1:C4)),
            std = sd(c_across(C1:C4)))

#`summarise()` has grouped output by 'row_id'. You can override using the
#`.groups` argument.

# A tibble: 5 × 3
# Groups:   row_id [5]
  row_id     m   std
  <chr>  <dbl> <dbl>
1 R1      62    32.1
2 R2      58.2  27.5
3 R3      58    39.0
4 R4      38    34.7
5 R5      41.2  29.0
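
As a sanity check on row-wise results, the same statistics can also be computed in base R directly from the matrix values printed earlier. This is only a sketch: the values are typed in by hand from the matrix output above, and the names row_means and row_sds are chosen here for illustration.

```r
# Values copied from the matrix printed earlier in this post,
# listed column by column
data_mat <- matrix(
  c(72, 71, 91, 56, 15,
    59, 22, 79, 78, 40,
    20, 54,  3,  9, 82,
    97, 86, 59,  9, 28),
  ncol = 4,
  dimnames = list(paste0("R", 1:5), paste0("C", 1:4))
)

# Mean of each row, e.g. R1: (72 + 59 + 20 + 97) / 4 = 62
row_means <- rowMeans(data_mat)

# Sample standard deviation of each row (MARGIN = 1 means "by row")
row_sds <- apply(data_mat, 1, sd)

round(row_means, 2)
#    R1    R2    R3    R4    R5
# 62.00 58.25 58.00 38.00 41.25
round(row_sds, 2)
```

rowMeans() is vectorised and usually much faster than a row-wise dplyr pipeline on large numeric data, while rowwise() is more flexible when the per-row computation is an arbitrary function.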