In this post, we will learn how to compute row-wise summary statistics like mean and standard deviation using dplyr’s rowwise() function.
First, let us load tidyverse and verify the version of dplyr.
library(tidyverse)
packageVersion("dplyr")
[1] '1.1.2'
Let us create a toy data frame with five columns: four numeric columns and a row ID column. We use the sample() function to draw 20 random integers between 1 and 100, with replacement.
set.seed(2027)
# create random data
data <- sample(c(1:100), 20, replace = TRUE)
data
 [1] 72 71 91 56 15 59 22 79 78 40 20 54  3  9 82 97 86 59  9 28
We then use the matrix() function to convert the vector to a matrix with four columns.
# create a matrix
data_mat <- matrix(data, ncol=4)
data_mat
     [,1] [,2] [,3] [,4]
[1,]   72   59   20   97
[2,]   71   22   54   86
[3,]   91   79    3   59
[4,]   56   78    9    9
[5,]   15   40   82   28
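Note that matrix() fills the matrix column by column by default, which is why the first five sampled values end up in the first column. The small sketch below uses a made-up vector (not the sampled data) to show the difference the byrow argument makes; the object names are only illustrative.

```r
# illustrative vector, not the sampled data from this post
v <- 1:6

# default behaviour: fill down each column
by_col <- matrix(v, ncol = 2)

# byrow = TRUE: fill across each row instead
by_row <- matrix(v, ncol = 2, byrow = TRUE)

by_col[, 1]  # first column: 1 2 3
by_row[1, ]  # first row: 1 2
```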
Finally, we name the columns, convert the matrix to a data frame with as_tibble(), and add a row ID column built with dplyr’s row_number() function. We use relocate() to move the row ID to the front.
colnames(data_mat) <- paste0("C",seq(4))
# convert the matrix to a data frame
data_df <- as_tibble(data_mat) %>%
  mutate(row_id=paste0("R",row_number())) %>%
  relocate(row_id)
data_df
# A tibble: 5 × 5
  row_id    C1    C2    C3    C4
  <chr>  <int> <int> <int> <int>
1 R1        72    59    20    97
2 R2        71    22    54    86
3 R3        91    79     3    59
4 R4        56    78     9     9
5 R5        15    40    82    28
Compute rowwise mean and SD
Now we are ready to compute row-wise mean and standard deviation.
data_df %>%
  rowwise()
# A tibble: 5 × 5
# Rowwise:
  row_id    C1    C2    C3    C4
  <chr>  <int> <int> <int> <int>
1 R1        72    59    20    97
2 R2        71    22    54    86
3 R3        91    79     3    59
4 R4        56    78     9     9
5 R5        15    40    82    28
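After applying rowwise(), we use summarize() to compute the row mean and SD. We select the columns by name and wrap the selection in c_across(), which gathers the values of C1 to C4 in the current row into a single vector. Writing mean(C1:C4) without c_across() would instead build the integer sequence running from the value in C1 to the value in C4, and the statistics would be computed on that sequence.
data_df %>%
  rowwise() %>%
  summarize(m = mean(c_across(C1:C4)),
            std = sd(c_across(C1:C4)))
# A tibble: 5 × 2
      m   std
  <dbl> <dbl>
1  62    32.1
2  58.2  27.5
3  58    39.0
4  38    34.7
5  41.2  29.0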
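A quick way to see why the column selection has to be explicit: inside rowwise(), a bare `C1:C4` is evaluated with base R's colon operator on the two cell values, so it yields a sequence rather than the four columns. The sketch below uses only base R and the R1 values printed earlier; the variable names are illustrative.

```r
# values of row R1 as printed in data_df above
r1 <- c(C1 = 72, C2 = 59, C3 = 20, C4 = 97)

# the colon operator builds the sequence 72, 73, ..., 97
seq_mean <- mean(r1[["C1"]]:r1[["C4"]])

# mean and SD of the four actual values in the row
row_mean <- mean(r1)
row_sd   <- sd(r1)

seq_mean  # 84.5
row_mean  # 62
row_sd    # about 32.1
```

The same kind of check can be repeated for any other row by swapping in its four values.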
In the previous example we lost the row ID column when we computed the row-wise mean and SD. To keep an identifier column, pass it to rowwise(). Here we use rowwise(row_id), which keeps the row_id column in the result.
data_df %>%
  rowwise(row_id) %>%
  summarize(m = mean(c_across(C1:C4)),
            std = sd(c_across(C1:C4)))
#`summarise()` has grouped output by 'row_id'. You can override using the
#`.groups` argument.
# A tibble: 5 × 3
# Groups:   row_id [5]
  row_id     m   std
  <chr>  <dbl> <dbl>
1 R1      62    32.1
2 R2      58.2  27.5
3 R3      58    39.0
4 R4      38    34.7
5 R5      41.2  29.0
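The message above mentions the `.groups` argument. As a minimal sketch, assuming an ungrouped result is what we want, passing `.groups = "drop"` to summarize() returns a plain tibble and silences the message. The data frame is typed in directly from the values shown earlier so that the block runs on its own, and `row_stats` is just an illustrative name.

```r
library(dplyr)

# same values as data_df above, entered by hand so the block is self-contained
data_df <- tibble(
  row_id = paste0("R", 1:5),
  C1 = c(72L, 71L, 91L, 56L, 15L),
  C2 = c(59L, 22L, 79L, 78L, 40L),
  C3 = c(20L, 54L, 3L, 9L, 82L),
  C4 = c(97L, 86L, 59L, 9L, 28L)
)

row_stats <- data_df %>%
  rowwise(row_id) %>%
  summarize(m = mean(c_across(C1:C4)),
            std = sd(c_across(C1:C4)),
            .groups = "drop")  # drop the grouping by row_id in the result

is_grouped_df(row_stats)  # FALSE
```

Dropping the grouping here avoids surprises in later steps, since verbs such as mutate() or summarize() would otherwise keep operating per row_id.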