How to compute row means in R

In this tutorial, we will learn about the rowMeans() function in base R and use it to calculate the mean of every row in a matrix or a dataframe. We will work through two examples. First, we will calculate the row means of a matrix and a dataframe with no missing values (NAs). Next, we will learn how to compute row means when the matrix or dataframe has missing values.

Create a matrix and dataframe from scratch

Let us create a matrix and a dataframe from scratch using random numbers generated with the sample() function. First, we create a vector of 50 numbers drawn from 1 to 6 with replacement.

set.seed(2023)
data <- sample(c(1:6), 50, replace = TRUE)
data
##  [1] 5 1 3 2 4 2 1 1 5 1 1 5 5 2 3 5 4 5 1 1 6 2 6 6 5 1 2 6 6 1 5 4 6 1 6 4 6 6
## [39] 2 2 3 4 6 1 5 6 2 4 4 1

Then we use the matrix() function to create a matrix with 5 columns. Since the vector has 50 values, the matrix has 10 rows.

data_mat <- matrix(data, ncol=5)

Finally, we use the as.data.frame() function to create a dataframe.

data_df <- as.data.frame(data_mat)
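
As a side note, matrix() fills its values column by column by default, which is why the first values of the vector end up in the first column. The small sketch below illustrates this with a hypothetical toy vector x (not part of the example data), along with the byrow argument that switches to row-wise filling.

```r
# Hypothetical toy vector, used only for illustration
x <- 1:6

# matrix() fills column by column by default
m_col <- matrix(x, ncol = 3)
m_col
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6

# byrow = TRUE fills row by row instead
m_row <- matrix(x, ncol = 3, byrow = TRUE)
m_row
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6

# Number of rows and columns
dim(m_col)
## [1] 2 3
```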

Row Means of a matrix

Let us compute the mean of each row of the matrix using rowMeans(). Our data matrix is complete, with no missing data.

head(data_mat)

##      [,1] [,2] [,3] [,4] [,5]
## [1,]    5    1    6    5    3
## [2,]    1    5    2    4    4
## [3,]    3    5    6    6    6
## [4,]    2    2    6    1    1
## [5,]    4    3    5    6    5
## [6,]    2    5    1    4    6

We can use the rowMeans() function from base R to compute the mean of each row.

rowMeans(data_mat)

##  [1] 4.0 3.2 5.2 2.4 4.6 3.6 3.0 4.4 3.6 1.2
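
To see what rowMeans() is doing, here is a minimal sketch that compares it with two equivalent calculations: row sums divided by the number of columns, and apply() with mean over the rows. The matrix m is typed in by hand from the first three rows shown above, and the names m, rm_builtin, rm_manual and rm_apply are only for illustration.

```r
# First three rows of the example matrix, typed in by hand
m <- matrix(c(5, 1, 6, 5, 3,
              1, 5, 2, 4, 4,
              3, 5, 6, 6, 6),
            ncol = 5, byrow = TRUE)

# Built-in row means
rm_builtin <- rowMeans(m)

# Same result from row sums divided by the number of columns
rm_manual <- rowSums(m) / ncol(m)

# Same result from apply() over rows (MARGIN = 1)
rm_apply <- apply(m, 1, mean)

rm_builtin
## [1] 4.0 3.2 5.2
all.equal(rm_builtin, rm_manual)
## [1] TRUE
all.equal(rm_builtin, rm_apply)
## [1] TRUE
```

All three give the same values here. rowMeans() is usually the more convenient choice because it is a single vectorized call.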

Row Means of a dataframe

Our data frame looks like this.

head(data_df)

##   V1 V2 V3 V4 V5
## 1  5  1  6  5  3
## 2  1  5  2  4  4
## 3  3  5  6  6  6
## 4  2  2  6  1  1
## 5  4  3  5  6  5
## 6  2  5  1  4  6

We can compute the row means of the dataframe using the rowMeans() function with the dataframe as its argument.

rowMeans(data_df)
##  [1] 4.0 3.2 5.2 2.4 4.6 3.6 3.0 4.4 3.6 1.2
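
In practice, a dataframe often has non-numeric columns as well, and rowMeans() requires numeric input. The sketch below assumes a hypothetical dataframe df with a character column id (not part of the example data). It keeps only the numeric columns before calling rowMeans() and stores the result as a new column.

```r
# Hypothetical dataframe mixing a character column with numeric ones
df <- data.frame(id = c("a", "b", "c"),
                 V1 = c(5, 1, 3),
                 V2 = c(1, 5, 5),
                 V3 = c(6, 2, 6))

# rowMeans(df) would stop with an error here because of the "id" column,
# so we first find the numeric columns
num_cols <- sapply(df, is.numeric)

# Compute row means over the numeric columns only and
# store them as a new column (drop = FALSE keeps a dataframe)
df$row_mean <- rowMeans(df[, num_cols, drop = FALSE])
df
```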

How to calculate Mean of Rows of a matrix with missing data (NAs)

By default, the rowMeans() function does not remove missing values (NAs) from the data matrix or dataframe. Here we will learn how to compute row means after removing any missing values in the data.

First, let us create a matrix and a dataframe with missing values.

data <- sample(c(1:5, NA), 50, replace = TRUE)
data_mat <- matrix(data, ncol=5)
data_df <- as.data.frame(data_mat)

In this example, the data matrix has missing values (NAs) in 5 of its 10 rows. Here is a look at the head of the data with NAs.

head(data_mat)

##      [,1] [,2] [,3] [,4] [,5]
## [1,]    5    4    4    2    1
## [2,]    2    3    2    5    4
## [3,]    3    1    3    5    5
## [4,]    1    1    5    3    4
## [5,]    4    3   NA    3    1
## [6,]    2   NA    4    3    1

When we apply rowMeans() to the data matrix, it computes the mean only for rows with no missing values. For rows containing missing values, we get NA as the result. This is because rowMeans() has the argument na.rm=FALSE by default.

rowMeans(data_mat)

##  [1] 3.2 3.2 3.4 2.8  NA  NA 3.0  NA  NA  NA
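
The following sketch shows why a row containing a single NA ends up with an NA mean, and one way to count how many rows are affected. The matrix m is typed in by hand from rows 4 to 6 shown above, and the names m and na_per_row are only for illustration.

```r
# Rows 4 to 6 of the example matrix, typed in by hand
m <- matrix(c(1,  1,  5, 3, 4,
              4,  3, NA, 3, 1,
              2, NA,  4, 3, 1),
            ncol = 5, byrow = TRUE)

# A single NA makes the mean of that row NA
mean(m[2, ])
## [1] NA

# Count the NAs in each row
na_per_row <- rowSums(is.na(m))
na_per_row
## [1] 0 1 1

# Number of rows with at least one NA
sum(na_per_row > 0)
## [1] 2
```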

How to calculate Mean of rows of a dataframe with missing data (NAs)

Let us convert our matrix with missing values to a dataframe using the as.data.frame() function, as before.

data_df <- as.data.frame(data_mat)
head(data_df)
##   V1 V2 V3 V4 V5
## 1  5  4  4  2  1
## 2  2  3  2  5  4
## 3  3  1  3  5  5
## 4  1  1  5  3  4
## 5  4  3 NA  3  1
## 6  2 NA  4  3  1

As before, rowMeans() returns NA for every row that contains an NA.

rowMeans(data_df)

##  [1] 3.2 3.2 3.4 2.8  NA  NA 3.0  NA  NA  NA

By specifying na.rm=TRUE as an argument to rowMeans(), we get the mean of every row, ignoring any NAs.

rowMeans(data_df, na.rm=TRUE)

##  [1] 3.200000 3.200000 3.400000 2.800000 2.750000 2.500000 3.000000 2.000000
##  [9] 3.333333 2.750000
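
With na.rm=TRUE, each row mean is the sum of the non-missing values divided by the number of non-missing values, not by the number of columns. The sketch below checks this by hand on two rows typed in from the example, plus a hypothetical all-NA row (not in the example data) to show that a row with nothing left to average gives NaN.

```r
# Two rows from the example plus a hypothetical all-NA row
m <- matrix(c( 4,  3, NA,  3,  1,
               2, NA,  4,  3,  1,
              NA, NA, NA, NA, NA),
            ncol = 5, byrow = TRUE)

# Built-in version
rm_builtin <- rowMeans(m, na.rm = TRUE)
rm_builtin
## [1] 2.75 2.50  NaN

# Same calculation by hand:
# sum of observed values / number of observed values
rm_manual <- rowSums(m, na.rm = TRUE) / rowSums(!is.na(m))
rm_manual
## [1] 2.75 2.50  NaN
```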