colSums in R – compute sum of all columns in a dataframe or matrix

rstats101 · January 12, 2022

In this tutorial, we will learn about the colSums() function in base R and use it to calculate the sum of every column in a matrix or a dataframe. We will work through two examples to understand the use of colSums(). First, we will calculate the column sums of a matrix and a dataframe with no missing values (NAs). Next, we will learn how to compute the column sums when the matrix or dataframe has missing values.

Create a matrix and dataframe from scratch

Let us create a matrix and a dataframe from scratch using random numbers generated with the sample() function. First, we create a vector of numbers.

set.seed(42)
data <- sample(c(1:6), 50, replace = TRUE)
data
##  [1] 1 5 1 1 2 4 2 2 1 4 1 5 6 4 2 2 3 1 1 3 4 5 5 5 4 2 4 3 2 1 2 6 3 6 2 4 4 6
## [39] 2 5 4 5 4 2 2 3 1 5 2 2

Then we use the matrix() function to create a matrix. With ncol=5, the 50 values fill a matrix with 10 rows and 5 columns.

data_mat <- matrix(data, ncol=5)

Finally, we use the as.data.frame() function to create a dataframe.

data_df <- as.data.frame(data_mat)
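It can help to see how matrix() arranges a vector before summing columns. The sketch below uses a small hypothetical vector `v` (not part of the tutorial data) to show that matrix() fills column by column by default, and that as.data.frame() names the columns V1, V2, and so on when the matrix has no column names.

```r
# Hypothetical toy vector, used only to illustrate the fill order
v <- 1:6

# By default, matrix() fills the columns first
m <- matrix(v, ncol = 2)
m
##      [,1] [,2]
## [1,]    1    4
## [2,]    2    5
## [3,]    3    6

# byrow = TRUE fills row by row instead
m_byrow <- matrix(v, ncol = 2, byrow = TRUE)
m_byrow
##      [,1] [,2]
## [1,]    1    2
## [2,]    3    4
## [3,]    5    6

# Converting to a dataframe gives default column names
df <- as.data.frame(m)
names(df)
## [1] "V1" "V2"
```

So in our data matrix, the first 10 sampled values form the first column, the next 10 form the second column, and so on.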

Sum of columns of a matrix

Let us compute the sum of all the columns using colSums() on the matrix. Our data matrix is complete with no missing data.

head(data_mat)

##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    1    4    2    4
## [2,]    5    5    5    6    5
## [3,]    1    6    5    3    4
## [4,]    1    4    5    6    2
## [5,]    2    2    4    2    2
## [6,]    4    2    2    4    3

Applying colSums() to the matrix, we get the sum of each column as a numeric vector with one element per column.

colSums(data_mat)
## [1] 23 28 35 40 30
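As a quick sanity check, we can compare colSums() with the more general apply() approach, which applies sum() over the column margin. This is only a sketch on a hypothetical toy matrix `m`; both calls should give the same totals.

```r
# Hypothetical toy matrix: columns are 1:3 and 4:6
m <- matrix(1:6, ncol = 2)

# Column sums with colSums()
sums_fast <- colSums(m)
sums_fast
## [1]  6 15

# The same totals with apply(); MARGIN = 2 means "over columns"
sums_apply <- apply(m, 2, sum)

# all.equal() compares the values and ignores integer vs double storage
all.equal(sums_fast, sums_apply)
## [1] TRUE
```

colSums() is the more direct choice when all we need is the column totals.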

Sum of columns of a dataframe

We can also use the colSums() function to calculate the sum of every column in a dataframe. The dataframe should contain only numeric columns; if it has a non-numeric column, such as a character column, colSums() stops with an error.

head(data_df)

##   V1 V2 V3 V4 V5
## 1  1  1  4  2  4
## 2  5  5  5  6  5
## 3  1  6  5  3  4
## 4  1  4  5  6  2
## 5  2  2  4  2  2
## 6  4  2  2  4  3

In our sample dataframe, all the columns are numeric, so we get the sum of every column, named by column.

colSums(data_df)

## V1 V2 V3 V4 V5 
## 23 28 35 40 30
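If a dataframe does contain non-numeric columns, one possible workaround is to keep only the numeric columns before calling colSums(). The sketch below uses a hypothetical dataframe `df_mixed` with one character column; the column names are made up for illustration.

```r
# Hypothetical dataframe with one character column and two numeric columns
df_mixed <- data.frame(
  id = c("a", "b", "c"),
  x  = c(1, 2, 3),
  y  = c(10, 20, 30)
)

# Logical vector marking which columns are numeric
numeric_cols <- sapply(df_mixed, is.numeric)

# Sum only the numeric columns; drop = FALSE keeps a dataframe
# even if just one column is selected
col_totals <- colSums(df_mixed[, numeric_cols, drop = FALSE])
col_totals
##  x  y 
##  6 60
```

Calling colSums(df_mixed) directly would fail here because of the id column.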

How to calculate Sum of columns of a matrix with missing data (NAs)

First, let us create a matrix and a dataframe with missing values.

data <- sample(c(1:5, NA), 50, replace = TRUE)
data_mat <- matrix(data, ncol=5)
data_df <- as.data.frame(data_mat)

In this example, the data matrix has missing values (NAs) in every column except the second.

head(data_mat)

##      [,1] [,2] [,3] [,4] [,5]
## [1,]   NA    2    4   NA    4
## [2,]   NA    5    1    2    2
## [3,]    2    1    3    2    2
## [4,]    4    1    3    1    3
## [5,]    3    4    5    2    5
## [6,]   NA    5    5    5    5

So when we apply colSums() to the data matrix, it computes the sum only for the columns that have no missing values. For columns containing missing values, we get NA. This is because the colSums() function has the argument na.rm=FALSE by default.

colSums(data_mat)

## [1] NA 30 NA NA NA

With the na.rm=TRUE argument, the colSums() function calculates each sum after ignoring the missing values.

colSums(data_mat, na.rm=TRUE)
## [1] 18 30 34 22 28
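To see the effect of na.rm on data we can verify by hand, here is a small sketch with a hypothetical matrix `m_na`. It also shows one edge case worth keeping in mind: a column made up entirely of NAs sums to 0 when na.rm=TRUE, because nothing is left to add.

```r
# Hypothetical matrix, filled column by column:
# column 1 has one NA, column 2 is complete, column 3 is all NA
m_na <- matrix(c(1, NA, 3,
                 4, 5, 6,
                 NA, NA, NA), ncol = 3)

# Default na.rm = FALSE: any NA in a column makes its sum NA
sums_default <- colSums(m_na)
sums_default
## [1] NA 15 NA

# na.rm = TRUE: NAs are dropped before summing
sums_narm <- colSums(m_na, na.rm = TRUE)
sums_narm
## [1]  4 15  0
```

A sum of 0 from na.rm=TRUE can therefore mean "no data at all", so it may be worth checking how many values are missing before trusting the totals.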

How to calculate Sum of columns of a dataframe with missing data (NAs)

head(data_df)

##   V1 V2 V3 V4 V5
## 1 NA  2  4 NA  4
## 2 NA  5  1  2  2
## 3  2  1  3  2  2
## 4  4  1  3  1  3
## 5  3  4  5  2  5
## 6 NA  5  5  5  5

When there are missing values, colSums() returns NA for the affected columns of a dataframe as well by default.

colSums(data_df)

## V1 V2 V3 V4 V5 
## NA 30 NA NA NA

We can use the na.rm=TRUE argument to compute the sum of every column in a dataframe with missing values. We then get sums that ignore the missing values in each column.

colSums(data_df, na.rm=TRUE)
## V1 V2 V3 V4 V5 
## 18 30 34 22 28
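Since na.rm=TRUE silently drops missing values, it can be useful to know how many values were dropped in each column. One way to do that, sketched here on a hypothetical dataframe `df_na`, is to apply colSums() to the logical matrix returned by is.na(), where each TRUE counts as 1.

```r
# Hypothetical dataframe with a few missing values
df_na <- data.frame(
  V1 = c(NA, 2, 4),
  V2 = c(2, 5, 1),
  V3 = c(4, NA, NA)
)

# is.na() returns a logical matrix; summing TRUEs counts the NAs per column
na_counts <- colSums(is.na(df_na))
na_counts
## V1 V2 V3 
##  1  0  2
```

Reading the NA counts next to the sums makes it clearer how much data each total is based on.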
