Rstats 101

Learn R Programming Tips & Tricks for Statistics and Data Science

How to apply a function on multiple columns using across()

rstats101 · September 9, 2023 ·

In this post, we will learn how to apply one or more functions to multiple columns using dplyr’s across() function. across() is used inside verbs such as summarize() or mutate() to operate on several columns at once. In the examples here, we will use summarize() with across() to compute the mean of multiple columns in a single call.

library(tidyverse)
packageVersion("dplyr")
[1] '1.1.2'

We will use the iris dataset that is built into R.

iris %>% head()

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

A naive way to apply a transformation to multiple columns is to handle each column separately, as shown below. In this example, we compute the mean of three columns by writing a separate mean() call for each one.

iris %>%
  summarize(sepal_length_m = mean(Sepal.Length),
            sepal_width_m = mean(Sepal.Width),
            petal_length_m = mean(Petal.Length))

  sepal_length_m sepal_width_m petal_length_m
1       5.843333      3.057333          3.758

Although this works well for a few columns, it gets cumbersome when we want to summarize many columns or apply several functions.

dplyr’s across() function is designed to make this easier. With across(), we can apply one or more functions to the columns of interest. For example, we can compute the means of the same three columns as in the previous example as follows:

iris %>%
  summarize(across(Sepal.Length:Petal.Length, mean))

  Sepal.Length Sepal.Width Petal.Length
1     5.843333    3.057333        3.758

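In the above example, we provided two arguments to across(): the first is the columns of interest (here the range Sepal.Length:Petal.Length, which selects those two columns and every column between them), and the second is the function to apply to each of them.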
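
The text mentions that across() can apply more than one function, so here is a minimal sketch of that case. It assumes the functions are passed as a named list, whose names end up in the output column names. The object name multi_summary is only illustrative, and .names is written out explicitly even though "{.col}_{.fn}" is already the default when a list of functions is given.

```r
library(dplyr)

# Apply two functions to each selected column by passing a named list.
# The list names (mean, sd) are used to build the output column names.
multi_summary <- iris %>%
  summarize(across(Sepal.Length:Petal.Length,
                   list(mean = mean, sd = sd),
                   .names = "{.col}_{.fn}"))

# One row, six columns: <column>_mean and <column>_sd for each input column
names(multi_summary)
```

The result has one row, with the columns ordered by input column first and by function second.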

Here is a more explicit way to use across() to compute the means of multiple columns. We name the arguments: .cols specifies the columns to operate on, and .fns specifies the function to apply. Here the function is written as a lambda formula, ~ mean(.x, na.rm=TRUE), where .x stands for the current column; this form lets us pass extra arguments such as na.rm.

iris %>%
  summarize(across(.cols=Sepal.Length:Petal.Length,
                   .fns = ~ mean(.x, na.rm=TRUE)))

  Sepal.Length Sepal.Width Petal.Length
1     5.843333    3.057333        3.758
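
The .cols argument is not limited to a range of column names. As a sketch, assuming we want to pick columns by type or by name pattern instead, tidyselect helpers such as where() and starts_with() can be used in the same position. The object names numeric_means and sepal_means are only illustrative.

```r
library(dplyr)

# Select columns by type: every numeric column in iris (all four measurements)
numeric_means <- iris %>%
  summarize(across(.cols = where(is.numeric),
                   .fns = ~ mean(.x, na.rm = TRUE)))

# Select columns by name pattern: only the two Sepal columns
sepal_means <- iris %>%
  summarize(across(.cols = starts_with("Sepal"),
                   .fns = ~ mean(.x, na.rm = TRUE)))

numeric_means
sepal_means
```

Selecting by type is handy when the numeric columns are not next to each other in the data frame, which is when a range like Sepal.Length:Petal.Length stops being practical.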

We can also write custom functions to apply to each column. In the example below, we compute the sum of squared deviations from the mean using the lambda (formula) notation.

iris %>%
  summarize(across(.cols= Sepal.Length:Petal.Length, 
                   .fns = ~sum((.x-mean(.x))^2)))

  Sepal.Length Sepal.Width Petal.Length
1     102.1683    28.30693     464.3254
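
The same computation can also be written with an ordinary named function instead of a lambda, which may be easier to reuse and test. The sketch below assumes a helper called sum_sq_dev (a hypothetical name, not part of dplyr) and cross-checks it against the identity that the sample variance is the sum of squared deviations divided by n - 1.

```r
library(dplyr)

# Hypothetical helper: sum of squared deviations from the mean
sum_sq_dev <- function(x) {
  sum((x - mean(x))^2)
}

# Pass the named function directly to .fns
ss <- iris %>%
  summarize(across(.cols = Sepal.Length:Petal.Length, .fns = sum_sq_dev))

# Cross-check: var(x) * (n - 1) should give the same sums
ss_from_var <- iris %>%
  summarize(across(.cols = Sepal.Length:Petal.Length,
                   .fns = function(x) var(x) * (length(x) - 1)))

all.equal(ss, ss_from_var)
```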

To compute the function separately for each group of a categorical variable, we first call group_by() on that variable and then apply across() inside summarize().

Here we compute the mean of the same three columns for each species.

iris %>%
  group_by(Species) %>%
  summarize(across(Sepal.Length:Petal.Length, mean))

# A tibble: 3 × 4
  Species    Sepal.Length Sepal.Width Petal.Length
  <fct>             <dbl>       <dbl>        <dbl>
1 setosa             5.01        3.43         1.46
2 versicolor         5.94        2.77         4.26
3 virginica          6.59        2.97         5.55
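
The introduction notes that across() also works inside mutate(). As a sketch of that use, assuming we want to center each measurement within its species, the example below subtracts the group mean from every value. The _centered suffix and the object name iris_centered are illustrative choices; without .names, mutate() would overwrite the original columns.

```r
library(dplyr)

# Center each measurement within its species by subtracting the group mean.
# mutate() keeps all 150 rows; .names adds new columns instead of overwriting.
iris_centered <- iris %>%
  group_by(Species) %>%
  mutate(across(Sepal.Length:Petal.Length,
                ~ .x - mean(.x),
                .names = "{.col}_centered")) %>%
  ungroup()

iris_centered %>% head()
```

Unlike summarize(), which collapses each group to one row, mutate() returns one value per original row, so the result has the five original columns plus three new ones.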
