In this tutorial, we will learn how to create all possible combinations of two variables using tidyr’s expand_grid() function. Given two variables of interest, expand_grid() builds a dataframe (a tibble) with one row for every combination of their values.
Let us get started by loading tidyverse, which includes tidyr.
library(tidyverse)
Let us say we have two variables, each with 5 elements:
var1 = letters[1:5]
var1
## [1] "a" "b" "c" "d" "e"
var2 = LETTERS[1:5]
var2
## [1] "A" "B" "C" "D" "E"
tidyr’s expand_grid() Example
To create all possible combinations of these two variables (5 × 5 = 25 in total), we can use the expand_grid() function in tidyr:
combination_df <- expand_grid(var1 = letters[1:5], var2 = LETTERS[1:5])
combination_df
## # A tibble: 25 × 2
##    var1  var2
##    <chr> <chr>
##  1 a     A
##  2 a     B
##  3 a     C
##  4 a     D
##  5 a     E
##  6 b     A
##  7 b     B
##  8 b     C
##  9 b     D
## 10 b     E
## # ℹ 15 more rows
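Notice that expand_grid() varies the first column slowest: every value of var1 is paired with all of var2 before moving on. The minimal sketch below rebuilds the same grid by hand with base R's rep(), which makes that ordering explicit. The object names v1, v2, grid and manual are illustrative only and are not part of tidyr.

```r
library(tidyr)
library(tibble)

v1 <- letters[1:5]
v2 <- LETTERS[1:5]

# expand_grid(): first column varies slowest, last column fastest
grid <- expand_grid(var1 = v1, var2 = v2)

# Hand-built equivalent: repeat each v1 value once per v2 value,
# and recycle the whole v2 vector once per v1 value
manual <- tibble(
  var1 = rep(v1, each = length(v2)),
  var2 = rep(v2, times = length(v1))
)

nrow(grid)  # 25, i.e. length(v1) * length(v2)
identical(grid$var1, manual$var1) && identical(grid$var2, manual$var2)  # TRUE
```

In general, the number of rows is the product of the input lengths, so grids grow quickly as you add variables.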
base R’s expand.grid() Example
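tidyr’s expand_grid() is inspired by base R’s expand.grid(), which also creates a dataframe with all combinations of the supplied vectors. There are a few differences, though: expand.grid() returns a plain data.frame rather than a tibble, converts character vectors to factors by default (stringsAsFactors = TRUE), and varies the first column fastest, whereas expand_grid() varies the first column slowest. Compare the row order below with the expand_grid() output above.
combination_df <- expand.grid(var1 = letters[1:5], var2 = LETTERS[1:5])
combination_df
##    var1 var2
## 1     a    A
## 2     b    A
## 3     c    A
## 4     d    A
## 5     e    A
## 6     a    B
## 7     b    B
## 8     c    B
## 9     d    B
## 10    e    B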
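The following sketch checks those differences side by side and then reproduces the expand_grid() result from expand.grid() by keeping strings as character and sorting the rows. The names base_grid, tidy_grid and base_sorted are illustrative; the sorting step assumes each column only contains letters of a single case, so locale-dependent collation does not change the order.

```r
library(tidyr)
library(tibble)

v1 <- letters[1:5]
v2 <- LETTERS[1:5]

base_grid <- expand.grid(var1 = v1, var2 = v2)
tidy_grid <- expand_grid(var1 = v1, var2 = v2)

# 1. Column types: expand.grid() turns character input into factors by default
class(base_grid$var1)  # "factor"
class(tidy_grid$var1)  # "character"

# 2. Row order: expand.grid() varies the FIRST column fastest,
#    expand_grid() varies it slowest
as.character(head(base_grid$var1, 3))  # "a" "b" "c"
head(tidy_grid$var1, 3)                # "a" "a" "a"

# Reproduce expand_grid()'s result in base R: keep strings as character,
# sort by var1 then var2, and convert to a tibble
base_sorted <- expand.grid(var1 = v1, var2 = v2, stringsAsFactors = FALSE)
base_sorted <- base_sorted[order(base_sorted$var1, base_sorted$var2), ]
base_sorted <- as_tibble(base_sorted)

identical(base_sorted$var1, tidy_grid$var1) &&
  identical(base_sorted$var2, tidy_grid$var2)  # TRUE
```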
Combinations of a dataframe and a variable with expand_grid()
One advantage of tidyr’s expand_grid() is that it works with dataframes and matrices, not just atomic vectors. Here is an example of using expand_grid() to create all possible combinations of the rows of a dataframe and the values of a character vector.
Here is a simple dataframe that we would like to expand for multiple companies.
df = tibble(year = 2021, quarter = paste0("Q", 1:4))
df
## # A tibble: 4 × 2
##    year quarter
##   <dbl> <chr>
## 1  2021 Q1
## 2  2021 Q2
## 3  2021 Q3
## 4  2021 Q4
With tidyr’s expand_grid(), we can create every combination of the dataframe’s rows and the values of the character vector. Because the unnamed dataframe is spliced into separate columns, the result keeps year and quarter and adds a companies column, with each row of df repeated once per company:
expand_grid(df, companies = c("GOOG", "MSFT", "NVDA"))
## # A tibble: 12 × 3
##     year quarter companies
##    <dbl> <chr>   <chr>
##  1  2021 Q1      GOOG
##  2  2021 Q1      MSFT
##  3  2021 Q1      NVDA
##  4  2021 Q2      GOOG
##  5  2021 Q2      MSFT
##  6  2021 Q2      NVDA
##  7  2021 Q3      GOOG
##  8  2021 Q3      MSFT
##  9  2021 Q3      NVDA
## 10  2021 Q4      GOOG
## 11  2021 Q4      MSFT
## 12  2021 Q4      NVDA
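To see what expand_grid() is doing with the dataframe, here is a minimal sketch that builds the same 12-row result by hand: repeat each row of df once per company, then recycle the company vector once per row. The names companies, expanded and manual are illustrative only.

```r
library(tidyr)
library(tibble)

df <- tibble(year = 2021, quarter = paste0("Q", 1:4))
companies <- c("GOOG", "MSFT", "NVDA")

expanded <- expand_grid(df, companies = companies)

# Hand-built equivalent: repeat each row of df once per company ...
manual <- df[rep(seq_len(nrow(df)), each = length(companies)), ]
# ... then recycle the company vector once per row of df
manual$companies <- rep(companies, times = nrow(df))

nrow(expanded)  # 12, i.e. 4 rows * 3 companies
identical(expanded$quarter, manual$quarter) &&
  identical(expanded$companies, manual$companies)  # TRUE
```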