expand_grid(): Create all possible combinations of variables

In this tutorial, we will learn how to create all possible combinations of two variables using tidyr’s expand_grid() function. Given two variables of interest, expand_grid() returns a dataframe with one row for every pairing of their values.

Let us get started by loading tidyverse.

library(tidyverse)

Let us say we have two variables, each with 5 elements.

var1 = letters[1:5]
var1
## [1] "a" "b" "c" "d" "e"
var2 = LETTERS[1:5]
var2
## [1] "A" "B" "C" "D" "E"

tidyr’s expand_grid() Example

To create all possible combinations of these two variables, 25 combinations in total, we can use the expand_grid() function in tidyr.

combination_df <- expand_grid(var1 = letters[1:5],
                              var2 = LETTERS[1:5])
combination_df

## # A tibble: 25 × 2
##    var1  var2 
##    <chr> <chr>
##  1 a     A    
##  2 a     B    
##  3 a     C    
##  4 a     D    
##  5 a     E    
##  6 b     A    
##  7 b     B    
##  8 b     C    
##  9 b     D    
## 10 b     E    
## # ℹ 15 more rows
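
As a quick sanity check, the number of rows should equal the product of the lengths of the two inputs. The sketch below assumes the same var1 and var2 as above; the name n_expected is introduced only for illustration.

```r
library(tidyr)

var1 <- letters[1:5]
var2 <- LETTERS[1:5]

combination_df <- expand_grid(var1 = var1, var2 = var2)

# Expected number of combinations: 5 * 5
n_expected <- length(var1) * length(var2)

nrow(combination_df) == n_expected   # TRUE
anyDuplicated(combination_df) == 0   # TRUE, every pairing appears exactly once
```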

base R’s expand.grid() Example

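tidyr’s expand_grid() function is inspired by base R’s expand.grid() function, which can also create a dataframe with all possible combinations of the two variables from the example above. Note that expand.grid() varies the first variable fastest, so its rows come out in a different order than with expand_grid(), and by default it converts character vectors to factors.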

combination_df <- expand.grid(var1 = letters[1:5],
                              var2 = LETTERS[1:5])

# Show only the first 10 of the 25 rows
head(combination_df, 10)

##    var1 var2
## 1     a    A
## 2     b    A
## 3     c    A
## 4     d    A
## 5     e    A
## 6     a    B
## 7     b    B
## 8     c    B
## 9     d    B
## 10    e    B
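
To make the two differences between the functions concrete, here is a minimal sketch on a smaller 2 x 2 case. The names base_df and tidy_df are chosen only for this illustration.

```r
library(tidyr)

base_df <- expand.grid(var1 = letters[1:2], var2 = LETTERS[1:2])
tidy_df <- expand_grid(var1 = letters[1:2], var2 = LETTERS[1:2])

# Column types: expand.grid() converts strings to factors by default
# (stringsAsFactors = TRUE), expand_grid() keeps them as characters
class(base_df$var1)   # "factor"
class(tidy_df$var1)   # "character"

# Row order: expand.grid() varies the first variable fastest,
# expand_grid() varies the last variable fastest
as.character(base_df$var1)   # "a" "b" "a" "b"
tidy_df$var1                 # "a" "a" "b" "b"
```

If character columns are wanted from base R as well, expand.grid() accepts stringsAsFactors = FALSE.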

Combinations of a dataframe and a variable with expand_grid()

One of the advantages of using tidyr’s expand_grid() function is that it can work with dataframes and matrices, not just vectors. Here is an example of using expand_grid() to create all possible combinations of the rows of a dataframe and a character variable.

Here is a simple dataframe that we would like to expand for multiple companies.

df = tibble(year=2021, quarter=paste0("Q",1:4))

df
## # A tibble: 4 × 2
##    year quarter
##   <dbl> <chr>  
## 1  2021 Q1     
## 2  2021 Q2     
## 3  2021 Q3     
## 4  2021 Q4

With tidyr’s expand_grid(), we can combine every row of the dataframe with every value of the character variable.

expand_grid(df, companies = c("GOOG", "MSFT", "NVDA"))

## # A tibble: 12 × 3
##     year quarter companies
##    <dbl> <chr>   <chr>    
##  1  2021 Q1      GOOG     
##  2  2021 Q1      MSFT     
##  3  2021 Q1      NVDA     
##  4  2021 Q2      GOOG     
##  5  2021 Q2      MSFT     
##  6  2021 Q2      NVDA     
##  7  2021 Q3      GOOG     
##  8  2021 Q3      MSFT     
##  9  2021 Q3      NVDA     
## 10  2021 Q4      GOOG     
## 11  2021 Q4      MSFT     
## 12  2021 Q4      NVDA
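
The result above can be checked with a short sketch: each row of the dataframe should be kept intact and repeated once per company. The names companies and expanded are introduced here only for illustration.

```r
library(tidyr)
library(tibble)

df <- tibble(year = 2021, quarter = paste0("Q", 1:4))
companies <- c("GOOG", "MSFT", "NVDA")

expanded <- expand_grid(df, companies = companies)

# 4 rows in df times 3 companies gives 12 rows
nrow(expanded) == nrow(df) * length(companies)   # TRUE

# The dataframe's columns come first, followed by the new variable
names(expanded)   # "year" "quarter" "companies"

# Each quarter appears once per company
table(expanded$quarter)
```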