In this tutorial, we will learn how to convert a data matrix to tidy long table. We will use tidyr’s pivot_longer() function to reshape the matrix into a tidy table.
First, let us start with loading the tidyverse suit of packages.
library(tidyverse)
Create a simulated data matrix
Let us create a data matrix using rnorm() function generating random numbers from normal distribution.
set.seed(42) # Simulate data matrix with 10 rows and 6 columns mat <- matrix(rnorm(60), nrow=10) mat[1:5,1:5] [,1] [,2] [,3] [,4] [,5] [1,] 1.3709584 1.3048697 -0.3066386 0.4554501 0.2059986 [2,] -0.5646982 2.2866454 -1.7813084 0.7048373 -0.3610573 [3,] 0.3631284 -1.3888607 -0.1719174 1.0351035 0.7581632 [4,] 0.6328626 -0.2787888 1.2146747 -0.6089264 -0.7267048 [5,] 0.4042683 -0.1333213 1.8951935 0.5049551 -1.3682810
Let us add column names to the matrix.
colnames(mat) <- paste0("C",1:ncol(mat)) mat[1:5,1:5] C1 C2 C3 C4 C5 [1,] 1.3709584 1.3048697 -0.3066386 0.4554501 0.2059986 [2,] -0.5646982 2.2866454 -1.7813084 0.7048373 -0.3610573 [3,] 0.3631284 -1.3888607 -0.1719174 1.0351035 0.7581632 [4,] 0.6328626 -0.2787888 1.2146747 -0.6089264 -0.7267048 [5,] 0.4042683 -0.1333213 1.8951935 0.5049551 -1.3682810
First, convert the matrix to dataframe using as_tibble() function.
as_tibble(mat) # A tibble: 10 × 6 C1 C2 C3 C4 C5 C6 <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> 1 1.37 1.30 -0.307 0.455 0.206 0.322 2 -0.565 2.29 -1.78 0.705 -0.361 -0.784 3 0.363 -1.39 -0.172 1.04 0.758 1.58 4 0.633 -0.279 1.21 -0.609 -0.727 0.643 5 0.404 -0.133 1.90 0.505 -1.37 0.0898 6 -0.106 0.636 -0.430 -1.72 0.433 0.277 7 1.51 -0.284 -0.257 -0.784 -0.811 0.679 8 -0.0947 -2.66 -1.76 -0.851 1.44 0.0898 9 2.02 -2.44 0.460 -2.41 -0.431 -2.99 10 -0.0627 1.32 -0.640 0.0361 0.656 0.285
And then add a column containing row number as a factor variable. We use dplyr’s row_number() function.
as_tibble(mat) %>% mutate(row_id=factor(row_number())) # A tibble: 10 × 7 C1 C2 C3 C4 C5 C6 row_id <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <fct> 1 1.37 1.30 -0.307 0.455 0.206 0.322 1 2 -0.565 2.29 -1.78 0.705 -0.361 -0.784 2 3 0.363 -1.39 -0.172 1.04 0.758 1.58 3 4 0.633 -0.279 1.21 -0.609 -0.727 0.643 4 5 0.404 -0.133 1.90 0.505 -1.37 0.0898 5 6 -0.106 0.636 -0.430 -1.72 0.433 0.277 6 7 1.51 -0.284 -0.257 -0.784 -0.811 0.679 7 8 -0.0947 -2.66 -1.76 -0.851 1.44 0.0898 8 9 2.02 -2.44 0.460 -2.41 -0.431 -2.99 9 10 -0.0627 1.32 -0.640 0.0361 0.656 0.285 10
mutate() function creates a new variable at the end of the dataframe, as last column. Here we use relocate() function to move the row ID column to the front.
mat_df <- as_tibble(mat) %>% mutate(row_id=factor(row_number())) %>% relocate(row_id) mat_df # A tibble: 10 × 7 row_id C1 C2 C3 C4 C5 C6 <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> 1 1 1.37 1.30 -0.307 0.455 0.206 0.322 2 2 -0.565 2.29 -1.78 0.705 -0.361 -0.784 3 3 0.363 -1.39 -0.172 1.04 0.758 1.58 4 4 0.633 -0.279 1.21 -0.609 -0.727 0.643 5 5 0.404 -0.133 1.90 0.505 -1.37 0.0898 6 6 -0.106 0.636 -0.430 -1.72 0.433 0.277 7 7 1.51 -0.284 -0.257 -0.784 -0.811 0.679 8 8 -0.0947 -2.66 -1.76 -0.851 1.44 0.0898 9 9 2.02 -2.44 0.460 -2.41 -0.431 -2.99 10 10 -0.0627 1.32 -0.640 0.0361 0.656 0.285
pivot_longer(): Reshape the data matrix to tidy dataframe
Now we have the data as a dataframe and ready to reshape using pivot_longer() function to a tidy long format. Here we specify where names should go to and values of the matrix should go to.
mat_df %>% pivot_longer(-row_id, names_to="sample_id", values_to="vals") # A tibble: 60 × 3 row_id sample_id vals <fct> <chr> <dbl> 1 1 C1 1.37 2 1 C2 1.30 3 1 C3 -0.307 4 1 C4 0.455 5 1 C5 0.206 6 1 C6 0.322 7 2 C1 -0.565 8 2 C2 2.29 9 2 C3 -1.78 10 2 C4 0.705 # … with 50 more rows
We have now converted to a data matrix into a long tidy dataframe.