How to convert a matrix to a tidy table

In this tutorial, we will learn how to convert a data matrix into a tidy, long table. We will use tidyr’s pivot_longer() function to reshape the matrix.

First, let us load the tidyverse suite of packages.

library(tidyverse)

Create a simulated data matrix

Let us create a data matrix using the rnorm() function, which generates random numbers from a standard normal distribution.

set.seed(42)
# Simulate data matrix with 10 rows and 6 columns
mat <- matrix(rnorm(60), nrow=10)
mat[1:5,1:5]

           [,1]       [,2]       [,3]       [,4]       [,5]
[1,]  1.3709584  1.3048697 -0.3066386  0.4554501  0.2059986
[2,] -0.5646982  2.2866454 -1.7813084  0.7048373 -0.3610573
[3,]  0.3631284 -1.3888607 -0.1719174  1.0351035  0.7581632
[4,]  0.6328626 -0.2787888  1.2146747 -0.6089264 -0.7267048
[5,]  0.4042683 -0.1333213  1.8951935  0.5049551 -1.3682810
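The snippet below is a minimal sanity check on the simulated matrix. matrix() fills values column by column by default (byrow = FALSE). That is why the first column above holds the first ten random draws and the second column holds draws 11 to 20. The variable name `draws` is hypothetical and only used for illustration.

```r
set.seed(42)
# Keep the raw draws so we can compare them with the matrix layout
draws <- rnorm(60)
mat <- matrix(draws, nrow = 10)

# 10 rows and 6 columns, since 60 values / 10 rows = 6 columns
dim(mat)

# matrix() fills column-wise by default (byrow = FALSE):
# column 1 holds draws 1-10, column 2 holds draws 11-20
identical(mat[, 1], draws[1:10])
identical(mat[, 2], draws[11:20])
```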

Let us add column names to the matrix.

colnames(mat) <- paste0("C",1:ncol(mat))
mat[1:5,1:5]

             C1         C2         C3         C4         C5
[1,]  1.3709584  1.3048697 -0.3066386  0.4554501  0.2059986
[2,] -0.5646982  2.2866454 -1.7813084  0.7048373 -0.3610573
[3,]  0.3631284 -1.3888607 -0.1719174  1.0351035  0.7581632
[4,]  0.6328626 -0.2787888  1.2146747 -0.6089264 -0.7267048
[5,]  0.4042683 -0.1333213  1.8951935  0.5049551 -1.3682810

Next, convert the matrix to a data frame (a tibble) using the as_tibble() function.

as_tibble(mat)

# A tibble: 10 × 6
        C1     C2     C3      C4     C5      C6
     <dbl>  <dbl>  <dbl>   <dbl>  <dbl>   <dbl>
 1  1.37    1.30  -0.307  0.455   0.206  0.322 
 2 -0.565   2.29  -1.78   0.705  -0.361 -0.784 
 3  0.363  -1.39  -0.172  1.04    0.758  1.58  
 4  0.633  -0.279  1.21  -0.609  -0.727  0.643 
 5  0.404  -0.133  1.90   0.505  -1.37   0.0898
 6 -0.106   0.636 -0.430 -1.72    0.433  0.277 
 7  1.51   -0.284 -0.257 -0.784  -0.811  0.679 
 8 -0.0947 -2.66  -1.76  -0.851   1.44   0.0898
 9  2.02   -2.44   0.460 -2.41   -0.431 -2.99  
10 -0.0627  1.32  -0.640  0.0361  0.656  0.285 

Then add a column containing the row number as a factor variable, using dplyr’s row_number() function.

as_tibble(mat) %>%
   mutate(row_id=factor(row_number())) 

# A tibble: 10 × 7
        C1     C2     C3      C4     C5      C6 row_id
     <dbl>  <dbl>  <dbl>   <dbl>  <dbl>   <dbl> <fct> 
 1  1.37    1.30  -0.307  0.455   0.206  0.322  1     
 2 -0.565   2.29  -1.78   0.705  -0.361 -0.784  2     
 3  0.363  -1.39  -0.172  1.04    0.758  1.58   3     
 4  0.633  -0.279  1.21  -0.609  -0.727  0.643  4     
 5  0.404  -0.133  1.90   0.505  -1.37   0.0898 5     
 6 -0.106   0.636 -0.430 -1.72    0.433  0.277  6     
 7  1.51   -0.284 -0.257 -0.784  -0.811  0.679  7     
 8 -0.0947 -2.66  -1.76  -0.851   1.44   0.0898 8     
 9  2.02   -2.44   0.460 -2.41   -0.431 -2.99   9     
10 -0.0627  1.32  -0.640  0.0361  0.656  0.285  10    

By default, mutate() adds the new variable at the end of the data frame, as the last column. Here we use the relocate() function to move the row ID column to the front.

mat_df <- as_tibble(mat) %>%
  mutate(row_id=factor(row_number())) %>%
  relocate(row_id)

mat_df 

# A tibble: 10 × 7
   row_id      C1     C2     C3      C4     C5      C6
   <fct>    <dbl>  <dbl>  <dbl>   <dbl>  <dbl>   <dbl>
 1 1       1.37    1.30  -0.307  0.455   0.206  0.322 
 2 2      -0.565   2.29  -1.78   0.705  -0.361 -0.784 
 3 3       0.363  -1.39  -0.172  1.04    0.758  1.58  
 4 4       0.633  -0.279  1.21  -0.609  -0.727  0.643 
 5 5       0.404  -0.133  1.90   0.505  -1.37   0.0898
 6 6      -0.106   0.636 -0.430 -1.72    0.433  0.277 
 7 7       1.51   -0.284 -0.257 -0.784  -0.811  0.679 
 8 8      -0.0947 -2.66  -1.76  -0.851   1.44   0.0898
 9 9       2.02   -2.44   0.460 -2.41   -0.431 -2.99  
10 10     -0.0627  1.32  -0.640  0.0361  0.656  0.285 
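The same data frame can also be built in a single step by passing mutate()'s .before argument instead of calling relocate(). This is a minimal sketch that assumes dplyr 1.0.0 or later, where .before and .after were added to mutate(). The name `mat_df2` is hypothetical.

```r
library(tidyverse)

set.seed(42)
mat <- matrix(rnorm(60), nrow = 10)
colnames(mat) <- paste0("C", 1:ncol(mat))

# .before = 1 places the new column in first position,
# so a separate relocate() call is not needed
mat_df2 <- as_tibble(mat) %>%
  mutate(row_id = factor(row_number()), .before = 1)

names(mat_df2)
```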

pivot_longer(): Reshape the data matrix to tidy dataframe

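Now the data is in a data frame and ready to be reshaped into a tidy, long format with pivot_longer(). The first argument, -row_id, selects every column except row_id for pivoting. names_to sets the name of the new column that holds the former column names, and values_to sets the name of the column that holds the matrix values.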

mat_df %>% 
  pivot_longer(-row_id,
               names_to="sample_id",
               values_to="vals") 

# A tibble: 60 × 3
   row_id sample_id   vals
   <fct>  <chr>      <dbl>
 1 1      C1         1.37 
 2 1      C2         1.30 
 3 1      C3        -0.307
 4 1      C4         0.455
 5 1      C5         0.206
 6 1      C6         0.322
 7 2      C1        -0.565
 8 2      C2         2.29 
 9 2      C3        -1.78 
10 2      C4         0.705
# … with 50 more rows
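Instead of excluding row_id, you can name the columns to pivot explicitly with a tidyselect helper such as starts_with(). The sketch below does this and checks that the long table has one row per matrix cell. The name `long_df` is hypothetical, and the snippet assumes all sample columns start with "C", as they do here.

```r
library(tidyverse)

set.seed(42)
mat <- matrix(rnorm(60), nrow = 10)
colnames(mat) <- paste0("C", 1:ncol(mat))

mat_df <- as_tibble(mat) %>%
  mutate(row_id = factor(row_number())) %>%
  relocate(row_id)

# Select the columns to pivot explicitly instead of excluding row_id
long_df <- mat_df %>%
  pivot_longer(cols = starts_with("C"),
               names_to = "sample_id",
               values_to = "vals")

# One row per matrix cell: 10 rows x 6 columns = 60 rows
nrow(long_df)
```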

We have now converted the data matrix into a long, tidy data frame.
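One way to confirm that no values were lost or shuffled is to reverse the reshape with tidyr's pivot_wider() and compare the result with the original matrix. This is a minimal sketch. The names `long_df`, `wide_df` and `mat_back` are hypothetical, and the round trip relies on pivot_wider() keeping rows and columns in order of first appearance, which matches the original layout here.

```r
library(tidyverse)

set.seed(42)
mat <- matrix(rnorm(60), nrow = 10)
colnames(mat) <- paste0("C", 1:ncol(mat))

long_df <- as_tibble(mat) %>%
  mutate(row_id = factor(row_number())) %>%
  relocate(row_id) %>%
  pivot_longer(-row_id, names_to = "sample_id", values_to = "vals")

# Spread the long table back out to one column per sample
wide_df <- long_df %>%
  pivot_wider(names_from = sample_id, values_from = vals)

# Drop the row ID column and convert back to a numeric matrix
mat_back <- wide_df %>%
  select(-row_id) %>%
  as.matrix()

# TRUE when every value is back in its original row and column
isTRUE(all.equal(mat, mat_back))
```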
