Rstats 101

Learn R Programming Tips & Tricks for Statistics and Data Science

How to Compute Pearson Correlation of Multiple Variables

rstats101 · August 31, 2024 ·

In this tutorial, we will learn how to compute the Pearson correlation between multiple variables stored in a matrix or dataframe, using two approaches.

First, we will use the cor() function in R to compute the Pearson correlation of every variable against every other variable. Then we will compute the same correlation matrix using matrix multiplication.

library(tidyverse)
library(palmerpenguins)
theme_set(theme_bw(16))

We use the numerical variables from the Palmer penguins data to show how to compute Pearson correlation.

df <- penguins |>
  drop_na() |>
  select(-year) |>
  select(where(is.numeric))
df |> head()

## # A tibble: 6 × 4
##   bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
##            <dbl>         <dbl>             <int>       <int>
## 1           39.1          18.7               181        3750
## 2           39.5          17.4               186        3800
## 3           40.3          18                 195        3250
## 4           36.7          19.3               193        3450
## 5           39.3          20.6               190        3650
## 6           38.9          17.8               181        3625

A key step in computing correlation by hand is to center and scale the data, i.e. subtract each variable's mean and divide by its standard deviation. We can use the scale() function to center and scale each column.

df_scaled <- df |>
  scale()
df_scaled |> head()
##      bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
## [1,]     -0.8946955     0.7795590        -1.4246077  -0.5676206
## [2,]     -0.8215515     0.1194043        -1.0678666  -0.5055254
## [3,]     -0.6752636     0.4240910        -0.4257325  -1.1885721
## [4,]     -1.3335592     1.0842457        -0.5684290  -0.9401915
## [5,]     -0.8581235     1.7444004        -0.7824736  -0.6918109
## [6,]     -0.9312674     0.3225288        -1.4246077  -0.7228585
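To make the centering and scaling step concrete, here is a minimal sketch that reproduces what scale() does with its default arguments. The matrix m and the names centered, manual_scaled and same_as_scale are made up for this illustration and are not part of the penguins example.

```r
# Small made-up matrix, used only for illustration (not the penguins data)
m <- cbind(a = c(1, 2, 3, 4), b = c(10, 20, 15, 5))

# Subtract each column's mean ...
centered <- sweep(m, 2, colMeans(m), "-")
# ... then divide each column by its sample standard deviation
manual_scaled <- sweep(centered, 2, apply(m, 2, sd), "/")

# Compare with scale(), ignoring the extra attributes that scale() attaches
same_as_scale <- isTRUE(all.equal(manual_scaled, scale(m), check.attributes = FALSE))
same_as_scale
```

Note that sd() uses the n - 1 denominator, which is why n - 1 shows up again in the matrix multiplication approach.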

We can check that the centered variables have mean values close to zero.

colMeans(df_scaled)
##    bill_length_mm     bill_depth_mm flipper_length_mm       body_mass_g 
##     -3.235317e-15     -7.304801e-16      2.808064e-16     -1.260253e-16

Compute Pearson Correlation of a matrix with cor()

We can use cor() in R to compute the Pearson correlation between all pairs of columns/variables in a data matrix.

cor(df_scaled)

##                   bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
## bill_length_mm         1.0000000    -0.2286256         0.6530956   0.5894511
## bill_depth_mm         -0.2286256     1.0000000        -0.5777917  -0.4720157
## flipper_length_mm      0.6530956    -0.5777917         1.0000000   0.8729789
## body_mass_g            0.5894511    -0.4720157         0.8729789   1.0000000
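One detail worth checking: cor() standardizes internally, so it should give the same matrix whether or not the data has been scaled first. The sketch below illustrates this on the built-in mtcars data, chosen here only so the snippet runs without extra packages. The variable names are made up for the illustration.

```r
# Built-in dataset, used here only to keep the example self-contained
x <- as.matrix(mtcars[, c("mpg", "hp", "wt")])

cor_raw    <- cor(x)          # correlation of the raw columns
cor_scaled <- cor(scale(x))   # correlation of the centered and scaled columns

# Pearson correlation does not change when columns are centered and scaled
scaling_does_not_matter <- isTRUE(all.equal(cor_raw, cor_scaled))
scaling_does_not_matter
```

So the scaling step is only required for the matrix multiplication approach, not for cor() itself.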

Compute Pearson Correlation of a matrix with matrix multiplication

We can also compute the Pearson correlation of all variables against all variables using matrix multiplication: we multiply the transpose of the centered and scaled data matrix by the matrix itself and divide by n - 1, where n is the number of rows.

n <- nrow(df_scaled)

(t(df_scaled) %*% df_scaled) / (n - 1)

##                   bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
## bill_length_mm         1.0000000    -0.2286256         0.6530956   0.5894511
## bill_depth_mm         -0.2286256     1.0000000        -0.5777917  -0.4720157
## flipper_length_mm      0.6530956    -0.5777917         1.0000000   0.8729789
## body_mass_g            0.5894511    -0.4720157         0.8729789   1.0000000
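Why the matrix product gives correlations can be sketched as follows. The symbols here are introduced only for this note: Z is the centered and scaled matrix, x_{ij} is the value of variable j for row i, and \bar{x}_j and s_j are the column mean and the sample standard deviation (with the n - 1 denominator, as used by scale()).

```latex
z_{ij} = \frac{x_{ij} - \bar{x}_j}{s_j},
\qquad
\frac{(Z^{\top} Z)_{jk}}{n-1}
  = \frac{1}{n-1} \sum_{i=1}^{n} z_{ij}\, z_{ik}
  = \frac{\sum_{i=1}^{n} (x_{ij} - \bar{x}_j)(x_{ik} - \bar{x}_k)}{(n-1)\, s_j\, s_k}
  = r_{jk}
```

The last expression is the sample covariance of variables j and k divided by the product of their standard deviations, which is the Pearson correlation r_{jk}.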

Note that the two approaches give identical results for the Pearson correlation of all the variables in a dataframe/matrix.
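Rather than comparing the printed matrices by eye, the agreement can be checked programmatically. This is a minimal sketch on the built-in mtcars data (an assumption made so the snippet is self-contained). It uses crossprod(z), which base R provides as an equivalent of t(z) %*% z.

```r
# Built-in dataset, used here only to keep the example self-contained
x <- as.matrix(mtcars[, c("mpg", "hp", "wt")])

z <- scale(x)   # center and scale each column
n <- nrow(z)

# crossprod(z) computes t(z) %*% z
cor_matmul  <- crossprod(z) / (n - 1)
cor_builtin <- cor(x)

# all.equal() compares within a small numerical tolerance
approaches_agree <- isTRUE(all.equal(cor_matmul, cor_builtin))
approaches_agree
```

all.equal() is used instead of identical() because the two computations can differ by tiny floating point rounding errors.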
