How to extract residuals from a linear regression model

In this tutorial, we will learn how to extract residual values from a linear regression model in R. Residuals are the values that remain after subtracting the model's fitted values from the observed response, that is, the part of the response that the variables in the model do not explain. We will see two approaches to pull residuals from the linear regression result we get after using the lm() function. First, we will learn how to extract residuals directly from the linear regression fit object using its residuals component.
Then we will use the augment() function from the R package broom to extract residuals from the regression model.

First let us load the packages needed.

library(tidyverse)
library(broom)

We will use the classic iris data set that is built into R to build a simple linear regression model.

iris %>% head()
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa

With the lm() function, we build a simple linear regression model between two numerical variables from the iris data set. Note that since we provide the iris data set as the data argument, we can refer to the variables in the model formula by their bare column names, without any quotes.

lm_fit <- lm(Sepal.Length ~ Petal.Length, data=iris)
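To illustrate what the data argument buys us, here is a minimal sketch that fits the same model with and without it. The object names fit_with_data and fit_without_data are made up for this illustration.

```r
# Fit with a data argument: bare column names work inside the formula
fit_with_data <- lm(Sepal.Length ~ Petal.Length, data = iris)

# Fit without a data argument: each column must be referenced explicitly
fit_without_data <- lm(iris$Sepal.Length ~ iris$Petal.Length)

# The estimates agree; only the coefficient labels differ
all.equal(unname(coef(fit_with_data)), unname(coef(fit_without_data)))
```

The data argument form is usually easier to read and works better with functions that need to look the variables up again later.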

Applying the summary() function to the linear regression result object gives a lot of useful information, including a quick summary of the residuals.

summary(lm_fit)

## 
## Call:
## lm(formula = Sepal.Length ~ Petal.Length, data = iris)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.24675 -0.29657 -0.01515  0.27676  1.00269 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   4.30660    0.07839   54.94   <2e-16 ***
## Petal.Length  0.40892    0.01889   21.65   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4071 on 148 degrees of freedom
## Multiple R-squared:   0.76,  Adjusted R-squared:  0.7583 
## F-statistic: 468.6 on 1 and 148 DF,  p-value: < 2.2e-16
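The residual-related numbers in this summary can be recomputed by hand, which helps in understanding what they mean. The sketch below assumes that the "Residuals" block is the set of quantiles of the residual vector and that the residual standard error is the square root of the residual sum of squares divided by the residual degrees of freedom. The name res is only for this illustration.

```r
# Refit the model so this block runs on its own
lm_fit <- lm(Sepal.Length ~ Petal.Length, data = iris)

# Residuals as a plain numeric vector
res <- residuals(lm_fit)

# Min, 1Q, Median, 3Q, Max of the residuals
quantile(res)

# Residual standard error: sqrt(RSS / residual degrees of freedom)
sqrt(sum(res^2) / df.residual(lm_fit))
```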

However, this still does not give us the individual residual values. Let us use the str() function on the fit object to see its contents. We can see that the residuals are stored in the element named “residuals”.

str(lm_fit)

## List of 12
##  $ coefficients : Named num [1:2] 4.307 0.409
##   ..- attr(*, "names")= chr [1:2] "(Intercept)" "Petal.Length"
##  $ residuals    : Named num [1:150] 0.2209 0.0209 -0.1382 -0.32 0.1209 ...
##   ..- attr(*, "names")= chr [1:150] "1" "2" "3" "4" ...
##  $ effects      : Named num [1:150] -71.566 8.812 -0.155 -0.337 0.104 ...
##   ..- attr(*, "names")= chr [1:150] "(Intercept)" "Petal.Length" "" "" ...
##  $ rank         : int 2
##  $ fitted.values: Named num [1:150] 4.88 4.88 4.84 4.92 4.88 ...
##   ..- attr(*, "names")= chr [1:150] "1" "2" "3" "4" ...
##  $ assign       : int [1:2] 0 1
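Since the str() output is long, a quicker way to see which components the fit object holds is to list only their names. This is a small sketch that uses base R only.

```r
# Refit the model so this block runs on its own
lm_fit <- lm(Sepal.Length ~ Petal.Length, data = iris)

# Names of the components stored in the fit object
names(lm_fit)

# Check that a residuals component exists and has one value per observation
"residuals" %in% names(lm_fit)
length(lm_fit$residuals)
```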

Now we can directly extract the residuals using the $ operator on the linear regression fit, as shown below.

lm_fit$residuals

##           1           2           3           4           5           6 
##  0.22090540  0.02090540 -0.13820238 -0.31998683  0.12090540  0.39822871 
##           7           8           9          10          11          12 
## -0.27909460  0.08001317 -0.47909460 -0.01998683  0.48001317 -0.16087906 
##          13          14          15          16          17          18 
## -0.07909460 -0.45641792  1.00268985  0.78001317  0.56179762  0.22090540 
##          19          20          21          22          23          24 
##  0.69822871  0.18001317  0.39822871  0.18001317 -0.11552569  0.09822871 
##          25          26          27          28          29          30 
## -0.28355574  0.03912094  0.03912094  0.28001317  0.32090540 -0.26087906 
##          31          32          33          34          35          36 
## ... 
## ...
## ...
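Base R also provides the residuals() extractor function (with resid() as a shorter alias), which is generally preferred over reaching into the object with $ because it handles missing values according to the model's na.action setting. The sketch below compares the extractor with the $ approach and checks that both equal the observed values minus the fitted values. The names res_extractor and res_manual are only for this illustration.

```r
# Refit the model so this block runs on its own
lm_fit <- lm(Sepal.Length ~ Petal.Length, data = iris)

# Extractor function; resid(lm_fit) is an alias
res_extractor <- residuals(lm_fit)

# Same values as the component accessed with $
all.equal(res_extractor, lm_fit$residuals)

# Residual = observed response - fitted value
res_manual <- iris$Sepal.Length - fitted(lm_fit)
all.equal(res_extractor, res_manual)
```

For this data set, which has no missing values, the extractor and the $ approach return the same vector.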

Get residuals of linear regression model using broom’s augment() function

Another way to get residuals from a linear regression fit object is to use the augment() function in the broom package. By default, augment() returns a tibble containing the data used to build the regression model together with key per-observation results, including the residuals, from the linear regression fit.

We can see that in the result of applying the augment() function, the residuals are available as the .resid column in the data frame.

broom::augment(lm_fit) 

## # A tibble: 150 × 8
##    Sepal.Length Petal.Length .fitted  .resid   .hat .sigma   .cooksd .std.resid
##           <dbl>        <dbl>   <dbl>   <dbl>  <dbl>  <dbl>     <dbl>      <dbl>
##  1          5.1          1.4    4.88  0.221  0.0186  0.408 0.00285       0.548 
##  2          4.9          1.4    4.88  0.0209 0.0186  0.408 0.0000255     0.0518
##  3          4.7          1.3    4.84 -0.138  0.0197  0.408 0.00118      -0.343 
##  4          4.6          1.5    4.92 -0.320  0.0176  0.408 0.00565      -0.793 
##  5          5            1.4    4.88  0.121  0.0186  0.408 0.000854      0.300 
##  6          5.4          1.7    5.00  0.398  0.0158  0.407 0.00780       0.986 
##  7          4.6          1.4    4.88 -0.279  0.0186  0.408 0.00455      -0.692 
##  8          5            1.5    4.92  0.0800 0.0176  0.408 0.000353      0.198 
##  9          4.4          1.4    4.88 -0.479  0.0186  0.407 0.0134       -1.19  
## 10          4.9          1.5    4.92 -0.0200 0.0176  0.408 0.0000220    -0.0495
## # … with 140 more rows
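The default output keeps only the columns used in the model. If we also want the other columns of the original data next to the residuals, augment() accepts a data argument. The sketch below passes the full iris data set. The name augmented is only for this illustration, and the exact set of extra columns may vary with the broom version.

```r
library(broom)

# Refit the model so this block runs on its own
lm_fit <- lm(Sepal.Length ~ Petal.Length, data = iris)

# Pass the original data so columns not in the model (e.g. Species) are kept
augmented <- augment(lm_fit, data = iris)

# Column names now include the iris columns plus .fitted, .resid, and so on
names(augmented)
```

This makes it easy to look at the residuals by Species, for example, without joining the results back to the original data.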

Then we can extract the residuals as a vector using the pull() function, as shown below.

broom::augment(lm_fit) %>%
  pull(.resid)
##   [1]  0.22090540  0.02090540 -0.13820238 -0.31998683  0.12090540  0.39822871
##   [7] -0.27909460  0.08001317 -0.47909460 -0.01998683  0.48001317 -0.16087906
##  [13] -0.07909460 -0.45641792  1.00268985  0.78001317  0.56179762  0.22090540
##  [19]  0.69822871  0.18001317  0.39822871  0.18001317 -0.11552569  0.09822871
##  [25] -0.28355574  0.03912094  0.03912094  0.28001317  0.32090540 -0.26087906
##  [31] -0.16087906  0.48001317  0.28001317  0.62090540 -0.01998683  0.20268985
##  [37]  0.66179762  0.02090540 -0.43820238  0.18001317  0.16179762 -0.33820238
##  [43] -0.43820238  0.03912094  0.01644426 -0.07909460  0.13912094 -0.27909460
##  [49]  0.38001317  0.12090540  0.77146188  0.25324634  0.58967743 -0.44229252
##  [55]  0.31235411 -0.44675366  0.07146188 -0.75604693  0.41235411 -0.70140030
## 
## 
## 
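As a quick sanity check, we can confirm that the two approaches return the same values. This is a minimal sketch that loads dplyr rather than the whole tidyverse. The names res_base and res_broom are only for this illustration, and names are dropped with unname() so that only the numbers are compared.

```r
library(dplyr)
library(broom)

# Refit the model so this block runs on its own
lm_fit <- lm(Sepal.Length ~ Petal.Length, data = iris)

# Approach 1: residuals from the fit object
res_base <- unname(residuals(lm_fit))

# Approach 2: residuals from broom's augment()
res_broom <- augment(lm_fit) %>%
  pull(.resid) %>%
  unname()

# Both approaches should agree
all.equal(res_base, res_broom)
```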