In this tutorial, we will learn how to extract residual values from a linear regression model in R. A residual is the difference between an observed value of the response and the value fitted by the model, that is, what remains after subtracting the effects of the variables in the model. We will see two approaches to pull residuals from the result we get after using the lm() function. First, we will learn how to extract residuals directly from the linear regression fit object, where they are stored as the residuals component.
Then we will use the augment() function from the R package broom to extract residuals from the regression model.
First let us load the packages needed.
library(tidyverse)
library(broom)
We will use the classic iris data set that is built into R to build a simple linear regression model.
iris %>% head()
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa
With the lm() function, we build a simple linear regression model between two numerical variables from the iris data set. Note that since we provide the iris data set as the data argument, we can refer to the variables in the model formula without any quotes.
lm_fit <- lm(Sepal.Length ~ Petal.Length, data=iris)
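Before looking inside the fit object, it may help to recall what a residual is: the observed response minus the value fitted by the model. The sketch below refits the same model so that it runs on its own and computes the residuals by hand. The name manual_resid is only illustrative; fitted() is the base R extractor for fitted values.

```r
# Refit the model from the tutorial so this block is self-contained.
lm_fit <- lm(Sepal.Length ~ Petal.Length, data = iris)

# A residual is the observed response minus the value fitted by the model.
manual_resid <- iris$Sepal.Length - fitted(lm_fit)

# Compare with the residuals stored in the fit (names dropped before comparing).
all.equal(as.numeric(manual_resid), as.numeric(lm_fit$residuals))

# With an intercept in the model, least-squares residuals sum to (numerically) zero.
sum(manual_resid)
```

The comparison should report agreement up to floating-point tolerance, which is a quick way to confirm that the stored residuals are on the scale of the response.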
Applying the summary() function to the linear regression result object gives a lot of useful information, including a quick summary of the residuals.
summary(lm_fit)
##
## Call:
## lm(formula = Sepal.Length ~ Petal.Length, data = iris)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -1.24675 -0.29657 -0.01515  0.27676  1.00269
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept)   4.30660    0.07839   54.94   <2e-16 ***
## Petal.Length  0.40892    0.01889   21.65   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4071 on 148 degrees of freedom
## Multiple R-squared:  0.76, Adjusted R-squared:  0.7583
## F-statistic: 468.6 on 1 and 148 DF,  p-value: < 2.2e-16
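The residual-related lines of this summary can be reproduced from the residuals themselves. The following is a minimal sketch, assuming the default settings of quantile() and using the base R helpers residuals(), df.residual() and sigma(); the names res and rse are illustrative.

```r
# Refit the model from the tutorial so this block is self-contained.
lm_fit <- lm(Sepal.Length ~ Petal.Length, data = iris)
res <- residuals(lm_fit)

# Five-number summary of the residuals (Min, 1Q, Median, 3Q, Max).
quantile(res)

# Residual standard error: sqrt(residual sum of squares / residual degrees of freedom).
rse <- sqrt(sum(res^2) / df.residual(lm_fit))
rse

# sigma() returns the same quantity directly from the fit.
all.equal(rse, sigma(lm_fit))
```

Up to rounding, the quantiles should match the "Residuals:" line of the summary, and rse should match the reported residual standard error on 148 degrees of freedom.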
However, this still does not give us the residual values themselves. Let us use the str() function on the fit object to see its contents. We can find that the residuals are available as “residuals”.
str(lm_fit)
## List of 12
##  $ coefficients : Named num [1:2] 4.307 0.409
##   ..- attr(*, "names")= chr [1:2] "(Intercept)" "Petal.Length"
##  $ residuals    : Named num [1:150] 0.2209 0.0209 -0.1382 -0.32 0.1209 ...
##   ..- attr(*, "names")= chr [1:150] "1" "2" "3" "4" ...
##  $ effects      : Named num [1:150] -71.566 8.812 -0.155 -0.337 0.104 ...
##   ..- attr(*, "names")= chr [1:150] "(Intercept)" "Petal.Length" "" "" ...
##  $ rank         : int 2
##  $ fitted.values: Named num [1:150] 4.88 4.88 4.84 4.92 4.88 ...
##   ..- attr(*, "names")= chr [1:150] "1" "2" "3" "4" ...
##  $ assign       : int [1:2] 0 1
Now we can directly extract the residuals using the $ operator on the linear regression fit as shown below.
lm_fit$residuals
##           1           2           3           4           5           6
##  0.22090540  0.02090540 -0.13820238 -0.31998683  0.12090540  0.39822871
##           7           8           9          10          11          12
## -0.27909460  0.08001317 -0.47909460 -0.01998683  0.48001317 -0.16087906
##          13          14          15          16          17          18
## -0.07909460 -0.45641792  1.00268985  0.78001317  0.56179762  0.22090540
##          19          20          21          22          23          24
##  0.69822871  0.18001317  0.39822871  0.18001317 -0.11552569  0.09822871
##          25          26          27          28          29          30
## -0.28355574  0.03912094  0.03912094  0.28001317  0.32090540 -0.26087906
##          31          32          33          34          35          36
## ...
## ...
## ...
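Base R also provides the residuals() extractor function, with resid() as an alias, as an alternative to reaching into the list with $. A minimal sketch follows, refitting the same model so that it runs on its own; for this fit, which has no missing values and no weights, it should return the same values as lm_fit$residuals.

```r
# Refit the model from the tutorial so this block is self-contained.
lm_fit <- lm(Sepal.Length ~ Petal.Length, data = iris)

# Extract residuals with the generic extractor instead of reaching into the list.
res <- residuals(lm_fit)
head(res)

# resid() is an alias for residuals().
identical(resid(lm_fit), res)

# For this fit (no missing values, no weights) the extractor matches the stored component.
all.equal(res, lm_fit$residuals)
```

The extractor is usually the safer choice in general code, since it takes the na.action of the fit into account (for example, padding with NA under na.exclude), whereas the raw component does not.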
Get residuals of linear regression model using broom’s augment() function
Another way to get residuals from a linear regression fit object is to use the augment() function in the broom package. By default, augment() returns the data used to build the regression model together with key per-observation results, including the residuals from the linear regression fit, as a tibble.
We can see that in the result of applying the augment() function, the residuals are available as the .resid column in the data frame.
broom::augment(lm_fit)
## # A tibble: 150 × 8
##    Sepal.Length Petal.Length .fitted  .resid   .hat .sigma   .cooksd .std.resid
##           <dbl>        <dbl>   <dbl>   <dbl>  <dbl>  <dbl>     <dbl>      <dbl>
##  1          5.1          1.4    4.88  0.221  0.0186  0.408 0.00285       0.548
##  2          4.9          1.4    4.88  0.0209 0.0186  0.408 0.0000255     0.0518
##  3          4.7          1.3    4.84 -0.138  0.0197  0.408 0.00118      -0.343
##  4          4.6          1.5    4.92 -0.320  0.0176  0.408 0.00565      -0.793
##  5          5            1.4    4.88  0.121  0.0186  0.408 0.000854      0.300
##  6          5.4          1.7    5.00  0.398  0.0158  0.407 0.00780       0.986
##  7          4.6          1.4    4.88 -0.279  0.0186  0.408 0.00455      -0.692
##  8          5            1.5    4.92  0.0800 0.0176  0.408 0.000353      0.198
##  9          4.4          1.4    4.88 -0.479  0.0186  0.407 0.0134       -1.19
## 10          4.9          1.5    4.92 -0.0200 0.0176  0.408 0.0000220    -0.0495
## # … with 140 more rows
We can then extract the residuals as a plain numeric vector by piping the result into the pull() function, as shown below.
broom::augment(lm_fit) %>%
  pull(.resid)
##   [1]  0.22090540  0.02090540 -0.13820238 -0.31998683  0.12090540  0.39822871
##   [7] -0.27909460  0.08001317 -0.47909460 -0.01998683  0.48001317 -0.16087906
##  [13] -0.07909460 -0.45641792  1.00268985  0.78001317  0.56179762  0.22090540
##  [19]  0.69822871  0.18001317  0.39822871  0.18001317 -0.11552569  0.09822871
##  [25] -0.28355574  0.03912094  0.03912094  0.28001317  0.32090540 -0.26087906
##  [31] -0.16087906  0.48001317  0.28001317  0.62090540 -0.01998683  0.20268985
##  [37]  0.66179762  0.02090540 -0.43820238  0.18001317  0.16179762 -0.33820238
##  [43] -0.43820238  0.03912094  0.01644426 -0.07909460  0.13912094 -0.27909460
##  [49]  0.38001317  0.12090540  0.77146188  0.25324634  0.58967743 -0.44229252
##  [55]  0.31235411 -0.44675366  0.07146188 -0.75604693  0.41235411 -0.70140030
##
##
##
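As a cross-check, the two approaches can be compared directly. The sketch below assumes the dplyr and broom packages are installed (dplyr is loaded on its own here instead of the full tidyverse), and the names resid_broom and resid_base are illustrative. It also shows one reason to prefer augment(): the residuals stay next to the data, so rows can be sorted or filtered by residual size.

```r
library(dplyr)
library(broom)

# Refit the model from the tutorial so this block is self-contained.
lm_fit <- lm(Sepal.Length ~ Petal.Length, data = iris)

# Residuals via broom: one row per observation, residuals in the .resid column.
resid_broom <- augment(lm_fit) %>%
  pull(.resid)

# Residuals via base R.
resid_base <- residuals(lm_fit)

# Compare the values only (names are dropped on both sides).
all.equal(as.numeric(resid_broom), as.numeric(resid_base))

# List the observations with the largest absolute residuals.
augment(lm_fit) %>%
  arrange(desc(abs(.resid))) %>%
  select(Sepal.Length, Petal.Length, .fitted, .resid) %>%
  head(3)
```

Both vectors should agree up to floating-point tolerance; the base R version carries observation names, while the vector pulled from the tibble does not.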