
Rstats 101

Learn R Programming Tips & Tricks for Statistics and Data Science


Linear Regression in R with lm() function – A Practical Tutorial

rstats101 · June 14, 2022 ·

In this tutorial, we will learn how to perform a simple linear regression in R using the lm() function.

Simple linear regression is one of the most common statistical methods for understanding the relationship between two numerical (quantitative) variables, such as height and weight, age and height, or years of education and salary. One can think of simple linear regression as a way to answer the question: are the two numerical variables of interest associated?

Statistically, linear regression analysis amounts to this: given a data set of the form (x1,y1), (x2,y2), (x3,y3),…, (xn,yn), we try to fit a linear model y = mx + c, where c is the intercept (the point where the line meets the y-axis) and m is the slope of the straight line.

We need some data before we can fit a linear regression model. Let us simulate data for both x and y as follows.

set.seed(42)
y <- rnorm(50, mean=5, sd=2)
x <- y + rnorm(50, mean=1, sd=1)

When the data are stored as vectors, as they are here, we can use the lm() function to run the simple linear regression by writing lm(y ~ x).

lm_fit_1 <- lm(y ~ x)
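
When the two variables are columns of a data frame rather than separate vectors, lm() can look them up through its data argument. The following is a minimal sketch of that variant, assuming a data frame called sim_df; both sim_df and lm_fit_df are hypothetical names used only for illustration and do not appear elsewhere in this tutorial.

```r
# Recreate the simulated data so this block runs on its own
set.seed(42)
y <- rnorm(50, mean = 5, sd = 2)
x <- y + rnorm(50, mean = 1, sd = 1)

# sim_df is a hypothetical name used only for this illustration
sim_df <- data.frame(x = x, y = y)

# Same model as lm(y ~ x), with the variables taken from the columns of sim_df
lm_fit_df <- lm(y ~ x, data = sim_df)

# Compare the estimates with the vector-based fit
all.equal(coef(lm_fit_df), coef(lm(y ~ x)))
```

Keeping the variables in a data frame is usually more convenient once there are more than two columns, since the formula then refers to column names instead of objects in the workspace.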

The object returned by the lm() function is our linear fit to the data. Printing the fit object shows the two parameters estimated from the data, the intercept and the slope. It also shows the model that was fit; in this case the model relates the two variables y and x through the formula y ~ x.

lm_fit_1

## 
## Call:
## lm(formula = y ~ x)
## 
## Coefficients:
## (Intercept)            x  
##     -0.6944       0.9326
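
For simple linear regression, the least-squares estimates have a closed form: the slope is the covariance of x and y divided by the variance of x, and the intercept is the mean of y minus the slope times the mean of x. The sketch below recomputes the two coefficients by hand as a sanity check on what lm() reports; slope_hat and intercept_hat are illustrative names, not part of the original tutorial.

```r
# Recreate the simulated data and the fit so this block runs on its own
set.seed(42)
y <- rnorm(50, mean = 5, sd = 2)
x <- y + rnorm(50, mean = 1, sd = 1)
lm_fit_1 <- lm(y ~ x)

# Closed-form least-squares estimates for a single predictor
slope_hat <- cov(x, y) / var(x)
intercept_hat <- mean(y) - slope_hat * mean(x)

# Print the hand-computed values next to the ones from lm()
c(intercept = intercept_hat, slope = slope_hat)
coef(lm_fit_1)
```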

Another useful function for understanding the result of a linear model fit is summary(). When we call summary() on the fit object, it gives detailed information about the results of the linear regression.

First, it reports the model that was used, lm(formula = y ~ x). The coefficients table gives the estimated intercept and slope, along with a p-value for each. The p-value in the x row tests whether the slope is zero, that is, whether the two numerical variables are associated.

# summary of the linear fit
summary(lm_fit_1)

## 
## Call:
## lm(formula = y ~ x)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.32189 -0.61615 -0.05203  0.71950  3.16215 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.69442    0.37363  -1.859   0.0692 .  
## x            0.93262    0.05808  16.059   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9217 on 48 degrees of freedom
## Multiple R-squared:  0.8431, Adjusted R-squared:  0.8398 
## F-statistic: 257.9 on 1 and 48 DF,  p-value: < 2.2e-16

In the example data used here, the association between the two variables is very strong. We can tell by looking at the adjusted R-squared value (0.8398) and the p-value (< 2.2e-16).
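
The numbers printed by summary() can also be pulled out of the summary object programmatically, which helps when they are needed in later code. This is a small sketch, assuming the same simulated data; fit_summary and coef_table are illustrative names introduced here.

```r
# Recreate the simulated data and the fit so this block runs on its own
set.seed(42)
y <- rnorm(50, mean = 5, sd = 2)
x <- y + rnorm(50, mean = 1, sd = 1)
lm_fit_1 <- lm(y ~ x)

# Store the summary so its components can be accessed by name
fit_summary <- summary(lm_fit_1)

# R-squared and adjusted R-squared
fit_summary$r.squared
fit_summary$adj.r.squared

# Coefficient table: estimate, standard error, t value and p-value per term
coef_table <- coef(fit_summary)

# p-value for the slope, taken from the x row
coef_table["x", "Pr(>|t|)"]
```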

We can also get the intercept and slope of the model by calling the coef() function on the fit object.

# Extract the intercept and slope of the line of best fit
coef(lm_fit_1)

## (Intercept)           x 
##  -0.6944232   0.9326167
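
Once the coefficients are extracted, the fitted line y = mx + c can be evaluated at any x value. The sketch below does this by hand and compares the result with predict(); the value x_new = 6 is an arbitrary choice for illustration, and b, y_manual and y_predict are hypothetical names.

```r
# Recreate the simulated data and the fit so this block runs on its own
set.seed(42)
y <- rnorm(50, mean = 5, sd = 2)
x <- y + rnorm(50, mean = 1, sd = 1)
lm_fit_1 <- lm(y ~ x)

# Named vector holding the intercept and the slope
b <- coef(lm_fit_1)

# x_new is an arbitrary value chosen only for this illustration
x_new <- 6

# Evaluate the fitted line by hand: intercept + slope * x
y_manual <- b[["(Intercept)"]] + b[["x"]] * x_new

# The same value from predict(), which expects a data frame with a column named x
y_predict <- predict(lm_fit_1, newdata = data.frame(x = x_new))

all.equal(y_manual, unname(y_predict))
```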

Let us visualize our data together with the results of the linear regression analysis. Using the base R plotting function plot(), we can make a scatter plot of the two variables and then add the linear regression line on top of it by calling abline() with the fit object as its argument.

plot(x, y)
abline(lm_fit_1)

[Figure: base R scatterplot with linear fit line]

Another way to visualize the data and the linear regression results is to use ggplot2 from the tidyverse. Here, we first make a scatter plot using geom_point() and then add the regression line using the geom_smooth() function.

library(tidyverse)
tibble(x=x,y=y) |>
  ggplot(aes(x,y))+
  geom_point()+
  theme_bw(16)+
  geom_smooth(method = "lm", se = FALSE)

[Figure: Scatterplot with linear regression line]
