• Skip to primary navigation
  • Skip to main content
  • Skip to primary sidebar

Rstats 101

Learn R Programming Tips & Tricks for Statistics and Data Science

  • Home
  • About
    • Privacy Policy
  • Show Search
Hide Search

Logistic Regression with Single Predictor in R

rstats101 · September 18, 2023 ·

In this post, we will learn how to perform a simple logistic regression using Generalized Linear Models (glm) in R. We will work with logistic regression model between a binary categorical variable as response variable and a single numerical predictor.

Data for logistic regression

Let us load the packages needed.

library(tidyverse)
library(palmerpenguins)
theme_set(theme_bw(16))

We will use two variables from Palmer penguins dataset. To simplify let us use dataset from one of the Penguin species and select just two variables, sex and body mass. Here, sex is a categorial variable and body mass is numerical variable.

df <- penguins %>%
  drop_na() %>%
  filter(species=="Gentoo")   %>%
  select(sex, body_mass_g)
df %>% head()

# A tibble: 6 × 2
  sex    body_mass_g
  <fct>        <int>
1 female        4500
2 male          5700
3 female        4450
4 male          5700
5 male          5400
6 female        4550

Variables for logistic regression

In this logistical regression model, we are interested in classifying the sex of penguins using body mass. Let us make a simple boxplot to see how well sex variable is associated with body mass of penguins.

df   %>%
  ggplot(aes(x=sex, y= body_mass_g, color=sex))+
  geom_boxplot(outlier.shape = NA)+
  geom_jitter(width=0.1)+
  theme(legend.position = "none")+
  scale_color_brewer(palette="Set1")
ggsave("penguins_sex_vs_body_mass_boxplot.png")
Logistic regression with single predictor
Logistic regression to classify Penguin sex with one predictor

Building Logistic Regression with glm()

To build logistic regression model we will use glm() function from stats package that has implementations of GLMs. glm() function takes three key arguments,

  • formula, where we tell glm() the variables and how we want to model
  • family name, here we specify the exponential family that we want to use, “biomial” for logistic regression
  • data: a dataframe with all the variables specified in the formula
# single predictor
glm_res_sp <- df %>%
  glm(formula = sex ~ body_mass_g,
      family = binomial)

By printing the resulting object, we can see the basic information regarding the model we built. We get the formula we used and the coeffients from the logistic regression model.

glm_res_sp

Call:  glm(formula = sex ~ body_mass_g, family = binomial, data = .)

Coefficients:
(Intercept)  body_mass_g  
  -55.03337      0.01089  

Degrees of Freedom: 118 Total (i.e. Null);  117 Residual
Null Deviance:      164.9 
Residual Deviance: 45.1     AIC: 49.1

Results from Logistic Regression with single predictor uslng glm()

To fully investigate the results, we need to use summary() function on the result object. And this gives us the estimated coefficients and the p-value of the association between the two variables. We can see that

glm_res_sp %>% summary()
Call:
glm(formula = sex ~ body_mass_g, family = binomial, data = .)

Coefficients:
              Estimate Std. Error z value Pr(>|z|)    
(Intercept) -55.033367  11.651375  -4.723 2.32e-06 ***
body_mass_g   0.010888   0.002311   4.711 2.47e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 164.893  on 118  degrees of freedom
Residual deviance:  45.105  on 117  degrees of freedom
AIC: 49.105

Number of Fisher Scoring iterations: 7

Cleaning Logistic Regression results with broom

The results from statistical models like glm() are often hard to read. So we will use broom package’s tidy() function to convert the resulting statistical object to a tidy dataframe.

glm_res_sp %>%
   tidy()

# A tibble: 2 × 5
  term        estimate std.error statistic    p.value
  <chr>          <dbl>     <dbl>     <dbl>      <dbl>
1 (Intercept) -55.0     11.7         -4.72 0.00000232
2 body_mass_g   0.0109   0.00231      4.71 0.00000247

We can also visualize the logistic regression model by plotting the two variables and a logistic curve. Note here we use ggplot2’s geom_smooth() function to make the plot.

df %>%
  mutate(sex=as.numeric(sex)-1) %>%
  ggplot(aes(x=body_mass_g, y=sex)) + 
  geom_point() +
  geom_smooth(method="glm",
              se=TRUE,
              method.args = list(family=binomial))
ggsave("plotting_logistic_regression_model_with_glm.png")

Plotting logistic regression model with geom_smooth()
Plotting logistic regression model with geom_smooth()

Related

Filed Under: glm, logistic regression, rstats101 Tagged With: logistic regression in R, logistic regression with single predictor

Primary Sidebar

Recent Posts

  • How to create a nested dataframe with lists
  • How to compute proportion with tidyverse
  • How to Compute Z-Score of Multiple Columns
  • How to drop unused level of factor variable in R
  • How to compute Z-score

Categories

%in% arrange() as.data.frame as_tibble built-in data R colSums() R cor() in R data.frame dplyr dplyr across() dplyr group_by() dplyr rename() dplyr rowwise() dplyr row_number() dplyr select() dplyr slice_max() dplyr slice_sample() drop_na R duplicated() gsub head() impute with mean values is.element() linear regression matrix() function na.omit R NAs in R near() R openxlsx pivot_longer() prod() R.version replace NA replace NAs tidyverse R Function rstats rstats101 R version scale() sessionInfo() t.test() tidyr tidyselect tidyverse write.xlsx

Copyright © 2025 · Daily Dish Pro on Genesis Framework · WordPress · Log in

Go to mobile version