slice_max: get rows with highest values of a column

dplyr's slice_max(): Rows with highest values for a column

In this tutorial, we will learn how to get rows with maximum values of a column or variable from a dataframe. For example, from a dataframe with multiple rows and columns we will find a row (or multiple rows) with maximum values for a column.

We will use dplyr’s slice_max() function to select the row with the maximum value of a column, and then to find the top n rows with the highest values of a variable.

Let us load the tidyverse packages and the palmerpenguins dataset package.

library(tidyverse)
library(palmerpenguins)

We will drop rows with missing values and select a few columns of the Palmer penguins dataset to easily see how slice_max() works. We will also add a row number column to make it clear which rows slice_max() returns.

penguins <- penguins %>%
  drop_na() %>%
  select(species, sex, body_mass_g) %>%
  mutate(row_id = row_number())

Our dataframe looks like this with four columns and over 300 rows.

penguins %>% head()

# A tibble: 6 × 4
  species sex    body_mass_g row_id
  <fct>   <fct>        <int>  <int>
1 Adelie  male          3750      1
2 Adelie  female        3800      2
3 Adelie  female        3250      3
4 Adelie  female        3450      4
5 Adelie  male          3650      5
6 Adelie  female        3625      6

dplyr’s slice_max(): To get the row with max value for a column

To find the row with the highest value of the column body_mass_g, we use slice_max() with the column name and n = 1 as arguments. This returns the row with the highest body mass in our data. We can see that it is a male Gentoo at row number 164.

penguins %>%
  slice_max(body_mass_g, n = 1)

# A tibble: 1 × 4
  species sex   body_mass_g row_id
  <fct>   <fct>       <int>  <int>
1 Gentoo  male         6300    164

dplyr’s slice_max(): To get the top 2 rows with max values for a column

By changing the value of n, we can get the top n rows with the highest values of the specified column. For example, when we use n = 2 with the body mass column, we get the two rows containing the heaviest penguins. We can see that both are male Gentoos.

penguins %>%
  slice_max(body_mass_g, n = 2)

# A tibble: 2 × 4
  species sex   body_mass_g row_id
  <fct>   <fct>       <int>  <int>
1 Gentoo  male         6300    164
2 Gentoo  male         6050    179
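
As the output above suggests, slice_max() returns the selected rows sorted from the highest value down. One way to picture this is as sorting in descending order and keeping the first n rows. The following is a minimal sketch of that idea, assuming a small made-up data frame (toy, with hypothetical columns id and mass) instead of the penguins data, and no ties among the top values:

```r
library(dplyr)

# Hypothetical toy data, used only for illustration (not the penguins dataset)
toy <- tibble(
  id   = 1:5,
  mass = c(3750, 6300, 4200, 6050, 5000)
)

# Top 2 rows by mass using slice_max()
top2_slice <- toy %>%
  slice_max(mass, n = 2)

# The same rows obtained by sorting in descending order and keeping the first 2
top2_sorted <- toy %>%
  arrange(desc(mass)) %>%
  head(2)

top2_slice$id                                # 2 4
identical(top2_slice$id, top2_sorted$id)     # TRUE for this toy data
```

The two approaches can differ when there are ties at the cutoff, since slice_max() keeps tied rows by default while head() always returns exactly n rows.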

dplyr’s slice_max(): To get the top n rows with max values for a column

Similarly, we can get the top 3 rows with the highest values of a column, here body mass, with n = 3. Note that by default slice_max() does not break ties (with_ties = TRUE), so we get four rows, because the third and fourth rows have the same body mass.

penguins %>%
  slice_max(body_mass_g, n = 3)

# A tibble: 4 × 4
  species sex   body_mass_g row_id
  <fct>   <fct>       <int>  <int>
1 Gentoo  male         6300    164
2 Gentoo  male         6050    179
3 Gentoo  male         6000    222
4 Gentoo  male         6000    260

dplyr’s slice_max(): To get the top n rows with no ties

We can break ties when using slice_max() by passing with_ties = FALSE as an argument. This returns exactly n rows.

penguins %>%
  slice_max(body_mass_g, n = 3,
            with_ties = FALSE)

# A tibble: 3 × 4
  species sex   body_mass_g row_id
  <fct>   <fct>       <int>  <int>
1 Gentoo  male         6300    164
2 Gentoo  male         6050    179
3 Gentoo  male         6000    222
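
To see the effect of with_ties in isolation, here is a small sketch on a made-up data frame (toy, with hypothetical columns id and mass) that has a tie for third place, mirroring the penguins example above:

```r
library(dplyr)

# Hypothetical toy data with two rows tied for the third-highest mass
toy <- tibble(
  id   = 1:5,
  mass = c(6300, 6050, 6000, 6000, 5500)
)

# Default behaviour (with_ties = TRUE): every row tied at the cutoff is kept
with_ties_rows <- toy %>%
  slice_max(mass, n = 3)
nrow(with_ties_rows)   # 4

# with_ties = FALSE: exactly n rows are returned
no_ties_rows <- toy %>%
  slice_max(mass, n = 3, with_ties = FALSE)
nrow(no_ties_rows)     # 3
```

Keeping ties is the safer default when every top-ranked row matters, while with_ties = FALSE is useful when downstream code expects a fixed number of rows.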

Check out how to use dplyr’s slice_min() function to get the bottom n rows for a specific column in a dataframe.
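
As a quick preview, slice_min() takes the same kind of arguments as slice_max() but returns the rows with the lowest values. A minimal sketch, again assuming a made-up data frame (toy, with hypothetical columns id and mass) instead of the penguins data:

```r
library(dplyr)

# Hypothetical toy data, used only for illustration
toy <- tibble(
  id   = 1:5,
  mass = c(3750, 6300, 4200, 6050, 5000)
)

# Bottom 2 rows by mass, returned from the lowest value up
lightest <- toy %>%
  slice_min(mass, n = 2)

lightest$id   # 1 3
```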
