In this tutorial, we will learn how to get rows with maximum values of a column or variable from a dataframe. For example, from a dataframe with multiple rows and columns we will find a row (or multiple rows) with maximum values for a column.
We will use dplyr’s slice_max() function to select the row with the maximum value of a column. We will also use it to find the top n rows with the highest values of a variable.
Let us load the tidyverse packages and the palmerpenguins dataset package.
library(tidyverse)
library(palmerpenguins)
We will select a few columns of the Palmer penguins dataset to easily see how slice_max() works. We will also add a row number column, row_id, so that we can tell which rows slice_max() returns.
penguins <- penguins %>%
  drop_na() %>%
  select(species, sex, body_mass_g) %>%
  mutate(row_id = row_number())
Our dataframe looks like this with four columns and over 300 rows.
penguins %>%
  head()

# A tibble: 6 × 4
  species sex    body_mass_g row_id
  <fct>   <fct>        <int>  <int>
1 Adelie  male          3750      1
2 Adelie  female        3800      2
3 Adelie  female        3250      3
4 Adelie  female        3450      4
5 Adelie  male          3650      5
6 Adelie  female        3625      6
dplyr’s slice_max(): To get the row with max value for a column
To find the row with the highest value of the column body_mass_g, we use slice_max() with the column name and n = 1 as arguments. This gives us the row with the highest body mass in our data. We can see that it is a male Gentoo with row_id 164.
penguins %>%
  slice_max(body_mass_g, n = 1)

# A tibble: 1 × 4
  species sex   body_mass_g row_id
  <fct>   <fct>       <int>  <int>
1 Gentoo  male         6300    164
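As a quick sanity check, we can compare slice_max() with a plain filter() on the column maximum. The sketch below uses a small made-up tibble (the names toy, id and mass are hypothetical and not part of the penguins data), so it runs without the palmerpenguins package.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data, not the penguins dataset
toy <- tibble(
  id   = 1:5,
  mass = c(3750, 6300, 3250, 6050, 6000)
)

# Row with the largest mass, using slice_max()
top_slice <- toy %>%
  slice_max(mass, n = 1)

# Same row using filter(): keep rows whose mass equals the column maximum
top_filter <- toy %>%
  filter(mass == max(mass))
```

Both approaches return the row with id 2 here. slice_max() becomes more convenient once we want more than one row, since filter() on max() only ever returns the rows tied for first place.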
dplyr’s slice_max(): To get the top 2 rows with max values for a column
By changing the value of n, we can get the top n rows with the highest values of the specified column. For example, when we use n = 2 with the body mass column, we get the two rows containing the heaviest penguins. We can see that both are male Gentoos.
penguins %>%
  slice_max(body_mass_g, n = 2)

# A tibble: 2 × 4
  species sex   body_mass_g row_id
  <fct>   <fct>       <int>  <int>
1 Gentoo  male         6300    164
2 Gentoo  male         6050    179
dplyr’s slice_max(): To get the top n rows with max values for a column
Similarly, we can get the top 3 rows with the highest values of a column, here body mass, with n = 3. Note that by default slice_max() keeps ties (with_ties = TRUE). Therefore we get four rows instead of three, because the third and fourth rows have the same body mass.
penguins %>%
  slice_max(body_mass_g, n = 3)

# A tibble: 4 × 4
  species sex   body_mass_g row_id
  <fct>   <fct>       <int>  <int>
1 Gentoo  male         6300    164
2 Gentoo  male         6050    179
3 Gentoo  male         6000    222
4 Gentoo  male         6000    260
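To see why n = 3 can return four rows, it may help to think of the default behaviour as keeping every row whose rank is at most n, where tied values share the same rank. The sketch below illustrates this on a made-up tibble (toy, id and mass are hypothetical names) using dplyr's min_rank(). It is meant as a rough mental model, not a description of how slice_max() is implemented internally.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data with a tie for third place (ids 3 and 5)
toy <- tibble(
  id   = 1:6,
  mass = c(3750, 6300, 6000, 6050, 6000, 3250)
)

# Default slice_max(): ties are kept, so n = 3 gives four rows here
top3 <- toy %>%
  slice_max(mass, n = 3)

# Rank-based view: the two 6000 rows both get rank 3, so both pass the filter
by_rank <- toy %>%
  filter(min_rank(desc(mass)) <= 3)
```

On this toy data both top3 and by_rank contain the rows with ids 2, 3, 4 and 5, although slice_max() returns them sorted by decreasing mass while filter() keeps the original row order.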
dplyr’s slice_max(): To get the top n rows with no ties
We can drop the tied rows and get exactly n rows by passing with_ties = FALSE as an argument to slice_max().
penguins %>%
  slice_max(body_mass_g, n = 3, with_ties = FALSE)

# A tibble: 3 × 4
  species sex   body_mass_g row_id
  <fct>   <fct>       <int>  <int>
1 Gentoo  male         6300    164
2 Gentoo  male         6050    179
3 Gentoo  male         6000    222
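With with_ties = FALSE we always get exactly n rows, but one of the tied rows has to be dropped. If it matters which one survives, one option is to sort explicitly with a tie-breaker column and then take the first n rows. Here is a minimal sketch on a made-up tibble (toy, id and mass are hypothetical names), using id as the tie-breaker.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data with a tie for third place (ids 3 and 5)
toy <- tibble(
  id   = 1:6,
  mass = c(3750, 6300, 6000, 6050, 6000, 3250)
)

# Exactly three rows, ties dropped by slice_max()
top3_no_ties <- toy %>%
  slice_max(mass, n = 3, with_ties = FALSE)

# Explicit tie-breaker: sort by decreasing mass, then by increasing id,
# and keep the first three rows
top3_explicit <- toy %>%
  arrange(desc(mass), id) %>%
  slice_head(n = 3)
```

In top3_explicit the tie between ids 3 and 5 is resolved in favour of id 3, because id is the second sort key. Swapping id for another column, or wrapping it in desc(), changes which tied row is kept.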
Check out how to use dplyr’s slice_min() function to get the bottom n rows for a specific column in a dataframe.