• Skip to primary navigation
  • Skip to main content
  • Skip to primary sidebar

Rstats 101

Learn R Programming Tips & Tricks for Statistics and Data Science

  • Home
  • About
    • Privacy Policy
  • Show Search
Hide Search

How to select top and bottom rows by a column simultaneously

rstats101 · September 2, 2024 ·

In this tutorial, we will learn how to select top and bottom rows of a dataframe based on the values of a column with tidyverse in R. Here we will see a specific approach to get top and bottom rows when the values of the column of interest has both positive and negative values. This would not work if the values are not both positive and negative.

Let us load the packages needed.

library(tidyverse)
theme_set(theme_bw(16)

We will create a sample dataframe with a column containing both positive and negative values.

set.seed(2024)
df <- tibble(symbol=sample(letters,10),
             value = rnorm(10, mean = 5,sd=10))
df

# A tibble: 10 × 2
   symbol   value
   <chr>    <dbl>
 1 b       -0.376
 2 e       -7.16 
 3 m       15.4  
 4 q       -0.467
 5 x       14.0  
 6 i      -15.3  
 7 z       -0.525
 8 k       24.3  
 9 p        9.70 
10 n       23.3  

Our goal is to select the rows of the dataframe containing top n rows with positive values and bottom b nrows with negative values.

For example, if we are manually pick the both the top 3 and bottom 3 rows, that would give us the first 3 and last 3 rows after sorting as shown below.

df |> arrange(value)

# A tibble: 10 × 2
   symbol   value
   <chr>    <dbl>
 1 i      -15.3  
 2 e       -7.16 
 3 z       -0.525
 4 q       -0.467
 5 b       -0.376
 6 p        9.70 
 7 x       14.0  
 8 m       15.4  
 9 n       23.3  
10 k       24.3  

Let us see how we can use tidyverse to select both the top and bottom rows. First, let create another column specifying if the values are positive or negative. This helps us creating a grouping variable to use to select top and bottom rows.

df |>
  mutate(direction = ifelse(value>0,"+ve","-ve")) 

# A tibble: 10 × 3
   symbol   value direction
   <chr>    <dbl> <chr>    
 1 b       -0.376 -ve      
 2 e       -7.16  -ve      
 3 m       15.4   +ve      
 4 q       -0.467 -ve      
 5 x       14.0   +ve      
 6 i      -15.3   -ve      
 7 z       -0.525 -ve      
 8 k       24.3   +ve      
 9 p        9.70  +ve      
10 n       23.3   +ve     

Select Top and bottom rows simultaenously with slice_max() and rank()

We can group by the variable and then select the top and bottom rows by rank/magnitude using slice_max() with rank() function as shown below. Note the “-” sign before the rank function.

df |>
  mutate(direction = ifelse(value>0,"+ve","-ve")) |>
  group_by(direction) |>
  slice_max(n=3,-rank(value))

# A tibble: 6 × 3
# Groups:   direction [2]
  symbol   value direction
  <chr>    <dbl> <chr>    
1 p        9.70  +ve      
2 x       14.0   +ve      
3 m       15.4   +ve      
4 i      -15.3   -ve      
5 e       -7.16  -ve      
6 z       -0.525 -ve     

Without the “-” sign before the rank() function we will get incorrect results like shown below.

df |>
  mutate(direction = ifelse(value>0,"+ve","-ve")) |>
  group_by(direction) |>
  slice_max(n=3, rank(value))

# A tibble: 6 × 3
# Groups:   direction [2]
  symbol  value direction
  <chr>   <dbl> <chr>    
1 k      24.3   +ve      
2 n      23.3   +ve      
3 m      15.4   +ve      
4 b      -0.376 -ve      
5 q      -0.467 -ve      
6 z      -0.525 -ve   


Related

Filed Under: dplyr slice_max(), rank(), rstats101 Tagged With: select top and bottom rows

Primary Sidebar

Recent Posts

  • How to create a nested dataframe with lists
  • How to compute proportion with tidyverse
  • How to Compute Z-Score of Multiple Columns
  • How to drop unused level of factor variable in R
  • How to compute Z-score

Categories

%in% arrange() as.data.frame as_tibble built-in data R colSums() R cor() in R data.frame dplyr dplyr across() dplyr group_by() dplyr rename() dplyr rowwise() dplyr row_number() dplyr select() dplyr slice_max() dplyr slice_sample() drop_na R duplicated() gsub head() impute with mean values is.element() linear regression matrix() function na.omit R NAs in R near() R openxlsx pivot_longer() prod() R.version replace NA replace NAs tidyverse R Function rstats rstats101 R version scale() sessionInfo() t.test() tidyr tidyselect tidyverse write.xlsx

Copyright © 2025 · Daily Dish Pro on Genesis Framework · WordPress · Log in

Go to mobile version