How to Select Rows of a Dataframe by Position

In this post, we will see how to subset a dataframe by selecting rows based on their position. We will first use dplyr’s slice() function to slice a dataframe by location, and then show how to select rows by position using base R.

Let us load the tidyverse suite of R packages, which includes dplyr.

library(tidyverse)
packageVersion("dplyr")

## [1] '1.0.9'

We will use the classic iris dataset to subset rows by location. The iris dataset is built into R, and we can access it directly using the name “iris”.

iris %>%
  head()

##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa

dplyr slice(): select rows by position/location/integer index

With the slice() function in tidyverse’s dplyr package, we can select rows by location. slice() is one of a family of slice functions in dplyr that select rows in different use cases.

To select rows from the 10th position to the 20th position, we can use slice() as follows:

iris %>% 
  slice(10:20)

##    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1           4.9         3.1          1.5         0.1  setosa
## 2           5.4         3.7          1.5         0.2  setosa
## 3           4.8         3.4          1.6         0.2  setosa
## 4           4.8         3.0          1.4         0.1  setosa
## 5           4.3         3.0          1.1         0.1  setosa
## 6           5.8         4.0          1.2         0.2  setosa
## 7           5.7         4.4          1.5         0.4  setosa
## 8           5.4         3.9          1.3         0.4  setosa
## 9           5.1         3.5          1.4         0.3  setosa
## 10          5.7         3.8          1.7         0.3  setosa
## 11          5.1         3.8          1.5         0.3  setosa
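The positions passed to slice() do not have to form a contiguous range. As a minimal sketch (the variable names picked and dropped are only illustrative), slice() also accepts a vector of arbitrary positions, or negative positions to drop rows instead of keeping them:

```r
library(dplyr)

# Keep three specific rows: the 1st, 50th and 150th
picked <- iris %>%
  slice(c(1, 50, 150))

# Negative positions drop rows: remove rows 1 to 140, keeping the rest
dropped <- iris %>%
  slice(-(1:140))

nrow(picked)
nrow(dropped)
```

Positive and negative positions cannot be mixed in the same slice() call.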

Within slice() we can also use n(), a context-dependent expression that here gives the number of rows in the dataframe. In the example below, we use slice() much like the tail() function to get the last 6 rows of the dataframe (rows 145 through 150).

iris %>% 
  slice(145:n())

##   Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
## 1          6.7         3.3          5.7         2.5 virginica
## 2          6.7         3.0          5.2         2.3 virginica
## 3          6.3         2.5          5.0         1.9 virginica
## 4          6.5         3.0          5.2         2.0 virginica
## 5          6.2         3.4          5.4         2.3 virginica
## 6          5.9         3.0          5.1         1.8 virginica
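Two other members of the slice family, slice_head() and slice_tail(), cover this common case without hard-coding a starting position. The sketch below assumes dplyr 1.0.0 or later; the variable names are illustrative. It checks that slice_tail(n = 6) selects the same six rows as slice(145:n()):

```r
library(dplyr)

# Last six rows, written as an explicit range of positions
last_six_by_range <- iris %>%
  slice(145:n())

# Same rows, without needing to know that the data has 150 rows
last_six_by_tail <- iris %>%
  slice_tail(n = 6)

# First three rows, the slice-family counterpart of head()
first_three <- iris %>%
  slice_head(n = 3)

all.equal(last_six_by_range, last_six_by_tail)
```

slice_tail() is usually the safer choice when the number of rows is not known in advance, since a hard-coded start such as 145 only works for a 150-row dataframe.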

Select rows by position with base R

In base R, we can subset with square brackets and specify the row indices of interest to select rows by location. For example, to select the 10th through 20th rows, we can use [10:20, ].


iris[10:20, ]

##    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 10          4.9         3.1          1.5         0.1  setosa
## 11          5.4         3.7          1.5         0.2  setosa
## 12          4.8         3.4          1.6         0.2  setosa
## 13          4.8         3.0          1.4         0.1  setosa
## 14          4.3         3.0          1.1         0.1  setosa
## 15          5.8         4.0          1.2         0.2  setosa
## 16          5.7         4.4          1.5         0.4  setosa
## 17          5.4         3.9          1.3         0.4  setosa
## 18          5.1         3.5          1.4         0.3  setosa
## 19          5.7         3.8          1.7         0.3  setosa
## 20          5.1         3.8          1.5         0.3  setosa

Here is the base R equivalent for subsetting the last 6 rows of a dataframe. We use the nrow() function to get the number of rows in the dataframe.

iris[145:nrow(iris), ]

##     Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
## 145          6.7         3.3          5.7         2.5 virginica
## 146          6.7         3.0          5.2         2.3 virginica
## 147          6.3         2.5          5.0         1.9 virginica
## 148          6.5         3.0          5.2         2.0 virginica
## 149          6.2         3.4          5.4         2.3 virginica
## 150          5.9         3.0          5.1         1.8 virginica
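The starting index 145 above is specific to a 150-row dataframe. A small sketch of how the same idea might be generalized, assuming we want the last k rows (k, start and last_k are hypothetical names introduced here), is to compute the start from nrow():

```r
# Number of trailing rows to keep (illustrative value)
k <- 6

# First position of the last k rows
start <- nrow(iris) - k + 1

# Select rows start..nrow(iris), keeping all columns
last_k <- iris[start:nrow(iris), ]

rownames(last_k)
```

This assumes k is no larger than the number of rows. Base R's tail(iris, k) is a ready-made alternative for the same task.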

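One notable difference between the two approaches is the return type. dplyr’s slice() always returns a dataframe, no matter how many columns the input has, whereas base R’s [rows, ] subsetting simplifies a single-column dataframe to a plain vector unless drop = FALSE is specified. The outputs above also show a second difference: slice() renumbers the rows starting from 1, while base R keeps the original row names.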
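To illustrate the return-type difference, here is a minimal sketch that assumes a one-column dataframe built from iris (one_col and the other variable names are illustrative only). Base R row subsetting drops it to a vector by default, while slice() keeps it as a dataframe:

```r
library(dplyr)

# A dataframe with a single column
one_col <- iris["Sepal.Length"]

# Base R: the result is simplified to a numeric vector
base_rows <- one_col[10:20, ]

# Base R with drop = FALSE: the result stays a dataframe
base_rows_df <- one_col[10:20, , drop = FALSE]

# dplyr: the result is always a dataframe
slice_rows <- one_col %>%
  slice(10:20)

is.data.frame(base_rows)
is.data.frame(base_rows_df)
is.data.frame(slice_rows)
```

Passing drop = FALSE is the usual way to get consistent dataframe output from base R when the number of columns may vary.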
