sample() function in R for random sampling

When working with a dataset, we often want to select random samples from it. The sample() function in base R is one of the most useful tools for this; its help page describes it as “Random Samples and Permutations”. In this post we will learn how to use sample() in multiple ways, with examples.

sample() function to randomize ordered vector

We can use the sample() function to randomize an ordered vector. For example, if we have a vector with the elements 1 to 10, sample() returns those same elements in a random order.

# sample() to randomize order
sample(1:10)
##  [1] 10  9  5  2  7  3  8  6  4  1
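A quick way to convince ourselves that sample() only reorders the vector is to sort the result and compare it with the original. The sketch below is an illustration only; the variable names are ours.

```r
# Shuffle the integers 1 to 10
x <- 1:10
shuffled <- sample(x)

# A shuffle is a permutation: sorting it gives back the original vector
same_elements <- identical(sort(shuffled), x)
same_elements
```

The order of shuffled changes from run to run, but same_elements stays TRUE because no element is added or dropped.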

sample() function with random seed to reproduce results

Often we want to randomize a vector in a reproducible way. We can do this by setting the random seed with the set.seed() function before calling sample().

For example, here we set the random seed to 12 and then use sample() to randomize the order of the elements in the vector.

set.seed(12)
sample(1:10)
##  [1]  2  7  3  6  5  9  4 10  8  1

Setting the same seed again and rerunning sample() reproduces exactly the same randomized vector.

set.seed(12)
sample(1:10)
##  [1]  2  7  3  6  5  9  4 10  8  1
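Instead of comparing the printed output by eye, we can store both results and compare them with identical(). This is a minimal sketch; the names first_run and second_run are ours.

```r
# Same seed, same call: the two shuffles should match exactly
set.seed(12)
first_run <- sample(1:10)

set.seed(12)
second_run <- sample(1:10)

identical(first_run, second_run)
```

Note that R 3.6.0 changed the default algorithm behind sample(), so the same seed can give a different permutation on older R versions than the one shown above.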

sample() function to randomly select n rows in a dataframe

Next, let us use sample() to randomly select n rows from a dataframe.

First, we look at the original dataframe, penguins, from the palmerpenguins package.

# print the original penguins dataframe
palmerpenguins::penguins 
## # A tibble: 344 x 8
##    species island bill_length_mm bill_depth_mm flipper_length_… body_mass_g
##    <fct>   <fct>           <dbl>         <dbl>            <int>       <int>
##  1 Adelie  Torge…           39.1          18.7              181        3750
##  2 Adelie  Torge…           39.5          17.4              186        3800
##  3 Adelie  Torge…           40.3          18                195        3250
##  4 Adelie  Torge…           NA            NA                 NA          NA
##  5 Adelie  Torge…           36.7          19.3              193        3450
##  6 Adelie  Torge…           39.3          20.6              190        3650
##  7 Adelie  Torge…           38.9          17.8              181        3625
##  8 Adelie  Torge…           39.2          19.6              195        4675
##  9 Adelie  Torge…           34.1          18.1              193        3475
## 10 Adelie  Torge…           42            20.2              190        4250
## # … with 334 more rows, and 2 more variables: sex <fct>, year <int>

Let us use the sample() function to shuffle the row numbers and then keep the first n = 10 of them.

# select n random rows from penguin dataset
sample_ind <- sample(1:nrow(palmerpenguins::penguins))[1:10]

Now that we have random row indices, we can select those rows using subsetting.

palmerpenguins::penguins[sample_ind, ]

## # A tibble: 10 x 8
##    species island bill_length_mm bill_depth_mm flipper_length_… body_mass_g
##    <fct>   <fct>           <dbl>         <dbl>            <int>       <int>
##  1 Adelie  Torge…           42.9          17.6              196        4700
##  2 Adelie  Torge…           36.2          17.2              187        3150
##  3 Chinst… Dream            58            17.8              181        3700
##  4 Adelie  Biscoe           38.2          20                190        3900
##  5 Adelie  Biscoe           39.7          18.9              184        3550
##  6 Gentoo  Biscoe           49.8          16.8              230        5700
##  7 Gentoo  Biscoe           44.5          15.7              217        4875
##  8 Adelie  Dream            41.5          18.5              201        4000
##  9 Chinst… Dream            47.6          18.3              195        3850
## 10 Adelie  Torge…           39.7          18.4              190        3900
## # … with 2 more variables: sex <fct>, year <int>
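The same result can be reached in one step by asking sample() for n row numbers directly through its size argument. The sketch below uses the built-in mtcars dataframe as a stand-in so that it runs without installing palmerpenguins; the seed value and variable names are arbitrary choices of ours.

```r
# mtcars ships with base R; it is used here only as a stand-in dataframe
set.seed(1)  # arbitrary seed, only to make the example reproducible
n <- 10

# Draw n distinct row numbers from 1..nrow(mtcars) without replacement
row_ids <- sample(nrow(mtcars), size = n)

# Subset the dataframe with the sampled row numbers
random_rows <- mtcars[row_ids, ]
dim(random_rows)
```

Because sampling is without replacement by default, no row is selected twice.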

sample() function to get bootstrapped samples with replacement

In the previous examples, we used the sample() function to randomize the order of a vector or of rows. Often you might want to randomly sample vectors or rows with replacement instead. Sampling with replacement is the basis of bootstrapping, and the resulting samples are called bootstrap samples.

We can sample with replacement by passing the argument “replace=TRUE” to sample(). In the example below, we sample 10 numbers with replacement from the integers 1 to 5; passing a single number such as 5 as the first argument tells sample() to draw from 1:5. The 10 sampled numbers contain repeats because we sampled with replacement.

# sample n numbers with replacement 
sample(5,10,replace=TRUE)

##  [1] 3 2 4 3 4 4 4 2 5 2

Here is another example of sampling with replacement. This time we draw as many elements as the original vector contains, which is what sample() does when no size is given.

# sample a vector with replacement
sample(1:10,replace=TRUE)
##  [1]  3  3  1  7 10  9  6  2  2  3
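To show what bootstrapped samples are typically used for, here is a minimal sketch that estimates the standard error of a mean by resampling. The data values, the seed and the number of resamples are made up for illustration.

```r
set.seed(42)  # arbitrary seed for reproducibility

# Made-up measurements, used only for illustration
x <- c(3.1, 4.7, 5.2, 2.8, 6.0, 4.4, 5.9, 3.6)

# Number of bootstrap resamples (an arbitrary choice)
n_boot <- 2000

# Each resample draws length(x) values from x with replacement
boot_means <- replicate(n_boot, mean(sample(x, replace = TRUE)))

# The spread of the resampled means estimates the standard error of mean(x)
boot_se <- sd(boot_means)
boot_se
```

Each call to sample(x, replace = TRUE) produces one bootstrap sample of the same size as the original data.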

sample.int to generate random numbers

sample.int() is a related function in base R that samples from the integers 1 to n. Here we use it to generate large random integers with replacement.

sample.int(1e10, 12, replace = TRUE)
##  [1] 9860540436  264671267  543751261 8824381746 6534781564 5141180310
##  [7] 6874446014 6740694711 8162659195  498536657 6331688458  554735489
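sample.int() is also handy for avoiding a known surprise in sample(): when the first argument is a single number, sample() draws from 1 up to that number rather than from the one-element vector. The sketch below illustrates this under that assumption about the input; the variable names are ours.

```r
x <- 10  # a numeric vector of length one

set.seed(3)  # arbitrary seed

# sample(x) is treated as sample(1:10), so we get a permutation of 1..10
surprising <- sample(x)

# Sampling positions with sample.int() and then indexing always stays inside x
safe <- x[sample.int(length(x))]

length(surprising)
safe
```

Indexing by sampled positions is a common defensive pattern when the length of the vector is not known in advance.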

Coin toss with sample()

So far we have seen multiple examples of using the sample() function to create random integers or to select random rows from a dataframe. However, sample() is useful in other ways as well.

In the example below, we show how we can use the sample() function to simulate a coin toss experiment.

First, let us simulate a single toss of a fair coin, with “H” for heads and “T” for tails. In the run below, the toss came up tails.

# single fair coin toss
sample(c("H","T"), size=1)

## [1] "T"

Repeat coin toss n times with sample()

We can also simulate tossing a fair coin 10 times. Each toss can land on either side regardless of the earlier tosses, so we sample with replacement.

# toss a fair coin 10 times (sample with replacement)
sample(c("H","T"), 10, replace = TRUE)

##  [1] "H" "T" "H" "H" "H" "H" "H" "H" "H" "H"

Here is another example of tossing a fair coin 10 times.

sample(c("H","T"),10, replace = TRUE)

##  [1] "H" "H" "T" "T" "H" "H" "T" "H" "T" "H"
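To summarize a run of tosses, we can tally the outcomes with table(). This is a small sketch with an arbitrary seed; wrapping the result in factor() with both levels is our choice so that a side that never comes up is still reported with a count of 0.

```r
set.seed(7)  # arbitrary seed

# Ten tosses of a fair coin
tosses <- sample(c("H", "T"), size = 10, replace = TRUE)

# Count heads and tails; fixed levels keep both sides in the table
counts <- table(factor(tosses, levels = c("H", "T")))
counts
```

Note that replace = TRUE is required here: asking for 10 draws from a two-element vector without replacement stops with an error.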

Coin toss with biased coin

The sample() function also has the argument “prob”, which lets you assign a sampling probability to each element. Here we use prob to specify the head/tail probabilities and create a biased coin.

In this example, we simulate a single toss of a biased coin whose probability of showing a head is 0.8.

sample(c("H","T"), size=1, prob=c(0.8,0.2))

## [1] "H"

Now we toss the same biased coin 10 times, and we can see that the results are predominantly heads.

sample(c("H","T"),10, prob=c(0.8 ,0.2), replace = TRUE)
##  [1] "H" "H" "T" "H" "H" "T" "H" "H" "H" "H"

By swapping the probability values, we create a coin that is biased the other way, towards tails with probability 0.8.

sample(c("H","T"),10, prob=c(0.2,0.8), replace = TRUE)
##  [1] "H" "T" "T" "T" "T" "T" "T" "T" "T" "T"

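To check the bias, we can toss the same coin 1000 times and count the heads; with a head probability of 0.2 we expect roughly 200.

# count heads in 1000 tosses of the tail-biased coin
sum(sample(c("H","T"),
           1000, prob=c(0.2,0.8), replace = TRUE) =="H")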
## [1] 212
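The count above can also be expressed as a proportion, which makes it easier to compare with the probability passed to prob. The sketch below uses an arbitrary seed and a larger, arbitrarily chosen number of tosses.

```r
set.seed(2024)  # arbitrary seed

# Number of tosses (an arbitrary choice; larger values give steadier estimates)
n_tosses <- 10000

# Toss the tail-biased coin: P(H) = 0.2, P(T) = 0.8
tosses <- sample(c("H", "T"), size = n_tosses,
                 replace = TRUE, prob = c(0.2, 0.8))

# The mean of a logical vector is the fraction of TRUE values
prop_heads <- mean(tosses == "H")
prop_heads
```

With this many tosses, prop_heads should land close to 0.2, although the exact value depends on the seed.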