When working with a dataset, we often want to select random samples. The sample() function in base R, documented as “Random Samples and Permutations”, is one of the most useful functions for this in many settings. In this post we will learn how to use sample() in multiple ways, with examples.
sample() function to randomize an ordered vector
We can use the sample() function to shuffle an ordered vector. For example, given a vector with the elements 1 to 10, calling sample() on it returns the same elements in a random order.
# sample() to randomize order
sample(1:10)
## [1] 10  9  5  2  7  3  8  6  4  1
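To convince ourselves that the shuffled vector is a permutation (the same elements in a new order), we can sort it and compare it with the original. This short sketch also shuffles a character vector; the object names are only for illustration.

```r
# Shuffle 1:10 and check that the result is a permutation of the original
x <- 1:10
shuffled <- sample(x)
print(shuffled)
# Sorting the shuffled vector recovers the original elements
stopifnot(identical(sort(shuffled), x))

# sample() shuffles vectors of any type, for example characters
shuffled_letters <- sample(letters[1:5])
print(shuffled_letters)
```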
sample() function with random seed to reproduce results
Often we want to randomize a vector in a reproducible way. We can do this by setting the seed of R's random number generator with set.seed() before calling sample().
For example, here we set the seed to 12 and then use sample() to randomize the order of the elements in the vector.
set.seed(12)
sample(1:10)
## [1]  2  7  3  6  5  9  4 10  8  1
Setting the same seed again reproduces exactly the same randomized vector.
set.seed(12)
sample(1:10)
## [1]  2  7  3  6  5  9  4 10  8  1
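A quick way to confirm reproducibility in code is to draw twice with the same seed and compare the two results with identical(). This is a minimal sketch; keep in mind that the exact numbers also depend on R's random number settings. Since R 3.6.0 the default sample.kind is "Rejection", so older R versions can return a different shuffle for set.seed(12).

```r
# Draw a shuffle, reset the same seed, and draw again
set.seed(12)
first <- sample(1:10)
set.seed(12)
second <- sample(1:10)

# Both draws start from the same RNG state, so they match exactly
identical(first, second)   # TRUE

# Calling sample() again without resetting the seed continues the random stream,
# so this shuffle generally differs from the two above
third <- sample(1:10)
```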
sample() function to randomly select n rows in a dataframe
Let us see an example of using sample() to randomly select n rows from a dataframe.
First, we look at the original penguins dataframe from the palmerpenguins package.
# sample to randomly select n rows in a dataframe
palmerpenguins::penguins
## # A tibble: 344 x 8
##    species island bill_length_mm bill_depth_mm flipper_length_… body_mass_g
##    <fct>   <fct>           <dbl>         <dbl>            <int>       <int>
##  1 Adelie  Torge…           39.1          18.7              181        3750
##  2 Adelie  Torge…           39.5          17.4              186        3800
##  3 Adelie  Torge…           40.3          18                195        3250
##  4 Adelie  Torge…           NA            NA                 NA          NA
##  5 Adelie  Torge…           36.7          19.3              193        3450
##  6 Adelie  Torge…           39.3          20.6              190        3650
##  7 Adelie  Torge…           38.9          17.8              181        3625
##  8 Adelie  Torge…           39.2          19.6              195        4675
##  9 Adelie  Torge…           34.1          18.1              193        3475
## 10 Adelie  Torge…           42            20.2              190        4250
## # … with 334 more rows, and 2 more variables: sex <fct>, year <int>
We use sample() to shuffle the row numbers and then keep the first n = 10 of them.
# select n random rows from penguin dataset
sample_ind <- sample(1:nrow(palmerpenguins::penguins))[1:10]
Now that we have random row indices, we can select the corresponding rows by subsetting.
palmerpenguins::penguins[sample_ind, ]
## # A tibble: 10 x 8
##    species island bill_length_mm bill_depth_mm flipper_length_… body_mass_g
##    <fct>   <fct>           <dbl>         <dbl>            <int>       <int>
##  1 Adelie  Torge…           42.9          17.6              196        4700
##  2 Adelie  Torge…           36.2          17.2              187        3150
##  3 Chinst… Dream            58            17.8              181        3700
##  4 Adelie  Biscoe           38.2          20                190        3900
##  5 Adelie  Biscoe           39.7          18.9              184        3550
##  6 Gentoo  Biscoe           49.8          16.8              230        5700
##  7 Gentoo  Biscoe           44.5          15.7              217        4875
##  8 Adelie  Dream            41.5          18.5              201        4000
##  9 Chinst… Dream            47.6          18.3              195        3850
## 10 Adelie  Torge…           39.7          18.4              190        3900
## # … with 2 more variables: sex <fct>, year <int>
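Shuffling all row numbers and keeping the first 10 works, but sample() can also draw the 10 row numbers directly through its size argument. The sketch below assumes the built-in mtcars data frame in place of penguins so that it runs without the palmerpenguins package; the same indexing works on the penguins tibble.

```r
# Select n random rows from a data frame by sampling row numbers directly
set.seed(42)              # arbitrary seed, only for reproducibility
n <- 10
row_idx <- sample(nrow(mtcars), size = n)   # n distinct row numbers from 1..nrow(mtcars)
mtcars_sample <- mtcars[row_idx, ]
head(mtcars_sample)
```

Because replace = FALSE is the default, no row is selected twice.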
sample() function to get bootstrapped samples with replacement
In the previous examples, we used the sample() function to shuffle a vector or the rows of a dataframe, so each element was selected at most once. Often you might want to sample elements or rows with replacement instead, so that the same element can be picked more than once. Resampling the data with replacement is the core idea behind bootstrapping, and the resulting samples are called bootstrap (or bootstrapped) samples.
We can sample with replacement by passing replace = TRUE to sample(). In the example below, we sample 10 numbers with replacement from 1 to 5 (when x is a single number n, sample() draws from 1:n). The 10 sampled numbers contain repeats because every draw is made from the full set.
# sample n numbers with replacement
sample(5, 10, replace = TRUE)
## [1] 3 2 4 3 4 4 4 2 5 2
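Setting replace = TRUE is what makes this call possible: by default sample() draws without replacement, so asking for more elements than the population contains raises an error. A small sketch that catches that error instead of stopping:

```r
# Without replacement (the default), sample() cannot draw more elements
# than the population contains; catch the error and keep its message
msg <- tryCatch(
  sample(5, 10),
  error = function(e) conditionMessage(e)
)
print(msg)

# With replace = TRUE the same request succeeds
draws <- sample(5, 10, replace = TRUE)
```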
Here is another example of sampling with replacement. This time we draw as many elements as the original vector contains, which is exactly how a bootstrap sample is formed.
# sample a vector with replacement
sample(1:10, replace = TRUE)
## [1]  3  3  1  7 10  9  6  2  2  3
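Bootstrap samples like this are typically used to estimate the variability of a statistic. As a minimal sketch, assuming a small made-up numeric vector x, we can resample it 1000 times with replace = TRUE and use the spread of the resampled means as a bootstrap standard error.

```r
set.seed(1)                                   # arbitrary seed for reproducibility
x <- c(2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 4.9, 3.7, 2.5)   # made-up example data

n_boot <- 1000
# Each replicate: resample x with replacement (same size as x) and take its mean
boot_means <- replicate(n_boot, mean(sample(x, replace = TRUE)))

# Spread of the bootstrap means estimates the standard error of mean(x)
boot_se <- sd(boot_means)
# Percentile interval from the bootstrap distribution
boot_ci <- quantile(boot_means, c(0.025, 0.975))
boot_se
boot_ci
```

If x could ever have length 1, sample(x, replace = TRUE) would draw from 1:x instead of from x; indexing with x[sample(length(x), replace = TRUE)] avoids that edge case.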
sample.int() to generate random integers
sample.int() is a related base R function that samples integers from 1:n directly; when sample() is given a single number n, it hands the work over to sample.int(). Here we use sample.int() to generate large random integers with replacement.
sample.int(1e10, 12, replace = TRUE)
## [1] 9860540436  264671267  543751261 8824381746 6534781564 5141180310
## [7] 6874446014 6740694711 8162659195  498536657 6331688458  554735489
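Because sample() passes a single number n on to sample.int(), the two functions give the same shuffle of 1:n when called with the same seed. The sketch below checks this and also confirms that the large draws are whole numbers between 1 and 1e10; the seed value is arbitrary.

```r
# sample(n) with a single number n is handled by sample.int(n),
# so the same seed gives the same shuffle of 1:n
set.seed(2024)
a <- sample(10)
set.seed(2024)
b <- sample.int(10)
identical(a, b)   # TRUE

# Large draws with replacement are whole numbers between 1 and n
big <- sample.int(1e10, 12, replace = TRUE)
range(big)
```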
Coin toss with sample()
So far we have seen multiple examples of using the sample() function to shuffle vectors, generate random integers, and select random rows from a dataframe. However, sample() is useful in other ways as well.
In the examples below, we show how to use sample() to simulate a coin toss experiment.
First, let us simulate a single coin toss, with “H” for heads and “T” for tails. Here a single toss of a fair coin came up tails.
single coin toss with sample()
# single fair coin toss
sample(c("H", "T"), size = 1)
## [1] "T"
repeat coin toss n times with sample()
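We can also simulate tossing a fair coin 10 times and see multiple heads and tails. Since there are only two outcomes and we want 10 tosses, we must sample with replacement (replace = TRUE).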
# sample with replacement
sample(c("H", "T"), 10, replace = TRUE)
## [1] "H" "T" "H" "H" "H" "H" "H" "H" "H" "H"
Here is another example of tossing a fair coin 10 times. Because no seed is set, the sequence differs from the previous run.
sample(c("H", "T"), 10, replace = TRUE)
## [1] "H" "H" "T" "T" "H" "H" "T" "H" "T" "H"
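With only 10 tosses, the split between heads and tails varies a lot from run to run. A sketch with many more tosses (10,000 here, an arbitrary choice) shows the proportion of heads settling near 0.5:

```r
set.seed(10)                                   # arbitrary seed
tosses <- sample(c("H", "T"), 10000, replace = TRUE)

# Counts of heads and tails
table(tosses)
# Proportion of heads, which should be close to 0.5 for a fair coin
p_heads <- mean(tosses == "H")
p_heads
```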
coin toss with a biased coin
The sample() function also has a prob argument that assigns a probability (weight) to each element being sampled. Here we use prob to set the head/tail probabilities and create a biased coin.
In this example, we simulate a single toss of a biased coin whose probability of heads is 0.8.
sample(c("H", "T"), size = 1, prob = c(0.8, 0.2))
## [1] "H"
Now we toss the same biased coin 10 times. As expected, the results are predominantly heads.
sample(c("H", "T"), 10, prob = c(0.8, 0.2), replace = TRUE)
## [1] "H" "H" "T" "H" "H" "T" "H" "H" "H" "H"
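The prob values are weights rather than strict probabilities: according to the sample() documentation they need not sum to one, only be non-negative and not all zero. A small sketch with 10,000 tosses (an arbitrary number) using the weights 4 and 1:

```r
# prob holds weights: R rescales them to sum to 1, so c(4, 1)
# describes the same head-biased coin as c(0.8, 0.2)
set.seed(7)
weighted <- sample(c("H", "T"), 10000, replace = TRUE, prob = c(4, 1))
mean(weighted == "H")   # close to 0.8
```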
By swapping the probability values, we create a coin that is biased towards tails instead, with a probability of 0.8 for tails.
sample(c("H", "T"), 10, prob = c(0.2, 0.8), replace = TRUE)
## [1] "H" "T" "T" "T" "T" "T" "T" "T" "T" "T"
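Tossing this coin 1000 times and counting the heads gives a count close to the expected 1000 × 0.2 = 200.
# count heads in 1000 tosses of the tail-biased coin
sum(sample(c("H", "T"), 1000, prob = c(0.2, 0.8), replace = TRUE) == "H")
## [1] 212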
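The counting step above can be wrapped in a small function so that the number of tosses and the bias are easy to vary. This is a sketch only: toss_coin() is a hypothetical helper written for illustration, not part of base R.

```r
# Hypothetical helper, not part of base R: toss a coin n times and
# return the number of heads, where p_heads is the probability of heads
toss_coin <- function(n, p_heads = 0.5) {
  tosses <- sample(c("H", "T"), n, replace = TRUE,
                   prob = c(p_heads, 1 - p_heads))
  sum(tosses == "H")
}

set.seed(3)                          # arbitrary seed
heads <- toss_coin(1000, p_heads = 0.2)
heads                                # should land near 1000 * 0.2 = 200
```

Since only the number of heads matters here, rbinom(1, 1000, 0.2) would draw the same kind of count directly from the binomial distribution; the sample() version keeps the individual tosses, which is handy if you also want to look at the sequence.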