In this tutorial, we will see how to replace randomly chosen values in a matrix with NA, R's marker for missing values.
We first simulate a data matrix: 100 draws from a normal distribution with mean 5 and standard deviation 4, rounded to one decimal place and arranged into 20 rows and 5 columns.
data_mat <- matrix(round(rnorm(100, mean=5, sd=4), 1), ncol=5)
dim(data_mat)
## [1] 20  5
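Because rnorm() draws random numbers, the values change on every run. A minimal sketch, assuming you want repeatable output, is to fix the seed before simulating. The seed value 42 and the names sim_a and sim_b are arbitrary choices for illustration, not part of the original code.

```r
# Fix the random number generator state so the simulation is repeatable.
# The seed value (42) is arbitrary.
set.seed(42)
sim_a <- matrix(round(rnorm(100, mean = 5, sd = 4), 1), ncol = 5)

# Resetting the same seed reproduces exactly the same matrix.
set.seed(42)
sim_b <- matrix(round(rnorm(100, mean = 5, sd = 4), 1), ncol = 5)

identical(sim_a, sim_b)  # TRUE when both runs start from the same seed
dim(sim_a)
```

The same idea applies to the sample() calls used later: setting a seed beforehand keeps the NA positions the same between runs.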
The data matrix is complete without any missing values.
data_mat
##       [,1] [,2] [,3] [,4] [,5]
##  [1,]  1.2  3.8  0.4  2.1  2.3
##  [2,]  0.0  1.6  5.1  1.9  6.9
##  [3,]  6.5 -0.4  8.8  6.2 12.0
##  [4,]  3.0  5.0  5.1  1.6  2.8
##  [5,]  4.8  8.4  7.6  8.9  3.6
##  [6,]  0.1  4.6  4.4  7.0  7.4
##  [7,]  5.6  5.7  4.3  0.8  1.2
##  [8,]  3.6  9.6  2.0  0.5  3.2
##  [9,]  6.0  5.9 -0.1  6.5  4.3
## [10,]  2.7  5.3  4.6  3.5  2.0
## [11,]  7.0  7.0  1.5  1.9  2.0
## [12,]  2.7  0.7  4.1  3.7  1.6
## [13,]  4.7 -5.5  9.4 -3.5  9.5
## [14,]  6.6  4.9 -2.0 -0.7 -0.3
## [15,]  8.3  8.8  5.0  1.0  7.9
## [16,]  6.2  8.0  2.6  2.1  5.7
## [17,]  1.8  6.6 16.6  4.6  3.3
## [18,]  9.4  8.8 11.6 12.3  9.1
## [19,]  2.2  5.6 -1.3  9.9  3.4
## [20,]  4.6  4.2  1.7  3.2  4.3
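To introduce NAs at random, we first select the rows that will receive a missing value, using the sample() function. Because sample() draws without replacement by default, the 15 selected rows are all distinct, so each of them gets exactly one NA.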
n_NAs <- 15
na_ind_rows <- sample(1:nrow(data_mat), n_NAs)
na_ind_rows
## [1] 16 13  7 14  6  2 17 10  3 15 11  9 18  5 19
Then we randomly select a column for each NA. Here we sample with replacement (replace=TRUE), because there are only 5 columns but 15 NAs to place.
na_ind_cols <- sample(1:ncol(data_mat), n_NAs, replace=TRUE)
na_ind_cols
## [1] 1 2 1 5 5 1 1 5 5 3 3 5 4 3 4
By combining the row indices and the column indices, we get the exact locations of the values to replace with NA. For example, the first row of na_inds tells us that the value in row 16, column 1 should become NA, and so on.
na_inds <- cbind(na_ind_rows, na_ind_cols)
na_inds
##       na_ind_rows na_ind_cols
##  [1,]          16           1
##  [2,]          13           2
##  [3,]           7           1
##  [4,]          14           5
##  [5,]           6           5
##  [6,]           2           1
##  [7,]          17           1
##  [8,]          10           5
##  [9,]           3           5
## [10,]          15           3
## [11,]          11           3
## [12,]           9           5
## [13,]          18           4
## [14,]           5           3
## [15,]          19           4
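Now we can use this index matrix to replace those values with NA. When a matrix is indexed by a two-column matrix, R treats each row of the index as a (row, column) pair.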
data_mat[na_inds] <- NA
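To see how indexing with a two-column matrix behaves, here is a small sketch on a toy 3 x 3 matrix. The names toy and idx are made up for illustration and do not appear in the tutorial code.

```r
# A 3 x 3 matrix filled column by column with 1..9.
toy <- matrix(1:9, nrow = 3)

# Each row of idx is one (row, column) pair: cells (1,2) and (3,3).
idx <- cbind(c(1, 3), c(2, 3))

toy[idx]         # extracts the values in those two cells: 4 and 9

# Assigning through the same index changes only those two cells.
toy[idx] <- NA
which(is.na(toy), arr.ind = TRUE)  # lists the (row, col) positions of the NAs
```

Note that toy[idx] with a two-column matrix is different from toy[c(1, 3), c(2, 3)], which would select the full 2 x 2 block formed by rows 1 and 3 and columns 2 and 3.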
We can check the new data matrix, which now contains 15 randomly placed NAs.
data_mat
##       [,1] [,2] [,3] [,4] [,5]
##  [1,]  1.2  3.8  0.4  2.1  2.3
##  [2,]   NA  1.6  5.1  1.9  6.9
##  [3,]  6.5 -0.4  8.8  6.2   NA
##  [4,]  3.0  5.0  5.1  1.6  2.8
##  [5,]  4.8  8.4   NA  8.9  3.6
##  [6,]  0.1  4.6  4.4  7.0   NA
##  [7,]   NA  5.7  4.3  0.8  1.2
##  [8,]  3.6  9.6  2.0  0.5  3.2
##  [9,]  6.0  5.9 -0.1  6.5   NA
## [10,]  2.7  5.3  4.6  3.5   NA
## [11,]  7.0  7.0   NA  1.9  2.0
## [12,]  2.7  0.7  4.1  3.7  1.6
## [13,]  4.7   NA  9.4 -3.5  9.5
## [14,]  6.6  4.9 -2.0 -0.7   NA
## [15,]  8.3  8.8   NA  1.0  7.9
## [16,]   NA  8.0  2.6  2.1  5.7
## [17,]   NA  6.6 16.6  4.6  3.3
## [18,]  9.4  8.8 11.6   NA  9.1
## [19,]  2.2  5.6 -1.3   NA  3.4
## [20,]  4.6  4.2  1.7  3.2  4.3
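The approach above draws rows without replacement, so a row can hold at most one NA and the number of NAs cannot exceed the number of rows. If that restriction is not wanted, one possible alternative (a sketch, not the method used in this tutorial) is to sample cell positions directly, treating the matrix as one long vector. The names m, n_missing and cell_ids, and the seed value, are assumptions made for this example.

```r
set.seed(1)  # arbitrary seed, only so the sketch is repeatable
m <- matrix(round(rnorm(100, mean = 5, sd = 4), 1), ncol = 5)
n_missing <- 15

# Draw 15 distinct cell positions out of all 100 cells (linear indices).
cell_ids <- sample(length(m), n_missing)
m[cell_ids] <- NA

# Check the result.
sum(is.na(m))                     # total number of NAs: 15
rowSums(is.na(m))                 # NAs per row; a row may now have several
which(is.na(m), arr.ind = TRUE)   # (row, col) position of every NA
```

The same three checks, sum(is.na(...)), rowSums(is.na(...)) and which(is.na(...), arr.ind = TRUE), can also be run on data_mat to confirm that the NAs ended up where na_inds says they should.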