In this post, we will learn how to perform a t-test in R and understand when and why to use it. A t-test is one of the most commonly used statistical tests when one is interested in comparing two groups of measurements and determining whether there is a significant difference between the mean values of the two groups.
We will start with an example using simulated data, where there is a clear difference in the mean values between two groups. Then we will use simulated data of two groups where there is no difference in mean.
Let us load the packages needed.
library(tidyverse)
theme_set(theme_bw(16))
Applying t.test() in R: Example 1
Here we simulate two variables x and y from random normal distributions, corresponding to two groups of interest.
x <- rnorm(n = 15, mean = 10, sd = 1)
y <- rnorm(n = 15, mean = 15, sd = 1)
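The post does not show a random seed, so rerunning the simulation will give slightly different numbers from the ones printed here. A minimal sketch of how to make the draws reproducible, assuming an arbitrary seed of 42 and the illustrative names `x_seeded` and `x_again` (neither is from the original post):

```r
# Fix the random seed so the simulated values can be reproduced.
# The seed value 42 is an arbitrary, hypothetical choice.
set.seed(42)
x_seeded <- rnorm(n = 15, mean = 10, sd = 1)

# Resetting the same seed and drawing again gives identical values.
set.seed(42)
x_again <- rnorm(n = 15, mean = 10, sd = 1)

identical(x_seeded, x_again)
```

Calling `set.seed()` once at the top of a script is usually enough to make every later random draw repeatable.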
The variable x has a mean of about 10.
mean(x)
[1] 10.15238
The variable y has a mean of about 15.
mean(y)
[1] 14.75341
One of the simplest ways to use the t.test() function is to provide the values of the two groups as arguments. Here we pass the x and y vectors to the t.test() function available in base R to determine if the means of these two groups are different.
t_test_res <- t.test(x, y)
Printing the resulting object shows a quick summary of the t-test results.
t_test_res

	Welch Two Sample t-test

data:  x and y
t = -12.9, df = 26.34, p-value = 6.816e-13
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -5.333730 -3.868317
sample estimates:
mean of x mean of y 
 10.15238  14.75341 
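The header "Welch Two Sample t-test" appears because t.test() does not assume equal variances by default. As a rough sketch of what that means, the statistic and degrees of freedom can be reproduced by hand with the Welch formulas. The data here are freshly simulated under an arbitrary seed, and the names `g1`, `g2`, `t_manual` and `df_manual` are illustrative, so the values will not match the output above.

```r
# Freshly simulated groups (hypothetical seed; not the post's data).
set.seed(1)
g1 <- rnorm(n = 15, mean = 10, sd = 1)
g2 <- rnorm(n = 15, mean = 15, sd = 1)

# Per-group variance of the sample mean.
v1 <- var(g1) / length(g1)
v2 <- var(g2) / length(g2)

# Welch t statistic: difference in means over the combined standard error.
t_manual <- (mean(g1) - mean(g2)) / sqrt(v1 + v2)

# Welch-Satterthwaite approximation for the degrees of freedom.
df_manual <- (v1 + v2)^2 /
  (v1^2 / (length(g1) - 1) + v2^2 / (length(g2) - 1))

res_welch <- t.test(g1, g2)
all.equal(t_manual, unname(res_welch$statistic))
all.equal(df_manual, unname(res_welch$parameter))

# Setting var.equal = TRUE requests the classic Student's t-test instead,
# which pools the variances and uses n1 + n2 - 2 degrees of freedom.
res_student <- t.test(g1, g2, var.equal = TRUE)
unname(res_student$parameter)
```

The Welch version is generally the safer default when there is no strong reason to believe the two groups share the same variance.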
We can access individual results using the $ notation. For example, we can get the p-value from the t-test:
t_test_res$p.value
[1] 6.816459e-13
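Other pieces of the result can be pulled out the same way. A brief sketch, again on freshly simulated data with an arbitrary seed (the names `g1`, `g2` and `res` are illustrative, not from the post):

```r
# Freshly simulated groups (hypothetical seed; not the post's data).
set.seed(1)
g1 <- rnorm(n = 15, mean = 10, sd = 1)
g2 <- rnorm(n = 15, mean = 15, sd = 1)
res <- t.test(g1, g2)

# List every component stored in the result object.
names(res)

res$statistic   # t statistic
res$parameter   # degrees of freedom
res$estimate    # the two group means
res$conf.int    # confidence interval for the difference in means

# The confidence level is stored as an attribute of the interval.
attr(res$conf.int, "conf.level")
```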
The low p-value shows that the mean difference is statistically significant. We can use the broom package's tidy() function to get all the results in a data frame, as shown below.
t_test_res |>
  broom::tidy()
# A tibble: 1 × 10
  estimate estimate1 estimate2 statistic  p.value parameter conf.low conf.high
     <dbl>     <dbl>     <dbl>     <dbl>    <dbl>     <dbl>    <dbl>     <dbl>
1    -4.60      10.2      14.8     -12.9 6.82e-13      26.3    -5.33     -3.87
# ℹ 2 more variables: method <chr>, alternative <chr>
An important thing to remember while applying any statistical test is to actually visualize the data and check whether the results from the test match what the data show. Here we use a boxplot to visualize the distributions of the two groups. We can clearly see that the two groups are distinct, with different mean/median values.
tibble(group = c(rep("g1", 15), rep("g2", 15)),
       data = c(x, y)) |>
  ggplot(aes(x = group, y = data, fill = group)) +
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(width = 0.1) +
  theme(legend.position = "none")
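Once the data are in this long format, with one column of values and one column of group labels, t.test() can also be called through its formula interface. A small sketch, assuming a base R data frame named `sim_df` built from freshly simulated values (the name and the seed are illustrative, not from the post):

```r
# Long-format data: one row per observation (hypothetical seed and names).
set.seed(1)
sim_df <- data.frame(
  group = rep(c("g1", "g2"), each = 15),
  data  = c(rnorm(n = 15, mean = 10, sd = 1),
            rnorm(n = 15, mean = 15, sd = 1))
)

# Formula interface: response on the left, grouping variable on the right.
res_formula <- t.test(data ~ group, data = sim_df)

# Equivalent call with two separate vectors.
res_vectors <- t.test(sim_df$data[sim_df$group == "g1"],
                      sim_df$data[sim_df$group == "g2"])

all.equal(res_formula$p.value, res_vectors$p.value)
```

The formula form is convenient when the data already live in a data frame, since there is no need to split the values into separate vectors first.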
Applying t.test(): Example 2
In the previous example, we simulated two groups with different means, and the t-test correctly determined that the difference in the mean values of the groups is statistically significant.
Now let us simulate data where there is no difference in the mean values of the two groups and check the results of the t-test. The p-value from the t-test is about 0.78, far above the usual 0.05 threshold, showing that the mean difference is not statistically significant.
x <- rnorm(n = 15, mean = 10, sd = 1)
y <- rnorm(n = 15, mean = 10, sd = 2)
t_test_res <- t.test(x, y)
t_test_res |>
  broom::tidy()
# A tibble: 1 × 10
  estimate estimate1 estimate2 statistic p.value parameter conf.low conf.high
     <dbl>     <dbl>     <dbl>     <dbl>   <dbl>     <dbl>    <dbl>     <dbl>
1   -0.131      10.2      10.3    -0.287   0.776      23.5    -1.07     0.812
# ℹ 2 more variables: method <chr>, alternative <chr>
We can also verify this by visualizing the data as a boxplot and seeing that the two distributions clearly overlap, with no noticeable difference in mean values.
tibble(group = c(rep("g1", 15), rep("g2", 15)),
       data = c(x, y)) |>
  ggplot(aes(x = group, y = data, fill = group)) +
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(width = 0.1) +
  theme(legend.position = "none")