t-test in R

The t-test is probably the best-known statistical test.
Basically, the t-test can be used to test a) whether the average of a given sample differs from 0 or b) whether the averages of two (independent) samples differ.
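Case a) can be sketched as follows. The values in x are hypothetical and chosen only for illustration; mu = 0 is the value the sample mean is compared against.

```r
# Hypothetical sample values (not from any real data set)
x <- c(0.8, 1.3, -0.2, 1.9, 0.7, 1.1, 0.4, 1.6)

# One-sample t-test: H0 is that the true mean of x equals 0
res_one <- t.test(x, mu = 0)

res_one$estimate   # sample mean of x
res_one$p.value    # p-value of the two-sided test
```

Changing mu tests the sample mean against any other reference value.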

The individual values in each sample should follow a normal distribution, and the samples should be independent. To test normality in R you may use the Shapiro-Wilk test (shapiro.test).
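A minimal sketch of such a normality check, using hypothetical values. Note that with small samples the test has little power, so a large p-value does not prove normality.

```r
# Hypothetical sample values (for illustration only)
x <- c(4.9, 5.6, 6.1, 5.2, 4.4, 5.8, 5.0, 6.3, 5.5, 4.7)

# Shapiro-Wilk test: H0 is that x comes from a normal distribution
sw <- shapiro.test(x)

sw$statistic   # the W statistic
sw$p.value     # a small p-value (e.g. < 0.05) suggests a deviation from normality
```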

Before launching the test it is essential to define the hypothesis to be tested and the null hypothesis H0 (its opposite). Averages may be tested "two-sided" for (in)equality (the hypothesis does not specify whether average_1 is larger or smaller than average_2) or one-sided (where larger or smaller has to be chosen). The original t-test assumes equal variance in both samples; if you think this is not the case, the Welch correction uses a separate estimate of the standard deviation for each sample. In fact, the default implementation in R already applies the Welch correction.
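These choices map onto the alternative and var.equal arguments of t.test. The sketch below uses two hypothetical samples a and b, invented for illustration.

```r
# Hypothetical samples (for illustration only)
a <- c(5.1, 4.8, 6.0, 5.5, 5.9, 4.7)
b <- c(6.2, 6.8, 5.9, 7.1, 6.5, 7.4)

# Two-sided test (default): H0 is mean(a) equal to mean(b); Welch correction is applied
two_sided <- t.test(a, b)

# One-sided tests: the alternative hypothesis names the direction
a_smaller <- t.test(a, b, alternative = "less")     # alternative: mean(a) < mean(b)
a_larger  <- t.test(a, b, alternative = "greater")  # alternative: mean(a) > mean(b)

# Classical t-test assuming equal variances (no Welch correction)
student <- t.test(a, b, var.equal = TRUE)

two_sided$method   # reports which variant was run
student$method
```

With var.equal = TRUE the degrees of freedom are simply length(a) + length(b) - 2, whereas the Welch variant usually reports a non-integer value.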

Run the test in R as follows:

samp1 <- c(2:10, 4:6)
samp2 <- c(6:11, 9, 10, 14)
# test the null hypothesis (H0) that the averages of samp1 and samp2 are equal
t.test(samp1, samp2)

This returns the t-value, the degrees of freedom, the p-value, the 95% confidence interval and the (estimated) sample means. If you only want the p-value, type:

t.test(samp1, samp2)$p.value
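The other reported quantities can be extracted the same way, since t.test returns a list-like object. A short sketch that stores the result first (the name res is just a choice made here):

```r
samp1 <- c(2:10, 4:6)
samp2 <- c(6:11, 9, 10, 14)

# Store the result once instead of re-running the test for each value
res <- t.test(samp1, samp2)

res$statistic   # t-value
res$parameter   # degrees of freedom (Welch-corrected)
res$conf.int    # 95% confidence interval for the difference in means
res$estimate    # sample means of samp1 and samp2
```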

In this particular example the p-value (the probability of observing a difference at least this large if both averages were in fact equal) is quite small, so one may consider the averages of the two samples significantly different (i.e. below the classical alpha = 5% threshold), since the following returns TRUE:

t.test(samp1, samp2)$p.value < 0.05

Special cases and assumptions:

As mentioned before, the t-test assumes independence of the observations being tested. Note that in many bioinformatics settings such independence is not fully guaranteed (e.g. genes may be co-regulated).

When running many t-tests, a correction for multiple testing should be applied. This is the case, for example, when testing the many genes present on a single microarray.
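One way to apply such a correction in R is p.adjust. The text above does not say which method to use; Bonferroni and Benjamini-Hochberg are shown here only as two common choices, and the raw p-values are hypothetical.

```r
# Hypothetical raw p-values, e.g. one t-test per gene
pvals <- c(0.0004, 0.012, 0.03, 0.04, 0.2, 0.6)

# Bonferroni: controls the family-wise error rate (conservative)
p_bonf <- p.adjust(pvals, method = "bonferroni")

# Benjamini-Hochberg: controls the false discovery rate (less conservative)
p_bh <- p.adjust(pvals, method = "BH")

# Which tests remain significant at the 5% level after each correction?
p_bonf < 0.05
p_bh < 0.05
```

In practice pvals would be collected from the individual tests, for example by taking $p.value from each t.test result.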