Apan’s Notes: 2013-12

Here I introduce a R function made by myself to calculate several confidence intervals of mean/median including exact CI, basic bootstrap CI, percentile bootstrap CI & studentized bootstrap CI.

R code

Arguments

There are five arguments in this R function bootCI():

x — a numeric vector specifying $x_i$
alpha = 0.05 — a numeric specifying $\alpha$ value
alternative = c("t", "l", "g") — a single character string specifying two-sided, less or greater tail
B = 1999 — a integer specifying number of bootstrap samples
quantileAlgorithm = 7 — a integer specifying the argument type passed to quantile()

Examples

We can define a numeric vector $x = [5\, 2\, 3\, 6\, 8\, 19\, 1.5]$ including $N=7$ items as num:

> num <- c(5, 2, 3, 6, 8, 19, 1.5)

After loading the above function, we can call the function to calculate CIs:

> bootCI(num)
Summary of x
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1.500   2.500   5.000   6.357   7.000  19.000 
CIs of mu
                     2.5%     97.5%
$CI.exact       0.7778728 11.936413
$CI.basic       1.7125000  9.714286
$CI.percentile  3.0000000 11.001786
$CI.studentized 2.6785623 17.669412

The above results show that

\begin{aligned} \text{exact CI} & = ( \hat{\mu} + t_{\frac{\alpha}{2}} \cdot \hat{se}_{\hat{\mu}} \;,\; \hat{\mu} + t_{1-\frac{\alpha}{2}} \cdot \hat{se}_{\hat{\mu}} ) \\ \text{basic CI} & = ( 2\hat{\mu} - \hat{\mu}_{1-\frac{\alpha}{2}}^{*} \;,\; 2\hat{\mu} - \hat{\mu}_{\frac{\alpha}{2}}^{*} ) \\ \text{percentile CI} & = ( \hat{\mu}_{\frac{\alpha}{2}}^* \;,\; \hat{\mu}_{1-\frac{\alpha}{2}}^* ) \\ \text{studentized CI} & = ( \hat{\mu} + t^*_{\frac{\alpha}{2}} \cdot \hat{se}_{\hat{\mu}} \;,\; \hat{\mu} + t^*_{1-\frac{\alpha}{2}} \cdot \hat{se}_{\hat{\mu}} ) \\ \end{aligned}

where $\hat{\mu}$ is mean of $x$, $t_{\frac{\alpha}{2}}$ is lower $\alpha/2$ critical value for the $t$ distribution given $\text{df} = N-1$, $\hat{se}_{\hat{\mu}}$ is the standard error of the mean of $x$, $\hat{\mu}_{\frac{\alpha}{2}}^*$ is $\alpha/2$ percentile of the mean of bootstrapped $x$, and $t^*_{\frac{\alpha}{2}} = \frac{\hat{\mu^*}-\hat{\mu}}{\hat{se}^*_{\hat{\mu^*}}}$ is t-value based on bootstrapped $x$. See

J. Carpenter and J. Bithell (2000). Bootstrap confidence intervals: when, which, what? A practical guide for medical statisticians. Statist. Med. 19:1141-1164
Wikipedia contributors, 'Bootstrapping (statistics)', Wikipedia, The Free Encyclopedia, 17 December 2013, 21:28 UTC, <http://en.wikipedia.org/w/index.php?title=Bootstrapping_(statistics)&oldid=586549893> [accessed 31 December 2013]

for more details.

We can assign a different $\alpha$, direction or number of bootstrap samples:

> bootCI(num, alpha=0.01)
Summary of x
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1.500   2.500   5.000   6.357   7.000  19.000 
CIs of mu
                      0.5%    99.5%
$CI.exact       -2.0962642 14.81055
$CI.basic       -0.1428571 10.28571
$CI.percentile   2.4285714 12.85714
$CI.studentized  1.1394952 25.22030

> bootCI(num, alternative="g")
Summary of x
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1.500   2.500   5.000   6.357   7.000  19.000 
CIs of mu
                      5% 100%
$CI.exact       1.926445  Inf
$CI.basic       2.714286  Inf
$CI.percentile  3.285714  Inf
$CI.studentized 3.332740  Inf

> bootCI(num, B=99)
Summary of x
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1.500   2.500   5.000   6.357   7.000  19.000 
CIs of mu
                     2.5%     97.5%
$CI.exact       0.7778728 11.936413
$CI.basic       1.6053571  9.330357
$CI.percentile  3.3839286 11.108929
$CI.studentized 2.5532298 15.532786

The returned list may be helpful for someone:

> myCI <- bootCI(num)
> myCI$CI.percentile
    2.5%    97.5% 
 3.00000 10.85714 
> str(myCI)
List of 8
 $ x             : num [1:7] 5 2 3 6 8 19 1.5
 $ alpha         : num 0.05
 $ alternative   : chr "t"
 $ B             : num 1999
 $ CI.exact      : Named num [1:2] 0.778 11.936
  ..- attr(*, "names")= chr [1:2] "2.5%" "97.5%"
 $ CI.basic      : Named num [1:2] 1.86 9.71
  ..- attr(*, "names")= chr [1:2] "2.5%" "97.5%"
 $ CI.percentile : Named num [1:2] 3 10.9
  ..- attr(*, "names")= chr [1:2] "2.5%" "97.5%"
 $ CI.studentized: Named num [1:2] 2.65 17.73
  ..- attr(*, "names")= chr [1:2] "2.5%" "97.5%"
 - attr(*, "class")= chr "bootCI"

> x1 <- c(3,4,3,5,6,6,6,7,7,6,8,9,11) > x2 <- c(3,4,5,5,8,6,8,7,9,8,8,9,14) > cbind(x1,x2) x1 x2 [1,] 3 3 [2,] 4 4 [3,] 3 5 [4,] 5 5 [5,] 6 8 [6,] 6 6 [7,] 6 8 [8,] 7 7 [9,] 7 9 [10,] 6 8 [11,] 8 8 [12,] 9 9 [13,] 11 14

> t.test(x1, x2, paired=T) Paired t-test data: x1 and x2 t = -3.1225, df = 12, p-value = 0.008814 alternative hypothesis: true difference in means is not equal to 0 95 percent confidence interval: -1.6977786 -0.3022214 sample estimates: mean of the differences -1

> y <- c(x1,x2) > trt <- gl(2,13,26) > subject <- gl(13,1,26) > cbind(y, trt, subject) y trt subject [1,] 3 1 1 [2,] 4 1 2 [3,] 3 1 3 [4,] 5 1 4 [5,] 6 1 5 [6,] 6 1 6 [7,] 6 1 7 [8,] 7 1 8 [9,] 7 1 9 [10,] 6 1 10 [11,] 8 1 11 [12,] 9 1 12 [13,] 11 1 13 [14,] 3 2 1 [15,] 4 2 2 [16,] 5 2 3 [17,] 5 2 4 [18,] 8 2 5 [19,] 6 2 6 [20,] 8 2 7 [21,] 7 2 8 [22,] 9 2 9 [23,] 8 2 10 [24,] 8 2 11 [25,] 9 2 12 [26,] 14 2 13

> require(lme4) > mod <- lmer(y ~ trt + (1|subject) ) > summary(mod) Linear mixed model fit by REML ['lmerMod'] Formula: y ~ trt + (1 | subject) REML criterion at convergence: 98.5708 Random effects: Groups Name Variance Std.Dev. subject (Intercept) 5.8590 2.4205 Residual 0.6667 0.8165 Number of obs: 26, groups: subject, 13 Fixed effects: Estimate Std. Error t value (Intercept) 6.2308 0.7085 8.794 trt2 1.0000 0.3203 3.123 Correlation of Fixed Effects: (Intr) trt2 -0.226

> require(car) > Anova(mod, type=3, test.statistic="F") Analysis of Deviance Table (Type III Wald F tests with Kenward-Roger df) Response: y F Df Df.res Pr(>F) (Intercept) 77.34 1 13.288 6.636e-07 *** trt 9.75 1 12.000 0.008814 ** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Apan’s Notes

Pages

An R function for bootstrap confidence intervals

R code

Arguments

Examples

Paired t-test vs general mixed model with R

Dataset

Paired t-test

Mixed model for paired sample