An R function for bootstrap confidence intervals

Here I introduce an R function I wrote to calculate several confidence intervals of the mean/median, including the exact CI, the basic bootstrap CI, the percentile bootstrap CI, and the studentized bootstrap CI.

R code
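Below is a minimal sketch of what an implementation of bootCI() could look like, pieced together from the arguments documented in the next section, the CI formulas given under Examples, and the returned-list structure shown later by str(). It is not the original code; input checking, median support, and the exact print format are simplified, and only the mean is handled here.

## Sketch of bootCI(): reconstructed from the documented arguments and formulas,
## not the author's original source.
bootCI <- function(x, alpha = 0.05, alternative = c("t", "l", "g"),
                   B = 1999, quantileAlgorithm = 7) {
  alternative <- match.arg(alternative)
  N  <- length(x)
  mu <- mean(x)
  se <- sd(x) / sqrt(N)

  # probabilities for the lower and upper interval endpoints
  probs <- switch(alternative,
                  t = c(alpha / 2, 1 - alpha / 2),  # two-sided
                  l = c(0, 1 - alpha),              # upper bound only
                  g = c(alpha, 1))                  # lower bound only

  # bootstrap distributions of the mean and of the studentized statistic
  bootMu <- numeric(B)
  bootT  <- numeric(B)
  for (b in seq_len(B)) {
    xb        <- sample(x, N, replace = TRUE)
    bootMu[b] <- mean(xb)
    bootT[b]  <- (mean(xb) - mu) / (sd(xb) / sqrt(N))
  }

  CI.exact       <- mu + qt(probs, df = N - 1) * se
  CI.percentile  <- quantile(bootMu, probs, type = quantileAlgorithm)
  CI.basic       <- rev(2 * mu - CI.percentile)
  CI.studentized <- mu + quantile(bootT, probs, type = quantileAlgorithm) * se
  names(CI.exact) <- names(CI.basic) <- names(CI.percentile)

  # one-sided intervals are unbounded on the unused side
  if (alternative == "l") CI.basic[1] <- CI.percentile[1] <- CI.studentized[1] <- -Inf
  if (alternative == "g") CI.basic[2] <- CI.percentile[2] <- CI.studentized[2] <- Inf

  out <- list(x = x, alpha = alpha, alternative = alternative, B = B,
              CI.exact = CI.exact, CI.basic = CI.basic,
              CI.percentile = CI.percentile, CI.studentized = CI.studentized)
  class(out) <- "bootCI"
  out
}

## print method, so calling bootCI() at the prompt shows a compact summary
print.bootCI <- function(x, ...) {
  cat("Summary of x\n"); print(summary(x$x))
  cat("CIs of mu\n")
  print(rbind(`$CI.exact`       = x$CI.exact,
              `$CI.basic`       = x$CI.basic,
              `$CI.percentile`  = x$CI.percentile,
              `$CI.studentized` = x$CI.studentized))
  invisible(x)
}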

Arguments

There are five arguments in this R function bootCI():

  • x — a numeric vector of observations $x_i$
  • alpha = 0.05 — a numeric specifying the $\alpha$ value
  • alternative = c("t", "l", "g") — a single character string specifying a two-sided, less, or greater alternative
  • B = 1999 — an integer specifying the number of bootstrap samples
  • quantileAlgorithm = 7 — an integer specifying the type argument passed to quantile()

Examples

We can define a numeric vector $x = [5\, 2\, 3\, 6\, 8\, 19\, 1.5]$ with $N=7$ elements as num:

> num <- c(5, 2, 3, 6, 8, 19, 1.5)

After loading the above function, we can call the function to calculate CIs:

> bootCI(num)
Summary of x
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1.500   2.500   5.000   6.357   7.000  19.000 
CIs of mu
                     2.5%     97.5%
$CI.exact       0.7778728 11.936413
$CI.basic       1.7125000  9.714286
$CI.percentile  3.0000000 11.001786
$CI.studentized 2.6785623 17.669412

The intervals reported above are computed as

\[
\begin{aligned}
\text{exact CI} & = ( \hat{\mu} + t_{\frac{\alpha}{2}} \cdot \hat{se}_{\hat{\mu}} \;,\; \hat{\mu} + t_{1-\frac{\alpha}{2}} \cdot \hat{se}_{\hat{\mu}} ) \\
\text{basic CI} & = ( 2\hat{\mu} - \hat{\mu}_{1-\frac{\alpha}{2}}^{*} \;,\; 2\hat{\mu} - \hat{\mu}_{\frac{\alpha}{2}}^{*} ) \\
\text{percentile CI} & = ( \hat{\mu}_{\frac{\alpha}{2}}^* \;,\; \hat{\mu}_{1-\frac{\alpha}{2}}^* ) \\
\text{studentized CI} & = ( \hat{\mu} + t^*_{\frac{\alpha}{2}} \cdot \hat{se}_{\hat{\mu}} \;,\; \hat{\mu} + t^*_{1-\frac{\alpha}{2}} \cdot \hat{se}_{\hat{\mu}} )
\end{aligned}
\]

where $\hat{\mu}$ is the mean of $x$, $t_{\frac{\alpha}{2}}$ is the lower $\alpha/2$ critical value of the $t$ distribution with $\text{df} = N-1$, $\hat{se}_{\hat{\mu}}$ is the standard error of the mean of $x$, $\hat{\mu}_{\frac{\alpha}{2}}^*$ is the $\alpha/2$ percentile of the means of the bootstrapped samples of $x$, and $t^*_{\frac{\alpha}{2}}$ is the $\alpha/2$ percentile of the bootstrap $t$-values $\frac{\hat{\mu}^*-\hat{\mu}}{\hat{se}^*_{\hat{\mu}^*}}$ computed from the bootstrapped samples.

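The exact CI does not involve resampling, so it can be reproduced by hand for num (the three bootstrap intervals cannot, because they depend on random resampling):

mu <- mean(num)                      # 6.357143
se <- sd(num) / sqrt(length(num))    # standard error of the mean
mu + qt(c(0.025, 0.975), df = length(num) - 1) * se
# 0.7778728 11.936413  -- matches $CI.exact above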

We can specify a different $\alpha$, tail direction, or number of bootstrap samples:

> bootCI(num, alpha=0.01)
Summary of x
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1.500   2.500   5.000   6.357   7.000  19.000 
CIs of mu
                      0.5%    99.5%
$CI.exact       -2.0962642 14.81055
$CI.basic       -0.1428571 10.28571
$CI.percentile   2.4285714 12.85714
$CI.studentized  1.1394952 25.22030

> bootCI(num, alternative="g")
Summary of x
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1.500   2.500   5.000   6.357   7.000  19.000 
CIs of mu
                      5% 100%
$CI.exact       1.926445  Inf
$CI.basic       2.714286  Inf
$CI.percentile  3.285714  Inf
$CI.studentized 3.332740  Inf

> bootCI(num, B=99)
Summary of x
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1.500   2.500   5.000   6.357   7.000  19.000 
CIs of mu
                     2.5%     97.5%
$CI.exact       0.7778728 11.936413
$CI.basic       1.6053571  9.330357
$CI.percentile  3.3839286 11.108929
$CI.studentized 2.5532298 15.532786

The returned list may also be useful (note that the intervals differ slightly from those above because each call draws new bootstrap samples):

> myCI <- bootCI(num)
> myCI$CI.percentile
    2.5%    97.5% 
 3.00000 10.85714 
> str(myCI)
List of 8
 $ x             : num [1:7] 5 2 3 6 8 19 1.5
 $ alpha         : num 0.05
 $ alternative   : chr "t"
 $ B             : num 1999
 $ CI.exact      : Named num [1:2] 0.778 11.936
  ..- attr(*, "names")= chr [1:2] "2.5%" "97.5%"
 $ CI.basic      : Named num [1:2] 1.86 9.71
  ..- attr(*, "names")= chr [1:2] "2.5%" "97.5%"
 $ CI.percentile : Named num [1:2] 3 10.9
  ..- attr(*, "names")= chr [1:2] "2.5%" "97.5%"
 $ CI.studentized: Named num [1:2] 2.65 17.73
  ..- attr(*, "names")= chr [1:2] "2.5%" "97.5%"
 - attr(*, "class")= chr "bootCI"
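Because the bootstrap intervals are based on random resampling, they change slightly from call to call; fixing the random seed beforehand makes a call reproducible:

set.seed(1)          # any fixed seed gives reproducible bootstrap samples
myCI <- bootCI(num)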

Paired t-test vs general mixed model with R

Dataset

Consider the following dataset:

> x1 <- c(3,4,3,5,6,6,6,7,7,6,8,9,11)
> x2 <- c(3,4,5,5,8,6,8,7,9,8,8,9,14)
> cbind(x1,x2)
      x1 x2
 [1,]  3  3
 [2,]  4  4
 [3,]  3  5
 [4,]  5  5
 [5,]  6  8
 [6,]  6  6
 [7,]  6  8
 [8,]  7  7
 [9,]  7  9
[10,]  6  8
[11,]  8  8
[12,]  9  9
[13,] 11 14

where x1 and x2 are paired samples.

Paired t-test

A typical paired t-test in R proceeds as follows:

> t.test(x1, x2, paired=T)

 Paired t-test

data:  x1 and x2
t = -3.1225, df = 12, p-value = 0.008814
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -1.6977786 -0.3022214
sample estimates:
mean of the differences 
                     -1

The results show that 1) the average of $x_1 - x_2$ equals $-1$; 2) $t_{12} = -3.1225$ and $p = 0.008814$.
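A paired t-test is simply a one-sample t-test on the differences, so both numbers can be checked by hand:

d <- x1 - x2
mean(d)                               # -1, the mean difference
mean(d) / (sd(d) / sqrt(length(d)))   # -3.1225, the t statistic with df = 12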

Mixed model for paired sample

We can also fit a mixed model to this paired-sample example. Consider the same data rearranged into long format:

> y <- c(x1,x2)
> trt <- gl(2,13,26)
> subject <- gl(13,1,26)
> cbind(y, trt, subject)
       y trt subject
 [1,]  3   1       1
 [2,]  4   1       2
 [3,]  3   1       3
 [4,]  5   1       4
 [5,]  6   1       5
 [6,]  6   1       6
 [7,]  6   1       7
 [8,]  7   1       8
 [9,]  7   1       9
[10,]  6   1      10
[11,]  8   1      11
[12,]  9   1      12
[13,] 11   1      13
[14,]  3   2       1
[15,]  4   2       2
[16,]  5   2       3
[17,]  5   2       4
[18,]  8   2       5
[19,]  6   2       6
[20,]  8   2       7
[21,]  7   2       8
[22,]  9   2       9
[23,]  8   2      10
[24,]  8   2      11
[25,]  9   2      12
[26,] 14   2      13

where y is the dependent variable, trt is a two-level fixed factor, and subject identifies the experimental units, which are treated as blocks (a random factor). A mixed model can be fitted to this dataset:

\[ y_{ij} = \tau_i + b_{j} + \varepsilon_{ij} \]

where $i \in \{1,2\}$ indexes the two levels of the fixed factor $\tau$, and $j \in \{1,2,\ldots,13\}$ indexes the 13 subjects entering as the random factor $b$, with $b_j \sim N(0,\sigma_b^2)$ and $\varepsilon_{ij} \sim N(0,\sigma^2)$.

We can fit this mixed model using lme4::lmer():

> require(lme4)
> mod <- lmer(y ~ trt + (1|subject) )
> summary(mod)
Linear mixed model fit by REML ['lmerMod']
Formula: y ~ trt + (1 | subject) 

REML criterion at convergence: 98.5708 

Random effects:
 Groups   Name        Variance Std.Dev.
 subject  (Intercept) 5.8590   2.4205  
 Residual             0.6667   0.8165  
Number of obs: 26, groups: subject, 13

Fixed effects:
            Estimate Std. Error t value
(Intercept)   6.2308     0.7085   8.794
trt2          1.0000     0.3203   3.123

Correlation of Fixed Effects:
     (Intr)
trt2 -0.226

As you can see, the estimate for trt2 equals 1, which is the average of $x_2 - x_1$; the statistic $t = 3.123$ matches, up to sign, the t-value from the paired t-test.

We can find the p-value using car::Anova():

> require(car)
> Anova(mod, type=3, test.statistic="F")
Analysis of Deviance Table (Type III Wald F tests with Kenward-Roger df)

Response: y
                F Df Df.res    Pr(>F)    
(Intercept) 77.34  1 13.288 6.636e-07 ***
trt          9.75  1 12.000  0.008814 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

and we can see that $F_{1,12} = 9.75 = 3.123^2$ and $p = 0.008814$. These outcomes are equivalent to those from the paired t-test.
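An $F$ statistic with one numerator degree of freedom is the square of the corresponding $t$ statistic, so the two p-values can be checked against each other:

2 * pt(-3.1225, df = 12)                              # ~0.008814 (paired t-test)
pf(3.1225^2, df1 = 1, df2 = 12, lower.tail = FALSE)   # ~0.008814 (mixed-model F test)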