Here I introduce a R function made by myself to calculate several confidence intervals of mean/median including exact CI, basic bootstrap CI, percentile bootstrap CI & studentized bootstrap CI.
R code
# Copyright 2013 Chen-Pan Liao | |
# This program is free software: you can redistribute it and/or modify | |
# it under the terms of the GNU General Public License as published by | |
# the Free Software Foundation, either version 3 of the License, or | |
# any later version. | |
# | |
# This program is distributed in the hope that it will be useful, | |
# but WITHOUT ANY WARRANTY; without even the implied warranty of | |
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the | |
# GNU General Public License for more details. | |
# | |
# You should have received a copy of the GNU General Public License | |
# along with this program. If not, see <http://www.gnu.org/licenses/>. | |
# See http://apansharing.blogspot.tw/2013/12/an-r-function-for-bootstrap-confidence.html | |
# to learn how to use this function | |
## main function | |
bootCI <- function( | |
x, | |
alpha=0.05, | |
alternative=c("t", "l", "g"), # see arg "alternative" in help(t.test) | |
B=1999, # number of bootstrap samples | |
quantileAlgorithm = 7 # passed to quantile(type) | |
){ | |
#validation | |
if(mode(x) != "numeric"){ | |
stop("mode(x) must be \"numeric\"") | |
} | |
if(class(x) != "numeric" && class(x) != "integer"){ | |
stop("class(x) must be \"numeric\" or \"integer\"") | |
} | |
if(length(x) < 2){ | |
stop("length(x) must be larger than 1") | |
} | |
# initiation | |
alternative <- alternative[1] | |
stat.length <- length(x) | |
stat.mean <- mean(x) | |
stat.sem <- sqrt(var(x) / stat.length) | |
# alternative | |
probs <- c(alpha/2, 1-alpha/2) | |
if(alternative == "l") probs <- c(0, 1-alpha) | |
if(alternative == "g") probs <- c(alpha, 1) | |
# bootstrap | |
boot.mat <- matrix(0.0, B+1, stat.length) | |
boot.mat[1,] <- x | |
for (i in 2:(B+1)) { | |
boot.mat[i,] <- sample(x, replace=T) | |
} | |
# exact (traditional) CI based on t distribution | |
CI.exact <- stat.mean + qt(probs, stat.length - 1) * stat.sem | |
names(CI.exact) <- paste(probs*100, "%", sep="") | |
# basic bootstrap CI | |
CI.basic <- 2 * stat.mean - quantile( | |
apply(boot.mat, 1, mean), probs=1-probs, type=quantileAlgorithm | |
) | |
names(CI.basic) <- rev(names(CI.basic)) | |
# percentile bootstrap CI | |
CI.percentile <- quantile( | |
apply(boot.mat, 1, mean), probs=probs, type=quantileAlgorithm | |
) | |
# studentized bootstrap CI | |
tmp.mean <- apply(boot.mat, 1, mean) | |
tmp.sem <- apply( | |
boot.mat, 1, function(.){sqrt(var(.) / length(.))} | |
) | |
tmp.t <- (tmp.mean - stat.mean) / tmp.sem | |
expr <- expression( | |
CI.studentized <- | |
stat.mean - quantile( | |
tmp.t, probs=1-probs, type=quantileAlgorithm | |
) * stat.sem | |
) | |
a.try <- try(eval(expr), T) | |
if("try-error" %in% class(a.try)) { | |
warning("studentized bootstrap CI cannot | |
be done well due to boostrap sem = 0") | |
} | |
names(CI.studentized) <- rev(names(CI.studentized)) | |
# exceptions of one-tail | |
if(alternative == "g") { | |
CI.exact[2] = Inf | |
CI.basic[2] = Inf | |
CI.percentile[2] = Inf | |
CI.studentized[2] = Inf | |
} else if(alternative == "l") { | |
CI.exact[1] = -Inf | |
CI.basic[1] = -Inf | |
CI.percentile[1] = -Inf | |
CI.studentized[1] = -Inf | |
} | |
output <- list( | |
x = x, | |
alpha = alpha, | |
alternative = alternative, | |
B = B, | |
CI.exact = CI.exact, | |
CI.basic = CI.basic, | |
CI.percentile = CI.percentile, | |
CI.studentized = CI.studentized | |
) | |
class(output) <- "bootCI" | |
return(output) | |
} | |
## print function | |
print.bootCI <- function(w){ | |
cat("Summary of x\n") | |
print(summary(w$x)) | |
cat("CIs of mu\n") | |
mat <- rbind(w$CI.exact, w$CI.basic, w$CI.percentile, w$CI.studentized) | |
rownames(mat) <- c("$CI.exact", "$CI.basic", "$CI.percentile", "$CI.studentized") | |
print(mat) | |
} |
Arguments
There are five arguments in this R function bootCI()
:
x
— a numeric vector specifying $x_i$alpha = 0.05
— a numeric specifying $\alpha$ valuealternative = c("t", "l", "g")
— a single character string specifying two-sided, less or greater tailB = 1999
— a integer specifying number of bootstrap samplesquantileAlgorithm = 7
— a integer specifying the argumenttype
passed toquantile()
Examples
We can define a numeric vector $x = [5\, 2\, 3\, 6\, 8\, 19\, 1.5]$ including $N=7$ items as num
:
> num <- c(5, 2, 3, 6, 8, 19, 1.5)
After loading the above function, we can call the function to calculate CIs:
> bootCI(num) Summary of x Min. 1st Qu. Median Mean 3rd Qu. Max. 1.500 2.500 5.000 6.357 7.000 19.000 CIs of mu 2.5% 97.5% $CI.exact 0.7778728 11.936413 $CI.basic 1.7125000 9.714286 $CI.percentile 3.0000000 11.001786 $CI.studentized 2.6785623 17.669412
The above results show that
where $\hat{\mu}$ is mean of $x$, $t_{\frac{\alpha}{2}}$ is lower $\alpha/2$ critical value for the $t$ distribution given $\text{df} = N-1$, $\hat{se}_{\hat{\mu}}$ is the standard error of the mean of $x$, $\hat{\mu}_{\frac{\alpha}{2}}^*$ is $\alpha/2$ percentile of the mean of bootstrapped $x$, and $t^*_{\frac{\alpha}{2}} = \frac{\hat{\mu^*}-\hat{\mu}}{\hat{se}^*_{\hat{\mu^*}}}$ is t-value based on bootstrapped $x$. See
- J. Carpenter and J. Bithell (2000). Bootstrap confidence intervals: when, which, what? A practical guide for medical statisticians. Statist. Med. 19:1141-1164
- Wikipedia contributors, 'Bootstrapping (statistics)', Wikipedia, The Free Encyclopedia, 17 December 2013, 21:28 UTC, <http://en.wikipedia.org/w/index.php?title=Bootstrapping_(statistics)&oldid=586549893> [accessed 31 December 2013]
for more details.
We can assign a different $\alpha$, direction or number of bootstrap samples:
> bootCI(num, alpha=0.01) Summary of x Min. 1st Qu. Median Mean 3rd Qu. Max. 1.500 2.500 5.000 6.357 7.000 19.000 CIs of mu 0.5% 99.5% $CI.exact -2.0962642 14.81055 $CI.basic -0.1428571 10.28571 $CI.percentile 2.4285714 12.85714 $CI.studentized 1.1394952 25.22030 > bootCI(num, alternative="g") Summary of x Min. 1st Qu. Median Mean 3rd Qu. Max. 1.500 2.500 5.000 6.357 7.000 19.000 CIs of mu 5% 100% $CI.exact 1.926445 Inf $CI.basic 2.714286 Inf $CI.percentile 3.285714 Inf $CI.studentized 3.332740 Inf > bootCI(num, B=99) Summary of x Min. 1st Qu. Median Mean 3rd Qu. Max. 1.500 2.500 5.000 6.357 7.000 19.000 CIs of mu 2.5% 97.5% $CI.exact 0.7778728 11.936413 $CI.basic 1.6053571 9.330357 $CI.percentile 3.3839286 11.108929 $CI.studentized 2.5532298 15.532786
The returned list may be helpful for someone:
> myCI <- bootCI(num) > myCI$CI.percentile 2.5% 97.5% 3.00000 10.85714 > str(myCI) List of 8 $ x : num [1:7] 5 2 3 6 8 19 1.5 $ alpha : num 0.05 $ alternative : chr "t" $ B : num 1999 $ CI.exact : Named num [1:2] 0.778 11.936 ..- attr(*, "names")= chr [1:2] "2.5%" "97.5%" $ CI.basic : Named num [1:2] 1.86 9.71 ..- attr(*, "names")= chr [1:2] "2.5%" "97.5%" $ CI.percentile : Named num [1:2] 3 10.9 ..- attr(*, "names")= chr [1:2] "2.5%" "97.5%" $ CI.studentized: Named num [1:2] 2.65 17.73 ..- attr(*, "names")= chr [1:2] "2.5%" "97.5%" - attr(*, "class")= chr "bootCI"