## Pages

### An R function for bootstrap confidence intervals

Here I introduce a R function made by myself to calculate several confidence intervals of mean/median including exact CI, basic bootstrap CI, percentile bootstrap CI & studentized bootstrap CI.

#### Arguments

There are five arguments in this R function bootCI():

• x — a numeric vector specifying $x_i$
• alpha = 0.05 — a numeric specifying $\alpha$ value
• alternative = c("t", "l", "g") — a single character string specifying two-sided, less or greater tail
• B = 1999 — a integer specifying number of bootstrap samples
• quantileAlgorithm = 7 — a integer specifying the argument type passed to quantile()

#### Examples

We can define a numeric vector $x = [5\, 2\, 3\, 6\, 8\, 19\, 1.5]$ including $N=7$ items as num:

> num <- c(5, 2, 3, 6, 8, 19, 1.5)

After loading the above function, we can call the function to calculate CIs:

> bootCI(num)
Summary of x
Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
1.500   2.500   5.000   6.357   7.000  19.000
CIs of mu
2.5%     97.5%
$CI.exact 0.7778728 11.936413$CI.basic       1.7125000  9.714286
$CI.percentile 3.0000000 11.001786$CI.studentized 2.6785623 17.669412

The above results show that

\begin{aligned} \text{exact CI} & = ( \hat{\mu} + t_{\frac{\alpha}{2}} \cdot \hat{se}_{\hat{\mu}} \;,\; \hat{\mu} + t_{1-\frac{\alpha}{2}} \cdot \hat{se}_{\hat{\mu}} ) \\ \text{basic CI} & = ( 2\hat{\mu} - \hat{\mu}_{1-\frac{\alpha}{2}}^{*} \;,\; 2\hat{\mu} - \hat{\mu}_{\frac{\alpha}{2}}^{*} ) \\ \text{percentile CI} & = ( \hat{\mu}_{\frac{\alpha}{2}}^* \;,\; \hat{\mu}_{1-\frac{\alpha}{2}}^* ) \\ \text{studentized CI} & = ( \hat{\mu} + t^*_{\frac{\alpha}{2}} \cdot \hat{se}_{\hat{\mu}} \;,\; \hat{\mu} + t^*_{1-\frac{\alpha}{2}} \cdot \hat{se}_{\hat{\mu}} ) \\ \end{aligned}

where $\hat{\mu}$ is mean of $x$, $t_{\frac{\alpha}{2}}$ is lower $\alpha/2$ critical value for the $t$ distribution given $\text{df} = N-1$, $\hat{se}_{\hat{\mu}}$ is the standard error of the mean of $x$, $\hat{\mu}_{\frac{\alpha}{2}}^*$ is $\alpha/2$ percentile of the mean of bootstrapped $x$, and $t^*_{\frac{\alpha}{2}} = \frac{\hat{\mu^*}-\hat{\mu}}{\hat{se}^*_{\hat{\mu^*}}}$ is t-value based on bootstrapped $x$. See

for more details.

We can assign a different $\alpha$, direction or number of bootstrap samples:

> bootCI(num, alpha=0.01)
Summary of x
Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
1.500   2.500   5.000   6.357   7.000  19.000
CIs of mu
0.5%    99.5%
$CI.exact -2.0962642 14.81055$CI.basic       -0.1428571 10.28571
$CI.percentile 2.4285714 12.85714$CI.studentized  1.1394952 25.22030

> bootCI(num, alternative="g")
Summary of x
Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
1.500   2.500   5.000   6.357   7.000  19.000
CIs of mu
5% 100%
$CI.exact 1.926445 Inf$CI.basic       2.714286  Inf
$CI.percentile 3.285714 Inf$CI.studentized 3.332740  Inf

> bootCI(num, B=99)
Summary of x
Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
1.500   2.500   5.000   6.357   7.000  19.000
CIs of mu
2.5%     97.5%
$CI.exact 0.7778728 11.936413$CI.basic       1.6053571  9.330357
$CI.percentile 3.3839286 11.108929$CI.studentized 2.5532298 15.532786

The returned list may be helpful for someone:

> myCI <- bootCI(num)
> myCI$CI.percentile 2.5% 97.5% 3.00000 10.85714 > str(myCI) List of 8$ x             : num [1:7] 5 2 3 6 8 19 1.5
$alpha : num 0.05$ alternative   : chr "t"
$B : num 1999$ CI.exact      : Named num [1:2] 0.778 11.936
..- attr(*, "names")= chr [1:2] "2.5%" "97.5%"
$CI.basic : Named num [1:2] 1.86 9.71 ..- attr(*, "names")= chr [1:2] "2.5%" "97.5%"$ CI.percentile : Named num [1:2] 3 10.9
..- attr(*, "names")= chr [1:2] "2.5%" "97.5%"
$CI.studentized: Named num [1:2] 2.65 17.73 ..- attr(*, "names")= chr [1:2] "2.5%" "97.5%" - attr(*, "class")= chr "bootCI" ### Paired t-test vs general mixed model with R #### Dataset Consider the following dataset: > x1 <- c(3,4,3,5,6,6,6,7,7,6,8,9,11) > x2 <- c(3,4,5,5,8,6,8,7,9,8,8,9,14) > cbind(x1,x2) x1 x2 [1,] 3 3 [2,] 4 4 [3,] 3 5 [4,] 5 5 [5,] 6 8 [6,] 6 6 [7,] 6 8 [8,] 7 7 [9,] 7 9 [10,] 6 8 [11,] 8 8 [12,] 9 9 [13,] 11 14 where x1 and x2 are paired sample. #### Paired t-test The process of a typical paired t-test with R is following: > t.test(x1, x2, paired=T) Paired t-test data: x1 and x2 t = -3.1225, df = 12, p-value = 0.008814 alternative hypothesis: true difference in means is not equal to 0 95 percent confidence interval: -1.6977786 -0.3022214 sample estimates: mean of the differences -1 The results show that 1) average of$x_1 - x_2$equals to -1; 2)$t_{12} = -3.1225$,$p = 0.008814$. #### Mixed model for paired sample We can build a mixed model for this paired-sample example. Consider the following dataset: > y <- c(x1,x2) > trt <- gl(2,13,26) > subject <- gl(13,1,26) > cbind(y, trt, subject) y trt subject [1,] 3 1 1 [2,] 4 1 2 [3,] 3 1 3 [4,] 5 1 4 [5,] 6 1 5 [6,] 6 1 6 [7,] 6 1 7 [8,] 7 1 8 [9,] 7 1 9 [10,] 6 1 10 [11,] 8 1 11 [12,] 9 1 12 [13,] 11 1 13 [14,] 3 2 1 [15,] 4 2 2 [16,] 5 2 3 [17,] 5 2 4 [18,] 8 2 5 [19,] 6 2 6 [20,] 8 2 7 [21,] 7 2 8 [22,] 9 2 9 [23,] 8 2 10 [24,] 8 2 11 [25,] 9 2 12 [26,] 14 2 13 where y is dependent variable, trt is a two-level fixed factor, and subject is experimental units considered as blocks (random factor). A mixed model can fit this dataset: $y_{ij} = \tau_i + b_{j} + \varepsilon_{ij}$ where$i=\{1,2\}$showing two-level in fixed factor$\tau$,$j=\{1,2,\ldots , 13\}$showing 13 subjects as a random factor$b$. We can find the solution of this mixed model by using lme4:lmer(): > require(lme4) > mod <- lmer(y ~ trt + (1|subject) ) > summary(mod) Linear mixed model fit by REML ['lmerMod'] Formula: y ~ trt + (1 | subject) REML criterion at convergence: 98.5708 Random effects: Groups Name Variance Std.Dev. subject (Intercept) 5.8590 2.4205 Residual 0.6667 0.8165 Number of obs: 26, groups: subject, 13 Fixed effects: Estimate Std. Error t value (Intercept) 6.2308 0.7085 8.794 trt2 1.0000 0.3203 3.123 Correlation of Fixed Effects: (Intr) trt2 -0.226 As you can see, the estimate of trt2 equals 1 showing the average of$x_2 - x_1$; the statistic$t=3.123$is equivalent to the t-value I get in paired t-test. We can find the p-value by using car:Anova(): > require(car) > Anova(mod, type=3, test.statistic="F") Analysis of Deviance Table (Type III Wald F tests with Kenward-Roger df) Response: y F Df Df.res Pr(>F) (Intercept) 77.34 1 13.288 6.636e-07 *** trt 9.75 1 12.000 0.008814 ** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 and then we can find that$F_{1,12} = 9.75 = 3.123^2$and$p=0.008814$. These outcomes are all equivalent to what I got in paired t-test. ### Multiple comparisons for generalized linear model in R #### An example ##### Defining y and x > y <- c (rpois(10,1), rpois(10,2), rpois(10,3), rpois(10,5)) > y  0 1 0 2 1 2 0 1 1 0 3 2 2 4 2 3 3 1 1 1 3 0 4 4 2 3 1 4 3 6 3 8 6 3 2 6 5 5 2 8 > x <- gl(4,10) > x  1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 Levels: 1 2 3 4 ##### Fitting a Poisson regression > f <- glm(y~x, family=poisson) > summary(f) Call: glm(formula = y ~ x, family = poisson) Deviance Residuals: Min 1Q Median 3Q Max -2.44949 -0.90724 0.04533 0.52699 1.52242 Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) -0.2231 0.3536 -0.631 0.527945 x2 1.0116 0.4129 2.450 0.014277 * x3 1.3218 0.3979 3.322 0.000895 *** x4 1.7918 0.3819 4.692 2.71e-06 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 (Dispersion parameter for poisson family taken to be 1) Null deviance: 67.967 on 39 degrees of freedom Residual deviance: 34.884 on 36 degrees of freedom AIC: 142.6 Number of Fisher Scoring iterations: 5 ##### Multiple comparisons > # install.packages("multcomp") > require(multcomp) > f.mc <- glht(f, linfct = mcp(x = "Tukey")) > summary(f.mc) Simultaneous Tests for General Linear Hypotheses Multiple Comparisons of Means: Tukey Contrasts Fit: glm(formula = y ~ x, family = poisson) Linear Hypotheses: Estimate Std. Error z value Pr(>|z|) 2 - 1 == 0 1.0116 0.4129 2.450 0.06433 . 3 - 1 == 0 1.3218 0.3979 3.322 0.00481 ** 4 - 1 == 0 1.7918 0.3819 4.692 < 0.001 *** 3 - 2 == 0 0.3102 0.2807 1.105 0.67729 4 - 2 == 0 0.7802 0.2575 3.030 0.01198 * 4 - 3 == 0 0.4700 0.2327 2.019 0.17303 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 (Adjusted p values reported -- single-step method) > confint(f.mc) Simultaneous Confidence Intervals Multiple Comparisons of Means: Tukey Contrasts Fit: glm(formula = y ~ x, family = poisson) Quantile = 2.5471 95% family-wise confidence level Linear Hypotheses: Estimate lwr upr 2 - 1 == 0 1.01160 -0.04002 2.06322 3 - 1 == 0 1.32176 0.30822 2.33529 4 - 1 == 0 1.79176 0.81905 2.76447 3 - 2 == 0 0.31015 -0.40481 1.02512 4 - 2 == 0 0.78016 0.12436 1.43596 4 - 3 == 0 0.47000 -0.12281 1.06282 > coef(f.mc) 2 - 1 3 - 1 4 - 1 3 - 2 4 - 2 4 - 3 1.0116009 1.3217558 1.7917595 0.3101549 0.7801586 0.4700036 > vcov(f.mc) 2 - 1 3 - 1 4 - 1 3 - 2 4 - 2 4 - 3 2 - 1 1.704542e-01 1.249996e-01 1.249996e-01 -4.545455e-02 -4.545455e-02 9.714451e-17 3 - 1 1.249996e-01 1.583330e-01 1.249996e-01 3.333333e-02 6.938894e-17 -3.333333e-02 4 - 1 1.249996e-01 1.249996e-01 1.458330e-01 -2.775558e-17 2.083333e-02 2.083333e-02 3 - 2 -4.545455e-02 3.333333e-02 -2.775558e-17 7.878788e-02 4.545455e-02 -3.333333e-02 4 - 2 -4.545455e-02 6.938894e-17 2.083333e-02 4.545455e-02 6.628788e-02 2.083333e-02 4 - 3 9.714451e-17 -3.333333e-02 2.083333e-02 -3.333333e-02 2.083333e-02 5.416667e-02 #### Read more ### 在 R 中配置 Generalized Logit Model 已知資料 data.csv 如下: treat, r.a, r.b, r.c t1, 10, 20, 5 t2, 11, 22, 5 t3, 21, 40, 11 t4, 21, 5, 10 其中 treat 為具有 4 個水準的固定因子; r.a, r.br.c 分別為 3 個類型的反應類別, 其下為頻率. 以下為 R code: # 系統第一次運作請先安裝 package nnet install.packages("nnet") # 引入 package nnet require(nnet) # 讀取資料 dat <- read.csv("data.csv") # 建立 generalized logit model m <- multinom( cbind(r.a, r.b, r.c) ~ treat, # 其中 r.a 寫在最前面即可作為比較基準 data=dat, contrasts=list(treat=contr.treatment), # 其中採用 treatment contrast 方便解釋 Hess=T # 其中採用 Hessian Matrix 求標準誤 ) # 係數與標準誤 (s.m <- summary(m)) # odds ratio exp(s.m$coefficients)

# chi-square
(chisq <- (s.m$coefficients / s.m$standard.error)^2)

# p-value
pchisq(chisq, df=1, lower.tail=F)

> (s.m <- summary(m))
Call:
multinom(formula = cbind(r.a, r.b, r.c) ~ treat, data = dat,
contrasts = list(treat = contr.treatment), Hess = T)

Coefficients:
(Intercept)        treat2      treat3      treat4
r.b   0.6931308  2.158365e-05 -0.04877868 <strong>-2.12818025</strong>
r.c  -0.6931750 -9.525513e-02  0.04656602 -0.04876199

Std. Errors:
(Intercept)    treat2    treat3    treat4
r.b   0.3872968 0.5351291 0.4718241 0.6305649
r.c   0.5477240 0.7687048 0.6622145 0.6690451

Residual Deviance: 346.6049
AIC: 362.6049

> # odds ratio
> exp(s.m$coefficients) (Intercept) treat2 treat3 treat4 r.b 1.9999671 1.000022 0.9523919 <strong>0.1190537</strong> r.c 0.4999861 0.909141 1.0476672 0.9524078 > # chi-square > (chisq <- (s.m$coefficients / s.m$standard.error)^2) (Intercept) treat2 treat3 treat4 r.b 3.202894 1.626795e-09 0.010688081 <strong>11.390878935</strong> r.c 1.601630 1.535527e-02 0.004944712 0.005311928 > # p-value > pchisq(chisq, df=1, lower.tail=F) (Intercept) treat2 treat3 treat4 r.b 0.07350811 0.9999678 0.9176589 <strong>0.000738056</strong> r.c 0.20567234 0.9013815 0.9439400 0.941899243 可以發現, 唯一達顯著之係數為 r.b 對比 r.a 時在 treat4 對比 treat1 的情況, 其係數$\beta = -2.12818025$, 勝算比$= \exp(-2.12818025) = 0.1190537$, 檢定量$\chi^2_1 = 11.390878935$,$p = 0.000738056$, 顯示 treat4treat1r.b / r.a 達顯著差異. 由實際資料$\frac{5/21}{20/10} = 0.1190476$也和模型所預測的勝算比 0.1190537 相當接近. ### G-test for 2-way contingency data in R A R function applying G-test for 2-way contingency data in R is demonstrated. Yates’ correction and Williams’ correction are also included in the output. #### R Source #### Example ##### R code my.2way.contingency.table <- matrix( c( 5,0,3,12,6, 4,2,7,23,11 ), nrow = 2, ncol=5, byrow=T ) g.test.result <- g.test.2way(my.2way.contingency.table) g.test.result str(g.test.result) ##### R output > my.2way.contingency.table <- matrix( + c( + 5,0,3,12,6, + 4,2,7,23,11 + ), + nrow = 2, ncol=5, byrow=T + ) > > g.test.result <- g.test.2way(my.2way.contingency.table) > g.test.result Observed value: [,1] [,2] [,3] [,4] [,5] [1,] 5 0 3 12 6 [2,] 4 2 7 23 11 Expected value: [,1] [,2] [,3] [,4] [,5] [1,] 3.205479 0.7123288 3.561644 12.46575 6.054795 [2,] 5.794521 1.2876712 6.438356 22.53425 10.945205 Observed value with Yates' correction: [,1] [,2] [,3] [,4] [,5] [1,] 4.5 0.5 3.5 12.5 6.5 [2,] 4.5 1.5 6.5 22.5 10.5 Degree of freedom = 4 G-test: G = 3.41127, p = 0.491497 G-test with Yates' correction: G = 1.28744, p = 0.863503 G-test with Williams' correction: G = 3.07349, q_min = 1.1099, p = 0.545603 > str(g.test.result) List of 11$ data.observed      : num [1:2, 1:5] 5 4 0 2 3 7 12 23 6 11
$data.expected : num [1:2, 1:5] 3.205 5.795 0.712 1.288 3.562 ...$ data.observed.Yates: num [1:2, 1:5] 4.5 4.5 0.5 1.5 3.5 6.5 12.5 22.5 6.5 10.5
$df : num 4$ q.min              : num 1.11
$g : num 3.41$ g.Yates            : num 1.29
$g.Williams : num 3.07$ p                  : num 0.491
$p.Yates : num 0.864$ p.Williams         : num 0.546
- attr(*, "class")= chr "g.test.2way"

### Convert taxa matrix to guild matrix with R

#### 例子

"Plot","Leptonetidae","Clubionidae","Others","Araneidae","Oonopidae","Ctenizidae"
1,2,0,0,1,1,2
2,1,2,3,1,0,0
3,1,0,1,1,1,1
4,2,3,2,1,0,1
5,0,1,1,2,1,4
6,2,0,0,0,1,3
7,1,0,1,2,2,1
8,0,0,1,2,3,1

"taxa","Guild1","Guild2","Guild3"
"Oonopidae","Ground runner","runner","Ground"
"Araneidae","Orb weaver","weaver","Space"
"Clubionidae","Foliage runner","runner","Foliage"
"Ctenidae","Ground runner","runner","Ground"
"Ctenizidae","Burrow dweller","dweller","Burrow"
"Gnaphosidae","Ground runner","runner","Ground"
"Leptonetidae","Ground weaver","weaver","Ground"
"Linyphiidae","Space weaver","weaver","Space"

source("taxaToGuild.r")
new.dataframe <- taxaToGuild(
taxa.file.csv = "taxa.csv",
taxa.file.csv.row.num = 1,
guild.file.csv = "guild.csv",
guild.file.csv.row.num = 1,
guild.system.row.num = 1
)
new.dataframe

> source("taxaToGuild.r")
> new.dataframe <- taxaToGuild(
+   taxa.file.csv = "taxa.csv",
+   taxa.file.csv.row.num = 1,
+   guild.file.csv = "guild.csv",
+   guild.file.csv.row.num = 1,
+   guild.system.row.num = 1
+ )
The guild system is Guild1.
Warning: the following taxa is merged into guild NA due to no reference: Others.
> new.dataframe
Ground weaver Foliage runner NA Orb weaver Ground runner Burrow dweller
1             2              0  0          1             1              2
2             1              2  3          1             0              0
3             1              0  1          1             1              1
4             2              3  2          1             0              1
5             0              1  1          2             1              4
6             2              0  0          0             1              3
7             1              0  1          2             2              1
8             0              0  1          2             3              1