## Pages

### 邏輯斯迴歸、勝算與勝算比的關係 (logistic regression, odds and odds ratio)

#### 公式一覽

$\log ( \text{odd} ) = \text{logit} (p) = \log ( \frac{\Pr (y=1) }{1 - \Pr (y=1)} ) = \beta_0 + \sum_{i=1}^{n} \beta_i x_i$ $\text{odd} = \frac{\Pr (y=1) }{1 - \Pr (y=1)} = \exp ( \beta_0 + \sum_{i=1}^{n} \beta_i x_i )$ $\Pr (y=1) = \frac{\text{odd}}{1+\text{odd}} = \frac{1}{1+\text{odd}^{-1}} = \frac{1}{1+\exp (-( \beta_0 + \sum_{i=1}^{n}\beta_i x_i)) }$ $\log (\text{odds ratio}) = \log \left( \frac{\text{odd}_1}{\text{odd}_2} \right) = \log(\text{odd}_1) - \log(\text{odd}_2)$

#### 名詞翻譯

• logistic regression: 邏級式迴歸;
• odds: 勝算;
• odds ratio: 勝算比.

#### 僅包含常數項的迴歸式

> isPhone<-c(rep(1,6),rep(0,4))
> m<-glm(isPhone~1,family=binomial);summary(m)

Call:
glm(formula = isPhone ~ 1, family = binomial)

Deviance Residuals:
Min      1Q  Median      3Q     Max
-1.354  -1.354   1.011   1.011   1.011

Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept)   0.4055     0.6455   0.628     0.53

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 13.46  on 9  degrees of freedom
Residual deviance: 13.46  on 9  degrees of freedom
AIC: 15.46

Number of Fisher Scoring iterations: 4

> exp(0.4055)
[1] 1.500052

#### 包含一個二元類別自變數的迴歸式

“男生有手機” 對 “男生無手機” 的勝算為 $\frac{5/6}{1/6} = 5$; “女生有手機” 對 “女生無手機” 的勝算為 $\frac{1/4}{3/4} = 0.3333$; 二者的勝算比為 $5 / 0.3333 = 15$.

> isMale<-c(1,1,1,1,1,1,0,0,0,0)
> isPhone<-as.factor(c(1,1,1,1,1,0,0,0,0,1))
> m<-glm(isPhone~isMale,family=binomial);summary(m)

Call:
glm(formula = isMale ~ isPhone, family = binomial)

Deviance Residuals:
Min       1Q   Median       3Q      Max
-1.8930  -0.7585   0.6039   0.6039   1.6651

Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept)   -1.099      1.155  -0.951   0.3414
isMale         2.708      1.592   1.701   0.0889 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 13.4602  on 9  degrees of freedom
Residual deviance:  9.9054  on 8  degrees of freedom
AIC: 13.905

Number of Fisher Scoring iterations: 4

> exp(-1.099)
[1] 0.3332041
> exp(2.708)
[1] 14.99925

#### 包含一連續變數作為自變數的迴歸式

1100
1120
190
1110
1110
080
0100
090
070
0110

> isPhone<-c(1,1,1,1,1,0,0,0,0,0)
> money<-c(100, 120, 90, 110, 110, 80, 100, 90, 70, 110)
> m<-glm(isPhone~money,family=binomial);summary(m)
Call:
glm(formula = isPhone ~ money, family = binomial)

Deviance Residuals:
Min        1Q    Median        3Q       Max
-1.67078  -0.76661   0.07113   0.75438   1.55604

Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -9.71278    6.59477  -1.473    0.141
money        0.09840    0.06574   1.497    0.134

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 13.863  on 9  degrees of freedom
Residual deviance: 10.481  on 8  degrees of freedom
AIC: 14.481

Number of Fisher Scoring iterations: 4

#### 結論

• 邏輯斯迴歸的迴歸式 $\log ( \text{odd} ) = \text{logit} (p) = \log \left( \frac{\Pr (y=1) }{1 - \Pr (y=1)} \right) = \beta_0 + \sum_{i=1}^{n} \beta_i x_i$ 所加總得到的數值是 log(勝算), 故 exp(logit(p)) = 勝算;
• 將二個勝算相除可以得到勝算比;
• 可由勝算回推機率, 公式為機率 = 勝算 / (1 + 勝算);
• 因為指數與對數在運算上的特性, log(勝算比) 常成為二個 logit(p) 相加減之結果.
• 若不解, 一切回歸到 “log(勝算) = log (p/(1 - p)) = 迴歸式” 及 “勝算比 = 勝算1 / 勝算2” 的基本概念, 再往下推導.

### Least-significant difference (LSD) in R

How to apply LSD (as a kind of multiple comparisons after ANOVA) in R?

A dataset for typical one-way ANOVA is given:

y <- rnorm(40)
x <- gl(4,10)
m <- aov(y~x)

Applying LSD by using function pairwise.t.test:

pairwise.t.test(y,x,p.adj="none")

Applying LSD by using function LSD.test from package agricolae:

library(agricolae)
LSD.test(m,"x",group=T) # with grouping
LSD.test(m,"x",group=F) # without grouping

### Specifying a reference level in R: function relevel

How to specify a specific level as a reference level in GNU R while applying (generalized) linear regression?

A dataframe dt is given:

y<-rpois(40,1)
x1<-gl(4,10)
x2<-gl(4,1,40)
dt <- data.frame(cbind(y,x1,x2))

Applying poisson regression:

m1<-glm(y~factor(x1)+factor(x2),data=dt)
summary(m1)

The output shows x1=1 and x2=1 are both reference level defined by R automatically. To switch the reference level from x1=1 to x1=2, the function relevel could be used:

m2<-glm(y~relevel(factor(dt\$x1),"2")+factor(x2),data=dt)
summary(m2)