Apan’s Notes: Mac, LaTeX, HTML, Statistics, and Programming. By apan (http://www.blogger.com/profile/11846279873979046420).
<h3>An R function: OLS/Robust scaled mass index (2018-05-21)</h3>
<h4>R code</h4>
<script src="https://gist.github.com/chenpanliao/8ef652404d024c2c4191b8f159400a37.js"></script>
<a name='more'></a>
<h4>Example</h4>
<pre class="prettyprint">library(magrittr)
library(data.table)
library(ggplot2)
library(ggpubr)
set.seed(123)
dt <-
  data.table(`Body length` = seq(1, 5, 0.5)) %>%
  .[, `Body weight` := 2 * `Body length` ^ (3 + rnorm(length(`Body length`), sd = 0.001))] %>%
  rbind(., data.table(`Body length` = 6, `Body weight` = 200)) %>%
  .[order(`Body length`), ]
dt.SMI <-
  scaledMassIndex(dt$`Body length`, dt$`Body weight`, x.0 = 2)
summary(dt.SMI)
g1 <-
  ggplot(dt, aes(`Body length`, `Body weight`)) +
  geom_point() +
  geom_line(
    aes(x, y, color = Method),
    data = attr(dt.SMI, "pred") %>% melt(., "x", c("y.ols", "y.rob"), "Method", "y")
  ) +
  scale_colour_discrete(labels = c("OLS", "Robust (M-estimation)"))
g2 <-
  ggplot(dt, aes(log10(`Body length`), log10(`Body weight`))) +
  geom_point() +
  geom_line(
    aes(log10(x), log10(y), color = Method),
    data = attr(dt.SMI, "pred") %>% melt(., "x", c("y.ols", "y.rob"), "Method", "y")
  ) +
  scale_colour_discrete(labels = c("OLS", "Robust (M-estimation)"))
g3 <-
  ggplot(melt(dt.SMI, "x", c("SMI.ols", "SMI.rob"), "Method"),
         aes(x, value)) +
  geom_point(aes(color = Method)) +
  xlab("Body length") +
  ylab("SMI (at body length = 2)") +
  scale_colour_discrete(labels = c("OLS", "Robust (M-estimation)"))
ggarrange(g1, g2, g3, ncol = 1, nrow = 3, align = "hv")</pre>
<p><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEid7lqT6ime3L0o_zQCiZfrLbDnIpNiKJo09uOwZ-HppzxDUEt19n2UBxLEhvstpDKTqMzG63YLBoHmFe2gPC_2g80uHXdHFpauC-oH48bdHsT9Kz1HuzXGAjSwwzdMUHBchIFnBWFSRu4/s1600/Rplot.png" /></p>
<h3>R plotting: a low-level bar-plot example (2015-12-19)</h3>
<p class="abstract">How do you add error bars to a bar plot in R? How do you fill bars with line shading? How do you place the legend outside the plot area? This post provides an example.</p>
<p><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhmz1RdA1oqvuMeLR_IDbkuPHCCHxbvqQupgdu6CoSNjxNKfF9xPVGB1yLz7zEg-ioLy2w1J4B6zzgUAUo7t9PsZvTfzwRg7PcwBRNBatjiYt-cv3u4DM-3GFyQ4eZULqG11wqv-A9faZ4/s800/R-bar-plot-example.png" /></p>
<p>The source code that produces the figure above follows.</p>
<script src="https://gist.github.com/chenpanliao/416255b9a4ae03e71ac0.js"></script>
<h3>What does “at the 95% confidence level, the sampling error is within ±3.1%” actually mean? (2015-06-19)</h3>
<p>If you have any interest in statistics, read on. This turned out longer than I intended.</p>
<p>The story: “A poll interviewed 1000 respondents, and 40% expressed support; at the 95% confidence level, the sampling error is within ±3.1%.” Sentences like this appear in poll reports all the time, yet I suspect many people cannot parse them, or misread them. So let us talk about what this actually means.</p>
<p>First, let us recast the story in a different setting.</p>
<p>Example: “You have a large bag containing countless balls, each either black or white. You want to know one thing: what proportion of the balls in the bag are black. You draw 1000 balls and find that 400 of them are black (40%). At the 95% confidence level, the sampling error is within ±3.1%; that is, the interval estimate of the black-ball proportion runs from 40% − 3.1% to 40% + 3.1%.”</p>
<p>If you can see that this example and the polling story are the same problem, keep reading.</p>
<p>In the example, each survey means drawing n balls (here n = 1000). The proportion of black balls in the bag, the unknown quantity we want, we will call p. Because there are far too many balls, we can never know p itself; we can only draw n balls to estimate it.</p>
<p>In statistics, we can know, before a single ball is drawn, “how accurate each survey will be.” What does “accurate” mean here? The statistical approach is to fix a probability and construct an interval, such that p falls inside the constructed interval with that probability. If the number of balls n drawn per survey is very large, this interval is easy to compute.</p>
<p>Still hard to follow? Let me put it another way. Suppose I could run the same survey 100 times; I would obtain 100 black-ball proportions and 100 intervals. I can guarantee that about 95 of those 100 intervals will cover the p that we can never know. That is the practical meaning of “at the 95% confidence level.”</p>
<p>You might ask: could we design a survey in which 99 of the 100 intervals cover p? Yes, but each interval becomes wider (from ±3.1% to ±4.1%), while the result also becomes more trustworthy (each interval now has a 99% chance of covering the true p).</p>
<p>If each survey draws more balls, every interval also becomes narrower (drawing more balls simply makes the estimate more accurate). For example, drawing 5000 balls per survey shrinks the interval from ±3.1% to ±1.4%.</p>
<p>Back to the original story. Use the same survey design (interviewing 1000 respondents each time) and repeat it many, many times (in reality, of course, it was run only once). We can guarantee that the support rate we wish to estimate will fall inside 95% of the computed intervals. In the story, this survey’s interval is 40% − 3.1% to 40% + 3.1%. We can <em>never</em> know whether this particular interval covers the true support rate p, but it very likely does, because the method guarantees that about 95 out of 100 repetitions would succeed.</p>
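<p>The arithmetic behind these numbers is the normal approximation for a sample proportion. A quick sketch in R (note that the reported ±3.1% is the conservative bound computed at p = 0.5):</p>
<pre class="prettyprint">n <- 1000; p.hat <- 0.40
qnorm(0.975) * sqrt(p.hat * (1 - p.hat) / n)  # margin at the observed 40%: about 0.030
qnorm(0.975) * sqrt(0.5 * 0.5 / n)            # conservative bound: about 0.031
qnorm(0.995) * sqrt(0.5 * 0.5 / 1000)         # 99% confidence level: about 0.041
qnorm(0.975) * sqrt(0.5 * 0.5 / 5000)         # n = 5000: about 0.014</pre>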
<p>Some people read the interval from 40% − 3.1% to 40% + 3.1% as an allowance for errors in conducting the survey, such as data-entry mistakes, refusals, or invalid responses. None of that is what a confidence interval means.</p>
<p>I will not write out the underlying math; I only wanted to convey the concept.</p>
<h3>Slides: “60 minutes for beginners of the R language” (2015-03-16)</h3>
<p class="abstract">Course slides I designed for beginners of the R language, presented on 14 March 2015 at the annual meeting of the Taiwan Ecological Research Network, held at the Lianhuachih Research Center.</p>
<iframe src="//www.slideshare.net/slideshow/embed_code/44171070" width="425" height="355" frameborder="0" marginwidth="0" marginheight="0" scrolling="no" style="border:1px solid #CCC; border-width:1px; margin-bottom:5px; max-width: 100%;" allowfullscreen> </iframe> <div style="margin-bottom:5px"> <strong> <a href="//www.slideshare.net/chenpanliao/slide-44171070" title="初學R語言的60分鐘" target="_blank">初學R語言的60分鐘</a> </strong> from <strong><a href="//www.slideshare.net/chenpanliao" target="_blank">Chen-Pan Liao</a></strong> </div>
(Location: Wucheng Village, Yuchi Township, Nantou County, Taiwan)
<h3>An introduction to statistical testing: a simple binomial example (2014-12-06)</h3>
<p>I recently replied to <a href="https://www.ptt.cc/bbs/Statistics/M.1417877983.A.554.html">a question on the PTT Statistics board: “[Question] The meaning of significance level”</a>.</p>
<blockquote>
<pre>
※ Quoting mrlee112233 (小史):
: Sorry to bother you,
: I would like to ask about the significance level.
: Is there a simpler explanation?
: I have searched online,
: but because I am not from a related field,
: I simply cannot understand what any of it says.
: Null hypothesis, type I error, type II error, etc.
: I do not understand them at all. = =
: In the paper I am reporting on,
: the author assumes alpha = 0.05 and computes with a binomial distribution,
: so I would like to ask whether there is a more accessible explanation.
: Thank you.
</pre>
</blockquote>
<p>Below is my reply; you are also welcome to read the same content in <a href="https://www.ptt.cc/bbs/Statistics/M.1417897430.A.89F.html">my original response on PTT</a>.</p>
<blockquote>
<p>I would suggest learning this chain of concepts from a textbook or a course, but I also understand that these ideas are not easy for people outside the field, so here is an example, with the most technical language left out.</p>
<p>Suppose there is a bag holding countless balls, each either black or white. You may independently draw 10 balls and record each ball’s color. Before drawing, you can set up a null hypothesis: “the ratio of black to white balls in the bag is 1:1.” Of course, you do not know whether this is true; you simply assume it for now.</p>
<p>If the null hypothesis is true, then, since you will draw 10 balls, you can compute in advance the probability of drawing 0 black balls, 1 black ball, and so on up to 10 black balls. For instance, the probability of 0 black balls is 0.0009766, of 1 black ball 0.009766, of 5 black balls 0.2461, and so on. Remember: all of this can be computed before a single ball is drawn.</p>
<p>Next, you set up a hypothesis opposed to the null, called the alternative hypothesis. Let it be “the ratio of black to white balls in the bag is not 1:1.” At this point the notion of one-tailed versus two-tailed tests appears; I will not go into it, so let us just do a two-tailed test. We also need to set a significance level, commonly 0.05.</p>
<p>Now you can draw your 10 balls. If you draw 5 black and 5 white, you will probably believe the null hypothesis. But if you draw 0 black and 10 white, or 10 black and 0 white, you will seriously doubt the null hypothesis and believe the alternative instead. Here is the question: how strongly must the outcome contradict the null hypothesis before you believe the alternative? That is what the significance level of 0.05 is for.</p>
<p>Say you actually draw 1 black ball and 9 white ones. The probability of this outcome is 0.009766 (assuming the null hypothesis is true). Three other outcomes contradict the null hypothesis at least as strongly as 1 black / 9 white: 0 black / 10 white (probability 0.0009766), 9 black / 1 white (probability 0.009766), and 10 black / 0 white (probability 0.0009766). The total probability of these four outcomes is 0.02148, which is called the p-value.</p>
<p>Because this p-value of 0.02148 is smaller than your chosen significance level, you can conclude: at the 0.05 significance level, the null hypothesis is rejected.</p>
<p>Of course, you might have guessed wrong: even when the null hypothesis is true, there is a small probability of drawing 1 black / 9 white or something more extreme. Such a wrong call is a type I error. Since we usually choose a fairly small significance level, type I errors do not happen easily.</p>
<p>Now suppose instead you draw 2 black and 8 white. This time, the p-value is the sum of the probabilities of all the following outcomes: 0 black / 10 white (0.0009766), 1 black / 9 white (0.009766), 2 black / 8 white (0.0439), 8 black / 2 white (0.0439), 9 black / 1 white (0.009766), and 10 black / 0 white (0.0009766). They add up to p-value = 0.1094, larger than the significance level. Now your conclusion is: at the 0.05 significance level, there is no evidence that the null hypothesis is wrong.</p>
<p>Again, you might have guessed wrong, because the null hypothesis might actually be false (say the true proportions are 30% black and 70% white). That kind of wrong call is a type II error. It is possible to reduce type II errors (keyword: statistical power), but I will stop here.</p>
<p>The whole story is really not complicated, although, admittedly, I have made it a bit complicated. The probability, computed under the null hypothesis, of obtaining the observed result or one deviating further from the null is called the p-value, and it is compared against the significance level.</p>
<p>The significance level thus serves as a threshold, a standard, for deciding whether we reject the null hypothesis or not.</p>
<p>I thought I could explain this chain of concepts with the simplest possible example, and it still took this many words. Let us just say I did it for the P-coins.</p>
</blockquote>
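<p>The probabilities quoted in the reply above come directly from the binomial distribution and can be checked in R:</p>
<pre class="prettyprint">dbinom(c(0, 1, 5), size = 10, prob = 0.5)  # 0.0009766 0.0097656 0.2460938
sum(dbinom(c(0, 1, 9, 10), 10, 0.5))       # p-value for 1 black / 9 white: 0.02148438
sum(dbinom(c(0, 1, 2, 8, 9, 10), 10, 0.5)) # p-value for 2 black / 8 white: 0.109375
binom.test(1, 10, p = 0.5)$p.value         # built-in equivalent: 0.02148438</pre>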
<h3>An R function: Exact one/paired sample test for mean(s) (2014-01-02)</h3>
<p class="abstract">Here I provide an <em>exact</em> one-sample or paired-sample test for mean(s): the <em>exact</em> version of the permutation test that compares a single sample against a hypothesized mean, or paired differences against a hypothesized mean difference.</p>
<p>
The algorithm in this function is based on Bryan F. J. Manly. 1997.
<em>Randomization, bootstrap and Monte Carlo methods in biology</em>. 2nd edition. pp. 91–97.
That is, all possible permutations (i.e. all possible exchanges between $x_i$ and $\mu$ in the one-sample test, or between $x_{1i}$ and $x_{2i}$ in the paired-sample test) are processed to calculate an <em>exact</em> p-value.
</p>
<h4>R code</h4>
<script src="https://gist.github.com/chenpanliao/75ecb122e3eb814176c2.js"></script>
<h4>Arguments</h4>
<p>The usage of this function <code>exactOneOrPairedSampleTest()</code> is very similar to the R built-in function <code>t.test()</code>. There are four arguments:</p>
<ul>
<li><code>x1</code>: a numeric vector specifying $x_{1i}$</li>
<li><code>x2 = NULL</code>: a numeric vector specifying $x_{2i}$ (in paired-sample case)</li>
<li><code>mu = 0</code>: a number specifying true mean or true difference $\mu$</li>
<li><code>alternative = c("t","g","l")</code>: a single character specifying $H_A$: the true mean of $x_1$, or the true mean of $x_{1i} - x_{2i}$, is not equal to, greater than, or less than $\mu$, respectively.</li>
</ul>
<h4>Example 1: one-sample test</h4>
<p>Let $x_i = \{43,67,64,64,51,53,53,26,36,48,34,48,6 \}$, $i=1 \ldots 13$:</p>
<pre class="prettyprint">> x <- c(43,67,64,64,51,53,53,26,36,48,34,48,6)</pre>
<p>Now we can call the above function <code>exactOneOrPairedSampleTest()</code> to test $H_0$: true mean of $x_i = 56$ against $H_A$: true mean of $x_i \neq 56$:</p>
<pre class="prettyprint">> test1 <- exactOneOrPairedSampleTest(x, alternative="t", mu=56)
> test1
Exact one sample test
Alternative hypothesis: true mean is not equal to 56
mean(x) - mu = -10.3846153846154
Number of total permutation = 8192
Number of rejected permutation = 364
P-value = 0.04443359</pre>
<p>The results show that:</p>
<ul>
<li>$\sum_{i=1}^{13} x_i / 13 - \mu = 45.61538 - 56 = -10.38462$;</li>
<li>Totally $8192 = 2^{13}$ permutations are processed;</li>
<li>364 permutations yield a mean difference at least as extreme as the observed one, i.e. do not support $H_0$;</li>
<li>$P = 364/2^{13} = 0.0444$.</li>
</ul>
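<p>The enumeration itself can be sketched in a few lines of base R, assuming the vector <code>x</code> defined above (floating-point ties at the observed statistic may need care):</p>
<pre class="prettyprint">d <- x - 56                                       # deviations from mu
signs <- as.matrix(expand.grid(rep(list(c(1, -1)), length(d))))
perm.means <- as.vector(signs %*% d) / length(d)  # 2^13 permuted mean deviations
mean(abs(perm.means) >= abs(mean(d)))             # two-tailed exact p; compare with 0.04443359</pre>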
<p>We can also plot the permutation distribution through the S3 method <code>hist.exactOneOrPairedSampleTest()</code>:</p>
<pre class="prettyprint">> hist(test1)</pre>
<p><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg3s_zKlxO_KPhK8qkuibo-JHDepQBPkf5XG1eyXTGRLHC7EZN8bZwWWypNf7PStL7oVvgJFUWtQQ5ZpQELCZXeJkwxTGJwdtUVr6FET3gFp6N5f-uRSu8CDvEny44D6BlWEiSN8rWBUBo/s1600/Rplot1.png"><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg3s_zKlxO_KPhK8qkuibo-JHDepQBPkf5XG1eyXTGRLHC7EZN8bZwWWypNf7PStL7oVvgJFUWtQQ5ZpQELCZXeJkwxTGJwdtUVr6FET3gFp6N5f-uRSu8CDvEny44D6BlWEiSN8rWBUBo/s1600/Rplot1.png" /></a></p>
<p>As you can see, the vertical lines at $x=\pm 10.38462$ show the boundaries for rejecting $H_0$. Note that this is a two-tailed test.</p>
<p>Finally, you may call the returned object:</p>
<pre class="prettyprint">> str(test1)
List of 14
$ x1 : num [1:13] 43 67 64 64 51 53 53 26 36 48 ...
$ x2 : NULL
$ x1.name : chr "x"
$ x2.name : chr "NULL"
$ n : int 13
$ mu : num 56
$ test.0 : num -10.4
$ is.onesample: logi TRUE
$ alternative : chr "t"
$ test.perm : num [1:8192] -10.38 -8.38 -12.08 -10.08 -11.62 ...
$ DF :'data.frame': 13 obs. of 3 variables:
..$ x1 : num [1:13] 43 67 64 64 51 53 53 26 36 48 ...
..$ mu : num [1:13] 56 56 56 56 56 56 56 56 56 56 ...
..$ diff: num [1:13] -13 11 8 8 -5 -3 -3 -30 -20 -8 ...
$ N : num 8192
$ p.value : num 0.0444
$ rejected.N : int 364
- attr(*, "class")= chr "exactOneOrPairedSampleTest"</pre>
<p>to get more details of the results if you need them.</p>
<h4>Example 2: paired-sample test</h4>
<p>Let $x_{1i} = \{92, 0,72,80,57,76,81,67,50,77,90\}$ and $x_{2i} = \{43,67,64,64,51,53,53,26,36,48,34\}$, where each $i$ is paired and $i=1 \ldots 11$:</p>
<pre class="prettyprint">> x1 <- c(92, 0,72,80,57,76,81,67,50,77,90)
> x2 <- c(43,67,64,64,51,53,53,26,36,48,34)</pre>
<p>Now we can call the function <code>exactOneOrPairedSampleTest()</code> to test $H_0$: true mean of $x_{1i} - x_{2i} \leq 10$ against $H_A$: true mean of $x_{1i} - x_{2i} > 10$:</p>
<pre class="prettyprint">> test2 <- exactOneOrPairedSampleTest(x1, x2, alternative="g", mu=10)
> # equivalent to test2 <- exactOneOrPairedSampleTest(x1 - x2, alternative="g", mu=10)
> test2
Exact paired sample test
Alternative hypothesis: means of x1 - x2 is greater than 10
mean((x1)-(x2)) - mu = 8.45454545454546
Number of total permutation = 2048
Number of rejected permutation = 445
P-value = 0.2172852 </pre>
<p>The results show that:</p>
<ul>
<li>$\sum_{i=1}^{11} (x_{1i}-x_{2i}) / 11 - 10 = 8.45455$;</li>
<li>Totally $2048 = 2^{11}$ permutations are processed;</li>
<li>445 permutations yield a mean difference at least as large as the observed one, i.e. do not support $H_0$;</li>
<li>$P = 445/2^{11} = 0.2172852$.</li>
</ul>
<p>We can also plot the permutation distribution through the S3 method <code>hist.exactOneOrPairedSampleTest()</code>:</p>
<pre class="prettyprint">> hist(test2)</pre>
<p>
<img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj16YvC8NPQ2xi6Mm8VW1S1Jxzl9dSrsVaovSI9tuGF-An0RYVkQ_80qiR9ES89HSVZWqNZjUDLNVU1oDcJWfelKNsVdJZkf6vuFajGNKXzAy_M94k8QpWcJ-WZrMYJsC5iNnxbmGSAD-k/s1600/Rplot2.png" />
</p>
<p>As you can see, the vertical line at $x=8.45455$ shows the boundary for rejecting $H_0$. Note that this is a right-tailed test.</p>
<p>Again, you may call the returned object:</p>
<pre class="prettyprint">> str(test2)
List of 14
$ x1 : num [1:11] 92 0 72 80 57 76 81 67 50 77 ...
$ x2 : num [1:11] 43 67 64 64 51 53 53 26 36 48 ...
$ x1.name : chr "x1"
$ x2.name : chr "x2"
$ n : int 11
$ mu : num 10
$ test.0 : num 8.45
$ is.onesample: logi FALSE
$ alternative : chr "g"
$ test.perm : num [1:2048] 8.45 1.36 22.45 15.36 8.82 ...
$ DF :'data.frame': 11 obs. of 3 variables:
..$ x1 : num [1:11] 92 0 72 80 57 76 81 67 50 77 ...
..$ x2 : num [1:11] 43 67 64 64 51 53 53 26 36 48 ...
..$ diff: num [1:11] 39 -77 -2 6 -4 13 18 31 4 19 ...
$ N : num 2048
$ p.value : num 0.217
$ rejected.N : int 445
- attr(*, "class")= chr "exactOneOrPairedSampleTest"</pre>
<p>to get more details of the results if you need them.</p>
<h3>An R function for bootstrap confidence intervals (2013-12-31)</h3>
<p class="abstract">Here I introduce an R function I wrote to calculate several confidence intervals of the mean, including the exact (t-based) CI, the basic bootstrap CI, the percentile bootstrap CI, and the studentized bootstrap CI.</p>
<h4>R code</h4>
<script src="https://gist.github.com/chenpanliao/d93fcbaf5117dd53aaa8.js"></script>
<h4>Arguments</h4>
<p>There are five arguments in this R function <code>bootCI()</code>:</p>
<ul>
<li><code>x</code> — a numeric vector specifying $x_i$</li>
<li><code>alpha = 0.05</code> — a numeric specifying the $\alpha$ value</li>
<li><code>alternative = c("t", "l", "g")</code> — a single character string specifying a two-sided, lower-tailed, or upper-tailed interval</li>
<li><code>B = 1999</code> — an integer specifying the number of bootstrap samples</li>
<li><code>quantileAlgorithm = 7</code> — an integer specifying the argument <code>type</code> passed to <code>quantile()</code></li>
</ul>
<h4>Examples</h4>
<p>We can define a numeric vector $x = [5\, 2\, 3\, 6\, 8\, 19\, 1.5]$ with $N=7$ elements as <code>num</code>:</p>
<pre class="prettyprint">> num <- c(5, 2, 3, 6, 8, 19, 1.5)</pre>
<p>After loading the above function, we can call the function to calculate CIs:</p>
<pre class="prettyprint">> bootCI(num)
Summary of x
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.500 2.500 5.000 6.357 7.000 19.000
CIs of mu
2.5% 97.5%
$CI.exact 0.7778728 11.936413
$CI.basic 1.7125000 9.714286
$CI.percentile 3.0000000 11.001786
$CI.studentized 2.6785623 17.669412</pre>
<p>The above results show that</p>
\[
\begin{aligned}
\text{exact CI} & = ( \hat{\mu} + t_{\frac{\alpha}{2}} \cdot \hat{se}_{\hat{\mu}} \;,\; \hat{\mu} + t_{1-\frac{\alpha}{2}} \cdot \hat{se}_{\hat{\mu}} ) \\
\text{basic CI} & = ( 2\hat{\mu} - \hat{\mu}_{1-\frac{\alpha}{2}}^{*} \;,\; 2\hat{\mu} - \hat{\mu}_{\frac{\alpha}{2}}^{*} ) \\
\text{percentile CI} & = ( \hat{\mu}_{\frac{\alpha}{2}}^* \;,\; \hat{\mu}_{1-\frac{\alpha}{2}}^* ) \\
\text{studentized CI} & = ( \hat{\mu} + t^*_{\frac{\alpha}{2}} \cdot \hat{se}_{\hat{\mu}} \;,\; \hat{\mu} + t^*_{1-\frac{\alpha}{2}} \cdot \hat{se}_{\hat{\mu}} )
\end{aligned}
\]
<p>where $\hat{\mu}$ is mean of $x$, $t_{\frac{\alpha}{2}}$ is lower $\alpha/2$ critical value for the $t$ distribution given $\text{df} = N-1$, $\hat{se}_{\hat{\mu}}$ is the standard error of the mean of $x$, $\hat{\mu}_{\frac{\alpha}{2}}^*$ is $\alpha/2$ percentile of the mean of bootstrapped $x$, and $t^*_{\frac{\alpha}{2}} = \frac{\hat{\mu^*}-\hat{\mu}}{\hat{se}^*_{\hat{\mu^*}}}$ is t-value based on bootstrapped $x$. See</p>
<ul>
<li>J. Carpenter and J. Bithell (2000). Bootstrap confidence intervals: when, which, what? A practical guide for medical statisticians. <em>Statistics in Medicine</em> 19:1141–1164.</li>
<li>Wikipedia contributors, 'Bootstrapping (statistics)', <i>Wikipedia, The Free Encyclopedia,</i> 17 December 2013, 21:28 UTC, <<a class="external free" href="//en.wikipedia.org/w/index.php?title=Bootstrapping_(statistics)&oldid=586549893">http://en.wikipedia.org/w/index.php?title=Bootstrapping_(statistics)&oldid=586549893</a>> [accessed 31 December 2013]</li>
</ul>
<p>for more details.</p>
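<p>As a cross-check, comparable intervals can be obtained from the CRAN package <code>boot</code>. A minimal sketch (bootstrap results vary slightly from run to run):</p>
<pre class="prettyprint">library(boot)
num <- c(5, 2, 3, 6, 8, 19, 1.5)
# the statistic returns the mean and its estimated variance,
# because type = "stud" needs the variance as the second element
b <- boot(num, function(d, i) c(mean(d[i]), var(d[i]) / length(i)), R = 1999)
boot.ci(b, conf = 0.95, type = c("basic", "perc", "stud"))</pre>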
<p>We can assign a different $\alpha$, tail direction, or number of bootstrap samples:</p>
<pre class="prettyprint">> bootCI(num, alpha=0.01)
Summary of x
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.500 2.500 5.000 6.357 7.000 19.000
CIs of mu
0.5% 99.5%
$CI.exact -2.0962642 14.81055
$CI.basic -0.1428571 10.28571
$CI.percentile 2.4285714 12.85714
$CI.studentized 1.1394952 25.22030
> bootCI(num, alternative="g")
Summary of x
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.500 2.500 5.000 6.357 7.000 19.000
CIs of mu
5% 100%
$CI.exact 1.926445 Inf
$CI.basic 2.714286 Inf
$CI.percentile 3.285714 Inf
$CI.studentized 3.332740 Inf
> bootCI(num, B=99)
Summary of x
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.500 2.500 5.000 6.357 7.000 19.000
CIs of mu
2.5% 97.5%
$CI.exact 0.7778728 11.936413
$CI.basic 1.6053571 9.330357
$CI.percentile 3.3839286 11.108929
$CI.studentized 2.5532298 15.532786</pre>
<p>The returned list may be useful if you need the individual components:</p>
<pre class="prettyprint">> myCI <- bootCI(num)
> myCI$CI.percentile
2.5% 97.5%
3.00000 10.85714
> str(myCI)
List of 8
$ x : num [1:7] 5 2 3 6 8 19 1.5
$ alpha : num 0.05
$ alternative : chr "t"
$ B : num 1999
$ CI.exact : Named num [1:2] 0.778 11.936
..- attr(*, "names")= chr [1:2] "2.5%" "97.5%"
$ CI.basic : Named num [1:2] 1.86 9.71
..- attr(*, "names")= chr [1:2] "2.5%" "97.5%"
$ CI.percentile : Named num [1:2] 3 10.9
..- attr(*, "names")= chr [1:2] "2.5%" "97.5%"
$ CI.studentized: Named num [1:2] 2.65 17.73
..- attr(*, "names")= chr [1:2] "2.5%" "97.5%"
- attr(*, "class")= chr "bootCI"</pre>
<h3>Paired t-test vs. general mixed model with R (2013-12-14)</h3>
<h4>Dataset</h4>
<p>Consider the following dataset:</p>
<pre class="prettyprint">> x1 <- c(3,4,3,5,6,6,6,7,7,6,8,9,11)
> x2 <- c(3,4,5,5,8,6,8,7,9,8,8,9,14)
> cbind(x1,x2)
x1 x2
[1,] 3 3
[2,] 4 4
[3,] 3 5
[4,] 5 5
[5,] 6 8
[6,] 6 6
[7,] 6 8
[8,] 7 7
[9,] 7 9
[10,] 6 8
[11,] 8 8
[12,] 9 9
[13,] 11 14</pre>
<p>where <code>x1</code> and <code>x2</code> are paired samples.
</p>
<h4>Paired <em>t</em>-test </h4>
<p>The process of a typical paired <em>t</em>-test with R is as follows:</p>
<pre class="prettyprint">> t.test(x1, x2, paired=T)
Paired t-test
data: x1 and x2
t = -3.1225, df = 12, p-value = 0.008814
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-1.6977786 -0.3022214
sample estimates:
mean of the differences
-1</pre>
<p>The results show that 1) the average of $x_1 - x_2$ equals −1; 2) $t_{12} = -3.1225$, $p = 0.008814$.</p>
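<p>Because a paired <em>t</em>-test is identical to a one-sample <em>t</em>-test on the differences, the result can be cross-checked with:</p>
<pre class="prettyprint">> t.test(x1 - x2)   # t = -3.1225, df = 12, p-value = 0.008814</pre>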
<h4>Mixed model for paired sample</h4>
<p>We can also fit a mixed model to this paired-sample example. Consider the same data rearranged in long format:</p>
<pre class="prettyprint">> y <- c(x1,x2)
> trt <- gl(2,13,26)
> subject <- gl(13,1,26)
> cbind(y, trt, subject)
y trt subject
[1,] 3 1 1
[2,] 4 1 2
[3,] 3 1 3
[4,] 5 1 4
[5,] 6 1 5
[6,] 6 1 6
[7,] 6 1 7
[8,] 7 1 8
[9,] 7 1 9
[10,] 6 1 10
[11,] 8 1 11
[12,] 9 1 12
[13,] 11 1 13
[14,] 3 2 1
[15,] 4 2 2
[16,] 5 2 3
[17,] 5 2 4
[18,] 8 2 5
[19,] 6 2 6
[20,] 8 2 7
[21,] 7 2 8
[22,] 9 2 9
[23,] 8 2 10
[24,] 8 2 11
[25,] 9 2 12
[26,] 14 2 13</pre>
<p>
where <code>y</code> is the dependent variable, <code>trt</code> is a two-level fixed factor, and <code>subject</code> identifies the experimental units, treated as blocks (a random factor). A mixed model can fit this dataset:</p>
\[
y_{ij} = \tau_i + b_{j} + \varepsilon_{ij}
\]
<p>
where $i=\{1,2\}$ indexes the two levels of the fixed factor $\tau$, $j=\{1,2,\ldots,13\}$ indexes the 13 subjects of the random factor $b$, and $\varepsilon_{ij}$ is the residual error.
</p>
<p>We can find the solution of this mixed model by using <code>lme4::lmer()</code>:</p>
<pre class="prettyprint">> require(lme4)
> mod <- lmer(y ~ trt + (1|subject) )
> summary(mod)
Linear mixed model fit by REML ['lmerMod']
Formula: y ~ trt + (1 | subject)
REML criterion at convergence: 98.5708
Random effects:
Groups Name Variance Std.Dev.
subject (Intercept) 5.8590 2.4205
Residual 0.6667 0.8165
Number of obs: 26, groups: subject, 13
Fixed effects:
Estimate Std. Error t value
(Intercept) 6.2308 0.7085 8.794
trt2 1.0000 0.3203 3.123
Correlation of Fixed Effects:
(Intr)
trt2 -0.226</pre>
<p>
As you can see, the estimate of <code>trt2</code> equals 1, the average of $x_2 - x_1$; the statistic $t=3.123$ matches, up to sign, the <em>t</em>-value obtained in the paired <em>t</em>-test.</p>
<p>We can find the <em>p</em>-value by using <code>car::Anova()</code>:</p>
<pre class="prettyprint">> require(car)
> Anova(mod, type=3, test.statistic="F")
Analysis of Deviance Table (Type III Wald F tests with Kenward-Roger df)
Response: y
F Df Df.res Pr(>F)
(Intercept) 77.34 1 13.288 6.636e-07 ***
trt 9.75 1 12.000 0.008814 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1</pre>
<p>and then we find that $F_{1,12} = 9.75 = 3.123^2$ and $p=0.008814$. These outcomes are all equivalent to the paired <em>t</em>-test results.</p>
<h3>Multiple comparisons for a generalized linear model in R (2013-11-01)</h3>
<h4>An example</h4>
<h5>Defining y and x</h5>
<pre class="prettyprint">> y <- c (rpois(10,1), rpois(10,2), rpois(10,3), rpois(10,5))
> y
[1] 0 1 0 2 1 2 0 1 1 0 3 2 2 4 2 3 3 1 1 1 3 0 4 4 2 3 1 4 3 6 3 8 6 3 2 6 5 5 2 8
> x <- gl(4,10)
> x
[1] 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4
Levels: 1 2 3 4</pre>
<h5>Fitting a Poisson regression</h5>
<pre class="prettyprint">> f <- glm(y~x, family=poisson)
> summary(f)
Call:
glm(formula = y ~ x, family = poisson)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.44949 -0.90724 0.04533 0.52699 1.52242
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.2231 0.3536 -0.631 0.527945
x2 1.0116 0.4129 2.450 0.014277 *
x3 1.3218 0.3979 3.322 0.000895 ***
x4 1.7918 0.3819 4.692 2.71e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 67.967 on 39 degrees of freedom
Residual deviance: 34.884 on 36 degrees of freedom
AIC: 142.6
Number of Fisher Scoring iterations: 5</pre>
<h5>Multiple comparisons</h5>
<pre class="prettyprint">> # install.packages("multcomp")
> require(multcomp)
> f.mc <- glht(f, linfct = mcp(x = "Tukey"))
> summary(f.mc)
Simultaneous Tests for General Linear Hypotheses
Multiple Comparisons of Means: Tukey Contrasts
Fit: glm(formula = y ~ x, family = poisson)
Linear Hypotheses:
Estimate Std. Error z value Pr(>|z|)
2 - 1 == 0 1.0116 0.4129 2.450 0.06433 .
3 - 1 == 0 1.3218 0.3979 3.322 0.00481 **
4 - 1 == 0 1.7918 0.3819 4.692 < 0.001 ***
3 - 2 == 0 0.3102 0.2807 1.105 0.67729
4 - 2 == 0 0.7802 0.2575 3.030 0.01198 *
4 - 3 == 0 0.4700 0.2327 2.019 0.17303
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Adjusted p values reported -- single-step method)
> confint(f.mc)
Simultaneous Confidence Intervals
Multiple Comparisons of Means: Tukey Contrasts
Fit: glm(formula = y ~ x, family = poisson)
Quantile = 2.5471
95% family-wise confidence level
Linear Hypotheses:
Estimate lwr upr
2 - 1 == 0 1.01160 -0.04002 2.06322
3 - 1 == 0 1.32176 0.30822 2.33529
4 - 1 == 0 1.79176 0.81905 2.76447
3 - 2 == 0 0.31015 -0.40481 1.02512
4 - 2 == 0 0.78016 0.12436 1.43596
4 - 3 == 0 0.47000 -0.12281 1.06282
> coef(f.mc)
2 - 1 3 - 1 4 - 1 3 - 2 4 - 2 4 - 3
1.0116009 1.3217558 1.7917595 0.3101549 0.7801586 0.4700036
> vcov(f.mc)
2 - 1 3 - 1 4 - 1 3 - 2 4 - 2 4 - 3
2 - 1 1.704542e-01 1.249996e-01 1.249996e-01 -4.545455e-02 -4.545455e-02 9.714451e-17
3 - 1 1.249996e-01 1.583330e-01 1.249996e-01 3.333333e-02 6.938894e-17 -3.333333e-02
4 - 1 1.249996e-01 1.249996e-01 1.458330e-01 -2.775558e-17 2.083333e-02 2.083333e-02
3 - 2 -4.545455e-02 3.333333e-02 -2.775558e-17 7.878788e-02 4.545455e-02 -3.333333e-02
4 - 2 -4.545455e-02 6.938894e-17 2.083333e-02 4.545455e-02 6.628788e-02 2.083333e-02
4 - 3 9.714451e-17 -3.333333e-02 2.083333e-02 -3.333333e-02 2.083333e-02 5.416667e-02</pre>
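<p>The pairwise comparisons can also be summarized as a compact letter display from the same <code>glht</code> object (groups sharing a letter do not differ significantly; the exact letters depend on the simulated <code>y</code>):</p>
<pre class="prettyprint">> cld(f.mc)</pre>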
<h4>Read more</h4>
<ul>
<li><a href="http://www.stat.wmich.edu/wang/664/egs/Rmice.html"><em>Simultaneous Inferences in R — Using package:multcomp</em> by J. C. Wang</a></li>
<li><a href="http://cran.r-project.org/web/packages/multcomp/vignettes/generalsiminf.pdf"><em>Simultaneous Inference in General Parametric Models</em> by T. Hothorn et al.</a></li>
</ul>apanhttp://www.blogger.com/profile/11846279873979046420noreply@blogger.com0tag:blogger.com,1999:blog-1101350361280593719.post-28110732301253602122013-08-07T10:39:00.000-07:002014-01-13T19:10:29.438-08:00在 R 中配置 Generalized Logit Model<div lang="zh-Hant">
<p>The data file <code>data.csv</code> is as follows:</p>
<pre class="prettyprint">treat, r.a, r.b, r.c
t1, 10, 20, 5
t2, 11, 22, 5
t3, 21, 40, 11
t4, 21, 5, 10</pre>
<p>where <code>treat</code> is a fixed factor with 4 levels, and <code>r.a</code>, <code>r.b</code>, and <code>r.c</code> are 3 response categories, with their frequencies below.</p>
<p>The R code follows:</p>
<pre class="prettyprint"># install package nnet on first use
install.packages("nnet")
# load package nnet
require(nnet)
# read the data
dat <- read.csv("data.csv")
# fit the generalized logit model
m <- multinom(
  cbind(r.a, r.b, r.c) ~ treat, # listing r.a first makes it the baseline category
  data=dat,
  contrasts=list(treat=contr.treatment), # treatment contrasts ease interpretation
  Hess=T # use the Hessian matrix to obtain standard errors
)
# coefficients and standard errors
(s.m <- summary(m))
# odds ratio
exp(s.m$coefficients)
# chi-square
(chisq <- (s.m$coefficients / s.m$standard.error)^2)
# p-value
pchisq(chisq, df=1, lower.tail=F)</pre>
<p>This produces the following results:</p>
<pre class="prettyprint">> (s.m <- summary(m))
Call:
multinom(formula = cbind(r.a, r.b, r.c) ~ treat, data = dat,
contrasts = list(treat = contr.treatment), Hess = T)
Coefficients:
(Intercept) treat2 treat3 treat4
r.b 0.6931308 2.158365e-05 -0.04877868 <strong>-2.12818025</strong>
r.c -0.6931750 -9.525513e-02 0.04656602 -0.04876199
Std. Errors:
(Intercept) treat2 treat3 treat4
r.b 0.3872968 0.5351291 0.4718241 0.6305649
r.c 0.5477240 0.7687048 0.6622145 0.6690451
Residual Deviance: 346.6049
AIC: 362.6049
> # odds ratio
> exp(s.m$coefficients)
(Intercept) treat2 treat3 treat4
r.b 1.9999671 1.000022 0.9523919 <strong>0.1190537</strong>
r.c 0.4999861 0.909141 1.0476672 0.9524078
> # chi-square
> (chisq <- (s.m$coefficients / s.m$standard.error)^2)
(Intercept) treat2 treat3 treat4
r.b 3.202894 1.626795e-09 0.010688081 <strong>11.390878935</strong>
r.c 1.601630 1.535527e-02 0.004944712 0.005311928
> # p-value
> pchisq(chisq, df=1, lower.tail=F)
(Intercept) treat2 treat3 treat4
r.b 0.07350811 0.9999678 0.9176589 <strong>0.000738056</strong>
r.c 0.20567234 0.9013815 0.9439400 0.941899243</pre>
<p>The only significant coefficient is for <code>r.b</code> versus <code>r.a</code> under <code>treat4</code> versus <code>treat1</code>: the coefficient is $\beta = -2.12818025$, the odds ratio is $\exp(-2.12818025) = 0.1190537$, and the test statistic is $\chi^2_1 = 11.390878935$ with $p = 0.000738056$, showing that the <code>r.b / r.a</code> odds differ significantly between <code>treat4</code> and <code>treat1</code>. The observed odds ratio $\frac{5/21}{20/10} = 0.1190476$ is also very close to the model-predicted 0.1190537.</p>
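<p>The observed odds ratio quoted above can also be verified by hand from the frequencies in <code>data.csv</code>:</p>
<pre class="prettyprint"># odds of r.b versus r.a under treat4, relative to treat1
(5 / 21) / (20 / 10)  # 0.1190476
exp(-2.12818025)      # model-based odds ratio: 0.1190537</pre>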
</div>apanhttp://www.blogger.com/profile/11846279873979046420noreply@blogger.com0tag:blogger.com,1999:blog-1101350361280593719.post-76218378329517754062013-06-05T20:41:00.002-07:002014-10-17T12:48:00.438-07:00G-test for 2-way contingency data in R<p class="abstract">An R function that applies the <em>G</em>-test to 2-way contingency data is demonstrated. Yates’ correction and Williams’ correction are also included in the output.</p>
<h4>R Source</h4>
<script src="https://gist.github.com/chenpanliao/90a3062b5e54347768d3.js"></script>
<!--
<pre class="prettyprint"># Copyright 2013 Chen-Pan Liao <andrew.43@gmail.com>
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
g.test.2way <- function(
x # must be a 2-way xtabs/matrix
)
{
## initiating
data.o <- x
sum.total <- sum(x)
sum.row <- rowSums(x)
sum.col <- colSums(x)
n.row <- dim(x)[1]
n.col <- dim(x)[2]
chisq.df <- (n.col - 1) * (n.row - 1)
## check
if(n.row == 1) {stop("Numbers of row must be larger than 1")}
if(n.col == 1) {stop("Numbers of column must be larger than 1")}
if(mode(data.o) != "numeric") {stop("Mode of x must be numeric")}
if(
!inherits(data.o, c("matrix", "xtabs"))
) {
stop("Class of x must be matrix or xtabs")
}
## matrix of expected value
data.e <- matrix(NA, ncol = n.col, nrow = n.row)
for (m in 1:n.row){
for (n in 1:n.col){
data.e[m,n] <- sum.col[n] * sum.row[m] / sum.total
}
}
## matrix of observed values with Yates' correction
data.o.y <- matrix(NA, ncol = n.col, nrow = n.row)
for (m in 1:n.row){
for (n in 1:n.col){
if (data.e[m,n] - data.o[m,n] > 0){
data.o.y[m,n] <- data.o[m,n] + 0.5
} else if (data.e[m,n] - data.o[m,n] < 0){
data.o.y[m,n] <- data.o[m,n] - 0.5
} else {
data.o.y[m,n] <- data.o[m,n]
}
}
}
## q_min for Williams' correction
q.min <- 1 +
(sum.total * sum(1 / sum.col) - 1) *
(sum.total * sum(1 / sum.row) - 1) /
(6 * (chisq.df) * sum.total)
## G-value
isnot.na.index <- !is.na(data.o * log(data.o / data.e))
g <- 2 * sum((data.o * log(data.o / data.e))[isnot.na.index])
isnot.na.index.y <- !is.na(data.o.y * log(data.o.y / data.e))
g.y <- 2 * sum((data.o.y * log(data.o.y / data.e))[isnot.na.index.y])
g.w <- g / q.min
## P-value
p.g <- 1 - pchisq(g, chisq.df)
p.g.y <- 1 - pchisq(g.y, chisq.df)
p.g.w <- 1 - pchisq(g.w, chisq.df)
## return
z <- list(
data.observed = data.o,
data.expected = data.e,
data.observed.Yates = data.o.y,
df = chisq.df,
q.min = q.min,
g = g,
g.Yates = g.y,
g.Williams = g.w,
p = p.g,
p.Yates = p.g.y,
p.Williams = p.g.w
)
class(z) <- "g.test.2way"
return(z)
}
## print function
print.g.test.2way <- function(x){
cat("Observed value:\n")
print(x$data.observed)
cat("Expected value:\n")
print(x$data.expected)
cat("Observed value with Yates' correction:\n")
print(x$data.observed.Yates)
cat("Degree of freedom =", x$df, "\n")
cat(sprintf("G-test:\n\tG = %g,\tp = %g\n", x$g, x$p))
cat(sprintf(
"G-test with Yates' correction:\n\tG = %g,\tp = %g\n",
x$g.Yates, x$p.Yates
))
cat(sprintf(
"G-test with Williams' correction:\n\tG = %g,\tq_min = %g,\tp = %g\n",
x$g.Williams, x$q.min, x$p.Williams
))
}</pre>
-->
<h4>Example</h4>
<h5>R code</h5>
<pre class="prettyprint">my.2way.contingency.table <- matrix(
c(
5,0,3,12,6,
4,2,7,23,11
),
nrow = 2, ncol=5, byrow=T
)
g.test.result <- g.test.2way(my.2way.contingency.table)
g.test.result
str(g.test.result)</pre>
<h5>R output</h5>
<pre class="prettyprint">> my.2way.contingency.table <- matrix(
+ c(
+ 5,0,3,12,6,
+ 4,2,7,23,11
+ ),
+ nrow = 2, ncol=5, byrow=T
+ )
>
> g.test.result <- g.test.2way(my.2way.contingency.table)
> g.test.result
Observed value:
[,1] [,2] [,3] [,4] [,5]
[1,] 5 0 3 12 6
[2,] 4 2 7 23 11
Expected value:
[,1] [,2] [,3] [,4] [,5]
[1,] 3.205479 0.7123288 3.561644 12.46575 6.054795
[2,] 5.794521 1.2876712 6.438356 22.53425 10.945205
Observed value with Yates' correction:
[,1] [,2] [,3] [,4] [,5]
[1,] 4.5 0.5 3.5 12.5 6.5
[2,] 4.5 1.5 6.5 22.5 10.5
Degree of freedom = 4
G-test:
G = 3.41127, p = 0.491497
G-test with Yates' correction:
G = 1.28744, p = 0.863503
G-test with Williams' correction:
G = 3.07349, q_min = 1.1099, p = 0.545603
> str(g.test.result)
List of 11
$ data.observed : num [1:2, 1:5] 5 4 0 2 3 7 12 23 6 11
$ data.expected : num [1:2, 1:5] 3.205 5.795 0.712 1.288 3.562 ...
$ data.observed.Yates: num [1:2, 1:5] 4.5 4.5 0.5 1.5 3.5 6.5 12.5 22.5 6.5 10.5
$ df : num 4
$ q.min : num 1.11
$ g : num 3.41
$ g.Yates : num 1.29
$ g.Williams : num 3.07
$ p : num 0.491
$ p.Yates : num 0.864
$ p.Williams : num 0.546
- attr(*, "class")= chr "g.test.2way"</pre>apanhttp://www.blogger.com/profile/11846279873979046420noreply@blogger.com0tag:blogger.com,1999:blog-1101350361280593719.post-26997947244585020212013-03-12T10:37:00.000-07:002014-10-17T12:56:43.833-07:00Convert taxa matrix to guild matrix with R<p class="abstract">在 R 中如何把分類群資料小計成為功能群資料? 可利用以下介紹的 R function taxaToGuild() 將一分類群資料檔及一分類群對應功能群資料檔轉化出一個功能群資料框. 一般來說, 該 function 也提供一個資料框資料以系統化的方式進行欄小計.</p>
<h4>taxaToGuild.R source</h4>
<script src="https://gist.github.com/chenpanliao/db0aa37e2b24863eef85.js"></script>
<!--
<pre class="prettyprint"># Copyright 2013 Chen-Pan Liao <andrew.43@gmail.com>
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
############################################
## taxaToGuild.r
## Author: Chen-Pan Liao (2013)
## License: Public domain
## Environment: R (ver. 2.15+)
## Usage:
## new.dataframe <- taxaToGuild(
## taxa.file.csv,
## taxa.file.csv.row.num,
## guild.file.csv,
## guild.file.csv.row.num,
## guild.system.row.num
## )
############################################
taxaToGuild <- function (
taxa.file.csv,
taxa.file.csv.row.num,
guild.file.csv,
guild.file.csv.row.num,
guild.system.row.num
# taxa.file.csv = "taxa.csv",
# taxa.file.csv.row.num = 1,
# guild.file.csv = "guild.csv",
# guild.file.csv.row.num = 1,
# guild.system.row.num = 1
){
guild.data <- read.csv(guild.file.csv, row.names = guild.file.csv.row.num)
taxa.data <- read.csv("taxa.csv", row.names = taxa.file.csv.row.num)
guild.list.taxa <- row.names(guild.data)
guild.list.guild <- guild.data[, guild.system.row.num]
taxa.list <- names(taxa.data)
taxa.list.ind <- numeric(length(taxa.list))
taxa.unknown <- 0
for(n in 1:length(taxa.list)){
if (sum(taxa.list[n] == guild.list.taxa) > 0) {
taxa.list.ind[n] <- which(taxa.list[n] == guild.list.taxa)
} else {
taxa.list.ind[n] <- NA
taxa.unknown[length(taxa.unknown) + 1] <- n
}
}
new.guild.list <- unique(guild.list.guild[taxa.list.ind])
taxa.unknown <- taxa.unknown[-1]
#new.taxa.data <- taxa.data
#names(new.taxa.data) <- guild.list.guild[taxa.list.ind]
## new taxa matrix initiation
newer.taxa.data <- data.frame (
matrix(
0,
nrow = nrow(taxa.data),
ncol = length(new.guild.list)
),
row.names = row.names(taxa.data)
)
names(newer.taxa.data) <- new.guild.list
## subsum of new taxa matrix
for(n in 1:length(new.guild.list)){
if (!is.na(new.guild.list[n])) {
newer.taxa.data[,n] <- rowSums(
data.frame(
taxa.data[
,
which(new.guild.list[n] == guild.list.guild[taxa.list.ind])
]
)
)
} else {
newer.taxa.data[,n] <- rowSums(
data.frame(
taxa.data[ , taxa.unknown]
)
)
}
}
cat(
sprintf(
"The guild system is %s.\n",
names(guild.data)[guild.system.row.num]
)
)
cat(
sprintf(
"Warning: the following taxa is merged into guild NA due to no reference: %s.\n",
taxa.list[taxa.unknown]
)
)
return(newer.taxa.data)
}</pre>
-->
<h4>Example</h4>
<p>Suppose we have spider species-count data classified by family, stored as a CSV file (taxa.csv):</p>
<pre>"Plot","Leptonetidae","Clubionidae","Others","Araneidae","Oonopidae","Ctenizidae"
1,2,0,0,1,1,2
2,1,2,3,1,0,0
3,1,0,1,1,1,1
4,2,3,2,1,0,1
5,0,1,1,2,1,4
6,2,0,0,0,1,3
7,1,0,1,2,2,1
8,0,0,1,2,3,1</pre>
<p>The first row holds the variable names and the first column holds identifiable plot names. A second CSV file (guild.csv) maps family names to guilds:</p>
<pre>"taxa","Guild1","Guild2","Guild3"
"Oonopidae","Ground runner","runner","Ground"
"Araneidae","Orb weaver","weaver","Space"
"Clubionidae","Foliage runner","runner","Foliage"
"Ctenidae","Ground runner","runner","Ground"
"Ctenizidae","Burrow dweller","dweller","Burrow"
"Gnaphosidae","Ground runner","runner","Ground"
"Leptonetidae","Ground weaver","weaver","Ground"
"Linyphiidae","Space weaver","weaver","Space"</pre>
<p>The first row holds the variable names, the first column holds identifiable family names, and columns 2–4 give three different guild systems. After placing taxa.csv, guild.csv, and taxaToGuild.r in the same directory, run <code>setwd("the directory")</code> in R and then do the following.</p>
<pre class="prettyprint">source("taxaToGuild.r")
new.dataframe <- taxaToGuild(
taxa.file.csv = "taxa.csv",
taxa.file.csv.row.num = 1,
guild.file.csv = "guild.csv",
guild.file.csv.row.num = 1,
guild.system.row.num = 1
)
new.dataframe</pre>
<p>Here <code>taxa.file.csv = "taxa.csv"</code> imports taxa.csv as the taxon data; <code>taxa.file.csv.row.num = 1</code> says the first column of taxa.csv contains the row names; <code>guild.file.csv = "guild.csv"</code> imports guild.csv as the guild lookup table; <code>guild.file.csv.row.num = 1</code> says the first column of guild.csv contains the row names; and <code>guild.system.row.num = 1</code> selects the first guild system in guild.csv (which in this example is actually its second column). The resulting guild data frame is assigned to <code>new.dataframe</code>. Running the above gives the following results.</p>
<pre class="prettyprint">> source("taxaToGuild.r")
> new.dataframe <- taxaToGuild(
+ taxa.file.csv = "taxa.csv",
+ taxa.file.csv.row.num = 1,
+ guild.file.csv = "guild.csv",
+ guild.file.csv.row.num = 1,
+ guild.system.row.num = 1
+ )
The guild system is Guild1.
Warning: the following taxa is merged into guild NA due to no reference: Others.
> new.dataframe
Ground weaver Foliage runner NA Orb weaver Ground runner Burrow dweller
1 2 0 0 1 1 2
2 1 2 3 1 0 0
3 1 0 1 1 1 1
4 2 3 2 1 0 1
5 0 1 1 2 1 4
6 2 0 0 0 1 3
7 1 0 1 2 2 1
8 0 0 1 2 3 1</pre>
<p>Note that because "Others" cannot be matched to any guild through the "Guild1" lookup, it is dumped into the "NA" column of new.dataframe. To drop the "NA" column in this example, use <code>new.dataframe[,-3]</code>.</p>apanhttp://www.blogger.com/profile/11846279873979046420noreply@blogger.com0tag:blogger.com,1999:blog-1101350361280593719.post-31449019926558187482012-12-12T00:51:00.000-08:002014-01-02T09:47:49.675-08:00One-dimensional interpolation with R<p>The functions <code>approx()</code> and <code>spline()</code> provide one-dimensional linear and non-linear interpolation in R, respectively. Their syntaxes are similar, for example,</p>
<pre class="prettyprint">x.old <- 1:10
y.old <- rnorm(10)
x.new <- seq(1, 10, 0.001)
# Linear interpolation
y.new.linear <- approx(x.old, y.old, xout = x.new)$y
# Forsythe, Malcolm and Moler's spline (default of spline)
y.new.fmm <- spline(x.old, y.old, xout = x.new, method = "fmm")$y
# Natural splines
y.new.natural <-
spline(x.old, y.old, xout = x.new, method = "natural")$y
# Periodic splines
y.new.periodic <-
spline(x.old, y.old, xout = x.new, method = "periodic")$y
# Plot
plot(
y.new.fmm ~ x.new,
type="l", col=2, xlab="x", ylab="y",
main="One-dimensional Interpolation with R"
)
lines(y.new.periodic ~ x.new, col=3)
lines(y.new.natural ~ x.new, col=4)
lines(y.new.linear ~ x.new, col=5)
points(y.old ~ x.old)
legend(8, 2, c("fmm", "periodic", "natural", "linear", "old"),
col = c(2:5,1), lty = c(1,1,1,1,0), pch = c(NA,NA,NA,NA,1), merge = T
)</pre>
<p>
<a href="https://picasaweb.google.com/lh/photo/9pQ0rK6V2w_L0C58venZCJGnzz1qy_nbzwK82mACLgY?feat=directlink"><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjr-ZHffz_Tc6EqZmd5V29IvH4qc5f49lZ_puMzbjCGmlWtVRKxPE5AXb3kis_wVOcw3eDpD3f_bwkeqPgr0Re_agz0Pghl-KYWSRBo96v7YVQuBy9qT55DmU0uXaZ65u_i8FoFqDMRR_8/s800/one-dimensional-interpolation-with-R.png" alt="One-dimensional Interpolation with R" style="max-width:680px" /></a>
</p>
apanhttp://www.blogger.com/profile/11846279873979046420noreply@blogger.com0tag:blogger.com,1999:blog-1101350361280593719.post-65395641143161523372012-05-27T11:13:00.001-07:002014-10-17T13:20:53.850-07:00Nested ANOVA permutation test in R<p class="abstract">An R function for a permutation test for one-way nested-design ANOVA.</p>
<p>
This work is inspired by Dr. David C. Howell, University of Vermont.
In Dr. Howell’s webpage “<a href="http://www.uvm.edu/~dhowell/StatPages/More_Stuff/Permutation%20Anova/PermTestsAnova.html">Permutation Tests for Factorial ANOVA Designs</a>” (http://www.uvm.edu/~dhowell/StatPages/More_Stuff/Permutation%20Anova/PermTestsAnova.html [access on May 28, 2012]), he describes the algorithm of a permutation test for a one-way nested-design ANOVA using the R language.
I followed this algorithm and wrote an R function to do this test.
</p>
<script src="https://gist.github.com/chenpanliao/42d8ade2461e67f17037.js"></script>
<!--
<pre class="prettyprint"># Copyright 2012 Chen-Pan Liao <andrew.43@gmail.com>
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
# Author: Chen-Pan Liao
# Date: May 28, 2012
#
# This work is inspirated by Dr. David C. Howell, University of Vermont.
# See Dr. Howell's webpage to learn more:
# http://www.uvm.edu/~dhowell/StatPages/More_Stuff/
# Permutation%20Anova/PermTestsAnova.html
nestedPermutationAnova <- function(
dv, # Dependent variable
treatment, # Treatment
subject, # Experimental unit
nreps = 1999 # Number of permutation
) {
# initiation
dv <- as.numeric(dv)
treatment <- as.factor(treatment)
subject <- as.factor(subject)
nreps <- as.integer(nreps)
subject <- as.factor(paste(subject,treatment,sep="_in_"))
# rename subject as "subject_in_treatment"
treatment.name <- levels(treatment)
subject.name <- levels(subject)
treatment.n <- nlevels(treatment)
subject.n <- nlevels(subject)
f.treatment <- numeric(nreps)
f.subject <- numeric(nreps);
# original model
model.0 <- summary(aov(dv ~ treatment / subject))
f.subject.0 <- model.0[[1]][1,4]
f.treatment.0 <- model.0[[1]][1,3] / model.0[[1]][2,3]
# permutation
for (i in 1:nreps) {
# permutation for effect of subject
dv.subject <- NULL
for (j in 1:treatment.n){
dv.subject <- c(
dv.subject,
sample(dv[ treatment == treatment.name[j] ])
)
}
model <- summary(aov(dv.subject ~ treatment / subject))
f.subject[i] <- model[[1]][1,4]
# permutation for effect of treatment
dv.treatment <- NULL
subject.name.reorder <- sample(subject.name)
for (j in 1:subject.n){
dv.treatment <- c(
dv.treatment,
dv[ subject == subject.name.reorder[j] ]
)
}
model <- summary(aov(dv.treatment ~ treatment / subject))
f.treatment[i] <- model[[1]][1,3] / model[[1]][2,3]
}
# p-value
p.treatment <- (sum(f.treatment >= f.treatment.0)+1) / (nreps+1)
p.subject <- (sum(f.subject >= f.subject.0)+1) / (nreps+1)
# plot
par(mfrow = c(1,2))
hist(f.treatment, breaks=40, xlab="Sudo f-value for treatment", main="")
abline(v=f.treatment.0, lty=2)
hist(f.subject, breaks=40, xlab="Sudo f-value for subject", main="")
abline(v=f.subject.0, lty=2)
# return
return (
list(
p.treatment = p.treatment,
p.subject = p.subject,
nreps = nreps
)
)
}</pre>
-->
<h4>Example</h4>
<pre class="prettyprint">dat <- read.csv(textConnection("
trt , unit , obs
1 , 1 , 11
1 , 1 , 9
1 , 1 , 9
1 , 2 , 8
1 , 2 , 7
1 , 2 , 6
1 , 3 , 8
1 , 3 , 10
1 , 3 , 11
2 , 4 , 11
2 , 4 , 8
2 , 4 , 7
2 , 5 , 10
2 , 5 , 14
2 , 5 , 12
2 , 6 , 9
2 , 6 , 10
2 , 6 , 8
"))
## traditional nested ANOVA
mod.2 <- aov(
obs ~ factor(trt) + Error(factor(unit)),
data = dat
)
summary(mod.2) # alternative
## permutation nested ANOVA
nestedPermutationAnova(dat$obs, dat$trt, dat$unit, 499)</pre>
apanhttp://www.blogger.com/profile/11846279873979046420noreply@blogger.com0tag:blogger.com,1999:blog-1101350361280593719.post-20512277524070718312012-05-10T03:44:00.002-07:002013-03-17T11:15:39.666-07:00Logistic regression with package rms in R<p class="abstract"></p>
<p>The <a href="http://cran.r-project.org/web/packages/rms/index.html"><code>rms</code></a> package by Frank E Harrell Jr provides a useful function <code>lrm</code> to apply logistic regression. Moreover, this function also provides many useful statistics, and a gooeness-of-fit test (le Cessie and Houwelingen test) could be applied by using the function <code>residuals(a lrm model, "gof")</code>. Read the <a href="http://cran.r-project.org/web/packages/rms/rms.pdf">official document of package <code>rms</code></a> for more details.</p>apanhttp://www.blogger.com/profile/11846279873979046420noreply@blogger.com0tag:blogger.com,1999:blog-1101350361280593719.post-50370455270860384962012-03-18T12:02:00.001-07:002014-01-13T19:10:43.210-08:00淺談貝氏定理<div lang="zh-Hant">
<p class="abstract">本文將介紹貝氏定理 (Bayes’ theorem) 的推導與實例。</p>
<h4>Derivation</h4>
<p>
The definition of conditional probability is
\[ \Pr(A|B)= \frac{\Pr(A \cap B)}{\Pr(B)}, \]
where $\Pr(A|B)$ is the probability that event $A$ occurs given that event $B$ has occurred, and $\Pr(A \cap B)$ is the probability that both $A$ and $B$ occur. Of course, $A \cap B$ and $B \cap A$ are completely synonymous, so we also have
\[ \Pr(B|A)=\frac{\Pr(B \cap A)}{\Pr(A)}=\frac{\Pr(A \cap B)}{\Pr(A)}. \]
</p>
<p>
Something interesting follows. Isolating $\Pr(A \cap B)$ in the two equations above shows that
\[ \Pr(A \cap B) = \Pr(A|B) \times \Pr(B) = \Pr(B|A) \times \Pr(A), \]
from which
\[ \Pr(A|B) = \frac{\Pr(B|A) \times \Pr(A)}{\Pr(B)}, \]
known as Bayes’ theorem. In practice, the denominator is often rewritten with the law of total probability, giving the alternative form
\[ \Pr(A|B) = \frac{\Pr(B|A) \times \Pr(A)}{\Pr(B|A)\times \Pr(A) + \Pr(B|\lnot A)\times \Pr(\lnot A) }, \]
where $\lnot A$ denotes “not $A$”.
</p>
<h4>Worked example</h4>
<p>
<a href="http://en.wikipedia.org/w/index.php?title=Bayes%27_theorem&oldid=482244832">Wikipedia</a> describes a famous “drug testing” example of Bayes’ theorem, translated and slightly modified here. A drug test has 99% sensitivity and 97% specificity: a drug user tests positive with probability 99%, and a non-user tests negative with probability 97%. Investigators know that the overall drug-use rate in a certain population is 0.1%. Given that a member of this population tests positive, what is the probability that this person uses the drug?
</p>
<p>
Let $D$ be the event of drug use and $P$ a positive test result. From the problem, $\Pr(P|D)=0.99$, $\Pr(\lnot P|\lnot D)=0.97$, and $\Pr(D)=0.001$. By Bayes’ theorem,
\[
\begin{aligned}
\Pr(D|P) =& \frac{\Pr(P|D) \times \Pr(D)}{\Pr(P)} \\
=& \frac{\Pr(P|D) \times \Pr(D)}{\Pr(P|D) \times \Pr(D) + \Pr(P|\lnot D) \times \Pr(\lnot D)} \\
=& \frac{\Pr(P|D) \times \Pr(D)}{\Pr(P|D) \times \Pr(D) + [1-\Pr(\lnot P|\lnot D)] \times [1-\Pr(D)]} \\
=& \frac{0.99 \times 0.001}{0.99 \times 0.001 + (1-0.97) \times (1-0.001)} \\
\simeq& 0.032
\end{aligned}
\]
which is the answer: a person who tests positive has only a 3.2% probability of actually using the drug, even though the test is 99% sensitive. Is this result far from what you expected?
</p>
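The arithmetic can be verified with a few lines of R (my own check, using exactly the numbers given in the problem):

```r
# Sensitivity, specificity, and prevalence from the drug-test example
p.pos.given.d    <- 0.99   # Pr(P | D)
p.neg.given.notd <- 0.97   # Pr(not P | not D)
p.d              <- 0.001  # Pr(D)
# Bayes' theorem with the total-probability denominator
posterior <- (p.pos.given.d * p.d) /
  (p.pos.given.d * p.d + (1 - p.neg.given.notd) * (1 - p.d))
round(posterior, 3)  # 0.032
```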
<p>
Building on Bayes’ theorem, statisticians have developed a whole new branch of statistics called Bayesian inference. Its core idea is how, given the data at hand, to infer the population from prior knowledge. In the example, the prior knowledge $\Pr(D)$ is called the prior, $\Pr(D|P)$ the posterior, and $\Pr(P|D) / \Pr(P)$ the standardised likelihood. Bayes’ theorem is therefore often written as
\[ \text{posterior} = \text{standardised likelihood} \times \text{prior}. \]
When the posterior is hard to obtain directly, a deeper understanding of the prior lets us infer the posterior with more confidence, which is an important weapon of Bayesian inference.
</p>
</div>apanhttp://www.blogger.com/profile/11846279873979046420noreply@blogger.com0tag:blogger.com,1999:blog-1101350361280593719.post-32886578997307025862011-12-10T01:13:00.001-08:002013-03-30T19:56:06.016-07:00Create an NA matrix in R
<p><code>matrix(data=NA, nrow=3, ncol=5)</code> can create/initiate a 3 × 5 NA matrix on the fly. Of course, <code>matrix(data=0, nrow=3, ncol=5)</code> may be useful in case.</p>apanhttp://www.blogger.com/profile/11846279873979046420noreply@blogger.com0tag:blogger.com,1999:blog-1101350361280593719.post-69945293940994756782011-10-30T16:56:00.000-07:002014-01-07T04:21:08.128-08:00Statistical computing by using R
<p>
I recently rewrote all the SAS code in the undergraduate biostatistics lab handouts of the Department of Life Science, Tunghai University, into R code. The handout shows how to compute common statistical tests in R, with worked examples and R code. The document is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-sa/3.0/">Creative Commons Attribution-ShareAlike 3.0 Unported License</a>.</p>
<p>
<a href="http://www.scribd.com/doc/70943527/Statistical-computing-by-using-R" title="View Statistical computing by using R on Scribd">View Statistical computing by using R on Scribd</a></p>
<p>
<iframe src="https://docs.google.com/file/d/0B0E2FRIvjDDoOU1VUkFVb1BXRVU/preview" width="640" height="480"></iframe>
</p>
<!--
<iframe class="scribd_iframe_embed" src="//www.scribd.com/embeds/70943527/content?start_page=1&view_mode=scroll&access_key=key-2lmdoe4endtkj6rg0wuz&show_recommendations=false" data-auto-height="false" data-aspect-ratio="0.708006279434851" scrolling="no" id="doc_98036" width="400" height="600"></iframe>
<p>
<iframe class="scribd_iframe_embed" data-aspect-ratio="0.6" data-auto-height="true" id="doc_49569" scrolling="no" src="http://www.scribd.com/embeds/70943527/content?start_page=1&view_mode=list&access_key=key-2lmdoe4endtkj6rg0wuz" width="100%"></iframe></p>
<script type="text/javascript">
(function() { var scribd = document.createElement("script"); scribd.type = "text/javascript"; scribd.async = true; scribd.src = "http://www.scribd.com/javascripts/embed_code/inject.js"; var s = document.getElementsByTagName("script")[0]; s.parentNode.insertBefore(scribd, s); })();
</script>
-->apanhttp://www.blogger.com/profile/11846279873979046420noreply@blogger.com0Taichung City, Taiwan24.2332076 120.941736823.7698726 120.3100228 24.6965426 121.5734508tag:blogger.com,1999:blog-1101350361280593719.post-54581676482977745962011-08-04T05:57:00.000-07:002014-01-29T00:59:33.310-08:00A function of Pearson/Deviance goodness-of-fit for (generalized) linear regression<!-- <p><a href="http://zh.wikipedia.org/wiki/File:Rlogo.png"><img src="http://upload.wikimedia.org/wikipedia/commons/c/c1/Rlogo.png" alt="GNU R logo" class="rightImage" /></a></p> -->
<p class="abstract">After an analysis of (generalized) linear regression, a Pearson/Deviance goodness-of-fit test is useful to test if the model fitted reasonably. I wrote a simple function in R to do it.</p>
<h4>Source Code</h4>
<pre class="prettyprint"># Copyright 2011 Chen-Pan Liao <andrew.43@gmail.com>
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
# Usage example:
# <em>model.name</em> <- glm (...)
# lm.fit.test(<em>model.name</em>)
lm.fit.test <- function (my.model) {
df <- my.model$df.residual;
chisq.pearson <- sum(resid(my.model , type="pearson")^2);
chisq.deviance <- sum(resid(my.model , type="deviance")^2);
p.pearson <- pchisq( chisq.pearson , df , lower.tail=F );
p.deviance <- pchisq( chisq.deviance , df , lower.tail=F );
ratio.pearson <- chisq.pearson / df;
ratio.deviance <- chisq.deviance / df;
cat(
"Pearson chisq = " , chisq.pearson ,
", df = " , df ,
", p = " , p.pearson ,
", chisq / df = " , chisq.pearson/df ,
".\n" ,
"Deviance chisq = " , chisq.deviance ,
", df = " , df ,
", p = " , p.deviance ,
", chisq / df = " , chisq.deviance/df ,
".\n"
);
}</pre>
<h4>Usage and example</h4>
<p>After defining a glm() model under a custom name, copy the source code into the R interpreter and then enter <code>lm.fit.test(your.model.name)</code>, where <code>your.model.name</code> is the custom glm model name. See the following example.</p>
<pre class="prettyprint">y <- c(rpois(20,1) , rpois(20,2));
x <- gl(2 , 20);
my.model <- glm(y~x , family=poisson);
summary(my.model);
lm.fit.test(my.model);</pre>
apanhttp://www.blogger.com/profile/11846279873979046420noreply@blogger.com0tag:blogger.com,1999:blog-1101350361280593719.post-81588953733253986832011-07-07T08:56:00.001-07:002013-03-30T19:58:12.027-07:00Games-Howell post-hoc in R
<p>
Mr. <a href="http://aoki2.si.gunma-u.ac.jp/Aoki/Aoki.html">Shigenobu AOKI</a> presents a quick R function for both the Tukey and Games-Howell post-hoc comparisons after an ANOVA. The Games-Howell post-hoc test is well known for handling unequal variances between treatments. Read more on <a href="http://aoki2.si.gunma-u.ac.jp/R/tukey.html">the webpage about Games-Howell post-hoc in R</a>.
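</p>
<p>
If the linked page is unavailable, the statistic itself is simple enough to sketch by hand: each pair of groups is compared via $q = |\bar{y}_i - \bar{y}_j| / \sqrt{(s_i^2/n_i + s_j^2/n_j)/2}$ against the studentized range distribution with Welch-Satterthwaite degrees of freedom. The following is my own toy illustration (not Mr. Aoki's function):
</p>

```r
# Minimal Games-Howell sketch. y: numeric response; g: grouping factor.
gamesHowell <- function(y, g) {
  g  <- factor(g)
  k  <- nlevels(g)
  ni <- tapply(y, g, length)
  mi <- tapply(y, g, mean)
  vi <- tapply(y, g, var)
  out <- NULL
  for (i in 1:(k - 1)) {
    for (j in (i + 1):k) {
      se2 <- vi[i] / ni[i] + vi[j] / ni[j]
      q   <- abs(mi[i] - mi[j]) / sqrt(se2 / 2)  # studentized range statistic
      df  <- se2 ^ 2 / ((vi[i] / ni[i]) ^ 2 / (ni[i] - 1) +
                        (vi[j] / ni[j]) ^ 2 / (ni[j] - 1))  # Welch-Satterthwaite df
      p   <- ptukey(q, nmeans = k, df = df, lower.tail = FALSE)
      out <- rbind(out, data.frame(
        comparison = paste(levels(g)[i], levels(g)[j], sep = "-"),
        q = q, df = df, p.value = p))
    }
  }
  out
}

set.seed(1)
y <- c(rnorm(10, 0, 1), rnorm(10, 0, 3), rnorm(10, 2, 1))
g <- gl(3, 10)
gamesHowell(y, g)
```

<p>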
</p>apanhttp://www.blogger.com/profile/11846279873979046420noreply@blogger.com0tag:blogger.com,1999:blog-1101350361280593719.post-32051689399677512072011-06-22T23:21:00.000-07:002014-01-13T19:11:02.415-08:00The relationship among logistic regression, odds, and odds ratio<div lang="zh-Hant">
<h4>Formulas at a glance</h4>
<p>
<!-- <img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgH2OqYQSfTBuhTcv7atGovU3mLxEDv0N_1nQC6GCk8BG0VFYpDUTBIXNFmS-xbpepHYIRRpnCRhmiRIH-fibQqXH7n4VQm0kghI0nTd7K4u6STiAvhW6dkyQqK3L8BA8DzjWIgixIuU44/s570/logistic-model.png" alt="Logistic regression, odds and odds ratio" class="centerImage" /> -->
\[
\log ( \text{odd} ) = \text{logit} (p) = \log ( \frac{\Pr (y=1) }{1 - \Pr (y=1)} ) = \beta_0 + \sum_{i=1}^{n} \beta_i x_i
\]
\[
\text{odd} = \frac{\Pr (y=1) }{1 - \Pr (y=1)} = \exp ( \beta_0 + \sum_{i=1}^{n} \beta_i x_i )
\]
\[
\Pr (y=1) = \frac{\text{odd}}{1+\text{odd}} = \frac{1}{1+\text{odd}^{-1}} = \frac{1}{1+\exp (-( \beta_0 + \sum_{i=1}^{n}\beta_i x_i)) }
\]
\[
\log (\text{odds ratio}) = \log \left( \frac{\text{odd}_1}{\text{odd}_2} \right) = \log(\text{odd}_1) - \log(\text{odd}_2)
\]
</p>
<h4>Preface</h4>
<p class="abstract">
How should the fitted equation of a logistic regression be interpreted? Because I often confuse the relationship between logistic regression results and odds or odds ratios, I wrote this note. Its content is mainly my notes after reading “<a href="http://www.ats.ucla.edu/stat/mult_pkg/faq/general/odds_ratio.htm">FAQ: How do I interpret odds ratios in logistic regression?</a>”.</p>
<h4>Terminology</h4>
<ul>
<li>logistic regression: 邏輯斯迴歸;</li>
<li>odds: 勝算;</li>
<li>odds ratio: 勝算比.</li>
</ul>
<p>Note that “odds” and “odds ratio” are two different things.</p>
<h4>A model containing only the intercept</h4>
<p>In the model
\[ \text{logit}(p) = \log(\frac{p}{1-p}) = \beta_0,\]
$p$ is the probability that the dependent variable $y = 1$, where $y$ is Boolean (0 or 1), representing yes or no. Here $\beta_0$ equals
\[ \log(\frac{\Pr (y = 1)}{\Pr (y = 0)}).\]
In plain words, the probability of $y = 1$ divided by the probability of $y = 0$ is $\exp(\beta_0)$.</p>
<p>An example with the following data: of 10 sampled students, 6 own a cell phone and 4 do not. The odds of “owning a phone” versus “not owning a phone” are $\frac{6/10}{4/10} = 1.5$. The regression gives $\text{logit}(p) = \log( \frac{p}{1-p}) = 0.4055$, and indeed $\exp(0.4055) = 1.500052$ (because $y = 1$ means owning a phone and $y = 0$ not owning one). The following R code verifies this:</p>
<pre class="prettyprint">> <code>isPhone<-c(rep(1,6),rep(0,4))</code>
> <code>m<-glm(isPhone~1,family=binomial);summary(m)</code>
Call:
glm(formula = isPhone ~ 1, family = binomial)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.354 -1.354 1.011 1.011 1.011
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.4055 0.6455 0.628 0.53
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 13.46 on 9 degrees of freedom
Residual deviance: 13.46 on 9 degrees of freedom
AIC: 15.46
Number of Fisher Scoring iterations: 4
> exp(0.4055)
[1] 1.500052</pre>
<h4>A model with one binary categorical predictor</h4>
<p>In the model
\[ \text{logit}(p) = \beta_0 + \beta_1 x_1, \]
$p$ is the probability that the dependent variable $y = 1$, where $y$ is Boolean (0 or 1) representing yes or no, and $x_1$ is a binary category coded as Boolean (0 or 1). Here $\beta_0$ is $\log ( \frac{\Pr (y = 1)}{\Pr (y = 0)})$ when $x_1 = 0$, while $\beta_1$ is the log odds ratio of $x_1 = 1$ versus $x_1 = 0$ with respect to whether $y = 1$. In plain words, when $x_1 = 0$, the probability of $y = 1$ divided by the probability of $y = 0$ is $\exp( \beta_0)$; the odds ratio of $x_1 = 1$ versus $x_1 = 0$ is $\exp(\beta_1)$.</p>
<p>An example with the following data: of 10 sampled students, 6 are male and 4 are female, and each either owns a phone or not:</p>
<table>
<thead>
<tr><th>Phone ownership</th><th>Male (x<sub>1</sub> = 1)</th><th>Female (x<sub>1</sub> = 0)</th></tr>
</thead>
<tbody>
<tr><td>Yes (y = 1)</td><td>5</td><td>1</td></tr>
<tr><td>No (y = 0)</td><td>1</td><td>3</td></tr>
<tr><td>Subtotal</td><td>6</td><td>4</td></tr>
</tbody>
</table>
<p>The odds of “male with a phone” versus “male without a phone” are $\frac{5/6}{1/6} = 5$; the odds of “female with a phone” versus “female without a phone” are $\frac{1/4}{3/4} = 0.3333$; their odds ratio is $5 / 0.3333 = 15$.</p>
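These numbers can be reproduced directly from the contingency table (my own check, using the counts from the table above):

```r
# 2 x 2 table: rows = phone yes/no, columns = male/female
tab <- matrix(c(5, 1, 1, 3), nrow = 2,
              dimnames = list(phone = c("yes", "no"),
                              sex   = c("male", "female")))
odd.male   <- tab["yes", "male"]   / tab["no", "male"]     # 5
odd.female <- tab["yes", "female"] / tab["no", "female"]   # 0.3333
odds.ratio <- odd.male / odd.female                        # 15
```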
<p>Coding male = 1, female = 0, phone = 1, and no phone = 0, the regression gives $\text{logit}(p) = -1.099 + 2.708(\text{sex})$. Indeed $\exp (-1.099) = 0.3332041$ is the odds of “female with a phone” versus “female without a phone” (because females are coded $x_1 = 0$), and $\exp (2.708) = 14.99925$ is the odds ratio of sex with respect to phone ownership, interpretable as “a male’s odds of owning a phone are 15 times a female’s”. The following R code verifies this:</p>
<pre class="prettyprint">> isMale<-c(1,1,1,1,1,1,0,0,0,0)
> isPhone<-as.factor(c(1,1,1,1,1,0,0,0,0,1))
> m<-glm(isPhone~isMale,family=binomial);summary(m)
Call:
glm(formula = isPhone ~ isMale, family = binomial)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.8930 -0.7585 0.6039 0.6039 1.6651
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.099 1.155 -0.951 0.3414
isMale 2.708 1.592 1.701 0.0889 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 13.4602 on 9 degrees of freedom
Residual deviance: 9.9054 on 8 degrees of freedom
AIC: 13.905
Number of Fisher Scoring iterations: 4
> exp(-1.099)
[1] 0.3332041
> exp(2.708)
[1] 14.99925</pre>
<h4>A model with one continuous predictor</h4>
<p>In the model
\[\text{logit}(p) = \beta_0 + \beta_1 x_1,\]
$p$ is the probability that the dependent variable $y = 1$, where $y$ is Boolean (0 or 1) representing yes or no, and $x_1$ is a continuous variable. Here $\beta_0$ is $\log(\text{the odds of } y = 1 \text{ versus } y = 0)$ when $x_1 = 0$, while $\beta_1$ is the contribution of $x_1$ to the log odds. In plain words, when $x_1 = 0$, the probability of $y = 1$ divided by the probability of $y = 0$ is $\exp(\beta_0)$; when $x_1 \neq 0$ it is $\exp (\beta_0 + \beta_1 x_1)$, and the sign of $\beta_1$ determines whether the odds move up or down.</p>
<p>An example with the following data: each of 10 sampled students has a different daily allowance and either owns a phone or not:</p>
<table>
<thead>
<tr><th>Phone ownership (y; 1 = yes, 0 = no)</th><th>Daily allowance (x<sub>1</sub>)</th></tr>
</thead>
<tbody>
<tr><td>1</td><td>100</td></tr>
<tr><td>1</td><td>120</td></tr>
<tr><td>1</td><td>90</td></tr>
<tr><td>1</td><td>110</td></tr>
<tr><td>1</td><td>110</td></tr>
<tr><td>0</td><td>80</td></tr>
<tr><td>0</td><td>100</td></tr>
<tr><td>0</td><td>90</td></tr>
<tr><td>0</td><td>70</td></tr>
<tr><td>0</td><td>110</td></tr>
</tbody>
</table>
<p>The data above yield the regression $\text{logit}(p) = -9.71278 + 0.09840(\text{daily allowance})$. From this equation, each additional unit of daily allowance multiplies the odds of owning a phone by $\text{exp}(0.09840) = 1.103404$; equivalently, the odds ratio between a student with one extra unit of allowance and one without is $\exp(\beta_1 \times 1) = 1.103404$, with two extra units it is $\exp(\beta_1 \times 2)$, and so on.</p>
<p>We can estimate the odds of owning a phone versus not owning one at a daily allowance of 70: $\exp (-9.71278 + 0.09840 \times 70) = 0.05932171$, interpretable as the probability of owning a phone among students with a daily allowance of 70 being 0.05932171 times the probability of not owning one. The odds at an allowance of 75 are obtained the same way: $\exp( -9.71278 + 0.09840 \times 75) = 0.09702564$. Their odds ratio is therefore $0.09702564 / 0.05932171 = 1.635584$, which is $\exp (0.09840 \times 5)$.</p>
<p>Since $\text{odds} = p / (1 - p)$, we can recover $p$ from the odds via $p = \text{odds} / (1 + \text{odds})$. As noted above, at an allowance of 70 the odds of owning a phone versus not owning one are 0.05932171, so the probability of owning a phone at allowance 70 is $0.05932171 / (1 + 0.05932171) = 0.05599971$. Likewise, at an allowance of 75 the probability is $0.09702564 / (1 + 0.09702564) = 0.08844428$.</p>
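The arithmetic in the last two paragraphs can be checked in a few lines of R (my own check; b0 and b1 are the fitted coefficients shown below):

```r
b0 <- -9.71278
b1 <-  0.09840
odd70 <- exp(b0 + b1 * 70)   # odds at allowance 70, ~0.05932171
odd75 <- exp(b0 + b1 * 75)   # odds at allowance 75, ~0.09702564
odd75 / odd70                # odds ratio, ~1.635584 = exp(b1 * 5)
odd70 / (1 + odd70)          # probability at 70, ~0.05599971
odd75 / (1 + odd75)          # probability at 75, ~0.08844428
```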
<pre class="prettyprint">> isPhone<-c(1,1,1,1,1,0,0,0,0,0)
> money<-c(100, 120, 90, 110, 110, 80, 100, 90, 70, 110)
> m<-glm(isPhone~money,family=binomial);summary(m)
Call:
glm(formula = isPhone ~ money, family = binomial)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.67078 -0.76661 0.07113 0.75438 1.55604
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -9.71278 6.59477 -1.473 0.141
money 0.09840 0.06574 1.497 0.134
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 13.863 on 9 degrees of freedom
Residual deviance: 10.481 on 8 degrees of freedom
AIC: 14.481
Number of Fisher Scoring iterations: 4</pre>
<h4>Conclusions</h4>
<ul>
<li>The value produced by the logistic regression equation $\log ( \text{odds} ) = \text{logit} (p) = \log \left( \frac{\Pr (y=1) }{1 - \Pr (y=1)} \right) = \beta_0 + \sum_{i=1}^{n} \beta_i x_i$ is the <strong>log(odds)</strong>, so exp(logit(p)) = odds;</li>
<li>dividing one odds by another gives an <strong>odds ratio</strong>;</li>
<li>the probability can be recovered from the odds via probability = odds / (1 + odds);</li>
<li>because of how exponentials and logarithms interact, log(odds ratio) typically arises as the difference of two logit(p) values;</li>
<li>when in doubt, return to the two basics, "log(odds) = log(p / (1 - p)) = the regression equation" and "odds ratio = odds<sub>1</sub> / odds<sub>2</sub>", and derive from there.</li>
</ul>
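<p>The last two points can be checked numerically. This Python snippet (a sketch for illustration, reusing the probabilities computed earlier in this post) confirms that the difference of two logits is the log odds ratio, here $\beta_1 \times 5 = 0.492$:</p>

```python
import math

def logit(p):
    """log(odds) = log(p / (1 - p))."""
    return math.log(p / (1 - p))

p70, p75 = 0.05599971, 0.08844428  # Pr(phone) at daily allowance 70 and 75
log_or = logit(p75) - logit(p70)   # difference of two logits
print(log_or)            # ≈ 0.492 = 0.09840 * 5
print(math.exp(log_or))  # ≈ 1.635584, the odds ratio
```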
</div>apanhttp://www.blogger.com/profile/11846279873979046420noreply@blogger.com0tag:blogger.com,1999:blog-1101350361280593719.post-68600395884221043292011-06-20T02:48:00.000-07:002014-01-02T09:57:29.828-08:00Least-significant difference (LSD) in R
<p class="abstract">How to apply the least-significant difference (LSD) test, a multiple-comparison procedure following ANOVA, in R?</p>
<p>A dataset for a typical one-way ANOVA is given:</p>
<pre class="prettyprint">y <- rnorm(40)
x <- gl(4, 10)
m <- aov(y ~ x)</pre>
<p>Applying LSD with the function <code>pairwise.t.test</code>:</p>
<pre class="prettyprint">pairwise.t.test(y, x, p.adj = "none")</pre>
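<p>With no p-value adjustment, this amounts to Fisher's LSD: every pairwise t statistic uses the pooled within-group variance (the one-way ANOVA mean squared error) and its residual degrees of freedom, rather than a per-pair variance. A minimal Python sketch of that statistic (illustrative only; the function name and toy data are my own):</p>

```python
from math import sqrt
from statistics import mean

def lsd_t(groups, i, j):
    """Pairwise LSD t statistic: pooled variance from ALL groups, not just i and j."""
    n_total = sum(len(g) for g in groups)
    k = len(groups)
    # Pooled within-group variance = SSE / (N - k), i.e. the one-way ANOVA MSE
    sse = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups)
    mse = sse / (n_total - k)
    gi, gj = groups[i], groups[j]
    se = sqrt(mse * (1 / len(gi) + 1 / len(gj)))
    return (mean(gi) - mean(gj)) / se  # refer to a t distribution with N - k df

groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]  # toy data with pooled MSE = 1
print(lsd_t(groups, 0, 1))  # ≈ -1.2247
print(lsd_t(groups, 0, 2))  # ≈ -4.8990
```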
<p>Applying LSD with the function <code>LSD.test</code> from the package <code>agricolae</code>:</p>
<pre class="prettyprint">library(agricolae)
LSD.test(m, "x", group = TRUE)  # with grouping
LSD.test(m, "x", group = FALSE) # without grouping</pre>apanhttp://www.blogger.com/profile/11846279873979046420noreply@blogger.com0tag:blogger.com,1999:blog-1101350361280593719.post-16126213243892185372011-06-15T04:12:00.000-07:002014-01-02T09:59:38.528-08:00Specifying a reference level in R: function relevel
<p class="abstract">How to specify a particular level as the reference level in GNU R when fitting a (generalized) linear model?</p>
<p>A data frame <code>dt</code> is given:</p>
<pre class="prettyprint">y <- rpois(40, 1)
x1 <- gl(4, 10)
x2 <- gl(4, 1, 40)
dt <- data.frame(cbind(y, x1, x2)) # cbind coerces the factors to numeric codes</pre>
<p>Applying Poisson regression:</p>
<pre class="prettyprint">m1 <- glm(y ~ factor(x1) + factor(x2), family = poisson, data = dt)
summary(m1)</pre>
<p>The output shows that x1=1 and x2=1 are both reference levels chosen automatically by R. To switch the reference level from x1=1 to x1=2, use the function <code>relevel</code>:</p>
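<p>What <code>relevel</code> changes is only the treatment (dummy) coding of the factor: the chosen reference level gets no indicator column, so every coefficient becomes a contrast against it. A hedged Python sketch of this coding (the function name is mine, for illustration):</p>

```python
def treatment_code(values, reference):
    """Dummy-code a categorical variable against a chosen reference level."""
    levels = [lvl for lvl in sorted(set(values)) if lvl != reference]
    # One 0/1 column per non-reference level; a reference-level row is all zeros
    return [[1 if v == lvl else 0 for lvl in levels] for v in values]

x1 = [1, 2, 3, 4]
print(treatment_code(x1, reference=1))  # indicator columns for levels 2, 3, 4
print(treatment_code(x1, reference=2))  # indicator columns for levels 1, 3, 4
```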
<pre class="prettyprint">m2 <- glm(y ~ <strong>relevel(factor(dt$x1), "2")</strong> + factor(x2), family = poisson, data = dt)
summary(m2)</pre>apanhttp://www.blogger.com/profile/11846279873979046420noreply@blogger.com0tag:blogger.com,1999:blog-1101350361280593719.post-56510851985951311772011-05-20T01:23:00.000-07:002015-03-26T06:56:35.859-07:00阿盤的 Beamer 中英字型配方<div lang="zh-Hant">
<p class="abstract">This post presents my own Chinese and English font configuration for Beamer.</p>
<iframe src="//www.slideshare.net/slideshow/embed_code/29758346" width="425" height="355" frameborder="0" marginwidth="0" marginheight="0" scrolling="no" style="border:1px solid #CCC; border-width:1px; margin-bottom:5px; max-width: 100%;" allowfullscreen> </iframe> <div style="margin-bottom:5px"> <strong> <a href="//www.slideshare.net/chenpanliao/apan-beamertemplate" title="阿盤的 Beamer 中英字型配方" target="_blank">阿盤的 Beamer 中英字型配方</a> </strong> from <strong><a href="//www.slideshare.net/chenpanliao" target="_blank">Chen-Pan Liao</a></strong> </div>
<p>Comments and suggestions are welcome in the <a href="http://hyperrate.com/thread.php?tid=23496">discussion thread</a>.</p>
</div>apanhttp://www.blogger.com/profile/11846279873979046420noreply@blogger.comtag:blogger.com,1999:blog-1101350361280593719.post-63888792604403759012010-06-25T15:38:00.000-07:002014-01-13T19:34:15.307-08:00學習 HTML, PHP 及 Database<div lang="zh-Hant">
<p class="abstract">
On the Web_Design board at <a href="telnet://ptt.cc"><code>telnet://ptt.cc</code></a>, a member who had just started writing web pages set himself some learning goals, including setting up a server and writing a discussion forum, but did not know where to begin or what mastering these skills would do for him, so he posted an article titled "[問題] 新手上路,卻不知路在何處...". Some members replied, mostly arguing that what he wanted to learn was too basic and has long been superseded by ready-made solutions, e.g. "Hand-code a forum? Why not use <a href="http://www.xoops.org/">XOOPS</a>?" Reading this struck a chord with me, so I wrote the following reply.</p>
<blockquote>
<p>
I would like to share some of my own experience with the original poster.</p>
<p>
I am not a CS major and have no related background; at most I learned a little BASIC as a kid. When I entered university, everyone was expected to have a personal home page, so I got started. At first, naturally, I built pages with WYSIWYG software, but before long I became interested in HTML itself. It is, after all, the foundation of web authoring.</p>
<p>
HTML is not merely for making web pages; its essence is using the correct tags to express the structure of a document. The books I read at the time mostly taught readers to abandon the legacy presentational tags supported by old browsers (such as &lt;font&gt;) in favor of HTML 4 + CSS. There was no grand purpose in learning this beyond having my pages render correctly in every browser. (And so I joined the ranks of "every web designer must hate IE 6".)</p>
<p>
With the "standard HTML + CSS" I had learned, I started writing my own diary. That alone was deeply satisfying: a little content every day, and some CSS to paint with.</p>
<p>
The good days did not last: my diary kept growing, and because I worked on the principle of "one diary entry is one HTML file", keeping every entry on the same page template became very difficult. It was clearly time to make my diary work like a blog, so I started learning PHP.</p>
<p>
PHP is, after all, a programming language, so learning it was harder. I began with the simplest PHP primer and read only the parts I needed; sometimes, before finishing the basic book, I had already bought an advanced one. Reading and practicing in cycles took a good deal of time, but I finally rewrote the whole site in PHP.</p>
<p>
Soon afterwards I realized I also needed to learn about databases to maintain my diary site properly. At that point I did not even know what a database was for; I imagined it was roughly an Excel file. Fortunately, PHP primers always teach MySQL alongside, so I picked up databases as well.</p>
<p>
From learning HTML to PHP + MySQL took two or three years of spare time. My programming is not professional-grade, but the sense of accomplishment was rich. With some hands-on experience, I also picked up a smattering of JavaScript, Python, Perl, and regular expressions. I built a lab website for my professor and use this knowledge in my current job, though I know perfectly well I could never support a family on these basics.</p>
<p>
Perhaps the original poster would ask: why spend so much time on skills that cannot pay the bills? I do not see it that way at all.</p>
<p>
Learning HTML taught me "document structure", which helps with any kind of writing and made learning LATEX very quick for me.</p>
<p>
Learning CSS led me into graphic design and drawing software, which is useful in many jobs as well.</p>
<p>
Learning PHP makes you "someone who can program" (even if only half-baked). Being able to write small programs helps with many things; a task that takes someone else a week of clicking by hand, you finish in half a day. Moreover, once you have learned your first programming language, the second comes much more easily.</p>
<p>
After learning databases, I even normalized my household accounts! When the boss asks you to organize data or experimental results, understanding database concepts makes the job far easier.</p>
<p>
Fine, perhaps I am just talking myself into it. But I have never regretted the time spent on these skills. They have not directly earned me money, but their added value is boundless, and they have greatly enriched my intellectual life. That said, if I were to build a medium-sized website project from scratch today, I would have neither the time nor the inclination!</p>
</blockquote>
</div>apanhttp://www.blogger.com/profile/11846279873979046420noreply@blogger.com0