Convert taxa matrix to guild matrix with R

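How do you subtotal taxa data into guild (functional group) data in R? The R function taxaToGuild() introduced below takes a taxa data file and a taxa-to-guild lookup file and produces a guild data frame. More generally, the function offers a systematic way to compute column subtotals of a data frame.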
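Before the file-based function, the idea of a column subtotal can be sketched on in-memory data. This is not the implementation of taxaToGuild(); it is a minimal base R sketch that assumes a small hypothetical count table (counts) and a hypothetical named lookup vector (lookup), and uses rowsum() on the transposed matrix to add up columns that share a guild.

```r
# Hypothetical in-memory data: counts per plot (rows) and taxon (columns)
counts <- data.frame(
  Oonopidae = c(1, 0, 1),
  Araneidae = c(1, 1, 1),
  Ctenidae  = c(2, 0, 1),
  Others    = c(0, 3, 1),
  row.names = c("1", "2", "3")
)

# Hypothetical lookup: taxon name -> guild
lookup <- c(
  Oonopidae = "Ground runner",
  Araneidae = "Orb weaver",
  Ctenidae  = "Ground runner"
)

# Guild of each column; taxa missing from the lookup get the label "NA"
guild <- unname(lookup[names(counts)])
guild[is.na(guild)] <- "NA"

# rowsum() adds up rows by group, so transpose, subtotal, and transpose back
by.guild <- t(rowsum(t(as.matrix(counts)), group = guild))
by.guild
```

Unlike taxaToGuild(), this sketch returns a matrix whose guild columns are sorted by name rather than kept in order of first appearance.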

taxaToGuild.R source

# Copyright 2013 Chen-Pan Liao
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
############################################
## taxaToGuild.r
## Author: Chen-Pan Liao (2013)
## License: Public domain
## Environment: R (ver. 2.15+)
## Usage:
## new.dataframe <- taxaToGuild(
## taxa.file.csv,
## taxa.file.csv.row.num,
## guild.file.csv,
## guild.file.csv.row.num,
## guild.system.row.num
## )
############################################
taxaToGuild <- function (
  taxa.file.csv,
  taxa.file.csv.row.num,
  guild.file.csv,
  guild.file.csv.row.num,
  guild.system.row.num
  # Example values:
  # taxa.file.csv = "taxa.csv",
  # taxa.file.csv.row.num = 1,
  # guild.file.csv = "guild.csv",
  # guild.file.csv.row.num = 1,
  # guild.system.row.num = 1
){
  ## read both files; the given column of each file supplies the row names
  guild.data <- read.csv(guild.file.csv, row.names = guild.file.csv.row.num)
  taxa.data <- read.csv(taxa.file.csv, row.names = taxa.file.csv.row.num)
  guild.list.taxa <- row.names(guild.data)
  guild.list.guild <- guild.data[, guild.system.row.num]
  taxa.list <- names(taxa.data)
  ## for each taxon column, find its row in the guild lookup (NA if absent)
  taxa.list.ind <- numeric(length(taxa.list))
  taxa.unknown <- 0
  for (n in seq_along(taxa.list)) {
    if (sum(taxa.list[n] == guild.list.taxa) > 0) {
      taxa.list.ind[n] <- which(taxa.list[n] == guild.list.taxa)
    } else {
      taxa.list.ind[n] <- NA
      taxa.unknown[length(taxa.unknown) + 1] <- n
    }
  }
  new.guild.list <- unique(guild.list.guild[taxa.list.ind])
  taxa.unknown <- taxa.unknown[-1]
  #new.taxa.data <- taxa.data
  #names(new.taxa.data) <- guild.list.guild[taxa.list.ind]
  ## new taxa matrix initiation
  newer.taxa.data <- data.frame (
    matrix(
      0,
      nrow = nrow(taxa.data),
      ncol = length(new.guild.list)
    ),
    row.names = row.names(taxa.data)
  )
  names(newer.taxa.data) <- new.guild.list
  ## subsum of new taxa matrix
  for (n in seq_along(new.guild.list)) {
    if (!is.na(new.guild.list[n])) {
      ## sum the taxa columns that map to guild n
      newer.taxa.data[, n] <- rowSums(
        data.frame(
          taxa.data[
            ,
            which(new.guild.list[n] == guild.list.guild[taxa.list.ind])
          ]
        )
      )
    } else {
      ## sum the taxa columns that have no entry in the lookup
      newer.taxa.data[, n] <- rowSums(
        data.frame(
          taxa.data[, taxa.unknown]
        )
      )
    }
  }
  cat(
    sprintf(
      "The guild system is %s.\n",
      names(guild.data)[guild.system.row.num]
    )
  )
  ## warn only when some taxa are unmatched; list them on a single line
  if (length(taxa.unknown) > 0) {
    cat(
      sprintf(
        "Warning: the following taxa is merged into guild NA due to no reference: %s.\n",
        paste(taxa.list[taxa.unknown], collapse = ", ")
      )
    )
  }
  return(newer.taxa.data)
}
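The lookup loop in taxaToGuild(), which finds the position of each taxon name in the guild file, can also be written without a loop. The sketch below is an alternative, not the author's code: it reuses the variable names from the function on hypothetical in-memory vectors and relies on base R match(), which returns the position of the first match or NA when there is none.

```r
# Hypothetical inputs, named as in taxaToGuild()
taxa.list <- c("Leptonetidae", "Clubionidae", "Others", "Araneidae")
guild.list.taxa <- c("Oonopidae", "Araneidae", "Clubionidae", "Leptonetidae")

# Position of each taxon in the guild lookup; NA when the taxon is absent
taxa.list.ind <- match(taxa.list, guild.list.taxa)

# Column numbers of the taxa that have no guild assigned
taxa.unknown <- which(is.na(taxa.list.ind))

taxa.list.ind
taxa.list[taxa.unknown]
```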

Example

Suppose we have spider species counts with family as the taxon, saved as the following CSV file (taxa.csv):

"Plot","Leptonetidae","Clubionidae","Others","Araneidae","Oonopidae","Ctenizidae"
1,2,0,0,1,1,2
2,1,2,3,1,0,0
3,1,0,1,1,1,1
4,2,3,2,1,0,1
5,0,1,1,2,1,4
6,2,0,0,0,1,3
7,1,0,1,2,2,1
8,0,0,1,2,3,1

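The first row holds the variable names and the first column holds identifiable plot names. A second file, which maps family names to guilds, is saved as the following CSV file (guild.csv):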

"taxa","Guild1","Guild2","Guild3"
"Oonopidae","Ground runner","runner","Ground"
"Araneidae","Orb weaver","weaver","Space"
"Clubionidae","Foliage runner","runner","Foliage"
"Ctenidae","Ground runner","runner","Ground"
"Ctenizidae","Burrow dweller","dweller","Burrow"
"Gnaphosidae","Ground runner","runner","Ground"
"Leptonetidae","Ground weaver","weaver","Ground"
"Linyphiidae","Space weaver","weaver","Space"

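Here the first row holds the variable names and the first column holds identifiable family names; columns 2 to 4 are three different guild systems. Place taxa.csv, guild.csv, and taxaToGuild.r in the same directory, make it the working directory in R with setwd(), and then run the following.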

source("taxaToGuild.r")
new.dataframe <- taxaToGuild(
  taxa.file.csv = "taxa.csv",
  taxa.file.csv.row.num = 1,
  guild.file.csv = "guild.csv",
  guild.file.csv.row.num = 1,
  guild.system.row.num = 1
)
new.dataframe

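Here taxa.file.csv = "taxa.csv" reads taxa.csv as the taxa data; taxa.file.csv.row.num = 1 says that the first column of taxa.csv holds the row names; guild.file.csv = "guild.csv" reads guild.csv as the guild lookup file; guild.file.csv.row.num = 1 says that the first column of guild.csv holds the row names; and guild.system.row.num = 1 selects the first guild system in guild.csv (which is actually the second column of the file in this example). The resulting guild data frame is assigned to the variable new.dataframe. Running the code gives the following result.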

> source("taxaToGuild.r")
> new.dataframe <- taxaToGuild(
+   taxa.file.csv = "taxa.csv",
+   taxa.file.csv.row.num = 1,
+   guild.file.csv = "guild.csv",
+   guild.file.csv.row.num = 1,
+   guild.system.row.num = 1
+ )
The guild system is Guild1.
Warning: the following taxa is merged into guild NA due to no reference: Others.
> new.dataframe
  Ground weaver Foliage runner NA Orb weaver Ground runner Burrow dweller
1             2              0  0          1             1              2
2             1              2  3          1             0              0
3             1              0  1          1             1              1
4             2              3  2          1             0              1
5             0              1  1          2             1              4
6             2              0  0          0             1              3
7             1              0  1          2             2              1
8             0              0  1          2             3              1
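The column numbering behind guild.system.row.num can be checked directly. The sketch below assumes a shortened inline copy of guild.csv (the variable guild.csv.text is hypothetical) and shows that, once the first column is used as row names, it no longer counts as a data column, so index 1 refers to Guild1.

```r
# Hypothetical shortened copy of guild.csv, supplied inline as text
guild.csv.text <- '"taxa","Guild1","Guild2","Guild3"
"Oonopidae","Ground runner","runner","Ground"
"Araneidae","Orb weaver","weaver","Space"'

# Column 1 of the file becomes the row names of the data frame
guild.data <- read.csv(text = guild.csv.text, row.names = 1)

names(guild.data)      # remaining data columns: the three guild systems
names(guild.data)[1]   # the system selected by guild.system.row.num = 1
guild.data[, 2]        # the system selected by guild.system.row.num = 2
```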

Note that because "Others" cannot be mapped to any guild through the "Guild1" lookup, it is placed in the "NA" column of new.dataframe. To drop the "NA" column from new.dataframe in this example, use new.dataframe[,-3].
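Dropping the column by position depends on where the "NA" column happens to fall. As a hedged alternative, the column can be dropped by testing the names instead. The sketch below uses a small hypothetical data frame (guild.counts) whose second column name is a missing value, which is what assigning a guild list that contains NA to the names is expected to produce.

```r
# Hypothetical guild data frame with one column whose name is NA
guild.counts <- data.frame(matrix(c(2, 1, 0, 3, 1, 1), nrow = 2))
names(guild.counts) <- c("Ground weaver", NA, "Orb weaver")

# Keep only the columns that have a real guild name
keep <- !is.na(names(guild.counts))
known.only <- guild.counts[, keep, drop = FALSE]

names(known.only)
```

The drop = FALSE argument keeps the result a data frame even when only one guild column remains.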