Exporting a dendrogram as a table in R


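I would like to export an hclust dendrogram from R as a data table so that I can then import it into another ("home-made") piece of software. str(unclass(fit)) gives a text overview of the dendrogram, but what I am really looking for is a numeric table. I have looked at the Bioconductor ctc package, but its output looks rather cryptic. I would like something similar to this table: http://stn.spotfire.com/spotfire_client_help/heat/heat_importing_exporting_dendrograms.htm Is there a way to get something like this from an hclust object in R?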

2 Answers


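In case anyone else is interested in exporting dendrograms, here is my solution. It is very likely not the best one, since I only started using R recently, but at least it works. Suggestions for improving the code are welcome.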

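So, assume hr is my hclust object and df is my data, where the first column contains a simple 0-based index (0, 1, 2, ...) that follows the row order of the data that was clustered, and the row names are the names of the clustered items: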

# Retrieve the leaf order (row name and its position within the leaves).
# Note: leaf.order is not used by any of the steps below.
leaf.order <- matrix(data=NA, ncol=2, nrow=nrow(df),
              dimnames=list(c(), c("row.num", "row.name")))
leaf.order[,2] <- hr$labels[hr$order]
for (i in 1:nrow(leaf.order)) {
   leaf.order[which(leaf.order[,2] %in% rownames(df[i,])),1] <- df[i,1]
}
leaf.order <- as.data.frame(leaf.order)

hr.merge <- hr$merge
n <- max(df[,1])

# Re-index all clustered leaves and nodes (0-based). First, all leaves are indexed starting from 0.
# Next, all nodes are indexed starting from the maximum leaf index + 1.
for (i in 1:length(hr.merge)) {
  if (hr.merge[i]<0) {hr.merge[i] <- abs(hr.merge[i])-1}
  else { hr.merge[i] <- (hr.merge[i]+n) }
}
node.id <- c(0:length(hr.merge))

# Generate dendrogram matrix with node index in the first column.
dend <- matrix(data=NA, nrow=length(node.id), ncol=6,
           dimnames=list(c(0:(length(node.id)-1)),
              c("node.id", "parent.id", "pruning.level",
              "height", "leaf.order", "row.name")) )
dend[,1] <- c(0:((2*nrow(df))-2))  # Insert a leaf/node index

# Calculate parent ID for each leaf/node:
# 1) For each leaf/node index, find the corresponding row number within the merge-table.
# 2) Add the maximum leaf index to the row number as indexing the nodes starts after indexing all the leaves.
for (i in 1:(nrow(dend)-1)) {
  dend[i,2] <- row(hr.merge)[which(hr.merge %in% dend[i,1])]+n
}

# Generate table with indexing of all leaves (1st column) and inserting the corresponding row names into the 3rd column.
# Build the positions as numbers directly: converting them through factors (the
# default data.frame() behaviour before R 4.0) scrambles them as soon as there
# are more than 10 leaves, and the old "-1" correction then made the positions
# start at -1 in R >= 4.0.
hr.order <- data.frame(order.number = 0:(length(hr$labels) - 1),
                       leaf.id = NA,
                       row.name = hr$labels[hr$order],
                       stringsAsFactors = FALSE)

# Assign the row name to each leaf.
dend <- as.data.frame(dend)
for (i in 1:nrow(df)) {
  dend[which(dend[,1] %in% df[i,1]),6] <- rownames(df[i,])
}

# Assign the 0-based position on the dendrogram (from left to right) to each leaf.
for (i in 1:nrow(hr.order)) {
  dend[which(dend[,6] %in% hr.order[i,3]),5] <- hr.order[i,1]
}

# Insert height for each node.
dend[c((n+2):nrow(dend)),4] <- hr$height

# All leaves get the highest possible pruning level
dend[which(dend[,1] <= n),3] <- nrow(hr.merge)

# The nodes get decreasing pruning levels, starting from the leaves' level
# minus 1; consecutive nodes with the same height share a level. The levels
# are then shifted so that the root ends up at 0.
for (i in (n+2):nrow(dend)) {
  if (is.na(dend[(i-1),4]) || dend[i,4] != dend[(i-1),4]) {
    dend[i,3] <- dend[(i-1),3] - 1
  } else {
    dend[i,3] <- dend[(i-1),3]
  }
}
dend[,3] <- dend[,3]-min(dend[,3])

dend <- dend[order(-node.id),]

# Write results table.
write.table(dend, file="path", sep=";", row.names=FALSE)
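The code above depends on a specific layout of df. A minimal sketch of inputs in that layout, using the first six rows of the built-in USArrests data set purely as example data (the variable name data is hypothetical):

```r
# Example data only: the first six rows of the built-in USArrests data set
data <- as.matrix(head(USArrests, 6))
hr <- hclust(dist(data))

# df in the layout the answer expects: column 1 is a 0-based index in the
# same row order as the clustered data; the row names are the item names
df <- data.frame(index = 0:(nrow(data) - 1), data)
rownames(df) <- rownames(data)
```

With these two objects in place, the code above can be run as written.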

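I just used this code and it worked perfectly. The hardest part for me was reading the description of the required input data: the description of the data frame "df" is actually very important. - eleanorahowe
@Eleanor, I'm glad you found it useful. You're right, the code relies on a specific structure of the input data frame. I hope it didn't take you too long to figure it out. - AnjaM
R is a 1-indexed language, but this code seems to have been written with 0-indexed loops. Be careful when using it, as off-by-one errors can occur. - Tom Kelly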
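Given the indexing caveat, here is a minimal vectorized sketch of the same table. hclust_to_table is a hypothetical helper, not part of any package. It does the 1-based to 0-based conversion in a single ifelse() call, replaces the loops with match() and row(), and follows the answer's pruning-level rule (a new level whenever the merge height changes, root at 0). It does not need the df index column; instead it assumes hr$labels are in the row order of the clustered data, which is how hclust stores them. Treat it as something to check against the loop-based code rather than a drop-in replacement.

```r
# Hypothetical helper: build an export table from an hclust object 'hr'.
# Assumes hr$labels is set (e.g. from the row names of the clustered data).
hclust_to_table <- function(hr) {
  N <- length(hr$order)  # number of leaves
  n <- N - 1             # largest 0-based leaf index
  # 0-based ids: observation j -> j - 1, merge step k -> n + k
  m <- ifelse(hr$merge < 0, -hr$merge - 1, hr$merge + n)
  ids <- 0:(2 * N - 2)
  # Parent = merge row in which an id appears, shifted by n;
  # the root never appears in m, so its parent is NA
  parent <- row(m)[match(ids, m)] + n
  # Pruning level: a new level starts whenever the merge height changes
  steps <- cumsum(c(TRUE, diff(hr$height) != 0))
  level <- c(rep(max(steps), N), max(steps) - steps)
  # 0-based left-to-right position of each leaf; NA for internal nodes
  pos <- c(match(seq_len(N), hr$order) - 1, rep(NA, N - 1))
  tab <- data.frame(node.id = ids,
                    parent.id = parent,
                    pruning.level = level,
                    height = c(rep(NA, N), hr$height),
                    leaf.order = pos,
                    row.name = c(hr$labels, rep(NA, N - 1)),
                    stringsAsFactors = FALSE)
  tab[rev(seq_along(ids)), ]  # highest node id (the root) first
}

# Usage with example data (first six rows of the built-in USArrests)
hr <- hclust(dist(as.matrix(head(USArrests, 6))))
dend <- hclust_to_table(hr)
write.table(dend, file = tempfile(fileext = ".csv"), sep = ";", row.names = FALSE)
```

In the result, leaves are the rows with a non-NA leaf.order, internal nodes have NA there, and the root is the only row whose parent.id is NA.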

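There is a package that does exactly the opposite of what you want: Labeltodendro ;-)
But seriously, can't you just extract the elements of the hclust object manually (e.g. $merge, $height, $order) and build a custom table from them?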
