Assuming your data frame is x, I would simply do:
do.call(rbind, tapply(unlist(x, use.names = FALSE),
rep(1:ncol(x), each = nrow(x)),
table))
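To make the flatten-and-group idea concrete, here is a minimal sketch on a toy data frame. The name toy and its contents are made up for illustration, and it assumes every column is a factor sharing the same level set:

```r
# Toy data frame (hypothetical, for illustration only): three factor columns
# that all share the level set A, B, C.
toy <- data.frame(
  X1 = factor(c("A", "B", "A", "C"), levels = c("A", "B", "C")),
  X2 = factor(c("B", "B", "C", "C"), levels = c("A", "B", "C")),
  X3 = factor(c("A", "A", "A", "B"), levels = c("A", "B", "C"))
)

# unlist() flattens the columns, one after another, into a single factor;
# the grouping index records which column each element came from.
flat <- unlist(toy, use.names = FALSE)
grp  <- rep(seq_len(ncol(toy)), each = nrow(toy))

# One table per column, stacked into a matrix (one row per column of toy).
counts <- do.call(rbind, tapply(flat, grp, table))

# tapply() names the rows by group index; restore the column names.
rownames(counts) <- names(toy)
counts
```

Because each subset of the flattened factor keeps the full level set, levels that do not occur in a column still show up with a count of zero.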
Benchmarking
datsim <- function(n, p, k) {
as.data.frame(replicate(p, sample(LETTERS[1:k], n, TRUE), simplify = FALSE),
col.names = paste0("X",1:p), stringsAsFactors = TRUE)
}
x <- datsim(100, 500, 3)
system.time(do.call(rbind, lapply(x, function(u) table(factor(u, levels=levels(unlist(x)))))))
system.time(do.call(rbind, tapply(unlist(x, use.names = FALSE), rep(1:ncol(x), each = nrow(x)), table)))
Dirty's answer can be improved as follows:
system.time({clevels <- levels(unlist(x, use.names = FALSE));
do.call(rbind, lapply(x, function(u) table(factor(u, levels=clevels))))})
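Before comparing timings it is worth checking that the lapply and tapply variants return the same counts. A minimal sketch, assuming a small simulated data frame; the seed, the sizes, and the names x_small, by_lapply, by_tapply, and all_equal are arbitrary choices for illustration:

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# Small simulated data frame: 20 factor columns, 50 rows, 3 possible letters.
x_small <- as.data.frame(
  replicate(20, sample(LETTERS[1:3], 50, TRUE), simplify = FALSE),
  col.names = paste0("X", 1:20), stringsAsFactors = TRUE
)

# lapply variant with the common levels computed once, outside the loop.
clevels <- levels(unlist(x_small, use.names = FALSE))
by_lapply <- do.call(rbind, lapply(x_small, function(u) table(factor(u, levels = clevels))))

# tapply variant on the flattened factor.
by_tapply <- do.call(rbind, tapply(unlist(x_small, use.names = FALSE),
                                   rep(seq_len(ncol(x_small)), each = nrow(x_small)),
                                   table))

# Row names differ (column names vs. group indices), so compare values only.
all_equal <- all(unname(by_lapply) == unname(by_tapply))
all_equal
```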
Also consider user20650's answer:
x <- datsim(200, 5000, 5)
system.time(t(table(stack(lapply(x, as.character)))))
while my answer does:
system.time(do.call(rbind, tapply(unlist(x, use.names = FALSE), rep(1:ncol(x), each = nrow(x)), table)))
and the improved version of Dirty's answer does:
system.time({clevels <- levels(unlist(x, use.names = FALSE));
do.call(rbind, lapply(x, function(u) table(factor(u, levels=clevels))))})
levels(u)[u] is somewhat slower than as.character. (I think that makes sense, as I'm sure the R developers have optimized it.) - user20650
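For reference, the two conversions mentioned in this comment give the same character vector. A minimal sketch on a made-up factor u (no missing values), without any timing claims:

```r
# Hypothetical factor, for illustration only.
u <- factor(c("b", "a", "c", "a"))

# Index the level vector by the factor; a factor used as an index is
# treated as its integer codes.
via_index <- levels(u)[u]

# Built-in conversion.
via_builtin <- as.character(u)

same_result <- identical(via_index, via_builtin)
same_result
```

Wrapping either expression in system.time() on a large factor would show which one is faster on a given machine.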