Find the frequency of each unique column in a matrix or data frame

Question

4

I want to count how often each column occurs in a matrix. For example, take the matrix x below:

   x <- matrix(c(rep(1:4,3),rep(2:5,2)),4,5)
   x
         [,1] [,2] [,3] [,4] [,5]
   [1,]    1    1    1    2    2
   [2,]    2    2    2    3    3
   [3,]    3    3    3    4    4
   [4,]    4    4    4    5    5

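How can I find the frequency of each unique column and build a matrix in which each column is a unique column of x, with one extra last row giving how often that column occurs in x?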

 #freqmatrix
        [,1] [,2]
 [1,]      1    2
 [2,]      2    3
 [3,]      3    4
 [4,]      4    5
 [5,]      3    2

- morteza

4 Answers

3

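What is your end goal? In other words, what will you do with this data afterwards? If you only need a tabulation, paste() followed by table() gives you the answer.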

x <- matrix(c(rep(1:4,3),rep(2:5,2)),4,5)
x1 <- data.frame(table(apply(x, 2, paste, collapse = ", ")))
#         Var1 Freq
# 1 1, 2, 3, 4    3
# 2 2, 3, 4, 5    2

If you do want Var1 split back into separate columns, you can use read.csv() on that column.

cbind(read.csv(text = as.character(x1$Var1), header = FALSE), x1[-1])
#   V1 V2 V3 V4 Freq
# 1  1  2  3  4    3
# 2  2  3  4  5    2

Or, if you prefer the output transposed:

t(cbind(read.csv(text = as.character(x1$Var1), header = FALSE), x1[-1]))
#      [,1] [,2]
# V1      1    2
# V2      2    3
# V3      3    4
# V4      4    5
# Freq    3    2
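
The paste()/table() steps above can be wrapped into one function that returns the matrix layout asked for in the question. This is only a minimal sketch: `col_freq` is a hypothetical helper name (not part of base R), and it assumes a numeric matrix whose printed entries never contain the separator string.

```r
# col_freq() is a hypothetical helper name, not part of base R.
# Assumes x is a numeric matrix and no entry contains `sep` when printed.
col_freq <- function(x, sep = ", ") {
  # Collapse each column into a single string key, e.g. "1, 2, 3, 4".
  keys <- apply(x, 2, paste, collapse = sep)
  # Count how often each key occurs (table() sorts the keys as strings).
  counts <- table(keys)
  # Split every key back into its numeric entries; one matrix column per key.
  cols <- vapply(strsplit(names(counts), sep, fixed = TRUE),
                 as.numeric, numeric(nrow(x)))
  # Append the counts as the last row, labelled "freq".
  rbind(cols, freq = as.vector(counts))
}

x <- matrix(c(rep(1:4, 3), rep(2:5, 2)), 4, 5)
res <- col_freq(x)
res
```

Because table() orders the keys as strings, the unique columns come back in that order rather than in order of first appearance in x.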

- A5C1D2H2I1M1N2O1R2T1

2

This answer may be a bit messy, since it involves a list of lists:

x <- matrix(c(rep(1:4,3),rep(2:5,2)),4,5)
#convert columns to elements in list
y <- apply(x, 2, list)

#Get unique columns
unique_y <- unique(unlist(y, recursive=FALSE))

#Get column frequencies
frequencies <- sapply(unique(y), function(f) sum(unlist(y, recursive=FALSE) %in% f))

#Bind unique columns with frequencies
rbind(simplify2array(unique_y), frequencies)

Voilà:

            [,1] [,2]
               1    2
               2    3
               3    4
               4    5
frequencies    3    2

- sebastian-c

2

If your input is a data.frame, you can do this in one line with aggregate:

y <- matrix(c(1:4, 2:5, 1:4, 1,3,4,5, 2:5), ncol=5)
y
#      [,1] [,2] [,3] [,4] [,5]
# [1,]    1    2    1    1    2
# [2,]    2    3    2    3    3
# [3,]    3    4    3    4    4
# [4,]    4    5    4    5    5

z <- as.data.frame(t(y))
t(aggregate(z, by=z, length)[1:(ncol(z)+1)])
#      [,1] [,2] [,3]
# V1      1    1    2
# V2      2    3    3
# V3      3    4    4
# V4      4    5    5
# V1.1    2    1    2

Note: this solution will be very fast when the input matrix x has many more columns than rows, i.e. ncol(x) >> nrow(x).
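
In the output above the count row is labelled `V1.1`, because aggregate() applies length to every column and only the first result is kept. One possible variant, sketched here as an assumption rather than as part of the original answer, adds an explicit column of ones (named `freq` purely for illustration) and sums it through the formula interface:

```r
y <- matrix(c(1:4, 2:5, 1:4, 1, 3, 4, 5, 2:5), ncol = 5)

# One row per original column of y.
z <- as.data.frame(t(y))
# Helper column of ones; summing it within each group gives the frequency.
z$freq <- 1

# Group by every other column (the "." on the right-hand side) and sum freq.
counts <- aggregate(freq ~ ., data = z, FUN = sum)

# Transpose so each unique column of y is a column again, with freq as the last row.
res_agg <- t(as.matrix(counts))
res_agg
```

The last row is then named `freq`, and no column subsetting is needed.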

- Arun

1

Thanks. Since my data frame is large, your note is very helpful. - morteza

- orizon · Accepted Answer

Here is a solution that avoids converting the matrix to a list of lists, although it is also a bit messy:

x.unique <- unique(x, MARGIN  = 2)

freq <- apply(x.unique, MARGIN = 2, 
              function(b) sum(apply(x, MARGIN = 2, function(a) all(a == b)))
        )

rbind(x.unique, freq)

     [,1] [,2]
        1    2
        2    3
        3    4
        4    5
freq    3    2
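
The nested apply() above compares every unique column against every column of x. A possible alternative, sketched here under the assumption that the separator `"\r"` never occurs in the data, builds one string key per column and counts the keys with match() and tabulate(), keeping the unique columns in order of first appearance:

```r
x <- matrix(c(rep(1:4, 3), rep(2:5, 2)), 4, 5)

# One string key per column; "\r" is an arbitrary separator assumed absent from the data.
keys <- apply(x, 2, paste, collapse = "\r")
# Index of the unique key each column maps to, in order of first appearance.
idx <- match(keys, unique(keys))
# Unique columns, in the same first-appearance order.
x_unique <- x[, !duplicated(keys), drop = FALSE]
# tabulate() counts how many columns share each index.
freq <- tabulate(idx, nbins = ncol(x_unique))

res_match <- rbind(x_unique, freq)
res_match
```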