Unordered combinations of all lengths

25

I am looking for a function that returns all the unordered combinations of a vector. For example:

x <- c('red','blue','black')
uncomb(x)
[1]'red'
[2]'blue'
[3]'black'
[4]'red','blue'
[5]'blue','black'
[6]'red','black'
[7]'red','blue','black'

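I assume some library has a function that does this, but I can't find it. I have been trying permutations from the gtools package, but it is not the function I'm looking for.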


2
I won't post my own answer, since it would be very close to Richard Scriven's. However, if you want to use the gtools package, you can use combinations instead of permutations: sapply(seq_along(x), combinations, v = x, n = length(x)) - Davide Passaretti
3 Answers

27
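You can apply a sequence along the length of x over the m argument of combn(), so that every combination size from 1 to length(x) is generated.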
x <- c("red", "blue", "black")
do.call(c, lapply(seq_along(x), combn, x = x, simplify = FALSE))
# [[1]]
# [1] "red"
# 
# [[2]]
# [1] "blue"
# 
# [[3]]
# [1] "black"
# 
# [[4]]
# [1] "red"  "blue"
# 
# [[5]]
# [1] "red"   "black"
# 
# [[6]]
# [1] "blue"  "black"
# 
# [[7]]
# [1] "red"   "blue"  "black"
If you prefer a matrix result, you can apply stringi::stri_list2matrix() to the list above.
stringi::stri_list2matrix(
    do.call(c, lapply(seq_along(x), combn, x = x, simplify = FALSE)),
    byrow = TRUE
)
#      [,1]    [,2]    [,3]   
# [1,] "red"   NA      NA     
# [2,] "blue"  NA      NA     
# [3,] "black" NA      NA     
# [4,] "red"   "blue"  NA     
# [5,] "red"   "black" NA     
# [6,] "blue"  "black" NA     
# [7,] "red"   "blue"  "black"
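
The one-liner above can be wrapped in a small helper. This is only a minimal sketch: the name uncomb is borrowed from the hypothetical function in the question and does not come from any package, and the as_matrix argument is an illustrative addition that pads with NA using base R instead of stringi.

```r
# Hypothetical helper; `uncomb` is the name used in the question, not a package function.
uncomb <- function(x, as_matrix = FALSE) {
  # One list element per combination, for every size from 1 to length(x)
  res <- do.call(c, lapply(seq_along(x), combn, x = x, simplify = FALSE))
  if (!as_matrix) return(res)
  # Indexing past the end of a vector yields NA, which pads each
  # combination to length(x); the padded vectors are then stacked as rows
  t(vapply(res, function(v) v[seq_along(x)], FUN.VALUE = x))
}

x <- c("red", "blue", "black")
length(uncomb(x))              # 7, i.e. 2^3 - 1 combinations
uncomb(x, as_matrix = TRUE)    # 7 x 3 matrix, NA-padded on the right
```

Note that combn() treats a single positive integer x as seq_len(x), so this sketch inherits that behaviour for length-one numeric input.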

3
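You could also try unlist(lapply(seq_along(x), combn, x=x, simplify=FALSE), recursive=FALSE) for another possible output. Data objects of unequal length fit naturally in a list. - thelatemail
I agree, but I was prompted in the comments to get closer to the desired output. Even though lapply(seq_along(x), combn, x = x) looks perfectly fine, it reads column-wise. - Rich Scriven
A list (in my variant) is almost exactly the desired output the OP presents in the question. A matrix seems harder to pass on to other functions because of all the NAs. - thelatemail
1
I totally agree @thelatemail - I've edited that into the first part. For some reason I prefer do.call(c, ...) over unlist(..., recursive = FALSE) - Rich Scriven
2
Same difference - "tomayto, tomahto, let's call the whole thing off..." - thelatemail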

9
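I was redirected here from "List all combinations with combn", since this question is one of its duplicate targets. This is an old question and the answer provided by @RichScriven is very good, but I wanted to give the community a few more options that are arguably more natural and more efficient (the last two).
We first note that the output is very similar to the power set. Calling powerSet from the rje package, we see that our output matches every element of the power set except the first, which is the equivalent of the empty set: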
x <- c("red", "blue", "black")
rje::powerSet(x)
[[1]]
character(0)   ## empty set equivalent

[[2]]
[1] "red"

[[3]]
[1] "blue"

[[4]]
[1] "red"  "blue"

[[5]]
[1] "black"

[[6]]
[1] "red"   "black"

[[7]]
[1] "blue"  "black"

[[8]]
[1] "red"   "blue"  "black"

If you don't want the first element, you can simply add [-1] to the end of the function call, like so: rje::powerSet(x)[-1]
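
For readers without rje installed, the same relationship can be checked in base R. The sketch below builds the power set from bit masks and confirms that, once the empty set is dropped, it contains the same subsets as the combn() approach. The helper name power_set_base is made up for this illustration and is not a claim about how rje implements powerSet.

```r
# Hypothetical base-R helper, for illustration only
power_set_base <- function(x) {
  n <- length(x)
  lapply(0:(2^n - 1), function(mask) {
    # Binary digit i of `mask` decides whether x[i] is in the subset
    x[(mask %/% 2^(seq_len(n) - 1)) %% 2 == 1]
  })
}

x <- c("red", "blue", "black")
ps <- power_set_base(x)
via_combn <- do.call(c, lapply(seq_along(x), combn, x = x, simplify = FALSE))

# Compare as sets of subsets, ignoring the order of elements and of subsets
key <- function(s) paste(sort(s), collapse = ",")
length(ps)                                               # 8, including the empty set
setequal(sapply(ps[-1], key), sapply(via_combn, key))    # TRUE
```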
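The next two solutions come from the newer packages arrangements and RcppAlgos (I am the author) and offer substantial efficiency gains. Both packages can generate combinations of multisets.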

Why is this important?

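It can be shown that there is a one-to-one mapping from the power set of A to all combinations of the multiset c(rep(emptyElement, length(A)), A) choose length(A), where emptyElement is a representation of the empty set (such as zero or a blank). With this in mind, observe: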
## There is also a function called combinations in the
## rje package, so we fully declare the function with
## scope operator
library(arrangements)
arrangements::combinations(x = c("",x), k = 3, freq = c(2, rep(1, 3)))
     [,1]  [,2]   [,3]   
[1,] ""    ""     "red"  
[2,] ""    ""     "blue" 
[3,] ""    ""     "black"
[4,] ""    "red"  "blue" 
[5,] ""    "red"  "black"
[6,] ""    "blue" "black"
[7,] "red" "blue" "black"

library(RcppAlgos)
comboGeneral(c("",x), 3, freqs = c(2, rep(1, 3)))
     [,1]  [,2]   [,3]   
[1,] ""    ""     "red"  
[2,] ""    ""     "blue" 
[3,] ""    ""     "black"
[4,] ""    "red"  "blue" 
[5,] ""    "red"  "black"
[6,] ""    "blue" "black"
[7,] "red" "blue" "black"
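
To make the mapping concrete, each padded row can be turned back into a subset by dropping the placeholder. The sketch below uses base R only; the matrix padded is rebuilt by hand so that it has the same rows as the two matrices above, and "" is the assumed empty-element marker.

```r
x <- c("red", "blue", "black")

# All non-empty subsets as a list, then left-pad each with "" to length(x)
combs <- do.call(c, lapply(seq_along(x), combn, x = x, simplify = FALSE))
padded <- t(vapply(combs,
                   function(v) c(rep("", length(x) - length(v)), v),
                   character(length(x))))

# Dropping the placeholder from each row recovers the original subset
recovered <- lapply(seq_len(nrow(padded)), function(i) {
  row <- padded[i, ]
  row[row != ""]
})
identical(recovered, combs)    # expected to be TRUE: the mapping round-trips
```

Because the calls above allow only two blanks (freq = c(2, ...)), the row of three blanks that would stand for the empty set is never generated, which is why there are 7 rows rather than 8.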

If you don't like dealing with empty elements and/or matrices, you can also return a list by making use of lapply.
lapply(seq_along(x), comboGeneral, v = x)
[[1]]
     [,1]   
[1,] "red"  
[2,] "blue" 
[3,] "black"

[[2]]
     [,1]   [,2]   
[1,] "red"  "blue" 
[2,] "red"  "black"
[3,] "blue" "black"

[[3]]
     [,1]  [,2]   [,3]   
[1,] "red" "blue" "black"


lapply(seq_along(x), function(y) arrangements::combinations(x, y))
[[1]]
     [,1]   
[1,] "red"  
[2,] "blue" 
[3,] "black"

[[2]]
     [,1]   [,2]   
[1,] "red"  "blue" 
[2,] "red"  "black"
[3,] "blue" "black"

[[3]]
     [,1]  [,2]   [,3]   
[1,] "red" "blue" "black"

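Now we show that the last two methods are much more efficient (N.B. I removed do.call(c, ...) and simplify = FALSE from the answer provided by @RichScriven in order to compare the generation of similar output. I also included rje::powerSet for good measure):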
set.seed(8128)
bigX <- sort(sample(10^6, 20)) ## With this as an input, we will get 2^20 - 1 results.. i.e. 1,048,575
library(microbenchmark)
microbenchmark(powSetRje = rje::powerSet(bigX),
               powSetRich = lapply(seq_along(bigX), combn, x = bigX),
               powSetArrange = lapply(seq_along(bigX), function(y) arrangements::combinations(x = bigX, k = y)),
               powSetAlgos = lapply(seq_along(bigX), comboGeneral, v = bigX),
               unit = "relative")

Unit: relative
          expr        min        lq      mean   median        uq      max neval
     powSetRje 64.4252454 44.063199 16.678438 18.63110 12.082214 7.317559   100
    powSetRich 61.6766640 43.027789 16.009151 17.88944 11.406994 7.222899   100
 powSetArrange  0.9508052  1.060309  1.080341  1.02257  1.262713 1.126384   100
   powSetAlgos  1.0000000  1.000000  1.000000  1.00000  1.000000 1.000000   100
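
The result count quoted in the benchmark setup follows from summing the binomial coefficients over every combination size. A quick base-R check of that arithmetic:

```r
n <- 20
# Number of non-empty subsets: sum of choose(n, k) for k = 1..n
total <- sum(choose(n, seq_len(n)))
total == 2^n - 1                 # TRUE
format(total, big.mark = ",")    # "1,048,575"
```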

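Furthermore, arrangements comes with an argument called layout that lets the user choose a particular output format. One of the options is layout = "l" for list. It is similar to setting simplify = FALSE in combn and lets us obtain output like that of powerSet. Observe: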
do.call(c, lapply(seq_along(x), function(y) {
                    arrangements::combinations(x, y, layout = "l")
                  }))
[[1]]
[1] "red"

[[2]]
[1] "blue"

[[3]]
[1] "black"

[[4]]
[1] "red"  "blue"

[[5]]
[1] "red"   "black"

[[6]]
[1] "blue"  "black"

[[7]]
[1] "red"   "blue"  "black"

And the benchmarks:

microbenchmark(powSetRje = rje::powerSet(bigX)[-1],
               powSetRich = do.call(c, lapply(seq_along(bigX), combn, x = bigX, simplify = FALSE)),
               powSetArrange = do.call(c, lapply(seq_along(bigX), function(y) arrangements::combinations(bigX, y, layout = "l"))),
               times = 15, unit = "relative")
Unit: relative
          expr      min       lq     mean   median       uq      max neval
     powSetRje 5.539967 4.785415 4.277319 4.387410 3.739593 3.543570    15
    powSetRich 4.994366 4.306784 3.863612 3.932252 3.334708 3.327467    15
 powSetArrange 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000    15

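How can I use this function to get all combinations for a range of lengths? For example, if my input vector is x <- c("red", "blue", "black", "green") and I want to generate all combinations of lengths 2 and 3 (in matrix form rather than as a list)? - deschen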
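
One possible way to approach this, sketched in base R rather than with the packages above: restrict the sizes to the desired range and pad the shorter rows with NA. The helper name comb_range and its arguments are hypothetical and only serve this illustration.

```r
# Hypothetical helper: all combinations of x with sizes in `sizes`, one per row
comb_range <- function(x, sizes) {
  width <- max(sizes)
  rows <- lapply(sizes, function(m) {
    out <- t(combn(x, m))                      # one combination per row
    pad <- matrix(NA, nrow(out), width - m)    # NA columns up to the widest size
    cbind(out, pad)
  })
  do.call(rbind, rows)
}

x <- c("red", "blue", "black", "green")
res <- comb_range(x, 2:3)
dim(res)    # choose(4, 2) + choose(4, 3) = 10 rows, 3 columns
```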

1

A solution with a matrix result that does not use any external packages:

store <- lapply(
  seq_along(x), 
  function(i) {
    out <- combn(x, i)   # i rows, one combination per column
    # Pad with NA rows so that every block has length(x) rows
    rbind(out, matrix(NA, nrow = length(x) - i, ncol = NCOL(out)))
})
t(do.call(cbind, store))

     [,1]    [,2]    [,3]   
[1,] "red"   NA      NA     
[2,] "blue"  NA      NA     
[3,] "black" NA      NA     
[4,] "red"   "blue"  NA     
[5,] "red"   "black" NA     
[6,] "blue"  "black" NA     
[7,] "red"   "blue"  "black"

1
You can change 3L to length(x) for a more general solution. - Joseph Wood
