Generate a list of all possible combinations of the elements of a vector

Question

81

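I am trying to generate all possible combinations of 0 and 1 in a vector of length 14. Is there an easy way to get that output as a list of vectors, or even better, a data frame?

To better demonstrate what I am looking for, suppose I only want a vector of length 3. I would like to be able to generate the following: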

 (1,1,1), (0,0,0), (1,1,0), (1,0,0), (1,0,1), (0,1,0), (0,1,1), (0,0,1)

- Mayou

1

These are permutations, since order matters. - Joseph Wood

10 Answers

29

tidyr has a few options similar to expand.grid().

tidyr::crossing() returns a tibble and does not convert strings to factors (although you could get the same with expand.grid(..., stringsAsFactors = FALSE)).

library(tidyr)

crossing(var1 = 0:1, var2 = 0:1, var3 = 0:1)
# A tibble: 8 x 3
   var1  var2  var3
  <int> <int> <int>
1     0     0     0
2     0     0     1
3     0     1     0
4     0     1     1
5     1     0     0
6     1     0     1
7     1     1     0
8     1     1     1
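
For the length-14 case in the question, typing fourteen arguments by hand gets tedious. A minimal sketch, assuming a tidyr version in which crossing() accepts a spliced list via `!!!`; the column names var1 to var14 are made up for illustration:

```r
library(tidyr)

n <- 14
# A named list with one 0:1 entry per position (names are illustrative only).
cols <- setNames(rep(list(0:1), n), paste0("var", seq_len(n)))

# Splice the list into crossing() so each entry becomes one column.
grid <- crossing(!!!cols)
dim(grid)
```

This should give one row per 0/1 vector and one column per position.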

tidyr::expand() can give the combinations of only those values that appear together in the data, like this:

expand(mtcars, nesting(vs, cyl))
# A tibble: 5 x 2
     vs   cyl
  <dbl> <dbl>
1     0     4
2     0     6
3     0     8
4     1     4
5     1     6

Or all possible combinations of the two variables, even if no observation in the data has those specific values, like this:

expand(mtcars, vs, cyl)
# A tibble: 6 x 2
     vs   cyl
  <dbl> <dbl>
1     0     4
2     0     6
3     0     8
4     1     4
5     1     6
6     1     8

You can see that vs == 1 & cyl == 8 was never observed in the original data.

tidyr::complete() can also be used in a similar way to expand.grid(). Here is an example from the docs:

df <- dplyr::tibble(
  group = c(1:2, 1),
  item_id = c(1:2, 2),
  item_name = c("a", "b", "b"),
  value1 = 1:3,
  value2 = 4:6
)
df %>% complete(group, nesting(item_id, item_name))

# A tibble: 4 x 5
  group item_id item_name value1 value2
  <dbl>   <dbl> <chr>      <int>  <int>
1     1       1 a              1      4
2     1       2 b              3      6
3     2       1 a             NA     NA
4     2       2 b              2      5

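This adds a row for every combination of group with the observed (item_id, item_name) pairs that is missing from the data. Here that is the row with group = 2, item_id = 1 and item_name = a, whose value1 and value2 are filled with NA.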

- sbha

tidyr now has the tidyr::expand_grid() function. - Michael Dewar

14

As an alternative to @Justin's approach, you can also use CJ from the "data.table" package. Here I have also used replicate to create my list of 14 zeros and ones.

library(data.table)
do.call(CJ, replicate(14, 0:1, FALSE))
#        V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14
#     1:  0  0  0  0  0  0  0  0  0   0   0   0   0   0
#     2:  0  0  0  0  0  0  0  0  0   0   0   0   0   1
#     3:  0  0  0  0  0  0  0  0  0   0   0   0   1   0
#     4:  0  0  0  0  0  0  0  0  0   0   0   0   1   1
#     5:  0  0  0  0  0  0  0  0  0   0   0   1   0   0
#    ---                                               
# 16380:  1  1  1  1  1  1  1  1  1   1   1   0   1   1
# 16381:  1  1  1  1  1  1  1  1  1   1   1   1   0   0
# 16382:  1  1  1  1  1  1  1  1  1   1   1   1   0   1
# 16383:  1  1  1  1  1  1  1  1  1   1   1   1   1   0
# 16384:  1  1  1  1  1  1  1  1  1   1   1   1   1   1

- A5C1D2H2I1M1N2O1R2T1

Compared with the well-known expand.grid, this method is underrated in terms of speed. - Barnaby

8

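Here I discuss a generic approach for solving all problems of this type. First, let's see how the solutions evolve as N increases, to find the general pattern.

First, the solution for length 1 is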

0
1

Now for length 2, the solution becomes (the second column, separated by |):

0 | 0 0, 0 1
1 | 1 0, 1 1

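Compared with the previous solution for length 1, it is clear that to get this new solution we simply append 0 and 1 to the end of each of the previous solutions (the 0 and 1 in the first column).

Now for length 3, the solution is (the third column):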

0 | 0 0 | 0 0 0, 0 0 1
1 | 1 0 | 1 0 0, 1 0 1
  | 0 1 | 0 1 0, 0 1 1
  | 1 1 | 1 1 0, 1 1 1

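Again, this new solution is obtained by appending 0 and 1 to each element of the previous solution (the second column, for length 2).

This observation leads naturally to a recursive solution. Suppose we already have the solution for length N-1, solution(c(0,1), N-1). To get the solution for length N, we simply append 0 and 1 to each element of the N-1 solution: append_each_to_list(solution(c(0,1), N-1), c(0,1)). Notice how the more complex problem (solving for N) is naturally broken down into a simpler one (solving for N-1).

Then we just need to translate this plain English into R code almost literally: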

# assume you have got solution for a shorter length len-1 -> solution(v, len-1) 
# the solution of length len will be the solution of shorter length appended with each element in v 
solution <- function(v, len) {
  if (len<=1) {
    as.list(v)
  } else {
    append_each_to_list(solution(v, len-1), v)
  } 
}

# function to append each element in vector v to list L and return a list
append_each_to_list <- function(L, v) {
  purrr::flatten(lapply(v, 
         function(n) lapply(L, function(l) c(l, n))
         ))
}

Calling the function:

> solution(c(1,0), 3)
[[1]]
[1] 1 1 1

[[2]]
[1] 0 1 1

[[3]]
[1] 1 0 1

[[4]]
[1] 0 0 1

[[5]]
[1] 1 1 0

[[6]]
[1] 0 1 0

[[7]]
[1] 1 0 0

[[8]]
[1] 0 0 0
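
The same recurrence can also be unrolled into a plain loop using only base R. This is a minimal sketch rather than the answer's own code; the name solution_iter is hypothetical, and unlist(..., recursive = FALSE) stands in for purrr::flatten():

```r
# Hypothetical iterative variant of solution(): start from the length-1
# solutions and append every element of v to every partial result,
# repeating len - 1 times.
solution_iter <- function(v, len) {
  out <- as.list(v)
  for (step in seq_len(max(len - 1, 0))) {
    out <- unlist(
      lapply(v, function(n) lapply(out, function(l) c(l, n))),
      recursive = FALSE  # flatten one level only, keeping each vector intact
    )
  }
  out
}

res <- solution_iter(c(1, 0), 3)
length(res)
```

For c(1, 0) and length 3 this yields the eight vectors in the same order as the recursive version.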

- englealuze

5

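Since you are dealing with 0's and 1's, it seems natural to think of integers in terms of bits. Using a slightly altered function (MyIntToBit below) taken from another post, together with your choice of apply function, we can get the desired result.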

MyIntToBit <- function(x, dig) {
    i <- 0L
    string <- numeric(dig)
    while (x > 0) {
        string[dig - i] <- x %% 2L
        x <- x %/% 2L
        i <- i + 1L
    }
    string
}

If you want a list, use lapply like so:

lapply(0:(2^14 - 1), function(x) MyIntToBit(x,14))

If you prefer a matrix, sapply will do the trick:

sapply(0:(2^14 - 1), function(x) MyIntToBit(x,14))

Below are example outputs:

> lapply(0:(2^3 - 1), function(x) MyIntToBit(x,3))
[[1]]
[1] 0 0 0

[[2]]
[1] 0 0 1

[[3]]
[1] 0 1 0

[[4]]
[1] 0 1 1

[[5]]
[1] 1 0 0

[[6]]
[1] 1 0 1

[[7]]
[1] 1 1 0

[[8]]
[1] 1 1 1


> sapply(0:(2^3 - 1), function(x) MyIntToBit(x,3))
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,]    0    0    0    0    1    1    1    1
[2,]    0    0    1    1    0    0    1    1
[3,]    0    1    0    1    0    1    0    1
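
Note that sapply puts each combination in a column. If a data frame with one combination per row is wanted, as the question asks, the bit arithmetic can also be vectorised. A minimal base R sketch, not part of the original answer:

```r
n <- 3
x <- 0:(2^n - 1)          # every integer that fits in n bits
place <- 2^((n - 1):0)    # place values, most significant bit first

# Entry [i, j] is the j-th bit of x[i]: integer-divide by the place value,
# then keep the remainder modulo 2.
m <- outer(x, place, function(value, p) (value %/% p) %% 2)

df <- as.data.frame(m)    # columns get the default names V1, V2, ...
dim(df)
```

Each row of m is the binary representation of one integer, so rows are already in counting order.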

- Joseph Wood

5

There are 16384 possible permutations. You can use the iterpc package to fetch the results one at a time.

library(iterpc)
I <- iterpc(2, 14, labels = c(0, 1), ordered = TRUE, replace = TRUE)
getnext(I)
# [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0
getnext(I)
# [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 1
getnext(I)
# [1] 0 0 0 0 0 0 0 0 0 0 0 0 1 0

If you want all the results at once, you can still use getall(I).

- Randy Lai

5

Here is a nice minimal reproducible example:

x <- c("red", "blue", "black")
do.call(c, lapply(seq_along(x), combn, x = x, simplify = FALSE))
# [[1]]
# [1] "red"
# 
# [[2]]
# [1] "blue"
# 
# [[3]]
# [1] "black"
# 
# [[4]]
# [1] "red"  "blue"
# 
# [[5]]
# [1] "red"   "black"
# 
# [[6]]
# [1] "blue"  "black"
# 
# [[7]]
# [1] "red"   "blue"  "black"
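
To connect this back to the 0/1 vectors in the question: each row of a 0/1 grid can be read as a mask that selects one subset of x. A minimal base R sketch of that correspondence (my own illustration, not from the credited answer). Unlike the combn version it also includes the empty subset, so it yields 8 results rather than 7:

```r
x <- c("red", "blue", "black")

# One logical column per element of x; each row is one include/exclude mask.
masks <- expand.grid(rep(list(c(FALSE, TRUE)), length(x)))

# Apply each mask to x to get the corresponding subset.
subsets <- lapply(seq_len(nrow(masks)), function(i) {
  x[unlist(masks[i, ], use.names = FALSE)]
})
length(subsets)
```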

All credit goes to @RichScriven

- stevec

3

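Here is a different approach from the previous answers. If you need all possible combinations of 14 values of 1 and 0, that is the same as generating all the numbers from 0 to (2^14)-1 and keeping their binary representation.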

n <- 14
lapply(0:(2^n-1), FUN=function(x) head(as.integer(intToBits(x)),n))
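
One detail worth knowing: intToBits() returns the least significant bit first, so the vectors above read right to left compared with the usual binary notation. If the most significant bit should come first, each vector can be reversed. A small sketch with n = 3 to keep the output short:

```r
n <- 3
bits <- lapply(0:(2^n - 1), function(x) {
  # Take the n lowest bits (least significant first), then reverse them.
  rev(as.integer(intToBits(x))[seq_len(n)])
})
bits[[2]]
```

Since intToBits() works on 32-bit integers, this approach is only an option while n stays below that width.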

- Patricio Moracho

2

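This is nice, and it's good to see it use built-in functions. However, it is similar to Joseph Wood's answer above (and for n = 14, his custom version using MyIntToBit runs 3-4 times faster). - Gregor Thomas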

1

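Preface

There are a lot of good answers here. I want to add one for those who can't seem to wrap their heads around the provided implementations. The solutions here are essentially generalizations of loops, which is why the recursive solutions look so elegant. Nobody wrote it outright as a loop, and I think there is merit in giving the most straightforward solution so you can trace what is actually happening.

This is not guaranteed to perform well, and most of the other answers are more practical. The purpose is to let you trace what is actually going on.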

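The Math

What is being counted here are ordered selections with repetition allowed: the order of the elements matters ([0, 1] is different from [1, 0]), which is why the comment on the question calls them permutations rather than combinations. Your set has n elements and you are selecting k of them, for a total of n^k selections.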

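Example

You have three letters, ['a', 'b', 'c'], and want to find all unique ways to arrange two of these letters, allowing letters to be reused (so ['a', 'a'] is allowed). n = 3 and k = 2: we have three things and want to find all the different ways to pick two of them. There are 9 ways to make this selection (3^2, i.e. n^k).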

The Code

As mentioned, the simplest solution requires a lot of loops.

Keep adding loops and values to select from as k increases.

set <- c("a", "b", "c")
n <- length(set)

# k = 1
# There are only three ways to pick one thing from a selection of three items!
sprintf("Number of combinations:%4d", n^1)
for(i in seq_along(set)){
  print(paste(set[i])) 
}

# k = 2
sprintf("Number of combinations:%4d", n^2)
for(i in seq_along(set)){
  for(j in seq_along(set)){
    print(paste(set[i], set[j])) 
  }
}

# k = 3
sprintf("Number of combinations:%4d", n^3)
for(i in seq_along(set)){
  for(j in seq_along(set)){
    for(k in seq_along(set)){
      print(paste(set[i], set[j], set[k])) 
    }
  }
}

# See the pattern? The value of k corresponds
# to the number of loops and to the number of
# indexes on `set`
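
Since the number of loops has to grow with k, one way to avoid writing them out is to keep the k loop counters in a single index vector and advance it like an odometer. A minimal sketch of that idea; the function name all_selections is hypothetical and the code favours traceability over speed:

```r
# Hypothetical helper: idx plays the role of the loop variables i, j, k, ...
all_selections <- function(set, k) {
  n <- length(set)
  idx <- rep(1L, k)               # every "loop counter" starts at 1
  out <- vector("list", n^k)
  for (row in seq_len(n^k)) {
    out[[row]] <- set[idx]        # same as set[i], set[j], ... in the loops
    # Advance the odometer: bump the rightmost counter, carrying to the left
    # whenever a counter has already reached n.
    pos <- k
    while (pos >= 1L) {
      if (idx[pos] < n) {
        idx[pos] <- idx[pos] + 1L
        break
      }
      idx[pos] <- 1L
      pos <- pos - 1L
    }
  }
  out
}

res <- all_selections(c("a", "b", "c"), 2)
length(res)
```

The results come out in the same order the nested loops would print them, with the last position changing fastest.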

- Connor Krenzer

1

A purrr solution using cross() and its variants:

library(purrr)

cross(list(0:1, 0:1, 0:1)) %>% simplify_all()

# [[1]]
# [1] 0 0 0
# 
# [[2]]
# [1] 1 0 0
# 
# [[3]]
# [1] 0 1 0
# 
# ...
#
# [[8]]
# [1] 1 1 1

cross_df(list(var1 = 0:1, var2 = 0:1, var3 = 0:1))

# # A tibble: 8 × 3
#    var1  var2  var3
#   <int> <int> <int>
# 1     0     0     0
# 2     1     0     0
# 3     0     1     0
# 4     1     1     0
# 5     0     0     1
# 6     1     0     1
# 7     0     1     1
# 8     1     1     1

With the dplyr package, you can use full_join(x, y, by = character()) to perform a cross join, generating all combinations of x and y.

library(dplyr)

Reduce(\(x, y) full_join(x, y, by = character()),
       list(tibble(var1 = 0:1), tibble(var2 = 0:1), tibble(var3 = 0:1)))

# # A tibble: 8 × 3
#    var1  var2  var3
#   <int> <int> <int>
# 1     0     0     0
# 2     0     0     1
# 3     0     1     0
# 4     0     1     1
# 5     1     0     0
# 6     1     0     1
# 7     1     1     0
# 8     1     1     1

- Darren Tsai


- Justin · Accepted Answer

You're looking for expand.grid.

expand.grid(0:1, 0:1, 0:1)

Or, for the long case:

n <- 14
l <- rep(list(0:1), n)

expand.grid(l)
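
expand.grid already returns a data frame. If the list of vectors from the question is wanted instead, the rows can be pulled out one at a time. A minimal sketch, shown with n = 3 so the result is easy to inspect:

```r
n <- 3
grid <- expand.grid(rep(list(0:1), n))   # data frame with columns Var1..Var3

# One integer vector per row of the grid.
rows <- lapply(seq_len(nrow(grid)), function(i) {
  unlist(grid[i, ], use.names = FALSE)
})

nrow(grid)    # 2^n rows
rows[[2]]
```

Note that expand.grid varies the first column fastest, so the second row is 1 0 0 rather than 0 0 1.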