Create combinations by group in R

Question

13

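I want to create a list of all possible groups of four for my classroom. If I have 20 students, how can I create this list, by group, in R? Each row would be one combination, with 20 columns for the full list of student IDs, where columns 1-4 are "group1", columns 5-8 are "group2", and so on.

The following gives a list of the possible combinations for each single group of four students (x1, x2, x3, and x4). Now, for each row listed, what are the possibilities for the other four groups of four students? So there should be 20 columns (Group1_1:4, Group2_1:4, Group3_1:4, Group4_1:4, Group5_1:4).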

combn(c(1:20), m = 4)

Desired output

Combination 1 = Group1[1, 2, 3, 4] Group2[5, 6, 7, 8], Group3[9, 10, 11, 12], etc. 
Combination 2 = Group1[1, 2, 3, 5]... etc.

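There are a lot of posts about combinations online; it is quite possible this has already been answered and I just couldn't find it. Any help is appreciated!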

- SharpSharpLes

1

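Is your goal to assign groups, or to get a list of all possible unique ways of creating the groups? One of those is very easy and doesn't need much storage. The other requires a substantial amount of storage just to hold the results. - Dason

Right, I figured the data set would be quite large, but my goal is to create a list of all possible unique ways of forming the groups. - SharpSharpLes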

1

Sure. 2 groups from 4 students in total... Combination 1: Group 1 = 1, 2; Group 2 = 3, 4. Combination 2: Group 1 = 1, 3; Group 2 = 2, 4. Combination 3: Group 1 = 1, 4; Group 2 = 2, 3. - SharpSharpLes

4

Are you trying to do this: https://stackoverflow.com/a/51754958/4408538? If so, I think I could write something up in R fairly easily. - Joseph Wood

1

Here is the basic idea for an algorithm; not sure how well it carries over to R: https://dev59.com/_Jrga4cB1Zd3GeqPiiU-#39129475 - m69 ''snarky and unwelcoming''

9 Answers

6

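This is a computationally challenging problem, since I believe there are about 2.5 billion possibilities to enumerate. (Corrections are welcome if I've got this wrong.)

Depending on how it is stored, a table with all of those groupings would likely need more RAM than most computers have. Even with a "generate one grouping at a time" approach, it would take about 42 minutes to generate every possibility at 1,000,000 per second, or about a month at 1,000 per second.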
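The time and memory figures can be checked with a few lines of arithmetic. This is only a rough sketch: the 4-bytes-per-ID figure assumes the table is stored as a plain R integer matrix with 20 columns, which is an assumption and not something stated above.

```r
total <- 2546168625                       # number of distinct groupings

minutes_at_1e6 <- total / 1e6 / 60        # generating 1,000,000 per second
days_at_1e3    <- total / 1e3 / 86400     # generating 1,000 per second

# Assumed storage layout: one row per grouping, 20 integer IDs of 4 bytes each
gb_int_matrix <- total * 20 * 4 / 1e9

round(minutes_at_1e6, 1)  # 42.4
round(days_at_1e3, 1)     # 29.5
round(gb_int_matrix)      # 204
```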

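Edit: added a partial implementation at the bottom that can create any desired grouping from #1 to #2,546,168,625. For some purposes this may be nearly as good as actually storing the whole sequence, which is very large.

Suppose we are going to make 5 groups of 4 students each: Groups A, B, C, D, and E.

Let's define Group A as the group that Student #1 is in. They can be paired with any three of the other 19 students. I believe there are 969 such combinations: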

> nrow(t(combn(1:19, 3)))
[1] 969

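Now 16 students are left for the other groups. Let's assign the first student not already in Group A to Group B. That might be Student 2, 3, 4, or 5. It doesn't matter which; all we need to know is that only 15 students can be paired with that student. There are 455 such combinations: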

> nrow(t(combn(1:15, 3)))
[1] 455

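Now 12 students are left. Again, assign the first ungrouped student to Group C; there are 165 ways to combine them with three of the other 11 students: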

> nrow(t(combn(1:11, 3)))
[1] 165

We have 8 students left, 7 of whom can be paired with the first ungrouped student to form Group D, in 35 different ways:

> nrow(t(combn(1:7, 3)))
[1] 35

And then, once the other groups are determined, only one group of four students is left, three of whom can be paired with the first ungrouped student:

> nrow(t(combn(1:3, 3)))
[1] 1

That implies about 2.546 billion groupings:

> 969*455*165*35*1
[1] 2546168625
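The same product can be written for any class size and group size. The sketch below uses a hypothetical helper name, count_groupings, and assumes the number of students is a multiple of the group size; it counts the partners available to the first ungrouped student at each step, exactly as in the walk-through above.

```r
# Hypothetical helper: number of ways to split n students into groups of k,
# counted by choosing k - 1 partners for the first ungrouped student each time.
count_groupings <- function(n, k) {
  stopifnot(n %% k == 0)
  remaining <- seq(n, k, by = -k)         # students left before each group is formed
  prod(choose(remaining - 1, k - 1))
}

count_groupings(20, 4)  # 2546168625
count_groupings(16, 4)  # 2627625
count_groupings(4, 2)   # 3
```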

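Here is a work-in-progress function that produces a grouping from any sequence number.

1) [in progress] Convert the sequence number to a vector describing which combination number should be used for each of Groups A, B, C, D, and E. For instance, this should convert #1 to c(1, 1, 1, 1, 1) and #2,546,168,625 to c(969, 455, 165, 35, 1).

2) Convert those combination numbers to a specific output describing the students in each group.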

groupings <- function(seq_nums) {
  students <- 20
  group_size <- 4
  grouped <- NULL
  remaining <- seq_len(students)
  seq_nums_pad <- c(seq_nums, 1) # Last group always uses the only possible combination
  for (g in seq_len(students / group_size)) {
    # The first remaining student anchors the group; the combination number
    # selects which of the other remaining students join them.
    group_relative <-
      c(1, 1 + t(combn(1:(length(remaining) - 1), group_size - 1))[seq_nums_pad[g], ])
    group <- remaining[group_relative]
    print(group)
    grouped <- c(grouped, group)
    remaining <- setdiff(remaining, grouped)
  }
}

> groupings(c(1,1,1,1))
#[1] 1 2 3 4
#[1] 5 6 7 8
#[1]  9 10 11 12
#[1] 13 14 15 16
#[1] 17 18 19 20
> groupings(c(1,1,1,2))
#[1] 1 2 3 4
#[1] 5 6 7 8
#[1]  9 10 11 12
#[1] 13 14 15 17
#[1] 16 18 19 20
> groupings(c(969, 455, 165, 35))   # This one uses the last possibility for
#[1]  1 18 19 20                    #   each grouping.
#[1]  2 15 16 17
#[1]  3 12 13 14
#[1]  4  9 10 11
#[1] 5 6 7 8
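Step 1, which is marked as in progress above, could be handled with mixed-radix decoding: treat the per-group counts 969, 455, 165, 35, 1 as the bases of a number system and peel off one digit per group. The following is a minimal sketch under that assumption; seq_to_indices is a hypothetical name and is not part of the original function.

```r
# Hypothetical helper: convert a 1-based sequence number into one
# combination number per group using mixed-radix decoding.
seq_to_indices <- function(n, students = 20, group_size = 4) {
  n_groups <- students / group_size
  # Ways to pick partners for the first ungrouped student of each group
  radices <- choose(students - 1 - group_size * (seq_len(n_groups) - 1),
                    group_size - 1)
  stopifnot(n >= 1, n <= prod(radices))
  idx <- numeric(n_groups)
  rem <- n - 1                            # work with a 0-based rank
  for (g in rev(seq_len(n_groups))) {     # last group varies fastest
    idx[g] <- rem %% radices[g] + 1
    rem <- rem %/% radices[g]
  }
  idx
}

seq_to_indices(1)           # 1 1 1 1 1
seq_to_indices(2)           # 1 1 1 2 1
seq_to_indices(2546168625)  # 969 455 165 35 1
```

Dropping the final element (always 1) gives the four-number vector that groupings() expects, for example head(seq_to_indices(2), -1).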

- Jon Spring

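Your reasoning is correct, and it is the heart of computing the nth combo grouping (not sure of the correct term). - Joseph Wood

@JosephWood I think the correct term is that n is the "rank" of the nth combination. - m69 ''snarky and unwelcoming''

@m69, ah yes, I've heard of ranking/unranking before. I was actually asking about the correct term for these arrangements themselves (i.e. "combo groups"). They are a bit like combinations and a bit like permutations, and they definitely belong to combinatorics. After years of running into new things in this field that turn out to have a specific name (for example, I only recently came across superpermutations), I bet this case is no exception. - Joseph Wood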

6

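This relies heavily on this answer:

Algorithm that can create all combinations and all groups of those combinations

One thing to note is that that answer is not very flexible: it only covers groups of size 3. To make it more robust, we can generate the code from the input parameters. That is, the following recursive function is what gets generated for groups of size 3: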

group <- function(input, step){
 len <- length(input) 
 combination[1, step] <<- input[1] 

 for (i1 in 2:(len-1)) { 
   combination[2, step] <<- input[i1] 

   for (i2 in (i1+1):(len-0)) { 
     combination[3, step] <<- input[i2] 

     if (step == m) { 
       print(z); result[z, ,] <<- combination 
       z <<- z+1 
     } else { 
       rest <- setdiff(input, input[c(i1,i2, 1)]) 
       group(rest, step +1) #recursive if there are still additional possibilities
   }} 
 } 
}

For N = 16 and k = 4, this takes about 55 seconds to run. I'd like to translate it into Rcpp, but unfortunately I don't have that skill set.

group_N <- function(input, k = 2) {
  N = length(input)
  m = N/k
  combos <- factorial(N) / (factorial(k)^m * factorial(m))

  result <- array(NA_integer_, dim = c(combos, m, k))
  combination = matrix(NA_integer_, nrow = k, ncol = m)

  z = 1

  group_f_start = 'group <- function(input, step){\n len <- length(input) \n combination[1,  step] <<- input[1] \n '
  i_s <- paste0('i', seq_len(k-1))

  group_f_fors = paste0('for (', i_s, ' in ', c('2', if (length(i_s) != 1) {paste0('(', i_s[-length(i_s)], '+1)')}), ':(len-', rev(seq_len(k)[-k])-1, ')) { \n combination[', seq_len(k)[-1], ', step] <<- input[', i_s, '] \n', collapse = '\n ')

  group_f_inner = paste0('if (step == m) { \n result[z, ,] <<- combination \n z <<- z+1 \n } else { \n rest <- setdiff(input, input[c(',
                         paste0(i_s, collapse = ','),
                         ', 1)]) \n group(rest, step +1) \n }')

  eval(parse(text = paste0(group_f_start, group_f_fors, group_f_inner, paste0(rep('}', times = k), collapse = ' \n '))))

  group(input, 1)
  return(result)
}
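As a point of comparison, the same enumeration can be sketched without building code through eval(parse()). The function below, partition_groups, is a hypothetical alternative rather than the method used above: it recurses on the remaining items and always anchors each group on the first remaining item, so no grouping is produced twice. It returns a list of groupings instead of a 3D array and is not tuned for speed.

```r
# Hypothetical alternative: enumerate all ways to split `items` into groups of size k.
partition_groups <- function(items, k) {
  stopifnot(k >= 2, length(items) %% k == 0)
  if (length(items) == k) return(list(list(items)))
  first <- items[1]
  rest <- items[-1]
  # Index-based combn() so a length-one `rest` is never treated as 1:rest
  partners <- combn(length(rest), k - 1)
  out <- list()
  for (j in seq_len(ncol(partners))) {
    grp <- c(first, rest[partners[, j]])
    # Recurse on whatever is left once this group has been removed
    for (other_groups in partition_groups(setdiff(items, grp), k)) {
      out[[length(out) + 1]] <- c(list(grp), other_groups)
    }
  }
  out
}

length(partition_groups(1:4, 2))  # 3
length(partition_groups(1:6, 2))  # 15
length(partition_groups(1:8, 4))  # 35
```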

Performance

system.time({test_1 <- group_N(seq_len(4), 2)})
#   user  system elapsed 
#   0.01    0.00    0.02
library(data.table)

#this funky step is just to better show the groups. the provided
## array is fine.

as.data.table(t(rbindlist(as.data.table(apply(test_1, c(1,3), list)))))
#    V1  V2
#1: 1,2 3,4
#2: 1,3 2,4
#3: 1,4 2,3

system.time({test_1 <- group_N(seq_len(16), 4)})
#   user  system elapsed 
#  55.00    0.19   55.29 

as.data.table(t(rbindlist(as.data.table(apply(test_1, c(1,3), list)))))
#very slow
#                  V1          V2          V3          V4
#      1:     1,2,3,4     5,6,7,8  9,10,11,12 13,14,15,16
#      2:     1,2,3,4     5,6,7,8  9,10,11,13 12,14,15,16
#      3:     1,2,3,4     5,6,7,8  9,10,11,14 12,13,15,16
#      4:     1,2,3,4     5,6,7,8  9,10,11,15 12,13,14,16
#      5:     1,2,3,4     5,6,7,8  9,10,11,16 12,13,14,15
#     ---                                                
#2627621:  1,14,15,16  2,11,12,13  3, 6, 9,10     4,5,7,8
#2627622:  1,14,15,16  2,11,12,13     3,7,8,9  4, 5, 6,10
#2627623:  1,14,15,16  2,11,12,13  3, 7, 8,10     4,5,6,9
#2627624:  1,14,15,16  2,11,12,13  3, 7, 9,10     4,5,6,8
#2627625:  1,14,15,16  2,11,12,13  3, 8, 9,10     4,5,6,7

- Cole

4

Here is an example for small numbers. I don't think it will scale well to 20 students.

total_students = 4
each_group = 2
total_groups = total_students/each_group

if (total_students %% each_group == 0) {
    library(arrangements)

    group_id = rep(1:total_groups, each = each_group)

    #There is room to increase efficiency here by generating only relevant permutations
    temp = permutations(1:total_students, total_students)
    temp = unique(t(apply(temp, 1, function(i) {
        x = group_id[i]
        match(x, unique(x))
    })))

    dimnames(temp) = list(COMBO = paste0("C", 1:NROW(temp)),
                          Student = paste0("S", 1:NCOL(temp)))
} else {
    cat("Total students not multiple of each_group")
    temp = NA
}
#> Warning: package 'arrangements' was built under R version 3.5.3
temp
#>      Student
#> COMBO S1 S2 S3 S4
#>    C1  1  1  2  2
#>    C2  1  2  1  2
#>    C3  1  2  2  1

^{Created on 2019-09-02 by the reprex package (v0.3.0)}

The following function (reference link) calculates the total number of possible groupings:

foo = function(N, k) {
    #N is total number or people, k is number of people in each group
    if (N %% k == 0) {
        m = N/k
        factorial(N)/(factorial(k)^m * factorial(m))
    } else {
        stop("N is not a multiple of k")
    }
}

foo(4, 2)
#[1] 3

foo(20, 4)
#[1] 2546168625

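For groups of 4 from 20 people, the number of possible arrangements is enormous.

- d.b

I think you're right that this can be simplified. For our purposes, C1 and C6 (above) are the same: both put students 1 and 2 together and students 3 and 4 together. - SharpSharpLes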

1

You could define a custom function in base R, like the one below:

f <- function(v, grpsz) {
    p <- combn(v, grpsz)
    lst <- asplit(p[, p[1, ] == min(p[1, ])], 2)
    cnt <- 1
    repeat {
        if (cnt == length(v) / grpsz) {
            return(lst)
        }
        lst <- unlist(lapply(lst, \(x) {
            p <- combn(v[!v %in% x], grpsz)
            Map(
                cbind,
                list(x),
                asplit(
                    p[, p[1, ] == min(p[1, ]), drop = FALSE],
                    2
                )
            )
        }), recursive = FALSE)
        cnt <- cnt + 1
    }
}

This gives a list of matrices as the result, where each column represents one group, for example:

> f(1:6, 2)
[[1]]
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6

[[2]]
     [,1] [,2] [,3]
[1,]    1    3    4
[2,]    2    5    6

[[3]]
     [,1] [,2] [,3]
[1,]    1    3    4
[2,]    2    6    5

[[4]]
     [,1] [,2] [,3]
[1,]    1    2    5
[2,]    3    4    6

[[5]]
     [,1] [,2] [,3]
[1,]    1    2    4
[2,]    3    5    6

[[6]]
     [,1] [,2] [,3]
[1,]    1    2    4
[2,]    3    6    5

[[7]]
     [,1] [,2] [,3]
[1,]    1    2    5
[2,]    4    3    6

[[8]]
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6

[[9]]
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    6    5

[[10]]
     [,1] [,2] [,3]
[1,]    1    2    4
[2,]    5    3    6

[[11]]
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    5    4    6

[[12]]
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    5    6    4

[[13]]
     [,1] [,2] [,3]
[1,]    1    2    4
[2,]    6    3    5

[[14]]
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    6    4    5

[[15]]
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    6    5    4


> f(1:6, 3)
[[1]]
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6

[[2]]
     [,1] [,2]
[1,]    1    3
[2,]    2    5
[3,]    4    6

[[3]]
     [,1] [,2]
[1,]    1    3
[2,]    2    4
[3,]    5    6

[[4]]
     [,1] [,2]
[1,]    1    3
[2,]    2    4
[3,]    6    5

[[5]]
     [,1] [,2]
[1,]    1    2
[2,]    3    5
[3,]    4    6

[[6]]
     [,1] [,2]
[1,]    1    2
[2,]    3    4
[3,]    5    6

[[7]]
     [,1] [,2]
[1,]    1    2
[2,]    3    4
[3,]    6    5

[[8]]
     [,1] [,2]
[1,]    1    2
[2,]    4    3
[3,]    5    6

[[9]]
     [,1] [,2]
[1,]    1    2
[2,]    4    3
[3,]    6    5

[[10]]
     [,1] [,2]
[1,]    1    2
[2,]    5    3
[3,]    6    4

- ThomasIsCoding

0

Here is a function that generates the possible groupings using only base R functions.

Group_Assignment_Function <- function (Identifiers, Number_of_Items_in_Each_Group, Number_of_Groups) {
  Output <- vector(mode = 'list', length = Number_of_Groups)
  Possible_Groups_Function <- function (x) {
    if (is.list(x)) {
      lapply(x, Possible_Groups_Function)
    } else if (!is.list(x)) {
      as.list(as.data.frame(combn(x, Number_of_Items_in_Each_Group)))
    }
  }
  Remaining_Items_Function <- function (x, y) {
    if (!is.list(y)) {
      lapply(x, function (z) {
        setdiff(y, z)
      })
    } else if (is.list(y)) {
      mapply(Remaining_Items_Function, x = x, y = y, SIMPLIFY = F)
    }
  }
  All_Possible_Groups_Function <- function (x) {
    for (i in seq_len(Number_of_Groups - 1)) {
      if (i == 1) {
        Group_Possibilities <- Possible_Groups_Function(x)
      } else if (i > 1) {
        Group_Possibilities <- Possible_Groups_Function(Remaining_Items)
      }
      Output[[i]] <- Group_Possibilities
      if (!all(sapply(Group_Possibilities, is.list))) {
        Remaining_Items <- lapply(Group_Possibilities, function (y) {
          setdiff(x, y)
        })
      } else if (all(sapply(Group_Possibilities, is.list))) {
        Remaining_Items <- Remaining_Items_Function(Group_Possibilities, Remaining_Items)
      }
    }
    if (Number_of_Groups == 1) {
      Output[[Number_of_Groups]] <- Possible_Groups_Function(x)
    } else if (Number_of_Groups > 1) {
      Output[[Number_of_Groups]] <- Possible_Groups_Function(Remaining_Items)
    }
    Output
  }
  All_Possible_Groups <- All_Possible_Groups_Function(Identifiers)
  Repitition_Times <- choose(length(Identifiers) - (Number_of_Items_in_Each_Group * (0:(Number_of_Groups - 1))), Number_of_Items_in_Each_Group)
  Repitition_Times <- c(Repitition_Times[2:length(Repitition_Times)], 1)
  Repitition_Times <- lapply((length(Repitition_Times) - seq_len(length(Repitition_Times))) + 1, function (x) {
    rev(Repitition_Times)[1:x]
  })
  Repitition_Times <- lapply(Repitition_Times, function (y) {
    Reduce(`*`, y)
  })
  All_Possible_Groups <- lapply(All_Possible_Groups, function(x) {
    z <- sapply(x, function (y) {
      class(y)[1] == "list"
    })
    w <- c(x[!z], unlist(x[z], recursive = F))
    if (sum(z)){
      Recall(w)
    } else if (!sum(z)) {
      w
    }
  })
  All_Possible_Groups <- mapply(function (x, y) {
    x[rep(seq_len(length(x)), each = y)]
  }, x = All_Possible_Groups, y = Repitition_Times, SIMPLIFY = F)
  All_Possible_Groups <- lapply(seq_len(unique(sapply(All_Possible_Groups, length))), function (x) {
    lapply(All_Possible_Groups,"[[", x)
  })
  List_of_Possible_Groups <- lapply(All_Possible_Groups, function (x) {
    names(x) <- paste0("Group_", seq_len(Number_of_Groups))
    x
  })
  names(List_of_Possible_Groups) <- NULL
  Ordered_List_of_Possible_Groups_1 <- lapply(List_of_Possible_Groups, function (x) {
    lapply(x, sort)
  })
  Ordered_List_of_Possible_Groups_2 <- lapply(Ordered_List_of_Possible_Groups_1, function (x) {
    order(sapply(x, function (y) {
      y[1]
    }))
  })
  Ordered_List_of_Possible_Groups_1 <- mapply(function (x, y) {
    x[y]
  }, x = Ordered_List_of_Possible_Groups_1, y = Ordered_List_of_Possible_Groups_2, SIMPLIFY = F)
  Ordered_List_of_Possible_Groups_1 <- lapply(Ordered_List_of_Possible_Groups_1, function (x) {
    do.call('c', x)
      })
  Ordered_List_of_Possible_Groups_1 <- lapply(Ordered_List_of_Possible_Groups_1, function (x) {
    names(x) <- NULL
    x
  })
  List_of_Possible_Groups <- List_of_Possible_Groups[-c(which(duplicated(Ordered_List_of_Possible_Groups_1)))]
  names(List_of_Possible_Groups) <- paste("Possibility", seq_len(length(List_of_Possible_Groups)), sep = "_")
  List_of_Possible_Groups
}

Here is an example of how to use it:

Identifiers <- as.character(1:5)
Number_of_Items_in_Each_Group <- 2
Number_of_Groups <- 2
Group_Assignment_Function(Identifiers = Identifiers, Number_of_Items_in_Each_Group = Number_of_Items_in_Each_Group, Number_of_Groups = Number_of_Groups)
# $Possibility_1
# $Possibility_1$Group_1
# [1] "1" "2"
# 
# $Possibility_1$Group_2
# [1] "3" "4"
# 
# 
# $Possibility_2
# $Possibility_2$Group_1
# [1] "1" "2"
# 
# $Possibility_2$Group_2
# [1] "3" "5"
# 
# 
# $Possibility_3
# $Possibility_3$Group_1
# [1] "1" "2"
# 
# $Possibility_3$Group_2
# [1] "4" "5"
# 
# 
# $Possibility_4
# $Possibility_4$Group_1
# [1] "1" "3"
# 
# $Possibility_4$Group_2
# [1] "2" "4"
# 
# 
# $Possibility_5
# $Possibility_5$Group_1
# [1] "1" "3"
# 
# $Possibility_5$Group_2
# [1] "2" "5"
# 
# 
# $Possibility_6
# $Possibility_6$Group_1
# [1] "1" "3"
# 
# $Possibility_6$Group_2
# [1] "4" "5"
# 
# 
# $Possibility_7
# $Possibility_7$Group_1
# [1] "1" "4"
# 
# $Possibility_7$Group_2
# [1] "2" "3"
# 
# 
# $Possibility_8
# $Possibility_8$Group_1
# [1] "1" "4"
# 
# $Possibility_8$Group_2
# [1] "2" "5"
# 
# 
# $Possibility_9
# $Possibility_9$Group_1
# [1] "1" "4"
# 
# $Possibility_9$Group_2
# [1] "3" "5"
# 
# 
# $Possibility_10
# $Possibility_10$Group_1
# [1] "1" "5"
# 
# $Possibility_10$Group_2
# [1] "2" "3"
# 
# 
# $Possibility_11
# $Possibility_11$Group_1
# [1] "1" "5"
# 
# $Possibility_11$Group_2
# [1] "2" "4"
# 
# 
# $Possibility_12
# $Possibility_12$Group_1
# [1] "1" "5"
# 
# $Possibility_12$Group_2
# [1] "3" "4"
# 
# 
# $Possibility_13
# $Possibility_13$Group_1
# [1] "2" "3"
# 
# $Possibility_13$Group_2
# [1] "4" "5"
# 
# 
# $Possibility_14
# $Possibility_14$Group_1
# [1] "2" "4"
# 
# $Possibility_14$Group_2
# [1] "3" "5"
# 
# 
# $Possibility_15
# $Possibility_15$Group_1
# [1] "2" "5"
# 
# $Possibility_15$Group_2
# [1] "3" "4"

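It takes a while for larger numbers of items. If anyone has a better base R solution, I'd love to see it. I'm sure there are more efficient approaches, since this one generates every possible arrangement and then eliminates those that don't actually put different items in each group.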

- David Moore

0

The following code works.

# Create list of the 20 records
list <- c(1:20)

# Generate all combinations including repetitions
c <- data.frame(expand.grid(rep(list(list), 4))); rm(list)
c$combo <- paste(c$Var1, c$Var2, c$Var3, c$Var4)
# Remove repetitions
c <- subset(c, c$Var1 != c$Var2 & c$Var1 != c$Var3 & c$Var1 != c$Var4 & c$Var2 != c$Var3 & c$Var2 != c$Var4 & c$Var3 != c$Var4)

# Create common group labels (ex. abc, acb, bac, bca, cab, cba would all have "abc" as their group label).
key <- data.frame(paste(c$Var1, c$Var2, c$Var3, c$Var4))
key$group  <- apply(key, 1, function(x) paste(sort(unlist(strsplit(x, " "))), collapse = " "))
c$group <- key$group; rm(key)

# Sort by common group label and id combos by group
c <- c[order(c$group),]
c$Var1 <- NULL; c$Var2 <- NULL; c$Var3 <- NULL; c$Var4 <- NULL;
c$rank <- rep(1:24)

# Pivot
c <- reshape(data=c,idvar="group", v.names = "combo", timevar = "rank", direction="wide")

- Monk

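There are 4,845 unique combinations in 20C4. He is asking (I think), for any one of those combinations (e.g. 1, 2, 3, 4), what all of its orderings are (e.g. 1234, 1243, 1324, 1342, ...). Each of the 4,845 combinations has 24 possible orderings. So the final result is a 4,845 x 24 matrix (with one extra column for the common group label). - Monk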
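The counts in this comment are easy to confirm. The quick check below is a sketch with illustrative variable names; it shows that the rows left after dropping repeated IDs from the 20^4 grid are exactly the 4,845 combinations times their 24 orderings.

```r
n <- 20
k <- 4

n_combos <- choose(n, k)            # unordered groups of 4
n_orders <- factorial(k)            # orderings of each group
n_rows   <- prod(n:(n - k + 1))     # ordered selections without repeats: 20 * 19 * 18 * 17

n_combos             # 4845
n_orders             # 24
n_rows               # 116280
n_combos * n_orders  # 116280
```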

1

Yes, I think you are correct. It looks like the question duplicates your answer. - Monk

0

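So, you can get all the combinations with the expand.grid function by passing the vector of IDs four times. The result will contain combinations such as c(1,1,1,1), so I remove every row that has any duplicated value, and the last part builds the groupings. It uses two loops and is quite slow, but it gets what you want. It could be sped up with the Rcpp package. The code is: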

ids = 1:20
d2 = expand.grid(ids,ids,ids,ids)
## Remove rows with duplicated values
pos_use = apply(apply(d2,1,duplicated),2,function(x) all(x == F))
d2_temp = t(apply(d2[pos_use,],1,sort))
list_temp = list()
pos_quitar = NULL
for(i in 1:nrow(d2_temp)){
  pos_quitar = c(pos_quitar,i)
  ini_comb = d2_temp[i,]
  d2_temp_use  = d2_temp[-pos_quitar,]
  temp_comb = ini_comb
  for(j in 2:5){
    pos_quitar_new = which(apply(d2_temp_use,1,function(x) !any(temp_comb%in%x)))[1]
    temp_comb = c(temp_comb,d2_temp_use[pos_quitar_new,])
  }
  pos_quitar = c(pos_quitar,pos_quitar_new)
  list_temp[[i]] = temp_comb
}

list_temp

- Alejandro Andrade

-1

The following code gives all unique combinations of 4 elements chosen from 20 without repetition.

x <- c(1:20)
combinations <- data.frame(t(combn(x, 4)))

- Monk

3

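This gives a list of the possible combinations for each single group of 4 students (x1, x2, x3, and x4). Now, for each row listed, what are the possibilities for the other 4 groups of 4 students? So there should be 20 columns (Group1_1:4, Group2_1:4, Group3_1:4, Group4_1:4, Group5_1:4). Let me know if anything is unclear. - SharpSharpLes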

- Joseph Wood · Accepted Answer

You can use comboGroups from RcppAlgos (v >= 2.3.5)^*.

library(RcppAlgos)
a <- comboGroups(10, numGroups = 2, retType = "3Darray")

dim(a)
[1] 126   5   2

a[1,,]
     Grp1 Grp2
[1,]    1    6
[2,]    2    7
[3,]    3    8
[4,]    4    9
[5,]    5   10

a[126,,]
     Grp1 Grp2
[1,]    1    2
[2,]    7    3
[3,]    8    4
[4,]    9    5
[5,]   10    6

Or if you prefer a matrix:

a1 <- comboGroups(10, 2, retType = "matrix")

head(a1)
     Grp1 Grp1 Grp1 Grp1 Grp1 Grp2 Grp2 Grp2 Grp2 Grp2
[1,]    1    2    3    4    5    6    7    8    9   10
[2,]    1    2    3    4    6    5    7    8    9   10
[3,]    1    2    3    4    7    5    6    8    9   10
[4,]    1    2    3    4    8    5    6    7    9   10
[5,]    1    2    3    4    9    5    6    7    8   10
[6,]    1    2    3    4   10    5    6    7    8    9

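It is also very fast. You can even generate in parallel with nThreads or Parallel = TRUE (the latter uses one less than the system's maximum number of threads) for greater efficiency gains: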

comboGroupsCount(16, 4)
[1] 2627625

system.time(comboGroups(16, 4, "matrix"))
 user  system elapsed 
0.107   0.030   0.137

system.time(comboGroups(16, 4, "matrix", nThreads = 4))
 user  system elapsed 
0.124   0.067   0.055
                                ## 7 threads on my machine
system.time(comboGroups(16, 4, "matrix", Parallel = TRUE))
 user  system elapsed 
0.142   0.126   0.047

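A really nice feature is the ability to generate samples or specific lexicographical combo groups, especially when the number of results is large.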

comboGroupsCount(factor(state.abb), numGroups = 10)
Big Integer ('bigz') :
[1] 13536281554808237495608549953475109376

mySamp <- comboGroupsSample(factor(state.abb), 
                            numGroups = 10, "3Darray", n = 5, seed = 42)
                            
mySamp[1,,]
     Grp1 Grp2 Grp3 Grp4 Grp5 Grp6 Grp7 Grp8 Grp9 Grp10
[1,] AL   AK   AR   CA   CO   CT   DE   FL   LA   MD   
[2,] IA   AZ   ME   ID   GA   OR   IL   IN   MS   NM   
[3,] KY   ND   MO   MI   HI   PA   MN   KS   MT   OH   
[4,] TX   RI   SC   NH   NV   WI   NE   MA   NY   TN  
[5,] VA   VT   UT   OK   NJ   WY   WA   NC   SD   WV   
50 Levels: AK AL AR AZ CA CO CT DE FL GA HI IA ID IL IN KS KY LA MA MD ME MI MN MO MS MT NC ND NE NH NJ NM NV NY OH ... WY

firstAndLast <- comboGroupsSample(state.abb, 10, "3Darray",
                                  sampleVec = c("1",
                                                "13536281554808237495608549953475109376"))

firstAndLast[1,,]
     Grp1 Grp2 Grp3 Grp4 Grp5 Grp6 Grp7 Grp8 Grp9 Grp10
[1,] "AL" "CO" "HI" "KS" "MA" "MT" "NM" "OK" "SD" "VA" 
[2,] "AK" "CT" "ID" "KY" "MI" "NE" "NY" "OR" "TN" "WA" 
[3,] "AZ" "DE" "IL" "LA" "MN" "NV" "NC" "PA" "TX" "WV" 
[4,] "AR" "FL" "IN" "ME" "MS" "NH" "ND" "RI" "UT" "WI" 
[5,] "CA" "GA" "IA" "MD" "MO" "NJ" "OH" "SC" "VT" "WY"
    
firstAndLast[2,,]
     Grp1 Grp2 Grp3 Grp4 Grp5 Grp6 Grp7 Grp8 Grp9 Grp10
[1,] "AL" "AK" "AZ" "AR" "CA" "CO" "CT" "DE" "FL" "GA" 
[2,] "WA" "TX" "RI" "OH" "NM" "NE" "MN" "ME" "IA" "HI" 
[3,] "WV" "UT" "SC" "OK" "NY" "NV" "MS" "MD" "KS" "ID" 
[4,] "WI" "VT" "SD" "OR" "NC" "NH" "MO" "MA" "KY" "IL" 
[5,] "WY" "VA" "TN" "PA" "ND" "NJ" "MT" "MI" "LA" "IN"

Finally, generating all 2,546,168,625 combo groups of 20 people into 5 groups (what the OP asked for) can be done in under a minute using the lower and upper arguments:

system.time(aPar <- parallel::mclapply(seq(1, 2546168625, 969969), function(x) {
     combs <- comboGroups(20, 5, "3Darray", lower = x, upper = x + 969968)
     ### do something
     dim(combs)
}, mc.cores = 6))
   user  system elapsed 
217.667  22.932  48.482

sum(sapply(aPar, "[", 1))
[1] 2546168625
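The chunk size of 969969 divides the total evenly, which is why every chunk above can use upper = x + 969968. For a chunk size that does not divide the total, the last upper bound would presumably need clamping. The base R sketch below only computes the chunk boundaries (the variable names are illustrative, and it makes no RcppAlgos calls):

```r
total <- 2546168625
chunk <- 969969

lower <- seq(1, total, by = chunk)        # first index of each chunk
upper <- pmin(lower + chunk - 1, total)   # clamp the final chunk to the total

length(lower)                    # 2625
sum(upper - lower + 1) == total  # TRUE
```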

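Although I started working on this problem over a year ago, this question was a huge inspiration for getting it formalized in a package.

^* I am the author of RcppAlgos.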