An R function that splits observations evenly into groups

Question


Tags: r, grouping, combinatorics


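I have a 30x2 data frame (df): one column contains the names of 30 people and the second contains their ID numbers. I want to write a function in R that randomly splits these 30 people into groups as evenly as possible, and that handles both the case where there is a remainder and the case where there isn't.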

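To clarify, the function should:
• take 2 arguments: df and an integer giving the number of groups
• return the original df with one extra column containing the group number each person was randomly assigned to
• if the number of people (rows) is not divisible by the given integer, distribute the remaining rows as evenly as possible across the groups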

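For example:
• If I want to put the 30 people into 1 group, the function should return a new column "group_no" in which everyone has 1 (everyone is assigned to the same group).
• If I want 4 groups, I'd expect to see 2 groups of 10 people and 2 groups of 5 people.
• If I want 8 groups, the function should give me 6 groups of 4 people and 2 groups of 3 people, and so on.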
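The sizes in these examples follow one rule: with N people and k groups, every group gets N %/% k people and the first N %% k groups get one extra person. A minimal R sketch of that arithmetic (the helper name `group_sizes` is hypothetical, not from the question). Under this rule the most even split for 4 groups is 8-8-7-7, and for 7 groups it is 5-5-4-4-4-4-4:

```r
# Hypothetical helper: the most even group sizes for n_people split into n_groups
group_sizes <- function(n_people, n_groups) {
  base  <- n_people %/% n_groups   # minimum size of every group
  extra <- n_people %% n_groups    # leftover people, at most one per group
  # the first `extra` groups get one additional person (TRUE counts as 1)
  base + (seq_len(n_groups) <= extra)
}

group_sizes(30, 4)  # 8 8 7 7
group_sizes(30, 7)  # 5 5 4 4 4 4 4
group_sizes(30, 8)  # 4 4 4 4 4 4 3 3
```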

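I've written some code that does what I need, but I'm typing in the groups by hand, so I don't know how random or correct it is... I'd like to write a function that does this automatically.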

# My code so far
# Placeholder data: 30 people with a name and an ID (stands in for the real df)
df <- data.frame(name = paste("Person", 1:30), id = 1:30)

# For 1 group of 30 people
df$group_no <- 1

# For 4 groups (2 groups of 10 people and 2 groups of 5 people)
groups <- rep(1:4, times = c(5, 5, 10, 10))
df$group_no <- sample(groups)

# For 7 groups (3 groups of 6 people and 4 groups of 3 people)
groups <- rep(1:7, times = c(6, 6, 6, 3, 3, 3, 3))
df$group_no <- sample(groups)

# For 8 groups (6 groups of 4 people and 2 groups of 3 people)
groups <- rep(1:8, times = c(4, 4, 4, 4, 4, 4, 3, 3))
df$group_no <- sample(groups)

# For 10 groups of 3 people each
groups <- rep(1:10, each = 3)
df$group_no <- sample(groups)


fct_grouping <- function(df, nr_groups) {
 ????? 
}

- R. Simian
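One way the `?????` stub above could be filled in, generalizing the hand-written `groups` vectors: compute the most even sizes, repeat each group label that many times with `rep(..., times = ...)`, then shuffle with `sample()`. This is a sketch that assumes `nr_groups` is a positive integer no larger than `nrow(df)`; the column name `group_no` follows the question.

```r
fct_grouping <- function(df, nr_groups) {
  n <- nrow(df)
  stopifnot(nr_groups >= 1, nr_groups <= n)
  # most even sizes: the first n %% nr_groups groups get one extra person
  sizes <- n %/% nr_groups + (seq_len(nr_groups) <= n %% nr_groups)
  # one label per person (1 1 ... 2 2 ...), shuffled into a random order
  df$group_no <- sample(rep(seq_len(nr_groups), times = sizes))
  df
}

df <- data.frame(name = paste("Person", 1:30), id = 1:30)  # example data
table(fct_grouping(df, 8)$group_no)  # six groups of 4, two groups of 3
```

Only the order of the labels is random; the group sizes are fixed by the arithmetic, so every call gives the same size profile.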

Possible duplicate: https://dev59.com/-G025IYBdhLWcg3wRTxK - MrFlick

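You're completely right about 7-7-8-8; I just realized my mistake and am correcting it. For the 7-group example I should actually have 5 groups of 4 people and 2 groups of 5. As for 6-6-9-9, I don't think I want that, because I'm trying to spread people as evenly as possible across the groups. In other words, I'm trying to form groups that contain nearly the same number of people. Hope that makes sense. - R. Simian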

3 Answers


The following code should do what you're asking for and returns a vector with the groupings.

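fct_grouping <- function(df, nr_groups) {
    # minimum number of people per group
    base_number <- floor(nrow(df) / nr_groups)
    # people left over after every group has base_number members
    rest <- nrow(df) - base_number * nr_groups
    # every label base_number times, plus one extra label for groups 1..rest;
    # the result is sorted by group label, not shuffled
    groupings <- sort(c(rep(seq(nr_groups), base_number), if (rest == 0) numeric() else seq(rest)))
    return(groupings)
}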

- apeqqut
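The function above returns the labels in sorted order (for 30 rows and 4 groups: eight 1s, eight 2s, seven 3s and seven 4s), so nothing is random yet and the result is not attached to df. A minimal sketch of that missing step, assuming the vector has already been computed (it is written out literally here so the snippet runs on its own): shuffle it with `sample()` and store it as a new column.

```r
df <- data.frame(people = 1:30)

# the sorted vector that fct_grouping(df, 4) returns for 30 rows
groupings <- c(rep(1, 8), rep(2, 8), rep(3, 7), rep(4, 7))

# shuffling the labels gives each person a random group; sizes stay 8-8-7-7
df$group_no <- sample(groupings)
table(df$group_no)
```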

Thanks apeqqut! - R. Simian


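I believe what you're looking for could be set up as a mathematical programming problem in R, but when the number of people is not a multiple of the number of groups it becomes hard to model, because there are several possible allocations to choose from (consider defining 10 or more groups). Also, the examples you gave don't satisfy your own condition (group sizes as similar as possible).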

Here is the closest solution I could come up with:

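df <- data.frame(people = c(1:30))

fct_grouping <- function(df, nr_groups) {
  if (nrow(df) %% nr_groups == 0) {
    # sample(nr_groups) is a random permutation of 1..nr_groups;
    # cbind() recycles it down all rows of df
    print(cbind(df, sample(nr_groups)))
  } else {
    print("The number of people is not a multiple of nr_groups")
  }
}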

df2 <- fct_grouping(df, 5)

#         people sample(nr_groups)
# 1       1                 1
# 2       2                 3
# 3       3                 2
# 4       4                 5
# 5       5                 4
# 6       6                 1
# 7       7                 3
# 8       8                 2
# 9       9                 5
# 10     10                 4
# 11     11                 1
# 12     12                 3
# 13     13                 2
# 14     14                 5
# 15     15                 4
# 16     16                 1
# 17     17                 3
# 18     18                 2
# 19     19                 5
# 20     20                 4
# 21     21                 1
# 22     22                 3
# 23     23                 2
# 24     24                 5
# 25     25                 4
# 26     26                 1
# 27     27                 3
# 28     28                 2
# 29     29                 5
# 30     30                 4

- David Jorquera
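In this answer, `sample(nr_groups)` has only `nr_groups` elements and `cbind()` recycles it down the rows, which is why the output above repeats the pattern 1, 3, 2, 5, 4 every five rows; and when `nrow(df)` is not a multiple of `nr_groups` the function only prints a message. A sketch of a variant that addresses both, offered as an alternative rather than as the answerer's method: cycle the labels over all rows with `rep_len()` (which gives the first `nrow(df) %% nr_groups` groups one extra member) and then shuffle the whole vector. The function name `fct_grouping_any` is hypothetical.

```r
fct_grouping_any <- function(df, nr_groups) {
  # labels 1..nr_groups repeated cyclically to nrow(df) entries: 1 2 3 4 1 2 3 4 ...
  labels <- rep_len(seq_len(nr_groups), nrow(df))
  # shuffle all labels at once, so no row-to-row pattern remains
  df$group_no <- sample(labels)
  df
}

df <- data.frame(people = 1:30)
table(fct_grouping_any(df, 4)$group_no)  # sizes 8 8 7 7
```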

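Thanks David Jorquera. The last code example below, from Lief Esbenshade, does exactly what I was trying to achieve! I think I overcomplicated things by thinking too hard about the mathematical side. - R. Simian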

Great answer indeed! Remember to accept it as the correct answer. - David Jorquera

Content provided by Stack Overflow.

- Lief Esbenshade · Accepted Answer



grouper <- function(df, n) {

  # random permutation of 1..nrow(df): each row gets a distinct random rank
  random <- sample(seq_len(nrow(df)))

  # divide each rank by the average group size (nrow(df) / n) and round up,
  # giving group numbers 1..n whose sizes differ by at most one
  df$group_number <- ceiling(random / (nrow(df) / n))

  return(df)
}
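Why this stays balanced: `random` is a permutation of `1:nrow(df)`, so the set of group numbers produced is always the same as for the ranks `1, 2, ..., nrow(df)`; only which person gets which rank is random. A short check of the resulting sizes for 30 people, computed on the ranks directly. It uses the same `ceiling()` formula, rearranged as `rank * n / N` so the multiplication happens on integers before the division.

```r
N <- 30
for (n in c(4, 7, 8)) {
  # group number each rank 1..N would receive, tabulated into group sizes
  sizes <- table(ceiling(seq_len(N) * n / N))
  cat(n, "groups:", as.vector(sizes), "\n")
}
# 4 groups: 7 8 7 8
# 7 groups: 4 4 4 5 4 4 5
# 8 groups: 3 4 4 4 3 4 4 4
```

For 8 groups this gives six groups of 4 and two of 3, and for 7 groups five groups of 4 and two of 5, matching the sizes described in the question and the comments; the larger groups are spread out rather than placed first.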