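I would like to randomly sample months using weights given by an index in a separate data frame, where the index changes depending on certain categorical variables.
Here is an example: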
require(dplyr)
sim.size <- 1000
# Generating the weights for each month, and category combination
class_probs <- data_frame(categoryA=rep(letters[1:3],24)
                          categoryB=rep(LETTERS[1:2],each=36),
                          Month=rep(month.name,6),
                          MonthIndex=runif(72))
# Generating some randomly simulated categories
sim.data <- data_frame(categoryA=sample(letters[1:3],size=sim.size,replace=TRUE),
                       categoryB=sample(LETTERS[1:2],size=sim.size,replace=TRUE))
# This is where I need help.
# I would like to add an extra column called Month to the end of sim.data,
# sampled using the class_probs data, taking both categoryA and categoryB
# into account to select the weights in MonthIndex.
sim.data %>%
  group_by(categoryA,categoryB) %>%
  do(sample_n(class_probs[class_probs$categoryA==categoryA &
                          class_probs$categoryB==categoryB, ],
              size=nrow(sim.data[sim.data$categoryA==categoryA &
                                 sim.data$categoryB==categoryB]),
              replace=TRUE,
              weight=MonthIndex)$Month)
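So, for each group, I want to draw as many samples as there are occurrences of that particular combination of categoryA and categoryB in sim.data, sampling one month per draw from the matching subset of the class_probs data frame, weighted by MonthIndex.
The selected months are then added as an extra column to the original data set sim.data.
Hopefully my code is already close; I just need help working out which parts need to change.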
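For reference, the behaviour described above could be sketched roughly as follows. This is only one possible approach under stated assumptions, not a definitive fix: draw_months is a hypothetical helper introduced for illustration, tibble() is swapped in for data_frame(), a seed is set for reproducibility, and the weighted draw uses base sample() with its prob argument instead of sample_n().

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed so the draws are reproducible
sim.size <- 1000

# Weights for each month and category combination (same layout as the example)
class_probs <- tibble(categoryA = rep(letters[1:3], 24),
                      categoryB = rep(LETTERS[1:2], each = 36),
                      Month = rep(month.name, 6),
                      MonthIndex = runif(72))

# Randomly simulated categories
sim.data <- tibble(categoryA = sample(letters[1:3], size = sim.size, replace = TRUE),
                   categoryB = sample(LETTERS[1:2], size = sim.size, replace = TRUE))

# Hypothetical helper: draw n months for one (categoryA, categoryB) combination,
# using that combination's MonthIndex values as sampling weights
draw_months <- function(a, b, n) {
  probs <- class_probs[class_probs$categoryA == a & class_probs$categoryB == b, ]
  sample(probs$Month, size = n, replace = TRUE, prob = probs$MonthIndex)
}

# Within each group, draw one month per row and attach it as a new column
sim.data <- sim.data %>%
  group_by(categoryA, categoryB) %>%
  mutate(Month = draw_months(categoryA[1], categoryB[1], n())) %>%
  ungroup()
```

Using a grouped mutate() keeps every original row of sim.data and only adds the Month column, so no separate join back onto the original data is needed.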
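A comma is missing at the end of class_probs <- data_frame(categoryA=rep(letters[1:3],24). I suggest setting a random seed at the start of the code, e.g. set.seed(1), to make this example easier to reproduce. - Sam Firke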