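Basically, we need to conditionally sample one element from the x vector that has not been selected in the previous min.dist draws (the code excludes everything in tail(.x, min.dist)). Using purrr's reduce function, we can achieve this: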
library(purrr)  # provides reduce()
min.dist <- 2
reduce(integer(length(x)-1), ~c(.x, sample(x[!x %in% tail(.x, min.dist)], 1)), .init=sample(x,1))
[1] "A" "E" "D" "B" "A" "D" "E" "C" "D" "A" "C" "E" "B" "A" "E"
Wrapped in a function:
shuffle <- function(x, min.dist=2){
stopifnot(min.dist < length(unique(x)))
reduce(integer(length(x)-1), ~c(.x, sample(x[!x %in% tail(.x, min.dist)], 1)), .init=sample(x,1))
}
> shuffle(x, 3)
[1] "A" "C" "B" "D" "E" "A" "B" "C" "E" "D" "A" "B" "C" "E" "A"
> shuffle(x, 3)
[1] "A" "B" "D" "E" "C" "A" "B" "D" "E" "C" "A" "D" "E" "C" "A"
> shuffle(x, 4)
[1] "C" "E" "D" "A" "B" "C" "E" "D" "A" "B" "C" "E" "D" "A" "B"
> shuffle(x, 4)
[1] "A" "B" "D" "E" "C" "A" "B" "D" "E" "C" "A" "B" "D" "E" "C"
> shuffle(x, 2)
[1] "E" "A" "D" "E" "B" "D" "A" "E" "C" "D" "A" "E" "C" "A" "B"
> shuffle(x, 2)
[1] "B" "A" "D" "C" "B" "A" "E" "B" "A" "E" "B" "C" "D" "A" "E"
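To check outputs like the ones above, a small helper can report the smallest index difference between two occurrences of the same value. This is only a sketch in base R; the name min_gap is hypothetical and not part of the original answer. Because shuffle excludes the last min.dist picks, a result that respects the constraint should give min_gap(result) > min.dist.

```r
# Hypothetical helper (not from the original answer): smallest index
# difference between two occurrences of the same value in v.
min_gap <- function(v) {
  pos  <- split(seq_along(v), v)      # positions of each distinct value
  gaps <- unlist(lapply(pos, diff))   # differences between consecutive positions
  if (length(gaps)) min(gaps) else Inf  # Inf when no value repeats
}

min_gap(c("A", "B", "C", "A", "B"))  # 3
```

For example, min_gap(shuffle(x, 2)) > 2 is expected to be TRUE for the first version of shuffle.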
After @27ϕ9's comment:
shuffle <- function(x, min.dist=2){
stopifnot(min.dist < length(unique(x)))
reduce(integer(length(x)-1), ~ c(.x, sample(x[!x %in% tail(.x, min.dist) &( x %in% names(t <- table(x[x%in%.x]) > table(.x))[t] | !x %in% .x)], 1)), .init=sample(x,1))
}
> table(shuffle(rep(LETTERS[1:5], 3),2))
A B C D E
3 3 3 3 3
> table(shuffle(rep(LETTERS[1:5], 3),2))
Error in sample.int(length(x), size, replace, prob) :
invalid first argument
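The error appears when the filter leaves no candidate to sample from. The sketch below isolates the same filter in a hypothetical helper named candidates (not part of the original answer) and applies it to a hand-built prefix in which the only value still owed, "E", sits inside the last two picks.

```r
# Hypothetical helper: the values the one-liner above would sample from,
# given the elements picked so far.
candidates <- function(x, picked, min.dist = 2) {
  in.tail  <- x %in% tail(picked, min.dist)           # used too recently
  left     <- table(x[x %in% picked]) > table(picked) # seen values still owed
  freq.got <- x %in% names(left)[left]
  yet      <- !x %in% picked                          # never picked so far
  x[!in.tail & (freq.got | yet)]
}

x <- rep(LETTERS[1:5], 3)
# 14 picks: A, B, C and D are used up; E is owed once but is in the tail.
picked <- c("A", "B", "C", "D", "E", "A", "B", "C", "D",
            "A", "B", "C", "D", "E")
length(candidates(x, picked, 2))  # 0

# Sampling from an empty vector fails; tryCatch captures the message.
tryCatch(sample(candidates(x, picked, 2), 1),
         error = function(e) conditionMessage(e))
```

Whether a run reaches such a dead end depends on the random draws, which is why the call above sometimes succeeds and sometimes fails.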
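Update
After some trial and error, and considering that you will not always have enough elements left to respect min.dist, I came up with a solution. This code is the most explicit of the versions above: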
shuffle <- function(x, min.dist=2){
stopifnot(min.dist < length(unique(x)))
reduce(integer(length(x)-1), function(.x, ...){
in.tail <- x %in% tail(.x, min.dist)
freq.got <- x %in% names(t<-table(x[x%in%.x]) > table(.x))[t]
yet <- !x %in% .x
c(.x, if(any((!in.tail & freq.got) | yet )) sample(x[(!in.tail & freq.got) | yet ], 1) else x[which(freq.got)[1]] )
}, .init=sample(x,1))
}
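Now running table(shuffle(rep(LETTERS[1:5], 3),2)) always returns 3 for every value, and we can say with some confidence that equal values in the vector are kept at a minimum distance of 2. The only way to guarantee that no element repeats too soon is to use min.dist=length(unique(x))-1. Otherwise there will be at most r < min.dist elements that are closer than min.dist to their previous occurrence, and if such elements exist they will be in the length(x) + 1 - 1:min.dist subset of the resulting vector, that is, in its last min.dist positions.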
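One way to inspect this claim on a given result is to list the positions whose value repeats too soon. The following is a rough sketch in base R; too_close is a hypothetical name, and "too soon" is taken to mean an index difference of at most min.dist from the previous occurrence, matching the tail(.x, min.dist) exclusion used in shuffle.

```r
# Hypothetical helper: positions in v whose value already appeared within
# the previous min.dist positions.
too_close <- function(v, min.dist = 2) {
  which(vapply(seq_along(v), function(i) {
    prev <- which(v[seq_len(i - 1)] == v[i])   # earlier occurrences of v[i]
    length(prev) > 0 && i - max(prev) <= min.dist
  }, logical(1)))
}

too_close(c("A", "B", "C", "A", "C"), 2)  # 5
```

If the statement above holds, every position returned for a shuffle result should be greater than length(x) - min.dist.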
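Just to be completely sure, a loop can be used to check whether the tail of the output vector contains only unique values (remove the print statements, which I used for demonstration purposes):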
shuffler <- function(x, min.dist=2){
while(!length(unique(print(tail(l<-shuffle(x, min.dist=min.dist), min.dist+1))))==min.dist+1){}
l
}
table(print(shuffler(rep(LETTERS[1:5], 3),2)))
[1] "A" "B" "C" "E" "B" "C" "D" "A" "C" "D" "A" "E" "B" "D" "E"
A B C D E
3 3 3 3 3
table(print(shuffler(rep(LETTERS[1:5], 3),2)))
[1] "D" "C" "C"
[1] "C" "C" "E"
[1] "C" "A" "C"
[1] "D" "B" "D"
[1] "B" "E" "D"
[1] "C" "A" "E" "D" "A" "B" "C" "E" "A" "B" "D" "C" "B" "E" "D"
A B C D E
3 3 3 3 3
Update:
shuffler <- function(x, min.dist=2){
while(any(unlist(lapply(unique(tl<-tail(l<-shuffle(x, min.dist=min.dist), 2*min.dist)), function(x) diff(which(tl==x))<=min.dist)))){}
l
}
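This new version is stricter when testing min.dist for the elements in the tail of the vector: the previous version only worked for min.dist=2, whereas this one checks the distance between repeated values within the last 2*min.dist elements.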