How do I randomly select elements from an array without repetition?

Question

Tags: random, julia

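As the title says, I want to select 2 or 3 elements from an array, without picking the same element twice.

I know I can do this with the Base.rand function and an if statement, but I'm looking for a more elegant way.

=== Edit: 2020/01/21 ===

@Gwang-Jin Kim, @phipsgabler: thank you for your suggestions. I ran a small test based on your answers.

In terms of speed, Base.rand may still be the better approach, although its time cost fluctuates between 1.3e-7 and 1.4e-7 seconds per iteration. In terms of elegance, both sample and shuffle are viable options.

Is my conclusion correct?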

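# Base is always available and needs no `using`
using Random      # standard library, provides shuffle
using StatsBase   # external package, provides sample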

function _sampling1(M::Int64, N::Int64)
    for i in 1:M
        for j in 1:N
            r1, r2, r3 = Base.rand(1:N, 3)
            while r1 == r2 || r2 == r3 || r1 == r3
                r2, r3 = Base.rand(1:N, 2)
            end
        end
    end
end

function _sampling2(M::Int64, N::Int64)
    for i in 1:M
        for j in 1:N
            r1, r2, r3 = Random.shuffle(1:N)[1:3]
        end
    end
end

function _sampling3(M::Int64, N::Int64)
    for i in 1:M
        for j in 1:N
            r1, r2, r3 = StatsBase.sample(1:N, 3, replace=false)
        end
    end
end

M = 500
N = 100

time_cost1 = @elapsed _sampling1(M, N)
time_cost2 = @elapsed _sampling2(M, N)
time_cost3 = @elapsed _sampling3(M, N)

println("   rand: $(time_cost1 / (M * N))")
println("shuffle: $(time_cost2 / (M * N))")
println(" sample: $(time_cost3 / (M * N))")

#>>>    rand: 1.3713026e-7
#>>> shuffle: 1.57786382e-6
#>>>  sample: 5.6382496e-7
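
One caveat about timings like the ones above: the first call to a Julia function also pays for its compilation, so a single `@elapsed` on a cold function can overstate the per-iteration cost. A minimal sketch of a warmed-up measurement follows. The function `pick_three_by_shuffle` is a hypothetical stand-in for one of the sampling loops, not code from the question.

```julia
using Random

# Hypothetical stand-in for one of the sampling loops being timed.
function pick_three_by_shuffle(n::Int, reps::Int)
    total = 0
    for _ in 1:reps
        r1, r2, r3 = shuffle(1:n)[1:3]
        total += r1 + r2 + r3   # use the result so the draws are not dead code
    end
    return total
end

pick_three_by_shuffle(100, 1)   # warm-up call: triggers compilation

reps = 50_000
elapsed = @elapsed pick_three_by_shuffle(100, reps)
println("shuffle: $(elapsed / reps) seconds per draw")
```

Repeating the measurement several times and comparing the runs gives a better picture than a single number.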

- YuChan Tai

2 Answers

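You don't need a special sampling function from an external package; shuffling works too. For example, shuffle the array's indices and take the first 5: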

using Random

random_indices = shuffle(eachindex(your_array))
your_array[random_indices[1:5]]

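The Fisher-Yates shuffle has linear complexity; in some cases this makes it preferable to calling sample repeatedly (in practice or in terms of resources).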
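
To illustrate the idea behind the Fisher-Yates shuffle, here is a minimal sketch that runs only its first k steps, which is enough to fix k distinct positions. The helper `partial_shuffle_pick` is a hypothetical name for illustration and is not part of Random or StatsBase. It still copies the index list once, so it is a teaching sketch rather than an optimized routine.

```julia
using Random

# Hypothetical helper: pick k distinct elements of xs by running only
# the first k swap steps of a Fisher-Yates shuffle on a copy of the indices.
function partial_shuffle_pick(rng::AbstractRNG, xs::AbstractVector, k::Integer)
    n = length(xs)
    0 <= k <= n || throw(ArgumentError("k must be between 0 and length(xs)"))
    idx = collect(eachindex(xs))        # copy of the indices; xs itself is untouched
    for i in 1:k
        j = rand(rng, i:n)              # uniform position in the not-yet-fixed tail
        idx[i], idx[j] = idx[j], idx[i] # fix position i
    end
    return xs[idx[1:k]]
end

# Convenience method using the default random number generator.
partial_shuffle_pick(xs, k) = partial_shuffle_pick(Random.default_rng(), xs, k)

your_array = collect(10:10:100)         # placeholder data
picked = partial_shuffle_pick(your_array, 3)
println(picked)
```

Because each step swaps a uniformly chosen remaining index into place, no element can be picked twice.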

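You can also shuffle the array itself (this is probably the most cache-friendly approach). The most memory-efficient option is a single in-place shuffle! (for example, for cross-validation or for batching a large data set).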
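
As a rough sketch of the in-place variant, the snippet below shuffles a placeholder array once with `shuffle!` and then cuts it into consecutive batches. The names `data`, `batch_size`, and `batches` are assumptions made for illustration.

```julia
using Random

data = collect(1:10)     # placeholder data set
shuffle!(data)           # permutes `data` in place; no second array is allocated

batch_size = 3           # assumed batch size
# Consecutive slices of the shuffled array; the last batch may be shorter.
batches = [data[i:min(i + batch_size - 1, length(data))]
           for i in 1:batch_size:length(data)]

println(batches)
```

Every element lands in exactly one batch, so the batches are disjoint random subsets of the original data.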

- phipsgabler

- Gwang-Jin Kim · Accepted Answer

# thanks to @Bogumił Kamiński for the hint that the `sample`
# function actually comes from the `StatsBase` package

# install the `StatsBase` package or the `Distributions` package
using Pkg
Pkg.add("StatsBase") # or: Pkg.add("Distributions")

# load it
using StatsBase # or: using Distributions


# the actual code for sampling 3 or 2 elements without replacement
sample(your_array, 3, replace = false) # 3 elements
sample(your_array, 2, replace = false) # 2 elements
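
For a self-contained illustration of the calls above, the following sketch uses a small placeholder array and also samples indices rather than elements, which makes it easy to recover the elements that were not picked. The variable names are assumptions, and the StatsBase package must be installed.

```julia
using StatsBase

your_array = ["a", "b", "c", "d", "e"]   # placeholder data

# Draw 3 distinct elements directly.
picked = sample(your_array, 3, replace = false)

# Draw 3 distinct indices instead, then split the array into chosen and rest.
idx = sample(eachindex(your_array), 3, replace = false)
chosen = your_array[idx]
rest = your_array[setdiff(eachindex(your_array), idx)]

println(picked)
println(chosen, " / ", rest)
```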