在C#中从List<T>中选择N个随机元素

Question

在C#中从List<T>中选择N个随机元素

c#algorithmcollectionsrandomelement

208

我需要一个快速的算法从一个通用列表中选择5个随机元素。例如，我想从一个List<string>中获取5个随机元素。

- JC Grubbs

15

“随机”一词是否包含重复选择？换句话说，同一个元素是否可以被多次选中？（真正的随机）或者一旦选择了一个元素，该元素是否应从可用池中删除，不能再次选择？ - Pretzel

你只是洗牌并取前N个..为什么这里会有这么多讨论？ - Fattie

被接受的答案并非完全错误，甚至不算错。请参见此处：https://dev59.com/N5Pfa4cB1Zd3GeqPEIJc。即使未提及，用例考虑也不是无关紧要的。 - uckelman

嗨@uckelman，干杯，已经有大量的讨论指出了明显的问题；蓄水池抽样仅在某些领域（实际上，在维基百科文章的第二个当前句子中完全概述）中有用。所提出的问题特别是关于C#中的List<string>，用户特别想要一个快速简单的解决方案。（显然，答案是排序并取五个。如果您在领域内做任何其他事情，那将是极其糟糕的工程。请注意，当然可以编造... - Fattie

...正在使用Hadoop和GPU或其他技术，然后在那个领域里，您需要分析哪种（正如您所说）蓄水池抽样方法（其中包括许多方法，并且仍在不断研究中）最好。更具体地说，看看选定的“答案”。假设这是一个实际项目，比如Nintendo的游戏团队之类的。场地上有“40”辆坦克，必须随机选择5辆。其中一名程序员开始编写该解决方案-他们只会被立即解雇！天哪。不当的工程设计是极其糟糕的工程设计。 - Fattie

显示剩余9条评论

35个回答

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- drzaus · Answer 1

我看到@JohnShedletsky在有关已接受答案的评论时提到了以下内容(转述)：

你应该能够以O(subset.Length)的时间复杂度而不是O(originalList.Length)来完成这个操作。

基本上，您应该能够生成 subset 随机索引，然后从原始列表中获取它们。

方法

public static class EnumerableExtensions {

    public static Random randomizer = new Random(); // you'd ideally be able to replace this with whatever makes you comfortable

    public static IEnumerable<T> GetRandom<T>(this IEnumerable<T> list, int numItems) {
        return (list as T[] ?? list.ToArray()).GetRandom(numItems);

        // because ReSharper whined about duplicate enumeration...
        /*
        items.Add(list.ElementAt(randomizer.Next(list.Count()))) ) numItems--;
        */
    }

    // just because the parentheses were getting confusing
    public static IEnumerable<T> GetRandom<T>(this T[] list, int numItems) {
        var items = new HashSet<T>(); // don't want to add the same item twice; otherwise use a list
        while (numItems > 0 )
            // if we successfully added it, move on
            if( items.Add(list[randomizer.Next(list.Length)]) ) numItems--;

        return items;
    }

    // and because it's really fun; note -- you may get repetition
    public static IEnumerable<T> PluckRandomly<T>(this IEnumerable<T> list) {
        while( true )
            yield return list.ElementAt(randomizer.Next(list.Count()));
    }

}

如果您希望更加高效，您可能会使用索引的HashSet而不是实际列表元素（如果您具有复杂类型或昂贵的比较）;

单元测试

为了确保我们没有任何冲突等。

[TestClass]
public class RandomizingTests : UnitTestBase {
    [TestMethod]
    public void GetRandomFromList() {
        this.testGetRandomFromList((list, num) => list.GetRandom(num));
    }

    [TestMethod]
    public void PluckRandomly() {
        this.testGetRandomFromList((list, num) => list.PluckRandomly().Take(num), requireDistinct:false);
    }

    private void testGetRandomFromList(Func<IEnumerable<int>, int, IEnumerable<int>> methodToGetRandomItems, int numToTake = 10, int repetitions = 100000, bool requireDistinct = true) {
        var items = Enumerable.Range(0, 100);
        IEnumerable<int> randomItems = null;

        while( repetitions-- > 0 ) {
            randomItems = methodToGetRandomItems(items, numToTake);
            Assert.AreEqual(numToTake, randomItems.Count(),
                            "Did not get expected number of items {0}; failed at {1} repetition--", numToTake, repetitions);
            if(requireDistinct) Assert.AreEqual(numToTake, randomItems.Distinct().Count(),
                            "Collisions (non-unique values) found, failed at {0} repetition--", repetitions);
            Assert.IsTrue(randomItems.All(o => items.Contains(o)),
                        "Some unknown values found; failed at {0} repetition--", repetitions);
        }
    }
}

- Jesús López · Answer 2

这里有一种基于费舍尔-耶茨洗牌算法的实现，其算法复杂度为O(n)，其中n是子集或样本大小，而不是列表大小，正如John Shedletsky所指出的那样。

public static IEnumerable<T> GetRandomSample<T>(this IList<T> list, int sampleSize)
{
    if (list == null) throw new ArgumentNullException("list");
    if (sampleSize > list.Count) throw new ArgumentException("sampleSize may not be greater than list count", "sampleSize");
    var indices = new Dictionary<int, int>(); int index;
    var rnd = new Random();

    for (int i = 0; i < sampleSize; i++)
    {
        int j = rnd.Next(i, list.Count);
        if (!indices.TryGetValue(j, out index)) index = j;

        yield return list[index];

        if (!indices.TryGetValue(i, out index)) index = i;
        indices[j] = index;
    }
}

- nawfal · Answer 3

从一组中选择 N 个随机项目与顺序无关！随机性是指不可预测性，而不是在组中洗牌位置。所有涉及某种排序的答案都比不涉及排序的答案低效。由于效率是关键，我将发布一些不会太大程度改变项目顺序的内容。

1）如果您需要真正的随机值，这意味着可以任选元素（即，已选项目可以重新选择）：

public static List<T> GetTrueRandom<T>(this IList<T> source, int count, 
                                       bool throwArgumentOutOfRangeException = true)
{
    if (throwArgumentOutOfRangeException && count > source.Count)
        throw new ArgumentOutOfRangeException();

    var randoms = new List<T>(count);
    randoms.AddRandomly(source, count);
    return randoms;
}

如果你关闭异常标志，那么你可以任意次数选择随机项目。

例如，如果你有 { 1, 2, 3, 4 }，则可以得到 { 1, 4, 4 }、{ 1, 4, 3 } 等 3 个项目，甚至可以得到 5 个项目的 { 1, 4, 3, 2, 4 }！

这应该非常快，因为它没有需要检查的内容。

如果你需要从组中获取独立成员且不重复，则我会依靠字典（正如许多人已经指出的那样）。

public static List<T> GetDistinctRandom<T>(this IList<T> source, int count)
{
    if (count > source.Count)
        throw new ArgumentOutOfRangeException();

    if (count == source.Count)
        return new List<T>(source);

    var sourceDict = source.ToIndexedDictionary();

    if (count > source.Count / 2)
    {
        while (sourceDict.Count > count)
            sourceDict.Remove(source.GetRandomIndex());

        return sourceDict.Select(kvp => kvp.Value).ToList();
    }

    var randomDict = new Dictionary<int, T>(count);
    while (randomDict.Count < count)
    {
        int key = source.GetRandomIndex();
        if (!randomDict.ContainsKey(key))
            randomDict.Add(key, sourceDict[key]);
    }

    return randomDict.Select(kvp => kvp.Value).ToList();
}

这段代码比其他字典方法要长一些，因为我不仅要添加，还要从列表中删除，所以有点像两个循环。当计数等于源计数时，可以看到我没有重新排序任何内容。这是因为我认为随机性应该在整个返回集中。我的意思是，如果你想从1、2、3、4、5中随机选择5个项目，那么它不应该是1、3、4、2、5或1、2、3、4、5无所谓，但如果你需要从同一组中选择4个项目，那么它应该以不可预测的方式产生1、2、3、4、1、3、5、2、2、3、5、4等。其次，当要返回的随机项数超过原始组的一半时，从组中删除source.Count - count项比添加count项更容易。出于性能考虑，我使用了source而不是sourceDict来获取删除方法中的随机索引。

如果你有 { 1, 2, 3, 4 }，那么这可以变成 { 1, 2, 3 }，{ 3, 4, 1 }等等，对于3个项目。

3) 如果你需要真正独特的随机值，考虑原始组中的重复项，那么你可以使用与上面相同的方法，但是 HashSet 比字典更轻巧。

public static List<T> GetTrueDistinctRandom<T>(this IList<T> source, int count, 
                                               bool throwArgumentOutOfRangeException = true)
{
    if (count > source.Count)
        throw new ArgumentOutOfRangeException();

    var set = new HashSet<T>(source);

    if (throwArgumentOutOfRangeException && count > set.Count)
        throw new ArgumentOutOfRangeException();

    List<T> list = hash.ToList();

    if (count >= set.Count)
        return list;

    if (count > set.Count / 2)
    {
        while (set.Count > count)
            set.Remove(list.GetRandom());

        return set.ToList();
    }

    var randoms = new HashSet<T>();
    randoms.AddRandomly(list, count);
    return randoms.ToList();
}

randoms变量被设置为HashSet，以避免在输入列表很小的情况下Random.Next产生相同值的稀有情况下重复添加。

所以{1, 2, 2, 4} => 3个随机项 => {1, 2, 4} 而不是 {1, 2, 2}

{1, 2, 2, 4} => 4个随机项 => 异常！！或{1, 2, 4}取决于设置的标志。

我使用了一些扩展方法：

static Random rnd = new Random();
public static int GetRandomIndex<T>(this ICollection<T> source)
{
    return rnd.Next(source.Count);
}

public static T GetRandom<T>(this IList<T> source)
{
    return source[source.GetRandomIndex()];
}

static void AddRandomly<T>(this ICollection<T> toCol, IList<T> fromList, int count)
{
    while (toCol.Count < count)
        toCol.Add(fromList.GetRandom());
}

public static Dictionary<int, T> ToIndexedDictionary<T>(this IEnumerable<T> lst)
{
    return lst.ToIndexedDictionary(t => t);
}

public static Dictionary<int, T> ToIndexedDictionary<S, T>(this IEnumerable<S> lst, 
                                                           Func<S, T> valueSelector)
{
    int index = -1;
    return lst.ToDictionary(t => ++index, valueSelector);
}

如果需要对列表中数以万计的项目进行10000次迭代的性能优化，那么你可能希望使用比System.Random更快的随机类, 但我认为这并不重要，因为后者很可能永远不会成为瓶颈，它已经足够快了。 编辑：如果您还需要重新排列返回项目的顺序，则没有什么能击败dhakim的Fisher-Yates方法 - 简单明了。

- hardsetting · Answer 4

延伸自 @ers 的回答，如果担心 OrderBy 的实现可能不同，可以使用以下代码来确保安全：

// Instead of this
YourList.OrderBy(x => rnd.Next()).Take(5)

// Temporarily transform 
YourList
    .Select(v => new {v, i = rnd.Next()}) // Associate a random index to each entry
    .OrderBy(x => x.i).Take(5) // Sort by (at this point fixed) random index 
    .Select(x => x.v); // Go back to enumerable of entry

- Tine M. · Answer 5

我使用的简单解决方案（对于大型列表可能不是很好）：将列表复制到临时列表中，然后在循环中随机从临时列表中选择一个项目，并将其放入所选项目列表中，同时从临时列表中删除它（因此不能重新选择）。

示例：

List<Object> temp = OriginalList.ToList();
List<Object> selectedItems = new List<Object>();
Random rnd = new Random();
Object o;
int i = 0;
while (i < NumberOfSelectedItems)
{
            o = temp[rnd.Next(temp.Count)];
            selectedItems.Add(o);
            temp.Remove(o);
            i++;
 }

- Ian S · Answer 6

这是我在第一次尝试中能够想到的最好翻译：

public List<String> getRandomItemsFromList(int returnCount, List<String> list)
{
    List<String> returnList = new List<String>();
    Dictionary<int, int> randoms = new Dictionary<int, int>();

    while (randoms.Count != returnCount)
    {
        //generate new random between one and total list count
        int randomInt = new Random().Next(list.Count);

        // store this in dictionary to ensure uniqueness
        try
        {
            randoms.Add(randomInt, randomInt);
        }
        catch (ArgumentException aex)
        {
            Console.Write(aex.Message);
        } //we can assume this element exists in the dictonary already 

        //check for randoms length and then iterate through the original list 
        //adding items we select via random to the return list
        if (randoms.Count == returnCount)
        {
            foreach (int key in randoms.Keys)
                returnList.Add(list[randoms[key]]);

            break; //break out of _while_ loop
        }
    }

    return returnList;
}

使用在1-总列表计数范围内的随机列表，然后简单地提取该列表中的那些项目似乎是最好的方法，但是我仍在考虑使用字典来确保唯一性。

还要注意我使用了一个字符串列表，根据需要进行替换。

- Alex Gilbert · Answer 7

这种方法可能与Kyle的方法相等。

假设您的列表大小为n，您想要k个元素。

Random rand = new Random();
for(int i = 0; k>0; ++i) 
{
    int r = rand.Next(0, n-i);
    if(r<k) 
    {
        //include element i
        k--;
    }
}

效果非常好 :)

-Alex Gilbert

- Leaky · Answer 8

这里是三种不同方法的基准测试：

来自Kyle的被接受答案的实现。
一种基于随机索引选择和HashSet重复过滤的方法，来自drzaus。
一种更学术化的方法，由Jesús López发布，称为Fisher–Yates shuffle。

测试将包括对多个不同列表大小和选择大小进行性能基准测试。

我还包括了这三种方法的标准差测量，即随机选择的分布情况有多好。

简而言之，从这三个方面来看，drzaus的简单解决方案似乎是最好的。选定的答案很好，很优雅，但效率并不高，因为时间复杂度是基于样本大小而不是选择大小的。因此，如果您从长列表中选择少量项目，它将需要数量级更多的时间。当然，它仍然比基于完全重新排序的解决方案表现得更好。

有趣的是，即使您只在实际返回项目时触摸列表，例如在我的实现中，这个O（n）时间复杂度问题也是真实的。我唯一能想到的是Random.Next()非常慢，如果您为每个选定的项目生成一个随机数，则会获得性能优势。

同样有趣的是，Kyle的解决方案的标准差相对较高。我不知道为什么；可能是我的实现有问题。

很抱歉代码和输出很长，但我希望它有些启发性。如果您发现测试或实现中有任何问题，请告诉我，我会修复它。

static void Main()
{
    BenchmarkRunner.Run<Benchmarks>();

    new Benchmarks() { ListSize = 100, SelectionSize = 10 }
        .BenchmarkStdDev();
}

[MemoryDiagnoser]
public class Benchmarks
{
    [Params(50, 500, 5000)]
    public int ListSize;

    [Params(5, 10, 25, 50)]
    public int SelectionSize;

    private Random _rnd;
    private List<int> _list;
    private int[] _hits;

    [GlobalSetup]
    public void Setup()
    {
        _rnd = new Random(12345);
        _list = Enumerable.Range(0, ListSize).ToList();
        _hits = new int[ListSize];
    }

    [Benchmark]
    public void Test_IterateSelect()
        => Random_IterateSelect(_list, SelectionSize).ToList();

    [Benchmark]
    public void Test_RandomIndices() 
        => Random_RandomIdices(_list, SelectionSize).ToList();

    [Benchmark]
    public void Test_FisherYates() 
        => Random_FisherYates(_list, SelectionSize).ToList();

    public void BenchmarkStdDev()
    {
        RunOnce(Random_IterateSelect, "IterateSelect");
        RunOnce(Random_RandomIdices, "RandomIndices");
        RunOnce(Random_FisherYates, "FisherYates");

        void RunOnce(Func<IEnumerable<int>, int, IEnumerable<int>> method, string methodName)
        {
            Setup();
            for (int i = 0; i < 1000000; i++)
            {
                var selected = method(_list, SelectionSize).ToList();
                Debug.Assert(selected.Count() == SelectionSize);
                foreach (var item in selected) _hits[item]++;
            }
            var stdDev = GetStdDev(_hits);
            Console.WriteLine($"StdDev of {methodName}: {stdDev :n} (% of average: {stdDev / (_hits.Average() / 100) :n})");
        }

        double GetStdDev(IEnumerable<int> hits)
        {
            var average = hits.Average();
            return Math.Sqrt(hits.Average(v => Math.Pow(v - average, 2)));
        }
    }

    public IEnumerable<T> Random_IterateSelect<T>(IEnumerable<T> collection, int needed)
    {
        var count = collection.Count();
        for (int i = 0; i < count; i++)
        {
            if (_rnd.Next(count - i) < needed)
            {
                yield return collection.ElementAt(i);
                if (--needed == 0)
                    yield break;
            }
        }
    }

    public IEnumerable<T> Random_RandomIdices<T>(IEnumerable<T> list, int needed)
    {
        var selectedItems = new HashSet<T>();
        var count = list.Count();

        while (needed > 0)
            if (selectedItems.Add(list.ElementAt(_rnd.Next(count))))
                needed--;

        return selectedItems;
    }

    public IEnumerable<T> Random_FisherYates<T>(IEnumerable<T> list, int sampleSize)
    {
        var count = list.Count();
        if (sampleSize > count) throw new ArgumentException("sampleSize may not be greater than list count", "sampleSize");
        var indices = new Dictionary<int, int>(); int index;

        for (int i = 0; i < sampleSize; i++)
        {
            int j = _rnd.Next(i, count);
            if (!indices.TryGetValue(j, out index)) index = j;

            yield return list.ElementAt(index);

            if (!indices.TryGetValue(i, out index)) index = i;
            indices[j] = index;
        }
    }
}

输出：

|        Method | ListSize | Select |        Mean |     Error |    StdDev |  Gen 0 | Allocated |
|-------------- |--------- |------- |------------:|----------:|----------:|-------:|----------:|
| IterateSelect |       50 |      5 |    711.5 ns |   5.19 ns |   4.85 ns | 0.0305 |     144 B |
| RandomIndices |       50 |      5 |    341.1 ns |   4.48 ns |   4.19 ns | 0.0644 |     304 B |
|   FisherYates |       50 |      5 |    573.5 ns |   6.12 ns |   5.72 ns | 0.0944 |     447 B |

| IterateSelect |       50 |     10 |    967.2 ns |   4.64 ns |   3.87 ns | 0.0458 |     220 B |
| RandomIndices |       50 |     10 |    709.9 ns |  11.27 ns |   9.99 ns | 0.1307 |     621 B |
|   FisherYates |       50 |     10 |  1,204.4 ns |  10.63 ns |   9.94 ns | 0.1850 |     875 B |

| IterateSelect |       50 |     25 |  1,358.5 ns |   7.97 ns |   6.65 ns | 0.0763 |     361 B |
| RandomIndices |       50 |     25 |  1,958.1 ns |  15.69 ns |  13.91 ns | 0.2747 |    1298 B |
|   FisherYates |       50 |     25 |  2,878.9 ns |  31.42 ns |  29.39 ns | 0.3471 |    1653 B |

| IterateSelect |       50 |     50 |  1,739.1 ns |  15.86 ns |  14.06 ns | 0.1316 |     629 B |
| RandomIndices |       50 |     50 |  8,906.1 ns |  88.92 ns |  74.25 ns | 0.5951 |    2848 B |
|   FisherYates |       50 |     50 |  4,899.9 ns |  38.10 ns |  33.78 ns | 0.4349 |    2063 B |

| IterateSelect |      500 |      5 |  4,775.3 ns |  46.96 ns |  41.63 ns | 0.0305 |     144 B |
| RandomIndices |      500 |      5 |    327.8 ns |   2.82 ns |   2.50 ns | 0.0644 |     304 B |
|   FisherYates |      500 |      5 |    558.5 ns |   7.95 ns |   7.44 ns | 0.0944 |     449 B |

| IterateSelect |      500 |     10 |  5,387.1 ns |  44.57 ns |  41.69 ns | 0.0458 |     220 B |
| RandomIndices |      500 |     10 |    648.0 ns |   9.12 ns |   8.54 ns | 0.1307 |     621 B |
|   FisherYates |      500 |     10 |  1,154.6 ns |  13.66 ns |  12.78 ns | 0.1869 |     889 B |

| IterateSelect |      500 |     25 |  6,442.3 ns |  48.90 ns |  40.83 ns | 0.0763 |     361 B |
| RandomIndices |      500 |     25 |  1,569.6 ns |  15.79 ns |  14.77 ns | 0.2747 |    1298 B |
|   FisherYates |      500 |     25 |  2,726.1 ns |  25.32 ns |  22.44 ns | 0.3777 |    1795 B |

| IterateSelect |      500 |     50 |  7,775.4 ns |  35.47 ns |  31.45 ns | 0.1221 |     629 B |
| RandomIndices |      500 |     50 |  2,976.9 ns |  27.11 ns |  24.03 ns | 0.6027 |    2848 B |
|   FisherYates |      500 |     50 |  5,383.2 ns |  36.49 ns |  32.35 ns | 0.8163 |    3870 B |

| IterateSelect |     5000 |      5 | 45,208.6 ns | 459.92 ns | 430.21 ns |      - |     144 B |
| RandomIndices |     5000 |      5 |    328.7 ns |   5.15 ns |   4.81 ns | 0.0644 |     304 B |
|   FisherYates |     5000 |      5 |    556.1 ns |  10.75 ns |  10.05 ns | 0.0944 |     449 B |

| IterateSelect |     5000 |     10 | 49,253.9 ns | 420.26 ns | 393.11 ns |      - |     220 B |
| RandomIndices |     5000 |     10 |    642.9 ns |   4.95 ns |   4.13 ns | 0.1307 |     621 B |
|   FisherYates |     5000 |     10 |  1,141.9 ns |  12.81 ns |  11.98 ns | 0.1869 |     889 B |

| IterateSelect |     5000 |     25 | 54,044.4 ns | 208.92 ns | 174.46 ns | 0.0610 |     361 B |
| RandomIndices |     5000 |     25 |  1,480.5 ns |  11.56 ns |  10.81 ns | 0.2747 |    1298 B |
|   FisherYates |     5000 |     25 |  2,713.9 ns |  27.31 ns |  24.21 ns | 0.3777 |    1795 B |

| IterateSelect |     5000 |     50 | 54,418.2 ns | 329.62 ns | 308.32 ns | 0.1221 |     629 B |
| RandomIndices |     5000 |     50 |  2,886.4 ns |  36.53 ns |  34.17 ns | 0.6027 |    2848 B |
|   FisherYates |     5000 |     50 |  5,347.2 ns |  59.45 ns |  55.61 ns | 0.8163 |    3870 B |

StdDev of IterateSelect: 671.88 (% of average: 0.67)
StdDev of RandomIndices: 296.07 (% of average: 0.30)
StdDev of FisherYates: 280.47 (% of average: 0.28)

- Tom Gullen · Answer 9

根据Kyle的答案，这是我的c#实现。

/// <summary>
/// Picks random selection of available game ID's
/// </summary>
private static List<int> GetRandomGameIDs(int count)
{       
    var gameIDs = (int[])HttpContext.Current.Application["NonDeletedArcadeGameIDs"];
    var totalGameIDs = gameIDs.Count();
    if (count > totalGameIDs) count = totalGameIDs;

    var rnd = new Random();
    var leftToPick = count;
    var itemsLeft = totalGameIDs;
    var arrPickIndex = 0;
    var returnIDs = new List<int>();
    while (leftToPick > 0)
    {
        if (rnd.Next(0, itemsLeft) < leftToPick)
        {
            returnIDs .Add(gameIDs[arrPickIndex]);
            leftToPick--;
        }
        arrPickIndex++;
        itemsLeft--;
    }

    return returnIDs ;
}

- Jesse Gador · Answer 10

目标：从集合源中选择N个项目，不重复。我为任何通用集合创建了一个扩展。以下是我的方法：

public static class CollectionExtension
{
    public static IList<TSource> RandomizeCollection<TSource>(this IList<TSource> source, int maxItems)
    {
        int randomCount = source.Count > maxItems ? maxItems : source.Count;
        int?[] randomizedIndices = new int?[randomCount];
        Random random = new Random();

        for (int i = 0; i < randomizedIndices.Length; i++)
        {
            int randomResult = -1;
            while (randomizedIndices.Contains((randomResult = random.Next(0, source.Count))))
            {
                //0 -> since all list starts from index 0; source.Count -> maximum number of items that can be randomize
                //continue looping while the generated random number is already in the list of randomizedIndices
            }

            randomizedIndices[i] = randomResult;
        }

        IList<TSource> result = new List<TSource>();
        foreach (int index in randomizedIndices)
            result.Add(source.ElementAt(index));

        return result;
    }
}