我需要一个快速的算法从一个通用列表中选择5个随机元素。例如,我想从一个List<string>
中获取5个随机元素。
我需要一个快速的算法从一个通用列表中选择5个随机元素。例如,我想从一个List<string>
中获取5个随机元素。
使用LINQ:
YourList.OrderBy(x => rnd.Next()).Take(5)
YourList
中有很多项,但你只想选择其中的几个,请注意。在这种情况下,这不是一种有效的方法。 - Callum Watkinspublic static List<T> GetRandomElements<T>(this IEnumerable<T> list, int elementsCount)
{
return list.OrderBy(arg => Guid.NewGuid()).Take(elementsCount).ToList();
}
Guid.NewGuid()
并不能保证随机性,只能保证唯一性。 - Enigmativity实际上这比听起来的要难,主要是因为许多在数学上正确的解决方案实际上无法让您命中所有可能性(稍后详细说明)。
首先,以下是一些易于实现、正确(如果您有一个真正随机的数字生成器)的方法:
(0) Kyle的答案,其时间复杂度为O(n)。
(1) 生成一个由n对[(0, rand), (1, rand), (2, rand), ...]组成的列表,按第二个坐标排序,并使用前k个指数来获取您的随机子集。 我认为这很容易实现,尽管它的时间复杂度为O(n log n)。
(2) 初始化一个空列表s = [],它将增长到k个随机元素的索引。 随机选择一个数字r,在{0, 1, 2,...,n-1}中,即 r = rand%n,并将其添加到s中。接下来取r = rand%(n-1)并插入s;添加比它小的元素数目到r中以避免冲突。接下来取r = rand%(n-2),然后以此类推,直到s中有k个不同的元素。这最坏的运行时间为O(k ^ 2)。因此,对于k << n,这可能更快。如果您保持s排序并跟踪它具有的连续间隔,可以在O(k log k)中实现它,但需要更多的工作。
@Kyle-你是对的,仔细想想我同意你的答案。 我一开始匆忙阅读了它,并错误地认为您正在表明要顺序选择每个元素并使用固定概率k / n,这是错误的,但您的自适应方法对我来说似乎是正确的。 抱歉。
好的,现在最关键的部分:渐近地(对于固定的k,n增长),从n个元素中选出k个元素组合的数量为n ^ k / k!(这是(n choose k)的近似值)。 如果n很大,而k不是非常小,则这些数字是巨大的。 任何标准32位随机数生成器中最好的循环长度是2 ^ 32 = 256 ^ 4。 因此,如果我们有一个由1000个元素组成的列表,并且我们想随机选择5个元素,那么没有标准的随机数生成器会命中所有可能性。 但是,只要您满意适用于较小集合的选择,并且始终“看起来”随机,则这些算法应该就可以了。
附录:在编写此答案后,我意识到正确实现Idea(2)很棘手,因此我想澄清此答案。 为了获得O(k log k)时间,您需要一个类似数组的结构,支持O(log m)搜索和插入-平衡二叉树可以做到这一点。 使用这样的结构来构建名为s的数组,以下是一些伪代码:
# Returns a container s with k distinct random numbers from {0, 1, ..., n-1}
def ChooseRandomSubset(n, k):
for i in range(k):
r = UniformRandom(0, n-i) # May be 0, must be < n-i
q = s.FirstIndexSuchThat( s[q] - q > r ) # This is the search.
s.InsertInOrder(q ? r + q : r + len(s)) # Inserts right before q.
return s
我建议运行一些示例用例,以查看如何有效地实现上述英文说明。
我认为所选答案是正确而简洁的。不过我实现它时有所不同,因为我也想要结果是随机排序的。
static IEnumerable<SomeType> PickSomeInRandomOrder<SomeType>(
IEnumerable<SomeType> someTypes,
int maxCount)
{
Random random = new Random(DateTime.Now.Millisecond);
Dictionary<double, SomeType> randomSortTable = new Dictionary<double,SomeType>();
foreach(SomeType someType in someTypes)
randomSortTable[random.NextDouble()] = someType;
return randomSortTable.OrderBy(KVP => KVP.Key).Take(maxCount).Select(KVP => KVP.Value);
}
Random random = new Random(DateTime.Now.Millisecond);
都是绝对错误的。每次创建一个新的 Random
实例都会降低实际随机性。使用一个 static readonly
的实例,最好是使用默认构造函数构造它。 - jpmc26我刚遇到了这个问题,通过谷歌搜索后,我找到了关于随机打乱列表的问题:http://en.wikipedia.org/wiki/Fisher-Yates_shuffle
如果想要完全地随机打乱你的列表(就地操作),可以按照以下步骤进行:
要打乱一个由n个元素组成的数组a(索引为0..n-1):
for i from n − 1 downto 1 do
j ← random integer with 0 ≤ j ≤ i
exchange a[j] and a[i]
如果你只需要前5个元素,那么不必从n-1一直运行到1,只需将i运行到n-5 (即n-5)即可。 for (i = n − 1; i >= n-k; i--)
{
j = random integer with 0 ≤ j ≤ i
exchange a[j] and a[i]
}
每个被选择的项目都会向数组末尾交换,所以选出的k个元素就是数组中的最后k个元素。
这需要O(k)时间,其中k是你需要随机选取的元素数量。
另外,如果您不想修改初始列表,可以将所有交换记录在一个临时列表中,然后反转该列表,并再次应用它们,从而执行反向集合的交换并返回初始列表,而不改变O(k)的运行时间。
最后,如果(n == k),应该在1处停止而不是n-k,因为随机选择的整数总是0。
已经过去了12年,这个问题仍然存在,我没有找到一个我喜欢的Kyle解决方案的实现方式,所以在这里呈现:
public IEnumerable<T> TakeRandom<T>(IEnumerable<T> collection, int take)
{
var random = new Random();
var available = collection.Count();
var needed = take;
foreach (var item in collection)
{
if (random.Next(available) < needed)
{
needed--;
yield return item;
if (needed == 0)
{
break;
}
}
available--;
}
}
.AsEnumerable().OrderBy(n => Guid.NewGuid()).Take(5);
以下是C#语言的一种解释,来自算法中的龙:
int k = 10; // items to select
var items = new List<int>(new[] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 });
var selected = new List<int>();
double needed = k;
double available = items.Count;
var rand = new Random();
while (selected.Count < k) {
if( rand.NextDouble() < needed / available ) {
selected.Add(items[(int)available-1])
needed--;
}
available--;
}
这个算法将选择项目列表中唯一的索引。
var
导致 needed
和 available
都是整数,这使得 needed/available
总是为0。 - Niko /// <summary>
/// Takes k elements from the next n elements at random, preserving their order.
///
/// If there are fewer than n elements in items, this may return fewer than k elements.
/// </summary>
/// <typeparam name="TElem">Type of element in the items collection.</typeparam>
/// <param name="items">Items to be randomly selected.</param>
/// <param name="k">Number of items to pick.</param>
/// <param name="n">Total number of items to choose from.
/// If the items collection contains more than this number, the extra members will be skipped.
/// If the items collection contains fewer than this number, it is possible that fewer than k items will be returned.</param>
/// <returns>Enumerable over the retained items.
///
/// See https://dev59.com/tXVD5IYBdhLWcg3wOo5h for the commentary.
/// </returns>
public static IEnumerable<TElem> TakeRandom<TElem>(this IEnumerable<TElem> items, int k, int n)
{
var r = new FastRandom();
var itemsList = items as IList<TElem>;
if (k >= n || (itemsList != null && k >= itemsList.Count))
foreach (var item in items) yield return item;
else
{
// If we have a list, we can infer more information and choose a better algorithm.
// When using an IList, this is about 7 times faster (on one benchmark)!
if (itemsList != null && k < n/2)
{
// Since we have a List, we can use an algorithm suitable for Lists.
// If there are fewer than n elements, reduce n.
n = Math.Min(n, itemsList.Count);
// This algorithm picks K index-values randomly and directly chooses those items to be selected.
// If k is more than half of n, then we will spend a fair amount of time thrashing, picking
// indices that we have already picked and having to try again.
var invertSet = k >= n/2;
var positions = invertSet ? (ISet<int>) new HashSet<int>() : (ISet<int>) new SortedSet<int>();
var numbersNeeded = invertSet ? n - k : k;
while (numbersNeeded > 0)
if (positions.Add(r.Next(0, n))) numbersNeeded--;
if (invertSet)
{
// positions contains all the indices of elements to Skip.
for (var itemIndex = 0; itemIndex < n; itemIndex++)
{
if (!positions.Contains(itemIndex))
yield return itemsList[itemIndex];
}
}
else
{
// positions contains all the indices of elements to Take.
foreach (var itemIndex in positions)
yield return itemsList[itemIndex];
}
}
else
{
// Since we do not have a list, we will use an online algorithm.
// This permits is to skip the rest as soon as we have enough items.
var found = 0;
var scanned = 0;
foreach (var item in items)
{
var rand = r.Next(0,n-scanned);
if (rand < k - found)
{
yield return item;
found++;
}
scanned++;
if (found >= k || scanned >= n)
break;
}
}
}
}
我使用了一种专门的随机数生成器,但如果您愿意,您可以使用C#的Random。(FastRandom是由Colin Green编写的,是SharpNEAT的一部分。它具有2^128-1的周期,比许多随机数生成器更好。)
以下是单元测试:
[TestClass]
public class TakeRandomTests
{
/// <summary>
/// Ensure that when randomly choosing items from an array, all items are chosen with roughly equal probability.
/// </summary>
[TestMethod]
public void TakeRandom_Array_Uniformity()
{
const int numTrials = 2000000;
const int expectedCount = numTrials/20;
var timesChosen = new int[100];
var century = new int[100];
for (var i = 0; i < century.Length; i++)
century[i] = i;
for (var trial = 0; trial < numTrials; trial++)
{
foreach (var i in century.TakeRandom(5, 100))
timesChosen[i]++;
}
var avg = timesChosen.Average();
var max = timesChosen.Max();
var min = timesChosen.Min();
var allowedDifference = expectedCount/100;
AssertBetween(avg, expectedCount - 2, expectedCount + 2, "Average");
//AssertBetween(min, expectedCount - allowedDifference, expectedCount, "Min");
//AssertBetween(max, expectedCount, expectedCount + allowedDifference, "Max");
var countInRange = timesChosen.Count(i => i >= expectedCount - allowedDifference && i <= expectedCount + allowedDifference);
Assert.IsTrue(countInRange >= 90, String.Format("Not enough were in range: {0}", countInRange));
}
/// <summary>
/// Ensure that when randomly choosing items from an IEnumerable that is not an IList,
/// all items are chosen with roughly equal probability.
/// </summary>
[TestMethod]
public void TakeRandom_IEnumerable_Uniformity()
{
const int numTrials = 2000000;
const int expectedCount = numTrials / 20;
var timesChosen = new int[100];
for (var trial = 0; trial < numTrials; trial++)
{
foreach (var i in Range(0,100).TakeRandom(5, 100))
timesChosen[i]++;
}
var avg = timesChosen.Average();
var max = timesChosen.Max();
var min = timesChosen.Min();
var allowedDifference = expectedCount / 100;
var countInRange =
timesChosen.Count(i => i >= expectedCount - allowedDifference && i <= expectedCount + allowedDifference);
Assert.IsTrue(countInRange >= 90, String.Format("Not enough were in range: {0}", countInRange));
}
private IEnumerable<int> Range(int low, int count)
{
for (var i = low; i < low + count; i++)
yield return i;
}
private static void AssertBetween(int x, int low, int high, String message)
{
Assert.IsTrue(x > low, String.Format("Value {0} is less than lower limit of {1}. {2}", x, low, message));
Assert.IsTrue(x < high, String.Format("Value {0} is more than upper limit of {1}. {2}", x, high, message));
}
private static void AssertBetween(double x, double low, double high, String message)
{
Assert.IsTrue(x > low, String.Format("Value {0} is less than lower limit of {1}. {2}", x, low, message));
Assert.IsTrue(x < high, String.Format("Value {0} is more than upper limit of {1}. {2}", x, high, message));
}
}
if (itemsList != null && k < n/2)
意味着在 if
语句内部,invertSet
总是为 false
,这意味着该逻辑永远不会被使用。 - NetMage
C#
中的List<string>
,用户特别想要一个快速简单的解决方案。(显然,答案是排序并取五个。如果您在领域内做任何其他事情,那将是极其糟糕的工程。请注意,当然可以编造... - Fattie