Weighted choice short and simple

Question


Tags: python, numpy


If I have a collection of items in a list, I want to choose from that list according to another list of weights.

For example, if my collection is ['one', 'two', 'three'] and the weights are [0.2, 0.3, 0.5], I would expect the method to give me 'three' in about half of all draws.

What is the easiest way to do this?

- Mischa Obrecht

7 answers


Since Python 3.6 you can use random.choices to make weighted random selections (with replacement).

random.choices(population, weights=None, *, cum_weights=None, k=1)

Example usage:

import random
random.choices(['one', 'two', 'three'], [0.2, 0.3, 0.5], k=10)
# ['three', 'two', 'three', 'three', 'three',
#  'three', 'three', 'two', 'two', 'one']

- Esteis
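A quick frequency check of random.choices, sketched under a few assumptions: it passes precomputed cumulative weights through the cum_weights keyword (which, per the Python docs, saves the function from summing the weights itself), uses a seeded random.Random instance so runs are reproducible, and tallies 10,000 draws with collections.Counter. The seed and sample size are arbitrary.

```python
import random
from collections import Counter

elements = ['one', 'two', 'three']
weights = [0.2, 0.3, 0.5]
cum_weights = [0.2, 0.5, 1.0]  # running totals of `weights`

rng = random.Random(12345)  # seeded instance; the seed is arbitrary
draws = rng.choices(elements, cum_weights=cum_weights, k=10_000)

counts = Counter(draws)
for name in elements:
    # Observed frequency; expect roughly 0.2, 0.3 and 0.5
    print(name, counts[name] / len(draws))
```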


This function takes two arguments: a list of weights and a list containing the objects to choose from:

from numpy import cumsum
from numpy.random import rand
def weightedChoice(weights, objects):
    """Return a random item from objects, with the weighting defined by weights 
    (which must sum to 1)."""
    cs = cumsum(weights) #An array of the weights, cumulatively summed.
    idx = sum(cs < rand()) #Find the index of the first weight over a random value.
    return objects[idx]

It doesn't use any Python loops.

- Mischa Obrecht


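The comment seems misleading. cumsum() gives cumulative values, not booleans. To be clear, this does work, but the comment doesn't match what actually happens. - Gareth Latty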

I've made an edit to fix that, and put the docstring on one line, as PEP 257 recommends. - Gareth Latty


Assuming the weights are positive, cs is a sorted list. Using numpy.searchsorted would find the index significantly faster. - Nick R
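Nick R's suggestion can be sketched as follows. Because the cumulative sums are sorted, np.searchsorted locates the index by binary search instead of comparing the random number against every element. This sketch departs from the original function in a few assumed ways: it scales the random number by the total weight (so the weights need not sum to 1), it uses NumPy's Generator API (numpy.random.default_rng, NumPy 1.17+) instead of numpy.random.rand, and the name weighted_choice_sorted is hypothetical.

```python
import numpy as np

def weighted_choice_sorted(weights, objects, rng=None):
    """Return one item from objects; weights must be non-negative (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng
    cs = np.cumsum(weights)             # non-decreasing running totals
    r = rng.random() * cs[-1]           # uniform in [0, total), so weights need not sum to 1
    # side='right' returns the first index i with cs[i] > r, skipping zero-weight items
    idx = int(np.searchsorted(cs, r, side='right'))
    return objects[min(idx, len(objects) - 1)]  # guard against float rounding at the top end

rng = np.random.default_rng(0)
picks = [weighted_choice_sorted([0.2, 0.3, 0.5], ['one', 'two', 'three'], rng)
         for _ in range(10_000)]
print(picks.count('three') / len(picks))  # roughly 0.5
```

The cumsum itself is still linear, so the gain is largest when cs is computed once and reused across many draws.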


You can use the multinomial distribution (from numpy) to do what you want. For example:

import numpy as np

elements = ['one', 'two', 'three']
weights = [0.2, 0.3, 0.5]

indices = np.random.multinomial(100, weights, 1)
#=> array([[20, 32, 48]]), YMMV

results = [] #A list of the original items, repeated the correct number of times.
for i, count in enumerate(indices[0]):
    results.extend([elements[i]] * count)

The first element came up 20 times, the second 32 times, and the third 48 times, which is roughly what you would expect given the weights.

If you're confused by the multinomial distribution, I found the documentation really helpful.

- Maus
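The loop that builds results can also be vectorized. A sketch under assumptions: it uses NumPy's Generator.multinomial, which returns a 1-D array of counts when no size is given (unlike the (1, 3) array from numpy.random.multinomial(100, weights, 1) above), and np.repeat to expand each element by its count. The final shuffle is an addition; without it, identical items sit next to each other, as they do in the loop version.

```python
import numpy as np

elements = ['one', 'two', 'three']
weights = [0.2, 0.3, 0.5]

rng = np.random.default_rng(42)          # seeded Generator (NumPy 1.17+)
counts = rng.multinomial(100, weights)   # 1-D array of 3 counts that sums to 100
results = np.repeat(elements, counts)    # elements[i] repeated counts[i] times, in order
rng.shuffle(results)                     # optional: mix the grouped items in place
print(counts, results[:10])
```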


Note that you can reduce building results to itertools.chain.from_iterable([elements[i]]*count for i, count in enumerate(indices[0])), which will be faster. - Gareth Latty


In fact, you can improve it further by replacing the list multiplication with itertools.repeat(elements[i], count). - Gareth Latty
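Combining the two comments above gives roughly the following sketch. It recomputes indices the same way Maus's answer does so that it runs on its own; converting each NumPy count with int() is a defensive assumption, not something the comments call for.

```python
import itertools
import numpy as np

elements = ['one', 'two', 'three']
weights = [0.2, 0.3, 0.5]
indices = np.random.multinomial(100, weights, 1)   # shape (1, 3), as in Maus's answer

# repeat() yields elements[i] count times without building a temporary list;
# chain.from_iterable() flattens those iterators lazily, and list() materializes them.
results = list(itertools.chain.from_iterable(
    itertools.repeat(elements[i], int(count)) for i, count in enumerate(indices[0])
))
print(len(results))  # 100
```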


If you don't want to use numpy, you can use something like this:

from random import random
from itertools import takewhile

def accumulate(iterator):
    """Returns a cumulative sum of the elements.
    accumulate([1, 2, 3, 4, 5]) --> 1 3 6 10 15"""
    current = 0
    for value in iterator:
        current += value
        yield current

def weightedChoice(weights, objects):
    """Return a random item from objects, with the weighting defined by weights 
    (which must sum to 1)."""
    limit = random()
    return objects[sum(takewhile(bool, (value < limit for value in accumulate(weights))))]

We use itertools.takewhile() to avoid checking values past the point where we want to stop; otherwise this is essentially the same as Mischa Obrecht's answer, just without numpy.

- Gareth Latty
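On Python 3.2 and later the hand-written accumulate is unnecessary, since itertools.accumulate does the same thing, and the standard-library bisect module can replace the linear takewhile scan with a binary search over the running totals. This is a swapped-in technique rather than the answer's code; the function name weighted_choice_bisect is hypothetical, and scaling by the total weight (so weights need not sum to 1) is an added assumption.

```python
import bisect
import itertools
import random

def weighted_choice_bisect(weights, objects, rng=random):
    """Return one item from objects; weights must be non-negative (hypothetical helper)."""
    cum = list(itertools.accumulate(weights))   # e.g. [0.2, 0.5, 1.0]
    r = rng.random() * cum[-1]                  # uniform in [0, total)
    idx = bisect.bisect_right(cum, r)           # first index with cum[idx] > r
    return objects[min(idx, len(objects) - 1)]  # guard against float rounding at the top end

rng = random.Random(7)  # arbitrary seed for a reproducible run
picks = [weighted_choice_bisect([0.2, 0.3, 0.5], ['one', 'two', 'three'], rng)
         for _ in range(10_000)]
print(picks.count('three') / len(picks))  # roughly 0.5
```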


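You could initialize your list so that its contents match the desired weights. Here I'm making a list of 100 values representing the percentage "pull" you want for each element.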

>>> import random
>>> elements = ['one', 'two', 'three'] 
>>> weights = [0.2, 0.3, 0.5]
>>>
>>> # get "sum" of result list of lists (flattens list)
>>> choices = sum([[element] * int(weight * 100) for element, weight in zip(elements, weights)], [])
>>> random.choice(choices)
'three'

This isn't cumulative, but it looks like it might be what you're after.

- monkut

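It looks like it has the same effect, but allocating a 3*100 vector just to make one choice seems a bit excessive, especially since I'd be using it in the context where the problem first came up, a Monte Carlo simulation, where you want to be as fast as possible... - Mischa Obrecht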

You should add that information to the question. Still, you only allocate the list once, and calling random.choice() will be fast. - monkut

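Yes, but I'd say that if there's a cheap way and an expensive way to achieve the same result, it goes without saying that one picks the cheap one. Ruling of the referee? :) - Mischa Obrecht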
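The lookup-table approach from monkut's answer can be packaged so the table is built once and then reused for every draw, which is the cost point raised in the comments. A sketch with a hypothetical build_table helper and resolution parameter: it uses round() instead of int(), because int() truncates, and a product such as 0.29 * 100 evaluates to 28.999999999999996 in floating point. The table only approximates each weight to the nearest 1/resolution.

```python
import random
from collections import Counter

def build_table(elements, weights, resolution=100):
    """Expand each element into round(weight * resolution) slots (hypothetical helper)."""
    table = []
    for element, weight in zip(elements, weights):
        # round() instead of int(): int(0.29 * 100) would give 28, not 29
        table.extend([element] * round(weight * resolution))
    return table

choices = build_table(['one', 'two', 'three'], [0.2, 0.3, 0.5])  # built once
print(Counter(choices))  # Counter({'three': 50, 'two': 30, 'one': 20})

# Each draw is then a single random.choice() call on the prebuilt table
draws = [random.choice(choices) for _ in range(5)]
print(draws)
```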


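Building on Maus's answer, which is great if you want to draw weighted random values repeatedly: if you only want a single value, you can get one by combining numpy.random.multinomial() and itertools.compress():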

from itertools import compress
from numpy.random import multinomial

def weightedChoice(weights, objects):
    """Return a random item from objects, with the weighting defined by weights 
    (which must sum to 1)."""
    return next(compress(objects, multinomial(1, weights, 1)[0]))

- Gareth Latty

@aix Accidentally overwrote your edit; rolled back to your better links. - Gareth Latty
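Why next(compress(...)) works: one multinomial trial produces a count vector containing a single 1, so compress keeps exactly one object. A sketch of the same idea on NumPy's Generator API (numpy.random.default_rng, NumPy 1.17+), where multinomial returns a 1-D array when size is omitted, so no [0] is needed; the function name and the rng parameter are assumptions.

```python
from itertools import compress
import numpy as np

def weighted_choice_compress(weights, objects, rng=None):
    """Return one item via a single multinomial trial (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng
    mask = rng.multinomial(1, weights)    # one-hot counts, e.g. array([0, 0, 1])
    return next(compress(objects, mask))  # the single object whose count is 1

rng = np.random.default_rng(3)
print(rng.multinomial(1, [0.2, 0.3, 0.5]))  # exactly one entry is 1
picks = [weighted_choice_compress([0.2, 0.3, 0.5], ['one', 'two', 'three'], rng)
         for _ in range(10_000)]
print(picks.count('three') / len(picks))    # roughly 0.5
```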

Content provided by Stack Overflow.

- Quant Metropolis · Accepted Answer

Since numpy version 1.7, you can use the numpy.random.choice() function:

elements = ['one', 'two', 'three'] 
weights = [0.2, 0.3, 0.5]

from numpy.random import choice
print(choice(elements, p=weights))
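For the Monte Carlo setting mentioned in the comments, drawing all samples in one call avoids per-call overhead. A sketch using NumPy's newer Generator interface (numpy.random.default_rng, NumPy 1.17+), whose choice method accepts the same p argument plus a size; the seed and sample size are arbitrary.

```python
import numpy as np

elements = ['one', 'two', 'three']
weights = [0.2, 0.3, 0.5]

rng = np.random.default_rng(2024)                    # seeded Generator (NumPy 1.17+)
single = rng.choice(elements, p=weights)             # one draw
many = rng.choice(elements, size=10_000, p=weights)  # 10,000 draws in one call

values, counts = np.unique(many, return_counts=True)
for value, count in zip(values, counts):
    print(value, count / many.size)  # roughly 0.2 / 0.3 / 0.5 for one / two / three
```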