Increment NumPy array with repeated indices

Question

26

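I have a NumPy array and a list of indices whose values I would like to increment by one. The list may contain repeated indices, and I would like the increment to scale with the number of times each index is repeated. Without repeats, the command is simple: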

import numpy as np
a=np.zeros(6).astype('int')
b=[3,2,5]
a[b]+=1

With repeats, I have come up with the following method.

b=[3,2,5,2]                     # indices to increment by one each replicate
bbins=np.bincount(b)
b.sort()                        # sort b because bincount is sorted
incr=bbins[np.nonzero(bbins)]   # create increment array
bu=np.unique(b)                 # sorted, unique indices (len(bu)=len(incr))
a[bu]+=incr

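Is this the best way? Is there a risk in assuming that the np.bincount and np.unique operations produce the same sorted order? Am I missing some simple NumPy operation that solves this?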
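On the ordering concern: the nonzero entries of np.bincount are in ascending index order, and np.unique also returns its values sorted ascending, so the two line up (the b.sort() call is not needed for that). A minimal sketch of a variant that avoids relying on two separate calls, assuming a NumPy version that supports the return_counts argument of np.unique (the names idx and counts are illustrative):

```python
import numpy as np

a = np.zeros(6, dtype=int)
b = [3, 2, 5, 2]

# np.unique returns the sorted unique indices and, with return_counts=True,
# the number of occurrences of each one, in matching order.
idx, counts = np.unique(b, return_counts=True)

# idx has no repeats, so plain fancy-index assignment is safe here.
a[idx] += counts

print(a)  # [0 0 2 1 0 1]
```

Because idx and counts come from the same call, they are aligned with each other by construction.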

- fideli

1

Note that numpy.zeros(6).astype('int') is better written as numpy.zeros(6, int). - Eric O. Lebigot

3 Answers

6

After you do

bbins=np.bincount(b)

why not do:

a[:len(bbins)] += bbins

(Edited for further simplification.)
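For reference, the suggestion above can be sketched as a complete snippet. This version uses the minlength argument of np.bincount so that the counts cover all of a and no slicing is needed. That is a small variation on the answer, not the answer's exact code. It assumes every index in b is non-negative and smaller than len(a).

```python
import numpy as np

a = np.zeros(6, dtype=int)
b = [3, 2, 5, 2]

# bincount counts how often each index occurs; minlength pads the result
# with zeros so it has one entry per element of a.
a += np.bincount(b, minlength=len(a))

print(a)  # [0 0 2 1 0 1]
```

If b contained an index at or beyond len(a), the bincount result would be longer than a and the addition would fail with a shape mismatch.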

- Alok Singhal

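Wouldn't this get slower when b contains only a few large bin numbers? - Eric O. Lebigot

Yes, in that case it would be slower than a simple Python loop, but it is still faster than the OP's code. I ran a quick timing test with b = [99999, 99997, 99999] and a = np.zeros(1000, 'int'). The timings: OP: 2.5 ms, my code: 495 µs, simple loop: 84 µs. - Alok Singhal

This is nice. In my program, a simple loop has generally been slower. Thanks. - fideli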

5

Is there a similar way to do this in the multidimensional case? - ajwood
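One possible way to carry the bincount idea over to two dimensions, offered as a sketch rather than as an answer from this thread: flatten the (row, column) pairs with np.ravel_multi_index, count them, and reshape the counts back. The array shape and the names rows, cols and flat are made up for illustration.

```python
import numpy as np

a = np.zeros((3, 4), dtype=int)
rows = np.array([0, 2, 0, 1])   # hypothetical row indices
cols = np.array([1, 3, 1, 0])   # hypothetical column indices; (0, 1) repeats

# Convert each (row, col) pair to its position in the flattened array.
flat = np.ravel_multi_index((rows, cols), a.shape)

# Count occurrences per flat position, then restore the 2-D shape.
a += np.bincount(flat, minlength=a.size).reshape(a.shape)

print(a)
```

Here a[0, 1] ends up as 2 because that pair appears twice, while a[2, 3] and a[1, 0] each become 1.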

1

If the indices in b span only a small subrange of a, we can improve on Alok's answer like this:

import numpy as np
a = np.zeros( 100000, int )
b = np.array( [99999, 99997, 99999] )

blo, bhi = b.min(), b.max()
bbins = np.bincount( b - blo )
a[blo:bhi+1] += bbins

print(a[blo:bhi+1])  # [1 0 2]

- denis

- gojomo · Accepted Answer

In numpy >= 1.8, you can also use the at method of the addition 'universal function' ('ufunc'). As the docs note:

For addition ufunc, this method is equivalent to a[indices] += b, except that results are accumulated for elements that are indexed more than once.

So taking your example:

a = np.zeros(6).astype('int')
b = [3, 2, 5, 2]

...then...

np.add.at(a, b, 1)

...will leave a as...

array([0, 0, 2, 1, 0, 1])
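To make the difference described in the docs quote concrete, here is a small side-by-side sketch. The variable names buffered and unbuffered are only illustrative.

```python
import numpy as np

b = [3, 2, 5, 2]  # index 2 appears twice

# Plain fancy-index increment: the repeated index is only counted once.
buffered = np.zeros(6, dtype=int)
buffered[b] += 1

# np.add.at applies the addition once per occurrence of each index.
unbuffered = np.zeros(6, dtype=int)
np.add.at(unbuffered, b, 1)

print(buffered)    # [0 0 1 1 0 1]
print(unbuffered)  # [0 0 2 1 0 1]
```

np.add.at modifies its first argument in place and returns None, so the result should be read from the array itself.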