NumPy array index operation

Question

4

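I want to modify an empty bitmap using given indices (x and y coordinates).

For every coordinate given by the indices, the value should be incremented by one.

So far so good. However, if my index array contains duplicate entries, the value is incremented only once.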

>>> img
array([[0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0]])

>>> inds
array([[0, 0],
       [3, 4],
       [3, 4]])

Operation:

>>> img[inds[:,1], inds[:,0]] += 1

Result:

>>> img
array([[1, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 1, 0]])

Desired result:

>>> img
array([[1, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 2, 0]])

Is there a way to solve this? Preferably a fast approach that avoids loops.

- elajdsha

See also https://dev59.com/_1vUa4cB1Zd3GeqPsE7S#7435155 - AGN Gazer

3 Answers

5

You can use numpy.add.at, with a little manipulation to get the indices into the right form.

np.add.at(img, tuple(inds[:, [1, 0]].T), 1)

Even with a larger inds array, this approach should still be fast... (although Paul Panzer's solution is faster)
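A minimal sketch of the difference, using the img shape and inds values from the question. It relies on the documented behaviour that ufunc.at applies the operation unbuffered, so repeated indices accumulate, while `+=` with fancy indexing writes each target position only once. The names buffered and unbuffered are illustrative only.

```python
import numpy as np

# Index pairs from the question, stored as (x, y) columns.
inds = np.array([[0, 0], [3, 4], [3, 4]])

# Buffered fancy-index assignment: the duplicate (3, 4) is counted once.
buffered = np.zeros((5, 5), dtype=int)
buffered[inds[:, 1], inds[:, 0]] += 1

# Unbuffered np.add.at: every occurrence of an index is applied.
unbuffered = np.zeros((5, 5), dtype=int)
np.add.at(unbuffered, (inds[:, 1], inds[:, 0]), 1)

print(buffered[4, 3], unbuffered[4, 3])  # 1 2
```

Passing the tuple (inds[:, 1], inds[:, 0]) is equivalent to tuple(inds[:, [1, 0]].T) in the one-liner above: both hand np.add.at one array of row indices and one array of column indices.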

- miradulo

4

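Two remarks on the other two answers:

1) @jpp's answer can be improved by using np.unique with the axis and return_counts keywords.

2) If we convert to flat indices, we can use np.bincount, which is often faster than np.add.at (but not always, see the last test case in the benchmark).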
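Both remarks can be traced by hand on the small example from the question. The following is only an illustrative sketch; the variable names unq, cnts, flat and hist are chosen here for readability.

```python
import numpy as np

img = np.zeros((5, 5), dtype=int)
inds = np.array([[0, 0], [3, 4], [3, 4]])  # (x, y) columns, as in the question

# Remark 1: collapse duplicate rows and count them in a single call.
unq, cnts = np.unique(inds, axis=0, return_counts=True)
print(unq.tolist(), cnts.tolist())  # [[0, 0], [3, 4]] [1, 2]

# Remark 2: turn (row, col) = (y, x) pairs into flat indices, then histogram them.
flat = np.ravel_multi_index((inds[:, 1], inds[:, 0]), img.shape)
hist = np.bincount(flat, minlength=img.size).reshape(img.shape)
print(flat.tolist())  # [0, 23, 23]

img += hist
print(img[4, 3])  # 2
```

The histogram has one slot per pixel, so adding it to img applies every increment at once, duplicates included.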

Thanks to @miradulo for the initial version of the benchmark.

import numpy as np

def jpp(img, inds):
    counts = (inds[:, None] == inds).all(axis=2).sum(axis=1)
    img[inds[:,1], inds[:,0]] += counts

def jpp_pp(img, inds):
    unq, cnts = np.unique(inds, axis=0, return_counts=True)
    img[unq[:,1], unq[:,0]] += cnts

def miradulo(img, inds):
    np.add.at(img, tuple(inds[:, [1, 0]].T), 1)

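def pp(img, inds):
    # ravel() returns a view for a C-contiguous img, so += below updates img in place
    imgf = img.ravel()
    # inds columns are (x, y); reverse to (row, col) and flatten against img.shape
    # (img.shape[::-1] only gives the same result for square images)
    indsf = np.ravel_multi_index(inds.T[::-1], img.shape)
    imgf += np.bincount(indsf, None, img.size)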

inds = np.random.randint(0, 5, (3, 2))
big_inds = np.random.randint(0, 5, (10000, 2))
sml_inds = np.random.randint(0, 1000, (5, 2))
from timeit import timeit


for f in jpp, jpp_pp, miradulo, pp:
    print(f.__name__)
    for i, n, a in [(inds, 1000, 5), (big_inds, 10, 5), (sml_inds, 10, 1000)]:
        img = np.zeros((a, a), int)
        print(timeit("f(img, i)", globals=dict(img=img, i=i, f=f), number=n) * 1000 / n, 'ms')

Output:

jpp
0.011815106990979984 ms
2623.5026352020213 ms
0.04642329877242446 ms
jpp_pp
0.041291153989732265 ms
5.418520100647584 ms
0.05826510023325682 ms
miradulo
0.007099648006260395 ms
0.7788308983435854 ms
0.009103797492571175 ms
pp
0.0035401539935264736 ms
0.06540440081153065 ms
3.486583800986409 ms

- Paul Panzer

2

Ah, this is nice, I'll replace my benchmark with yours. - miradulo

2

@jpp There is one case where bincount is not the best choice: when img is large and inds is small. I'll try to add it to the benchmark. - Paul Panzer
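One plausible reading of this case, sketched under assumed sizes (a 1000 x 1000 image and three made-up index pairs): np.bincount with minlength=img.size always builds a histogram as long as the whole image, whereas np.add.at only visits the listed positions.

```python
import numpy as np

img = np.zeros((1000, 1000), dtype=int)
inds = np.array([[10, 20], [10, 20], [999, 0]])  # hypothetical (x, y) pairs

flat = np.ravel_multi_index((inds[:, 1], inds[:, 0]), img.shape)

# bincount allocates and fills one counter per pixel, however few indices there are ...
hist = np.bincount(flat, minlength=img.size)
print(hist.size)  # 1000000

# ... whereas np.add.at touches only the three listed positions.
np.add.at(img, (inds[:, 1], inds[:, 0]), 1)
print(img[20, 10], img[0, 999])  # 2 1
```

This is consistent with the third benchmark case above, where pp is slower than miradulo for 5 indices on a 1000 x 1000 image.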

- jpp · Accepted Answer

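Here is one way. The counting algorithm is courtesy of @AlexRiley.

For the performance implications of the relative sizes of img and inds, see @PaulPanzer's answer.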

# count occurrences of each row and return array
counts = (inds[:, None] == inds).all(axis=2).sum(axis=1)

# apply indices and counts
img[inds[:,1], inds[:,0]] += counts

print(img)

[[1 0 0 0 0]
 [0 0 0 0 0]
 [0 0 0 0 0]
 [0 0 0 0 0]
 [0 0 0 2 0]]
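To see what the counting line does, it can help to inspect the intermediate array. This sketch uses the inds from the question; the name pairwise is introduced here for illustration only.

```python
import numpy as np

inds = np.array([[0, 0], [3, 4], [3, 4]])

# Broadcasting (N, 1, 2) against (N, 2) compares every row with every other row.
pairwise = inds[:, None] == inds
print(pairwise.shape)  # (3, 3, 2)

# Two rows match only if both coordinates agree; the row sums give each row's multiplicity.
counts = pairwise.all(axis=2).sum(axis=1)
print(counts)  # [1 2 2]
```

Because the intermediate array has N x N x 2 entries, the cost grows quadratically with the number of index pairs, which fits the slow timing for the 10000-row case in the benchmark.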