在一个numpy数组中找到非零数前面的零的个数。

Question

在一个numpy数组中找到非零数前面的零的个数。

8

我有一个numpy数组A。我想以一种高效的方式返回A中第一个非零元素之前的零的数量，因为它在循环中。如果A = np.array([0,1,2])，那么np.nonzero(A)[0][0]将返回1。但是，如果A = np.array([0,0,0])，这种方法不起作用（在这种情况下我希望得到的答案是3）。而且，如果A非常大并且第一个非零数接近开头，这似乎效率低下。

- Simd

相关问题：https://dev59.com/TGsz5IYBdhLWcg3w0rZC - YXD

1

相关工单：https://github.com/numpy/numpy/issues/2269 - shx2

@shx2 嗯...那个工单基本上停在两年前，然后指向另一个工单，该工单在10个月前就没有消息了。 - Simd

6个回答

4

在数组末尾添加一个非零数字，你仍然可以使用np.nonzero来得到你想要的结果。

A = np.array([0,1,2])
B = np.array([0,0,0])

np.min(np.nonzero(np.hstack((A, 1))))   # --> 1
np.min(np.nonzero(np.hstack((B, 1))))   # --> 3

- tanemaki

3

这里是一个迭代的Cython版本，如果这是一个严重的瓶颈，那么这可能是您最好的选择。

# saved as file count_leading_zeros.pyx
import numpy as np
cimport numpy as np
cimport cython

DTYPE = np.int
ctypedef np.int_t DTYPE_t

@cython.boundscheck(False)
def count_leading_zeros(np.ndarray[DTYPE_t, ndim=1] a):
    cdef int elements = a.size
    cdef int i = 0
    cdef int count = 0
    while i < elements:
        if a[i] == 0:
            count += 1
        else:
            return count
        i += 1
    return count

这类似于@mtrw的答案，但是使用本地速度进行索引。我的Cython有点生疏，因此可能还有进一步的改进。

在IPython中进行了一个极为有利的测试，尝试了几种不同的方法。

In [1]: import numpy as np

In [2]: import pyximport; pyximport.install()
Out[2]: (None, <pyximport.pyximport.PyxImporter at 0x53e9250>)

In [3]: import count_leading_zeros

In [4]: %paste
def count_leading_zeros_python(x):
    ctr = 0
    for k in x:
        if k == 0:
            ctr += 1
        else:
            return ctr
    return ctr
## -- End pasted text --
In [5]: a = np.zeros((10000000,), dtype=np.int)

In [6]: a[5] = 1

In [7]: 

In [7]: %timeit np.min(np.nonzero(np.hstack((a, 1))))
10 loops, best of 3: 91.1 ms per loop

In [8]: 

In [8]: %timeit np.where(a)[0][0] if np.shape(np.where(a)[0])[0] != 0  else np.shape(a)[0]
10 loops, best of 3: 107 ms per loop

In [9]: 

In [9]: %timeit count_leading_zeros_python(a)
100000 loops, best of 3: 3.87 µs per loop

In [10]: 

In [10]: %timeit count_leading_zeros.count_leading_zeros(a)
1000000 loops, best of 3: 489 ns per loop

然而，我只会在有证据（通过分析器）表明这是瓶颈时才使用此类方式。许多事情看起来可能效率低下，但却不值得你花时间去修复。

- YXD

+1 用于测试。有趣的是，即使迭代次数很少，Python循环的速度也比Cython慢约10倍。如果大数组中的非零元素稍后出现，Cython的优势将更加明显。 - mtrw

1

“天真的方法”有什么问题：

def countLeadingZeros(x):
""" Count number of elements up to the first non-zero element, return that count """
    ctr = 0
    for k in x:
        if k == 0:
            ctr += 1
        else: #short circuit evaluation, we found a non-zero so return immediately
            return ctr
    return ctr #we get here in the case that x was all zeros

只要找到非零元素就返回，所以在最坏情况下是O(n)。将其移植到C语言可能会使它更快，但值得测试以确定是否对正在使用的数组真正必要。

- mtrw

1

我很惊讶为什么还没有人使用np.where 如果np.shape(np.where(a)[0])[0] != 0，则np.where(a)[0][0]可以解决问题，否则返回np.shape(a)[0]

>> a = np.array([0,1,2])
>> np.where(a)[0][0] if np.shape(np.where(a)[0])[0] != 0  else np.shape(a)[0]
... 1
>> a = np.array([0,0,0))
>> np.where(a)[0][0] if np.shape(np.where(a)[0])[0] != 0  else np.shape(a)[0]
... 3
>> a = np.array([1,2,3))
>> np.where(a)[0][0] if np.shape(np.where(a)[0])[0] != 0  else np.shape(a)[0]
... 0

- Anoop

-2

如果您不在意速度，我有一个小技巧可以完成这项工作：

a = np.array([0,0,1,1,1])
t = np.where(a==0,1,0)+np.append(np.where(a==0,0,1),0)[1:]
print t
[1 2 1 1 0]
np.where(t==2)
(array([1]),)

- Raymond Coulson

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- Eelco Hoogendoorn · Accepted Answer

i = np.argmax(A!=0)
if i==0 and np.all(A==0): i=len(A)

这应该是没有扩展最有效的解决方案。同时也可以轻松地向多个轴作用。