NumPy检测区域边界

Question

NumPy检测区域边界

pythonnumpy

3

给定一个一维值数组：

A = [x,..,x,0,..,0,x,..,x,0,..,0,x,..,x,........]

其中：

x,..,x表示任意数量的任意值

0,..,0表示任意数量的零

我需要找到一个快速的算法来查找边界的索引，即：..,x,0,.. 和 ..,0,x..

这个问题似乎适合并行化处理，但这超出了我的经验。简单地循环遍历数组速度太慢，因为数据太大。

谢谢马丁

- martburg

你的数组有多大？0的数量比x多很多，还是x比0多，或者两者大致相同？这些边界的数量很大（相对于大小），还是很少？ - chthonicdaemon

@RubenBermudez 没有错误，只是计算时间太长了。 - martburg

@chthonicdaemon 长度约为200,000个值。零的垃圾具有已知的最大长度<= ~ 5000个值---相对较小的边界数。 - martburg

你能给出一个可运行的Python代码示例，来展示A是如何使用的吗？例如：A = np.array([1, 2, 3])，并且你期望得到的输出结果是什么？答案可能涉及到np.diff或者np.ediff1d。 - YXD

1

有点随意，这个问题实际上根本不适合并行化。瓶颈将是内存访问，而不是 CPU 速度。几乎没有计算要执行，但需要遍历一个大数组。如果你小心处理，应该能够通过并行方法获得加速，但这并不像你一开始想的那么简单。 - Joe Kington

显示剩余4条评论

2个回答

0

这至少应该将循环推入到Numpy原语中，尽管它会遍历数组三次：

A = 2*(rand(200000)>0.2)  # testing data
borders = flatnonzero(diff(A==0))

在我的电脑上，这需要1.79毫秒。

- chthonicdaemon

能够处理我的测试数据--我需要理解那段代码中发生了什么。 - martburg

1

+1，但是你目前的做法实际上存在一个微妙的错误。如果OP的数组以一段零开始或结束，你需要在“borders”中添加0/-1。当然，OP只是要求更改的索引，这可以实现。但是，要实际使用结果，您需要知道更改的方向。 - Joe Kington

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- Joe Kington · Accepted Answer

@chthonicdaemon的回答让你完成了90%的工作，但如果你想使用索引来切分数组，你需要一些额外的信息。假设你想使用这些索引来提取数组中不为0的区域。你已经找到了数组变化的索引，但你不知道这种变化是从True到False还是相反。因此，你需要检查第一个和最后一个值并进行相应的调整。否则，在某些情况下，你将提取零段而不是数据。

例如：

import numpy as np

def contiguous_regions(condition):
    """Finds contiguous True regions of the 1D boolean array "condition".
    Returns a 2D array where the first column is the start index of the region
    and the second column is the end index."""
    # Find the indicies of changes in "condition"
    idx = np.flatnonzero(np.diff(condition)) + 1

    # Prepend or append the start or end indicies to "idx"
    # if there's a block of "True"'s at the start or end...
    if condition[0]:
        idx = np.append(0, idx)
    if condition[-1]:
        idx = np.append(idx, len(condition))

    return idx.reshape(-1, 2)

# Generate an example dataset...
t = np.linspace(0, 4*np.pi, 20)
x = np.abs(np.sin(t)) + 0.1
x[np.sin(t) < 0.5] = 0

print x

# Get the contiguous regions where x is not 0
for start, stop in contiguous_regions(x != 0):
    print x[start:stop]

在这种情况下，我们的示例数据集如下所示：

array([ 0.        ,  0.71421271,  1.06940027,  1.01577333,  0.        ,
        0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
        0.        ,  0.93716648,  1.09658449,  0.83572391,  0.        ,
        0.        ,  0.        ,  0.        ,  0.        ,  0.        ])

并通过以下方式实现：

for start, stop in contiguous_regions(x != 0):
    print x[start:stop]

我们将获得：

[ 0.71421271  1.06940027  1.01577333]
[ 0.93716648  1.09658449  0.83572391]