高效地识别NumPy矩阵中相邻元素

Question

高效地识别NumPy矩阵中相邻元素

10

我有一个100乘100的numpy矩阵。该矩阵主要由零填充，但也包含一些整数。例如：

[0 0 0 0 0 0 0 1]
[0 2 2 0 0 0 0 0]
[0 0 2 0 0 0 0 0]  False
[0 0 0 0 0 0 0 0]
[0 3 3 0 0 0 0 0]

如何高效地确定矩阵是否包含相邻的不同类型的整数？

上面的示例将返回False。这里是一个True示例，其中包含指示相邻性的行：

[0 0 0 0 0 0 0 1]
[0 2 2 1 1 0 0 0]   <----  True
[0 0 2 0 0 0 0 0]  
[0 0 0 0 0 0 0 0]
[0 3 3 0 0 0 0 0]

对角线不被视为相邻。因此，该示例也将返回False：

[0 0 0 1 1 1 1 1]
[0 2 2 0 1 0 0 0]
[0 0 2 0 0 0 0 0]   False
[0 3 0 0 0 0 0 0]
[3 3 3 0 0 0 0 0]

我不需要确定相邻位置的具体坐标，只需要知道它是否存在。

目前，我找到矩阵中每个非零元素，并检查其4个相邻元素是最好的方法。

感谢所有出色的答案。

- Chris Parry

1

不发表完整答案，我会从带有轴参数的 numpy.diff 开始。 - Benjamin

谢谢，这应该会显著减少搜索范围。 - Chris Parry

你能添加检查每个元素的代码吗？这可以消除测试中存在的一些模糊不清。 - hpaulj

5个回答

3

看到你的问题描述，检查每个可能的非零整数值在数组中的位置并查看是否有交点可能不需要太多的计算工作。现在，这通常是过度的，但在您的规模上可能会起作用：您可以获取每个整数集合的索引，并使用scipy.spatial.distance.cdist计算它们之间的距离。我相信基于diff或其他方法的一些智能解决方案更有效，但我还是很开心：

import numpy as np
from scipy.spatial.distance import cdist
from itertools import combinations

M1 = np.array(
[[0,0,0,0,0,0,0,1],
 [0,2,2,1,1,0,0,0],  
 [0,0,2,0,0,0,0,0],
 [0,0,0,0,0,0,0,0],
 [0,3,3,0,0,0,0,0]])

M2 = np.array(
[[0,0,0,1,1,1,1,1],
 [0,2,2,0,1,0,0,0],
 [0,0,2,0,0,0,0,0],  
 [0,3,0,0,0,0,0,0],
 [3,3,3,0,0,0,0,0]])

def overlaps_eh(M):
    uniques = np.delete(np.unique(M),0) # get integers present
    unival_inds = [np.transpose(np.where(M==unival)) for unival in uniques]
    # unival_inds[k] contains the i,j indices of each element with the kth unique value

    for i1,i2 in combinations(range(len(unival_inds)),2):
        if np.any(cdist(unival_inds[i1],unival_inds[i2],'cityblock')==1):
            return True
    # if we're here: no adjacencies
    return False

首先，我们将非零独特的矩阵元素收集到数组uniques中。然后对于每个唯一值，我们找到具有此值的输入数组中每个元素的i,j索引。然后我们检查每对唯一值（使用itertools.combinations生成），并使用scipy.spatial.distance.cdist测量每对矩阵元素的成对距离。使用曼哈顿距离，如果任何一对元素的距离为1，则它们相邻。因此，只有在这些距离中的任何一个为1的情况下，我们才需要返回True，否则我们返回False。

- Andras Deak -- Слава Україні

3

以下是一种使用切片视图的方法，重点关注性能-

def distinct_ints(a):
    # Mask of zeros, non-zeros as we would use them frequently
    zm = a==0
    nzm = ~zm

    # Look for distint ints across rows
    row_thresh = (nzm[:,1:] & zm[:,:-1]).sum(1)
    row_out = ((nzm[:,1:] & (a[:,1:] != a[:,:-1])).sum(1)>row_thresh).any()

    # Look for distint ints across cols
    col_thresh = (nzm[1:] & zm[:-1]).sum(0)
    col_out = ((nzm[1:] & (a[1:] != a[:-1])).sum(0)>col_thresh).any()

    # Any from rows or cols
    out = row_out | col_out
    return out

- Divakar

2

这里有一个使用屏蔽数组的解决方案：

import numpy as np
import numpy.ma as ma
a = np.array([[0,1,0], [0,1,0], [2,2,2]])    # sample data 
x = ma.masked_equal(a, 0)                    # mask zeros
adjacencies = np.count_nonzero(np.diff(x, axis=0).filled(0)) + np.count_nonzero(np.diff(x, axis=1).filled(0))

在最后一行，将diff应用于掩码数组（忽略零条目）；diff中的非零条目表示数组a中相邻的不同的非零条目。变量adjacencies将具有相邻条目的总数（也许您只想知道它是否为0）。在上面的示例中，它是1。

- user6655984

1

可以使用numpy.diff来完成，但是需要注意的是，零值不应该被考虑在内，这增加了一些复杂性。

你可以将零值设置为足够大或小的值，以避免出现问题：

a[a == 0] = -999

或者使用浮点数数组，并将它们设置为nan或inf：

a[a == 0] = numpy.nan

然后只需在每个方向上查找1的一阶差异：

numpy.any(numpy.abs(numpy.diff(a, axis=0)) == 1) or numpy.any(numpy.abs(numpy.diff(a, axis=1)) == 1)

- Benjamin

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- MSeifert · Accepted Answer

如果您能使用scipy，使用ndimage.label和ndimage.labeled_comprehension将会非常容易：

import numpy as np
from scipy import ndimage

def multiple_unique_item(array):
    return len(np.unique(array)) > 1

def adjacent_diff(array):
    labeled_array, num_labels = ndimage.label(array)
    labels = np.arange(1, num_labels+1)
    any_multiple = ndimage.labeled_comprehension(array, labeled_array, labels, 
                                                 multiple_unique_item, bool, 0)
    return any_multiple.any()

label 默认标记与 0 不相邻的值，不包括对角线。然后，推导式将与标签相关联的所有值传递给辅助函数 - 检查是否存在多个唯一值。最后，它检查是否有任何标签具有多个值，并返回此结果。

要在测试输入数组上测试此功能：

arr1 = np.array([[0,0,0,0,0,0,0,1],
                 [0,2,2,1,1,0,0,0],  
                 [0,0,2,0,0,0,0,0],
                 [0,0,0,0,0,0,0,0],
                 [0,3,3,0,0,0,0,0]])

arr2 = np.array([[0,0,0,1,1,1,1,1],
                 [0,2,2,0,1,0,0,0],
                 [0,0,2,0,0,0,0,0],  
                 [0,3,0,0,0,0,0,0],
                 [3,3,3,0,0,0,0,0]])

arr3 = np.array([[0,0,0,0,0,0,0,1],
                 [0,2,2,0,0,0,0,0],
                 [0,0,2,0,0,0,0,0],  
                 [0,0,0,0,0,0,0,0],
                 [0,3,3,0,0,0,0,0]])

>>> adjacent_diff(arr1)
True
>>> adjacent_diff(arr2)
False
>>> adjacent_diff(arr3)
False