How to find the closest elements in two arrays?

Question

6

I have two numpy arrays, e.g. X=[x1,x2,x3,x4] and y=[y1,y2,y3,y4]. Three of the elements are close to each other, while the fourth may or may not be close.

For example:

X   [ 84.04467948  52.42447842  39.13555678  21.99846595]
y   [ 78.86529444  52.42447842  38.74910101  21.99846595]

Or it could look like this:

X   [ 84.04467948  60  52.42447842  39.13555678]
y   [ 78.86529444  52.42447842  38.74910101  21.99846595]

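I would like to define a function that finds the corresponding indices in the two arrays. In the first case:

y[0] corresponds to X[0],
y[1] corresponds to X[1],
y[2] corresponds to X[2],
y[3] corresponds to X[3]

In the second case:

y[0] corresponds to X[0],
y[1] corresponds to X[2],
y[2] corresponds to X[3],
and y[3] corresponds to X[1].

I can't quite solve this problem; please help.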

- insomnia

2

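What does your function look like so far? - grael

@grael Hi, all I have managed is to find the closest set using three loops over X and Y. But that wastes too much time... - insomnia

@insomnia Then you need to define some error model. What is the error? The L1 norm? The L2 norm? The result behaves differently for each choice. Or: just precompute all pairwise differences, sort them, and loop over them! - sascha

@eiKatte Thanks for your comment. That is the key point, and also the hard part. - insomnia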

1

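How about this: compute the differences between all x and all y items. Take the pair with the overall smallest difference, pop both items from their respective lists, and repeat. A recursive function. - Ma0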
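The "precompute all pairwise differences, sort, and loop" idea from the comments above could be sketched roughly as follows. This is only an illustration under assumptions: the function name `greedy_global_matching` is hypothetical, the absolute difference is used as the distance, and an iterative loop stands in for the recursion that was suggested.

```python
def greedy_global_matching(xs, ys):
    """Repeatedly pair the globally closest unused x and y.

    Returns a list `partner` where partner[j] is the index in xs that was
    matched to ys[j] (or None if xs ran out of elements).
    Assumption: closeness is measured by the absolute difference.
    """
    # All (distance, x index, y index) triples, closest pairs first
    candidates = sorted(
        (abs(x - y), i, j)
        for i, x in enumerate(xs)
        for j, y in enumerate(ys)
    )
    used_x, used_y = set(), set()
    partner = [None] * len(ys)
    for _, i, j in candidates:
        # Skip pairs where either element has already been matched
        if i in used_x or j in used_y:
            continue
        partner[j] = i
        used_x.add(i)
        used_y.add(j)
    return partner


X2 = [84.04467948, 60, 52.42447842, 39.13555678]
Y2 = [78.86529444, 52.42447842, 38.74910101, 21.99846595]
print(greedy_global_matching(X2, Y2))  # [0, 2, 3, 1]
```

For the second example from the question this gives y[0] -> X[0], y[1] -> X[2], y[2] -> X[3], y[3] -> X[1]. Because the closest pairs are fixed first, the result does not depend on the order in which either list is traversed, at the cost of building all n*m differences.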

4 Answers

3

Fixed, using this answer https://dev59.com/i2ox5IYBdhLWcg3w44Lb#8929827 and https://dev59.com/VWct5IYBdhLWcg3wXsXH#12141207

def find_closest(alist, target):
    return min(alist, key=lambda x:abs(x-target))

X = [ 84.04467948,  52.42447842,  39.13555678,  21.99846595]
Y = [ 78.86529444,  52.42447842,  38.74910101,  21.99846595]

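def list_matching(list1, list2):
    # Work on a copy so that matched elements can be removed
    list1_copy = list1[:]
    pairs = []
    for i, e in enumerate(list2):
        # Closest not-yet-matched element of list1 to list2[i]
        elem = find_closest(list1_copy, e)
        # Note: index() returns the first occurrence, so duplicate values
        # in list1 would map to the same index
        pairs.append([i, list1.index(elem)])
        list1_copy.remove(elem)
    return pairs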

- Sardorbek Imomaliev

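This allows elements of y to be matched more than once, which may or may not be acceptable. Seen that way, it is a rather asymmetric algorithm (each X is used only once, but not necessarily each Y). - sascha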

1

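@sascha Fixed, but it will behave as if you were checking whether the second list matches the first. I think you can update it to work more intelligently. - Sardorbek Imomaliev

@SardorbekImomaliev Sorry for the late reply. With a=[1,2,3,6] and b=[7,2,3,6] it gives a wrong result, but I think adding a sort will fix that. In fact, your code is good enough for my case. Thanks a lot. - insomnia

Actually, I don't think this approach works in all cases, because while enumerating you can always hit the element that is "not close to anything" first. It is then matched with its nearest value, which may really belong to another element, and this can happen whichever list you iterate over. - jotasi

I have to report another exception. I used sorted arrays, but the result is strange: for X [80.06192623 51.27128419 33.81534928 25.49749915] and y [73.26784071 51.27128419 26.60918437 25.49749915], the output is [[0,0],[1,1],[2,3],[3,2]]. The good news is that things get better if I sort the arrays in ascending order. - insomnia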

2

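It looks like the best approach is to sort both arrays first (n log(n)) and then walk through both arrays with a merge-like traversal. That is certainly faster than the n*n you mentioned in the comments.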

- nimdil

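Thanks a lot. But I don't even know what a merge-like traversal is... I do agree that the pre-sorting you mention is useful, though. - insomnia

If you look at merge sort, e.g. here: https://en.wikipedia.org/wiki/Merge_sort#Top-down_implementation_using_lists, you will see that it essentially consists of a split operation and a merge operation - the latter being where the name comes from. The core idea is that you loop over both lists and, in each iteration, advance the index of the list whose current value is lower (you keep 2 indices). - nimdil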
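As a rough illustration of the merge-like traversal described in the comment above, here is a minimal sketch. It is one possible reading of the idea, not nimdil's code: the function name `nearest_by_merge` is hypothetical, and the sketch only finds the nearest x for each y in a single pass over the sorted orders, so it does not enforce a one-to-one pairing.

```python
def nearest_by_merge(xs, ys):
    """For each y, return the index (into xs) of its nearest x.

    Both inputs are visited in ascending order of value, and the pointer
    into xs only ever moves forward, as in the merge step of merge sort.
    Assumption: xs is non-empty and distance is the absolute difference.
    """
    # Indices of xs and ys ordered by value, so original positions survive
    x_order = sorted(range(len(xs)), key=lambda k: xs[k])
    y_order = sorted(range(len(ys)), key=lambda k: ys[k])

    nearest = [None] * len(ys)
    i = 0  # position in x_order; never moves backwards
    for j in y_order:
        y = ys[j]
        # Advance while the next x is at least as close to y as the current one
        while (i + 1 < len(x_order)
               and abs(xs[x_order[i + 1]] - y) <= abs(xs[x_order[i]] - y)):
            i += 1
        nearest[j] = x_order[i]
    return nearest


X2 = [84.04467948, 60, 52.42447842, 39.13555678]
Y2 = [78.86529444, 52.42447842, 38.74910101, 21.99846595]
print(nearest_by_merge(X2, Y2))  # [0, 2, 3, 3]
```

After the two sorts, the traversal itself is linear in the total number of elements. For the second example from the question, both y[2] and y[3] end up nearest to X[3], so resolving such conflicts would still need a separate step.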

1

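The code below simply prints the corresponding indices of the two arrays, the way you did in the question, since I wasn't sure what you wanted the function to output.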

X1 = [84.04467948, 52.42447842, 39.13555678, 21.99846595]
Y1 = [78.86529444, 52.42447842, 38.74910101, 21.99846595]

X2 = [84.04467948, 60, 52.42447842, 39.13555678]
Y2 = [78.86529444, 52.42447842, 38.74910101, 21.99846595]

def find_closest(x_array, y_array):
    # Copy x_array as we will later remove an item with each iteration and
    # require the original later
    remaining_x_array = x_array[:]
    for y in y_array:
        differences = []
        for x in remaining_x_array:
            differences.append(abs(y - x))
        # min_index_remaining is the index position of the closest x value
        # to the given y in remaining_x_array
        min_index_remaining = differences.index(min(differences))
        # related_x is the closest x value of the given y
        related_x = remaining_x_array[min_index_remaining]
        print('Y[%s] corresponds to X[%s]' % (y_array.index(y), x_array.index(related_x)))
        # Remove the corresponding x value in remaining_x_array so it
        # cannot be selected twice
        remaining_x_array.pop(min_index_remaining)

This outputs the following.

find_closest(X1,Y1)
Y[0] corresponds to X[0]
Y[1] corresponds to X[1]
Y[2] corresponds to X[2]
Y[3] corresponds to X[3]

and

find_closest(X2,Y2)
Y[0] corresponds to X[0]
Y[1] corresponds to X[2]
Y[2] corresponds to X[3]
Y[3] corresponds to X[1]

Hope this helps.

- Elliot Ledger

- jotasi · Accepted Answer

You can precompute the distance matrix as shown in this answer:

import numpy as np

X = np.array([84.04467948,60.,52.42447842,39.13555678])
Y = np.array([78.86529444,52.42447842,38.74910101,21.99846595])

dist = np.abs(X[:, np.newaxis] - Y)

Now you can compute the minimum along one axis (I chose 1, which corresponds to finding the closest element of Y for each X):

potentialClosest = dist.argmin(axis=1)

This may still contain duplicates (in your case 2). To check for that, you can use np.unique to find all Y indices that appear in potentialClosest:

closestFound, closestCounts = np.unique(potentialClosest, return_counts=True)
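To make these intermediate values concrete, the following self-contained sketch repeats the steps so far for the second example from the question and prints the results. The values in the comments are what these particular arrays produce.

```python
import numpy as np

X = np.array([84.04467948, 60., 52.42447842, 39.13555678])
Y = np.array([78.86529444, 52.42447842, 38.74910101, 21.99846595])

# dist[i, j] is |X[i] - Y[j]|, built by broadcasting a column against a row
dist = np.abs(X[:, np.newaxis] - Y)
print(dist.shape)        # (4, 4)

# For each X[i], the index of the closest Y
potentialClosest = dist.argmin(axis=1)
print(potentialClosest)  # [0 1 1 2]

# Which Y indices were chosen, and how often
closestFound, closestCounts = np.unique(potentialClosest, return_counts=True)
print(closestFound)      # [0 1 2]
print(closestCounts)     # [1 2 1]
```

Here Y index 1 is chosen by both X[1] and X[2], and Y index 3 is not chosen by anyone.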

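Now you can check for duplicates by testing closestFound.shape[0] == X.shape[0]. If that holds, potentialClosest contains the partner of every element of X. In your case 2, however, one element appears twice, so closestFound has only X.shape[0]-1 elements, and closestCounts contains not only 1s but also one 2. For all elements with count 1, the partner has already been found. Of the two candidates with count 2, you need to pick the closer one; the partner of the one with the larger distance is then the element of Y that is not in closestFound. It can be found with: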

missingPartnerIndex = np.where(
        ~np.isin(np.arange(Y.shape[0]), closestFound)
        )[0][0]

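You can do the matching in a loop (although there may well be a better way using numpy). This solution is rather ugly, but it works. Suggestions for improvement are very welcome.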

partners = np.empty_like(X, dtype=int)
nonClosePartnerFound = False
for i in np.arange(X.shape[0]):
    if closestCounts[closestFound==potentialClosest[i]][0]==1:
        # A unique partner was found
        partners[i] = potentialClosest[i]
    else:
        # Partner is not unique
        if nonClosePartnerFound:
            partners[i] = potentialClosest[i]
        else:
            if np.argmin(dist[:, potentialClosest[i]]) == i:
                partners[i] = potentialClosest[i]
            else:
                partners[i] = missingPartnerIndex
                nonClosePartnerFound = True
print(partners)

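This answer only works if exactly one pair of elements is not close. If that is not the case, you need to define how to find the correct pairing for multiple non-close elements. Unfortunately, this is neither a very general nor a very nice solution, but hopefully you can use it as a helpful starting point.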