Calculating the distance between two NumPy arrays

Question


6

I am interested in calculating various spatial distances between two numpy arrays (x and y).

http://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.spatial.distance.cdist.html

import numpy as np
from scipy.spatial.distance import cdist

x = np.array([[[1,2,3,4,5],
               [5,6,7,8,5],
               [5,6,7,8,5]],
              [[11,22,23,24,5],
               [25,26,27,28,5],
               [5,6,7,8,5]]])
i,j,k = x.shape

xx = x.reshape(i,j*k).T

y = np.array([[[31,32,33,34,5],
               [35,36,37,38,5],
               [5,6,7,8,5]],
              [[41,42,43,44,5],
               [45,46,47,48,5],
               [5,6,7,8,5]]])

yy = y.reshape(i,j*k).T

results =  cdist(xx,yy,'euclidean')
print(results)

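However, the code above produces far more results than I need. How can I limit it to only the results I want?

I want to calculate the distance between [1,11] and [31,41]; between [2,22] and [32,42]; and so on.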

- Borys

2

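I think your question points out a gap in the API. pdist and cdist compute distances for all combinations of the input points. That is, they apply the distance calculation to the outer product of the input collections. There is no corresponding function that applies the distance calculation to the inner product of the input arguments (i.e. the pairwise calculation that you want). For any given distance you can "roll your own", but that defeats the purpose of having the scipy.spatial.distance module. - Warren Weckesser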

1

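@WarrenWeckesser - Alternatively, the individual functions in scipy.spatial.distance could be given an axis argument or something similar. That would avoid hacks like using apply_along_axis. It looks like it would only take a bit of tweaking of scipy.spatial.distance._validate_vector. - Joe Kington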

1

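@JoeKington: That is one of the options I had just been thinking of. Would you consider submitting a pull request at https://github.com/scipy/scipy? :) - Warren Weckesser

@WarrenWeckesser Adding an 'axis' argument to scipy.spatial.distance would be easier, I think. - Borys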

1

@WarrenWeckesser - As usual, it's a bit more complicated than I expected. I'll keep working on it, though. It would definitely be useful! - Joe Kington


1 Answer


- Joe Kington · Accepted Answer

If you only need the distance between each pair of points, you don't need to compute the full distance matrix. Instead, compute it directly:

import numpy as np

x = np.array([[[1,2,3,4,5],
               [5,6,7,8,5],
               [5,6,7,8,5]],
              [[11,22,23,24,5],
               [25,26,27,28,5],
               [5,6,7,8,5]]])

y = np.array([[[31,32,33,34,5],
               [35,36,37,38,5],
               [5,6,7,8,5]],
              [[41,42,43,44,5],
               [45,46,47,48,5],
               [5,6,7,8,5]]])

xx = x.reshape(2, -1)
yy = y.reshape(2, -1)
dist = np.hypot(*(xx - yy))

print(dist)
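
As a rough sanity check (not part of the original answer), the sketch below compares the direct calculation with the full matrix that cdist returns. It uses only the first four columns of the reshaped arrays to keep the output small, and the variable names full and paired are my own. Assuming the points are laid out one per column as in the answer, the paired distances should sit on the diagonal of the cdist result:

```python
import numpy as np
from scipy.spatial.distance import cdist

# First four columns of the reshaped arrays from the answer (2 x N layout).
xx = np.array([[1, 2, 3, 4],
               [11, 22, 23, 24]])
yy = np.array([[31, 32, 33, 34],
               [41, 42, 43, 44]])

# cdist expects one point per row, so transpose to N x 2.
full = cdist(xx.T, yy.T, 'euclidean')  # shape (4, 4): every x point vs every y point
paired = np.hypot(*(xx - yy))          # shape (4,): matching pairs only

print(full.shape, paired.shape)
print(np.allclose(np.diag(full), paired))
```

For N points the full matrix holds N*N entries of which only N are wanted, which is why the direct calculation is the cheaper route here.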

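To explain what's happening in more detail, we first reshape the arrays so that they have a 2xN shape (-1 is a placeholder that tells numpy to calculate the correct size along that axis automatically):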

In [2]: x.reshape(2, -1)
Out[2]: 
array([[ 1,  2,  3,  4,  5,  5,  6,  7,  8,  5,  5,  6,  7,  8,  5],
       [11, 22, 23, 24,  5, 25, 26, 27, 28,  5,  5,  6,  7,  8,  5]])

Therefore, when we subtract yy from xx, we get a 2xN array:

In [3]: xx - yy
Out[3]: 
array([[-30, -30, -30, -30,   0, -30, -30, -30, -30,   0,   0,   0,   0,
          0,   0],
       [-30, -20, -20, -20,   0, -20, -20, -20, -20,   0,   0,   0,   0,
          0,   0]])

We can then unpack this into dx and dy components:

In [4]: dx, dy = xx - yy

In [5]: dx
Out[5]: 
array([-30, -30, -30, -30,   0, -30, -30, -30, -30,   0,   0,   0,   0,
         0,   0])

In [6]: dy
Out[6]: 
array([-30, -20, -20, -20,   0, -20, -20, -20, -20,   0,   0,   0,   0,
         0,   0])

And calculate the distance (np.hypot is equivalent to np.sqrt(dx**2 + dy**2)):

In [7]: np.hypot(dx, dy)
Out[7]: 
array([ 42.42640687,  36.05551275,  36.05551275,  36.05551275,
         0.        ,  36.05551275,  36.05551275,  36.05551275,
        36.05551275,   0.        ,   0.        ,   0.        ,
         0.        ,   0.        ,   0.        ])
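
The equivalence mentioned above can be checked with a short sketch. The dx and dy values are the first three entries from the output above; big, via_hypot and via_sqrt are illustrative names, and the overflow comparison at the end is an extra observation of mine, not something the answer relies on:

```python
import numpy as np

# First three dx, dy entries from the transcript above.
dx = np.array([-30, -30, -30])
dy = np.array([-30, -20, -20])

via_hypot = np.hypot(dx, dy)
via_sqrt = np.sqrt(dx ** 2 + dy ** 2)
print(np.allclose(via_hypot, via_sqrt))

# For very large inputs the explicit square overflows to inf,
# while np.hypot still returns a finite value.
big = np.array([1e200])
with np.errstate(over='ignore'):
    naive = np.sqrt(big ** 2 + big ** 2)
print(np.isinf(naive[0]), np.isfinite(np.hypot(big, big)[0]))
```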

Or we can have the unpacking done automatically and do it all in one step:

In [8]: np.hypot(*(xx - yy))
Out[8]: 
array([ 42.42640687,  36.05551275,  36.05551275,  36.05551275,
         0.        ,  36.05551275,  36.05551275,  36.05551275,
        36.05551275,   0.        ,   0.        ,   0.        ,
         0.        ,   0.        ,   0.        ])
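
Note that unpacking with * into np.hypot only works when the first axis has exactly two components. A possible generalization (a sketch of mine, not from the original answer) is np.linalg.norm with axis=0, which takes the Euclidean norm down each column for any number of components. The arrays p and q below are made-up three-component data for illustration:

```python
import numpy as np

# First three columns of the reshaped arrays from the answer (two components).
xx = np.array([[1, 2, 3],
               [11, 22, 23]])
yy = np.array([[31, 32, 33],
               [41, 42, 43]])

diff = xx - yy
dist_norm = np.linalg.norm(diff, axis=0)  # Euclidean norm of each column
print(np.allclose(dist_norm, np.hypot(*diff)))

# Hypothetical three-component points, where np.hypot(*diff) would not apply.
p = np.array([[0, 1],
              [0, 2],
              [0, 2]])
q = np.zeros_like(p)
print(np.linalg.norm(p - q, axis=0))
```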

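If you'd like to calculate other types of distances, just change np.hypot to the function you'd like to use. For example, Manhattan/city-block distance: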

In [9]: dist = np.sum(np.abs(xx - yy), axis=0)

In [10]: dist
Out[10]: array([60, 50, 50, 50,  0, 50, 50, 50, 50,  0,  0,  0,  0,  0,  0])
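
Along the same lines, here is a hedged sketch of two further options. The first computes the Chebyshev distance directly with NumPy. The second is a small helper, paired_distance (a hypothetical name, not a SciPy function), that applies any of the two-vector functions in scipy.spatial.distance to matching columns with a plain Python loop. The loop is slower than the vectorized forms, but it works for metrics that have no simple NumPy expression:

```python
import numpy as np
from scipy.spatial import distance

# First three columns of the reshaped arrays from the answer.
xx = np.array([[1, 2, 3],
               [11, 22, 23]])
yy = np.array([[31, 32, 33],
               [41, 42, 43]])

# Chebyshev distance: the largest absolute difference per column.
cheb = np.max(np.abs(xx - yy), axis=0)

def paired_distance(a, b, metric):
    """Apply a two-vector metric to matching columns of a and b (hypothetical helper)."""
    return np.array([metric(u, v) for u, v in zip(a.T, b.T)])

manhattan = paired_distance(xx, yy, distance.cityblock)
print(cheb)
print(manhattan)
print(np.allclose(paired_distance(xx, yy, distance.chebyshev), cheb))
```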