Fast array processing performance in Numpy/Python

Question

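I am trying to find the best (fastest) way to process coordinate and measurement data stored in several numpy arrays. I need to calculate the distance from each grid point (lat, lon, alt values, shown in green) to each measurement location (lat, lon, alt, and range from the target, shown in gray). Since there are hundreds of grid points, and thousands of measured ranges to evaluate against each grid point, I would like to iterate over the arrays as efficiently as possible. I am trying to decide how to store the LLA (lat, lon, alt) values for the grid and for the measurements, and what the ideal way is to calculate the mean squared error (MSE) for each grid point, based on the difference between the measured range and the actual range.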

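Any ideas on how best to store these values, and how to iterate over the grid to determine the range from each measurement, would be greatly appreciated. Thanks!!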

Currently, I am using a 2D meshgrid to store the LLA values for the grid.

# Create a 2D Grid that will be used to store the MSE estimations
# First, create two 1-D arrays representing the X and Y coordinates of our grid
x_delta = abs(xmax-xmin)/gridsize_x
y_delta = abs(ymax-ymin)/gridsize_y
X = np.arange(xmin,xmax+x_delta,x_delta)
Y = np.arange(ymin,ymax+y_delta,y_delta)

# Next, pass arrays to meshgrid to return 2-D coordinate matrices from the 1-D coordinate arrays
grid_lon, grid_lat = np.meshgrid(X, Y)
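
As a small illustration of what np.meshgrid returns here, the sketch below builds a tiny grid and prints its shape. The bounds and cell counts are made-up values, and np.linspace is swapped in for np.arange purely so that the number of points per axis is explicit; it is not what the snippet above uses.

```python
import numpy as np

# Hypothetical bounds and cell counts, chosen only for illustration
xmin, xmax, gridsize_x = 0.0, 4.0, 4
ymin, ymax, gridsize_y = 0.0, 2.0, 2

# linspace includes both endpoints, so each axis has gridsize + 1 points
X = np.linspace(xmin, xmax, gridsize_x + 1)
Y = np.linspace(ymin, ymax, gridsize_y + 1)

grid_lon, grid_lat = np.meshgrid(X, Y)

# With the default indexing='xy', both outputs have shape (len(Y), len(X))
print(grid_lon.shape)   # (3, 5)
print(grid_lon[0])      # longitudes vary along a row
print(grid_lat[:, 0])   # latitudes vary down a column
```

Note that the grid has one more point per axis than the cell count, which matters when a flat index is later converted back to row and column.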

The LLA points and range values from the measurements are stored in a measurement class.

measurement_lon = [measurement.gps.getlon() for measurement in target_measurements]
measurement_lat = [measurement.gps.getlat() for measurement in target_measurements]
measurement_range = [measurement.getrange() for measurement in target_measurements]

Measurement class

class RangeMeasurement:

    def __init__(self, lat, lon, alt, range):
        self.gps = GpsLocation(lat, lon, alt)
        self.range = range

Really bad pseudocode for the range calculation (iterative and very slow)

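range_error = 0.0
# Visit every grid point (the 2-D grids are flattened) against every measurement
for g_lon, g_lat in zip(grid_lon.ravel(), grid_lat.ravel()):
    for m_lon, m_lat, m_range in zip(measurement_lon, measurement_lat, measurement_range):
        range_error += distance(g_lon, g_lat, m_lon, m_lat) - m_range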

- Alex

Unfortunately, as a new user I can't post images. If you're interested, message me and I can email you a sample image. - Alex

You can upload it to an image-sharing site and post the link; one of us with enough reputation can then embed it in the post properly. - mac

1 Answer

- J. Kevin Corcoran · Accepted Answer

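I think the scipy.spatial.distance module will help you out with this problem: http://docs.scipy.org/doc/scipy/reference/spatial.distance.html You should store your points as 2-D numpy arrays with 2 columns and N rows, where N is the number of points in the array. To convert grid_lon and grid_lat to this format, use: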

N1 = grid_lon.size
grid_point_array = np.hstack([grid_lon.reshape((N1,1)), grid_lat.reshape((N1,1))])

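This takes all of the values in grid_lon, which are arranged in a rectangular array with the same shape as the grid, and puts them into an array with N1 rows and a single column. It does the same for grid_lat. The two one-column arrays are then combined into a single two-column array.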
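
To make the reshaping concrete, here is a minimal sketch on a tiny made-up grid. It also shows np.column_stack with ravel(), which should produce the same array in one step; that shorthand is an alternative offered here, not part of the original answer.

```python
import numpy as np

# Tiny hypothetical grid: 2 rows (latitudes) by 3 columns (longitudes)
grid_lon, grid_lat = np.meshgrid([10.0, 20.0, 30.0], [1.0, 2.0])

N1 = grid_lon.size
# The approach described above: reshape each grid to one column, then join
stacked = np.hstack([grid_lon.reshape((N1, 1)), grid_lat.reshape((N1, 1))])

# Equivalent shorthand: flatten each grid and use the results as columns
grid_point_array = np.column_stack([grid_lon.ravel(), grid_lat.ravel()])

print(grid_point_array.shape)                     # (6, 2)
print(np.array_equal(stacked, grid_point_array))  # True
print(grid_point_array[4])                        # the point (20.0, 2.0): grid row 1, column 1
```

Rows are in row-major order, so flat index 4 corresponds to row 1, column 1 of the 2 by 3 grid.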

A similar method can be used to convert your measurement data:

N2 = len(measurement_lon)
measurement_data_array = np.hstack([np.array(measurement_lon).reshape((N2,1)),
    np.array(measurement_lat).reshape((N2,1))])

Once your data is in this format, you can easily find the distance between each pair of points with scipy.spatial.distance:

import scipy.spatial.distance
d = scipy.spatial.distance.cdist(grid_point_array, measurement_data_array, 'euclidean')

d will be an array with N1 rows and N2 columns, and d[i,j] will be the distance between grid point i and measurement point j.
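
A small worked example may help to check the layout of d. The points below are made up so that the distances are easy to verify by hand (a 3-4-5 triangle).

```python
import numpy as np
from scipy.spatial.distance import cdist

# Hypothetical points, one (x, y) pair per row
grid_point_array = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0]])   # N1 = 3
measurement_data_array = np.array([[0.0, 0.0], [3.0, 4.0]])         # N2 = 2

d = cdist(grid_point_array, measurement_data_array, 'euclidean')

print(d.shape)   # (3, 2): one row per grid point, one column per measurement
print(d[0, 1])   # 5.0, distance from (0, 0) to (3, 4)
print(d[1, 1])   # 4.0, distance from (3, 0) to (3, 4)
```

Keep in mind that 'euclidean' treats the two columns as planar coordinates, so with raw longitude and latitude the result is in degrees rather than a ground distance.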

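EDIT: Thanks for clarifying the range error. Sounds like an interesting project. This should give you the grid point with the smallest cumulative squared error: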

measurement_range_array = np.array(measurement_range)
flat_grid_idx = pow(measurement_range_array-d,2).sum(1).argmin()
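
As a worked illustration of the one-liner above, the sketch below unpacks it into separate steps on made-up numbers. The intermediate names (squared_error, cumulative_error, mse) are introduced here for readability only.

```python
import numpy as np

# Hypothetical distances: 3 grid points (rows) by 2 measurements (columns)
d = np.array([[5.0, 1.0],
              [4.0, 2.0],
              [3.0, 6.0]])
measurement_range_array = np.array([4.0, 2.0])   # one measured range per measurement

# Broadcasting stretches the (2,) range array across all 3 rows of d
squared_error = (measurement_range_array - d) ** 2   # shape (3, 2)

# Sum over axis 1 (the measurements) to get one total per grid point
cumulative_error = squared_error.sum(axis=1)          # totals 2.0, 0.0, 17.0

# Dividing by the number of measurements gives the mean squared error;
# the scaling is uniform, so it does not change which grid point wins
mse = cumulative_error / d.shape[1]

flat_grid_idx = cumulative_error.argmin()
print(flat_grid_idx)   # 1
```

Grid point 1 wins here because its distances match the measured ranges exactly.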

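This takes advantage of broadcasting to get the difference between each measured range and the distance from that measurement to every grid point. The squared errors for a given grid point are then summed over all measurements, and the resulting 1-D array holds the cumulative error for each grid point. argmin() is called to find the position of the smallest value. To get the x and y grid coordinates from the flattened index, use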

n_cols = grid_lon.shape[1]  # columns in the flattened grid: len(X), not gridsize_x
grid_x = flat_grid_idx % n_cols
grid_y = flat_grid_idx // n_cols

(// denotes integer division.)
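
As an alternative sketch, np.unravel_index performs the same conversion from the array shape directly, which avoids having to track the column count by hand. The grid and the flat index below are made-up stand-ins.

```python
import numpy as np

# Hypothetical grid with 3 rows (Y values) and 5 columns (X values)
grid_lon, grid_lat = np.meshgrid(np.linspace(0.0, 4.0, 5), np.linspace(0.0, 2.0, 3))

flat_grid_idx = 7   # stand-in for the result of argmin()

# unravel_index converts a flat index into (row, column) for the given shape
grid_y, grid_x = np.unravel_index(flat_grid_idx, grid_lon.shape)
print(grid_y, grid_x)    # 1 2

# Same result by hand, dividing by the number of columns in the grid
n_cols = grid_lon.shape[1]
assert (grid_y, grid_x) == (flat_grid_idx // n_cols, flat_grid_idx % n_cols)

# Coordinates of the selected grid point
print(grid_lon[grid_y, grid_x], grid_lat[grid_y, grid_x])   # 2.0 1.0
```

The row index comes first because the meshgrid output is laid out as (rows of Y, columns of X).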