你无法计算每对城市之间的距离。相反,你需要将城市放入一个空间划分数据结构中,以便可以进行快速的最近邻查询。
空间划分数据结构 SciPy带有适用于此应用程序的
kd-tree实现,
scipy.spatial.KDTree
。
这里有两个困难。首先,
scipy.spatial.KDTree
使用点之间的欧几里得距离,但你想要使用沿着地球表面的大圆距离。其次,经度会环绕,因此最近邻可能具有相差360°的经度。如果采取以下方法,则可以解决这两个问题:
将您的位置从大地坐标(纬度,经度)转换为ECEF(地心坐标系)坐标(x,y,z)。
将这些ECEF坐标放入{{link3:scipy.spatial.KDTree
}}中。
将您的大圆距离(例如30千米)转换为欧几里得距离。
调用{{link4:scipy.spatial.KDTree.query_ball_point
}}以获取范围内的城市。
这里有一些示例代码来说明这种方法。函数
geodetic2ecef
来自
David Parunakian的PySatel,并在GPL下获得许可。
from math import radians, cos, sin, sqrt
A = 6378.137
B = 6356.7523142
ESQ = 6.69437999014 * 0.001
def geodetic2ecef(lat, lon, alt=0):
"""Convert geodetic coordinates to ECEF."""
lat, lon = radians(lat), radians(lon)
xi = sqrt(1 - ESQ * sin(lat))
x = (A / xi + alt) * cos(lat) * cos(lon)
y = (A / xi + alt) * cos(lat) * sin(lon)
z = (A / xi * (1 - ESQ) + alt) * sin(lat)
return x, y, z
def euclidean_distance(distance):
"""Return the approximate Euclidean distance corresponding to the
given great circle distance (in km).
"""
return 2 * A * sin(distance / (2 * B))
让我们随机生成五万个城市位置并将它们转换为ECEF坐标:
>>> from random import uniform
>>> cities = [(uniform(-90, 90), uniform(0, 360)) for _ in range(50000)]
>>> ecef_cities = [geodetic2ecef(lat, lon) for lat, lon in cities]
将它们放入
scipy.spatial.KDTree
中:
>>> import numpy
>>> from scipy.spatial import KDTree
>>> tree = KDTree(numpy.array(ecef_cities))
查找距离伦敦约100公里的所有城市:
>>> london = geodetic2ecef(51, 0)
>>> tree.query_ball_point([london], r=euclidean_distance(100))
array([[37810, 15755, 16276]], dtype=object)
这个数组包含了你查询的每个点周围距离r
内的城市数组。每个邻居都以其在传递给KDTree
的原始数组中的索引表示。因此,在伦敦周围约100公里范围内有三个城市,即原始列表中索引为37810、15755和16276的城市:
>>> from pprint import pprint
>>> pprint([cities[i] for i in [37810, 15755, 16276]])
[(51.7186871990946, 359.8043453670437),
(50.82734317063884, 1.1422052710187103),
(50.95466110717763, 0.8956257749604779)]
注意:
You can see from the example output that neighbours with longitudes that differ by about 360° are correctly discovered.
The approach seems fast enough. Here we find neighbours within 30 km for the first thousand cities, taking about 5 seconds:
>>> from timeit import timeit
>>> timeit(lambda:tree.query_ball_point(ecef_cities[:1000], r=euclidean_distance(30)), number=1)
5.013611573027447
Extrapolating, we expect to find neighbours within 30 km for all 50,000 cities in about four minutes.
My euclidean_distance
function overestimates the Euclidean distance corresponding to a given great circle distance (so as not to miss any cities). This might be good enough for some applications—after all, cities are not point objects—but if you need more accuracy than this, then you could filter the resulting points using, say, one of the great circle distance functions from geopy.