How to find the nearest point within the same GeoDataFrame using geopandas or shapely

Question

11

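I have a GeoDataFrame that holds about 25 locations represented as point geometries. I am trying to write a script that goes through each point, identifies the nearest location, and returns the name of that nearest location together with the distance to it.

If the points were in two different GeoDataFrames, I could easily do this with nearest_points(geom1, geom2) from the shapely.ops library. However, all of my locations are stored in a single GeoDataFrame. I am trying to loop over it, and that is where I run into trouble.

Here is my sample data: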

geofile = gpd.GeoDataFrame([[0, 'location A', Point(55, 55)],
                            [1, 'location B', Point(66, 66)],
                            [2, 'Location C', Point(99, 99)],
                            [3, 'Location D', Point(11, 11)]],
                           columns=['ID','Location','geometry'])

This is the loop I wrote, which does not work:

for index, row in geofile.iterrows():
    nearest_geoms=nearest_points(row, geofile)
    print('location:' + nearest_geoms[0])
    print('nearest:' + nearest_geoms[1])
    print('-------')

I get this error:

AttributeError: 'Series' object has no attribute '_geom'

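However, I think my problem goes beyond the cause of this error: I also have to exclude the row I am currently looping over, because otherwise it would automatically be returned as the nearest location, since it is that very location.

For a single location, my final result would look like this: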

([0,'location A','location B', '5 miles', Point(55,55)], columns=['ID','Location','Nearest','Distance','geometry'])

- rzt101

2 Answers

2

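The following approach uses sklearn.neighbors.NearestNeighbors to accomplish this task in two lines of code, and it scales well (in terms of both the number of points and the number of neighbors).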

import numpy as np
import pandas as pd
import geopandas as gpd
from sklearn.neighbors import NearestNeighbors

N_POINTS = 10_000
N_NEIGHBORS = 10

# generate larger dataframe with random points:
np.random.seed(23)
acoords = np.random.randint(0, 1000, (N_POINTS, 2))
df = gpd.GeoDataFrame({"ID": range(N_POINTS)}, geometry=gpd.points_from_xy(acoords[:, 0], acoords[:, 1]))

# 2d numpy array of the coordinates
coords = np.array(df.geometry.map(lambda p: [p.x, p.y]).tolist())

# "train"/initialize the NearestNeighbors model 
# NOTE: N_NEIGHBORS + 1 since we are dropping the nearest point 
#       (which is each point itself with distance 0)
knn = NearestNeighbors(n_neighbors=N_NEIGHBORS + 1, algorithm='kd_tree').fit(coords)
# retrieve neighbors (distance and index)
knn_dist, knn_idx = knn.kneighbors(coords)

# add results to dataframe:
df[list(map("NEIGHBOR_{}".format, range(1, N_NEIGHBORS + 1)))] = \
        df.geometry.values.to_numpy()[knn_idx[:, 1:]]

print(df)

Result:

        ID                 geometry       NEIGHBOR_1  ...       NEIGHBOR_8  \
0        0  POINT (595.000 742.000)  POINT (597 737)  ...  POINT (592 756)   
1        1   POINT (40.000 969.000)   POINT (40 971)  ...   POINT (27 961)   
...    ...                      ...              ...  ...              ...   
9998  9998   POINT (38.000 508.000)   POINT (34 507)  ...   POINT (50 516)   
9999  9999  POINT (891.000 936.000)  POINT (887 931)  ...  POINT (876 929)   

           NEIGHBOR_9      NEIGHBOR_10  
0     POINT (598 727)  POINT (606 730)  
1      POINT (31 954)   POINT (37 987)  
...               ...              ...  
9998   POINT (29 496)   POINT (23 511)  
9999  POINT (908 930)  POINT (901 951)  

[10000 rows x 12 columns]
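
The question asks for the name of the nearest location and the distance to it rather than the neighbor geometries. As a minimal sketch, assuming the four sample points from the question and no duplicate coordinates (so that column 0 of the query result is each point itself), the same NearestNeighbors call could be reduced to a single neighbor like this. The variable names `names`, `nearest_name` and `nearest_dist` are illustrative only, and the distances are in the units of the coordinates, not miles.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Sample locations from the question (names and x/y coordinates).
names = np.array(["location A", "location B", "Location C", "Location D"])
coords = np.array([[55, 55], [66, 66], [99, 99], [11, 11]], dtype=float)

# Ask for 2 neighbors: the point itself (distance 0) plus its nearest other point.
knn = NearestNeighbors(n_neighbors=2).fit(coords)
dist, idx = knn.kneighbors(coords)

# Column 0 is the query point itself, column 1 is the nearest other point.
nearest_name = names[idx[:, 1]]
nearest_dist = dist[:, 1]

for name, other, d in zip(names, nearest_name, nearest_dist):
    print(f"{name}: nearest is {other} at distance {d:.2f}")
```

If the data can contain duplicate coordinates, the self-match is not guaranteed to sit in column 0, so the neighbor indices would need to be filtered against the row index instead.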

Old/outdated answer:

Here is another approach, based on scipy.spatial.distance.cdist. It avoids iterrows by using numpy masked arrays.

import geopandas as gpd
from scipy.spatial import distance
import numpy.ma as ma
from shapely.geometry import Point
import numpy as np

df = gpd.GeoDataFrame([[0, 'location A', Point(55,55)], 
                       [1, 'location B', Point(66,66)],
                       [2, 'Location C', Point(99,99)],
                       [3, 'Location D' ,Point(11,11)]], 
                      columns=['ID','Location','geometry'])

coords = np.stack(df.geometry.apply(lambda x: [x.x, x.y]))
distance_matrix = ma.masked_where((dist := distance.cdist(*[coords] * 2)) == 0, dist)
df["closest_ID"] = np.argmin(distance_matrix, axis=0)
df = df.join(df.set_index("ID").geometry.rename("nearest_geometry"), on="closest_ID")
df.drop("closest_ID", axis=1)

# Out:
   ID    Location               geometry           nearest_geometry
0   0  location A  POINT (55.000 55.000)  POINT (66.00000 66.00000)
1   1  location B  POINT (66.000 66.000)  POINT (55.00000 55.00000)
2   2  Location C  POINT (99.000 99.000)  POINT (66.00000 66.00000)
3   3  Location D  POINT (11.000 11.000)  POINT (55.00000 55.00000)
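
Masking every zero distance also hides points that share their coordinates with another point. A possible variant, sketched here with NumPy only and assuming the same four sample points, excludes just the diagonal (each point's distance to itself) by setting it to infinity, and keeps the distance alongside the index. The names `names`, `nearest_idx` and `nearest_dist` are illustrative and not part of the original answer.

```python
import numpy as np

# Sample locations from the question (names and x/y coordinates).
names = np.array(["location A", "location B", "Location C", "Location D"])
coords = np.array([[55, 55], [66, 66], [99, 99], [11, 11]], dtype=float)

# Full pairwise Euclidean distance matrix via broadcasting.
diff = coords[:, None, :] - coords[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=-1))

# Exclude self-matches only: the diagonal holds each point's distance to itself.
np.fill_diagonal(dist, np.inf)

# Index of and distance to the nearest other point, per row.
nearest_idx = dist.argmin(axis=1)
nearest_dist = dist[np.arange(len(coords)), nearest_idx]

for name, j, d in zip(names, nearest_idx, nearest_dist):
    print(f"{name}: nearest is {names[j]} at distance {d:.2f}")
```

Like cdist, this builds the full N x N matrix, so memory grows quadratically with the number of points; for 25 locations that is negligible.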

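Generalization to multiple neighbors

Since distance_matrix holds the complete information on the distances between all pairs of points, it is easy to generalize this approach to an arbitrary number of neighbors. For example, if we want to find the N_NEAREST = 2 nearest neighbors of each point, we can sort the distance matrix (using np.argsort instead of picking np.argmin as before) and select the corresponding number of columns: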

N_NEAREST = 2  # number of neighbors to keep per point

nearest_id_cols = list(map("nearest_id_{}".format, range(1, N_NEAREST + 1)))
nearest_geom_cols = list(map("nearest_geometry_{}".format, range(1, N_NEAREST + 1)))
# masked (zero-distance) entries are sorted to the end, so the first columns are the nearest neighbors
df[nearest_id_cols] = np.argsort(distance_matrix, axis=1)[:, :N_NEAREST]
df[nearest_geom_cols] = df[nearest_id_cols].applymap(
                             lambda x: df.set_index("ID").geometry[x])

# out:
   ID    Location                  geometry  nearest_id_1  nearest_id_2  \
0   0  location A  POINT (55.00000 55.00000)             1             2   
1   1  location B  POINT (66.00000 66.00000)             0             2   
2   2  Location C  POINT (99.00000 99.00000)             1             0   
3   3  Location D  POINT (11.00000 11.00000)             0             1   

  nearest_geometry_1 nearest_geometry_2  
0       POINT (66 66)       POINT (99 99)  
1       POINT (55 55)       POINT (99 99)  
2       POINT (66 66)       POINT (55 55)  
3       POINT (55 55)       POINT (66 66)

- mcsoini

1

Awesome! This should be much faster for larger datasets. Cheers! - CreekGeek

- Christoph Rieke · Accepted Answer

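Shapely's nearest_points function compares shapely geometries. To compare a single Point geometry against several other Point geometries, you can use .unary_union and compare against the resulting MultiPoint geometry. And yes, in each row operation, drop the respective point so that it is not compared against itself.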

import geopandas as gpd
from shapely.geometry import Point
from shapely.ops import nearest_points

df = gpd.GeoDataFrame([[0, 'location A', Point(55,55)], 
                       [1, 'location B', Point(66,66)],
                       [2, 'Location C', Point(99,99)],
                       [3, 'Location D' ,Point(11,11)]], 
                      columns=['ID','Location','geometry'])
df.insert(3, 'nearest_geometry', None)

for index, row in df.iterrows():
    point = row.geometry
    multipoint = df.drop(index, axis=0).geometry.unary_union
    queried_geom, nearest_geom = nearest_points(point, multipoint)
    df.loc[index, 'nearest_geometry'] = nearest_geom

This results in:

    ID  Location    geometry        nearest_geometry
0   0   location A  POINT (55 55)   POINT (66 66)
1   1   location B  POINT (66 66)   POINT (55 55)
2   2   Location C  POINT (99 99)   POINT (66 66)
3   3   Location D  POINT (11 11)   POINT (55 55)
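
The loop above returns the nearest geometry, while the question also asks for the nearest location's name and the distance. A minimal sketch of one way to get both, assuming the same sample data and using GeoSeries.distance in place of nearest_points (a swapped-in technique, not part of the answer above), could look like this. The column names 'Nearest' and 'Distance' follow the question's desired output.

```python
import geopandas as gpd
from shapely.geometry import Point

df = gpd.GeoDataFrame(
    {"ID": [0, 1, 2, 3],
     "Location": ["location A", "location B", "Location C", "Location D"]},
    geometry=[Point(55, 55), Point(66, 66), Point(99, 99), Point(11, 11)],
)

nearest_names = []
nearest_dists = []
for index, row in df.iterrows():
    # All rows except the current one, so a point is never matched to itself.
    others = df.drop(index)
    # Planar distance from the current point to every remaining point.
    dists = others.geometry.distance(row.geometry)
    nearest_index = dists.idxmin()
    nearest_names.append(others.loc[nearest_index, "Location"])
    nearest_dists.append(dists.loc[nearest_index])

df["Nearest"] = nearest_names
df["Distance"] = nearest_dists
print(df[["ID", "Location", "Nearest", "Distance"]])
```

The distances come out in the units of the coordinates. To report miles, as in the question's example, the data would first need to be in a suitable projected coordinate reference system and the result converted from that system's units.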