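My DataFrame has four columns containing the following data:
Start_latitude
Start_longitude
Stop_latitude
Stop_longitude
I need to compute the distance between each start and stop latitude/longitude pair and store the result in a new column. I found a package (geopy) that can do this, but it expects the coordinates as a tuple. How can I apply this function (geopy) to every record of a pandas DataFrame?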
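I'd recommend using pyproj rather than geopy. geopy is built largely around online geocoding services, whereas pyproj runs entirely locally (so it is fast and needs no internet connection) and is more transparent about its methods (see here, for instance). It is based on the Proj4 codebase, which underlies essentially all open-source GIS software and probably many of the web services you use.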
#!/usr/bin/env python3
import pandas as pd
import numpy as np
from pyproj import Geod
wgs84_geod = Geod(ellps='WGS84') #Distance will be measured on this ellipsoid - more accurate than a spherical method
#Get distance between pairs of lat-lon points
def Distance(lat1,lon1,lat2,lon2):
    az12,az21,dist = wgs84_geod.inv(lon1,lat1,lon2,lat2) #Yes, this order is correct: inv() takes longitude before latitude
    return dist
#Create test data
lat1 = np.random.uniform(-90,90,100)
lon1 = np.random.uniform(-180,180,100)
lat2 = np.random.uniform(-90,90,100)
lon2 = np.random.uniform(-180,180,100)
#Package as a dataframe
df = pd.DataFrame({'lat1':lat1,'lon1':lon1,'lat2':lat2,'lon2':lon2})
#Add/update a column to the data frame with the distances (in metres)
df['dist'] = Distance(df['lat1'].tolist(),df['lon1'].tolist(),df['lat2'].tolist(),df['lon2'].tolist())
PyProj has some documentation here.
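To apply the same idea to the four columns named in the question, a minimal sketch might look like the following. The sample coordinates and the `distance_m` column name are made up for illustration; only the `Start_*`/`Stop_*` column names come from the question.

```python
import pandas as pd
from pyproj import Geod

# Distances are measured on the WGS84 ellipsoid, as in the answer above
geod = Geod(ellps="WGS84")

# Hypothetical sample rows using the column names from the question
df = pd.DataFrame({
    "Start_latitude": [0.0, 0.0],
    "Start_longitude": [0.0, 0.0],
    "Stop_latitude": [0.0, 1.0],
    "Stop_longitude": [1.0, 0.0],
})

# Geod.inv takes longitudes before latitudes and returns the forward
# azimuth, back azimuth, and distance (in metres) for every pair at once
_, _, dist_m = geod.inv(
    df["Start_longitude"].to_numpy(),
    df["Start_latitude"].to_numpy(),
    df["Stop_longitude"].to_numpy(),
    df["Stop_latitude"].to_numpy(),
)

# "distance_m" is an illustrative column name, not one from the question
df["distance_m"] = dist_m
```

Passing whole columns in one call avoids a Python-level loop over rows, which should matter on large frames.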
From the geopy documentation (https://pypi.python.org/pypi/geopy), you can do it like this:
# geopy 2.0 removed vincenty; geodesic is its replacement
from geopy.distance import geodesic, great_circle
# Define the two points as (latitude, longitude) tuples
start = (start_latitude, start_longitude)
stop = (stop_latitude, stop_longitude)
# Print the geodesic distance
print(geodesic(start, stop).meters)
# Print the great-circle distance
print(great_circle(start, stop).meters)
To use this with pandas, suppose you have a DataFrame df. First, create the function:
def distance_calc(row):
    start = (row['start_latitude'], row['start_longitude'])
    stop = (row['stop_latitude'], row['stop_longitude'])
    return geodesic(start, stop).meters
Then apply it to the DataFrame:
df['distance'] = df.apply(distance_calc, axis=1)
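For intuition about what a great-circle (spherical) distance computes, here is a standard-library sketch of the haversine formula. It is not geopy's implementation; the function name `haversine_m` and the mean Earth radius constant are assumptions chosen for illustration.

```python
import math

# Assumed mean Earth radius in metres (a modelling choice, not taken from geopy)
EARTH_RADIUS_M = 6_371_008.8


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    # Haversine of the central angle between the two points
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1.0 to guard against rounding just above the asin domain
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


# One degree of longitude along the equator on the assumed sphere
one_degree = haversine_m(0.0, 0.0, 0.0, 1.0)
```

Because this treats the Earth as a sphere, its results differ slightly from the ellipsoidal distances discussed above.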
C. F. F. Karney, Algorithms for geodesics, J. Geodesy 87(1), 43-55 (2013), DOI: 10.1007/s00190-012-0578-z; geo-addenda.html.
- Richard