Pandas: converting latitude/longitude to the distance between consecutive rows

24

I have the following in a Pandas DataFrame in Python 2.7:

Ser_Numb        LAT      LONG
       1  74.166061 30.512811
       2  72.249672 33.427724
       3  67.499828 37.937264
       4  84.253715 69.328767
       5  72.104828 33.823462
       6  63.989462 51.918173
       7  80.209112 33.530778
       8  68.954132 35.981256
       9  83.378214 40.619652
      10  68.778571  6.607066

I want to calculate the distance between consecutive rows of the DataFrame. The output should look something like this:

Ser_Numb          LAT        LONG   Distance
       1    74.166061   30.512811          0
       2    72.249672   33.427724          d_between_Ser_Numb2 and Ser_Numb1
       3    67.499828   37.937264          d_between_Ser_Numb3 and Ser_Numb2
       4    84.253715   69.328767          d_between_Ser_Numb4 and Ser_Numb3
       5    72.104828   33.823462          d_between_Ser_Numb5 and Ser_Numb4
       6    63.989462   51.918173          d_between_Ser_Numb6 and Ser_Numb5
       7    80.209112   33.530778   .
       8    68.954132   35.981256   .
       9    83.378214   40.619652   .
       10   68.778571   6.607066    .

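Attempt

This post looks somewhat similar, but it calculates the distance between fixed points. I need the distance between consecutive points.

I tried adapting it as follows: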

df['LAT_rad'], df['LON_rad'] = np.radians(df['LAT']), np.radians(df['LONG'])
df['dLON'] = df['LON_rad'] - np.radians(df['LON_rad'].shift(1))
df['dLAT'] = df['LAT_rad'] - np.radians(df['LAT_rad'].shift(1))
df['distance'] = 6367 * 2 * np.arcsin(np.sqrt(np.sin(df['dLAT']/2)**2 + math.cos(df['LAT_rad'].astype(float).shift(-1)) * np.cos(df['LAT_rad']) * np.sin(df['dLON']/2)**2))

However, I get the following error:

Traceback (most recent call last):
  File "C:\Python27\test.py", line 115, in <module>
    df['distance'] = 6367 * 2 * np.arcsin(np.sqrt(np.sin(df['dLAT']/2)**2 + math.cos(df['LAT_rad'].astype(float).shift(-1)) * np.cos(df['LAT_rad']) * np.sin(df['dLON']/2)**2))
  File "C:\Python27\lib\site-packages\pandas\core\series.py", line 78, in wrapper
    "{0}".format(str(converter)))
TypeError: cannot convert the series to <type 'float'>
[Finished in 2.3s with exit code 1]

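This error has been fixed thanks to MaxU's comment. After the fix, the output of this calculation makes no sense: the distances are close to 8000 km: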

   Ser_Numb        LAT       LONG   LAT_rad   LON_rad      dLON      dLAT     distance
0         1  74.166061  30.512811  1.294442  0.532549       NaN       NaN          NaN
1         2  72.249672  33.427724  1.260995  0.583424  0.574129  1.238402  8010.487211
2         3  67.499828  37.937264  1.178094  0.662130  0.651947  1.156086  7415.364469
3         4  84.253715  69.328767  1.470505  1.210015  1.198459  1.449943  9357.184623
4         5  72.104828  33.823462  1.258467  0.590331  0.569212  1.232802  7992.087820
5         6  63.989462  51.918173  1.116827  0.906143  0.895840  1.094862  7169.812123
6         7  80.209112  33.530778  1.399913  0.585222  0.569407  1.380421  8851.558260
7         8  68.954132  35.981256  1.203477  0.627991  0.617777  1.179044  7559.609520
8         9  83.378214  40.619652  1.455224  0.708947  0.697986  1.434220  9194.371978
9        10  68.778571   6.607066  1.200413  0.115315  0.102942  1.175014          NaN

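For comparison:

  • Using this online calculator: with Latitude1 = 74.166061, Longitude1 = 30.512811, Latitude2 = 72.249672, Longitude2 = 33.427724, I get a distance of 233 km
  • Using the haversine function found here: print haversine(30.512811, 74.166061, 33.427724, 72.249672) gives me 232.55 km

The answer should be 233 km, but my approach gives about 8000 km. I think something is wrong with how I am trying to iterate over consecutive rows.

Question: Is there a way to do this in Pandas? Or do I need to loop through the DataFrame one row at a time?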

Additional info:

To create the DataFrame above, select it and copy it to the clipboard. Then:

import pandas as pd
df = pd.read_clipboard()
print df

2
Try replacing math.cos with np.cos - MaxU - stand with Ukraine
1 Answer

77

You can use this excellent solution (c) @derricw (don't forget to upvote it ;-)):

import numpy as np

# vectorized haversine function
def haversine(lat1, lon1, lat2, lon2, to_radians=True, earth_radius=6371):
    """
    slightly modified version of https://dev59.com/1l0b5IYBdhLWcg3wJ-Xb#29546836

    Calculate the great circle distance between two points
    on the earth (specified in decimal degrees or in radians)

    All (lat, lon) coordinates must have numeric dtypes and be of equal length.

    """
    if to_radians:
        lat1, lon1, lat2, lon2 = np.radians([lat1, lon1, lat2, lon2])

    a = np.sin((lat2-lat1)/2.0)**2 + \
        np.cos(lat1) * np.cos(lat2) * np.sin((lon2-lon1)/2.0)**2

    return earth_radius * 2 * np.arcsin(np.sqrt(a))


df['dist'] = \
    haversine(df.LAT.shift(), df.LONG.shift(),
                 df.loc[1:, 'LAT'], df.loc[1:, 'LONG'])

Result:

In [566]: df
Out[566]:
   Ser_Numb        LAT       LONG         dist
0         1  74.166061  30.512811          NaN
1         2  72.249672  33.427724   232.549785
2         3  67.499828  37.937264   554.905446
3         4  84.253715  69.328767  1981.896491
4         5  72.104828  33.823462  1513.397997
5         6  63.989462  51.918173  1164.481327
6         7  80.209112  33.530778  1887.256899
7         8  68.954132  35.981256  1252.531365
8         9  83.378214  40.619652  1606.340727
9        10  68.778571   6.607066  1793.921854

UPDATE: this will help to understand the logic:

In [573]: pd.concat([df['LAT'].shift(), df.loc[1:, 'LAT']], axis=1, ignore_index=True)
Out[573]:
           0          1
0        NaN        NaN
1  74.166061  72.249672
2  72.249672  67.499828
3  67.499828  84.253715
4  84.253715  72.104828
5  72.104828  63.989462
6  63.989462  80.209112
7  80.209112  68.954132
8  68.954132  83.378214
9  83.378214  68.778571
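The pairing shown above relies on two pandas behaviours: shift() with no argument moves the values down by one row, and combining two Series (as pd.concat with axis=1 does here, or as arithmetic does) aligns them on their index labels, so the shorter df.loc[1:, 'LAT'] simply leaves row 0 as NaN. A small sketch with made-up values (the Series `s` is hypothetical and not taken from the question):

```python
import numpy as np
import pandas as pd

s = pd.Series([10.0, 20.0, 40.0])

prev = s.shift()     # previous-row values; label 0 becomes NaN
curr = s.loc[1:]     # only labels 1 and 2

# Arithmetic aligns on index labels, not on position.
step = curr - prev

# shift() and shift(1) give the same result.
same_as_default = s.shift().equals(s.shift(1))
```

Here step holds NaN for label 0 and the row-to-row differences for labels 1 and 2, which is the same pairing the haversine call receives.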

1
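I am still getting the error "TypeError: cannot convert the series to <type 'float'> [Finished in 2.3s with exit code 1]". Also, I seem to be having trouble with the logic. (A) Why did you use .shift() without any arguments, and (B) is there a reason for using df.ix[1:,'LONG'] to start from the second row? Why not use df.ix[:,'LONG'] and try to correct for it with shift(#)? - edesz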
1
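@WR, shift() == shift(1) (1 is the default value). Please check the UPDATE - it shows the pairs of arguments that will be passed to the function... - MaxU - stand with Ukraine
Thanks. OK, the code runs now and I no longer get that TypeError. Also, thank you for the UPDATE in your answer, it was helpful. My problem was that I did not know how to combine the shifted values with the original ones. Thanks for the explanation. - edesz
@WR, sure, glad I could help. - MaxU - stand with Ukraine
@MaxU Thank you for the solution! Just one question: when df has only 2 rows, why does the line numpy.radians([lat1,lon1,lat2,lon2]) raise an error? And how is it that the alternative map(numpy.radians,[lat1,lon1,lat2,lon2]) works, and is faster? - Leo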
