Timeout error with the geopy geocoder in Python

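I'm a relatively new Python user and am trying to use functions from the "geopy" module to return the latitude and longitude of cities and countries. I was getting errors caused by misspelled city names, but now I'm getting timeout errors. I read the question Geopy: catch timeout error and adjusted the timeout parameter accordingly. However, the script now runs for varying lengths of time before I get a timeout error. I tried running it on a faster network, and that helped to some extent. The problem is that I need to do this for 100,000 rows of data, and the most I have managed is about 20,000 rows before the program times out. Any help or suggestions for solving this would be greatly appreciated.
import os
import csv
from geopy.geocoders import Nominatim

os.getcwd()  # check current working directory
# Raw string so the backslashes in the Windows path are not treated as escape sequences
os.chdir(r"C:\Users\Philip\Documents\HDSDA1\Project\Global Terrorism Database")

# Read the input CSV and write the completed rows to a new CSV
gtd = open("gtd_original.csv", "r", newline="")
csv_f = csv.reader(gtd)
outf = open("r_ready.csv", "w", newline="")
writer = csv.writer(outf, dialect='excel')
# Create the geocoder once; geopy >= 2.0 requires a user_agent (placeholder value)
geolocator = Nominatim(user_agent="gtd-geocoder")
for row in csv_f:
    if row[13] in ("", "NA") or row[14] in ("", "NA"):
        lookup = row[12] + "," + row[8]  # creates "city,country"
        location = geolocator.geocode(lookup, timeout=None)  # looks up the city/country on maps
        if location is None:  # city not found: fall back to the country alone
            lookup = row[8]
            location = geolocator.geocode(lookup)
        row[13] = location.latitude
        row[14] = location.longitude
    writer.writerow(row)
gtd.close()
outf.close()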
2 Answers


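I hope you are complying with the Nominatim usage policy (http://wiki.openstreetmap.org/wiki/Nominatim_usage_policy). Please wait 1 second between requests and cache the results; there are probably many duplicates.

The waiting part: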

from time import sleep
### your code
row[14] = location.longitude
sleep(1) # after last line in if
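If you would rather not scatter `sleep(1)` calls through the loop, the delay can live in a small wrapper. This is a minimal sketch using only the standard library, not part of geopy: `rate_limited` is a hypothetical helper name, and it works with any function, for example `geolocator.geocode`. Unlike a fixed `sleep(1)`, it only waits for whatever part of the interval has not already elapsed since the previous call. (Recent geopy versions also ship `geopy.extra.rate_limiter.RateLimiter`, which serves the same purpose.)

```python
import time
from functools import wraps


def rate_limited(func, min_delay=1.0):
    """Wrap func so that consecutive calls start at least min_delay seconds apart."""
    last_call = [None]  # time.monotonic() of the previous call; None before the first call

    @wraps(func)
    def wrapper(*args, **kwargs):
        if last_call[0] is not None:
            remaining = min_delay - (time.monotonic() - last_call[0])
            if remaining > 0:
                time.sleep(remaining)  # wait only for the part of the interval not yet elapsed
        last_call[0] = time.monotonic()
        return func(*args, **kwargs)

    return wrapper
```

Usage in the loop would look like `geocode = rate_limited(geolocator.geocode, min_delay=1.0)` once before the loop, then `location = geocode(lookup)` inside it.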

Caching:

coords = {}
key = (row[8], row[12])  # use a tuple as the key: lists are unhashable
if key in coords:
    row[13], row[14] = coords[key]
else:
    ...  # geolocate, then store the result in coords[key]
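The same caching idea can be packaged as a memoizing wrapper so each distinct (country, city) pair is geocoded only once. A minimal sketch: `make_cached_lookup` and `geocode_fn` are hypothetical names; in the question's setting `geocode_fn` would be something like `geolocator.geocode`, and the key mirrors `(row[8], row[12])`.

```python
def make_cached_lookup(geocode_fn):
    """Return a lookup(country, city) function that calls geocode_fn once per distinct pair."""
    cache = {}  # (country, city) -> whatever geocode_fn returned

    def cached_lookup(country, city):
        key = (country, city)  # tuples are hashable, lists are not
        if key not in cache:
            # Same "city,country" query string as the question builds.
            # Failed lookups (None) are cached too, so they are not re-requested.
            cache[key] = geocode_fn(city + "," + country)
        return cache[key]

    return cached_lookup
```

With many repeated cities in the data, every cache hit is one less request and one less second of waiting.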

Update

Performance: 1 request per second --> 3,600 requests per hour --> 36,000 requests per 10 hours
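Applied to the roughly 100,000 rows mentioned in the question, the same arithmetic gives the worst-case run time when no row hits the cache. This back-of-the-envelope check ignores network latency, so the real figure will be somewhat higher; every cache hit saves one second.

```python
rows = 100_000            # row count from the question
requests_per_hour = 3600  # 1 request per second
hours = rows / requests_per_hour
print(f"{hours:.1f} hours")  # about 27.8 hours with no cache hits
```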

import os
import csv
from time import sleep
from geopy.geocoders import Nominatim

os.getcwd()  # check current working directory
os.chdir(r"C:\Users\Philip\Documents\HDSDA1\Project\Global Terrorism Database")

# Read the input CSV and write the completed rows to a new CSV
gtd = open("gtd_original.csv", "r", newline="")
csv_f = csv.reader(gtd)
outf = open("r_ready.csv", "w", newline="")
writer = csv.writer(outf, dialect='excel')
geolocator = Nominatim(user_agent="gtd-geocoder")  # create once; the user_agent value is a placeholder
coords = {}  # cache: (country, city) -> (latitude, longitude)
for row in csv_f:
    if row[13] in ("", "NA") or row[14] in ("", "NA"):
        lookup = row[12] + "," + row[8]  # creates "city,country"
        key = (row[8], row[12])

        if key in coords:  # result already cached, no request needed
            row[13], row[14] = coords[key]
        else:
            location = geolocator.geocode(lookup, timeout=None)  # looks up the city/country on maps
            if location is None:  # city not found: fall back to the country alone
                location = geolocator.geocode(row[8])
            row[13] = location.latitude
            row[14] = location.longitude
            coords[key] = (row[13], row[14])  # cache the new coords
            sleep(1)  # sleep for 1 sec (required by Nominatim usage policy)

    writer.writerow(row)
gtd.close()
outf.close()

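Thanks for the quick answer. Could you explain how to add a delay between requests and give an example of how I could implement it? I'm a bit out of my depth. - pitts_brother
Thanks for the detailed answer and for taking the time to write it. Unfortunately, I still get the timeout error with this code. - pitts_brother
@pitts_brother I guess you already know this, but: import the time module (import time), then call time.sleep(seconds) inside your loop. - Ben Love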

You can catch GeocoderTimedOut. Here is an example function that may help:
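from geopy.geocoders import Nominatim
from geopy.exc import GeocoderTimedOut

# Create the geocoder once (not on every call); geopy >= 2.0 requires a user_agent (placeholder value)
geolocator = Nominatim(user_agent="gtd-geocoder")

def do_geocode(address):
    try:
        return geolocator.geocode(address)
    except GeocoderTimedOut:
        return do_geocode(address)  # timed out: try the same address again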

It simply retries whenever a timeout occurs. Hope this helps.
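The function above retries without any limit, so a service that keeps timing out leads to unbounded recursion and no pause between attempts. A loop with a retry cap and a growing pause avoids both. This is a sketch under stated assumptions: `geocode_with_retry`, `max_attempts`, `base_delay` and `retry_on` are hypothetical names, and `retry_on` defaults to the built-in `TimeoutError` only so the snippet runs without geopy installed.

```python
import time


def geocode_with_retry(geocode_fn, query, max_attempts=5, base_delay=1.0,
                       retry_on=(TimeoutError,)):
    """Call geocode_fn(query), retrying on the given exceptions with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return geocode_fn(query)
        except retry_on:
            if attempt == max_attempts:
                raise  # give up and let the caller see the last timeout
            # Pause 1 s, 2 s, 4 s, ... (with the default base_delay) before the next attempt
            time.sleep(base_delay * 2 ** (attempt - 1))
```

With geopy this would be called as `geocode_with_retry(geolocator.geocode, lookup, retry_on=(GeocoderTimedOut,))`, since `GeocoderTimedOut` is geopy's own exception class rather than the built-in `TimeoutError`.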

