I want to use multiprocessing to geocode a large number of addresses. Here is my code:
import multiprocessing
import geocoder

addresses = ['New York City, NY', 'Austin, TX', 'Los Angeles, CA', 'Boston, MA']  # and on and on

def geocode_worker(address):
    return geocoder.arcgis(address)

def main_process():
    pool = multiprocessing.Pool(processes=multiprocessing.cpu_count())
    return pool.map(geocode_worker, addresses)

if __name__ == '__main__':
    main_process()
But it gives me this error:
Traceback (most recent call last):
  File "/opt/anaconda3/lib/python3.7/threading.py", line 926, in _bootstrap_inner
    self.run()
  File "/opt/anaconda3/lib/python3.7/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/anaconda3/lib/python3.7/multiprocessing/pool.py", line 470, in _handle_results
    task = get()
  File "/opt/anaconda3/lib/python3.7/multiprocessing/connection.py", line 251, in recv
    return _ForkingPickler.loads(buf.getbuffer())
  File "/opt/anaconda3/lib/python3.7/site-packages/geocoder/base.py", line 599, in __getattr__
    if not self.ok:
  File "/opt/anaconda3/lib/python3.7/site-packages/geocoder/base.py", line 536, in ok
    return len(self) > 0
  File "/opt/anaconda3/lib/python3.7/site-packages/geocoder/base.py", line 422, in __len__
    return len(self._list)
The last three entries of the traceback repeat over and over, and then the final line is:
RecursionError: maximum recursion depth exceeded while calling a Python object
Can someone help me figure out why?
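The problem happens when multiprocessing tries to unpickle the result returned by geocoder.arcgis in the main process. There is a bug in geocoder that causes infinite recursion. - dano
The object returned by geocoder.arcgis does not survive pickling. You can reproduce it by running pickle.loads(pickle.dumps(<object returned by geocoder.arcgis()>)). The simplest workaround is to extract the data you need from the returned object, if possible, and return only that from the worker. - dano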
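
To make the comments above concrete, here is a minimal sketch that needs neither geocoder nor a network connection. LazyResult is a hypothetical stand-in that only copies the pattern visible in the traceback (__getattr__ calls ok, ok calls __len__, __len__ reads self._list). It is not geocoder's actual class, and to_plain and survives_round_trip are illustrative helpers. The sketch assumes the failure comes from pickle creating the instance without running __init__ and then looking up __setstate__ on it, which is consistent with the traceback but has not been checked against geocoder's source.

```python
import pickle


class LazyResult:
    """Hypothetical stand-in for the pattern in the traceback (not geocoder's class)."""

    def __init__(self, items):
        self._list = list(items)

    def __len__(self):
        return len(self._list)

    @property
    def ok(self):
        return len(self) > 0

    def __getattr__(self, name):
        # Runs only when normal attribute lookup fails. While unpickling, the
        # instance exists before _list has been restored, so
        # self.ok -> len(self) -> self._list ends up back here and recurses.
        if not self.ok:
            return None
        raise AttributeError(name)


def survives_round_trip(obj):
    """Return True if obj can be pickled and unpickled without a RecursionError."""
    try:
        pickle.loads(pickle.dumps(obj))
        return True
    except RecursionError:
        return False


def to_plain(result):
    """Copy the needed data into a plain dict, as a worker would before returning."""
    return {"ok": result.ok, "items": list(result._list)}


result = LazyResult([(40.71, -74.01)])
print("result object survives:", survives_round_trip(result))
print("plain dict survives:", survives_round_trip(to_plain(result)))
```

Applied to the code in the question, the same idea would mean having geocode_worker return plain values taken from the result (for example its latlng or json attribute, assuming those hold what you need) rather than the result object itself, so that only built-in types cross the process boundary.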