Python's multiprocessing.Pool.imap
is very convenient for processing large files line by line:
import multiprocessing

def process(line):
    processor = Processor('some-big.model')  # this takes time to load...
    return processor.process(line)

if __name__ == '__main__':
    pool = multiprocessing.Pool(4)
    with open('lines.txt') as infile, open('processed-lines.txt', 'w') as outfile:
        for processed_line in pool.imap(process, infile):
            outfile.write(processed_line)
How can I make sure that a helper like the Processor
in the example above is loaded only once? Is this possible without resorting to a more complicated/verbose construction involving queues?
(Examples of) what the Pool
parameter initializer is for. - Darkonaut