Python: calling a process pool without blocking the event loop

3

If I run the following code:

import asyncio
import time
import concurrent.futures

def cpu_bound(mul):
    for i in range(mul*10**8):
        i+=1
    print('result = ', i)
    return i

async def say_after(delay, what):
    print('sleeping async...')
    await asyncio.sleep(delay)
    print(what)

# The run_in_pool function must not block the event loop
async def run_in_pool():
    with concurrent.futures.ProcessPoolExecutor() as executor:
        result = executor.map(cpu_bound, [1, 1, 1])

async def main():
    task1 = asyncio.create_task(say_after(0.1, 'hello'))
    task2 = asyncio.create_task(run_in_pool())
    task3 = asyncio.create_task(say_after(0.1, 'world'))

    print(f"started at {time.strftime('%X')}")
    await task1
    await task2
    await task3
    print(f"finished at {time.strftime('%X')}")

if __name__ == '__main__':
    asyncio.run(main())

The output is:
started at 18:19:28
sleeping async...
result =  100000000
result =  100000000
result =  100000000
sleeping async...
hello
world
finished at 18:19:34

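This shows that the event loop blocks until the CPU-bound jobs (task2) finish, and only then continues with task3. If I run just one CPU-bound job (with run_in_pool defined as follows):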
async def run_in_pool():
    loop = asyncio.get_running_loop()
    with concurrent.futures.ProcessPoolExecutor() as executor:
        result = await loop.run_in_executor(executor, cpu_bound, 1)

then the event loop does not appear to be blocked, because the output is:

started at 18:16:23
sleeping async...
sleeping async...
hello
world
result =  100000000
finished at 18:16:28

How can I run multiple CPU-bound jobs in a process pool (inside task2) without blocking the event loop?


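Actually, the right question for this topic would be: how to emulate the executor.map() method so that it can be awaited and therefore does not block the event loop. - dimyG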
1 Answer

12

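As you discovered, you need to use asyncio's own run_in_executor to wait for the submitted jobs to finish without blocking the event loop. Asyncio does not provide an equivalent of map, but emulating it is not hard: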

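async def run_in_pool():
    # run_in_executor is a method of the event loop, so fetch the running loop first
    loop = asyncio.get_running_loop()
    with concurrent.futures.ProcessPoolExecutor() as executor:
        # Submit one job per argument; each call returns an awaitable future
        futures = [loop.run_in_executor(executor, cpu_bound, i)
                   for i in (1, 1, 1)]
        # Wait for all jobs while the event loop keeps running other tasks
        result = await asyncio.gather(*futures)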

1
Thanks, this is exactly what I was looking for. Indeed, inside the run_in_pool coroutine you need an await statement to hand control back to the event loop. - dimyG
