Python: Pyppeteer with asyncio

Question

I'm running some tests and I wonder whether the script below actually runs asynchronously.

# python test.py
It took 1.3439464569091797 seconds.

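At about 1.34 seconds per site, fetching the 31 websites one after another would take roughly 31 x 1.34 = 41.54 seconds. If the script were asynchronous, shouldn't the whole run in theory take only about as long as the slowest request?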

# python test.py
It took 28.129364728927612 seconds.

Maybe opening the browser is not asynchronous here. Should I use an executor?

# cat test.py 
import asyncio
import time

from pyppeteer import launch
from urllib.parse import urlparse

WEBSITE_LIST = [
    'http://envato.com',
    'http://amazon.co.uk',
    'http://amazon.com',
    'http://facebook.com',
    'http://google.com',
    'http://google.fr',
    'http://google.es',
    'http://google.co.uk',
    'http://internet.org',
    'http://gmail.com',
    'http://stackoverflow.com',
    'http://github.com',
    'http://heroku.com',
    'http://djangoproject.com',
    'http://rubyonrails.org',
    'http://basecamp.com',
    'http://trello.com',
    'http://yiiframework.com',
    'http://shopify.com',
    'http://airbnb.com',
    'http://instagram.com',
    'http://snapchat.com',
    'http://youtube.com',
    'http://baidu.com',
    'http://yahoo.com',
    'http://live.com',
    'http://linkedin.com',
    'http://yandex.ru',
    'http://netflix.com',
    'http://wordpress.com',
    'http://bing.com',
]

start = time.time()

async def fetch(url):
    browser = await launch(headless=True, args=['--no-sandbox'])
    page = await browser.newPage()
    await page.goto(url, {'waitUntil': 'load'})
    await page.screenshot({'path': f'img/{urlparse(url).netloc}.png'})
    await browser.close()

async def run():
    tasks = []

    for url in WEBSITE_LIST:
        task = asyncio.ensure_future(fetch(url))
        tasks.append(task)

    responses = await asyncio.gather(*tasks)
    #print(responses)

#asyncio.get_event_loop().run_until_complete(fetch('http://yahoo.com'))
loop = asyncio.get_event_loop()
future = asyncio.ensure_future(run())
loop.run_until_complete(future)

print(f'It took {time.time()-start} seconds.')

- HTF

1 Answer

- Fantix King · Accepted Answer

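According to the pyppeteer source code, it uses subprocess without pipes to manage the Chromium processes and websockets to communicate with them, so it is asynchronous.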

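With 31 sites you end up with 31 + 1 processes. Unless you have a CPU with 32 cores, they will not all execute fully in parallel. (This is only a rough illustration: threads, system processes, locks, hyper-threading and other factors also affect the result.) So I think the bottleneck is the CPU: opening the browsers, rendering the web pages and dumping them to images. Using an executor won't help.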
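If the CPU really is the limit, one option is to cap how many browsers run at once instead of starting all 31 together. The following is only a minimal sketch and not part of the original answer: `gather_limited` and `fake_fetch` are hypothetical names, `fake_fetch` just sleeps where the question's `fetch(url)` would drive Chromium, and the cap of 4 is an arbitrary assumption to tune against your core count.

```python
import asyncio
import os


async def gather_limited(job_factories, limit):
    """Run the coroutines produced by job_factories, at most `limit` at a time."""
    semaphore = asyncio.Semaphore(limit)

    async def run_one(factory):
        # Each job waits here until one of the `limit` slots is free.
        async with semaphore:
            return await factory()

    # gather() returns results in the same order as the input factories.
    return await asyncio.gather(*(run_one(f) for f in job_factories))


async def demo():
    running = 0
    peak = 0

    async def fake_fetch(url):
        # Hypothetical stand-in for the question's fetch(url):
        # it only sleeps and records how many jobs overlap.
        nonlocal running, peak
        running += 1
        peak = max(peak, running)
        await asyncio.sleep(0.05)
        running -= 1
        return url

    urls = [f'http://example.com/{i}' for i in range(10)]
    limit = min(4, os.cpu_count() or 1)  # assumed cap, not a recommendation
    factories = [lambda u=u: fake_fetch(u) for u in urls]
    results = await gather_limited(factories, limit)
    return results, peak, limit


if __name__ == '__main__':
    results, peak, limit = asyncio.run(demo())
    print(f'{len(results)} jobs done, peak concurrency {peak} (limit {limit})')
```

In the question's script the factories would be something like `lambda u=u: fetch(u)` for each entry in `WEBSITE_LIST`. Whether this shortens the total run depends on the machine. It mainly keeps the number of simultaneous Chromium processes close to what the CPU can serve.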

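It is still asynchronous, though. Your Python process is not blocked, so it can run other code or wait for network results in the meantime. It is just that when the CPU is fully loaded by other processes, it becomes harder for the Python process to "steal" CPU time.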
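A small way to see that the event loop stays free while a coroutine is awaiting something slow is to run a heartbeat task next to it. This is a sketch under assumptions, not the answerer's code: `slow_job` and `heartbeat` are hypothetical names, and `asyncio.sleep` stands in for waiting on Chromium.

```python
import asyncio


async def slow_job(seconds):
    # Stand-in for awaiting a browser: the coroutine is suspended here,
    # so the event loop is free to run other tasks.
    await asyncio.sleep(seconds)
    return 'done'


async def heartbeat(interval, ticks):
    # Records a timestamp every `interval` seconds until cancelled.
    loop = asyncio.get_running_loop()
    while True:
        await asyncio.sleep(interval)
        ticks.append(loop.time())


async def main():
    ticks = []
    beat = asyncio.ensure_future(heartbeat(0.02, ticks))
    result = await slow_job(0.3)
    beat.cancel()
    try:
        await beat
    except asyncio.CancelledError:
        pass  # expected: the heartbeat was cancelled on purpose
    return result, len(ticks)


if __name__ == '__main__':
    result, tick_count = asyncio.run(main())
    print(f'slow job: {result}, heartbeat ticks meanwhile: {tick_count}')
```

If the heartbeat keeps ticking while the slow job is pending, the loop is not blocked. With real Chromium processes saturating the CPU, the ticks would likely arrive less regularly, which matches the point about Python finding it harder to get CPU time.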