How do I use aiohttp with the Python library Beautiful Soup?

Question

11

Does anyone know how to do this:

import html5lib
import urllib.request  # a bare "import urllib" does not reliably expose urllib.request
from bs4 import BeautifulSoup

soup = BeautifulSoup(urllib.request.urlopen('http://someWebSite.com').read().decode('utf-8'), 'html5lib')

using aiohttp instead of urllib?

Thanks ^^

- APianist

I'm curious why you want to do this. - Bill Bell

2

Because urllib is blocking, and I need a non-blocking library. - APianist

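I can't answer your question directly. However, I have learned that blocking can also arise because of timeouts. You might be interested in this page: http://docs.python-requests.org/en/latest/user/advanced/#asynchronous-requests (the "Blocking Or Non-Blocking?" and "Timeouts" sections). - Bill Bell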

2 Answers

9

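For anyone looking for further answers:

There is also a way to run synchronous code from within the event loop: loop.run_in_executor. It runs a blocking function in an executor (a thread pool by default) and returns an awaitable.

See the documentation: https://docs.python.org/3/library/asyncio-eventloop.html#asyncio.loop.run_in_executor

Example code: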

import asyncio
import time

def blocking_func():
    time.sleep(5)
    return 42

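async def main():
    # Run the blocking function in the default thread pool so the event loop stays free
    loop = asyncio.get_running_loop()
    result = await loop.run_in_executor(None, blocking_func)
    return result

# asyncio.run creates the event loop, runs main() to completion, and closes the loop
loop_result = asyncio.run(main())
print(loop_result) # => 42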

This way, you can await the blocking task just as you would a coroutine.
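Applied to the question, one possible sketch is to keep the blocking urllib call and push it into the executor, so that several pages can be fetched at the same time. The names `fetch_blocking` and `fetch_all`, and the `fetch` parameter, are made up for this illustration and are not part of any library.

```python
import asyncio
import urllib.request


def fetch_blocking(url):
    # Ordinary blocking download, equivalent to the urllib call in the question
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode('utf-8')


async def fetch_all(urls, fetch=fetch_blocking):
    # `fetch` is a hypothetical hook: any blocking function that maps a URL to HTML text
    loop = asyncio.get_running_loop()
    # Each call runs in the default thread pool; the event loop is not blocked meanwhile
    futures = [loop.run_in_executor(None, fetch, url) for url in urls]
    # gather returns the results in the same order as `urls`
    return await asyncio.gather(*futures)
```

Calling `asyncio.run(fetch_all(['http://someWebSite.com']))` should return a list of HTML strings, and each one can then be passed to `BeautifulSoup(page, 'html5lib')` as in the question. The downloads still occupy threads, so this is a workaround and not true non-blocking I/O.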

- Deathik


- Yuval Pruss · Accepted Answer

You can do it like this:

import asyncio
import aiohttp
import html5lib
from bs4 import BeautifulSoup

SELECTED_URL = 'http://someWebSite.com'

async def get_site_content():
    async with aiohttp.ClientSession() as session:
        async with session.get(SELECTED_URL) as resp:
            text = await resp.read()

    return BeautifulSoup(text.decode('utf-8'), 'html5lib')

# asyncio.run creates the event loop, runs the coroutine, and closes the loop afterwards
sites_soup = asyncio.run(get_site_content())
print(sites_soup)
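The benefit of a non-blocking client shows up mostly when several pages are fetched at once. A minimal sketch of that, assuming aiohttp and bs4 are installed: `fetch_soup` and `fetch_soups` are hypothetical helper names, and the `parser` argument is simply passed through to BeautifulSoup.

```python
import asyncio

import aiohttp
from bs4 import BeautifulSoup


async def fetch_soup(session, url, parser='html5lib'):
    # Download one page over the shared session and parse it
    async with session.get(url) as resp:
        resp.raise_for_status()   # raise on 4xx/5xx instead of parsing an error page
        html = await resp.text()  # decodes the body using the response's charset
    return BeautifulSoup(html, parser)


async def fetch_soups(urls, parser='html5lib'):
    # One ClientSession is reused for all requests; gather runs them concurrently
    async with aiohttp.ClientSession() as session:
        tasks = [fetch_soup(session, url, parser) for url in urls]
        # Results come back in the same order as `urls`
        return await asyncio.gather(*tasks)
```

Something like `soups = asyncio.run(fetch_soups(['http://someWebSite.com']))` would then give one soup per URL. Note that the parsing step itself is still synchronous, so very large documents will hold up the event loop while they are parsed.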