Python - Web scraping with BeautifulSoup and Urllib


I'm trying to read a website, but unfortunately something is going wrong.

import bs4 as bs
import urllib.request

sauce = urllib.request.urlopen('https://csgoempire.com/withdraw').read()
soup = bs.BeautifulSoup(sauce,'lxml')

print(soup.find_all('p'))

Error:

Traceback (most recent call last):
  File "F:/Informatika/Python3X/GamblinSitesBot/GamblingSitesBot.py", line 4, in <module>
    sauce = urllib.request.urlopen('https://csgoempire.com/').read()
  File "c:\users\edgaras\appdata\local\programs\python\python36\Lib\urllib\request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "c:\users\edgaras\appdata\local\programs\python\python36\Lib\urllib\request.py", line 532, in open
    response = meth(req, response)
  File "c:\users\edgaras\appdata\local\programs\python\python36\Lib\urllib\request.py", line 642, in http_response
    'http', request, response, code, msg, hdrs)
  File "c:\users\edgaras\appdata\local\programs\python\python36\Lib\urllib\request.py", line 570, in error
    return self._call_chain(*args)
  File "c:\users\edgaras\appdata\local\programs\python\python36\Lib\urllib\request.py", line 504, in _call_chain
    result = func(*args)
  File "c:\users\edgaras\appdata\local\programs\python\python36\Lib\urllib\request.py", line 650, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden

Process finished with exit code 1

Also, this same code works fine with other websites, such as google.com.
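For context, urllib sends a default `Python-urllib/x.y` User-Agent, which some servers reject outright with 403 while still serving browsers; passing a browser-like header via `urllib.request.Request` often changes the response (a sketch, not guaranteed for this particular site; the header value below is just an arbitrary example):

```python
import urllib.request

# Some servers answer 403 to urllib's default "Python-urllib/x.y" User-Agent.
# Supplying a browser-like value (an arbitrary example) often avoids the block.
req = urllib.request.Request(
    'https://csgoempire.com/withdraw',
    headers={'User-Agent': 'Mozilla/5.0'},
)
# sauce = urllib.request.urlopen(req).read()  # actual network call, left commented here
```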

I don't think this is the full stack trace? If so, please provide the complete error message. - DeepSpace
It looks like that URL requires authentication, so it throws a 403 error. - sytech
Are you behind a proxy? - Hari
Possible duplicate of HTTP error 403 in Python 3 Web Scraping. - Abhishek Keshri
1 Answer


You can use the requests library to do the same thing. This works well:

import bs4 as bs
import requests

sauce = requests.get('https://csgoempire.com/withdraw')
soup = bs.BeautifulSoup(sauce.content,'html.parser')
print(soup.find_all('p'))
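As a side note, `find_all('p')` returns a list of Tag objects; calling `.get_text()` on each extracts just the visible text. A minimal offline sketch with a hard-coded snippet standing in for the fetched page (no network needed):

```python
import bs4 as bs

# Hard-coded HTML standing in for the downloaded page content
html = "<html><body><p>first</p><p>second</p></body></html>"
soup = bs.BeautifulSoup(html, 'html.parser')

# Each element of find_all() is a Tag; get_text() strips the markup
paragraphs = [p.get_text() for p in soup.find_all('p')]
print(paragraphs)  # ['first', 'second']
```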
