How do I get rid of the BeautifulSoup user warning?

Question

75

After I installed BeautifulSoup, the following warning appears whenever I run my Python code from the command line:

D:\Application\python\lib\site-packages\beautifulsoup4-4.4.1-py3.4.egg\bs4\__init__.py:166:
UserWarning: No parser was explicitly specified, so I'm using the best 
available HTML parser for this system ("html.parser"). This usually isn't a
problem, but if you run this code on another system, or in a different
virtual environment, it may use a different parser and behave differently.

To get rid of this warning, change this:

 BeautifulSoup([your markup])

to this:

 BeautifulSoup([your markup], "html.parser")

I have no idea why this happens or how to fix it.

- jellyfishhuang

16

The message tells you what to do: BeautifulSoup([your markup], "html.parser"). Have you done that and looked at the output? BeautifulSoup is trying to make your life easier, so take its advice. :) - idjaw

6

Change your code from soup = BeautifulSoup(html) to soup = BeautifulSoup(html, "html.parser"). - Remi Guan

4 Answers

23

The documentation recommends that you install and use lxml for speed.

BeautifulSoup(html, "lxml")

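If you are using a version of Python 2 earlier than 2.7.3, or a version of Python 3 earlier than 3.2.2, you must install lxml or html5lib; Python's built-in HTML parser is not very good in those older releases.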

Installing the lxml parser

On Ubuntu (Debian)
```
apt-get install python-lxml 
```
On Fedora (RHEL family)
```
dnf install python-lxml
```
Using pip
```
pip install lxml
```
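
Once a parser is installed, the parser name still has to be passed explicitly. The sketch below is one possible way to choose a name that is actually available on the current system, using only the standard library. The helper name `pick_parser` and the preference order are assumptions made for illustration; they are not part of BeautifulSoup.

```python
from importlib.util import find_spec


def pick_parser(preferred=("lxml", "html5lib")):
    """Return the first installed parser name, else the built-in "html.parser"."""
    for name in preferred:
        # find_spec() returns None when the top-level module is not installed.
        if find_spec(name) is not None:
            return name
    return "html.parser"


parser = pick_parser()
print(parser)
# The result would then be passed explicitly: BeautifulSoup(html, parser)
```

Passing the name explicitly avoids the warning, but the parser chosen this way can still differ from one machine to another, so pinning a single parser in your dependencies is the more predictable option.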

- Gayan Weerakutti

1

Or apt-get install python3-lxml. - Andriy Makukha

13

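I don't think the previous posts answer the question.

True, as everyone says, you can get rid of the warning by specifying the parser.
And as the documentation points out, doing so is a best practice for both performance ¹ and consistency ².

But there are cases where you simply want to silence the warning... hence this post.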

Since BeautifulSoup 4 rev 460, the warning message does not appear in interactive (REPL) mode.
There are more general answers at "How to disable Python warnings?" for controlling Python warnings (TL;DR: PYTHONWARNINGS=ignore or -Wignore).

Suppress the warning explicitly (bs4 ≥ rev 569) by adding this to your code:

import warnings
from bs4 import GuessedAtParserWarning
warnings.filterwarnings('ignore', category=GuessedAtParserWarning)

Cheat by letting bs4 think you provided the parser, i.e.:

bs4.BeautifulSoup(
  your_markup,
  builder=bs4.builder_registry.lookup(*bs4.BeautifulSoup.DEFAULT_BUILDER_FEATURES)
)
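
The explicit filter shown earlier applies to the whole process once it runs. If the warning should be silenced only around a single call, the standard library's `warnings.catch_warnings()` context manager can scope the filter. The sketch below uses a stand-in function and the generic `UserWarning` category so that it runs without bs4 installed; both are assumptions for illustration. With bs4 itself, the category would be `GuessedAtParserWarning` as in the snippet above, and the stand-in would be the real `BeautifulSoup(markup)` call.

```python
import warnings


def noisy_library_call():
    # Stand-in (assumption) for BeautifulSoup(markup) called without a parser.
    warnings.warn("No parser was explicitly specified", UserWarning)
    return "parsed"


def parse_quietly():
    # Filter changes made inside catch_warnings() are undone on exit,
    # so the "ignore" rule applies to this one call only.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=UserWarning)
        return noisy_library_call()


result = parse_quietly()
print(result)
```

Outside `parse_quietly()`, the same warning is reported as usual, which keeps other code from being silenced by accident.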

- bufh

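To quote the second line of PEP 20: "Explicit is better than implicit." Don't hide a warning when the proper fix takes less code. Adding "html5lib" or "html.parser" is trivial in an IDE or code editor, and nearly trivial on the command line. Just fix the problem; don't hide the symptom. - Mike 'Pomax' Kamermans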

2

Thanks! Sometimes a library keeps emitting the warning, and filtering it is the solution. - sor.rge

1

Sometimes you are actually parsing XHTML, which is an XML document after all. Specifying lxml is then correct, but you still get a warning. - Craig Trader

6

To use html5lib as the HTML parser, install it first by running:

pip install html5lib

Then pass html5lib to the BeautifulSoup constructor:

htmlDoc = bs4.BeautifulSoup(req1.text, 'html5lib')
print(htmlDoc)

- Wilson Wu

- Ethan Bierlein · Accepted Answer

The solution to your problem is stated clearly in the warning message. Code like the following does not specify an XML/HTML/etc. parser.

BeautifulSoup( ... )

To fix the warning, you need to specify the parser you want to use, like so:

BeautifulSoup( ..., "html.parser" )

If you like, you can also install a third-party parser.
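
To make the fix concrete, here is a minimal sketch that names the built-in parser explicitly. It assumes the beautifulsoup4 package is installed; the markup string is a made-up sample.

```python
from bs4 import BeautifulSoup

# Made-up sample markup, for illustration only.
html = "<html><head><title>Example</title></head><body><p>Hello</p></body></html>"

# Naming the parser explicitly means bs4 no longer has to guess one,
# so the "No parser was explicitly specified" warning is not emitted.
soup = BeautifulSoup(html, "html.parser")

print(soup.title.string)
```

Swapping "html.parser" for "lxml" or "html5lib" works the same way once the corresponding package is installed.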