Getting all content inside a tag with BeautifulSoup

Question

I am trying to get everything inside the article tag of a page such as http://magazine.magix.com/de/5-tipps-fuer-die-fotobearbeitung/.

However, I run into a problem when printing soup.article: the output only goes as far as "...Foto auf verschiedene Art und Weise und für verschiedene Zwecke bearbeiten. "

Full code:

from bs4 import BeautifulSoup
import requests

# Fetch the page; the parser name belongs to BeautifulSoup, not to requests.get
request_page = requests.get('http://magazine.magix.com/de/5-tipps-fuer-die-fotobearbeitung/')
source = request_page.text
soup = BeautifulSoup(source, "html.parser")
print(soup.article.text)

How can I get everything?

- eLudium

1 Answer

- Arount · Accepted Answer

OK, I finally found it. Welcome to the amazing world of crawling.

Inside the <article> tag there are some </br> tags. The author surely meant <br/>.

Either way, they break the flow of the HTML, so BS has a hard time parsing it.
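To check whether a page contains this kind of markup before blaming the parser, one option is to scan the raw source for closing tags on elements that cannot have one. The following is only a minimal sketch using the standard library's html.parser; the class name StrayVoidCloseFinder, the VOID_TAGS set, and the sample HTML are illustrative assumptions, not part of the original answer.

```python
from html.parser import HTMLParser

# Assumption: a small, non-exhaustive set of void elements worth checking.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class StrayVoidCloseFinder(HTMLParser):
    """Hypothetical helper: records closing tags such as </br> that should not exist."""

    def __init__(self):
        super().__init__()
        self.stray = []

    def handle_startendtag(self, tag, attrs):
        # A valid self-closing tag like <br/> is not a stray closing tag.
        # The default implementation would call handle_endtag, so override it.
        pass

    def handle_endtag(self, tag):
        # Only a literal closing tag such as </br> reaches this point.
        if tag in VOID_TAGS:
            self.stray.append(tag)


def find_stray_void_closes(source):
    finder = StrayVoidCloseFinder()
    finder.feed(source)
    finder.close()
    return finder.stray


sample = "<article><p>one</br>two<br/>three<br>four</p></article>"
print(find_stray_void_closes(sample))
```

If the returned list is not empty, the source probably needs cleaning before it is handed to BeautifulSoup.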

Here is how I solved it:

from bs4 import BeautifulSoup
import requests

request_page = requests.get('http://magazine.magix.com/de/5-tipps-fuer-die-fotobearbeitung/')
source = request_page.text
# Turn the invalid closing tag </br> into a valid self-closing <br/> before parsing
source = source.replace('</br>', '<br/>')
soup = BeautifulSoup(source, "html.parser")
print(soup.article)

I replaced </br> with <br/>...
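The plain string replacement only catches the exact spelling </br>. If a page might also contain variants such as </BR> or </br >, a regular expression can cover them. This is a sketch under that assumption; the function name fix_stray_br is hypothetical, and whether such variants occur on the page in question is not confirmed.

```python
import re

# Matches </br>, </BR>, </br > and similar spellings of the invalid closing tag.
STRAY_BR = re.compile(r"</\s*br\s*>", re.IGNORECASE)


def fix_stray_br(source):
    """Replace every invalid closing </br> with a valid self-closing <br/>."""
    return STRAY_BR.sub("<br/>", source)


print(fix_stray_br("one</br>two</BR >three<br/>four"))
```

The cleaned string can then be passed to BeautifulSoup(source, "html.parser") exactly as in the code above.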

This is a really good lesson in crawling: you will run into a lot of this kind of thing, count on it :)