Beautiful Soup - how to get an href

Question

I can't seem to extract the href from the following HTML (there is only one <strong>Website:</strong> on the page):

<div id='id_Website'>
<strong>Website:</strong> 
<a href='http://google.com' target='_blank' rel='nofollow'>www.google.com</a>
</div></div><div>

This is what I thought should work:

href = soup.find("strong" ,text=re.compile(r'Website')).next["href"]

- howtodothis

1 Answer

- Mark Longair · Accepted Answer

In that case, .next is a NavigableString containing the whitespace between the <strong> tag and the <a> tag. Also, the text= argument is for matching NavigableStrings, not elements.
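
To make the whitespace point concrete, here is a minimal sketch. It assumes the newer bs4 package (Beautiful Soup 4) on Python 3 rather than the BeautifulSoup 3 import used in this answer, so the attribute names (next_sibling, find_next_sibling) are the bs4 spellings, and the stray closing tags from the question's snippet are trimmed for clarity.

```python
from bs4 import BeautifulSoup, NavigableString

html = """<div id='id_Website'>
<strong>Website:</strong> 
<a href='http://google.com' target='_blank' rel='nofollow'>www.google.com</a>
</div>"""

soup = BeautifulSoup(html, "html.parser")
strong = soup.find("strong")

# The node right after <strong> is the text between the two tags,
# not the <a> element itself.
sibling = strong.next_sibling
is_string = isinstance(sibling, NavigableString)
is_whitespace = sibling.strip() == ""

# Asking for the next sibling *tag* named "a" skips over that text node.
link = strong.find_next_sibling("a")
href = link["href"]
print(repr(sibling), href)
```

Indexing the whitespace string with ["href"] is what fails in the question's one-liner; skipping to the next tag avoids that.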

The following code should do what you want:

import re
from BeautifulSoup import BeautifulSoup

html = '''<div id='id_Website'>
<strong>Website:</strong> 
<a href='http://google.com' target='_blank' rel='nofollow'>www.google.com</a>
</div></div><div>'''

soup = BeautifulSoup(html)

for t in soup.findAll(text=re.compile(r'Website:')):
    # Find the parent of the NavigableString, and see
    # whether that's a <strong>:
    s = t.parent
    if s.name == 'strong':
        print s.nextSibling.nextSibling['href']
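
The code above targets Python 2 and BeautifulSoup 3. For anyone on Python 3, a rough port might look like the sketch below. It assumes bs4 4.4 or later (where the text= argument is also available as string=) and swaps the double nextSibling hop for find_next_sibling("a"), which is my substitution rather than part of the original answer.

```python
import re
from bs4 import BeautifulSoup

html = """<div id='id_Website'>
<strong>Website:</strong> 
<a href='http://google.com' target='_blank' rel='nofollow'>www.google.com</a>
</div></div><div>"""

soup = BeautifulSoup(html, "html.parser")

hrefs = []
for t in soup.find_all(string=re.compile(r"Website:")):
    # t is a NavigableString; check that it sits inside a <strong>.
    s = t.parent
    if s.name == "strong":
        # Skip the whitespace text node and go to the next <a> sibling.
        link = s.find_next_sibling("a")
        if link is not None and link.has_attr("href"):
            hrefs.append(link["href"])

print(hrefs)
```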

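...but that isn't very robust. If the enclosing div has a predictable ID, it would be better to find that div and then find the first <a> element inside it.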
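
A minimal sketch of that ID-based approach, again assuming bs4 on Python 3 and assuming the id_Website ID from the question is stable across pages:

```python
from bs4 import BeautifulSoup

html = """<div id='id_Website'>
<strong>Website:</strong> 
<a href='http://google.com' target='_blank' rel='nofollow'>www.google.com</a>
</div></div><div>"""

soup = BeautifulSoup(html, "html.parser")

# Locate the container by its ID, then take the first <a> that has an href.
container = soup.find("div", id="id_Website")
link = container.find("a", href=True) if container is not None else None
href = link["href"] if link is not None else None
print(href)
```

This does not depend on the label text or on how much whitespace sits between the tags, so it should survive small markup changes better than sibling hopping.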