How to extract CSS properties from an inline style using BeautifulSoup

Question

I have something like this:

<img style="background:url(/theRealImage.jpg) no-repeat 0 0; height:90px; width:92px;" src="notTheRealImage.jpg"/>

I'm using BeautifulSoup to parse the HTML. Is there a way to pull the "url" out of the "background" CSS property?

- thegreyspot

1 Answer

- Matt Luongo · Accepted Answer

You have two options: a quick and dirty way, and the Right Way. The quick and dirty way (which will break easily if the markup changes) looks like this:

>>> from bs4 import BeautifulSoup  # BeautifulSoup 4; the old `BeautifulSoup` module is Python 2 only
>>> import re
>>> soup = BeautifulSoup('<html><body><img style="background:url(/theRealImage.jpg) no-repeat 0 0; height:90px; width:92px;" src="notTheRealImage.jpg"/></body></html>', 'html.parser')
>>> style = soup.find('img')['style']
>>> urls = re.findall(r'url\((.*?)\)', style)  # raw string so the backslashes reach the regex engine
>>> urls
['/theRealImage.jpg']

Obviously, you'll need to play with this to get it working for multiple img tags.
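
One way that tinkering might go is sketched below. It assumes the bs4 package (BeautifulSoup 4) with the built-in html.parser; the helper name background_urls and the sample markup are made up for illustration and are not part of the original answer. The regex is also loosened slightly so it tolerates quoted addresses such as url('/b.jpg').

```python
import re
from bs4 import BeautifulSoup

# Matches url(...) with optional single or double quotes around the address.
URL_RE = re.compile(r"""url\(\s*['"]?(.*?)['"]?\s*\)""")

def background_urls(html):
    """Collect every url(...) target found in the inline styles of <img> tags.

    Hypothetical helper: still regex-based, so it shares the fragility
    of the quick and dirty approach.
    """
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    # style=True keeps only the tags that actually carry a style attribute
    for img in soup.find_all("img", style=True):
        urls.extend(URL_RE.findall(img["style"]))
    return urls

# Made-up sample: two styled images and one without a style attribute
sample = (
    '<img style="background:url(/a.jpg) no-repeat 0 0;" src="x.jpg"/>'
    "<img style=\"background:url('/b.jpg')\" src=\"y.jpg\"/>"
    '<img src="z.jpg"/>'
)
print(background_urls(sample))
```

Filtering with style=True avoids a KeyError on images that have no inline style at all.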

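The Right Way, since I'd feel awful suggesting someone use a regex on a CSS string :), is to use a CSS parser. cssutils, a library I just found on Google and available on PyPI, looks like it might do the job.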