Beautiful Soup: using class "contains" or a regular expression?

Question

What if my class names vary a lot, for example:

listing-col-line-3-11 dpt 41
listing-col-block-1-22 dpt 41
listing-col-line-4-13 CWK 12

Normally I would do:

for EachPart in soup.find_all("div", {"class" : "ClassNamesHere"}):
    print(EachPart.get_text())

There are too many class names to list one by one this way, so many of the elements get missed.

I know Python doesn't have the ".contains" I'd normally use, but it does have "in". I just haven't found a way to combine it with this.

I was hoping to do this with a regular expression. My Python syntax is really letting me down, though; I've been trying variations of:

import re
regex = re.compile('.*listing-col-.*')
for EachPart in soup.find_all(regex):
    print(EachPart.get_text())

But that doesn't seem to work.

- PoweredByCoffee

3 Answers

You could try the following for loop:

import re

regex = re.compile('.*listing-col-.*')
for EachPart in soup.find_all("div", {"class" : regex}):
    print(EachPart.get_text())
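The difference from the attempt in the question is where the regex goes. Passed as the first positional argument, find_all() tests the pattern against tag names (div, p, ...), so `.*listing-col-.*` finds nothing. Placed in the attribute dictionary under "class", it is tested against each of the element's CSS classes. Beautiful Soup applies a regex filter with its search() method, so the leading and trailing `.*` are not needed. A minimal sketch, assuming bs4 is installed; the markup is made up from the class names in the question, with an extra non-matching div added for contrast:

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup built from the class names in the question
html = """
<div class="listing-col-line-3-11 dpt 41">A</div>
<div class="listing-col-block-1-22 dpt 41">B</div>
<div class="listing-col-line-4-13 CWK 12">C</div>
<div class="sidebar">D</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Positional regex (as in the question): matched against tag names like "div",
# so nothing is found
by_tag_name = soup.find_all(re.compile(".*listing-col-.*"))

# Regex as the class filter: search() is run against each CSS class,
# so a plain "listing-col-" pattern is enough
by_class = soup.find_all("div", {"class": re.compile("listing-col-")})

print(len(by_tag_name))                      # 0
print([div.get_text() for div in by_class])  # ['A', 'B', 'C']
```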

- Walid Saad

You can avoid regular expressions entirely by using partial matching with gazpacho...

html = """\
<div class="listing-col-line-3-11 dpt 41">A</div>
<div class="listing-col-block-1-22 dpt 41">B</div>
<div class="listing-col-line-4-13 CWK 12">C</div>
"""

Partial-match code:

from gazpacho import Soup

soup = Soup(html)
divs = soup.find("div", {"class": "listing-col-"}, partial=True)
[div.text for div in divs]

Output:

['A', 'B', 'C']
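Beautiful Soup itself can do a similar substring test without a regex or an extra library, which is also where the `in` operator from the question fits: the class_ filter accepts a function, which Beautiful Soup calls with the individual CSS classes of each tag and may call with None for tags that have no class attribute. A sketch, assuming bs4 is installed; `has_listing_col` is a hypothetical helper name:

```python
from bs4 import BeautifulSoup

html = """\
<div class="listing-col-line-3-11 dpt 41">A</div>
<div class="listing-col-block-1-22 dpt 41">B</div>
<div class="listing-col-line-4-13 CWK 12">C</div>
<div>D</div>
"""
soup = BeautifulSoup(html, "html.parser")


def has_listing_col(css_class):
    # css_class can be None when a tag has no class attribute
    return css_class is not None and "listing-col-" in css_class


texts = [div.get_text() for div in soup.find_all("div", class_=has_listing_col)]
print(texts)  # ['A', 'B', 'C']
```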

- emehex

- mfitzp · Accepted Answer

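BeautifulSoup supports CSS selectors, which let you select elements based on the contents of a particular attribute. This includes the *= selector, which matches an attribute whose value contains a given substring.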

The following returns all div elements whose class attribute contains the text "listing-col-":

for EachPart in soup.select('div[class*="listing-col-"]'):
    print(EachPart.get_text())
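One detail worth knowing: *= is a plain substring test on the whole class attribute string, not on individual class names, so it would also match something like class="old-listing-col-x". If the text should only match at the start of a class name, a common CSS idiom combines ^= (value starts with) and *= with a leading space. A sketch, assuming bs4 4.7 or later (where select() is backed by soupsieve); the markup below is made up to show the difference:

```python
from bs4 import BeautifulSoup

html = """
<div class="listing-col-line-3-11 dpt 41">A</div>
<div class="dpt listing-col-block-1-22">B</div>
<div class="old-listing-col-x">C</div>
"""
soup = BeautifulSoup(html, "html.parser")

# *= : "listing-col-" anywhere in the class attribute string (C matches too)
contains = [d.get_text() for d in soup.select('div[class*="listing-col-"]')]

# ^= or *=" listing-col-" : some class name *starts* with "listing-col-"
starts = [
    d.get_text()
    for d in soup.select('div[class^="listing-col-"], div[class*=" listing-col-"]')
]

print(contains)  # ['A', 'B', 'C']
print(starts)    # ['A', 'B']
```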