What is the Python equivalent of PHP's preg_match function?

Question

16

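I'm planning to move one of my web scrapers to Python. In PHP I like using the preg_match and preg_match_all functions, but I haven't found a suitable function in Python that works like preg_match. Could anyone help me?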

For example, if I want to get the content between <a class="title" and </a>, I use the following function in PHP:

preg_match_all('/a class="title"(.*?)<\/a>/si',$input,$output);

But I can't find a similar function in Python.

- funnyguy

1

Here is the Python regular expression documentation: http://docs.python.org/howto/regex.html - Ben Lee

2

In Python, we don't use regular expressions to parse HTML; we use BeautifulSoup. See https://dev59.com/X3I-5IYBdhLWcg3wq6do#1732454. - johnsyweb

3 Answers

5

I think you need something like this:

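import re

# input holds the page source, like $input in the question
output = re.search(r'a class="title"(.*?)</a>', input, flags=re.IGNORECASE)
if output is not None:
    output = output.group(0)  # whole match; use group(1) for just the captured part
    print(output)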

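You can add (?s) at the start of the regular expression to enable DOTALL mode, in which . also matches newlines (the counterpart of PHP's /s modifier):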

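import re

# (?s) turns on DOTALL for the whole pattern
output = re.search(r'(?s)a class="title"(.*?)</a>', input, flags=re.IGNORECASE)
if output is not None:
    output = output.group(0)  # whole match; use group(1) for just the captured part
    print(output)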
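
To make the effect of (?s) concrete, here is a small sketch. The sample string `text` is made up for illustration and is not from the question; it only shows how the same pattern behaves with and without DOTALL, and how group(0) differs from group(1).

```python
import re

# Hypothetical input in which the link text spans two lines.
text = '<a class="title">Hello\nWorld</a>'
pattern = r'a class="title"(.*?)</a>'

# Without DOTALL, "." cannot cross the newline, so nothing matches.
no_dotall = re.search(pattern, text, flags=re.IGNORECASE)

# With (?s) at the start, "." also matches "\n".
with_dotall = re.search('(?s)' + pattern, text, flags=re.IGNORECASE)

print(no_dotall)                    # None
print(repr(with_dotall.group(0)))   # the whole matched text
print(repr(with_dotall.group(1)))   # only what (.*?) captured
```

Passing flags=re.IGNORECASE | re.DOTALL should be an equivalent way to write this without the inline (?s).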

- Vasin Yuriy

2

You might be interested in reading about Python regular expression operations.

- Tudor Constantin

- RanRag · Accepted Answer

You are looking for Python's re module.

Take a look at re.findall and re.search.
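
As a rough mapping, re.search plays the role of preg_match and re.findall that of preg_match_all. The sketch below reuses the pattern from the question, with PHP's /si modifiers written as re.DOTALL | re.IGNORECASE. The `html` sample is made up for illustration.

```python
import re

# Hypothetical sample input, invented for this example.
html = '''<a class="title" href="/one">First
post</a>
<A CLASS="title" href="/two">Second post</A>'''

# PHP's /s and /i modifiers correspond to re.DOTALL and re.IGNORECASE.
pattern = re.compile(r'a class="title"(.*?)</a>', re.DOTALL | re.IGNORECASE)

# Like preg_match: the first match only, or None if nothing matches.
first = pattern.search(html)
if first is not None:
    print(repr(first.group(1)))

# Like preg_match_all: with a single capture group, findall returns
# a list containing the captured string of every match.
titles = pattern.findall(html)
print(titles)
```

Note that, as in the PHP original, the captured text still includes the rest of the opening tag (the href attribute and the closing ">"), because the pattern starts capturing right after class="title".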

Since you mentioned that you are trying to parse HTML, use an HTML parser instead. There are several options available in Python, such as lxml or BeautifulSoup.
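
lxml and BeautifulSoup are third-party packages, so as a sketch that runs without installing anything, the example below uses the standard library's html.parser instead. That is a swapped-in stand-in, not the libraries named above. The class name `TitleLinkParser` and the sample `html` string are hypothetical.

```python
from html.parser import HTMLParser


class TitleLinkParser(HTMLParser):
    """Collects the text inside every <a class="title"> element."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._buffer = None  # text chunks while inside a matching <a>, else None

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; this simple check
        # only matches when the class attribute is exactly "title".
        if tag == "a" and dict(attrs).get("class") == "title":
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._buffer is not None:
            self.titles.append("".join(self._buffer).strip())
            self._buffer = None


# Hypothetical sample input, invented for this example.
html = ('<p><a class="title" href="/one">First post</a> '
        '<a href="/x">skip</a> '
        '<A CLASS="title">Second <b>post</b></A></p>')

parser = TitleLinkParser()
parser.feed(html)
parser.close()
print(parser.titles)
```

Unlike the regex, this yields only the link text, ignores links without the title class, and is not thrown off by nested tags or attribute order.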

See this for why you should not parse HTML with regular expressions.