Regex to capture part of a string

Question

python regex markdown regex-lookarounds regex-group

3

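I am trying to use Python's re library to grab the top-level Markdown headings from a .md document (that is, headings that start with a single hash, such as # Introduction), but I cannot figure out how to do it.

Here is the code I tried to run: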

import re

pattern = r"(# .+?\\n)"

text = r"# Title\n## Chapter\n### sub-chapter#### What a lovely day.\n"

header = re.search(pattern, text)
print(header.string)

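The result of print(header.string) is # Title\n## Chapter\n### sub-chapter#### What a lovely day.\n, but I only want # Title\n. This regex101 example says the pattern should work, and I cannot tell why it does not: https://regex101.com/r/u4ZIE0/9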

- Garrett Edel

2 Answers

1

I'm guessing that we wish to extract # Title\n, in which case your expression seems to work fine with a slight modification:

(# .+?\\n)(.+)

Test

# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility

import re

regex = r"(# .+?\\n)(.+)"

test_str = "# Title\\n## Chapter\\n### sub-chapter#### The Bar\\nIt was a fall day.\\n"

subst = "\\1"

# Limit the number of replacements with the count argument
# (passed by keyword; newer Python versions deprecate passing it positionally)
result = re.sub(regex, subst, test_str, count=1)

if result:
    print(result)

# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.

- Emma

- The fourth bird · Accepted Answer

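You get this result because you use header.string, which reads the .string attribute of a Match object; that attribute returns the whole string that was passed to match() or search().

That string already contains the \n sequences (it is a raw string, so each one is a literal backslash followed by n rather than a newline character):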

text = r"# Title\n## Chapter\n### sub-chapter#### What a lovely day.\n"

So if you use that pattern (note that it also matches the trailing \n), you can update your code to:

import re

pattern = r"(# .+?\\n)"
text = r"# Title\n## Chapter\n### sub-chapter#### What a lovely day.\n"
header = re.search(pattern, text)
print(header.group())

Note that re.search returns only the first location where the regular expression produces a match.
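
As a rough illustration of the difference between .string and .group(), and of what "first match" means for this unanchored pattern, here is a minimal sketch that reuses the question's own text and pattern. The variable name all_hits is only for illustration.

```python
import re

# Raw string: every \n below is a literal backslash followed by "n", not a newline.
text = r"# Title\n## Chapter\n### sub-chapter#### What a lovely day.\n"

# In a raw pattern, \\n matches that literal backslash + "n" pair.
pattern = r"(# .+?\\n)"

header = re.search(pattern, text)

# .string is the entire input that was searched, not the matched part.
assert header.string == text

# .group() (same as .group(0)) is the matched text; .group(1) is the first capture group.
print(header.group())   # # Title\n
print(header.group(1))  # # Title\n

# The pattern is not anchored, so it can also match inside "## ..." and "### ...".
# re.search simply stops at the first hit, which here happens to be the title.
all_hits = re.findall(pattern, text)
print(len(all_hits))    # 3
```

Because the pattern can also match inside deeper headings, relying on "first match" only works when the top-level heading comes first in the text.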

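Another option is to anchor the match: starting at the beginning of a line, match a # followed by a space, then any characters other than a newline up to the end of the line. This needs re.M so that ^ and $ match at line boundaries, and a regular (non-raw) text string so that \n is a real newline: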

^# .*$

For example:

import re

pattern = r"^# .*$"
text = "# Title\n## Chapter\n### sub-chapter#### What a lovely day.\n"
header = re.search(pattern, text, re.M)
print(header.group())
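
To see what re.M contributes, the following sketch runs the same pattern with and without the flag on the same text. The names with_flag, without_flag and top_level are illustrative only.

```python
import re

# Regular (non-raw) string, so each \n is a real newline character.
text = "# Title\n## Chapter\n### sub-chapter#### What a lovely day.\n"
pattern = r"^# .*$"

# With re.M, ^ and $ match at the start and end of every line.
with_flag = re.search(pattern, text, re.M)
print(with_flag.group())  # # Title

# Without re.M, $ only matches at the end of the string (or just before a
# final newline), and . does not cross newlines, so nothing matches here.
without_flag = re.search(pattern, text)
print(without_flag)  # None

# Lines starting with "##" or "###" are skipped because ^ must be followed by "# ".
top_level = re.findall(pattern, text, re.M)
print(top_level)  # ['# Title']
```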

If no further # may appear after the leading one, you can use a negated character class that matches anything other than # or a newline:

^# [^#\n\r]+$
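
A small sketch of how the stricter pattern behaves compared with ^# .*$. The sample document doc is made up for illustration and is not from the question.

```python
import re

# Hypothetical sample document with real newlines.
doc = "# Intro\n## Details\n# Uses C# here\n# Summary\n"

strict_pattern = r"^# [^#\n\r]+$"
loose_pattern = r"^# .*$"

# The negated class rejects any heading line that contains another "#".
strict = re.findall(strict_pattern, doc, re.M)
print(strict)  # ['# Intro', '# Summary']

# The looser pattern accepts any line that starts with "# ".
loose = re.findall(loose_pattern, doc, re.M)
print(loose)   # ['# Intro', '# Uses C# here', '# Summary']
```

Which variant fits depends on whether heading text may legitimately contain a # character, as in "C#".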