Regex: capturing the text between two quotes

4
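I can't seem to get a regex right for capturing the phrase between two quoted passages, e.g. the bolded text below (note: the input has other text before and after):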

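"I can quite understand your thinking so." **I said.** "Of course, in your position of unofficial adviser and helper to everybody who is absolutely puzzled, throughout three continents, you are brought in contact with all that is strange and bizarre. But here"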

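"Of course, in your position of unofficial adviser and helper to everybody who is absolutely puzzled, throughout three continents, you are brought in contact with all that is strange and bizarre. But here"**--I picked up the morning paper from the ground--**"let us put it to a practical test. Here is the first heading upon which I come. 'A husband's cruelty to his wife.' There is half a column of print, but I know without reading it that it is all perfectly familiar to me. There is, of course, the other woman, the drink, the push, the blow, the bruise, the sympathetic sister or landlady. The crudest of writers could invent nothing more crude."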

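I tried getting the text before and after each quote, but I can't get the desired output. There must be a way to group the regex so that I capture the string between the quotes as well as the two surrounding quotes.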

What I tried:

import re

def get_quotes(paragraph):
    quote_rx = r'''([""])(?:(?=(\\?))\2.)*?\1'''
    return [i.group(0) for i in \
           re.finditer(quote_rx, paragraph, re.S)]

def get_said(paragraph, quote):
    quote_start = paragraph.index(quote)
    quote_end = quote_start + len(quote)
    before = paragraph[:quote_start]
    after = paragraph[quote_end:]
    return before, after


paragraphs = ['''I smiled and shook my head. "I can quite understand your thinking so." I said. "Of course, in your position of unofficial adviser and helper to everybody who is absolutely puzzled, throughout three continents, you are brought in contact with all that is strange and bizarre. But here"--I picked up the morning paper from the ground--"let us put it to a practical test. Here is the first heading upon which I come. 'A husband's cruelty to his wife.' There is half a column of print, but I know without reading it that it is all perfectly familiar to me. There is, of course, the other woman, the drink, the push, the blow, the bruise, the sympathetic sister or landlady. The crudest of writers could invent nothing more crude."''', 
'''Such was the remarkable narrative to which I listened on that April evening -- a narrative which would have been utterly incredible to me had it not been confirmed by the actual sight of the tall, spare figure and the keen, eager face, which I had never thought to see again. In some manner he had learned of my own sad bereavement, and his sympathy was shown in his manner rather than in his words. "Work is the best antidote to sorrow, my dear Watson," said he, "and I have a piece of work for us both to-night which, if we can bring it to a successful conclusion, will in itself justify a man's life on this planet." In vain I begged him to tell me more. "You will hear and see enough before morning," he answered. "We have three years of the past to discuss. Let that suffice until half-past nine, when we start upon the notable adventure of the empty house."''']

for p in paragraphs:
    saids = set()
    for i in get_quotes(p):
        # Text before and after the first occurrence of this quote.
        b, a = get_said(p, i)
        print(b)
        print(a)
        print()

Desired output:

in-btw: I said.
quotes: ["I can quite understand your thinking so.","Of course, in your position of unofficial adviser and helper to everybody who is absolutely puzzled, throughout three continents, you are brought in contact with all that is strange and bizarre. But here"]
section: "I can quite understand your thinking so." **I said.** "Of course, in your position of unofficial adviser and helper to everybody who is absolutely puzzled, throughout three continents, you are brought in contact with all that is strange and bizarre. But here"


in-btw: --I picked up the morning paper from the ground--
quotes: ['''"Of course, in your position of unofficial adviser and helper to everybody who is absolutely puzzled, throughout three continents, you are brought in contact with all that is strange and bizarre. But here"''', '''"let us put it to a practical test. Here is the first heading upon which I come. 'A husband's cruelty to his wife.' There is half a column of print, but I know without reading it that it is all perfectly familiar to me. There is, of course, the other woman, the drink, the push, the blow, the bruise, the sympathetic sister or landlady. The crudest of writers could invent nothing more crude."''']
section: "Of course, in your position of unofficial adviser and helper to everybody who is absolutely puzzled, throughout three continents, you are brought in contact with all that is strange and bizarre. But here"**--I picked up the morning paper from the ground--**"let us put it to a practical test. Here is the first heading upon which I come. 'A husband's cruelty to his wife.' There is half a column of print, but I know without reading it that it is all perfectly familiar to me. There is, of course, the other woman, the drink, the push, the blow, the bruise, the sympathetic sister or landlady. The crudest of writers could invent nothing more crude."

([^"]*"[^"]*")+ should work (assuming you start outside the quotes). [^"]* handles the outside, "[^"]*" handles the inside. - Danstahr
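As a rough illustration of the outside/inside split described in this comment (not the commenter's own code), the repeated group can be replaced by two capture groups and re.findall, because a repeated group only keeps its last repetition. The sample string is made up for the sketch.

```python
import re

# Hypothetical sample text; group 1 is the text outside the quotes,
# group 2 is the text inside the next pair of double quotes.
text = 'a "b" c "d" e'
pairs = re.findall(r'([^"]*)"([^"]*)"', text)
print(pairs)  # [('a ', 'b'), (' c ', 'd')]
```

Note that the trailing text after the last quote (' e' here) is not returned, since no quoted span follows it.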
1
Please show the code you have tried and the desired output so that we can help you better. - Games Brainiac
1 Answer

2

Quite simple: the regex you need is r'^("[^"]+")([^"]+)("[^"]+")'

import re

s = """
"I can quite understand your thinking so." I said. "Of course, in your position of unofficial adviser and helper to everybody who is absolutely puzzled, throughout three continents, you are brought in contact with all that is strange and bizarre. But here"

"Of course, in your position of unofficial adviser and helper to everybody who is absolutely puzzled, throughout three continents, you are brought in contact with all that is strange and bizarre. But here"--I picked up the morning paper from the ground--"let us put it to a practical test. Here is the first heading upon which I come. 'A husband's cruelty to his wife.' There is half a column of print, but I know without reading it that it is all perfectly familiar to me. There is, of course, the other woman, the drink, the push, the blow, the bruise, the sympathetic sister or landlady. The crudest of writers could invent nothing more crude."
"""

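for segment in s.splitlines():
    if not segment:
        continue  # skip the blank lines around each sample
    # Groups: first quote, the text in between, second quote.
    first, said, second = re.match(r'^("[^"]+")([^"]+)("[^"]+")', segment).groups()
    print(first)
    print(said)
    print(second)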

>>> 
"I can quite understand your thinking so."
 I said. 
"Of course, in your position of unofficial adviser and helper to everybody who is absolutely puzzled, throughout three continents, you are brought in contact with all that is strange and bizarre. But here"
"Of course, in your position of unofficial adviser and helper to everybody who is absolutely puzzled, throughout three continents, you are brought in contact with all that is strange and bizarre. But here"
--I picked up the morning paper from the ground--
"let us put it to a practical test. Here is the first heading upon which I come. 'A husband's cruelty to his wife.' There is half a column of print, but I know without reading it that it is all perfectly familiar to me. There is, of course, the other woman, the drink, the push, the blow, the bruise, the sympathetic sister or landlady. The crudest of writers could invent nothing more crude."

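The regex above captures only the first two quotes on a line. A minimal sketch of one way to get every adjacent pair in a paragraph, including paragraphs that start with unquoted text, is to find all quoted spans and slice out what lies between neighbours. The helper name quote_pairs and the sample text are made up for illustration, and the pattern assumes plain double quotes with no escaped quotes inside.

```python
import re

# One double-quoted span; apostrophes inside are allowed.
QUOTE_RX = re.compile(r'"[^"]*"')

def quote_pairs(paragraph):
    """Yield (first_quote, in_between, second_quote) for adjacent quotes."""
    matches = list(QUOTE_RX.finditer(paragraph))
    for left, right in zip(matches, matches[1:]):
        between = paragraph[left.end():right.start()]
        yield left.group(0), between, right.group(0)

text = 'I smiled. "One." I said. "Two"--I paused--"three." The end.'
for first, between, second in quote_pairs(text):
    print(repr(between))
# ' I said. '
# '--I paused--'
```

Each quote except the first and last appears in two tuples, which matches the overlapping "quotes" lists in the desired output of the question.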
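Thanks @Inbar, the regex works nicely. Out of interest, did you try this regex on the input from the original post? I get a "'NoneType' object has no attribute 'groups'" error. - alvas
Is it because ^("[^"]+") means the quote has to be at the start of the sentence? I do have some noise at the start of the sentence. - alvas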
1
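Then remove the leading ^, which anchors the match to the start of the string/line, and use re.search instead of re.match. Next time, please include the actual data you need to solve the problem for, in the proper form; otherwise you won't get an answer that fits your case. - Inbar Rose
Why can't I match the groups with re.match after removing the ^? - alvas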
1
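Because re.match only looks for a match at the beginning of the string, whereas re.search scans through the string until it finds a position where the match can start... next time, please read the documentation. - Inbar Rose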
1
Aha, removing the ^ and using re.search works =) Thanks!! - alvas
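The difference discussed in these comments can be checked with a short sketch. The sample string is a shortened, made-up stand-in for a paragraph that has unquoted text before the first quote.

```python
import re

# The answer's regex without the leading ^.
pattern = r'("[^"]+")([^"]+)("[^"]+")'
noisy = 'I smiled and shook my head. "One." I said. "Two"'

# re.match only attempts a match at position 0, where there is no quote.
print(re.match(pattern, noisy))  # None

# re.search scans forward until the pattern can start matching.
found = re.search(pattern, noisy)
print(found.groups())  # ('"One."', ' I said. ', '"Two"')
```

Calling .groups() on the None returned by re.match is what produces the "'NoneType' object has no attribute 'groups'" error mentioned above.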
