Split a string by commas, but ignore commas inside brackets

Question



I'm trying to split a string by commas in Python:

s = "year:2020,concepts:[ab553,cd779],publisher:elsevier"

But I want to ignore any commas inside square brackets []. So the result for the string above would be:

["year:2020", "concepts:[ab553,cd779]", "publisher:elsevier"]

Does anyone have a suggestion for how to do this? I tried using re.split, like this:

params = re.split(",(?![\w\d\s])", param)

But it doesn't work correctly.

- Casey
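A quick check suggests why this attempt has no effect on the sample string: `(?![\w\d\s])` only allows a split when the comma is *not* followed by a word character, digit, or whitespace, and every comma in `s` is followed by a letter, so nothing gets split. The pattern is written as a raw string here because recent Python versions warn about `"\w"` in a normal string literal.

```python
import re

s = "year:2020,concepts:[ab553,cd779],publisher:elsevier"

# Split only on commas NOT followed by a word char, digit, or whitespace.
attempt = re.split(r",(?![\w\d\s])", s)

# Every comma in s is followed by a letter, so the negative lookahead
# always fails and no split happens: the whole string comes back as-is.
print(attempt)  # ['year:2020,concepts:[ab553,cd779],publisher:elsevier']
```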

5 Answers


This regex works for your example:

,(?=[^,]+?:)

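Here we use a positive lookahead to find commas that are followed by one or more non-comma characters and then a colon. This correctly finds the <comma><key> pattern you are looking for. Of course, if keys may contain commas, the pattern would need further adjustment.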

You can try the regex on regexr.

- pvandyken
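A minimal sketch of how this lookahead could be used with `re.split` on the question's string. The second call is an extra check (not from the answer) illustrating the caveat above: the pattern keys off colons rather than brackets, so a colon inside a bracketed value is mistaken for the start of a new key.

```python
import re

s = "year:2020,concepts:[ab553,cd779],publisher:elsevier"

# Split on a comma only if the text after it reaches a colon before
# hitting another comma, i.e. the comma starts a new "<key>:" field.
fields = re.split(r",(?=[^,]+?:)", s)
print(fields)  # ['year:2020', 'concepts:[ab553,cd779]', 'publisher:elsevier']

# Caveat: "b:" inside the brackets looks like a key, so the list is split.
caveat = re.split(r",(?=[^,]+?:)", "tags:[a,b:c]")
print(caveat)  # ['tags:[a', 'b:c]']
```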


I adapted @Bemwa's solution (which didn't work for my use case):

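def split_by_commas(s):
    lst = []
    brackets = 0  # current bracket nesting depth
    word = ""
    for c in s:
        if c == "[":
            brackets += 1
        elif c == "]":
            # Ignore a stray "]" that has no matching "[".
            if brackets > 0:
                brackets -= 1
        elif c == "," and not brackets:
            # Comma outside any brackets: finish the current field.
            lst.append(word)
            word = ""
            continue
        word += c
    lst.append(word)
    return lst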

- mnieber
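A variant of the same depth-counting approach, sketched under a hypothetical name `split_top_level` with configurable separator and bracket characters (both are assumptions, not part of the answer). It appends characters to a list and joins once per field instead of growing a string. Because it tracks nesting depth rather than only the most recent bracket, a nested list such as `[x,[y,z]]` stays in a single field.

```python
def split_top_level(s, sep=",", open_ch="[", close_ch="]"):
    """Split s on sep, ignoring separators nested inside brackets.

    Hypothetical helper: same depth counter as split_by_commas above,
    but collects characters in a list and joins once per field.
    """
    fields = []
    current = []
    depth = 0
    for c in s:
        if c == open_ch:
            depth += 1
        elif c == close_ch and depth > 0:
            depth -= 1
        elif c == sep and depth == 0:
            # Top-level separator: close the current field.
            fields.append("".join(current))
            current = []
            continue
        current.append(c)
    fields.append("".join(current))
    return fields


print(split_top_level("year:2020,concepts:[ab553,cd779],publisher:elsevier"))
# ['year:2020', 'concepts:[ab553,cd779]', 'publisher:elsevier']
print(split_top_level("a:[x,[y,z]],b:1"))  # ['a:[x,[y,z]]', 'b:1']
```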


Instead of using split, you can solve this with a user-defined function:

s = "year:2020,concepts:[ab553,cd779],publisher:elsevier"


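def split_by_commas(s):
    lst = list()
    last_bracket = ''  # most recent bracket character seen
    word = ""
    for c in s:
        if c == '[' or c == ']':
            last_bracket = c
        if c == ',' and last_bracket == ']':
            # After a closing bracket: the comma ends the field.
            lst.append(word)
            word = ""
            continue
        elif c == ',' and last_bracket == '[':
            # Inside an open bracket: keep the comma in the field.
            word += c
            continue
        elif c == ',':
            # No bracket seen yet: the comma ends the field.
            lst.append(word)
            word = ""
            continue
        word += c
    lst.append(word)
    return lst


main_lst = split_by_commas(s)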

print(main_lst)

Running the above code prints:

['year:2020', 'concepts:[ab553,cd779]', 'publisher:elsevier']

- Bemwa Malak


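A pattern that only uses a lookahead to assert the characters on the right does not check whether there is a matching character on the left.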

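Instead of using split, you can match either one or more repetitions of values between square brackets (with optional non-comma text around them), or one or more characters other than a comma.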

(?:[^,]*\[[^][]*])+[^,]*|[^,]+


s = "year:2020,concepts:[ab553,cd779],publisher:elsevier"
params = re.findall(r"(?:[^,]*\[[^][]*])+[^,]*|[^,]+", s)
print(params)

Output

['year:2020', 'concepts:[ab553,cd779]', 'publisher:elsevier']

- The fourth bird
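The same findall pattern can be spread over several lines with `re.VERBOSE` so that each part carries a comment; this is a presentational sketch, and the name `FIELD` is an assumption. One design difference from `re.split` is worth knowing: `findall` only returns non-empty matches, so an empty field such as the one in `"a,,b"` is silently dropped.

```python
import re

# Same pattern as above; whitespace and comments are ignored in VERBOSE mode.
FIELD = re.compile(r"""
    (?:                 # one or more bracketed chunks:
        [^,]*           #   non-comma text before the bracket
        \[ [^][]* ]     #   an opening bracket, no brackets inside, a closing one
    )+
    [^,]*               # optional non-comma text after the last bracket
  |                     # otherwise:
    [^,]+               # a run of non-comma characters
""", re.VERBOSE)

s = "year:2020,concepts:[ab553,cd779],publisher:elsevier"
print(FIELD.findall(s))  # ['year:2020', 'concepts:[ab553,cd779]', 'publisher:elsevier']
print(FIELD.findall("a,,b"))  # ['a', 'b'] (the empty middle field is dropped)
```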


- Dean Taylor · Accepted Answer

import re

s = "year:2020,concepts:[ab553,cd779],publisher:elsevier"
result = re.split(r",(?!(?:[^,\[\]]+,)*[^,\[\]]+])", s)

,                 # Match the character “,” literally
(?!               # Assert that it is impossible to match the regex below starting at this position (negative lookahead)
   (?:               # Match the regular expression below
      [^,\[\]]          # Match any single character NOT present in the list below
                           # The literal character “,”
                           # The literal character “[”
                           # The literal character “]”
         +                 # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
      ,                 # Match the character “,” literally
   )
      *                 # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
   [^,\[\]]          # Match any single character NOT present in the list below
                        # The literal character “,”
                        # The literal character “[”
                        # The literal character “]”
      +                 # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
   ]                 # Match the character “]” literally
)

Updated to support more than two items inside the brackets, for example:

year:2020,concepts:[ab553,cd779],publisher:elsevier,year:2020,concepts:[ab553,cd779,xx345],publisher:elsevier
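A minimal run of the updated pattern on that example. The last call is an extra check, not part of the answer: the character classes exclude `[`, so the lookahead cannot see past a nested opening bracket, and a comma directly before a nested list is treated as a top-level separator.

```python
import re

# Split on commas NOT followed by "item,item,...,item]",
# i.e. commas that are not inside a flat [...] list.
PATTERN = r",(?!(?:[^,\[\]]+,)*[^,\[\]]+])"

s = ("year:2020,concepts:[ab553,cd779],publisher:elsevier,"
     "year:2020,concepts:[ab553,cd779,xx345],publisher:elsevier")
print(re.split(PATTERN, s))
# ['year:2020', 'concepts:[ab553,cd779]', 'publisher:elsevier',
#  'year:2020', 'concepts:[ab553,cd779,xx345]', 'publisher:elsevier']

# Caveat: nested lists are split at the comma before the inner '['.
print(re.split(PATTERN, "a:[x,[y,z]]"))  # ['a:[x', '[y,z]]']
```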