Brace expansion in Python glob

Question

21

I am using Python 2.7 and trying to run the following:

glob('{faint,bright*}/{science,calib}/chip?/')

I get no matches, yet in the shell, echo {faint,bright*}/{science,calib}/chip? prints:

faint/science/chip1 faint/science/chip2 faint/calib/chip1 faint/calib/chip2 bright1/science/chip1 bright1/science/chip2 bright1w/science/chip1 bright1w/science/chip2 bright2/science/chip1 bright2/science/chip2 bright2w/science/chip1 bright2w/science/chip2 bright1/calib/chip1 bright1/calib/chip2 bright1w/calib/chip1 bright1w/calib/chip2 bright2/calib/chip1 bright2/calib/chip2 bright2w/calib/chip1 bright2w/calib/chip2

What is wrong with my expression?

- astabada

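I don't think the glob module supports braces; see http://bugs.python.org/issue9584. - Andrew Clark

The fnmatch module (which glob uses to implement filename matching) is not nearly sophisticated enough to support the {...} brace expansion syntax. - Martijn Pieters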

6 Answers

8

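{..} is known as brace expansion, and it is a separate step applied before globbing takes place.

It is not part of globbing, and the Python glob function does not support it.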
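
One way to check this claim is to ask fnmatch (the module glob uses for name matching) directly. The following is a minimal sketch, not part of the original answer; the variable names are illustrative only. It suggests that braces and commas are treated as ordinary characters rather than as alternatives.

```python
from fnmatch import fnmatchcase

pattern = "{faint,bright*}"

# Braces are ordinary characters to fnmatch, so a plain directory
# name such as "faint" does not match the pattern.
matches_plain_name = fnmatchcase("faint", pattern)

# A name that literally contains the braces and the comma does match,
# because only the "*" is interpreted as a wildcard.
matches_literal_braces = fnmatchcase("{faint,bright1}", pattern)

print(matches_plain_name, matches_literal_braces)
```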

- that other guy

5

Since glob() in Python does not support {}, you probably want something like:

import os
import re

...

match_dir = re.compile('(faint|bright.*)/(science|calib)(/chip)?')
for dirpath, dirnames, filenames in os.walk("/your/top/dir"):
    if match_dir.search(dirpath):
        do_whatever_with_files(dirpath, filenames)
        # OR
        do_whatever_with_subdirs(dirpath, dirnames)
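
For a version that can be run as-is, here is a rough sketch of the same os.walk idea. It is an illustration under assumptions, not the answerer's code: the helper name find_chip_dirs, the tightened pattern (anchored with fullmatch and applied to the path relative to the top directory), and the temporary directory tree are all made up for the demonstration.

```python
import os
import re
import tempfile

# Illustrative pattern: [^/] plays the role of the shell's "?" and
# [^/]* the role of "*", so neither crosses a directory boundary.
MATCH_DIR = re.compile(r"(faint|bright[^/]*)/(science|calib)/chip[^/]")

def find_chip_dirs(top):
    """Return sorted paths, relative to top, that fully match MATCH_DIR."""
    found = []
    for dirpath, dirnames, filenames in os.walk(top):
        rel = os.path.relpath(dirpath, top).replace(os.sep, "/")
        if MATCH_DIR.fullmatch(rel):
            found.append(rel)
    return sorted(found)

# Build a small throwaway tree to try it on.
with tempfile.TemporaryDirectory() as top:
    for rel in ("faint/science/chip1", "bright2w/calib/chip2", "dark/science/chip1"):
        os.makedirs(os.path.join(top, rel))
    chip_dirs = find_chip_dirs(top)

print(chip_dirs)
```

Matching the whole relative path avoids accidental hits when the pattern happens to appear somewhere in the middle of a longer path, which search() on the absolute path would allow.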

- DouglasDD

3

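As other answers have pointed out, brace expansion is a preprocessing step for glob: you expand all the braces, then run glob on each result. (Brace expansion turns one string into a list of strings.)

Orwellophile recommends the braceexpand library. To me, this feels like too small a problem to justify a dependency (though it is a common problem that ideally would be handled in the standard library, packaged in the glob module).

So here is a way to do it in a few lines of code.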

import itertools
import re

def expand_braces(text, seen=None):
    if seen is None:
        seen = set()

    # Raw string avoids invalid-escape warnings; matches innermost {...} groups.
    spans = [m.span() for m in re.finditer(r"\{[^{}]*\}", text)][::-1]
    alts = [text[start + 1 : stop - 1].split(",") for start, stop in spans]

    if len(spans) == 0:
        if text not in seen:
            yield text
        seen.add(text)

    else:
        for combo in itertools.product(*alts):
            replaced = list(text)
            for (start, stop), replacement in zip(spans, combo):
                replaced[start:stop] = replacement

            yield from expand_braces("".join(replaced), seen)

### testing

text_to_expand = "{{pine,}apples,oranges} are {tasty,disgusting} to m{}e }{"

for result in expand_braces(text_to_expand):
    print(result)

prints

pineapples are tasty to me }{
oranges are tasty to me }{
apples are tasty to me }{
pineapples are disgusting to me }{
oranges are disgusting to me }{
apples are disgusting to me }{

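What's happening here is:

Nested brackets can produce non-unique results, so we use "seen" to yield only results that have not yet been seen.
"spans" holds the starting and stopping indexes of all innermost, balanced brackets in "text". The "[::-1]" slice reverses the order so that indexes go from highest to lowest (this becomes relevant later).
Each element of "alts" is the corresponding list of comma-delimited alternatives.
If there are no matches ("text" contains no balanced brackets), yield "text" itself, using "seen" to ensure it is unique.
Otherwise, use "itertools.product" to iterate over the Cartesian product of the comma-delimited alternatives.
Replace the curly-bracketed text with the current alternative. Since we are replacing data in place, it has to be a mutable sequence ("list", rather than "str"), and we have to replace the highest indexes first. If we replaced the lowest indexes first, the later indexes would no longer match the values recorded in "spans". This is why "spans" was reversed when it was first created.
"text" might have curly brackets within curly brackets. The regular expression only finds balanced curly brackets that do not contain other curly brackets, but nested curly brackets are legal. Therefore, we need to recurse until there are no nested curly brackets (the "len(spans) == 0" case). Recursion with Python generators means using "yield from" to re-yield each result from the recursive call.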
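
The point about replacing the highest indexes first can be seen in isolation. The snippet below is a small illustration, not part of the answer's algorithm; replace_spans and the sample string are hypothetical names chosen for the demonstration.

```python
def replace_spans(text, spans, replacements):
    """Replace each (start, stop) slice of text, in the order given."""
    chars = list(text)  # str is immutable, so edit a list of characters
    for (start, stop), replacement in zip(spans, replacements):
        chars[start:stop] = replacement
    return "".join(chars)

text = "a{x}b{yy}c"
spans = [(1, 4), (5, 9)]  # positions of "{x}" and "{yy}"
replacements = ["X", "YY"]

# Highest index first: the earlier span is still valid when we reach it.
highest_first = replace_spans(text, spans[::-1], replacements[::-1])

# Lowest index first: the first replacement shortens the text, so the
# second span no longer points at "{yy}".
lowest_first = replace_spans(text, spans, replacements)

print(highest_first)  # aXbYYc
print(lowest_first)   # aXb{yYY
```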

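In the output, "{{pine,}apples,oranges}" is first expanded to "{pineapples,oranges}" and "{apples,oranges}", and then each of these is expanded. The "oranges" result would appear twice if we did not request unique results with "seen".

Empty brackets like the ones in "m{}e" expand to nothing, so this is just "me".

Unbalanced brackets, like "}{", are left as-is.

This is not an algorithm to use if high performance on large datasets is required, but it is a general solution for reasonably sized data.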

- Jim Pivarski

3

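As that other guy pointed out, Python does not support brace expansion directly. But since brace expansion is done before the wildcards are evaluated, you can do it yourself, e.g.,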

result = glob('{faint,bright*}/{science,calib}/chip?/')

becomes

result = [
    f 
    for b in ['faint', 'bright*'] 
    for s in ['science', 'calib'] 
    for f in glob('{b}/{s}/chip?/'.format(b=b, s=s))
]
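
If the number of brace groups varies, the nested loops can be generalized with itertools.product. This is only a sketch of that idea under assumptions: the helper name glob_alternatives, its root parameter, and the temporary directory layout are invented for illustration, and each brace group must be rewritten by hand as a named str.format placeholder.

```python
import glob
import itertools
import os
import tempfile

def glob_alternatives(template, root, **alternatives):
    """Fill each named placeholder with every alternative, then glob under root."""
    names = list(alternatives)
    matches = []
    for combo in itertools.product(*alternatives.values()):
        pattern = template.format(**dict(zip(names, combo)))
        # Escape root so only the template's wildcards are interpreted.
        matches.extend(glob.glob(os.path.join(glob.escape(root), pattern)))
    return matches

# Try it on a small throwaway tree.
with tempfile.TemporaryDirectory() as top:
    for rel in ("faint/science/chip1", "bright1/calib/chip2", "bright1/other/chip1"):
        os.makedirs(os.path.join(top, rel))
    hits = glob_alternatives(
        "{b}/{s}/chip?", top, b=["faint", "bright*"], s=["science", "calib"]
    )
    relative_hits = sorted(
        os.path.relpath(h, top).replace(os.sep, "/") for h in hits
    )

print(relative_hits)
```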

- Matthias Fripp

0

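The wcmatch library has an interface similar to Python's standard glob, with options to enable brace expansion, tilde expansion, and more. For example, to enable brace expansion: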

from wcmatch import glob

glob.glob('{faint,bright*}/{science,calib}/chip?/', flags=glob.BRACE)

- Bluu

- Orwellophile · Accepted Answer

11

Combining globbing with brace expansion.

pip install braceexpand

Example:

from glob import glob
from braceexpand import braceexpand

def braced_glob(path):
    matches = []
    for pattern in braceexpand(path):
        matches.extend(glob(pattern))
    return matches

>>> braced_glob('/usr/bin/{x,z}*k')  
['/usr/bin/xclock', '/usr/bin/zipcloak']

- Orwellophile

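How does this apply to glob? - duff18

@duff18, if you read the OP's question carefully, you will see that he first needs to resolve the brace expansion and then apply glob.glob to each result. - Orwellophile

That is not what the OP said; he wanted to use glob directly. The fact that he has to use a two-step approach is not clear from your answer. - duff18

I thought the earlier answers had covered that adequately; I was just adding the missing information about brace expansion. But I will add an example that ties it all together, just for you. - Orwellophile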

1

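By default, answers are sorted by votes, so there is no guarantee that readers will have seen the other, more detailed answers before reading yours. So your answer needs to be self-contained, as it is now. - duff18

Just to add to what @Orwellophile mentioned, and for the sake of completeness, here is another package that does the same thing: bracex (https://pypi.org/project/bracex/) - shahensha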