How do I remove extra indentation from Python triple-quoted multi-line strings?

Question


111

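I have a Python editor where the user enters a script or code, which is then put into a main method behind the scenes while also having every line indented. The problem is that if a user has a multi-line string, the indentation applied to the whole script also affects the string: a tab is inserted at the start of each of its lines. A problem script would be something as simple as: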

"""foo
bar
foo2"""

So when in the main method, it would look like:

def main():
    """foo
    bar
    foo2"""

and the string now has an extra tab at the beginning of every line after the first.
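To make the problem concrete, here is a small Python sketch of what the editor does. The real editor is written in Java, so `wrap_in_main` is a hypothetical stand-in, and the `return s` line is added only so the string's value can be inspected:

```python
def wrap_in_main(script: str, indent: str = "\t") -> str:
    """Hypothetical stand-in for the editor: indent every line of the
    user's script and place it under `def main():`."""
    body = "\n".join(indent + line for line in script.splitlines())
    return "def main():\n" + body + "\n"


# The user's script: a multi-line string, returned so we can inspect it.
user_script = 's = """foo\nbar\nfoo2"""\nreturn s'

namespace = {}
exec(wrap_in_main(user_script), namespace)  # defines main()
print(repr(namespace["main"]()))  # 'foo\n\tbar\n\tfoo2'
```

The tab added in front of `s = """foo` is harmless, but the tabs added in front of `bar` and `foo2` end up inside the string value.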

- Mike

1

http://codereview.stackexchange.com/questions/60366/avoiding-python-multiline-string-indentation - Ciro Santilli OurBigBook.com

10 Answers

73

In my opinion, a better answer here is probably inspect.cleandoc, which does much of what textwrap.dedent does, but also fixes the problems textwrap.dedent has with the leading line.

The example below shows the differences:

>>> import textwrap
>>> import inspect
>>> x = """foo bar
...     baz
...     foobar
...     foobaz
...     """
>>> inspect.cleandoc(x)
'foo bar\nbaz\nfoobar\nfoobaz'
>>> textwrap.dedent(x)
'foo bar\n    baz\n    foobar\n    foobaz\n'
>>> y = """
...     foo
...     bar
... """
>>> inspect.cleandoc(y)
'foo\nbar'
>>> textwrap.dedent(y)
'\nfoo\nbar\n'
>>> z = """\tfoo
... bar\tbaz
... """
>>> inspect.cleandoc(z)
'foo\nbar     baz'
>>> textwrap.dedent(z)
'\tfoo\nbar\tbaz\n'

Note that inspect.cleandoc also expands internal tabs to spaces. This may not be appropriate for some use cases, but it works fine for me.

- bbenne10

3

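Note that the two are not exactly equivalent: cleandoc does more than just remove indentation. At a minimum, it expands tabs ('\t') to spaces, using 8-column tab stops like str.expandtabs(). - Brian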

1

That's correct, and I didn't notice it at the time. I'll update the answer to at least reflect the tab expansion. - bbenne10

3

You could also use textwrap.dedent(s).strip() to avoid changing tabs while still handling the leading and trailing newlines. - DocOc
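A small comparison of the two approaches on text with internal tabs, assuming that preserving the tabs is what you want. The string `s` is made up for illustration:

```python
import inspect
import textwrap

s = """
    name\tvalue
    x\t1
"""

# dedent() removes the common 4-space margin but leaves tabs alone;
# .strip() then drops the leading and trailing newlines.
print(repr(textwrap.dedent(s).strip()))  # 'name\tvalue\nx\t1'

# cleandoc() expands tabs to spaces (8-column tab stops) before dedenting,
# so the tabs come back as runs of spaces.
print(repr(inspect.cleandoc(s)))
```

Note that .strip() does not help when text starts right after the opening quotes, as in the `x` example above; that is the case cleandoc handles.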

1

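I wrote this answer in a more general context than the question asked about. I wanted to reflow docstrings for documentation purposes (so the collapsing is helpful). You're right that you can post-process the output of textwrap.dedent for more specific cases. When I answered, I overlooked the nuance of the original question; however, I believe my answer is more helpful in the more general case. - bbenne10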

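I'm not sure whether this is a beginner's mistake in the Python world, but be careful when using \n inside a triple-quoted string: inspect.cleandoc will not clean it up. (Experienced users will already know this.) - eddym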

22

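Everything after the first line of a multi-line string is part of the string, and is not treated as indentation by the parser. You may freely write: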

def main():
    """foo
bar
foo2"""
    pass

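On the other hand, that code isn't very readable, and Python knows it. So if the lines after the first in a docstring share leading whitespace, that common amount is stripped when you view the docstring with help(). Thus help(main) and help(main2) below produce the same help text.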

def main2():
    """foo
    bar
    foo2"""
    pass
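A quick way to check this without reading help() output: pydoc, which implements help(), cleans docstrings with inspect.cleandoc(), and inspect.getdoc() applies the same function. A minimal sketch:

```python
import inspect


def main():
    """foo
bar
foo2"""
    pass


def main2():
    """foo
    bar
    foo2"""
    pass


# getdoc() runs the docstring through inspect.cleandoc(), the same
# cleanup help() relies on, so both functions yield the same text.
print(repr(inspect.getdoc(main)))                     # 'foo\nbar\nfoo2'
print(inspect.getdoc(main) == inspect.getdoc(main2))  # True
```

On Python 3.13 and later the compiler also strips this common indentation from main2.__doc__ itself, so the raw __doc__ values can differ between versions while getdoc() agrees on both.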

- SingleNegationElimination

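Thanks for the reply. Unfortunately the indentation is completely automated: my code reads the script in as a string (in Java) and indents every line of that string. - Mike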

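As far as I can tell, only docstrings get this treatment among triple-quoted strings; the automatic stripping doesn't apply anywhere else. - tribbloid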

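@tribbloid The special logic for docstrings exists for the specific use case of making help() behave nicely by default. To apply similar dedenting logic elsewhere, you can use textwrap.dedent(), which is covered in basically every answer here. - SingleNegationElimination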

2

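I wanted to quote exactly what lies between the triple-quote lines, removing only the common leading indentation. I found that textwrap.dedent and inspect.cleandoc don't do quite that, so I wrote this function. It uses os.path.commonprefix.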

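import re
from os.path import commonprefix

def ql(s, eol=True):
    """Quote an indented multi-line string: drop an empty first line and
    remove the indentation shared by the remaining lines."""
    lines = s.splitlines()
    l0 = None
    if lines:
        # Keep a non-empty first line as-is; an empty one (text starting
        # on the line after the opening quotes) is dropped.
        l0 = lines.pop(0) or None
    # Longest common character prefix of the remaining lines; only its
    # leading whitespace counts as indentation.
    common = commonprefix(lines)
    indent = re.match(r'\s*', common)[0]
    n = len(indent)
    lines2 = [line[n:] for line in lines]
    # With eol=False, drop the empty last line left by the closing quotes,
    # so the result has no trailing newline.
    if not eol and lines2 and not lines2[-1]:
        lines2.pop()
    if l0 is not None:
        lines2.insert(0, l0)
    return "\n".join(lines2)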

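This can quote any string with any indentation. I wanted it to include the trailing newline by default, with an option (eol=False) to remove it so that any string can be quoted neatly.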

Example:

print(ql(r"""
     Hello
    |\---/|
    | o_o |
     \_^_/
    """))

print(ql(r"""
         World
        |\---/|
        | o_o |
         \_^_/
    """))

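In the second string only 4 spaces of common indentation are removed, because the closing """ is indented less than the quoted text, so the output keeps 4 extra spaces: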

 Hello
|\---/|
| o_o |
 \_^_/

     World
    |\---/|
    | o_o |
     \_^_/

I thought this would be simpler, otherwise I wouldn't have bothered writing it!

- Sam Watkins

2

A clearer way to show the differences between textwrap.dedent and inspect.cleandoc:

Behavior when the text starts right after the opening quotes (no leading newline)


import textwrap
import inspect

string1="""String
with
no indentation
       """
string2="""String
        with
        indentation
       """
print('string1 plain=' + repr(string1))
print('string1 inspect.cleandoc=' + repr(inspect.cleandoc(string1)))
print('string1 textwrap.dedent=' + repr(textwrap.dedent(string1)))
print('string2 plain=' + repr(string2))
print('string2 inspect.cleandoc=' + repr(inspect.cleandoc(string2)))
print('string2 textwrap.dedent=' + repr(textwrap.dedent(string2)))

Output

string1 plain='String\nwith\nno indentation\n       '
string1 inspect.cleandoc='String\nwith\nno indentation\n       '
string1 textwrap.dedent='String\nwith\nno indentation\n'
string2 plain='String\n        with\n        indentation\n       '
string2 inspect.cleandoc='String\nwith\nindentation'
string2 textwrap.dedent='String\n        with\n        indentation\n'

Behavior when the text starts on the line after the opening quotes (leading newline)


string1="""
String
with
no indentation
       """
string2="""
        String
        with
        indentation
       """

print('string1 plain=' + repr(string1))
print('string1 inspect.cleandoc=' + repr(inspect.cleandoc(string1)))
print('string1 textwrap.dedent=' + repr(textwrap.dedent(string1)))
print('string2 plain=' + repr(string2))
print('string2 inspect.cleandoc=' + repr(inspect.cleandoc(string2)))
print('string2 textwrap.dedent=' + repr(textwrap.dedent(string2)))

Output

string1 plain='\nString\nwith\nno indentation\n       '
string1 inspect.cleandoc='String\nwith\nno indentation\n       '
string1 textwrap.dedent='\nString\nwith\nno indentation\n'
string2 plain='\n        String\n        with\n        indentation\n       '
string2 inspect.cleandoc='String\nwith\nindentation'
string2 textwrap.dedent='\nString\nwith\nindentation\n'

- codeforester

1

If I understand the question correctly, this is one way to solve it. lstrip() removes leading whitespace, so it strips both tabs and spaces.

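def dedent(message):
    # Strip all leading whitespace (spaces and tabs) from every line. Join
    # with "\n": in-memory strings use "\n", and newline translation happens
    # when the text is written out, so os.linesep is not needed here.
    return "\n".join(line.lstrip() for line in message.splitlines())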

Example:

name='host'
config_file='/Users/nmellor/code/cold_fusion/end-to-end/config/stage.toml'
message = f"""Missing env var or configuration entry for 'host'. 
              Please add '{name}' entry to file
              {config_file}
              or export environment variable 'mqtt_{name}' before
              running the program.
           """

>>> print(message)
Missing env var or configuration entry for 'host'. 
              Please add 'host' entry to file
              /Users/nmellor/code/cold_fusion/end-to-end/config/stage.toml
              or export environment variable 'mqtt_host' before
              running the program.

>>> print(dedent(message))
Missing env var or configuration entry for 'host'. 
Please add 'host' entry to file
/Users/nmellor/code/cold_fusion/end-to-end/config/stage.toml
or export environment variable 'mqtt_host' before
running the program.

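The solution above removes all indentation. If you want to remove only the indentation common to the whole multi-line string, use textwrap.dedent(). Note, however, that dedent() only removes a margin shared by every non-blank line: if the first line starts right after the opening """, it has no indentation and nothing is removed. (A whitespace-only last line, such as the one holding the closing quotes, is ignored.)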
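To illustrate the difference, here is a sketch comparing the line-by-line lstrip() approach with textwrap.dedent() on text whose lines are indented by different amounts. The `usage` text and the name `dedent_all` are invented for the example:

```python
import textwrap


def dedent_all(message):
    # The approach above: strip *all* leading whitespace from every line.
    return "\n".join(line.lstrip() for line in message.splitlines())


# The backslash after the opening quotes keeps the first line indented
# like the others, so dedent() can find the common 4-space margin.
usage = """\
    Usage:
        prog --verbose FILE
    Done.
"""

print(textwrap.dedent(usage))  # "prog --verbose FILE" keeps 4 spaces
print(dedent_all(usage))       # every line starts at column 0
```

dedent() is the better fit when relative indentation inside the text matters; stripping every line is simpler when it does not.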

- Nic

1

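The only way I see is to strip the first n tabs from each line, starting with the second line, where n is the known indentation of the main method.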

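If that indentation isn't known in advance, you could add a trailing newline to the string before inserting it, and then strip the number of tabs found on the last line...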
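One possible reading of the two suggestions above, as a Python sketch applied to the string value that main() ends up with. Both helper names are hypothetical, and the second one assumes the user's string ended with a newline, so its last line holds only the inserted indentation:

```python
def strip_inserted_indent(s: str, n: int, indent: str = "\t") -> str:
    """Hypothetical helper: remove n copies of `indent` from the start of
    every line after the first, leaving lines without that prefix alone."""
    prefix = indent * n
    lines = s.split("\n")
    for i in range(1, len(lines)):
        if lines[i].startswith(prefix):
            lines[i] = lines[i][len(prefix):]
    return "\n".join(lines)


def strip_inferred_indent(s: str, indent: str = "\t") -> str:
    """Hypothetical helper: count how many copies of `indent` the last line
    holds (assumed to be only the inserted indentation) and strip that many
    from every line after the first."""
    last = s.split("\n")[-1]
    n = 0
    while indent and last.startswith(indent * (n + 1)):
        n += 1
    return strip_inserted_indent(s, n, indent)


# What main() would see after the editor added one tab per line.
print(repr(strip_inserted_indent("foo\n\tbar\n\tfoo2", 1)))  # 'foo\nbar\nfoo2'
print(repr(strip_inferred_indent("foo\n\tbar\n\tfoo2\n\t")))  # 'foo\nbar\nfoo2\n'
```

The catch is that this has to run inside the user's code, on every affected string, which is why fixing the indentation step itself is usually preferable.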

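A third solution is to parse the data, find where each multi-line quote starts, and not add your indentation to each line until that quote closes.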
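The third suggestion could be sketched with the standard library's tokenize module instead of a hand-written parser; that substitution is mine, not the answerer's. The helper name `indent_outside_strings` is hypothetical, and the sketch only looks at STRING tokens, so multi-line f-strings on Python 3.12+ (which tokenize as separate FSTRING_* pieces) are not handled:

```python
import io
import tokenize


def indent_outside_strings(source: str, indent: str = "\t") -> str:
    """Indent each line of `source`, except lines that begin inside a
    multi-line string literal, so the literal's value is left unchanged."""
    inside_string = set()  # 1-based numbers of lines that start inside a string
    readline = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(readline):
        # A STRING token that ends on a later line than it starts is a
        # multi-line literal; every line after its first belongs to it.
        if tok.type == tokenize.STRING and tok.end[0] > tok.start[0]:
            inside_string.update(range(tok.start[0] + 1, tok.end[0] + 1))
    # Split the same way readline() did, so line numbers line up.
    lines = io.StringIO(source).readlines()
    return "".join(
        line if lineno in inside_string else indent + line
        for lineno, line in enumerate(lines, start=1)
    )


user_script = 's = """foo\nbar\nfoo2"""\nreturn s\n'
namespace = {}
exec("def main():\n" + indent_outside_strings(user_script), namespace)
print(repr(namespace["main"]()))  # 'foo\nbar\nfoo2'
```

Since lines that begin inside a string literal are never indented, the string's value survives the wrapping. tokenize raises an exception on incomplete input such as an unterminated string, which an editor would need to report to the user.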

I think there is a better solution, though..

- Mikhail Churbanov

Thanks for the reply. So you're suggesting I strip the indentation that was inserted into each line? I'm a bit confused... - Mike

0

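I had a similar problem: I wanted my triple-quoted string to be indented, but I didn't want all those spaces at the start of each line. I used re to solve it: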

import re

# Remove every run of spaces that follows a newline, so the MIME headers
# start at column 0 even though they are indented in the source.
print(re.sub('\n *', '\n', f"""Content-Type: multipart/mixed; boundary="===============9004758485092194316=="
    MIME-Version: 1.0
    Subject: Get the reader's attention here!
    To: recipient@email.com

    --===============9004758485092194316==
    Content-Type: text/html; charset="us-ascii"
    MIME-Version: 1.0
    Content-Transfer-Encoding: 7bit

    Very important message goes here - you can even use <b>HTML</b>.
    --===============9004758485092194316==--
"""))

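In the code above, the source stays indented, but the string itself is effectively trimmed: all spaces at the start of each line are removed. This matters because any whitespace in front of SMTP- or MIME-specific lines will break the email message.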

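The trade-off I made was to leave Content-Type on the first line, because the regex I used doesn't remove an initial \n (which would break the email). If that bothered me, I suppose I could add an lstrip like this: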

print(re.sub('\n *', '\n', f"""
    Content-Type: ...
""").lstrip())

After reading this 10-year-old thread, I decided to stick with re.sub, since I didn't really understand all the nuances of textwrap and inspect.

- Mark

0

There's a simpler way:

    foo = """first line\
             \nsecond line"""
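What the trick above actually stores, as a minimal example: the continuation line's leading spaces end up at the end of the first line of the value. The variable name `foo` is reused from the snippet; the 4-space indent is arbitrary:

```python
# Inside a triple-quoted string, a backslash at the end of a line is a line
# continuation: the newline is removed, but the next line's leading spaces
# stay in the value, right after "first line".
foo = """first line\
    \nsecond line"""

print(repr(foo))
```

So the source looks aligned, but the first line of the value carries trailing spaces, and every line break has to be written as an explicit \n.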

- Kostia

This requires you to add the newlines manually, and it adds the indentation spaces to the end of the previous line. - bohrax

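Not sure what the problem is with adding "\n". If you're formatting from scratch, it's easy to add, and adding extra symbols to user input or extracted text isn't a problem either. And nothing is added to lines that end with "\". Maybe it doesn't fit every use case, but for me it works much better than anything else I could find. - Kostia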

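It does add the indentation spaces (at the end of the line). More importantly, it doesn't solve the original problem, since the data comes from the user. - bohrax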

-15

If I understand correctly, you take the user input, indent it properly, and add it to the rest of your program (then run that whole program).

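So after you put the user input into your program, you could run a regex that basically reverts that forced indentation. Something like: within triple quotes, replace every "new line marker" followed by four spaces (or a tab) with just a "new line marker".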

- FlorianH

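Yes, exactly. That's the only possible solution I came up with. Not sure why I didn't go with it at the time... If nothing better comes up, I think I may have to do this. - Mike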

25

@thraxil's suggestion to use textwrap.dedent is the right way to go. Consider changing your accepted answer. - Chris Calo

3

@ChrisCalo @bbenne10's answer is even better. - user2297550

Content provided by Stack Overflow; see the original English post.

- thraxil · Accepted Answer

162

textwrap.dedent from the standard library is there to automatically undo the wacky indentation.
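A minimal example of dedent() in use, including the first-line pitfall Scott H points out in the comments and the backslash fix he suggests. The function and variable names are only for illustration:

```python
import textwrap


def main():
    # Text starts right after the opening quotes: the first line has no
    # indentation, so there is no common margin and dedent() changes nothing.
    bad = textwrap.dedent("""foo
        bar
        foo2""")

    # Escaping the first newline with a backslash makes all three lines
    # share the same 8-space margin, which dedent() then removes.
    good = textwrap.dedent("""\
        foo
        bar
        foo2""")
    return bad, good


bad, good = main()
print(repr(bad))   # unchanged: 'bar' and 'foo2' keep their 8 spaces
print(repr(good))  # 'foo\nbar\nfoo2'
```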

- thraxil

18

The standard library never ceases to surprise. - thraxil

34

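Note that if the first line starts as """foo, then the first line lacks the leading indentation that the other lines have, and dedent won't do anything. If you wait until the next line to start foo and escape the first newline like this: """\, it works fine. - Scott H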

4

To address the shortcoming @ScottH mentions, see my answer regarding inspect.cleandoc. - bbenne10