How do I do sprintf-style formatting with bytes objects in Python 3?

Question


python python-3.x templates incompatibility

10

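I want to do sprintf-style formatting in Python 3 with raw bytes objects, without having to do any manual conversions for the %s to work. That is: take a bytes object as a "template", plus any number of objects of any type, and return a rendered bytes object. This is how Python 2's sprintf-style % operator has always worked.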

b'test %s %s %s' % (5, b'blah','strblah') # python3 ==> error
Traceback (most recent call last):
  File "<input>", line 1, in <module>
TypeError: %b requires bytes, or an object that implements __bytes__, not 'int'

def to_bytes(arg):
    if hasattr(arg,'encode'): return arg.encode()
    if hasattr(arg,'decode'): return arg
    return repr(arg).encode()

def render_bytes_template(btemplate : bytes, *args):
    return btemplate % tuple(map(to_bytes,args))

render_bytes_template(b'this is how we have to write raw strings with unknown-typed arguments? %s %s %s',5,b'blah','strblah')

# output: b'this is how we have to write raw strings with unknown-typed arguments? 5 blah strblah'

But in Python 2, it is simply built in:

'example that just works %s %s %s' % (5,b'blah',u'strblah')
# output: 'example that just works 5 blah strblah'

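Is there a way to do this in Python 3 with the same performance as Python 2? Please tell me if I'm missing something. The fallback here is to implement it in Cython (or is there a library for Python 3 that helps with this?), but I still don't see why this was removed from the standard library, other than to avoid the implicit encoding of string objects. Can't we just add a bytes method like format_any()?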

By the way, it's not as simple as a cop-out like this:

def render_bytes_template(btemplate : bytes, *args):
    return (btemplate.decode() % args).encode()

Not only do I want to avoid any unnecessary encoding/decoding, but bytes arguments also get repr'd instead of being inserted raw.
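To make the repr problem concrete, here is a minimal sketch comparing the decode/encode round trip with bytes %-formatting on Python 3.5+. The names template, arg, via_str and via_bytes are illustrative only and do not come from the question.

```python
# Python 3: formatting a bytes argument into a str template inserts its repr,
# so the b'...' wrapper ends up inside the rendered output.
template = b'value: %s'
arg = b'blah'

via_str = (template.decode() % (arg,)).encode()  # round trip through str
via_bytes = template % (arg,)                    # bytes %-formatting (Python 3.5+)

print(via_str)    # b"value: b'blah'"
print(via_bytes)  # b'value: blah'
```

The round trip leaves the literal text b'blah' in the result, which is why it is not a drop-in replacement for the Python 2 behaviour.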

- parity3

1

Note that Python 3 now protects you from bugs that lurked below the waterline in Python 2. Try your test with 'unicode: %s' % (u'Ünîcódæ',), for example. - Martijn Pieters

2 Answers

1

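Would something like this work for you? You just need to make sure that whenever you start with a bytes object, you wrap it in the new bytes-like B type, which overloads the % and %= operators: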

class B(bytes):
    def __init__(self, template):
        self._template = template

    @staticmethod
    def to_bytes(arg):
        if hasattr(arg,'encode'): return arg.encode()
        if hasattr(arg,'decode'): return arg
        return repr(arg).encode()

    def __mod__(self, other):
        # Treat str and bytes as a single argument, not as an iterable of items.
        if hasattr(other, '__iter__') and not isinstance(other, (str, bytes, bytearray)):
            ret = self._template % tuple(map(self.to_bytes, other))
        else: 
            ret = self._template % self.to_bytes(other)
        return ret

    def __imod__(self, other):
        return self.__mod__(other)

a = B(b'this %s good')
b = B(b'this %s %s good string')
print(a % 'is')
print(b % ('is', 'a'))

a = B(b'this %s good')
a %= 'is'
b = B(b'this %s %s good string')
b %= ('is', 'a')
print(a)
print(b)

This outputs:

b'this is good'
b'this is a good string'
b'this is good'
b'this is a good string'

- mattjegan

1

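Honestly, I don't know whether my question is more of a complaint or an honest question about a design that gets in the way of performance. Thanks for your contribution. If nobody else answers within a week, I'll award you the bounty. - parity3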

I think it's a fair question, but I'm not sure what the performance cost is compared to .format or f-strings. - mattjegan

1

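.format and f-strings require a decode(), so they will be slower. I read in another post that working in Unicode is roughly half as fast as working in bytes. So it's not terrible, but for many workloads where you just want to compose bytes from other bytes it makes a difference, and the answer of processing all input before composing is a major rework. Using six or other helpers doesn't solve the performance degradation either. I know you want to be explicit, but note that print() accepts both bytes and Unicode (so that's not entirely true). - parity3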

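This will break on Unicode strings. Going by the example above, it handles Unicode strings that don't actually contain any Unicode characters, but not the general case. - danny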

- danny · Accepted Answer

I want to do sprintf-style formatting in Python 3 with raw bytes objects, without having to do any manual conversions for the %s to work.

For this to work, all of the formatting arguments need to be bytes as well.

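This differs from Py2, where even Unicode strings could be formatted into a byte string. However, the Py2 implementation is error-prone as soon as Unicode strings containing Unicode characters are introduced.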

For example, in Python 2:

In [1]: '%s' % (u'é',)
Out[1]: u'\xe9'

Technically this is correct, but it is not what the developer intended. It also takes no account of whatever encoding is in use.

In Python 3, however:

In [2]: '%s' % ('é',)
Out[2]: 'é'

To format byte strings, use byte-string arguments (Py3.5+ only):

b'%s %s' % (b'blah', 'strblah'.encode('utf-8'))
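The explicit .encode() call is where the encoding decision is made. As a rough illustration (the variable names text, utf8, latin1 and error are made up for this sketch), the same character renders to different bytes depending on the chosen encoding, and passing the str directly is rejected rather than guessed at:

```python
text = '\u00e9'  # the character 'é'

# The caller chooses the encoding; the rendered bytes differ accordingly.
utf8 = b'%s' % (text.encode('utf-8'),)
latin1 = b'%s' % (text.encode('latin-1'),)
print(utf8)    # b'\xc3\xa9'
print(latin1)  # b'\xe9'

# A str argument is refused instead of being implicitly encoded.
error = None
try:
    b'%s' % (text,)
except TypeError as exc:
    error = exc
print(type(error).__name__)  # TypeError
```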

Other types, such as integers, also need to be converted to byte strings before they can be used with %s.
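As a side note, and assuming Python 3.5 or later (where PEP 461 added % formatting to bytes), numeric conversion codes such as %d and %f accept numbers directly, so only %s needs a pre-converted argument. A small sketch with illustrative names:

```python
count = 5

with_d = b'%d items' % count                        # numeric code, no conversion needed
with_s = b'%s items' % str(count).encode('ascii')   # %s requires bytes, so convert first
with_f = b'%.2f' % 3.14159                          # float codes work as well
with_a = b'%a' % 'strblah'                          # %a inserts the ASCII repr of any object

print(with_d)  # b'5 items'
print(with_s)  # b'5 items'
print(with_f)  # b'3.14'
print(with_a)  # b"'strblah'"
```

Note that %a keeps the quotes from the repr, so it is not a substitute for encoding a str explicitly.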