How to read input from stdin and enforce an encoding?

Question

3

The goal is to read continuously from stdin and enforce UTF-8 encoding in both Python 2 and Python 3. I have tried the following:

#!/usr/bin/env python

from __future__ import print_function, unicode_literals
import io
import sys

# Supports Python2 read from stdin and Python3 read from stdin.buffer
# https://dev59.com/5IDba4cB1Zd3GeqPASR5#23932488
user_input = getattr(sys.stdin, 'buffer', sys.stdin)


# Enforcing utf-8 in Python3
# https://dev59.com/_mQn5IYBdhLWcg3wzZ36#16549381
with io.TextIOWrapper(user_input, encoding='utf-8') as fin:
    for line in fin:
        # Read the input line by line
        # and do something with it, e.g. just print the line.
        print(line)

This code works in Python 3, but in Python 2 io.TextIOWrapper cannot wrap the stdin file object and raises the following error:

Traceback (most recent call last):
  File "testfin.py", line 12, in <module>
    with io.TextIOWrapper(user_input, encoding='utf-8') as fin:
AttributeError: 'file' object has no attribute 'readable'

That is because in Python 3, user_input (i.e. sys.stdin.buffer) is an _io.BufferedReader object, which has a readable attribute:

<class '_io.BufferedReader'>

['__class__', '__del__', '__delattr__', '__dict__', '__dir__', '__doc__', '__enter__', '__eq__', '__exit__', '__format__', '__ge__', '__getattribute__', '__getstate__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__lt__', '__ne__', '__new__', '__next__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '_checkClosed', '_checkReadable', '_checkSeekable', '_checkWritable', '_dealloc_warn', '_finalizing', 'close', 'closed', 'detach', 'fileno', 'flush', 'isatty', 'mode', 'name', 'peek', 'raw', 'read', 'read1', 'readable', 'readinto', 'readinto1', 'readline', 'readlines', 'seek', 'seekable', 'tell', 'truncate', 'writable', 'write', 'writelines']

In Python 2, user_input is a file object, which has no readable attribute:

<type 'file'>

['__class__', '__delattr__', '__doc__', '__enter__', '__exit__', '__format__', '__getattribute__', '__hash__', '__init__', '__iter__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'close', 'closed', 'encoding', 'errors', 'fileno', 'flush', 'isatty', 'mode', 'name', 'newlines', 'next', 'read', 'readinto', 'readline', 'readlines', 'seek', 'softspace', 'tell', 'truncate', 'write', 'writelines', 'xreadlines']

- alvas

2 Answers

-1

Have you tried forcing UTF-8 encoding in Python? You can do it like this:

import sys
reload(sys)
sys.setdefaultencoding('utf-8')

- sancelot

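The point is to avoid setting the environment, so that the script supports both Python 2 and Python 3. Also, reloading sys to change the default encoding is discouraged =( - alvas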

3

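This globally affects the implicit conversions between Unicode and ASCII. It is a terrible idea, because libraries have been built that rely on non-ASCII data raising exceptions, and this change breaks that expectation. - Martijn Pieters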

- lenz · Accepted Answer

If you don't need a full-fledged io.TextIOWrapper, but just a decoded stream for reading, you can use codecs.getreader() to create a decoding wrapper:

import codecs

# user_input is the binary stdin stream defined in the question
reader = codecs.getreader('utf8')(user_input)
for line in reader:
    # do whatever you need...
    print(line)

codecs.getreader('utf8') creates a factory for a codecs.StreamReader, which is then instantiated with the original stream.
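The two steps described above (obtain the factory, then instantiate it) can be sketched separately. This is only a minimal illustration: fake_stdin is a hypothetical in-memory stand-in for the binary stdin stream, not something from the original code.

```python
import codecs
import io

# Hypothetical stand-in for the binary stdin stream (user_input in the
# script above); the bytes are UTF-8 encoded text.
fake_stdin = io.BytesIO(u"h\u00e9llo\nw\u00f6rld\n".encode("utf-8"))

# Step 1: look up the StreamReader factory for the codec.
reader_factory = codecs.getreader("utf8")

# Step 2: instantiate the factory around the byte stream.
reader = reader_factory(fake_stdin)

# Iterating the reader yields decoded text lines, line endings included.
decoded_lines = [line for line in reader]
```

Each decoded line keeps its trailing newline, so passing it straight to print() produces an extra blank line unless the line is stripped first.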

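I'm not sure whether a StreamReader supports the with context, but that might not be strictly necessary (there is no need to close STDIN after reading, I guess...).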

I have used this solution successfully in situations where the underlying stream offered only a very limited interface.

Update (2nd version)

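From the comments, it became clear that you actually need an io.TextIOWrapper to get proper line buffering etc. in interactive mode; codecs.StreamReader only works for piped input and the like.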

Using this answer, I was able to get interactive input to work properly:

#!/usr/bin/env python
# coding: utf8

from __future__ import print_function, unicode_literals
import io
import sys

user_input = getattr(sys.stdin, 'buffer', sys.stdin)

with io.open(user_input.fileno(), encoding='utf8') as f:
    for line in f:
        # do whatever you need...
        print(line)

This creates an io.TextIOWrapper with an enforced encoding from the binary STDIN buffer.
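To check that io.open() really returns an io.TextIOWrapper when given a file descriptor, here is a small sketch that does not depend on the real STDIN. The pipe and the sample text are assumptions made purely for illustration.

```python
import io
import os

# A pipe stands in for STDIN here (an assumption for illustration).
read_fd, write_fd = os.pipe()
os.write(write_fd, u"caf\u00e9\nna\u00efve\n".encode("utf-8"))
os.close(write_fd)  # closing the write end signals end of input

# io.open() accepts an integer file descriptor and wraps it in a
# text layer that decodes with the requested encoding.
with io.open(read_fd, encoding="utf8") as f:
    wrapper_type = type(f)
    lines = list(f)
```

Note that with the default closefd=True, leaving the with block also closes the underlying descriptor; passing closefd=False to io.open() keeps it open, which may matter if STDIN is read again later.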