Reading a UTF-8 character from a byte stream

Question

python-3.x utf-8 utf8-decode

5

Given a stream of bytes (a generator, a file, etc.), how can I read a single UTF-8 encoded character?

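This operation must consume the bytes of that character from the stream.
This operation must not consume any bytes of the stream beyond the first character.
This operation should succeed on any Unicode character.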

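I could do this by writing my own UTF-8 decoding function, but I would prefer not to reinvent the wheel, since I'm sure this functionality already exists somewhere else for parsing UTF-8 strings.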

- arcyqwerty

1 Answer

- Kevin · Accepted Answer

Wrap the stream in a TextIOWrapper with encoding='utf8', then call .read(1) on it.

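This assumes you started with a BufferedIOBase or something duck-type compatible with it (i.e. something that has a read() method). If you have a generator or iterator, you may need to adapt the interface.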
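
One way to adapt the interface, sketched here as an assumption rather than as part of the original answer, is a small io.RawIOBase subclass that pulls byte values from an iterator on demand. The class name IterStream is hypothetical, and the sketch assumes the iterator yields ints in the range 0-255 (which is what iterating over a bytes object gives you). It never pulls more items from the iterator than the caller asks for.

```python
import io


class IterStream(io.RawIOBase):
    """Hypothetical adapter: expose an iterator of byte values as a raw stream."""

    def __init__(self, iterable):
        super().__init__()
        self._it = iter(iterable)

    def readable(self):
        return True

    def readinto(self, buf):
        # Fill buf with at most len(buf) byte values taken from the iterator.
        count = 0
        for i in range(len(buf)):
            try:
                buf[i] = next(self._it)
            except StopIteration:
                break
            count += 1
        return count  # 0 signals end of stream


source = iter('hé'.encode('utf-8'))
stream = IterStream(source)
first_byte = stream.read(1)  # pulls exactly one item from the iterator
```

A generator that yields bytes chunks instead of single ints would need a readinto() that keeps track of leftover bytes between calls.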

Example:

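from io import TextIOWrapper

with open('/path/to/file', 'rb') as f:
    wf = TextIOWrapper(f, encoding='utf-8')
    # Private attribute: an implementation detail that may not work everywhere.
    # It limits how many bytes the wrapper pulls from f per read.
    wf._CHUNK_SIZE = 1

    char = wf.read(1)      # gives the next UTF-8 encoded character (as str)
    next_byte = f.read(1)  # gives the next byte after that character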
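
Since _CHUNK_SIZE is a private attribute, a variant that does not rely on it may be useful. The sketch below is not from the original answer: it feeds the stream to an incremental decoder from the standard codecs module one byte at a time and stops as soon as a character comes out. The helper name read_utf8_char is hypothetical, and the only assumption about the stream is that read(1) returns a bytes object (empty at end of stream).

```python
import codecs
import io


def read_utf8_char(stream):
    """Read one UTF-8 character from stream; return '' at end of stream."""
    decoder = codecs.getincrementaldecoder('utf-8')()
    while True:
        byte = stream.read(1)
        if not byte:
            # End of stream: raises UnicodeDecodeError if a multi-byte
            # sequence was cut off halfway.
            decoder.decode(b'', final=True)
            return ''
        char = decoder.decode(byte)
        if char:
            return char


stream = io.BytesIO('€a'.encode('utf-8'))
first = read_utf8_char(stream)  # consumes the three bytes of the euro sign
rest = stream.read()            # the byte for 'a' is still in the stream
```

The decoder buffers incomplete sequences internally, so no byte beyond the end of the first character is ever requested from the stream.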