UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 8: ordinal not in range(128)

Question


I get this error on this line:

logger.debug(u'__call__ with full_name={}, email={}'.format(full_name, email))

Why?

The content of the name variable is Gonçalves.

- quant
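To see where the message comes from, the sketch below (Python 3, illustrative value) performs explicitly the step that Python 2 performs implicitly when a byte str is formatted into a unicode template: decoding the bytes with the default ASCII codec. The UTF-8 encoding of "ç" starts with the byte 0xc3, which lies outside ASCII's 128 code points. The reported position is simply the index of the first non-ASCII byte in the string being decoded, so it differs from the one in the question.

```python
# Python 3 sketch: reproduce the implicit ASCII decode that Python 2 performs
# when a byte string is formatted into a unicode template.
name_bytes = 'Gonçalves'.encode('utf-8')  # b'Gon\xc3\xa7alves'

error = None
try:
    name_bytes.decode('ascii')
except UnicodeDecodeError as exc:
    error = exc
    print(exc)  # 'ascii' codec can't decode byte 0xc3 in position 3: ordinal not in range(128)

# Decoding with the codec the bytes were actually written in succeeds.
print(name_bytes.decode('utf-8'))  # Gonçalves
```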


This is because the logger only accepts UTF-8 encoded characters, so you can't log "ç". - Maxxik CZ


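Possible duplicate of UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 20: ordinal not in range(128) - Legorooj

This looks closer to "UnicodeDecodeError when logging an exception in Python". The logger can handle Unicode, but the console might not. - FiddleStix

@FiddleStix The suggested solution to that question is to use Unicode strings, which I'm already doing. - quant

@Legorooj I don't see how the two are the same. More generally, I find the documentation linked in the answer very unhelpful (I got confused halfway through and gave up). Is there a simple explanation of what is going on and how to fix it? - quant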


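As @MaxxikCZ said, Unicode strings don't support some characters. Are the full_name and email variables already Unicode? If not, convert them to Unicode before processing them. This may (quite likely will) raise an error somewhere else, but it will be easier to catch. - Legorooj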
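The comments above disagree about whether the logger itself can handle "ç". As a minimal check (a sketch assuming Python 3; the logger name and the email value are made up for illustration), the snippet below logs the question's message to an in-memory stream so that no console encoding is involved. The text comes through unchanged, which suggests the failure in the question happens while the message is being built by .format(), not inside logging.

```python
import io
import logging

# Send records to an in-memory text stream instead of the console,
# so no terminal encoding is involved.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter('%(message)s'))

logger = logging.getLogger('unicode_demo')  # hypothetical logger name
logger.setLevel(logging.DEBUG)
logger.addHandler(handler)
logger.propagate = False  # keep the record away from the root logger's handlers

full_name, email = 'Gonçalves', 'someone@example.com'  # illustrative values
logger.debug('__call__ with full_name={}, email={}'.format(full_name, email))

print(stream.getvalue())
```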

3 Answers


This should fix your problem:

full_name, email = [unicode(x, 'utf-8') for x in [full_name, email]]

logger.debug(u'__call__ with full_name={}, email={}'.format(full_name, email))

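The problem is that Python 2 decodes byte strings to unicode with the default encoding, ASCII, which only supports 128 characters. Decoding explicitly with UTF-8 solves this.

Disclaimer: this may be wrong in the details; I only write py3 code and spent about 5 minutes learning all of this.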

- Legorooj

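This seems to have fixed my problem. Thanks. - quant

@quant Glad I could help. I admit I don't understand the documentation linked in the possible duplicate either; it took me four years to learn how to make sense of confusing documentation. - Legorooj

@quant In Python 3, str and unicode are the same thing, and UTF-8 is the default encoding. - Legorooj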
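A Python 3 counterpart of the unicode(x, 'utf-8') fix from the first answer might look like the sketch below. to_text is a hypothetical helper, and UTF-8 is assumed to be the encoding of any incoming bytes; in Python 3 the values are usually already str, so only data arriving as bytes needs decoding.

```python
import logging


def to_text(value, encoding='utf-8'):
    """Decode bytes with the given encoding; return anything else unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Illustrative inputs: one value arrives as UTF-8 bytes, the other as text.
full_name = 'Gonçalves'.encode('utf-8')
email = 'someone@example.com'

# Same shape as the answer's list comprehension, with decoding made explicit.
full_name, email = [to_text(x) for x in (full_name, email)]
message = '__call__ with full_name={}, email={}'.format(full_name, email)
logging.getLogger(__name__).debug(message)
```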


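I'm digging up this old thread to propose a solution that adds a context filter to the logger, which ensures that every string passed to the logger is a Unicode string when using Python 2.x.

TL;DR: see the end of the post for a ready-to-use solution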

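# -*- coding: utf-8 -*-
# (the coding line is needed on Python 2 for the 'Café' literal used below)
import logging
import sys


# First, let's create a failsafe function that converts byte strings to unicode
def safe_string_convert(string):
    """
    Decodes byte strings for hacky UTF-8 logging in Python 2.7
    """
    # unicode objects and non-string messages (e.g. exceptions) pass through as-is;
    # on Python 2, bytes is an alias of str
    if not isinstance(string, bytes):
        return string
    try:
        return string.decode('utf8')
    except UnicodeDecodeError:
        try:
            return string.decode('unicode-escape')
        except Exception:
            try:
                return string.decode('latin1')
            except Exception:
                return b"String cannot be decoded. Passing it as binary blob" + bytes(string)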


# Create a logger context filter class that fixes encoding for Python 2

class ContextFilterWorstLevel(logging.Filter):
    """
    This class converts strings passed to the logger to unicode.
    It can also change the default logging output or record events.
    """

    def __init__(self):
        # This form works on both Python 2 and 3 and runs logging.Filter.__init__
        super(ContextFilterWorstLevel, self).__init__()
        self._worst_level = logging.INFO

    def filter(self, record):
        # type: (logging.LogRecord) -> bool
        """
        A filter can change the default log output.
        This one records the worst log level seen so far
        and, on Python 2, converts the message to unicode.
        """
        if record.levelno > self._worst_level:
            self._worst_level = record.levelno
        # Examples
        # record.msg = f'{record.msg}'.encode('ascii', errors='backslashreplace')
        # When using this filter, something can be added to logging.Formatter like '%(something)s'
        # record.something = 'value'
        # Python 2.7 compat fixes
        if sys.version_info[0] < 3:
            record.msg = safe_string_convert(record.msg)
        return True

#####
# Now let's create a new logger and try it
#####

log_filter = ContextFilterWorstLevel()
logger = logging.getLogger()

# Remove earlier handlers if any exist
while logger.handlers:
    logger.handlers.pop()

# Add a console handler and lower the level so the test message is shown
logger.addHandler(logging.StreamHandler())
logger.setLevel(logging.INFO)

# Add context filter
logger.addFilter(log_filter)

# Test
logger.info('Café non unicode string')
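The same filter idea can be sketched for Python 3, where the symptom is different: logging does not decode a bytes message but renders it with str(), so the output would contain its repr (b'Caf\xc3\xa9'). The class name, the logger name, and the assumption that incoming bytes are UTF-8 are illustrative choices, not part of the answer's code.

```python
import io
import logging


class DecodeBytesFilter(logging.Filter):
    """Replace bytes messages with text decoded as UTF-8 (assumed encoding)."""

    def filter(self, record):
        if isinstance(record.msg, bytes):
            # 'replace' substitutes U+FFFD for undecodable bytes instead of raising.
            record.msg = record.msg.decode('utf-8', errors='replace')
        return True  # never drop the record, only rewrite it


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter('%(message)s'))

logger = logging.getLogger('bytes_demo')  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False
logger.addFilter(DecodeBytesFilter())  # logger-level filters run before handlers

logger.info('Café non unicode string'.encode('utf-8'))
logger.info('already text')
print(stream.getvalue())
```

Note that a filter added to a logger only sees records created through that logger; records propagated up from child loggers bypass it, so attaching the filter to a handler is an alternative when that matters.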

Ready-to-use solution: the ofunctions.logger_utils package. Install it with pip install ofunctions.logger_utils.

Usage:

from ofunctions.logger_utils import logger_get_logger

logger = logger_get_logger(log_file='somepath')
logger.info('Café non unicode')

Hope this makes life easier for people backporting to Python 2.x.

- Orsiris de Jong


- FiddleStix · Accepted Answer

The problem is that full_name is a byte string (str), not a unicode object.

# -*- coding: UTF-8 -*-
import logging

logging.basicConfig()
logger = logging.getLogger()
logger.warning('testing')

# unicode.format(str) raises an error
name = 'Gonçalves'
print type(name)
print name
try:
    message = u'{}'.format(name)
except UnicodeDecodeError as e:
    print e

# but logger(unicode) is fine
logging.warning(u'Gonçalves')

# so unicode.format(str.decode()) doesn't raise
name = 'Gonçalves'
print type(name)
print name
message = u'{}'.format(name.decode('utf-8'))
logging.warning(message)


# and neither does unicode.format(unicode)
name = u'Gonçalves'
print type(name)
print name
message = u'{}'.format(name)
logging.warning(message)
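For comparison, here is a Python 3 sketch of the same experiment. Formatting bytes into a str template does not raise there; it silently embeds the bytes' repr, so decoding explicitly is still the fix. The name variable below is UTF-8 bytes standing in for a Python 2 byte string.

```python
name = 'Gonçalves'.encode('utf-8')  # stands in for a Python 2 byte string

# No exception, but the result is the repr of the bytes, not the name.
embedded = '{}'.format(name)
print(embedded)  # b'Gon\xc3\xa7alves'

# Decoding first, as in the last example above, gives the intended text.
decoded = '{}'.format(name.decode('utf-8'))
print(decoded)  # Gonçalves
```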