What is the difference between str(u'a') and u'a'.encode('utf-8') in Python?

Question


As the title says, is there a reason not to use str() to convert a unicode string to str?

>>> str(u'a')
'a'
>>> str(u'a').__class__
<type 'str'>
>>> u'a'.encode('utf-8')
'a'
>>> u'a'.encode('utf-8').__class__
<type 'str'>
>>> u'a'.encode().__class__
<type 'str'>

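Update: thanks for the answer. I also didn't know that if I create a string with a special character, it is automatically stored as UTF-8 bytes.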

>>> a = '€'
>>> a.__class__
<type 'str'>
>>> a
'\xe2\x82\xac'

Also, in Python 3 it is a Unicode object.

- James Lin

1 Answer


- Mark Byers · Accepted Answer

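When you write str(u'a'), Python 2 converts the Unicode string to a byte string using the default encoding, which (unless you have gone to the trouble of changing it) is ASCII.

The second version explicitly encodes the string as UTF-8.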
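For pure-ASCII text the two encodings produce the same bytes, which is why both calls in the question print 'a'. The following is a minimal sketch in Python 3 syntax, not the Python 2 session above: the variable names are illustrative, and the explicit "ascii" argument stands in for the implicit default encoding that Python 2's str() would use.

```python
import sys

text = "a"  # ASCII-only text

# Explicit ASCII encoding stands in for what Python 2's str(u'a') does implicitly.
as_ascii = text.encode("ascii")
as_utf8 = text.encode("utf-8")

# UTF-8 is a superset of ASCII, so the resulting bytes are identical here.
same_bytes = as_ascii == as_utf8
print(same_bytes)

# In Python 3 the default encoding is UTF-8, not ASCII as in Python 2.
print(sys.getdefaultencoding())
```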

The difference becomes more apparent if you try it with a string containing non-ASCII characters. The second version still works:

>>> u'€'.encode('utf-8')
'\xe2\x82\xac'

The first version raises an exception:

>>> str(u'€')
Traceback (most recent call last):
  File "", line 1, in 
    str(u'€')
UnicodeEncodeError: 'ascii' codec can't encode character u'\u20ac' in position 0: ordinal not in range(128)
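The same contrast can be reproduced without relying on an implicit default. This is a small sketch in Python 3 syntax, where "ascii" is passed explicitly as an assumed stand-in for Python 2's default encoding; the variable names are illustrative only.

```python
text = "\u20ac"  # the euro sign, written as an escape so the source file encoding does not matter

# Explicit UTF-8 encoding works for any Unicode character.
utf8_bytes = text.encode("utf-8")
print(utf8_bytes)

# Encoding with ASCII fails, mirroring what str(u'€') does in Python 2.
try:
    text.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError as exc:
    ascii_failed = True
    print(exc)
```

Naming the encoding in the call makes the result independent of interpreter configuration, which is the practical argument for encode('utf-8') over str().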