Unicode literals that work in Python 3 and 2

Question

Tags: python, python-3.x, unicode, python-2.x, unicode-literals

39

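I have a Python script that I'd like to be able to run on both Python 3.2 and 2.7, for convenience.

Is there a way to write Unicode literals that works in both versions? For example: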

#coding: utf-8
whatever = 'שלום'

In Python 2.x, the code above needs a Unicode string (u''). In Python 3.x, however, that little u causes a syntax error.

- ubershmekel

@ubershmekel, which solution would you recommend? Yours or the accepted answer? - satoru

I'd recommend u'', since it is supported in Python 3.3. - ubershmekel

2 Answers

0

In 3.0, 3.1 and 3.2:

from __future__ import unicode_literals

Source: provided by ubershmekel in the question. See revision 4 for the original.
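
As a rough illustration of the import above, here is a minimal sketch. It assumes the import is also available on Python 2.6+ (where it turns plain literals into Unicode strings) and is accepted without effect on Python 3. The names `greeting` and `text_type` are made up for this example.

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals

import sys

# Hypothetical example value; any non-ASCII literal would do.
greeting = 'שלום'

# On Python 2 the text type is `unicode`; on Python 3 it is `str`.
# The name `unicode` is only evaluated when running on Python 2.
text_type = str if sys.version_info[0] >= 3 else unicode  # noqa: F821

assert isinstance(greeting, text_type)
assert len(greeting) == 4  # four code points, not the UTF-8 byte count
```

Note that the `__future__` import has to come before any other statement in the module, and that it changes every plain string literal in that file.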

- wjandrea

- Lennart Regebro · Accepted Answer

Edit - Since Python 3.3, the u'' literal is available again, so the u() function is no longer needed.
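
For a script that only needs to support Python 2.7 and 3.3 or later, that could look like the following minimal sketch. It reuses the literal from the question, and the escape-sequence comparison is added here purely as a check.

```python
# -*- coding: utf-8 -*-
# The u prefix is accepted by Python 2.x and again by Python 3.3+;
# on Python 3.0 to 3.2 this line would be a syntax error.
whatever = u'שלום'

# Same string written with escapes: shin, lamed, vav, final mem.
assert whatever == u'\u05e9\u05dc\u05d5\u05dd'
assert len(whatever) == 4
```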

The best option is to write a function that turns string objects into Unicode objects on Python 2 but leaves them alone on Python 3 (where they are already Unicode).

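import sys

# Compare the numeric major version rather than the version string.
if sys.version_info[0] < 3:
    import codecs

    def u(x):
        # Python 2: decode escapes such as \u00dc in a byte string to unicode.
        return codecs.unicode_escape_decode(x)[0]
else:
    def u(x):
        # Python 3: str is already Unicode, so return it unchanged.
        return x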

Then you can use it like this:

>>> print(u('\u00dcnic\u00f6de'))
Ünicöde
>>> print(u('\xdcnic\N{Latin Small Letter O with diaeresis}de'))
Ünicöde