Detecting a file name with Unicode characters on Windows

Question

5

Python version: 2.7.3

File name: test snowman character --☃--.mp3

I ran the following tests, but none of them succeeded.

>>> os.path.exists('test snowman character --☃--.mp3')
False
>>> os.path.exists(repr('test snowman character --☃--.mp3'))
False
>>> os.path.isfile('test snowman character --\\xe2\\x98\\x83--.mp3')
False
>>> os.path.isfile(r'test snowman character --\\xe2\\x98\\x83--.mp3')
False
>>> os.path.isfile('test snowman character --☃--.mp3'.decode('utf-8'))
False

I also tried to retrieve the file with glob, but even that test failed.

The goal is to detect this file and copy it to another folder. Please advise.

- Karthikeyan S

1

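What does os.listdir(u'.') tell you is in the current directory? - Martijn Pieters

Note: escaping the UTF-8 byte sequence won't work here, but plain UTF-8 won't work either, because the Windows NTFS filesystem uses UTF-16. Give Python a unicode path value here; your last version would only work if the Unicode snowman had really been entered into the terminal as UTF-8. - Martijn Pieters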

3 Answers

1

The Windows NTFS filesystem uses UTF-16 encoding (just ask Martijn Pieters), so try the following:

>>> os.path.exists(u'test snowman character --☃--.mp3'.encode("UTF-16"))

But first make sure the interpreter's input encoding is correct. print repr(u'test snowman character --☃--.mp3') should output:

u'test snowman character --\u2603--.mp3'

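Note: I can't test this, because the Windows CMD won't let me enter the snowman character. In any case, if you just give Python a Unicode string it will do the right thing, so the encode call is redundant. All in all, I recommend Martijn Pieters' answer.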

- Hubro

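The output would have a single backslash before the u. - Martijn Pieters

@MartijnPieters: I copied it straight from the interpreter. Since the output representation is a "string within a string", the backslash has to be escaped. - Hubro

You used a bare repr() in the interpreter, and the interpreter then applied repr() again. Normally you would omit the explicit repr() in the interpreter, or use print repr(); the point is to get a round-trippable representation, something that can be reused as a Unicode literal. The extra backslash makes the value non-reusable, and between the question and the other answer there is already enough confusion about when to escape the escapes. - Martijn Pieters

@MartijnPieters: OK, fair enough. Edited. - Hubro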
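
To illustrate why the encode call in the answer above is redundant, here is a small sketch of what .encode("UTF-16") actually produces. It only inspects the resulting bytes and does not touch the filesystem; the variable names are made up for this example.

```python
import codecs

name = u'test snowman character --\u2603--.mp3'
encoded = name.encode('UTF-16')

# The generic 'UTF-16' codec prepends a byte order mark (native byte order).
has_bom = encoded.startswith(codecs.BOM_UTF16)

# Every ASCII character becomes two bytes, one of which is NUL.
has_nul = b'\x00' in encoded

# Decoding gives back the original Unicode text.
round_trip = encoded.decode('UTF-16')

print(repr(encoded[:8]))
print(has_bom, has_nul, round_trip == name)
```

The result is a byte string with a BOM and embedded NUL bytes, which is not the same thing as the Unicode path value the comments recommend passing to os.path.exists().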

0

Unicode string literals should start with u'. Try os.path.exists(u'test snowman character --☃--.mp3')

If you want to use escape sequences, that would be ur', e.g. os.path.isfile(ur'test snowman character --\\xe2\\x98\\x83--.mp3')

http://docs.python.org/2.7/reference/lexical_analysis.html#strings

- SpliFF

1

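.decode('utf8') would already have taken care of that case. - Martijn Pieters

A raw Unicode string preserves the backslashes. The UTF-8 "bytes" are then taken literally rather than as code points. Use \u2603 in a Unicode string instead. - Martijn Pieters

What .decode() does depends on whether the default encoding of the source document is correct. The default encoding is usually ASCII, but your string is not ASCII (at least not valid ASCII). See: https://wiki.python.org/moin/DefaultEncoding and http://docs.python.org/2/howto/unicode.html. - SpliFF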
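
The distinction made in the comments above can be checked directly. The following is a minimal sketch with illustrative variable names; raw_style is written with doubled backslashes so that it holds the same twelve characters a raw literal r'\xe2\x98\x83' would.

```python
snowman = u'\u2603'               # one code point: U+2603 SNOWMAN
utf8_bytes = b'\xe2\x98\x83'      # the same character encoded as UTF-8
raw_style = u'\\xe2\\x98\\x83'    # literal backslashes, as in a raw string

# Decoding the UTF-8 bytes yields the snowman code point.
same_after_decode = utf8_bytes.decode('utf-8') == snowman

# The raw-style text is just backslashes, letters and digits.
raw_is_snowman = raw_style == snowman

print(len(snowman), len(utf8_bytes), len(raw_style))
print(same_after_decode, raw_is_snowman)
```

Only the \u2603 escape (or correctly decoded UTF-8 bytes) gives the single character that is in the file name.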

- Martijn Pieters · Accepted Answer

Use a Unicode value, preferably with a Unicode escape sequence:

os.path.isfile(u'test snowman character --\u2603--.mp3')

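When you give it a Unicode path, Python on Windows will use the correct Windows API to list UTF-16 file names.

For more on how Python changes its behavior between Unicode and byte-string file paths, see the Python Unicode HOWTO.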
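
For the stated goal of detecting the file and copying it elsewhere, a rough sketch along these lines may help. The helper names copy_matching and as_text are hypothetical and not part of any library, and the demo uses throwaway temporary directories instead of the asker's real folders.

```python
import os
import shutil
import sys
import tempfile


def as_text(path):
    """Return path as Unicode text (already the case on Python 3)."""
    if isinstance(path, bytes):
        return path.decode(sys.getfilesystemencoding() or 'utf-8')
    return path


def copy_matching(src_dir, dst_dir, needle):
    """Copy files in src_dir whose name contains needle into dst_dir."""
    copied = []
    # A Unicode directory argument makes os.listdir() return Unicode names.
    for name in os.listdir(src_dir):
        src_path = os.path.join(src_dir, name)
        if needle in name and os.path.isfile(src_path):
            shutil.copy(src_path, os.path.join(dst_dir, name))
            copied.append(name)
    return copied


# Demo: create an empty file with the snowman name, then find and copy it.
src_dir = as_text(tempfile.mkdtemp())
dst_dir = as_text(tempfile.mkdtemp())
name = u'test snowman character --\u2603--.mp3'
open(os.path.join(src_dir, name), 'wb').close()

copied = copy_matching(src_dir, dst_dir, u'\u2603')
print(copied == [name], os.path.isfile(os.path.join(dst_dir, name)))
```

Every path is kept as Unicode text from start to finish, so no manual encoding to UTF-8 or UTF-16 is needed.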