Python experts:
I have a sentence like this:
"this time air\u00e6\u00e3o was filled\u00e3o"
I want to remove the non-ASCII Unicode characters.
I can do this with either of the following functions:
def remove_non_ascii(text):
    return ''.join(i for i in text if ord(i) < 128)

def removeNonAscii(s):
    return "".join(filter(lambda x: ord(x) < 128, s))

sentence = "this time air\u00e6\u00e3o was filled\u00e3o"
sentence = removeNonAscii(sentence)
print(sentence)
It then prints "this time airo was filledo", nicely removing the "\u00.." characters. But when I write this sentence to a file, then read the file back and loop over its lines:
def removeNonAscii(s):
    return "".join(filter(lambda x: ord(x) < 128, s))

hand = open('test.txt')
for sentence in hand:
    sentence = removeNonAscii(sentence)
    print(sentence)
it prints "this time air\u00e6\u00e3o was filled\u00a3o", so the function does nothing at all. What is going on? If the function works, this shouldn't happen...
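A likely explanation, hedged because the file's exact contents aren't shown: if test.txt holds the escape text itself (a backslash, then u, 0, 0, e, 6), every one of those characters is ASCII, so the filter correctly keeps all of them. The sketch below writes such a line using a raw string, shows that the filter leaves it unchanged, and then turns the escapes into real characters with Python's `unicode_escape` codec before filtering. The file name `escaped.txt` and the helper `decode_escapes` are hypothetical, introduced only for illustration.

```python
import codecs
import os
import tempfile

def remove_non_ascii(text):
    # Keep only code points 0-127.
    return ''.join(ch for ch in text if ord(ch) < 128)

def decode_escapes(text):
    # Hypothetical helper: interpret literal "\u00e6"-style sequences
    # (six ASCII characters each) as the single characters they name.
    return codecs.decode(text, 'unicode_escape')

# A raw string writes the backslash sequences to the file verbatim,
# which is what the file in the question appears to contain.
path = os.path.join(tempfile.mkdtemp(), 'escaped.txt')
with open(path, 'w', encoding='ascii') as f:
    f.write(r"this time air\u00e6\u00e3o was filled\u00e3o" + "\n")

with open(path, encoding='ascii') as f:
    line = f.readline().rstrip('\n')

print(repr(line))              # repr shows the backslashes are really there
print(remove_non_ascii(line))  # unchanged: every character is ASCII
print(remove_non_ascii(decode_escapes(line)))  # prints: this time airo was filledo
```

`unicode_escape` only suits lines that are pure ASCII escape text: it also interprets other sequences such as `\n`, and non-ASCII characters already present in the line would come out mangled.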
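\u00e6 in a string literal is something very different from what it looks like in the file. Try writing the sentence to the file from code and then reading it back: open('test.txt', 'w').write("air\u00e6\u00e3o"), or something similar. - Nick T
codecs.open(). - martineau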