unicode()
函数:
>>> s = " foo “bar bar ” weasel"
>>> s.encode('utf-8', 'ignore')
Traceback (most recent call last):
File "<pyshell#8>", line 1, in <module>
s.encode('utf-8', 'ignore')
UnicodeDecodeError: 'ascii' codec can't decode byte 0x93 in position 5: ordinal not in range(128)
>>> unicode(s)
Traceback (most recent call last):
File "<pyshell#9>", line 1, in <module>
unicode(s)
UnicodeDecodeError: 'ascii' codec can't decode byte 0x93 in position 5: ordinal not in range(128)
我在这里大多数时间都是摸索。我需要做什么来从字符串中删除不安全的字符?
与此问题有些相关,但我无法从中解决我的问题。
这也失败了:
>>> s
' foo \x93bar bar \x94 weasel'
>>> s.decode('utf-8')
Traceback (most recent call last):
File "<pyshell#13>", line 1, in <module>
s.decode('utf-8')
File "C:\Python25\254\lib\encodings\utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x93 in position 5: unexpected code byte
str
具有encode
函数,以及“编码”参数是否指定了结果的编码还是输入的编码。你到底想在这里做什么? - Thanatosu'my unicode str'.encode('ascii','xmlcharrefreplace')
。 - 4Z4T4R