How do I decode this string in Python?

Question

I downloaded a dataset of my Facebook messages, and the text looks like this:

f\u00c3\u00b8rste student

This should read "første student", but I can't seem to decode it correctly.

I tried:

s = 'f\u00c3\u00b8rste student'
print(s)
# fÃ¸rste student

s = 'f\u00c3\u00b8rste student'
print(s.encode('utf-8'))
# b'f\xc3\x83\xc2\xb8rste student'

Neither attempt worked.
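It can help to reproduce the corruption to see what went wrong. The sketch below assumes (the question does not confirm it) that the export wrote "første student" as UTF-8 bytes and those bytes were later read back as Latin-1; the variable names are illustrative only.

```python
# Assumption: UTF-8 bytes were mis-decoded as Latin-1 somewhere in the export.
original = "første student"
utf8_bytes = original.encode("utf-8")     # b'f\xc3\xb8rste student'
mojibake = utf8_bytes.decode("latin-1")   # 'fÃ¸rste student'

print(utf8_bytes)
print(mojibake)
print(mojibake == "f\u00c3\u00b8rste student")  # True

# Encoding the mojibake as UTF-8 (the second attempt) just adds another layer:
# each of the two bad characters becomes two bytes of its own.
print(mojibake.encode("utf-8"))  # b'f\xc3\x83\xc2\xb8rste student'
```

This is why `.encode('utf-8')` alone cannot help: it re-encodes the wrong characters instead of recovering the original bytes.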

- vhflat

'ø' is '\u00f8' - timgeb

@Rafael That doesn't help: # -*- coding: utf-8 -*- only declares the encoding of the source file itself. - quant

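@Prune This isn't a plain UTF-8 encoding question. The problem is that several different things can stand for "ø": \u00f8 is the character itself, but the byte pair \xC3\xB8 (the UTF-8 encoding of ø) is another. Once you see that, the answer is obvious. - quant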
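The distinction in the comment can be checked directly. This is a small illustrative sketch (the names `correct` and `mojibake` are made up here): the real "ø" is one code point, while the broken text holds two code points whose values happen to equal the two UTF-8 bytes of "ø".

```python
correct = "\u00f8"          # 'ø', a single code point
mojibake = "\u00c3\u00b8"   # 'Ã¸', two separate code points

print([hex(ord(c)) for c in correct])   # ['0xf8']
print([hex(ord(c)) for c in mojibake])  # ['0xc3', '0xb8']

# The UTF-8 encoding of 'ø' is exactly the bytes whose values match
# the mojibake's code points.
print(correct.encode("utf-8"))          # b'\xc3\xb8'
print(bytes(ord(c) for c in mojibake))  # b'\xc3\xb8'
```

Because the ordinals already equal the wanted bytes, any codec that maps code points 0 to 255 one-to-one onto bytes can recover them.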

Sorry, I've reopened it. - Prune

Possible duplicate of a question about badly encoded Facebook JSON. - snakecharmerb

1 Answer

- jwodder · Accepted Answer

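When each character's code point is really a byte value that was mis-decoded, you need to turn every character back into the byte with the same ordinal. ISO-8859-1 (Latin-1) does exactly that for code points U+0000 to U+00FF, so encode with it first and then decode the resulting bytes as UTF-8 to undo the mis-decoding that probably happened: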

>>> 'f\u00c3\u00b8rste student'.encode('iso-8859-1').decode('utf-8')
'første student'
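To apply the same fix to a whole export, one option is to walk the parsed JSON and repair every string. This is a minimal sketch, assuming every string in the file was corrupted the same way; `fix_mojibake` is a hypothetical helper, and the `sender_name` and `content` keys in the sample are assumed for illustration, not taken from the question.

```python
import json


def fix_mojibake(value):
    """Recursively undo UTF-8-read-as-Latin-1 corruption in parsed JSON.

    Hypothetical helper: strings that cannot be repaired are returned unchanged.
    """
    if isinstance(value, str):
        try:
            return value.encode("latin-1").decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            # Code points above U+00FF, or bytes that are not valid UTF-8:
            # assume the string was never corrupted and keep it as is.
            return value
    if isinstance(value, list):
        return [fix_mojibake(item) for item in value]
    if isinstance(value, dict):
        return {fix_mojibake(k): fix_mojibake(v) for k, v in value.items()}
    return value  # numbers, booleans, None


# The JSON text contains \u00c3\u00b8 escapes, as in the question.
raw = '{"sender_name": "f\\u00c3\\u00b8rste student", "content": "hej"}'
data = fix_mojibake(json.loads(raw))
print(data["sender_name"])  # første student
```

This is a heuristic: a string that was correct but happens to form valid UTF-8 once Latin-1-encoded (for example a genuine "Ã¸") would be altered, so it is only safe when the whole file is known to be affected.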