Python HTML编码 \xc2\xa0

19

我对这个问题已经苦苦挣扎了一段时间。我试图将字符串写入HTML,但在清理它们后格式出现了问题。以下是一个示例:

paragraphs = ['Grocery giant and household name Woolworths is battered and bruised. ', 
'But behind the problems are still the makings of a formidable company']

x = str(" ")
for item in paragraphs:
    x = x + str(item)
x

输出:

"Grocery giant and household name\xc2\xa0Woolworths is battered and\xc2\xa0bruised. 
But behind the problems are still the makings of a formidable\xc2\xa0company"

期望的输出:

"Grocery giant and household name Woolworths is battered and bruised. 
But behind the problems are still the makings of a formidable company"

我希望您能够解释为什么会发生这种情况,以及我如何解决。提前感谢!


2
你有检查源字符串中是否存在异常的Unicode空格吗? - Nathan Tuggy
1个回答

34

\xc2\xa0表示的是0xC2 0xA0,也被称为

不间断空格

它是UTF-8编码中一种不可见的控制字符。想了解更多关于它的信息,请查看维基百科:https://en.wikipedia.org/wiki/Non-breaking_space

我复制了你在问题中粘贴的内容,并得到了预期的输出。


8
谢谢。问题解决了。我添加了以下代码: x.replace("\xc2\xa0", " ") - Sam Perry

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接