Python: ValueError: could not convert string to float: '0'

Question

For some schoolwork, I am trying to use pyplot to draw scientific graphs from data produced by Logger Pro. However, I get this error:

ValueError: could not convert string to float: '0'

Here is the program:

plot.py
-------------------------------
import matplotlib.pyplot as plt 
import numpy as np

infile = open('text', 'r')

xs = []
ys = []

for line in infile:
    print (type(line))
    x, y = line.split()
    # print (x, y)
    # print (type(line), type(x), type(y))

    xs.append(float(x))
    ys.append(float(y))

xs.sort()
ys.sort()

plt.plot(xs, ys, 'bo')
plt.grid(True)

# print (xs, ys)

plt.show()

infile.close()

The input file contains the following:

text
-------------------------------
0 1.33
1 1.37
2 1.43
3 1.51
4 1.59
5 1.67
6 1.77
7 1.86
8 1.98
9 2.1

The error message I get when I run the program is:

Traceback (most recent call last):
  File "\route\to\the\file\plot01.py", line 36, in <module>
    xs.append(float(x))
ValueError: could not convert string to float: '0'

- Emil Lykke Diget

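It seems I destroyed the evidence while editing. The BOM bytes no longer appear in the post, not even in the original revision. Thanks, Stack! (ahem). - Martijn Pieters

That's good, I guess... - Emil Lykke Diget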

1 Answer

- Martijn Pieters · Accepted Answer

Your data file contains a UTF-8 BOM; this is what my Python 2 interactive session shows for the string that is being converted to a float:

>>> '0'
'\xef\xbb\xbf0'

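The \xef\xbb\xbf bytes are the UTF-8 encoding of U+FEFF ZERO WIDTH NO-BREAK SPACE, which is commonly used as a byte order mark, especially by Microsoft products. UTF-8 has no byte order issue, so the mark is not needed to record the byte order the way it is for UTF-16 or UTF-32; instead, Microsoft uses it to detect the encoding.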
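
To make the failure concrete, here is a minimal Python 3 sketch of what happens when a line that starts with a BOM is decoded as plain UTF-8 and then split. The sample line and the try_float helper are made up for illustration; they are not part of the original program.

```python
import codecs

# The UTF-8 BOM is the three bytes EF BB BF.
bom_bytes = codecs.BOM_UTF8

# Decoded as plain UTF-8, those bytes become the single character U+FEFF.
bom_char = bom_bytes.decode('utf-8')

# Hypothetical first line of a data file that was saved with a BOM.
raw_line = bom_bytes + b'0 1.33\n'
x, y = raw_line.decode('utf-8').split()


def try_float(text):
    """Return float(text), or None if the conversion raises ValueError."""
    try:
        return float(text)
    except ValueError:
        return None


print(repr(x))       # U+FEFF is not whitespace, so it stays attached to the '0'
print(try_float(x))  # the hidden character makes the conversion fail
print(try_float(y))  # the second column converts normally
```

Because U+FEFF is invisible in most terminals, the error message appears to complain about a perfectly ordinary '0'; printing repr(x) is what exposes the extra character.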

In Python 3, you can open the file with the utf-8-sig codec; this codec expects a BOM at the start and removes it:

infile = open('text', 'r', encoding='utf-8-sig')
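
As a self-contained check of that approach, the sketch below writes a small file with a leading BOM to a temporary location and reads it back with utf-8-sig. The temporary file and its three sample rows are assumptions made for the example; in the real program you would pass the path of the Logger Pro export instead.

```python
import os
import tempfile

# Hypothetical sample data written with a leading BOM, as some editors do.
with tempfile.NamedTemporaryFile('wb', suffix='.txt', delete=False) as tmp:
    tmp.write(b'\xef\xbb\xbf0 1.33\n1 1.37\n2 1.43\n')
    path = tmp.name

xs, ys = [], []
try:
    # utf-8-sig drops the BOM if it is present and otherwise acts like utf-8.
    with open(path, 'r', encoding='utf-8-sig') as infile:
        for line in infile:
            x, y = line.split()
            xs.append(float(x))
            ys.append(float(y))
finally:
    # Clean up the temporary file created for this example.
    os.remove(path)

print(xs)
print(ys)
```

Since utf-8-sig also reads files without a BOM correctly, it is a reasonable default when you do not control how the input file was saved.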

In Python 2, you can use the codecs.BOM_UTF8 constant to detect and remove the BOM:

import codecs

for line in infile:
    # Strip the UTF-8 BOM if it is present at the start of the line
    if line.startswith(codecs.BOM_UTF8):
        line = line[len(codecs.BOM_UTF8):]
    x, y = line.split()

As the codecs documentation explains:

As UTF-8 is an 8-bit encoding no BOM is required and any U+FEFF character in the decoded string (even if it’s the first character) is treated as a ZERO WIDTH NO-BREAK SPACE.

Without external information it’s impossible to reliably determine which encoding was used for encoding a string. Each charmap encoding can decode any random byte sequence. However that’s not possible with UTF-8, as UTF-8 byte sequences have a structure that doesn’t allow arbitrary byte sequences. To increase the reliability with which a UTF-8 encoding can be detected, Microsoft invented a variant of UTF-8 (that Python 2.5 calls "utf-8-sig") for its Notepad program: Before any of the Unicode characters is written to the file, a UTF-8 encoded BOM (which looks like this as a byte sequence: 0xef, 0xbb, 0xbf) is written. As it’s rather improbable that any charmap encoded file starts with these byte values (which would e.g. map to
LATIN SMALL LETTER I WITH DIAERESIS
RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
INVERTED QUESTION MARK
in iso-8859-1), this increases the probability that a utf-8-sig encoding can be correctly guessed from the byte sequence. So here the BOM is not used to be able to determine the byte order used for generating the byte sequence, but as a signature that helps in guessing the encoding. On encoding the utf-8-sig codec will write 0xef, 0xbb, 0xbf as the first three bytes to the file. On decoding utf-8-sig will skip those three bytes if they appear as the first three bytes in the file. In UTF-8, the use of the BOM is discouraged and should generally be avoided.
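
The encoding and decoding behaviour described in that passage can be checked with a short Python 3 sketch. The sample string is arbitrary and only serves as an illustration.

```python
text = '0 1.33'

# Encoding with utf-8-sig prepends the three BOM bytes; plain utf-8 does not.
encoded_sig = text.encode('utf-8-sig')
encoded_plain = text.encode('utf-8')

# Decoding with utf-8-sig skips a leading BOM ...
decoded_sig = encoded_sig.decode('utf-8-sig')
# ... while plain utf-8 keeps it as the character U+FEFF.
decoded_plain = encoded_sig.decode('utf-8')

# utf-8-sig also accepts input that has no BOM at all.
decoded_no_bom = encoded_plain.decode('utf-8-sig')

# Read as iso-8859-1, the BOM bytes become the three characters the
# documentation lists.
as_latin1 = encoded_sig[:3].decode('iso-8859-1')

print(encoded_sig)
print(repr(decoded_sig), repr(decoded_plain), repr(decoded_no_bom))
print(as_latin1)
```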