ValueError: could not convert string to float: id

Question

python string floating-point type-conversion valueerror

111

I'm running the following Python script:

#!/usr/bin/python

import os,sys
from scipy import stats
import numpy as np

f=open('data2.txt', 'r').readlines()
N=len(f)-1
for i in range(0,N):
    w=f[i].split()
    l1=w[1:8]
    l2=w[8:15]
    list1=[float(x) for x in l1]
    list2=[float(x) for x in l2]
    result=stats.ttest_ind(list1,list2)
    print(result[1])

However, I get the following error:

ValueError: could not convert string to float: id

This confuses me. When I try the same thing for just one line in an interactive session, instead of using the script's for loop:

>>> from scipy import stats
>>> import numpy as np
>>> f=open('data2.txt','r').readlines()
>>> w=f[1].split()
>>> l1=w[1:8]
>>> l2=w[8:15]
>>> list1=[float(x) for x in l1]
>>> list1
[5.3209183842, 4.6422726719, 4.3788135547, 5.9299061614, 5.9331108706, 5.0287087832, 4.57...]

It works fine.

Can anyone explain this? Thank you.

- LookIntoEast

3

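When reading a DataFrame from a CSV file, a conversion such as df = df[['p']].astype({'p': float}) can raise ValueError: could not convert string to float:. If the CSV file contains cells that hold only whitespace, pandas does not treat them as NaN. You need to overwrite those blank cells with NaN using df = df.replace(r'^\s*$', np.nan, regex=True). - Alfred Wallace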

12 Answers

36

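My mistake was very simple: the text file containing the data had some whitespace (and therefore invisible) characters on the last line.

In the output of grep, I got "45 " (with a trailing space) instead of just "45".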
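
One way to make such characters visible from Python is repr(), which prints spaces, tabs, and newlines as explicit escape sequences. The snippet below is only a sketch in Python 3; sample_line is a made-up stand-in for the last line of the data file.

```python
sample_line = "45 \n"  # made-up line: a trailing space before the newline

# repr() shows the trailing space and the newline explicitly.
line_repr = repr(sample_line)
print(line_repr)

# Splitting on a single space leaves a token that holds only the newline.
tokens = sample_line.split(" ")
print([repr(t) for t in tokens])

# split() with no argument splits on any run of whitespace and drops empty tokens.
clean_tokens = sample_line.split()
print(clean_tokens)
```

Splitting with split() rather than split(" ") is usually the safer default for whitespace-separated numeric data, because it never produces empty or whitespace-only tokens.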

- Sopalajo de Arrierez

2

Spaces and tabs are visible; it is end-of-line characters such as \n and \r that are invisible. - Oleg Melnikov

I guess most people are aware of Lib/re.py and .replace(' ', '') by now. - Ole Aldric

23

This error message is quite explicit:

ValueError: could not convert string to float: id

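Somewhere in your text file, a line contains the word id, which can't be converted to a number.

Your test code works because the word id isn't present on line 2.

If you want to catch that line, try this code. I cleaned your code up a bit: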

#!/usr/bin/python

import os, sys
from scipy import stats
import numpy as np

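for index, line in enumerate(open('data2.txt', 'r').readlines()):
    # split() with no argument handles tabs and repeated spaces, as in the question
    w = line.split()
    l1 = w[1:8]
    l2 = w[8:15]

    try:
        # list() keeps this working on Python 3, where map() is lazy
        list1 = list(map(float, l1))
        list2 = list(map(float, l2))
    except ValueError:
        print('Line {i} is corrupt!'.format(i=index))
        break

    result = stats.ttest_ind(list1, list2)
    print(result[1])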

- Blender

18

If you have a Pandas DataFrame with a numeric column whose values contain commas, you can use the following code:

df["Numbers"] = [float(str(i).replace(",", "")) for i in df["Numbers"]]

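As a result, a value such as 4,200.42 is converted to the float 4200.42.

Bonus 1: This is fast.

Bonus 2: It is more space-efficient if you save the DataFrame in a format such as Apache Parquet.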
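
The same idea works without pandas for data read line by line, as in the question. This is a minimal sketch in Python 3; parse_number is a hypothetical helper, and it assumes commas are thousands separators rather than decimal commas.

```python
def parse_number(text):
    """Convert a string such as '4,200.42' to a float, ignoring comma separators."""
    # Assumption: every comma is a thousands separator, never a decimal comma.
    return float(text.replace(",", ""))


values = [parse_number(s) for s in ["4,200.42", "17", "1,000,000"]]
print(values)
```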

- Contango

8

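Perhaps your numbers aren't really numbers, but letters masquerading as numbers?

In my case, the font I was using made "l" and "1" look very similar. I had a string like 'l1919' that I thought was '11919', and that messed things up.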
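
A quick way to spot such look-alikes is to list the characters in a token that cannot be part of a number. The following is only a sketch in Python 3: non_numeric_chars is a hypothetical helper, and it assumes plain decimal literals, so it does not account for values such as nan or inf that float() also accepts.

```python
def non_numeric_chars(token):
    """Return the characters in token that cannot appear in a plain decimal float."""
    # Assumption: only digits, a sign, a decimal point, and an exponent marker are valid.
    allowed = set("0123456789+-.eE")
    return [ch for ch in token if ch not in allowed]


suspects = non_numeric_chars("l1919")   # the look-alike letter shows up here
clean = non_numeric_chars("11919")      # nothing suspicious in a real number
print(suspects, clean)
```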

- Tom Roth

7

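Your data may not be what you expect; it seems you're expecting floats but not getting them. A simple way to find out where this happens is to add a try/except block to the for loop: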

for i in range(0,N):
    w=f[i].split()
    l1=w[1:8]
    l2=w[8:15]
    try:
        list1=[float(x) for x in l1]
        list2=[float(x) for x in l2]
    except ValueError as e:
        # report the error in some way that is helpful, e.g. print the line index
        print("error on line {}: {}".format(i, e))
        continue  # list1/list2 are not valid for this line, so skip the t-test
    result=stats.ttest_ind(list1,list2)
    print(result[1])
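
To see every problem line in one pass instead of stopping at the first, the same try/except can collect the failures. This is a sketch in Python 3 under stated assumptions: parse_line and find_bad_lines are hypothetical helpers, the column slices are copied from the question, and the sample rows are made up.

```python
def parse_line(line):
    """Split a line and convert columns 1-7 and 8-14 to two lists of floats."""
    w = line.split()
    return [float(x) for x in w[1:8]], [float(x) for x in w[8:15]]


def find_bad_lines(lines):
    """Return (index, message) pairs for every line that float() rejects."""
    bad = []
    for i, line in enumerate(lines):
        try:
            parse_line(line)
        except ValueError as e:
            bad.append((i, str(e)))
    return bad


# Made-up sample: a header-like row containing the word "id", then a numeric row.
sample = [
    "probe " + " ".join(["id"] * 14),
    "g1 " + " ".join(str(k + 0.5) for k in range(14)),
]
bad_lines = find_bad_lines(sample)
print(bad_lines)
```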

- Matt Fenwick

5

The shortest way:

df["id"] = df['id'].str.replace(',', '').astype(float) - if commas are the problem

df["id"] = df['id'].str.replace(' ', '').astype(float) - if spaces are the problem

- João Vitor Gomes

2

Replace empty-string values with 0.0. This works if you know which non-float values can occur.

df.loc[df['score'] == '', 'score'] = 0.0


df['score']=df['score'].astype(float)
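
If 0.0 would distort later statistics, one alternative (the approach suggested in the comment under the question) is to mark blank cells as NaN instead. A minimal sketch, assuming a made-up score column whose blanks are empty or whitespace-only strings:

```python
import numpy as np
import pandas as pd

# Made-up data: two blank cells, one empty and one holding only a space.
df = pd.DataFrame({"score": ["1.5", "", " ", "2.25"]})

# Replace cells that are empty or whitespace-only with NaN, then convert.
cleaned = df.replace(r"^\s*$", np.nan, regex=True)
cleaned["score"] = cleaned["score"].astype(float)

print(cleaned["score"].tolist())
```

NaN values are skipped by default in pandas aggregations such as mean(), whereas substituted zeros would be counted.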

- Ramesh Ponnusamy

1

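I solved a similar situation with basic pandas techniques. First, load the CSV or text file with pandas. It's pretty simple.

data = pd.read_csv('link to the file')

Then set the index of the data to the column that needs to be changed. For example, if your data has ID as an attribute or column, set the index to ID.

data = data.set_index("ID")

Then use the following command to drop all rows whose ID value is "id" rather than a number.

data = data.drop("id", axis=0)

Hope this helps.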

- Kapilfreeman

0

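In pandas

This error (or a very similar one) typically appears when changing the dtype of a pandas column from object to float with astype() or apply(). The cause is that the column contains non-numeric strings that cannot be converted to float. One fix is to use pd.to_numeric() instead and pass errors='coerce'. This replaces non-numeric values (such as the literal string 'id') with NaN.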

df = pd.DataFrame({'col': ['id', '1.5', '2.4']})

df['col'] = df['col'].astype(float)                     # <---- ValueError: could not convert string to float: 'id'
df['col'] = df['col'].apply(lambda x: float(x))         # <---- ValueError

df['col'] = pd.to_numeric(df['col'], errors='coerce')   # <---- OK
#                                    ^^^^^^^^^^^^^^^ <--- converts non-numbers to NaN


0    NaN
1    1.5
2    2.4
Name: col, dtype: float64

pd.to_numeric() works on one column at a time, so if you need to change the dtype of several columns at once (as you would with .astype(float)), passing it to apply() should do the job.

df = pd.DataFrame({'col1': ['id', '1.5', '2.4'], 'col2': ['10.2', '21.3', '20.6']})
df[['col1', 'col2']] = df.apply(pd.to_numeric, errors='coerce')


   col1  col2
0   NaN  10.2
1   1.5  21.3
2   2.4  20.6

Sometimes the values contain thousands-separator commas, which cause a similar error.

ValueError: could not convert string to float: '2,000.4'

In that case, removing them before calling pd.to_numeric() solves the problem.

df = pd.DataFrame({'col': ['id', '1.5', '2,000.4']})
df['col'] = df['col'].replace(regex=',', value='')
#                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^  <--- remove commas
df['col'] = pd.to_numeric(df['col'], errors='coerce')


0       NaN
1       1.5
2    2000.4
Name: col, dtype: float64

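In scikit-learn

This error is also raised when you fit data containing strings to a model that expects numeric data. Examples include the various scalers, such as StandardScaler(). In that case, the solution is to convert the text input to numeric input via one-hot or label encoding. Below is an example in which the string input is one-hot encoded first and then passed to the scaler.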

from sklearn.preprocessing import StandardScaler, OneHotEncoder
data = [['a'], ['b'], ['c']]
sc = StandardScaler().fit(data)  # <--- ValueError: could not convert string to float: 'a'


data = OneHotEncoder().fit_transform(data).toarray()
sc = StandardScaler().fit(data)  # <--- OK

- cottontail


- Anurag Uniyal · Accepted Answer

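Obviously some of your lines don't have valid float data; specifically, some line contains the text id, which can't be converted to float.

When you tried it in the interactive prompt you tried only a single line (f[1]), so the best approach is to print the line where the error occurs so that you know which line is wrong, for example: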

#!/usr/bin/python

import os,sys
from scipy import stats
import numpy as np

f=open('data2.txt', 'r').readlines()
N=len(f)-1
for i in range(0,N):
    w=f[i].split()
    l1=w[1:8]
    l2=w[8:15]
    try:
        list1=[float(x) for x in l1]
        list2=[float(x) for x in l2]
    except ValueError as e:
        print("error {} on line {}".format(e, i))
        continue  # no valid lists for this line, so skip the t-test
    result=stats.ttest_ind(list1,list2)
    print(result[1])
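
Note that the script's loop starts at f[0], while the interactive test used f[1]. If the offending line turns out to be a header row at the top of the file, skipping it avoids the error entirely. The following is a sketch in Python 3 under that assumption: read_groups is a hypothetical helper, and the in-memory text is a made-up stand-in for data2.txt.

```python
import io


def read_groups(handle, skip_header=True):
    """Yield (list1, list2) of floats for each data line of an open text file."""
    lines = iter(handle)
    if skip_header:
        next(lines, None)  # assumption: the first line is a header, not data
    for line in lines:
        w = line.split()
        if not w:
            continue  # ignore blank lines, e.g. a trailing empty line
        yield [float(x) for x in w[1:8]], [float(x) for x in w[8:15]]


# Made-up stand-in for data2.txt; with a real file use: with open('data2.txt') as fh
fake_file = io.StringIO(
    "probe " + " ".join(["id"] * 14) + "\n"
    + "g1 " + " ".join(["1.0"] * 7 + ["2.0"] * 7) + "\n"
    + "\n"
)
groups = list(read_groups(fake_file))
print(groups)
```

Each yielded pair can then be passed to stats.ttest_ind as in the original script.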