Error in pandas: TypeError: object of type 'float' has no len()

3

I have a pandas DataFrame df:

import numpy as np
import pandas as pd

df = pd.DataFrame({"ID": [2,3,4,5,6,7,8,9,10],
      "type" :["A", "B", "B", "A", "A", "B", "A", "A", "A"],
      "F_ID" :["0", "[7 8 9]", "[10]", "0", "[2]", "0", "0", "0", "0"]})

# convert the string representations of list structures to actual lists
F_ID_as_series_of_lists = df["F_ID"].str.replace("[","").str.replace("]","").str.split(" ")

#type(F_ID_as_series_of_lists) is pd.Series, make it a list for pd.DataFrame.from_records
F_ID_as_records = list(F_ID_as_series_of_lists)

f_id_df = pd.DataFrame.from_records(list(F_ID_as_records)).fillna(np.nan)

I get an error on the following line:

f_id_df = pd.DataFrame.from_records(list(F_ID_as_records)).fillna(np.nan)

The error message is: TypeError: object of type 'float' has no len()

How can I fix this?

2 Answers

1

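There is another way to do this, using a list comprehension and what the TypeError tells us.

Suppose you have a pandas Series of strings that you want to split into two parts on the '/' character, but not every row is populated.

df = pd.DataFrame({'TEXT_COLUMN' : ['12/4', '54/19', np.nan, '89/33']})

We want to split this column into two separate columns, but we know pandas will trip over it when we put the result back into a DataFrame, so let's put it into a list first: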

split_list = list(df.TEXT_COLUMN.str.split('/'))

Looking at what split_list contains, we can see why the float error appears when pandas tries to parse it:

>> [['12','4'],['54','19'], nan, ['89','33']]
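To make the cause concrete, here is a minimal sketch of what happens to the missing row. It assumes the column is stored with object dtype (requested explicitly below), so the missing value is a plain float NaN. The variable names `s`, `types`, and `message` are only for illustration.

```python
import numpy as np
import pandas as pd

# Assumption: object dtype, so the missing value is stored as a float NaN.
s = pd.Series(['12/4', '54/19', np.nan, '89/33'], dtype=object)
split_list = list(s.str.split('/'))

# The populated rows become lists; the missing row stays a float.
types = [type(x).__name__ for x in split_list]
print(types)

# len() works on lists but not on floats, which matches the reported error.
try:
    len(split_list[2])
    message = None
except TypeError as exc:
    message = str(exc)
print(message)
```

Anything that needs the length of every record will therefore fail on that one float entry.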

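Now that we have this list, we want to pass it through a comprehension that handles the null values. We can do that by putting a type check in the comprehension (np.float has been removed from recent NumPy releases, so the check below uses isinstance):
better_split_list = [x if isinstance(x, list) else [None, None] for x in split_list]
This returns: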
>> [['12','4'],['54','19'], [None,None], ['89','33']]

This lets us put the list of lists into a pandas DataFrame of its own, with the columns separated in a more robust way:
pd.DataFrame(better_split_list, columns = ['VALUE_1','VALUE_2'])
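Putting the steps of this answer together, a self-contained sketch might look like the following. The final `join` back onto the original frame and the names `parts` and `result` are my own additions for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Same example data as above; object dtype is requested explicitly (assumption).
df = pd.DataFrame({'TEXT_COLUMN': ['12/4', '54/19', np.nan, '89/33']}, dtype=object)

# Split into a plain Python list; the missing row stays a float NaN.
split_list = list(df['TEXT_COLUMN'].str.split('/'))

# Replace every non-list entry with a placeholder of the right width.
better_split_list = [x if isinstance(x, list) else [None, None] for x in split_list]

# Build the two-column frame, reusing the original index so the rows line up.
parts = pd.DataFrame(better_split_list, columns=['VALUE_1', 'VALUE_2'], index=df.index)

# Hypothetical last step: attach the new columns to the original frame.
result = df.join(parts)
print(result)
```

Checking `isinstance(x, list)` rather than testing for a float also catches None and any other non-list value.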

0
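The problem is obviously some None or NaN values, but if you use str.split with the parameter expand=True to build the new DataFrame, it handles them correctly.
Also, you can use str.strip instead of replace: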
df = pd.DataFrame({"ID": [2,3,4,5,6,7,8,9,10],
      "type" :["A", "B", "B", "A", "A", "B", "A", "A", "A"],
      "F_ID" :[None, "[7 8 9]", "[10]", np.nan, "[2]", "0", "0", "0", "0"]})

print (df)
   ID type     F_ID
0   2    A     None
1   3    B  [7 8 9]
2   4    B     [10]
3   5    A      NaN
4   6    A      [2]
5   7    B        0
6   8    A        0
7   9    A        0
8  10    A        0

f_id_df = df["F_ID"].str.strip("[]").str.split(expand=True)
print (f_id_df)
      0     1     2
0  None  None  None
1     7     8     9
2    10  None  None
3   NaN   NaN   NaN
4     2  None  None
5     0  None  None
6     0  None  None
7     0  None  None
8     0  None  None

Finally, if you need to convert the values to numbers:

f_id_df = df["F_ID"].str.strip("[]").str.split(expand=True).astype(float)
print (f_id_df)
      0    1    2
0   NaN  NaN  NaN
1   7.0  8.0  9.0
2  10.0  NaN  NaN
3   NaN  NaN  NaN
4   2.0  NaN  NaN
5   0.0  NaN  NaN
6   0.0  NaN  NaN
7   0.0  NaN  NaN
8   0.0  NaN  NaN
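If whole numbers are preferable to floats, one possible variation is to go through float first and then cast to pandas' nullable "Int64" dtype. This is a sketch applied to the data from the question and is not part of the original answer; it assumes every non-missing value is an integer string.

```python
import pandas as pd

# Data from the question.
df = pd.DataFrame({"ID": [2, 3, 4, 5, 6, 7, 8, 9, 10],
                   "type": ["A", "B", "B", "A", "A", "B", "A", "A", "A"],
                   "F_ID": ["0", "[7 8 9]", "[10]", "0", "[2]", "0", "0", "0", "0"]})

# Strip the brackets, split on whitespace into columns, convert to float,
# then cast to the nullable integer dtype so missing cells become <NA>.
f_id_df = (df["F_ID"].str.strip("[]")
                     .str.split(expand=True)
                     .astype(float)
                     .astype("Int64"))
print(f_id_df)
```

The cast to "Int64" raises an error if any value has a fractional part, so stay with float when that can happen.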
