Error in pandas: TypeError: object of type 'float' has no len()

Question

I have a pandas DataFrame df:

import numpy as np
import pandas as pd

df = pd.DataFrame({"ID": [2,3,4,5,6,7,8,9,10],
      "type" :["A", "B", "B", "A", "A", "B", "A", "A", "A"],
      "F_ID" :["0", "[7 8 9]", "[10]", "0", "[2]", "0", "0", "0", "0"]})

# convert the string representations of list structures to actual lists
F_ID_as_series_of_lists = df["F_ID"].str.replace("[","").str.replace("]","").str.split(" ")

#type(F_ID_as_series_of_lists) is pd.Series, make it a list for pd.DataFrame.from_records
F_ID_as_records = list(F_ID_as_series_of_lists)

f_id_df = pd.DataFrame.from_records(list(F_ID_as_records)).fillna(np.nan)

I get an error on the following line:

f_id_df = pd.DataFrame.from_records(list(F_ID_as_records)).fillna(np.nan)

The error message is: TypeError: object of type 'float' has no len()

How can I fix this?

- Archit

2 Answers

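The problem is evidently some None or NaN values in the column. If you build the new DataFrame with str.split and the parameter expand=True, those missing values are handled correctly.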
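To make the cause concrete, here is a minimal sketch of how a missing value ends up as a bare float among the split results. The sample data is made up for illustration, and the object dtype is forced only so the behaviour does not depend on the pandas version. It seems likely that from_records fails at the point where it tries to measure each record, which is the same len() call reproduced here by hand.

```python
import numpy as np
import pandas as pd

# Illustrative data (an assumption, not the asker's real column): one entry is missing.
s = pd.Series(["[7 8 9]", np.nan, "[10]"], dtype=object)

# The .str methods pass missing values through untouched,
# so the NaN survives both strip and split.
records = list(s.str.strip("[]").str.split(" "))
print(records)

# NaN is a plain Python float, and len() is not defined for floats.
try:
    len(records[1])
except TypeError as exc:
    message = str(exc)
    print(message)
```

Any code that expects every record to be a list will trip over that float in the same way.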

Also, str.strip can be used instead of replace:

df = pd.DataFrame({"ID": [2,3,4,5,6,7,8,9,10],
      "type" :["A", "B", "B", "A", "A", "B", "A", "A", "A"],
      "F_ID" :[None, "[7 8 9]", "[10]", np.nan, "[2]", "0", "0", "0", "0"]})

print (df)
   ID type     F_ID
0   2    A     None
1   3    B  [7 8 9]
2   4    B     [10]
3   5    A      NaN
4   6    A      [2]
5   7    B        0
6   8    A        0
7   9    A        0
8  10    A        0

f_id_df = df["F_ID"].str.strip("[]").str.split(expand=True)
print (f_id_df)
      0     1     2
0  None  None  None
1     7     8     9
2    10  None  None
3   NaN   NaN   NaN
4     2  None  None
5     0  None  None
6     0  None  None
7     0  None  None
8     0  None  None

Finally, if you need to convert the values to numbers:

f_id_df = df["F_ID"].str.strip("[]").str.split(expand=True).astype(float)
print (f_id_df)
      0    1    2
0   NaN  NaN  NaN
1   7.0  8.0  9.0
2  10.0  NaN  NaN
3   NaN  NaN  NaN
4   2.0  NaN  NaN
5   0.0  NaN  NaN
6   0.0  NaN  NaN
7   0.0  NaN  NaN
8   0.0  NaN  NaN
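If the column might also contain tokens that are not numbers, astype(float) would raise. A possible variation, not part of the original answer, is to convert each column with pd.to_numeric and errors="coerce", then optionally switch to the nullable Int64 dtype so that 7 stays 7 rather than 7.0. The shortened sample data and the forced object dtype are assumptions made for this sketch.

```python
import numpy as np
import pandas as pd

# Shortened version of the sample column, forced to object dtype for predictability.
df = pd.DataFrame({"F_ID": [None, "[7 8 9]", "[10]", np.nan, "[2]", "0"]}, dtype=object)

parts = df["F_ID"].str.strip("[]").str.split(expand=True)

# errors="coerce" turns anything that cannot be parsed into NaN instead of raising.
numeric = parts.apply(pd.to_numeric, errors="coerce")

# Optional: nullable integers keep whole numbers whole and show gaps as <NA>.
as_int = numeric.astype("Int64")
print(as_int)
```

Note that coercion silently hides malformed entries, so it is worth checking how many values became missing before relying on the result.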

- jezrael


- Sean O'Malley · Accepted Answer

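There is another way, using a list comprehension and what we learned from the type error.

Suppose you have a pandas Series of string dtype and you want to split that column into two parts on the '/' character, but not every row is populated.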

df = pd.DataFrame({'TEXT_COLUMN' : ['12/4', '54/19', np.nan, '89/33']})

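We want to split that column into two separate columns, but we know pandas will choke when we try to put the result back into a DataFrame, so let's pull it out into a list first: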

split_list = list(df.TEXT_COLUMN.str.split('/'))

Looking at what split_list contains, we can see why we get the float error when trying to parse it:

>> [['12','4'],['54','19'], nan, ['89','33']]

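Now that we have this list, we want to run it through a comprehension that corrects the null-value problem. We can do that with a type condition inside the comprehension: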

# np.float was removed in NumPy 1.24; checking for a list works on any version
better_split_list = [x if isinstance(x, list) else [None,None] for x in split_list]

This returns:

>> [['12','4'],['54','19'], [None,None], ['89','33']]

This lets us put the list of lists into a pandas DataFrame of its own, with the columns separated in a more robust way:

pd.DataFrame(better_split_list, columns = ['VALUE_1','VALUE_2'])
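Put together, the steps of this answer can be sketched as one small function. The name split_to_columns and its parameters are hypothetical and exist only for this illustration. The sketch assumes every populated row splits into exactly as many parts as there are column names, and it forces object dtype so the behaviour does not depend on the pandas version.

```python
import numpy as np
import pandas as pd

def split_to_columns(series, sep, names):
    """Hypothetical helper: split each string on sep, padding missing rows with None."""
    filler = [None] * len(names)
    # Rows that were NaN come back from str.split as floats, not lists.
    rows = [x if isinstance(x, list) else filler for x in series.str.split(sep)]
    return pd.DataFrame(rows, columns=names, index=series.index)

df = pd.DataFrame({"TEXT_COLUMN": ["12/4", "54/19", np.nan, "89/33"]}, dtype=object)
result = split_to_columns(df["TEXT_COLUMN"], "/", ["VALUE_1", "VALUE_2"])
print(result)
```

Keeping the original index on the result makes it straightforward to join the new columns back onto df afterwards.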