Indices of non-numeric and zero-valued cells in a DataFrame

Question

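I have a dataset (larger than this sample) that stores floats, but some of the data is missing. How can I find the indices of all missing or non-numeric values? I looked for similar questions on SO, but most of them are about dropping rows; there may be an earlier question like this one, but I could not find it. I need to replace these values, so I first need to identify them.

I would like to get the indices of cells [86,2], [87,2], [87,3], that is, the zero and non-numeric cells. How can I retrieve them easily?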

         0       1       2       3       4
85  1645.0  1596.0  1578.0  1567.0  1580.0
86  1554.0  1506.0     0.0  1466.0  1469.0
87  1588.0  1510.0    'ff'       0  1489.0

I have included the JSON in case anyone needs to recreate the example.

{"0":{"85":1645.0,"86":1554.0,"87":1588.0},"1":{"85":1596.0,"86":1506.0,"87":1510.0},"2":{"85":1578.0,"86":0.0,"87":"ff"},"3":{"85":1567.0,"86":1466.0,"87":0},"4":{"85":1580.0,"86":1469.0,"87":1489.0}}
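
A minimal sketch of how the sample could be loaded for experimentation. It assumes the string cell is written with double quotes ("ff") so that the text is valid JSON; the variable names raw and df are chosen only for illustration.

```python
import json
import pandas as pd

# Sample data from the question, with the string cell written as valid JSON ("ff").
raw = (
    '{"0":{"85":1645.0,"86":1554.0,"87":1588.0},'
    '"1":{"85":1596.0,"86":1506.0,"87":1510.0},'
    '"2":{"85":1578.0,"86":0.0,"87":"ff"},'
    '"3":{"85":1567.0,"86":1466.0,"87":0},'
    '"4":{"85":1580.0,"86":1469.0,"87":1489.0}}'
)

# json.loads keeps every key as a string, so both the row labels
# and the column labels of the resulting frame are strings.
df = pd.DataFrame(json.loads(raw))
print(df)
```

Because column "2" mixes floats with a string, pandas stores it with object dtype, which is why a plain numeric comparison alone is not enough to find the bad cells.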

- Jsowa

1 Answer

- Shubham Sharma · Accepted Answer

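You can use pd.to_numeric with the optional parameter errors='coerce' to convert each series in the DataFrame to a numeric type where possible; values that cannot be converted are replaced with NaN. Then build a mask m that is True wherever d equals 0 or is NaN. Next, use DataFrame.stack to stack the columns of the mask m into a MultiIndex, producing a series s. Filter s to keep only the entries whose value is True, and then use Series.index.tolist() to get the required indices.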

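import pandas as pd

# Coerce to numeric ('ff' becomes NaN), then flag cells that are 0 or NaN.
d = df.apply(lambda s: pd.to_numeric(s, errors="coerce"))
m = d.eq(0) | d.isna()
# Stack into a (row, column) MultiIndex and keep only the True entries.
s = m.stack()
indices = s[s].index.tolist()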
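
For anyone who wants to run this end to end, the same steps can be wrapped in a small helper. This is only a sketch: the function name find_bad_cells is hypothetical, and the frame is rebuilt inline with string labels to mirror the question's sample.

```python
import pandas as pd


def find_bad_cells(df):
    """Return (row_label, column_label) pairs for zero, missing, or non-numeric cells."""
    # Values that cannot be parsed as numbers become NaN.
    d = df.apply(lambda s: pd.to_numeric(s, errors="coerce"))
    # True wherever the coerced value is 0 or NaN.
    m = d.eq(0) | d.isna()
    # One boolean per (row, column) pair; keep only the True ones.
    s = m.stack()
    return s[s].index.tolist()


# Small frame mirroring the question's sample; column "2" holds the string "ff".
df = pd.DataFrame(
    {
        "0": [1645.0, 1554.0, 1588.0],
        "1": [1596.0, 1506.0, 1510.0],
        "2": [1578.0, 0.0, "ff"],
        "3": [1567.0, 1466.0, 0],
        "4": [1580.0, 1469.0, 1489.0],
    },
    index=["85", "86", "87"],
)

bad = find_bad_cells(df)
print(bad)
```

The labels come back as strings here only because the frame was built with string labels; with integer labels the tuples would contain integers instead.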

Intermediate steps:

# print(d)
         0       1       2       3       4
85  1645.0  1596.0  1578.0  1567.0  1580.0
86  1554.0  1506.0     0.0  1466.0  1469.0
87  1588.0  1510.0     NaN     0.0  1489.0

# print(m)
       0      1      2      3      4
85  False  False  False  False  False
86  False  False   True  False  False
87  False  False   True   True  False

# print(s)
85  0    False
    1    False
    2    False
    3    False
    4    False
86  0    False
    1    False
    2     True
    3    False
    4    False
87  0    False
    1    False
    2     True
    3     True
    4    False
dtype: bool

# print(s[s])
86  2    True
87  2    True
    3    True
dtype: bool

Result:

# print(indices)

[('86', '2'), ('87', '2'), ('87', '3')]
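
Since the question mentions replacing the flagged values, here is one possible follow-up, offered as a sketch rather than as part of the original answer. It reuses the same mask and fills each flagged cell with the median of the remaining values in its column; the median is an arbitrary choice made for illustration, and the two-column frame is a cut-down version of the sample.

```python
import pandas as pd

# Cut-down version of the sample: only the two columns that contain bad cells.
df = pd.DataFrame(
    {"2": [1578.0, 0.0, "ff"], "3": [1567.0, 1466.0, 0]},
    index=["85", "86", "87"],
)

d = df.apply(lambda s: pd.to_numeric(s, errors="coerce"))
m = d.eq(0) | d.isna()

# Blank out every flagged cell, then fill each column with the median
# of its remaining values. The median is only a placeholder strategy.
cleaned = d.mask(m)
cleaned = cleaned.fillna(cleaned.median())
print(cleaned)
```

Working with the mask directly avoids looping over the index tuples; the list of indices is still useful when each cell needs individual handling, for example via d.loc[row, col].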