Replacing multiple values with a missing value (None) in Pandas

Question

I have a dataset d that contains missing values in several different forms:

import pandas as pd

d = {'col1': [1, 2, '', 'N/A', 'unknown', None],
     'col2': [3, 4, 'N/A', None, 'N/A_N/A', '']}
d = pd.DataFrame(data=d)

      col1     col2
0        1        3
1        2        4
2               N/A
3      N/A     None
4  unknown  N/A_N/A
5     None

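I want to see how many values in each column are actually missing, so I want to convert all blanks, N/A values, and unknown values to None. I tried this code and got the following result: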

d.replace(to_replace=['N/A', '', 'unknown', 'N/A_N/A'], value=None)

   col1  col2
0     1     3
1     2     4
2     2     4
3     2  None
4     2  None
5  None  None

I don't understand why d.replace does this. Is there a better solution? I would like the result to look like this:

     col1     col2
0        1        3
1        2        4
2      None     None
3      None     None
4      None     None
5      None     None

- Ping

Your code works as intended; the output you want is something extra on top of replacing the values. - gold_cy

You probably want to replace them with np.nan, the native missing value (then you can simply use df.isna().sum()). - nocibambi

1 Answer

- cs95 · Accepted Answer

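This is known behavior, and it occurs whenever the target replacement value is None. Arguably it is by design, since it follows from how the arguments are handled: in the pandas version used here, a list for to_replace combined with value=None is treated as if no value had been given, so replace falls back to padding and each matched cell takes the value from the row above it. That is why the output repeats 2 and 4 instead of showing None.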
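To make the padding behavior concrete, here is a minimal pure-Python sketch that reproduces the question's unexpected output. It is only an imitation for illustration: `pad_replace` and `PLACEHOLDERS` are hypothetical names, and this is not how pandas implements `replace` internally.

```python
# Hypothetical helper that imitates "pad" replacement on a plain list.
# It is NOT pandas' implementation, only a model of the observed output.
PLACEHOLDERS = {'N/A', '', 'unknown', 'N/A_N/A'}


def pad_replace(values, targets):
    """Replace every string found in `targets` with the previous output value."""
    result = []
    for value in values:
        if isinstance(value, str) and value in targets and result:
            # A matched cell copies whatever sits directly above it,
            # even if that value is itself None.
            result.append(result[-1])
        else:
            result.append(value)
    return result


col1 = pad_replace([1, 2, '', 'N/A', 'unknown', None], PLACEHOLDERS)
col2 = pad_replace([3, 4, 'N/A', None, 'N/A_N/A', ''], PLACEHOLDERS)
print(col1)
print(col2)
```

The two lists match the columns in the question's output, including the None values in col2 that get copied downward from row 3.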

May I suggest using to_numeric? (Here df is the DataFrame called d in the question.)

pd.to_numeric(df.stack(), errors='coerce').unstack()

   col1  col2
0   1.0   3.0
1   2.0   4.0
2   NaN   NaN
3   NaN   NaN
4   NaN   NaN
5   NaN   NaN
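A column-wise variant of the same idea may be easier to read. The sketch below assumes every column is meant to be numeric, because `errors='coerce'` turns any non-numeric string into NaN, not only the placeholders listed in the question. The names `numeric` and `missing_per_column` are illustrative.

```python
import pandas as pd

d = pd.DataFrame({'col1': [1, 2, '', 'N/A', 'unknown', None],
                  'col2': [3, 4, 'N/A', None, 'N/A_N/A', '']})

# Coerce each column to numbers; anything that cannot be parsed becomes NaN.
numeric = d.apply(pd.to_numeric, errors='coerce')

# Count the missing values per column, which is the question's actual goal.
missing_per_column = numeric.isna().sum()
print(missing_per_column)
```

If a column holds legitimate text, this approach would wipe it out, so it only fits data that should be numeric throughout.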

Alternatively, your code works if you pass a dictionary to replace.

# df.replace({'': None, 'N/A': None, 'N/A_N/A': None, 'unknown': None})
df.replace(dict.fromkeys(['N/A', '', 'unknown', 'N/A_N/A'], None))

   col1  col2
0   1.0   3.0
1   2.0   4.0
2   NaN   NaN
3   NaN   NaN
4   NaN   NaN
5   NaN   NaN
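If the goal is only to count missing values, another option is to mask the placeholder strings instead of replacing them. This is a sketch rather than part of the original answer; it swaps in `DataFrame.isin` and `DataFrame.mask`, and the names `placeholders`, `cleaned`, and `missing_per_column` are illustrative.

```python
import pandas as pd

d = pd.DataFrame({'col1': [1, 2, '', 'N/A', 'unknown', None],
                  'col2': [3, 4, 'N/A', None, 'N/A_N/A', '']})

placeholders = ['N/A', '', 'unknown', 'N/A_N/A']

# isin() flags every cell equal to a placeholder; mask() turns those cells into NaN.
cleaned = d.mask(d.isin(placeholders))

# None and NaN both count as missing for isna().
missing_per_column = cleaned.isna().sum()
print(missing_per_column)
```

Unlike to_numeric, this leaves any other strings in the data untouched, so it should also suit columns that mix numbers with legitimate text.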