Why does pd.Series([np.nan]) | pd.Series([True]) evaluate to False?

Question

Why does the following code return False?

>>> pd.Series([np.nan]) | pd.Series([True])
0    False
dtype: bool

- JZ1

Looks like a bug, since swapping the operands yields True. You should open an issue on their GitHub. - rafaelc

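This is interesting. Note that np.nan or True evaluates to nan; basically, the nan propagates through your operation. What's really strange is that bool(np.nan) actually returns True, and stranger still, pd.Series([np.nan], dtype=np.bool) gives you a Series containing a single True. - juanpa.arrivillaga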
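The plain-Python behavior this comment relies on can be checked directly. A minimal sketch; nothing here involves pandas, it only shows that NaN is truthy and that `or` hands back its first truthy operand unchanged:

```python
import numpy as np

# np.nan is a non-zero float, so Python treats it as truthy
print(bool(np.nan))        # True

# `a or b` returns `a` itself when `a` is truthy, so the NaN object comes back
result = np.nan or True
print(result)              # nan
print(result is np.nan)    # True
```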

To make the story more interesting, pd.NA (unlike np.nan) does not propagate. - rafaelc
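A short sketch of why pd.NA behaves differently: pandas documents three-valued (Kleene) logic for pd.NA, where "True or unknown" is True and "False or unknown" stays unknown. The nullable "boolean" dtype Series below is our choice for illustrating the element-wise case; it is not taken from the comment:

```python
import pandas as pd

# Kleene logic: True "or" anything is True, so NA does not win here
print(pd.NA | True)     # True
# With False the outcome is genuinely unknown, so NA is kept
print(pd.NA | False)    # <NA>

# The nullable "boolean" dtype applies the same rule element-wise
s = pd.Series([pd.NA], dtype="boolean") | pd.Series([True])
print(s.dtype)          # boolean
print(s.iloc[0])        # True
```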

Here is the relevant discussion on the pandas GitHub page. - ayhan

Interesting indeed, since np.logical_or(np.nan, True) returns True. - Roy2012
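For contrast with pandas, here is what NumPy's logical ufunc does. A minimal sketch: np.logical_or treats any non-zero value, NaN included, as True, so NaN never gets read as False on this path:

```python
import numpy as np

# NaN counts as "true" for NumPy's logical ufuncs
print(np.logical_or(np.nan, True))    # True
print(np.logical_or(np.nan, False))   # True: NaN alone is already truthy
```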

Related thread here: https://dev59.com/0loU5IYBdhLWcg3wxI2A - Ji Wei

2 Answers

Compare your case (written with an explicit dtype to emphasize the inferred type):

In[11]: pd.Series([np.nan], dtype=float) | pd.Series([True])

Out[11]: 
0    False
dtype: bool

with a similar one (only the dtype is now bool):

In[12]: pd.Series([np.nan], dtype=bool) | pd.Series([True])

Out[12]: 
0    True
dtype: bool

Do you see the difference?

Explanation:

In the first case (yours), np.nan propagates through the logical or operation (under the hood):
```
In[13]: np.nan or True
Out[13]: nan
```
and pandas then treats that np.nan as False in the result of the boolean operation.
In the second case the output is unambiguous, because the first Series already holds a boolean value (True, since every non-zero value converts to True, np.nan included; which boolean it is doesn't matter here, because anything or True is True):
```
In[14]: pd.Series([np.nan], dtype=bool)
Out[14]: 
0    True
dtype: bool
```
and True or True gives True, of course:
```
In[15]: True or True
Out[15]: True
```
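The same comparison as a standalone script, as a minimal sketch. It uses astype(bool) instead of the dtype=bool constructor argument; we assume both convert NaN the same way (to True), which agrees with Out[14]. The result of case 1 is printed but not annotated, since that result is exactly what this thread is about and could differ across pandas versions:

```python
import numpy as np
import pandas as pd

left = pd.Series([np.nan])        # inferred dtype: float64, as in the question
right = pd.Series([True])

# Case 1 (the question): a float NaN on the left of `|`
print(left.dtype)                 # float64
print(left | right)

# Case 2: convert to bool first; NaN is non-zero, so it becomes True
left_bool = left.astype(bool)
print(left_bool.tolist())             # [True]
print((left_bool | right).tolist())   # [True]
```

Converting up front turns a "what does NaN mean here?" question into an explicit, visible step, instead of leaving it to how the operator handles mixed dtypes.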

- MarianD

- Reuben · Accepted Answer

I think this is because np.nan is an instance of float, and float's __bool__ returns True for any non-zero value, NaN included:

np.nan.__bool__() == True

Likewise:

>>> np.nan or None
nan
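A quick check of the type claim, as a minimal sketch: np.nan is an ordinary Python float (no subclass or metaclass trick is involved), and float truthiness simply means "not equal to zero", so only 0.0 and -0.0 are falsy:

```python
import math
import numpy as np

# np.nan is a plain Python float
print(type(np.nan) is float)          # True

# Only zero is falsy for floats; NaN is not zero, so it is truthy
for value in (0.0, -0.0, 1.5, math.nan, np.nan):
    print(repr(value), bool(value))
```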

In pandas, the workaround is:

pd.Series([np.nan]).fillna(False) | pd.Series([True])
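A runnable version of this workaround, as a sketch. The second expression's extra .astype(bool) is our addition, not part of the answer: depending on the pandas version, fillna(False) on a float Series may leave it with object dtype, and the cast makes the left operand an explicit bool Series:

```python
import numpy as np
import pandas as pd

left = pd.Series([np.nan])
right = pd.Series([True])

# The answer's workaround: decide up front that NaN means False
result = left.fillna(False) | right
print(result.tolist())                 # [True]

# Variant (our addition): also cast, so the left operand is a bool Series
left_filled = left.fillna(False).astype(bool)
print(left_filled.dtype)               # bool
print((left_filled | right).tolist())  # [True]
```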

To be more specific: in pandas 0.24.1, the method _bool_method_SERIES in .../pandas/core/ops.py contains this assignment at line 1816:

    fill_bool = lambda x: x.fillna(False).astype(bool)

This is where the behavior you describe comes from. In other words, it is deliberately designed to treat np.nan like a False value in an or operation.
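A toy model in plain Python of how a fill step like this can produce both the False in the question and the True that rafaelc sees with the operands swapped. This is not pandas' code. toy_series_or and _is_nan are hypothetical names, and fill_bool here is only a list-based analogue of the quoted lambda. The key assumption is ours, inferred from the observed asymmetry rather than read from the quoted line: the right operand is filled before the operation, the left one is not, and the result is filled afterwards.

```python
import math
import operator


def _is_nan(v):
    """True only for a float NaN (how np.nan appears in an object array)."""
    return isinstance(v, float) and math.isnan(v)


def fill_bool(values):
    """List analogue of `x.fillna(False).astype(bool)`."""
    return [False if _is_nan(v) else bool(v) for v in values]


def toy_series_or(left, right):
    """Hypothetical toy model of element-wise `|`; NOT pandas' implementation.

    Assumption: `right` is passed through fill_bool before the operation,
    `left` is not, and the raw result is passed through fill_bool afterwards.
    """
    right = fill_bool(right)          # a NaN on the right becomes False up front
    raw = []
    for x, y in zip(left, right):
        try:
            raw.append(operator.or_(x, y))
        except TypeError:
            # Python does not define float | bool; let a NaN operand propagate
            if _is_nan(x):
                raw.append(x)
            else:
                raise
    return fill_bool(raw)             # the propagated NaN is then read as False


nan = float("nan")
print(toy_series_or([nan], [True]))   # [False], like the question
print(toy_series_or([True], [nan]))   # [True], like the swapped order
```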