Python: How is NaN equality handled in lists?

I just want to understand the logic behind these results:
>>> nan = float('nan')
>>> nan == nan
False
# I understand that this is because the __eq__ method is defined this way
>>> nan in [nan]
True
# This is because the __contains__ method for list is defined to compare the identity first then the content?

But in both cases, I think the function called behind the scenes is PyObject_RichCompareBool, right? Why is there a difference? Shouldn't they behave the same?


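The first one isn't surprising: NaN behaves this way in every programming language (it comes from the standard). About the second one I'm not sure. - simonzack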
__contains__ probably short-circuits, because nan is nan evaluates to True. Also, float('nan') in [float('nan')] evaluates to False. - lunixbochs
3 Answers

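> But in both cases, I think the function PyObject_RichCompareBool is called behind the scenes, so why is there a difference? Shouldn't they behave the same?

== never calls PyObject_RichCompareBool directly on float objects. Floats have their own rich comparison method, float_richcompare (used for __eq__), which may or may not call PyObject_RichCompareBool depending on the arguments passed to it: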
 /* Comparison is pretty much a nightmare.  When comparing float to float,
 * we do it as straightforwardly (and long-windedly) as conceivable, so
 * that, e.g., Python x == y delivers the same result as the platform
 * C x == y when x and/or y is a NaN.
 * When mixing float with an integer type, there's no good *uniform* approach.
 * Converting the double to an integer obviously doesn't work, since we
 * may lose info from fractional bits.  Converting the integer to a double
 * also has two failure modes:  (1) a long int may trigger overflow (too
 * large to fit in the dynamic range of a C double); (2) even a C long may have
 * more bits than fit in a C double (e.g., on a a 64-bit box long may have
 * 63 bits of precision, but a C double probably has only 53), and then
 * we can falsely claim equality when low-order integer bits are lost by
 * coercion to double.  So this part is painful too.
 */

static PyObject*
float_richcompare(PyObject *v, PyObject *w, int op)
{
    double i, j;
    int r = 0;

    assert(PyFloat_Check(v));
    i = PyFloat_AS_DOUBLE(v);

    /* Switch on the type of w.  Set i and j to doubles to be compared,
     * and op to the richcomp to use.
     */
    if (PyFloat_Check(w))
        j = PyFloat_AS_DOUBLE(w);

    else if (!Py_IS_FINITE(i)) {
        if (PyInt_Check(w) || PyLong_Check(w))
            /* If i is an infinity, its magnitude exceeds any
             * finite integer, so it doesn't matter which int we
             * compare i with.  If i is a NaN, similarly.
             */
            j = 0.0;
        else
            goto Unimplemented;
    }
...
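A quick way to see the float side of this from Python: calling float.__eq__ directly, or operator.eq (the function form of ==), goes through the float comparison shown above with no identity shortcut, so the same NaN object still compares unequal to itself. This is a small sketch of CPython behavior using only the standard library.

```python
import operator

nan = float("nan")

# Call the float's own rich comparison directly: no identity check is involved.
direct = nan.__eq__(nan)

# operator.eq(a, b) is the function form of a == b.
via_operator = operator.eq(nan, nan)

# Identity is a separate question from equality.
same_object = nan is nan

print(direct, via_operator, same_object)  # False False True
```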

On the other hand, list_contains calls PyObject_RichCompareBool directly on the items, which is why you get True in the second case.
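The Python language reference describes membership for built-in containers such as list as equivalent to any(x is e or x == e for e in y). The function below is a minimal pure-Python model of that rule; the name contains_like_cpython is made up for illustration and is not a CPython API. The is test plays the role of the identity shortcut inside PyObject_RichCompareBool, and it ignores the error propagation the real C code does.

```python
def contains_like_cpython(seq, item):
    # Hypothetical model of list.__contains__: identity first (the shortcut),
    # then ordinary equality, element by element.
    return any(item is element or item == element for element in seq)


nan = float("nan")
print(contains_like_cpython([nan], nan))           # True: same object
print(contains_like_cpython([float("nan")], nan))  # False: a different NaN object
```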
Note that this applies only to CPython. PyPy's list.__contains__ method appears to compare items only by calling their __eq__ method:
$~/pypy-2.4.0-linux64/bin# ./pypy
Python 2.7.8 (f5dcc2477b97, Sep 18 2014, 11:33:30)
[PyPy 2.4.0 with GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>> nan = float('nan')
>>>> nan == nan
False
>>>> nan is nan
True
>>>> nan in [nan]
False
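For contrast, a membership test that consults only ==, with no identity shortcut, reproduces the PyPy 2.4.0 session above. contains_eq_only is a hypothetical helper for illustration; it is not how PyPy is actually implemented, and other PyPy releases may behave differently.

```python
def contains_eq_only(seq, item):
    # Hypothetical eq-only membership: no identity check, every element is compared with ==.
    return any(item == element for element in seq)


nan = float("nan")
print(nan is nan)                    # True
print(contains_eq_only([nan], nan))  # False, like the PyPy output above
```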

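You are right that PyObject_RichCompareBool is called; you can see it in the list_contains function in listobject.c.

The documentation says:

This is the equivalent of the Python expression o1 op o2, where op is the operator corresponding to opid.

However, that does not appear to be entirely accurate.

In the CPython source code we have the following: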

int
PyObject_RichCompareBool(PyObject *v, PyObject *w, int op)
{
    PyObject *res;
    int ok;

    /* Quick result when objects are the same.
       Guarantees that identity implies equality. */
    if (v == w) {
        if (op == Py_EQ)
            return 1;
        else if (op == Py_NE)
            return 0;
    }

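In this case, because both arguments are the same object, equality is reported immediately, without the float's own comparison ever being consulted.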
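On CPython you can observe this shortcut directly by calling the C function through ctypes.pythonapi. This is a CPython-only sketch: it assumes the running interpreter exports PyObject_RichCompareBool (standard CPython builds do), and it hard-codes PY_EQ = 2, the value of Py_EQ in CPython's object.h.

```python
import ctypes

PY_EQ = 2  # Value of Py_EQ in CPython's object.h (Py_LT=0, Py_LE=1, Py_EQ=2, ...)

rich_compare_bool = ctypes.pythonapi.PyObject_RichCompareBool
rich_compare_bool.argtypes = [ctypes.py_object, ctypes.py_object, ctypes.c_int]
rich_compare_bool.restype = ctypes.c_int  # 1 = true, 0 = false, -1 = error

nan = float("nan")
other_nan = float("nan")  # a separate NaN object

same = rich_compare_bool(nan, nan, PY_EQ)             # v == w: identity shortcut
different = rich_compare_bool(nan, other_nan, PY_EQ)  # falls through to float_richcompare
print(same, different, nan == nan)  # 1 0 False
```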


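Mathematically, NaN stands for an undefined result (for example inf - inf), and comparing undefined quantities makes no sense. That is why NaN is never considered equal to anything, not even itself.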
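For reference, this is how Python floats behave under IEEE 754: an undefined operation such as inf - inf produces a NaN, infinities do compare equal to themselves, and a NaN compares unequal to everything, including itself (which is why x != x is a classic NaN test). The sketch uses only the math module.

```python
import math

inf = math.inf
undefined = inf - inf  # IEEE 754: inf - inf has no defined value, so the result is NaN

print(inf == inf)              # True: infinities compare equal to themselves
print(math.isnan(undefined))   # True
print(undefined == undefined)  # False: NaN is unequal to everything, itself included
print(undefined != undefined)  # True: the classic "x != x" NaN test
```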

In the nan in [nan] case, the list holds a reference to the very same object that the name nan refers to. But be careful:

>>> nan is nan
True

>>> float('nan') is float('nan')
False

In the first case, both sides refer to the same (immutable) object. In the second case, two different float objects are created and compared.
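The difference between the two cases is whether the list holds the very object being searched for. The snippet below (CPython behavior) creates two separate NaN objects and checks membership with each; only the identical object is found.

```python
a = float("nan")
b = float("nan")  # a second, separately created NaN object

print(a is b)    # False: two distinct float objects
print(a in [a])  # True: the identity shortcut matches
print(a in [b])  # False: different object, and a == b is False
```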


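Wow, so how do you tell whether a parsed nan is actually a NaN? You can compare it with the default nan object using neither == nor is. - blueFast
Answering myself: there is an isnan function in the math library. - blueFast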
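Following that comment, math.isnan is the reliable way to find NaNs in a list, because it tests the value rather than relying on == or is. The helpers has_nan and drop_nans are illustrative names, not library functions; the isinstance guard is an assumption that only float NaNs matter and keeps math.isnan from raising on non-numeric items.

```python
import math


def has_nan(values):
    """Return True if any float in values is a NaN, regardless of object identity."""
    return any(isinstance(v, float) and math.isnan(v) for v in values)


def drop_nans(values):
    """Return a new list without NaN floats; other items are kept as-is."""
    return [v for v in values if not (isinstance(v, float) and math.isnan(v))]


data = [1.0, float("nan"), 2.5, float("nan")]
print(float("nan") in data)  # False: none of the list's NaNs is this new object
print(has_nan(data))         # True
print(drop_nans(data))       # [1.0, 2.5]
```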
