标量的奇怪Numpy除法行为

Question

标量的奇怪Numpy除法行为

3

我一直在尝试升级一个库，其中包含许多针对标量的几何操作，使它们也能与numpy数组一起使用。在这个过程中，我注意到了一些关于numpy除法的奇怪行为。

在原始代码中，检查两个变量之间的归一化差异，如果两个变量都不为零，则交换到numpy后看起来像这样：

import numpy as np

a = np.array([0, 1, 2, 3, 4])
b = np.array([1, 2, 3, 0, 4])

o = np.zeros(len(a))
o = np.divide(np.subtract(a, b), b, out=o, where=np.logical_and(a != 0, b != 0))
print(f'First implementation: {o}')

当我传入一个初始化为零的输出缓冲区来处理无法计算的情况时，它会返回:

First implementation: [ 0.         -0.5        -0.33333333  0.          0.        ]

对于标量，我不得不稍微修改一下，因为out需要一个数组，但看起来还不错。

a = 0
b = 4
o = None if np.isscalar(a) else np.zeros(len(a))
o = np.divide(np.subtract(a, b), b, out=o, where=np.logical_and(b != 0, a != 0))
print(f'Modified out for scalar: {o}')

返回值

Modified out for scalar: 0.0.

接着我通过一些测试函数运行了这段代码，结果发现其中很多函数都失败了。深入研究后，我发现当我第一次调用带有 where 设置为 False 的除法函数时，它会返回零值，但如果我再次调用该函数，第二次它将返回不可预测的结果。

a = 0
b = 4
print(f'First divide: {np.divide(b, a, where=False)}')
print(f'Second divide: {np.divide(b, a, where=False)}')

返回值

First divide: 0.0
Second divide: 4.0

看文档上说，“其中条件为 False 的位置将保持未初始化”，所以我猜 numpy 有一些内部缓冲区，最初被设置为零，然后随后它就会携带早期的中间值。

我很难看出如何使用 divide，无论是否使用 where 子句; 如果我使用 where，我会得到一个不可预测的输出，如果我不使用，则无法防止被零除。我是错过了什么，还是我只需要在这些情况下具有不同的代码路径？我意识到我已经有了一个不同的代码路径与 out 变量。

我会很感激任何建议。

- John M.

在使用where时，必须使用 out 参数。否则非where值将是未指定的（不可预测的）。 - hpaulj

out 必须是 ndarray、None 或 ndarray 和 None 的元组。如果传入标量将会失败。 - John M.

抱歉，我错过了您已经找到使用数组（np.zeros）的正确方法这一事实。至于使用标量进行np.divide - 为什么？您可以将其设置为 out=np.array(0)，结果将是一个类似大小的数组，而不是“真正”的标量。 - hpaulj

现在我正在更新的库使用标量，有很多代码在使用它。我想改变它的行为方式，使其对标量的处理与以前完全相同，但是任何你可以传递标量的地方也可以传递一个numpy数组。这是完全可行的，我只想尽可能少地分叉代码路径。 - John M.

1

当它说“未初始化”时，意味着您不应该依赖于任何特定的值。这与使用np.empty相同。有时你会看到一些奇怪的数字，其他时候则是类似于先前缓冲区的值。这完全取决于内存如何被使用或重用。对于out=None，where=False的值是不可预测的。 - hpaulj

显示剩余2条评论

2个回答

0

说句实话，如果我能帮到任何人，我得出的结论是我需要包装np.divide以便在可以接受数组和标量的函数中安全使用它。这是我的包装函数：

import numpy as np

def divide_where(a, b, where, out=None, fill=0):
    """ wraps numpy divide to safely handle where clause for both arrays and scalars
    - a: dividend array or scalar
    - b: divisor array or scalar
    - where: locations where is True a/b will be set
    - out: location where data is written to; if None, an output array will be created using fill value
    - fill: defines fill value. if scalar and where True value will used; if out not set fill value is used creating output array
    """
    if (a_is_scalar := np.isscalar(a)) and np.isscalar(b):
        return fill if not where else a / b
    if out is None:
        out = np.full_like(b if a_is_scalar else a, fill)
    return np.divide(a, b, out=out, where=where)

- John M.

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- tdelaney · Accepted Answer

在我看来，这似乎是一个错误。但是出于性能原因，我认为您想要在标量的情况下短路ufuncs的调用，因此问题在于如何避免它变得太混乱。由于a或b都可以是标量，所以需要检查它们两个。将该检查放入条件性返回输出数组或None的函数中，您可以这样做：

def scalar_test_np_zeros(a, b):
    """Return np.zeros for the length of arguments unless both
    arguments are scalar, then None."""
    if a_is:=np.isscalar(a) and np.isscalar(b):
        return None
    else:
        return np.zeros(len(a) if a_is else len(b))

a = 0
b = 4
if o := scalar_test_np_zeros(a, b) is None:
    o = (a-b)/b if a and b else 0.0
else:
    np.divide(np.subtract(a, b), b, out=o, 
        where=np.logical_and(b != 0, a != 0))

标量测试可在其他存在类似问题的代码中发挥作用。