How to remove multiple values from an array at once

Question


8

Is there a better way (simpler, more readable, more Pythonic, more efficient, etc.) to remove multiple values from an array than the following?

import numpy as np

# The array.
x = np.linspace(0, 360, 37)

# The values to be removed.
a = 0
b = 180
c = 360

new_array = np.delete(x, np.where(np.logical_or(np.logical_or(x == a,
                                                              x == b),
                                                x == c)))

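A good answer to this question should produce the same result as the code above (i.e. new_array), but may handle equality between floats better than the code above does.

Bonus

Can someone explain why this produces the wrong result?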

In [5]: np.delete(x, x == a)
/usr/lib/python2.7/dist-packages/numpy/lib/function_base.py:3254: FutureWarning: in the future insert will treat boolean arrays and array-likes as boolean index instead of casting it to integer
  "of casting it to integer", FutureWarning)
Out[5]: 
array([  20.,   30.,   40.,   50.,   60.,   70.,   80.,   90.,  100.,
        110.,  120.,  130.,  140.,  150.,  160.,  170.,  180.,  190.,
        200.,  210.,  220.,  230.,  240.,  250.,  260.,  270.,  280.,
        290.,  300.,  310.,  320.,  330.,  340.,  350.,  360.])

Both 0 and 10 have been removed, not just 0 (a).

Note that x == a is as expected (so the problem is inside np.delete):

In [6]: x == a
Out[6]: 
array([ True, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False, False], dtype=bool)

Note also that np.delete(x, np.where(x == a)) produces the correct result. So it seems to me that np.delete can't handle boolean indices.

- abcd

2

If you have a boolean index, you don't need to use delete. Its documentation specifies obj: slice, int or array of ints (no boolean). - hpaulj

3 Answers

6

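Your code does seem a little complex. I wondered whether you had considered numpy's boolean vector indexing.

After the same setup as yours, I timed your code: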

In [175]: %%timeit
   .....: np.delete(x, np.where(np.logical_or(np.logical_or(x == a, x == b), x == c)))
   .....:
10000 loops, best of 3: 32.9 µs per loop

I then timed three successive applications of boolean indexing:

In [176]: %%timeit
   .....: x1 = x[x != a]
   .....: x2 = x1[x1 != b]
   .....: new_array = x2[x2 != c]
   .....:
100000 loops, best of 3: 6.56 µs per loop

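Finally, for programming convenience and to extend the technique to an arbitrary number of excluded values, I rewrote the same code as a loop. This is slightly slower, because a copy has to be made first, but still quite respectable.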

In [177]: %%timeit
   .....: new_array = x.copy()
   .....: for val in (a, b, c):
   .....:     new_array = new_array[new_array != val]
   .....:
100000 loops, best of 3: 7.61 µs per loop

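I think the real gain is in programming clarity. Finally, I thought it best to verify that the three algorithms really do give the same result...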

In [179]: new_array1 = np.delete(x,
   .....:                 np.where(np.logical_or(np.logical_or(x == a, x == b), x == c)))

In [180]: x1 = x[x != a]

In [181]: x2 = x1[x1 != b]

In [182]: new_array2 = x2[x2 != c]

In [183]: new_array3 = x.copy()

In [184]: for val in (a, b, c):
   .....:         new_array3 = new_array3[new_array3 != val]
   .....:

In [185]: all(new_array1 == new_array2)
Out[185]: True

In [186]: all(new_array1 == new_array3)
Out[186]: True

To deal with the floating-point comparison issue you would need to use numpy's isclose() function. As expected, this makes things slower:

In [188]: %%timeit
   .....: new_array = x.copy()
   .....: for val in (a, b, c):
   .....:     new_array = new_array[~np.isclose(new_array, val)]
   .....:
10000 loops, best of 3: 126 µs per loop
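
One possible variation, not timed here, is to call np.isclose once on a broadcast pair of arrays instead of once per value. This is a sketch under the assumption that the values to remove fit in a small array (called vals below, a name introduced for illustration); whether it is actually faster on your data is something to measure.

```python
import numpy as np

x = np.linspace(0, 360, 37)
vals = np.array([0, 180, 360])

# Compare every element of x with every value at once: shape (37, 3).
close = np.isclose(x[:, None], vals)

# Drop an element if it is close to any of the values.
new_array = x[~close.any(axis=1)]
```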

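The answer to your bonus question is contained in the warning, but the warning isn't very useful unless you know that False and True compare equal to zero and one respectively. The boolean array is cast to integers, so your code is equivalent to:

np.delete(x, (x == a).astype(int))  # indices [1, 0, 0, ..., 0]: removes x[0] and x[1]

As the warning says, the numpy team eventually intends to change what np.delete() does when called with a boolean argument, but at present it only accepts index arguments.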
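
To make the cast explicit, here is a small sketch that reproduces the old behaviour on any NumPy version by converting the mask to integers by hand. As far as I know, the np.delete documentation says that from NumPy 1.19 onwards a boolean array is treated as a mask instead, so passing x == a directly may behave differently depending on your version; the code below avoids relying on that.

```python
import numpy as np

x = np.linspace(0, 360, 37)
a = 0

# Casting the mask to integers mimics what old NumPy did internally:
# [True, False, False, ...] becomes the index list [1, 0, 0, ..., 0].
as_indices = (x == a).astype(int)

# Only indices 0 and 1 occur, so the first two elements (0 and 10) are removed.
removed_two = np.delete(x, as_indices)

# Converting the mask to positions first removes only the matching element.
removed_one = np.delete(x, np.flatnonzero(x == a))
```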

- holdenweb

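Ouch, the floating-point comparison fix comes with a huge speed penalty. - abcd

Thanks for the explanation of the warning. It says "insert" where it should say "delete", which confused me. - abcd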

1

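Yes, floating-point comparisons are expensive because, instead of computing x == y, the function has to compute x-delta <= y <= x+delta, which is a more complex computation. I have reported the misleading warning message as a bug. - holdenweb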

2

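You can borrow the approach np.allclose uses to test floats for equality:

def float_equal(x, y, rtol=1.e-5, atol=1.e-8):
    # Elementwise |x - y| <= atol + rtol * |y|, the same test np.allclose applies.
    return np.less_equal(abs(x - y), atol + rtol * abs(y))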

np.delete(x,np.where(np.logical_or.reduce([float_equal(x,y) for y in [0,180,360]])))

The where part produces:

(array([ 0, 18, 36]),)

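float_equal could be changed to broadcast against x, which would eliminate the list comprehension.

I made use of the fact that logical_or is a ufunc and therefore has a reduce method.

You don't need the where; just use the result of logical_or as a boolean index: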

I = np.logical_or.reduce([float_equal(x,y) for y in [0,180,360]])
x[~I]

For this small example, using the boolean directly is about twice as fast as the np.delete(np.where(...)) approach.

For this x, == gives the same result:

np.where(np.logical_or.reduce([x==y for y in [0,180,360]]))
# (array([ 0, 18, 36]),)

So does this vectorized approach:

abc = np.array([0,180,360])
np.where(np.sum(x==abc[:,None],axis=0))
# (array([ 0, 18, 36]),)

x==abc[:,None] is a (3,37) boolean array; np.sum acts like a logical or.

My float_equal works this way too:

float_equal(x,abc[:,None]).sum(axis=0)

- hpaulj

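np.delete(x,np.where(x==a)) does remove the elements of the array that equal a. x==a is simply the wrong kind of input for delete. - hpaulj

Ah, I see: in your call to np.delete you weren't using a boolean index. I misread your answer earlier. - abcd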


- styvane · Accepted Answer

You can also use np.ravel to collect the indices of the values, then remove them with np.delete.

In [32]: r =  [a,b,c]

In [33]: indx = np.ravel([np.where(x == i) for i in r])

In [34]: indx
Out[34]: array([ 0, 18, 36])

In [35]: np.delete(x, indx)
Out[35]: 
array([  10.,   20.,   30.,   40.,   50.,   60.,   70.,   80.,   90.,
        100.,  110.,  120.,  130.,  140.,  150.,  160.,  170.,  190.,
        200.,  210.,  220.,  230.,  240.,  250.,  260.,  270.,  280.,
        290.,  300.,  310.,  320.,  330.,  340.,  350.])
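
Calling np.ravel on a list of np.where results works here because each value matches exactly once. A hedged variant for the case where values occur a different number of times, or not at all, is sketched below; the sample data is made up for illustration.

```python
import numpy as np

# Hypothetical data: 0 occurs twice, 20 three times, 99 never.
x = np.array([0., 10., 0., 20., 30., 20., 20.])
r = [0, 20, 99]

# flatnonzero returns a 1-D index array of any length, and
# concatenate joins them even when the lengths differ.
indx = np.concatenate([np.flatnonzero(x == i) for i in r])

new_array = np.delete(x, indx)
```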