How to remove specific elements from a NumPy array

Question

338

How can I remove specific elements from a NumPy array? Say I have:

import numpy as np

a = np.array([1,2,3,4,5,6,7,8,9])

I want to remove 3, 4 and 7 from a. All I know is the indices of those values (index=[2,3,6]).

- Daniel Thaagaard Andreasen

13 Answers

117

Use the np.setdiff1d function:

>>> import numpy as np
>>> a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> b = np.array([3,4,7])
>>> c = np.setdiff1d(a,b)
>>> c
array([1, 2, 5, 6, 8, 9])

- Zong

10

Good to know. I had assumed np.delete would be slower, but timing it on 1000 integers showed delete to be about twice as fast. - wbg

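Note that this is a set difference, so if the array contains duplicates they are removed as well, and the result comes back sorted. Consider a = np.array([-1,1,2,3,4,5,6,7,1,2,3,4,5,6,7,8,9,10,1,2,3,99]) and b = np.array([-1,99]); then c = np.setdiff1d(a,b) gives array([1,2,3,4,5,6,7,8,9,10]). - athina.bikaki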
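
The caveat in the comment above is easy to check. The following is only a small sketch with made-up values: it contrasts np.setdiff1d with a boolean mask built from np.isin, which keeps both the original order and any repeated values.

```python
import numpy as np

a = np.array([5, 1, 3, 1, 9, 3])   # illustrative values with duplicates
b = np.array([3])

# setdiff1d treats both inputs as sets: the result is sorted and deduplicated.
as_set = np.setdiff1d(a, b)

# A boolean mask keeps the original order and any repeated values.
masked = a[~np.isin(a, b)]

print(as_set.tolist())   # [1, 5, 9]
print(masked.tolist())   # [5, 1, 1, 9]
```

If order or duplicates matter, the mask-based form is probably the safer choice.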

55

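NumPy arrays are immutable, meaning you technically cannot delete an item from one. However, you can construct a new array without the values you don't want, like this: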

b = np.delete(a, [2,3,6])

- Digitalex

60

Technically, NumPy arrays are mutable. For example, a[0]=1 modifies a in place. What they cannot do is change size. - btel

4

The definition says it is immutable, but if it can be modified by assigning a new value, how is it immutable? - Devesh

Are the elements of the array mutable? - undefined
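
To make the distinction discussed in these comments concrete, here is a minimal sketch: assigning to an element changes the existing array, while np.delete builds a separate, shorter array and leaves its input alone. The variable name b is just for illustration.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

# Element assignment changes the existing array in place.
a[0] = 100

# np.delete leaves `a` untouched and returns a new, shorter array.
b = np.delete(a, [2, 3, 6])

print(a.tolist())   # [100, 2, 3, 4, 5, 6, 7, 8, 9]
print(b.tolist())   # [100, 2, 5, 6, 8, 9]
print(b is a)       # False
```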

51

To delete by value:

modified_array = np.delete(original_array, np.where(original_array == value_to_delete))

- Prakhar Pandey

2

Since NumPy 1.19 you can pass the boolean mask directly: np.delete(original_array, original_array==value) https://numpy.org/doc/stable/reference/generated/numpy.delete.html - Alessandro Romancino
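
As a rough side-by-side of the variants mentioned here, the sketch below removes every occurrence of one value in three ways. The sample data is made up, and the boolean-mask call assumes NumPy 1.19 or later, as the comment above states.

```python
import numpy as np

original_array = np.array([1, 2, 3, 2, 4])   # illustrative data
value_to_delete = 2

# Integer positions from np.where, then np.delete.
via_where = np.delete(original_array, np.where(original_array == value_to_delete))

# Boolean mask passed straight to np.delete (assumes NumPy >= 1.19).
via_mask = np.delete(original_array, original_array == value_to_delete)

# Plain boolean indexing gives the same result without np.delete.
via_indexing = original_array[original_array != value_to_delete]

print(via_where.tolist())      # [1, 3, 4]
print(via_mask.tolist())       # [1, 3, 4]
print(via_indexing.tolist())   # [1, 3, 4]
```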

10

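If we know the indices of the elements we want to remove, np.delete is the fastest way. For completeness, though, here is another way of "removing" array elements, using a boolean mask created with np.isin. This method lets us remove elements by specifying either the elements themselves or their indices: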

import numpy as np
a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

Remove by indices:

indices_to_remove = [2, 3, 6]
a = a[~np.isin(np.arange(a.size), indices_to_remove)]

Remove by elements (don't forget to recreate the original a first, since it was overwritten in the previous line):

elements_to_remove = a[indices_to_remove]  # [3, 4, 7]
a = a[~np.isin(a, elements_to_remove)]

- Andreas K.
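
A related variation on the index-based mask, offered only as a sketch and not as part of the answer above: start from an all-True mask and switch off the positions to drop. The name keep is hypothetical.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
indices_to_remove = [2, 3, 6]

# Start with "keep everything", then switch off the positions to drop.
keep = np.ones(a.size, dtype=bool)
keep[indices_to_remove] = False

result = a[keep]
print(result.tolist())  # [1, 2, 5, 6, 8, 9]
```

This avoids the membership test over np.arange(a.size) and reads fairly directly as "everything except these positions".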

6

Not being a NumPy person, I took a shot with:

>>> import numpy as np
>>> import itertools
>>> 
>>> a = np.array([1,2,3,4,5,6,7,8,9])
>>> index=[2,3,6]
>>> a = np.array(list(itertools.compress(a, [i not in index for i in range(len(a))])))
>>> a
array([1, 2, 5, 6, 8, 9])

According to my tests, this outperforms numpy.delete(). I don't know why that would be the case; maybe it is due to the small size of the initial array?

python -m timeit -s "import numpy as np" -s "import itertools" -s "a = np.array([1,2,3,4,5,6,7,8,9])" -s "index=[2,3,6]" "a = np.array(list(itertools.compress(a, [i not in index for i in range(len(a))])))"
100000 loops, best of 3: 12.9 usec per loop

python -m timeit -s "import numpy as np" -s "a = np.array([1,2,3,4,5,6,7,8,9])" -s "index=[2,3,6]" "np.delete(a, index)"
10000 loops, best of 3: 108 usec per loop

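That is a pretty significant difference (in the opposite direction to what I expected). Does anyone know why this would be the case?

Even more strangely, passing a list to numpy.delete() performs worse than looping over the list and passing the indices one at a time.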

python -m timeit -s "import numpy as np" -s "a = np.array([1,2,3,4,5,6,7,8,9])" -s "index=[2,3,6]" "for i in index:" "    np.delete(a, i)"
10000 loops, best of 3: 33.8 usec per loop

Edit: It does appear to depend on the size of the array. With large arrays, numpy.delete() is significantly faster.

python -m timeit -s "import numpy as np" -s "import itertools" -s "a = np.array(list(range(10000)))" -s "index=[i for i in range(10000) if i % 2 == 0]" "a = np.array(list(itertools.compress(a, [i not in index for i in range(len(a))])))"
10 loops, best of 3: 200 msec per loop

python -m timeit -s "import numpy as np" -s "a = np.array(list(range(10000)))" -s "index=[i for i in range(10000) if i % 2 == 0]" "np.delete(a, index)"
1000 loops, best of 3: 1.68 msec per loop

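Obviously, none of this matters much, since you should always go for clarity and avoid reinventing the wheel, but I found it a little interesting, so I thought I'd leave it here.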

- Gareth Latty

4

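Be careful what you compare! In your first method you use a = delete_stuff(a), which makes a smaller on every iteration. With the built-in function you do not store the result back into a, so a keeps its original size! Also, you can speed up your function considerably by building a set from index and checking against that to decide whether to delete an item. After fixing both things, for 10k items I get 6.22 ms per loop with your function and 4.48 ms with numpy.delete, which is roughly what you would expect. - Michael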

3

Two more hints: use np.arange(x) instead of np.array(list(range(x))), and create the index with np.s_[::2]. - Michael
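
The points raised in these comments can be folded into a fairer comparison. The sketch below is one possible setup, not a definitive benchmark: the function names with_compress and with_delete are hypothetical, both functions receive the same full-size array on every repetition, and the membership test uses a set. No timings are quoted because they depend on the machine and NumPy version.

```python
import itertools
import timeit

import numpy as np

a = np.arange(10_000)
index = np.arange(0, a.size, 2)    # every second position
index_set = set(index.tolist())    # set membership is O(1) per lookup


def with_compress(arr):
    # Pure-Python filtering; returns a new array and leaves `arr` unchanged.
    selectors = [i not in index_set for i in range(len(arr))]
    return np.array(list(itertools.compress(arr, selectors)))


def with_delete(arr):
    # NumPy's built-in; also returns a new array.
    return np.delete(arr, index)


# Both calls see the same full-size input on every repetition.
for fn in (with_compress, with_delete):
    seconds = timeit.timeit(lambda: fn(a), number=20)
    print(f"{fn.__name__}: {seconds / 20 * 1e3:.3f} ms per call")
```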

6

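If you don't have the indices of the elements you want to remove, you can use the in1d function provided by NumPy.

The function returns a boolean array that is True wherever an element of the first 1-D array is also present in the second array. To remove those elements, simply negate the values it returns.

Note that this method preserves the order of the original array.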

In [1]: import numpy as np

        a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
        rm = np.array([3, 4, 7])
        # np.in1d return true if the element of `a` is in `rm`
        idx = np.in1d(a, rm)
        idx

Out[1]: array([False, False,  True,  True, False, False,  True, False, False])

In [2]: # Since we want the opposite of what `in1d` gives us, 
        # you just have to negate the returned value
        a[~idx]

Out[2]: array([1, 2, 5, 6, 8, 9])

- Luiz Otavio V. B. Oliveira
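
In recent NumPy releases np.in1d is deprecated in favour of np.isin (since 2.0, as far as I am aware). A minimal sketch of the same idea with np.isin follows; its invert=True argument produces the negated mask directly. The variable name keep is hypothetical.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
rm = np.array([3, 4, 7])

# invert=True marks the elements of `a` that are NOT in `rm`,
# so no separate negation step is needed.
keep = np.isin(a, rm, invert=True)

print(a[keep].tolist())  # [1, 2, 5, 6, 8, 9]
```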

2

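If you don't know the indices, you can use logical_and to build a boolean mask instead. The snippet below selects the values strictly between low and high; negate the mask to remove them: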

x = 10*np.random.randn(1,100)
low = 5
high = 27
x[0,np.logical_and(x[0,:]>low,x[0,:]<high)]

- idnavid
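
To show both directions of the range mask, here is a small sketch on a 1-D array. It assumes a seeded generator purely so the run is repeatable; the names in_range, kept and removed are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)      # seeded only to make the sketch repeatable
x = 10 * rng.standard_normal(100)
low, high = 5, 27

in_range = np.logical_and(x > low, x < high)

kept = x[in_range]       # only the values strictly between low and high
removed = x[~in_range]   # the same array with those values removed

print(kept.size + removed.size == x.size)  # True
```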

2

Remove specific indices (here I removed 16 and 21 from the matrix):

import numpy as np
mat = np.arange(12,26)
a = [4,9]
del_map = np.delete(mat, a)
del_map.reshape(3,4)

Output:

array([[12, 13, 14, 15],
       [17, 18, 19, 20],
       [22, 23, 24, 25]])

- Raja Ahsan Zeb

2

A list comprehension can be an interesting approach too. Note that it produces a Python list, not an array:

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
index = np.array([2, 3, 6]) #index is changed to an array.  
out = [val for i, val in enumerate(a) if all(i != index)]
>>> [1, 2, 5, 6, 8, 9]

- Mauricio Arboleda-Zapata

- Levon · Accepted Answer

Use numpy.delete(), which returns a new array with sub-arrays along an axis deleted.

numpy.delete(a, index)

For your specific question:

import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
index = [2, 3, 6]

new_a = np.delete(a, index)

print(new_a)
# Output: [1, 2, 5, 6, 8, 9]
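
The "along an axis" part of the description matters for multi-dimensional input. As a hedged illustration (the 3x4 array m is made up), np.delete can drop whole rows or columns via its axis argument, and it flattens the input when axis is left at None:

```python
import numpy as np

m = np.arange(12).reshape(3, 4)

no_row = np.delete(m, 1, axis=0)         # drop the second row
no_cols = np.delete(m, [0, 2], axis=1)   # drop the first and third columns
flat = np.delete(m, [0, 2])              # axis=None: `m` is flattened first

print(no_row.tolist())    # [[0, 1, 2, 3], [8, 9, 10, 11]]
print(no_cols.tolist())   # [[1, 3], [5, 7], [9, 11]]
print(flat.tolist())      # [1, 3, 4, 5, 6, 7, 8, 9, 10, 11]
print(m.shape)            # (3, 4), the input is unchanged
```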

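Note that numpy.delete() returns a new array and leaves a unchanged, much as operations on Python strings return a new string, so every call creates a new object. To quote the delete() docs:

"A copy of arr with the elements specified by obj removed. Note that delete does not occur in-place..."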

If the code I post has output, it is the result of running the code.