Deleting a sequence of elements every nth time in a numpy array

5

I know how to delete every 4th element in a numpy array:

frame = np.delete(frame,np.arange(4,frame.size,4))

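Now I want to know if there is a simple command that deletes a block of 3 values at every nth (e.g. 4th) position.
A basic example:
input: [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20....]
Would lead to:
output: [1,2,3,7,8,9,13,14,15,19,20,...]
I was hoping for a simple numpy / python functionality rather than writing a function that has to iterate over the vector (because it is quite long in my case...).
Thank you for your help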
3 Answers

2

Approach #1: Here's one approach using boolean indexing with modulus -

a[np.mod(np.arange(a.size),6)<3]

As a function, it translates to:
def select_in_groups(a, M, N): # Keep first M, delete next N and so on.
    return a[np.mod(np.arange(a.size),M+N)<M]

Step-by-step sample run -

# Input array
In [361]: a
Out[361]: 
array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
       18, 19, 20])

# Create a range array that spans along the length of array
In [362]: np.arange(a.size)
Out[362]: 
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19])

# Use modulus to create "intervaled" version of it that shifts at
# the end of each group of 6 elements
In [363]: np.mod(np.arange(a.size),6)
Out[363]: array([0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1])

# We need to select the first three as valid ones, so compare against 3
# creating a boolean array or mask
In [364]: np.mod(np.arange(a.size),6) < 3
Out[364]: 
array([ True,  True,  True, False, False, False,  True,  True,  True,
       False, False, False,  True,  True,  True, False, False, False,
        True,  True], dtype=bool)

# Use the mask to select valid elements off array
In [365]: a[np.mod(np.arange(a.size),6)<3]
Out[365]: array([ 1,  2,  3,  7,  8,  9, 13, 14, 15, 19, 20])
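
The 6 in the mask above is the period of the pattern: 3 elements kept plus 3 deleted. The same positions can also be handed to np.delete, which is closer to the call in the question. A minimal sketch, where `delete_in_groups` is a name made up for this illustration:

```python
import numpy as np

def delete_in_groups(a, M, N):
    # Hypothetical helper: keep M elements, delete the next N, and repeat.
    period = M + N                        # e.g. 3 kept + 3 deleted -> 6
    pos = np.arange(a.size) % period      # position of each element inside its group
    to_delete = np.flatnonzero(pos >= M)  # indices falling in the last N slots of a group
    return np.delete(a, to_delete)

a = np.arange(1, 21)
print(delete_in_groups(a, 3, 3))  # [ 1  2  3  7  8  9 13 14 15 19 20]
```

Indexing with the boolean mask directly, as in Approach #1, skips the extra step of turning the mask into an index list.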

Approach #2: For performance, here's another approach using NumPy array strides -

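def select_in_groups_strided(a, M, N): # Keep first M, delete next N and so on.
    K = M+N                       # length of one keep+delete group
    na = a.size
    nrows = (1+((na-1)//K))       # number of groups, rounding up
    n = a.strides[0]              # byte step between consecutive elements
    # View a as rows of K elements without copying. Caution: when na is not a
    # multiple of K, the last row extends past the end of a's buffer.
    out = np.lib.stride_tricks.as_strided(a, shape=(nrows,K), strides=(K*n,n))
    # Valid outputs: M per complete group plus the leftover elements; slicing
    # caps this at M*nrows when more than M elements are left over.
    nkeep = M*(na//K) + (na - (K*(na//K)))
    return out[:,:M].ravel()[:nkeep]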

Sample run -

In [545]: a = np.arange(1,21)

In [546]: a
Out[546]: 
array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
       18, 19, 20])

In [547]: select_in_groups_strided(a,3,3)
Out[547]: array([ 1,  2,  3,  7,  8,  9, 13, 14, 15, 19, 20])

In [548]: a = np.arange(1,25)

In [549]: a
Out[549]: 
array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
       18, 19, 20, 21, 22, 23, 24])

In [550]: select_in_groups_strided(a,3,3)
Out[550]: array([ 1,  2,  3,  7,  8,  9, 13, 14, 15, 19, 20, 21])
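
When a.size is not a multiple of M+N, the last row of the strided view in select_in_groups_strided reaches past the end of a. A possible alternative that stays inside the array is to reshape only the complete groups and handle the leftover tail separately. This is a sketch, not part of the original answer; `select_in_groups_reshape` is a hypothetical name and it has not been timed against the strided version:

```python
import numpy as np

def select_in_groups_reshape(a, M, N):
    # Hypothetical variant: keep first M, delete next N and so on,
    # without creating a view that extends beyond the data of `a`.
    K = M + N
    nfull = a.size // K                      # number of complete groups
    body = a[:nfull * K].reshape(nfull, K)   # one complete group per row
    tail = a[nfull * K:nfull * K + M]        # kept part of the incomplete last group
    return np.concatenate((body[:, :M].ravel(), tail))

print(select_in_groups_reshape(np.arange(1, 21), 3, 3))
```

The concatenate call costs one extra copy, so this trades some speed for staying within bounds.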

Runtime test

Using the same setup as in @Daniel Forsman's timing tests -

In [637]: a = np.arange(1,21)

In [638]: %timeit block_delete(a,3,3)
10000 loops, best of 3: 21 µs per loop

In [639]: %timeit select_in_groups_strided(a,3,3)
100000 loops, best of 3: 6.44 µs per loop

In [640]: a = np.arange(1,2100)

In [641]: %timeit block_delete(a,3,3)
10000 loops, best of 3: 27 µs per loop

In [642]: %timeit select_in_groups_strided(a,3,3)
100000 loops, best of 3: 9.1 µs per loop

In [643]: a = np.arange(999999) + 1

In [644]: %timeit block_delete(a,3,3)
100 loops, best of 3: 2.24 ms per loop

In [645]: %timeit select_in_groups_strided(a,3,3)
1000 loops, best of 3: 1.12 ms per loop

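In these tests the strided version is roughly 2x to 3x faster than block_delete at all three array sizes, if performance is what you are after.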


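Thanks, this works. Could you explain what this command does? I'm not quite sure what the ",6" means. - Kev1n91
Yeah, I figured there was an as_strided answer, I just couldn't wrap my head around it. - Daniel F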

2
A method using boolean indexing:
def block_delete(a, n, m):  #keep n, remove m
    mask = np.tile(np.r_[np.ones(n), np.zeros(m)].astype(bool), a.size // (n + m) + 1)[:a.size]
    return a[mask]
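
To see what the one-line mask in block_delete is doing, it can be unpacked into its three steps. The array size and the values n=3, m=2 below are chosen only for illustration:

```python
import numpy as np

a = np.arange(1, 11)   # ten elements; keep n=3, remove m=2 (illustrative sizes)
n, m = 3, 2

pattern = np.r_[np.ones(n), np.zeros(m)].astype(bool)  # one period: [T, T, T, F, F]
reps = a.size // (n + m) + 1                           # enough repeats to cover all of a
mask = np.tile(pattern, reps)[:a.size]                 # repeat the period, trim to a.size

print(mask)
print(a[mask])  # [1 2 3 6 7 8]
```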

Compared with @Divakar's:
def mod_delete(a, n, m):
    return a[np.mod(np.arange(a.size), n + m) < n]

a = np.arange(19) + 1

%timeit block_delete(a, 3, 4)
10000 loops, best of 3: 50.6 µs per loop

%timeit mod_delete(a, 3, 4)
The slowest run took 9.37 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 5.69 µs per loop

Let's try a longer array:

a = np.arange(999) + 1

%timeit block_delete(a, 3, 4)
The slowest run took 4.61 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 54.8 µs per loop

%timeit mod_delete(a, 3, 4)
The slowest run took 5.13 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 14.5 µs per loop

Even longer:

a = np.arange(999999) + 1

%timeit block_delete(a, 3, 4)
100 loops, best of 3: 3.93 ms per loop

%timeit mod_delete(a, 3, 4)
100 loops, best of 3: 12.3 ms per loop

So which one is faster will depend on your array size.
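
Since the ranking flips with array size, it may be worth repeating the comparison on your own machine. Outside IPython, the standard-library timeit module can stand in for %timeit. A rough sketch, with the repeat and call counts picked arbitrarily:

```python
import timeit
import numpy as np

def block_delete(a, n, m):  # keep n, remove m
    mask = np.tile(np.r_[np.ones(n), np.zeros(m)].astype(bool), a.size // (n + m) + 1)[:a.size]
    return a[mask]

def mod_delete(a, n, m):
    return a[np.mod(np.arange(a.size), n + m) < n]

for size in (19, 999, 999999):
    a = np.arange(size) + 1
    # Both functions must agree before their timings are worth comparing.
    assert np.array_equal(block_delete(a, 3, 4), mod_delete(a, 3, 4))
    for func in (block_delete, mod_delete):
        # Best of 3 repeats, 10 calls each (arbitrary choices for this sketch).
        best = min(timeit.repeat(lambda: func(a, 3, 4), number=10, repeat=3)) / 10
        print(f"{func.__name__:>12} size={size:>7}: {best * 1e6:.1f} us per call")
```

Absolute numbers will differ from the ones quoted above, which depend on the hardware and NumPy version used at the time.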

0
import numpy as np
a = np.array([10, 0, 0, 20, 0, 30, 40, 50, 0, 60, 70, 80, 90, 100,0])
print("Original array:")
print(a)
# Collect the positions of the zeros as integer indices, which is what
# np.delete expects (np.zeros(0) would start a float array).
index=np.flatnonzero(a==0)
print("index=",index)
new_a=np.delete(a,index)
print("new_a=",new_a)
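
The same result can be obtained without np.delete by indexing with a boolean mask. Note that this answer filters by value (zeros) rather than by position, so it fits the question only when the unwanted elements happen to be zeros. A minimal sketch:

```python
import numpy as np

a = np.array([10, 0, 0, 20, 0, 30, 40, 50, 0, 60, 70, 80, 90, 100, 0])
new_a = a[a != 0]   # keep every element that is not zero
print("new_a=", new_a)
```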
