假设范围不重叠,您可以构建一个掩码,在索引在由
array1
和
array2
指定的范围之间为非零,然后使用
np.flatnonzero
获取索引数组--所需的
array3
:
import numpy as np
array1 = np.array([10, 65, 200])
array2 = np.array([14, 70, 204])
first, last = array1.min(), array2.max()
array3 = np.zeros(last-first+1, dtype='i1')
array3[array1-first] = 1
array3[array2-first] = -1
array3 = np.flatnonzero(array3.cumsum())+first
print(array3)
产量
[ 10 11 12 13 65 66 67 68 69 200 201 202 203]
对于较大的array1
,使用using_flatnonzero
可能比using_loop
更快:
def using_flatnonzero(array1, array2):
first, last = array1.min(), array2.max()
array3 = np.zeros(last-first+1, dtype='i1')
array3[array1-first] = 1
array3[array2-first] = -1
return np.flatnonzero(array3.cumsum())+first
def using_loop(array1, array2):
return np.concatenate([np.arange(array1[i], array2[i]) for i in
np.arange(0,len(array1))])
array1, array2 = (np.random.choice(range(1, 11), size=10**4, replace=True)
.cumsum().reshape(2, -1, order='F'))
assert np.allclose(using_flatnonzero(array1, array2), using_loop(array1, array2))
In [260]: %timeit using_loop(array1, array2)
100 loops, best of 3: 9.36 ms per loop
In [261]: %timeit using_flatnonzero(array1, array2)
1000 loops, best of 3: 564 µs per loop
如果范围重叠,那么
using_loop
将返回一个包含重复项的
array3
。而
using_flatnonzero
则返回一个没有重复项的数组。
解释:让我们来看一个小例子。
array1 = np.array([10, 65, 200])
array2 = np.array([14, 70, 204])
目标是构建一个类似下面的数组
goal
。1的位置位于索引值
[10, 11, 12, 13, 65, 66, 67, 68, 69, 200, 201, 202, 203]
(即
array3
)。
In [306]: goal
Out[306]:
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1,
1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1], dtype=int8)
一旦我们有了goal
数组,就可以通过调用np.flatnonzero
来获得array3
:
In [307]: np.flatnonzero(goal)
Out[307]: array([ 10, 11, 12, 13, 65, 66, 67, 68, 69, 200, 201, 202, 203])
goal
的长度与array2.max()
相同:
In [308]: array2.max()
Out[308]: 204
In [309]: goal.shape
Out[309]: (204,)
因此,我们可以开始分配
goal = np.zeros(array2.max()+1, dtype='i1')
然后在由array1
给出的索引位置处填入1,在由array2
给出的索引位置处填入-1:
In [311]: goal[array1] = 1
In [312]: goal[array2] = -1
In [313]: goal
Out[313]:
array([ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, -1, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,
0, 0, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0,
-1], dtype=int8)
现在应用
cumsum
(累计求和)可以生成所需的
goal
数组:
In [314]: goal = goal.cumsum(); goal
Out[314]:
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1,
1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0])
In [315]: np.flatnonzero(goal)
Out[315]: array([ 10, 11, 12, 13, 65, 66, 67, 68, 69, 200, 201, 202, 203])
这就是使用
using_flatnonzero
的主要思想。减去
first
只是为了节省一点内存。
numpy-onic
要求消除显式循环。 :) - hpaulj