I don't know whether you have Numba available, but it is very handy (and fast) for special-purpose problems like this:
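import numba as nb
import math

@nb.njit
def max_consecutive_nan(arr):
    max_ = 0      # longest NaN run seen so far
    current = 0   # length of the run currently being scanned
    idx = 0
    while idx < arr.size:
        # Consume one contiguous run of NaNs (possibly empty).
        while idx < arr.size and math.isnan(arr[idx]):
            current += 1
            idx += 1
        if current > max_:
            max_ = current
        current = 0
        # Step past the non-NaN element that ended the run.
        idx += 1
    return max_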
For your examples:
>>> import numpy as np
>>> from numpy import nan
>>> max_consecutive_nan(np.array([nan, nan, 2, 1, 1, nan, nan, nan, nan, 0.101, nan, 0.16]))
4
>>> max_consecutive_nan(np.array([nan, nan, nan, 0.16, 1, 0.16, 0.9999, 0.0001, 0.16, 0.101, nan, 0.16]))
3
>>> max_consecutive_nan(np.array([0.16, 0.16, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan]))
22
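If Numba is not installed, the same result can be cross-checked with plain NumPy. The following is only a minimal sketch, not one of the benchmarked answers: the name `max_consecutive_nan_numpy` is hypothetical. It finds where each NaN run starts and ends by taking the difference of a padded boolean mask.

```python
import numpy as np


def max_consecutive_nan_numpy(arr):
    """Hypothetical NumPy-only helper: length of the longest NaN run."""
    mask = np.isnan(arr)
    # Pad with False on both sides so every run has a start and an end.
    padded = np.concatenate(([False], mask, [False])).astype(np.int8)
    diff = np.diff(padded)
    starts = np.flatnonzero(diff == 1)   # index where a run begins
    ends = np.flatnonzero(diff == -1)    # index just past the end of a run
    if starts.size == 0:
        return 0
    return int((ends - starts).max())
```

For the first example array above this returns 4, matching the Numba version. It allocates several temporary arrays, so it is meant as a fallback and a sanity check rather than a faster alternative.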
Using the benchmark proposed by @Divakar, ordered by performance (the complete benchmark code can be found in this gist):
arr = np.random.rand(10000)
arr[np.random.choice(range(len(arr)),size=1000,replace=0)] = np.nan
%timeit mine(arr) # 10000 loops, best of 3: 67.7 µs per loop
%timeit Divakar_v2(arr) # 1000 loops, best of 3: 196 µs per loop
%timeit Divakar(arr) # 1000 loops, best of 3: 252 µs per loop
%timeit Tagc(arr) # 100 loops, best of 3: 6.92 ms per loop
%timeit Kasramvd(arr) # 10 loops, best of 3: 38.2 ms per loop
%timeit pltrdy(arr) # 10 loops, best of 3: 70.9 ms per loop
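The `%timeit` lines above need IPython. As a rough sketch of how a comparable measurement could be taken in a plain script, the snippet below builds the same kind of test array and times a candidate with the standard `timeit` module. The helper names (`make_benchmark_array`, `best_time`, `max_consecutive_nan_py`) and the fixed seed are assumptions made for illustration, and the pure-Python function is only a stand-in for whichever implementation you want to time.

```python
import math
import timeit

import numpy as np


def make_benchmark_array(n=10000, n_nan=1000, seed=0):
    """Random array of length n with n_nan NaNs at distinct random positions."""
    rng = np.random.default_rng(seed)
    arr = rng.random(n)
    arr[rng.choice(n, size=n_nan, replace=False)] = np.nan
    return arr


def max_consecutive_nan_py(arr):
    """Pure-Python stand-in candidate: longest run of NaNs."""
    best = current = 0
    for value in arr:
        current = current + 1 if math.isnan(value) else 0
        best = max(best, current)
    return best


def best_time(func, arr, repeat=3, number=100):
    """Best average time per call in seconds over `repeat` rounds."""
    timings = timeit.repeat(lambda: func(arr), repeat=repeat, number=number)
    return min(timings) / number


if __name__ == "__main__":
    arr = make_benchmark_array()
    print(f"{best_time(max_consecutive_nan_py, arr) * 1e6:.1f} µs per call")
```

When timing a Numba-compiled function this way, call it once beforehand so that compilation time is not included in the measurement.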
[...] the np.concatenate code, but this should scale well to larger arrays. - MSeifert
[...] posted timings in his post. Check it out! - Divakar