Using NumPy:
>>> import numpy as np
>>> a = np.array([ True, True, True, False, False, False, False, True, True, True, False, False, True, False], dtype=bool)
>>> np.diff(np.insert(np.where(np.diff(a)==1)[0]+1, 0, 0))[::2]
array([3, 3, 1])
>>> a = np.array([True, False, False, True, True, False, False, True, False])
>>> np.diff(np.insert(np.where(np.diff(a)==1)[0]+1, 0, 0))[::2]
array([1, 2, 1])
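To see why the one-liner works, it can help to unpack it into steps. The following is only a sketch of that decomposition: the intermediate variable names are mine, not part of the original answer, and the values in the comments apply to this particular input.

```python
import numpy as np

a = np.array([True, True, True, False, False, False, False,
              True, True, True, False, False, True, False])

# Positions where neighbouring values differ; +1 turns each one into the
# index at which a new run begins.
change_points = np.where(np.diff(a) == 1)[0] + 1   # [3, 7, 10, 12, 13]

# Prepend 0 so that the first run also has a start index.
run_starts = np.insert(change_points, 0, 0)        # [0, 3, 7, 10, 12, 13]

# Differences between consecutive starts are run lengths
# (the final run has no closing boundary, so it is not counted).
run_lengths = np.diff(run_starts)                  # [3, 4, 3, 2, 1]

# Runs alternate between True and False. This array starts with True,
# so the even positions hold the lengths of the True runs.
true_run_lengths = run_lengths[::2]                # [3, 3, 1]
print(true_run_lengths)
```

Note that the last step relies on the array starting with True; otherwise the even positions would be the False runs.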
I can't say this is the best NumPy solution, but it is still faster than itertools.groupby:
>>> from itertools import groupby
>>> lis = [ True, True, True, False, False, False, False, True, True, True, False, False, True, False]*1000
>>> a = np.array(lis)
>>> %timeit [len(list(group)) for value, group in groupby(lis) if value]
100 loops, best of 3: 9.58 ms per loop
>>> %timeit np.diff(np.insert(np.where(np.diff(a)==1)[0]+1, 0, 0))[::2]
1000 loops, best of 3: 1.4 ms per loop
>>> lis = [ True, True, True, False, False, False, False, True, True, True, False, False, True, False]*10000
>>> a = np.array(lis)
>>> %timeit [len(list(group)) for value, group in groupby(lis) if value]
1 loops, best of 3: 95.5 ms per loop
>>> %timeit np.diff(np.insert(np.where(np.diff(a)==1)[0]+1, 0, 0))[::2]
100 loops, best of 3: 14.9 ms per loop
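The timings above come from IPython's %timeit. If you want to reproduce the comparison in a plain script, something along these lines should work. It is a rough sketch: the helper names with_groupby and with_numpy are made up for illustration, and the numbers you get will depend on your machine and NumPy version.

```python
import timeit
from itertools import groupby

import numpy as np

lis = [True, True, True, False, False, False, False,
       True, True, True, False, False, True, False] * 1000
a = np.array(lis)

def with_groupby():
    # Pure-Python approach: keep the length of every group of True values.
    return [len(list(group)) for value, group in groupby(lis) if value]

def with_numpy():
    # The NumPy one-liner from above, unchanged.
    return np.diff(np.insert(np.where(np.diff(a) == 1)[0] + 1, 0, 0))[::2]

# Check that both approaches agree on this input before comparing speed.
assert with_groupby() == with_numpy().tolist()

for fn in (with_groupby, with_numpy):
    # Best of 3 repeats, 20 calls each, reported per call.
    best = min(timeit.repeat(fn, number=20, repeat=3)) / 20
    print(f"{fn.__name__}: {best * 1e3:.3f} ms per call")
```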
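As @justhalf and @Mark Dickinson pointed out in the comments, the code above does not work in some cases (for example, when the array starts with False or ends with True), so you need to pad the array with False at both ends first: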
In [28]: a
Out[28]:
array([ True, True, True, False, False, False, False, True, True,
True, False, False, True, False], dtype=bool)
In [29]: np.diff(np.where(np.diff(np.hstack([False, a, False])))[0])[::2]
Out[29]: array([3, 3, 1])
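For reuse, the padded version can be wrapped in a small function. This is a minimal sketch, assuming a 1-D boolean input; the name true_run_lengths is hypothetical and not from the original answer.

```python
import numpy as np

def true_run_lengths(a):
    """Return the lengths of consecutive runs of True in a 1-D boolean sequence.

    The function name is invented for this sketch.
    """
    a = np.asarray(a, dtype=bool)
    # Pad with False so every True run has a change point on both sides.
    padded = np.hstack([False, a, False])
    # Change points now come in (start, end) pairs, one pair per True run.
    edges = np.where(np.diff(padded))[0]
    # Even-indexed differences are end - start for each True run.
    return np.diff(edges)[::2]

print(true_run_lengths([True, True, True, False, True]))  # [3 1]
print(true_run_lengths([False, True, True, False]))       # [2]
print(true_run_lengths([True, True]))                     # [2]
print(true_run_lengths([False, False]))                   # []
```

Because of the padding, the result no longer depends on whether the input starts with False or ends with True, which are the cases the unpadded one-liner gets wrong.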