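Here is a vectorized approach based on differencing (np.diff): pad each row with a zero on both sides, mark where counts == 1, and use the +1 and -1 transitions in the row-wise differences to locate where each run of 1s starts and stops.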
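import numpy as np
import pandas as pd
# Pad a zero column on each side so runs touching either edge still rise and fall
append1 = np.zeros((counts.shape[0],1),dtype=int)
counts_ext = np.column_stack((append1,counts,append1))
# +1 marks the start of a run of 1s, -1 marks one past its end
diffs = np.diff((counts_ext==1).astype(int),axis=1)
starts = np.argwhere(diffs == 1)
stops = np.argwhere(diffs == -1)
# Column 0: row index of each run, column 1: length of that run
start_stop = np.column_stack((starts[:,0], stops[:,1] - starts[:,1]))
SS_df = pd.DataFrame(start_stop)
# For each row, pick the run with the largest length
out = start_stop[SS_df.groupby([0],sort=False)[1].idxmax(),1]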
Sample input and output:
Original sample case:
In [574]: counts
Out[574]:
array([[0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0],
       [0, 0, 1, 0, 0, 1, 2, 0, 0, 1, 1, 1],
       [0, 0, 0, 4, 1, 0, 0, 0, 0, 1, 1, 0]])
In [575]: out
Out[575]: array([2, 3, 2], dtype=int64)
Modified case:
In [577]: counts
Out[577]:
array([[0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0],
       [0, 0, 1, 0, 0, 1, 2, 0, 1, 1, 1, 1],
       [0, 0, 0, 4, 1, 1, 1, 1, 1, 0, 1, 0]])
In [578]: out
Out[578]: array([2, 4, 5], dtype=int64)
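To see why padding and differencing recover the run lengths, it may help to walk through a single row. The following is only an illustrative sketch on the first row of the original sample; the names row, padded, d and run_lengths are mine, not part of the solution above.

```python
import numpy as np

# First row of the original sample
row = np.array([0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0])

# Pad with a zero on both sides so a run touching either edge
# still produces both a rise (+1) and a fall (-1).
padded = np.concatenate(([0], (row == 1).astype(int), [0]))
d = np.diff(padded)

starts = np.flatnonzero(d == 1)    # index in row where each run of 1s begins
stops = np.flatnonzero(d == -1)    # index one past where each run ends
run_lengths = stops - starts       # one length per run, in order

print(starts, stops, run_lengths)  # [1 3 7] [2 5 9] [1 2 2]
print(run_lengths.max())           # 2, matching out[0] for the original sample
```

Because of the padding, every run contributes exactly one start and one stop, so the k-th start always pairs with the k-th stop.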
Here is a pure NumPy version. It is identical to the previous one up to the point where we have the starts and stops. The full implementation:
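# Same padding and differencing as before
append1 = np.zeros((counts.shape[0],1),dtype=int)
counts_ext = np.column_stack((append1,counts,append1))
diffs = np.diff((counts_ext==1).astype(int),axis=1)
starts = np.argwhere(diffs == 1)
stops = np.argwhere(diffs == -1)
# Length of every run, ordered by row and then by position within the row
intvs = stops[:,1] - starts[:,1]
# Number of runs per row; minlength keeps trailing rows that have no runs
c = np.bincount(starts[:,0], minlength=counts.shape[0])
# Scatter the ragged per-row run lengths into a zero-padded 2D array
mask = np.arange(c.max()) < c[:,None]
intvs2D = mask.astype(float)
intvs2D[mask] = intvs
# Row-wise maximum run length (float dtype; cast with .astype(int) if needed)
out = intvs2D.max(1)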
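For reuse, the same steps can be wrapped in a function. The sketch below is one possible packaging, not the method above verbatim: the name longest_run_of_ones is hypothetical, and it swaps the mask-based scatter for np.maximum.at, which takes the per-row maximum directly. It assumes a 2D integer array and returns 0 for rows containing no 1s, including the case where the whole array has none.

```python
import numpy as np

def longest_run_of_ones(counts):
    """Length of the longest run of 1s in each row of a 2D array.

    Hypothetical helper: same padding + np.diff idea as above, but the
    per-row maximum is taken with np.maximum.at instead of a 2D mask.
    """
    counts = np.asarray(counts)
    n_rows = counts.shape[0]
    pad = np.zeros((n_rows, 1), dtype=int)
    ext = np.column_stack((pad, (counts == 1).astype(int), pad))
    diffs = np.diff(ext, axis=1)
    starts = np.argwhere(diffs == 1)   # (row, col) where each run begins
    stops = np.argwhere(diffs == -1)   # (row, col) one past each run's end
    out = np.zeros(n_rows, dtype=np.intp)
    if len(starts):
        # Unbuffered update: every run raises the slot of its own row
        np.maximum.at(out, starts[:, 0], stops[:, 1] - starts[:, 1])
    return out

counts = np.array([[0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0],
                   [0, 0, 1, 0, 0, 1, 2, 0, 0, 1, 1, 1],
                   [0, 0, 0, 4, 1, 0, 0, 0, 0, 1, 1, 0]])
print(longest_run_of_ones(counts))  # [2 3 2]
```

Rows with no 1s simply keep their initial value of 0, so the result always has one entry per input row.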