Counting consecutive occurrences (streaks) with NumPy

Question

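I am trying to compute positive, negative, and no-streak counts using pure NumPy. The problem is that I need a way to reproduce the groupby part of the calculation, which all my research suggests is required. I found a pandas solution in "Pythonic way to calculate streaks in pandas dataframe", and I have been able to convert everything except the groupby. Any help is appreciated.

Below is the pandas code I want to replicate. The only part I cannot express in NumPy is the groupby. I also wrote my own NumPy shift function.

Pandas version: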

def streaks(df, col):
    sign = np.sign(df[col])
    s = sign.groupby((sign != sign.shift()).cumsum()).cumsum()
    return df.assign(u_streak=s.where(s > 0, 0.0),
                     d_streak=s.where(s < 0, 0.0).abs())

My partial NumPy version:

arr = np.array([0.2,0.1,0.1,0.0,-0.2,-0.1,0.0])
sign = np.sign(arr)
s = np.not_equal(sign, shift(sign))
# now I need to groupby and then sum and sum again 
np.cumsum(groupby(np.cumsum(s)))

The expected result is:

array([1.,2.,3.,0.,-1.,-2.,0.])

- John Holmes

Possible duplicate of "Is there any numpy group by function?" - bjschoenfeld

1 Answer

- Ben.T · Accepted Answer

If you want a pure NumPy version, you do not need anything groupby-like. You can do this:

Original answer:

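import numpy as np

arr = np.array([0.2, 0.1, 0.1, 0.0, -0.2, -0.1, 0.0])
sign = np.sign(arr)
# running count of nonzero values
s = np.abs(sign).cumsum()  # or s = (arr != 0).cumsum()
# subtract the count reached at the most recent zero, then restore the sign
streaks = (s - np.maximum.accumulate(np.where(arr == 0, s, 0))) * sign
print(streaks)
# [ 1.  2.  3.  0. -1. -2.  0.]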
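
To make the one-liner above easier to follow, here is a minimal sketch that splits it into its intermediate arrays. The names at_zero and offset are introduced only for illustration and are not part of the original answer; the computation itself is unchanged.

```python
import numpy as np

arr = np.array([0.2, 0.1, 0.1, 0.0, -0.2, -0.1, 0.0])

sign = np.sign(arr)                      # [ 1.  1.  1.  0. -1. -1.  0.]
s = np.abs(sign).cumsum()                # [1. 2. 3. 3. 4. 5. 5.]  count of nonzero values so far

# Keep the running count only at the zero positions (illustrative name).
at_zero = np.where(arr == 0, s, 0)       # [0. 0. 0. 3. 0. 0. 5.]

# Carry the count seen at the most recent zero forward (illustrative name).
offset = np.maximum.accumulate(at_zero)  # [0. 0. 0. 3. 3. 3. 5.]

# Nonzero values since the last zero, with the sign restored.
streaks = (s - offset) * sign            # [ 1.  2.  3.  0. -1. -2.  0.]
```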

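How it works: s increases by one at every nonzero value of arr. Subtracting the running maximum of s, sampled at the positions where arr is 0, resets the count so that it restarts at 1 when the next streak begins. Multiplying by sign then gives the expected output.

Edit: the method above assumes there is a 0 between a positive streak and a negative streak. To drop that assumption, the positive and negative cases can be handled separately.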
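
As a quick check of that limitation, the first approach can be run on an input where a negative value directly follows a positive one. This is only a sketch: the function name streaks_zero_separated is made up here to wrap the answer's first snippet.

```python
import numpy as np

def streaks_zero_separated(arr):
    """First approach from the answer, wrapped in a hypothetically named function."""
    sign = np.sign(arr)
    s = np.abs(sign).cumsum()
    return (s - np.maximum.accumulate(np.where(arr == 0, s, 0))) * sign

# No zero separates 1.2, -1.2 and 0.2, so the count is never reset between them.
arr = np.array([1.2, -1.2, 0.2, 0.1, 0.1, 0.0, -0.2, -0.1, 0.0])
result = streaks_zero_separated(arr)
print(result)
# [ 1. -2.  3.  4.  5.  0. -1. -2.  0.]
```

The first five values should be 1, -1, 1, 2, 3, but the count keeps growing across the sign changes. The snippet that follows avoids this by counting positive and negative values separately.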

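import numpy as np

arr = np.array([1.2, -1.2, 0.2, 0.1, 0.1, 0.0, -0.2, -0.1, 0.0])
# running counts of strictly positive and strictly negative values
pos = np.clip(arr, 0, 1).astype(bool).cumsum()
neg = np.clip(arr, -1, 0).astype(bool).cumsum()
# subtract the count reached at the last non-positive (or non-negative) value
streaks = np.where(arr >= 0, pos - np.maximum.accumulate(np.where(arr <= 0, pos, 0)),
                             -neg + np.maximum.accumulate(np.where(arr >= 0, neg, 0)))
print(streaks)
# [ 1 -1  1  2  3  0 -1 -2  0]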
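
For comparison, here is a sketch of a different groupby-free formulation, not taken from the answer. It marks where the sign changes, carries the start index of each run forward with np.maximum.accumulate, and measures the distance from that start. The function name signed_streaks is hypothetical.

```python
import numpy as np

def signed_streaks(arr):
    """Signed streak length per element; zeros map to 0 (hypothetical helper)."""
    arr = np.asarray(arr, dtype=float)
    sign = np.sign(arr)
    idx = np.arange(arr.size)

    # True where a run of identical sign begins; the first element always starts a run.
    new_run = np.ones(arr.size, dtype=bool)
    new_run[1:] = sign[1:] != sign[:-1]

    # Index at which the current run started, carried forward along the array.
    run_start = np.maximum.accumulate(np.where(new_run, idx, 0))

    # Position inside the run (1-based), signed; runs of zeros are multiplied by 0.
    return (idx - run_start + 1) * sign

print(signed_streaks([0.2, 0.1, 0.1, 0.0, -0.2, -0.1, 0.0]))
# [ 1.  2.  3.  0. -1. -2.  0.]
print(signed_streaks([1.2, -1.2, 0.2, 0.1, 0.1, 0.0, -0.2, -0.1, 0.0]))
# [ 1. -1.  1.  2.  3.  0. -1. -2.  0.]
```

The new_run mask plays the role of the (sign != sign.shift()) comparison in the pandas version, so this variant stays close to the question's original logic without needing a zero between streaks.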