Find consecutive ones in numpy array

Question


How can I find the number of consecutive 1s (or any other value) in each row of the following numpy array? I need a pure numpy solution.

array([[0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0],
       [0, 0, 1, 0, 0, 1, 2, 0, 0, 1, 1, 1],
       [0, 0, 0, 4, 1, 0, 0, 0, 0, 1, 1, 0]])

My question has two parts. First: what is the maximum number of consecutive 1s in each row? It should be

array([2,3,2])

(This is the expected output for the example array above.)

Second: in each row, where does the first run of at least two consecutive 1s start? For this example it should be

array([3,9,9])

In this example I used two consecutive 1s as the threshold. But it must be possible to change that to five consecutive 1s; this is important.
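Before vectorizing, it helps to pin down both parts precisely. The sketch below is a deliberately plain per-row loop and not the pure-NumPy solution asked for. `run_stats_loop` and its `n`/`value` parameters are hypothetical names chosen here. Returning -1 for a row with no qualifying run is an assumed convention.

```python
import numpy as np


def run_stats_loop(counts, n=2, value=1):
    """Reference loop: longest run of `value` per row, and the start column
    of the first run of at least `n` such values (-1 if there is none)."""
    max_run = np.zeros(counts.shape[0], dtype=int)
    first_start = np.full(counts.shape[0], -1)
    for r, row in enumerate(counts):
        length = 0
        for c, x in enumerate(row):
            # Extend the current run, or reset it on any other value
            length = length + 1 if x == value else 0
            max_run[r] = max(max_run[r], length)
            # The first time a run reaches n, record where that run began
            if length == n and first_start[r] == -1:
                first_start[r] = c - n + 1
    return max_run, first_start


counts = np.array([[0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0],
                   [0, 0, 1, 0, 0, 1, 2, 0, 0, 1, 1, 1],
                   [0, 0, 0, 4, 1, 0, 0, 0, 0, 1, 1, 0]])
max_run, first_start = run_stats_loop(counts)    # [2, 3, 2] and [3, 9, 9]
_, first_start5 = run_stats_loop(counts, n=5)    # [-1, -1, -1]
```

With n=5 no row of the example qualifies, so every entry is -1. Any vectorized answer can be checked against this loop on random inputs.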

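A similar question was answered using np.unique, but that only works for a single row and not for an array with multiple rows, because the per-row results have different lengths.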
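A small sketch makes the length problem easy to see. It uses np.diff on a zero-padded row rather than np.unique, and it does not reproduce the similar question's code. `run_lengths_1d` is a hypothetical helper:

```python
import numpy as np


def run_lengths_1d(row, value=1):
    """Lengths of all runs of `value` in a single 1D row."""
    hits = np.concatenate(([0], (row == value).astype(int), [0]))
    d = np.diff(hits)
    # +1 marks a run start, -1 marks one past a run's end
    return np.flatnonzero(d == -1) - np.flatnonzero(d == 1)


counts = np.array([[0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0],
                   [0, 0, 1, 0, 0, 1, 2, 0, 0, 1, 1, 1],
                   [0, 0, 0, 4, 1, 0, 0, 0, 0, 1, 1, 0]])
per_row = [run_lengths_1d(r) for r in counts]
# [array([1, 2, 2]), array([1, 1, 3]), array([1, 2])]
```

The three rows yield 3, 3 and 2 runs. Without padding, the per-row results cannot be stacked into one rectangular array.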

- Nickpick

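Why is it "2" for the first row? - Divakar

Corrected the expected output. - Nickpick

Unfortunately it has to be pure numpy because speed is very important. Pandas would be much slower. - Nickpick

Would you use pandas if it weren't slower? - Divakar

Sure. I'd be very happy even with just an answer to the first question. Then I can move on and see what's still missing. - Nickpick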


2 Answers

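I think a very similar problem is checking whether the differences between elements of sorted rows equal a certain value. Checking for 5 consecutive elements that each differ by 1 looks like the following. The same approach works with a difference of 0 for two cards (a pair):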

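import numpy as np
cardAmount = cards[0,:].size
# Rows assumed sorted in descending order without duplicates: compare each card with the card 4 positions later
has4 = cards[:, np.arange(0, cardAmount-4)] - cards[:, np.arange(4, cardAmount)]
isStraight = np.any(has4 == 4, axis=1)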
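The window comparison can be tried on a small example. The sketch below wraps it in a hypothetical `is_straight` helper that takes the run length as a parameter. It uses the absolute difference, so it works for either sort direction. It assumes each row is sorted and has no duplicate values; with duplicates, a gap of `length - 1` no longer implies a straight.

```python
import numpy as np


def is_straight(cards, length=5):
    """True for each row that contains `length` consecutive integer values.

    Assumes every row is sorted (ascending or descending) without duplicates.
    """
    k = length - 1
    n_cols = cards.shape[1]
    # Absolute difference between each card and the card k positions later
    gaps = np.abs(cards[:, k:] - cards[:, :n_cols - k])
    return np.any(gaps == k, axis=1)


cards = np.array([[2, 3, 4, 5, 6, 9, 11],
                  [2, 3, 4, 6, 7, 8, 13]])
print(is_straight(cards))  # [ True False]
```

The pair check mentioned in the answer follows the same pattern with an offset of 1 and a target difference of 0, e.g. `np.any(cards[:, 1:] - cards[:, :-1] == 0, axis=1)`.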

- Nickpick


- Divakar · Accepted Answer

Here's a vectorized approach based on differentiation, i.e. taking discrete differences with np.diff -

import numpy as np
import pandas as pd

# counts: 2D integer input array, such as the samples below
# Append a column of zeros on either side of counts
append1 = np.zeros((counts.shape[0],1),dtype=int)
counts_ext = np.column_stack((append1,counts,append1))

# Get start and stop indices with 1s as triggers
diffs = np.diff((counts_ext==1).astype(int),axis=1)
starts = np.argwhere(diffs == 1)
stops = np.argwhere(diffs == -1)

# Get intervals using differences between start and stop indices
start_stop = np.column_stack((starts[:,0], stops[:,1] - starts[:,1]))

# Get, per row, the index of the longest interval and thus that length itself.
# Rows that contain no 1s produce no intervals, so they are missing from out.
SS_df = pd.DataFrame(start_stop)
out = start_stop[SS_df.groupby([0],sort=False)[1].idxmax(),1]
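The sketch below shows how np.diff on the zero-padded array marks run boundaries, using a single 1D row (the second row of the sample). It uses np.flatnonzero instead of np.argwhere because there is only one axis:

```python
import numpy as np

row = np.array([0, 0, 1, 0, 0, 1, 2, 0, 0, 1, 1, 1])   # second row of the sample
ext = np.concatenate(([0], row, [0]))                   # pad so runs at the edges still get a start and a stop
d = np.diff((ext == 1).astype(int))                     # +1 where a run of 1s begins, -1 right after it ends
starts = np.flatnonzero(d == 1)                         # start column of each run (unpadded coordinates)
stops = np.flatnonzero(d == -1)                         # one past the last column of each run
lengths = stops - starts                                # [1, 1, 3]
```

The 2 at column 6 does not count, because only positions equal to 1 become ones before differencing. The final run touches the right edge, and the padding is what gives it a stop.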

Sample input, output -

Original sample:

In [574]: counts
Out[574]: 
array([[0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0],
       [0, 0, 1, 0, 0, 1, 2, 0, 0, 1, 1, 1],
       [0, 0, 0, 4, 1, 0, 0, 0, 0, 1, 1, 0]])

In [575]: out
Out[575]: array([2, 3, 2], dtype=int64)

Modified sample:

In [577]: counts
Out[577]: 
array([[0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0],
       [0, 0, 1, 0, 0, 1, 2, 0, 1, 1, 1, 1],
       [0, 0, 0, 4, 1, 1, 1, 1, 1, 0, 1, 0]])

In [578]: out
Out[578]: array([2, 4, 5], dtype=int64)

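Here's a pure NumPy version. It is identical to the previous one up to the point where we have the starts and stops. Here's the full implementation -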

import numpy as np

# Append a column of zeros on either side of counts
append1 = np.zeros((counts.shape[0],1),dtype=int)
counts_ext = np.column_stack((append1,counts,append1))

# Get start and stop indices with 1s as triggers
diffs = np.diff((counts_ext==1).astype(int),axis=1)
starts = np.argwhere(diffs == 1)
stops = np.argwhere(diffs == -1)

# Get intervals using differences between start and stop indices
intvs = stops[:,1] - starts[:,1]

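# Store intervals as a 2D array (one row per input row, zero-padded) for further vectorized ops.
# minlength keeps trailing rows that contain no 1s at all.
c = np.bincount(starts[:,0], minlength=counts.shape[0])
# At least one column, so .max(1) also works when counts contains no 1s
mask = np.arange(max(c.max(), 1)) < c[:,None]
intvs2D = np.zeros(mask.shape, dtype=int)
# Boolean-mask assignment fills row by row, matching the row-major order of starts
intvs2D[mask] = intvs

# Get max along each row as final output (0 for rows without any 1s)
out = intvs2D.max(1)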
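Neither version above answers the second part of the question, which asks where the first run of at least n ones starts. A possible pure-NumPy extension reuses the same start/stop detection. `first_run_start` is a hypothetical helper, and -1 marking rows without a qualifying run is a convention chosen here, not one from the question:

```python
import numpy as np


def first_run_start(counts, n=2, value=1):
    """Start column of the first run of at least n consecutive `value`s in
    each row, or -1 if the row has no such run."""
    pad = np.zeros((counts.shape[0], 1), dtype=int)
    # Same zero padding and np.diff boundary detection as in the answer
    ext = np.column_stack((pad, (counts == value).astype(int), pad))
    diffs = np.diff(ext, axis=1)
    starts = np.argwhere(diffs == 1)   # (row, col) of each run start, row-major
    stops = np.argwhere(diffs == -1)   # matching (row, col) one past each run end
    lengths = stops[:, 1] - starts[:, 1]

    # Keep only runs that are long enough
    keep = lengths >= n
    rows, cols = starts[keep, 0], starts[keep, 1]

    out = np.full(counts.shape[0], -1)
    # rows is sorted, so return_index gives the first qualifying run per row
    uniq_rows, first_idx = np.unique(rows, return_index=True)
    out[uniq_rows] = cols[first_idx]
    return out


counts = np.array([[0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0],
                   [0, 0, 1, 0, 0, 1, 2, 0, 0, 1, 1, 1],
                   [0, 0, 0, 4, 1, 0, 0, 0, 0, 1, 1, 0]])
first2 = first_run_start(counts)        # [3, 9, 9]
first5 = first_run_start(counts, n=5)   # [-1, -1, -1]
```

Calling it with `n=5` gives the five-in-a-row variant the question asks for. On the modified sample above, only the third row qualifies, and its run starts at column 4.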