在一个numpy数组中找到所有值的序列（以及最长的序列）

Question

在一个numpy数组中找到所有值的序列（以及最长的序列）

3

我有一个案例，需要找到一个numpy数组中数值为1的若干项的最优分布。假设我有以下数组，该数组仅以随机顺序包含0和1：

import numpy as 

# this 1d array can have up to 10000 elements

data = np.array([
0, 0, 0, 0, 0, 1, 0,
0, 1, 0, 0, 0, 1, 1,
1, 1, 0, 0, 0, 1, 1,
0, 0, 0, 0, 1, 1, 1,
0, 0, 0, 0, 1, 1, 1,
0, 0, 0, 0, 0, 0, 1,
])

num_of_ones_to_fill_gaps = 5

此外，我有一定数量的n个1 （num_of_ones_to_fill_gaps），需要以某种方式在数组中分布，以建立最长可能的连续1序列。使用num_of_ones_to_fill_gaps=5（可以用五个1来填补值为0的空隙），例如有三种结果，它们都具有长度为11的最长1序列。

        a)                           b)                       c)
result = np.array([    |    result = np.array([    |  result = np.array([ 
0, 0, 0, 0, 0, 1, 0,   |    0, 0, 0, 0, 0, 1, 1,   |  0, 0, 0, 0, 0, 1, 1, 
                                                                     ^  ^
0, 1, 0, 0, 0, 1, 1,   |    0, 1, 0, 0, 0, 1, 1,   |  0, 1, 0, 0, 0, 1, 1, 
                                                      ^  ^  ^  ^  ^  ^  ^
1, 1, 0, 0, 0, 1, 1,   |    1, 1, 0, 0, 0, 1, 1,   |  1, 1, 0, 0, 0, 1, 1, 
                                                      ^  ^
0, 0, 0, 0, 1, 1, 1,   |    0, 0, 0, 1, 1, 1, 1,   |  0, 0, 0, 0, 1, 1, 1, 
            ^  ^  ^    |             ^  ^  ^  ^    |   
1, 1, 1, 1, 1, 1, 1,   |    1, 1, 1, 1, 1, 1, 1,   |  0, 0, 0, 0, 1, 1, 1, 
^  ^  ^  ^  ^  ^  ^    |    ^  ^  ^  ^  ^  ^  ^    |   
1, 0, 0, 0, 0, 0, 1,   |    0, 0, 0, 0, 0, 0, 1,   |  0, 0, 0, 0, 0, 0, 1, 
^                      |                           |   
])                     |    ])                     |  ])

我的第一个问题是，是否有可能numpy提供内置的向量化方法来计算最长可能的1序列，并返回相同长度结果的起始和结束索引（多个）？

result = np.array([
(22, 32),
(21, 31),
(5, 15),
])

我的第二个问题是是否存在一个numpy向量化的方法，可以提取所有可能的1序列（包括填充了间隙的序列），不管它们的长度是多少。结果可能如下所示：

result = np.array([
(0, 4),  # data[0:4], data.size == 5
(1, 6),  # data[1:6], data.size == 6 because index at position 5 is a 1
(2, 7),  # data[2:7], data.size == 6 because index at position 5 is a 1
(3, 9),  # data[3:9], data.size == 7 because indices at position 5 and 8 are a 1
...
])

我尝试以易懂的方式概述了问题。我在文档和stackoverflow上进行了研究，但不知道如何开始。我找到的都是迭代解决方案。非常感谢您能给出任何建议和解决方案。再次感谢！

- hetsch

当你分配1时，你是否被限制按顺序在数组中放置它们？即，一旦我放置了第一个，我是否被限制将其他的放在第一个之后的第一个空闲位置上？ - ndrplz

那是一个非常好的问题...就我现在所知，不会，只要它们被放置在一种有助于形成最佳/最长的连续 1 序列的方式。 - hetsch

1个回答

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- ndrplz · Accepted Answer

这是我的当前解决方案，假设我可以在空闲的位置（即零）中以任何组合填充数字。

免责声明：我没有进行广泛测试。

from itertools import combinations

import numpy as np
from scipy.ndimage.measurements import find_objects
from scipy.ndimage.measurements import label


data = np.array(
    [0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1,]
)
m = len(data)

num_of_ones_to_fill_gaps = 4

# Find all possible combinations of indexes which we could set to 1
zero_idxs, = np.where(np.equal(data, 0))
combs = list(combinations(zero_idxs, num_of_ones_to_fill_gaps))

# Convert combinations into one-hot vectors; the len of each vector
#  is equal to the len(data)
combs_onehot = np.eye(m)[np.asarray(combs)]

# Summing on the first axis will give us masks that we can directly
#  sum to the original array. For example, if we had two 1s to insert
#  and a possible combination were (0, 1), combs_onehot would become
#  ([1, 0, 0, ...], [0, 1, 0, 0, ...]) and summing would give us the
#  mask [1, 1, 0, 0, ...]
masks = np.sum(combs_onehot, axis=1).astype(int)

# Broadcast sum of the mask to original array. If our original array
#  had len M and we found N possible combinations, this has shape (N, M)
data_filled = data + masks

# 1-D connected component labeling
str_el = np.asarray([[0,0,0], [1,1,1], [0,0,0]])
labeled, _ = label(data_filled, structure=str_el)

slices = find_objects(labeled)

longest = max(slices, key=lambda x: x[1].stop - x[1].start)
longest_row = longest[0].start

print(f'Best solution: {combs[longest_row]}')
print(f'Longest run: {longest[1].stop - longest[1].start}')