numpy: find multiple occurrences in an array by index

Question



Given the following array:

array = [-1, -1, -1, -1, -1, -1, 3, 3, -1, 3, -1, -1,  2,  2, -1, -1,  1, -1]
 indexes  0   1   2   3   4   5  6  7   8  9  10  11  12  13  14  15  16  17

I need to find the indexes at which the same number occurs. In this example, it would return a list of lists like this:

list(list(), list(16), list(12, 13), list(6, 7, 9), list() etc...)
     0       1     \   2             3              4
     ^              \ 
      \              \ the index in the array at which "1" appears
       \ 
        \ the numbers in the array

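How can this be done in numpy?

The number 1 appears at index 16
The number 2 appears at indexes 12 and 13
etc.

Notes based on the comments:

-1 can be ignored; I'm only interested in the other values
The array has about 50 elements, with values ranging up to 500
This function will be called 6000+ times.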

- Andrei M.

6 Answers


array = [-1, -1, -1, -1, -1, -1, 3, 3, -1, 3, -1, -1,  2,  2, -1, -1,  1, -1]
s = sorted(set(array))
print(s)  # all the unique elements in the list

#output
[-1, 1, 2, 3] 

[([i for i,d in enumerate(array) if d == x],x) for x in s]

#output
[([0, 1, 2, 3, 4, 5, 8, 10, 11, 14, 15, 17], -1),   #[([indices],element)]
 ([16], 1),
 ([12, 13], 2),
 ([6, 7, 9], 3)]

- Talha Tayyab

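The complexity of this solution is quadratic; it will get quite slow if there are many distinct values. - mozway

Yes, I didn't think it was worth mentioning, but there can be up to 500 values in the array. Also, this function needs to be called 6000+ times. - Andrei M.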


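@talha-tayyab's answer only works if you only need to consider the values that actually appear in the array; if you need to start from value=0 and go up to the highest value, this approach works.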

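import numpy as np

def find_indexes(array):
    # convert to a NumPy array (see the comment below); with a plain list, array == i is a single bool
    array = np.array(array)
    result = []
    for i in range(array.max() + 1):  # +1 so the highest value itself is included
        result.append(np.argwhere(array == i).ravel())
    return result

array = [-1, -1, -1, -1, -1, -1, 3, 3, -1, 3, -1, -1,  2,  2, -1, -1,  1, -1]
result = find_indexes(array)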

- Stitt

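As mentioned in the other answer, the complexity is quadratic. You shouldn't need to loop over the array multiple times. Also, you probably didn't test the code: it contains syntax errors and does not produce the expected output. - mozway

Note that you need to add array = np.array(array) to your code for it to work. - mozway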


A numpy solution:

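import numpy as np

l1 = np.array([-1, -1, -1, -1, -1, -1, 3, 3, -1, 3, -1, -1,  2,  2, -1, -1,  1, -1])

unique, counts = np.unique(l1, return_counts=True)
# a stable sort keeps the indexes ascending within each value; the last cumulative
# count is dropped so np.split yields exactly one chunk per unique value
print(dict(zip(unique, np.split(np.argsort(l1, kind="stable"), np.cumsum(counts)[:-1]))))

Output:

{
    -1: array([0, 1, 2, 3, 4, 5, 8, 10, 11, 14, 15, 17]),
    1: array([16]),
    2: array([12, 13]),
    3: array([6, 7, 9]),
}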
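If, as in the question, the result should be a list indexed by the value itself (with an empty entry for values that never occur, and -1 ignored), the same sort-and-split idea can be combined with `np.bincount`. This is a minimal sketch, not part of the original answer; it assumes every value other than -1 is a non-negative integer, and the function name `indexes_by_value` is hypothetical:

```python
import numpy as np

def indexes_by_value(values):
    """Return a list whose entry v holds the indexes where value v occurs (negative values ignored)."""
    arr = np.asarray(values)
    positions = np.flatnonzero(arr >= 0)        # indexes of the values we care about
    kept = arr[positions]
    order = np.argsort(kept, kind="stable")     # stable: indexes stay ascending within a value
    counts = np.bincount(kept)                  # counts[v] = occurrences of v, 0 if v is absent
    return np.split(positions[order], np.cumsum(counts)[:-1])

array = [-1, -1, -1, -1, -1, -1, 3, 3, -1, 3, -1, -1,  2,  2, -1, -1,  1, -1]
result = indexes_by_value(array)
print([r.tolist() for r in result])  # [[], [16], [12, 13], [6, 7, 9]]
```

Because `np.bincount` produces a count of 0 for every missing value, `np.split` emits an empty array at those positions, which gives the "one entry per value, starting at 0" shape the question asks for.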

- Andrej Kesely


Using itertools.groupby + operator.itemgetter:

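from itertools import groupby
from operator import itemgetter

array = [-1, -1, -1, -1, -1, -1, 3, 3, -1, 3, -1, -1,  2,  2, -1, -1,  1, -1]

# sort (index, value) pairs by value, then group the pairs that share a value
pairs = sorted(enumerate(array), key=itemgetter(1))
[[k, [i for i, _ in g]] for k, g in groupby(pairs, key=itemgetter(1))]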

[[-1, [0, 1, 2, 3, 4, 5, 8, 10, 11, 14, 15, 17]],
 [1, [16]],
 [2, [12, 13]],
 [3, [6, 7, 9]]]
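`itertools.groupby` only merges *consecutive* items with equal keys, which is why the pairs are sorted by value first; since `sorted` is stable, the indexes inside each group also stay in ascending order. A small sketch of what happens if the sort is skipped:

```python
from itertools import groupby
from operator import itemgetter

array = [-1, -1, -1, -1, -1, -1, 3, 3, -1, 3, -1, -1,  2,  2, -1, -1,  1, -1]

# Without sorting, every run of equal values becomes its own group
unsorted_groups = [(k, [i for i, _ in g]) for k, g in groupby(enumerate(array), key=itemgetter(1))]
print(unsorted_groups[:4])  # [(-1, [0, 1, 2, 3, 4, 5]), (3, [6, 7]), (-1, [8]), (3, [9])]
```

Here the value 3 is split across two groups, so the sort (O(n log n)) is what makes the result correct.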

- RomanPerekhrest


Another possible solution:

[(x, np.where(array == x)[0].tolist()) for x in np.unique(array)]

Output:

[(-1, [0, 1, 2, 3, 4, 5, 8, 10, 11, 14, 15, 17]),
 (1, [16]),
 (2, [12, 13]),
 (3, [6, 7, 9])]

- PaulS


I just saw your answer. I didn't rerun all the tests, but it is also quadratic (k=100) ;) - mozway


- mozway · Accepted Answer

Use a dictionary to collect the indexes; this solves the problem in O(n)* time.

# collect positions per value for each item
d = {}
for i, x in enumerate(array):
    d.setdefault(x, []).append(i)

# sort the output (optional)
out = {k: d[k] for k in sorted(d)}

Output:

{-1: [0, 1, 2, 3, 4, 5, 8, 10, 11, 14, 15, 17],
  1: [16],
  2: [12, 13],
  3: [6, 7, 9]}

* plus O(k*log(k)), where k is the number of unique values, if you need sorted output
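The `setdefault` call above can equivalently be written with `collections.defaultdict(list)`. A minimal sketch of that variant, which also skips the -1 placeholder since the question says it can be ignored; the function name `positions_by_value` and its `ignore` parameter are hypothetical:

```python
from collections import defaultdict

def positions_by_value(values, ignore=-1):
    d = defaultdict(list)
    for i, x in enumerate(values):   # single pass over the input: O(n)
        if x != ignore:
            d[x].append(i)
    return dict(sorted(d.items()))   # optional sort by value: O(k*log(k))

array = [-1, -1, -1, -1, -1, -1, 3, 3, -1, 3, -1, -1,  2,  2, -1, -1,  1, -1]
print(positions_by_value(array))  # {1: [16], 2: [12, 13], 3: [6, 7, 9]}
```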

For a list of lists:

out = [d.get(k, []) for k in range(min(d), max(d)+1)]

# or for only positive values
out = [d.get(k, []) for k in range(1, max(d)+1)]

Output:

[[0, 1, 2, 3, 4, 5, 8, 10, 11, 14, 15, 17],
 [],
 [16],
 [12, 13],
 [6, 7, 9]]
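In the first variant `out[0]` holds the positions of -1 (the smallest value), and in the second `out[0]` holds the positions of 1, so the list index is offset from the value. If, as in the question's expected output, index v should hold the positions of value v (with an empty list for 0 when it is absent), one option is to start the range at 0; a minimal sketch:

```python
array = [-1, -1, -1, -1, -1, -1, 3, 3, -1, 3, -1, -1,  2,  2, -1, -1,  1, -1]

d = {}
for i, x in enumerate(array):
    d.setdefault(x, []).append(i)

# index v of out holds the positions of value v; -1 is left out because the range starts at 0
out = [d.get(k, []) for k in range(max(d) + 1)]
print(out)  # [[], [16], [12, 13], [6, 7, 9]]
```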

Alternative

Alternatively, if you pre-initialize the output, a very simple approach works:

out = [[] for _ in range(max(array)+1)]
for i, x in enumerate(array):
    if x >= 0:  # skip -1: out[-1] would silently append to the last list
        out[x].append(i)

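Comparison of all methods

Pure Python is the fastest here.

The input arrays were generated with np.random.randint(0, k, size=n).tolist(), where n is the length of the array and k is the (exclusive) upper bound of the values, so the values lie in 0..k-1.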
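The benchmarking code itself is not shown. A rough sketch of how such a comparison could be set up with `timeit`, generating the input the same way, follows; the values of n and k, the call count, and the two functions (re-implementations of the dictionary approach and the `np.where` approach for illustration) are all assumptions, and timings will vary by machine:

```python
import timeit
import numpy as np

def dict_approach(array):
    d = {}
    for i, x in enumerate(array):
        d.setdefault(x, []).append(i)
    return {k: d[k] for k in sorted(d)}

def np_where_approach(array):
    arr = np.array(array)
    return {int(x): np.where(arr == x)[0].tolist() for x in np.unique(arr)}

n, k = 50, 100  # assumed sizes: about 50 elements, as in the question
array = np.random.randint(0, k, size=n).tolist()

assert dict_approach(array) == np_where_approach(array)  # same result before timing
for f in (dict_approach, np_where_approach):
    t = timeit.timeit(lambda: f(array), number=6000)  # ~6000 calls, as in the question
    print(f"{f.__name__}: {t:.4f} s for 6000 calls")
```

The `np.where` variant rescans the whole array once per unique value, so its cost grows with k, whereas the dictionary variant makes a single pass.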

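With k=4:

With k=100:

We can now see the quadratic behavior of the @TalhaTayyab/@Stitt/@PaulS approaches.

With k=10_000:

We can see that numpy is slightly faster for relatively small arrays (where the values are likely to be unique).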