Efficiently counting occurrences of unique subarrays in NumPy?

Question

python arrays numpy counting

10

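I have an array of shape (128, 36, 8), and I'd like to count the occurrences of each unique length-8 subarray along the last dimension.

I'm aware of np.unique and np.bincount, but those seem to operate on elements rather than subarrays. I've seen this question, but it is about finding the first occurrence of a particular subarray, not counting all unique subarrays.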

- Will

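I couldn't think of a way to do it within numpy, but would a trie be too slow? It would only need to visit each element once, and at the end you would automatically have the counts of the unique subarrays, as well as their locations if you stored them. - KobeJohn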

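Here is a related question: https://dev59.com/cWoy5IYBdhLWcg3wa9ff. The basic idea is to sort the subarrays lexicographically. Once identical subarrays are grouped together, identifying and counting them is easy. - Bi Rico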

3 Answers

2

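I've modified @Divakar's very useful answer so that it returns the counts of the unique subarrays along with the subarrays themselves, so the output matches that of collections.Counter.most_common():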

import numpy as np

# The function name is an assumption; the original showed only the body.
def most_common_rows(arr):
    # Get the array in 2D form.
    arr = arr.reshape(-1, arr.shape[-1])

    # Lexicographically sort the rows
    sorted_arr = arr[np.lexsort(arr.T), :]

    # Get the indices where the next row differs, i.e. the last row
    # of every group except the final one
    diff_idx = np.where(np.any(np.diff(sorted_arr, axis=0), 1))[0]

    # Get the unique rows
    unique_rows = [sorted_arr[i] for i in diff_idx] + [sorted_arr[-1]]

    # Get the number of occurrences of each unique array (the -1 is needed
    # at the beginning, rather than 0, because of fencepost concerns)
    counts = np.diff(
        np.append(np.insert(diff_idx, 0, -1), sorted_arr.shape[0] - 1))

    # Return the (row, count) pairs sorted by count
    return sorted(zip(unique_rows, counts), key=lambda x: x[1], reverse=True)

- Will
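The same sort-and-compare idea can also be written without the Python list comprehension. The following is only a sketch under that assumption: the function name `unique_rows_with_counts` is made up for illustration, and the rows come back in lexsort order (last column is the primary key) rather than sorted by count.

```python
import numpy as np

def unique_rows_with_counts(arr):
    """Hypothetical helper: unique rows along the last axis and their counts."""
    # Merge all leading dimensions so each subarray becomes one row.
    rows = arr.reshape(-1, arr.shape[-1])

    # Sort rows lexicographically so identical rows become adjacent.
    sorted_rows = rows[np.lexsort(rows.T)]

    # Mark the first row of every run of identical rows.
    is_start = np.ones(len(sorted_rows), dtype=bool)
    is_start[1:] = np.any(sorted_rows[1:] != sorted_rows[:-1], axis=1)
    starts = np.flatnonzero(is_start)

    # The length of each run is the gap between consecutive run starts.
    counts = np.diff(np.append(starts, len(sorted_rows)))
    return sorted_rows[starts], counts

A = np.array([[[0, 0, 2],
               [1, 1, 1]],
              [[1, 1, 1],
               [1, 2, 0]]])
uniq, counts = unique_rows_with_counts(A)
```

If the most_common() ordering is wanted, the result can be reordered afterwards, for example with `np.argsort(counts)[::-1]`.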

0

I'm not sure this is the most efficient approach, but it should work.

arr = arr.reshape(128*36, 8)
unique_ = []
occurence_ = []

for sub in arr:
    row = sub.tolist()
    if row not in unique_:
        # First time this subarray is seen
        unique_.append(row)
        occurence_.append(1)
    else:
        occurence_[unique_.index(row)] += 1
for index_, u in enumerate(unique_):
    print(u, "occurrence: %s" % occurence_[index_])

- farhawa

This works, but I'd like to avoid native Python functions such as tolist and index, which are expensive. Thanks for the answer, though. - Will

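By the way, there's an obvious optimization to your approach: keep the counts in a dictionary keyed by the subarrays as tuples, rather than in a list that has to be searched with unique_.index. - Will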

1

@Will Or better yet, use collections.Counter: counts = Counter(tuple(row) for row in arr) :) - Bi Rico

@BiRico, brilliant, I didn't know about that built-in! - Will
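For reference, the Counter suggestion from the comments above might look like this when spelled out. It is a minimal sketch on a small made-up array rather than the (128, 36, 8) input from the question, and converting with `tolist()` first is my own choice so that the keys are tuples of plain Python ints.

```python
from collections import Counter

import numpy as np

# Small illustrative array; the question's real input has shape (128, 36, 8).
arr = np.array([[[0, 0, 2],
                 [1, 1, 1]],
                [[1, 1, 1],
                 [1, 2, 0]]])

# Flatten the leading dimensions so each subarray is one row.
rows = arr.reshape(-1, arr.shape[-1])

# Tuples are hashable, so they can be used as Counter keys.
counts = Counter(map(tuple, rows.tolist()))

# (subarray, count) pairs, most frequent first.
most_common = counts.most_common()
```

This still loops over the rows in Python, so it trades speed for simplicity compared with the sort-based answers.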


- Divakar · Accepted Answer

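The question states that the input array has shape (128, 36, 8) and that we want the unique subarrays of length 8 along the last dimension. So I'm assuming uniqueness is taken across the first two dimensions merged together. Let's assume A is the input 3D array.

Get the number of unique subarrays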

# Reshape the 3D array to a 2D array merging the first two dimensions
Ar = A.reshape(-1,A.shape[2])

# Perform lex sort and get the sorted indices and xy pairs
sorted_idx = np.lexsort(Ar.T)
sorted_Ar =  Ar[sorted_idx,:]

# Get the count of rows that have at least one TRUE value 
# indicating presence of unique subarray there
unq_out = np.any(np.diff(sorted_Ar,axis=0),1).sum()+1

Sample run -

In [159]: A # A is (2,2,3)
Out[159]: 
array([[[0, 0, 0],
        [0, 0, 2]],

       [[0, 0, 2],
        [2, 0, 1]]])

In [160]: unq_out
Out[160]: 3

Get the count of occurrences of each unique subarray

# Reshape the 3D array to a 2D array merging the first two dimensions
Ar = A.reshape(-1,A.shape[2])

# Perform lex sort and get the sorted indices and xy pairs
sorted_idx = np.lexsort(Ar.T)
sorted_Ar =  Ar[sorted_idx,:]

# Get IDs for each element based on their uniqueness
id = np.append([0],np.any(np.diff(sorted_Ar,axis=0),1).cumsum())

# Get counts for each ID as the final output
unq_count = np.bincount(id)

Sample run -

In [64]: A
Out[64]: 
array([[[0, 0, 2],
        [1, 1, 1]],

       [[1, 1, 1],
        [1, 2, 0]]])

In [65]: unq_count
Out[65]: array([1, 2, 1], dtype=int64)
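A shorter alternative that is not part of the answer above: assuming a NumPy version in which np.unique accepts the axis argument (1.13 or later, as far as I know), the unique rows and their counts can be obtained in a single call. A minimal sketch on the same sample array:

```python
import numpy as np

A = np.array([[[0, 0, 2],
               [1, 1, 1]],
              [[1, 1, 1],
               [1, 2, 0]]])

# Merge the first two dimensions so each subarray is one row.
rows = A.reshape(-1, A.shape[-1])

# axis=0 treats each row as one item; return_counts adds the occurrence counts.
unique_rows, counts = np.unique(rows, axis=0, return_counts=True)
```

Note that np.unique orders the rows with the first column as the primary key, so the row order can differ from the np.lexsort-based code above even though the counts agree.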