如何在ndarray中找到最常见的元素

Question

如何在ndarray中找到最常见的元素

3

我有一个numpy数组，其形状为(11617, 37)。数据是多类数据，为了建立基准，我需要找出哪个类（或哪些类）最常见。

我尝试过这个公式和这个。

A = np.array([[0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0],
     [0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0],
     [0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0]])

axis = 0
u, indices = np.unique(arr, return_inverse=True)
answer = u[np.argmax(np.apply_along_axis(np.bincount, axis,                                              indices.reshape(arr.shape),
                                None, np.max(indices) + 1), axis=axis)]

我需要找到数组中出现频率最高的37个类别的组合。

期望输出：

[0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0]

- Bok

1

当您尝试这两个参考文献时，您有什么发现？ - gosuto

你能同时提供输出样本吗？ - J...S

一个返回了全零，另一个返回了全一。 - Bok

5个回答

1

你可以尝试使用np.unique函数，并带上return_counts参数：

from operator import itemgetter

import numpy as np

A = [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
     [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
     [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]

uniques, counts = np.unique(A, axis=0, return_counts=True)

idxmax, _ = max(zip(range(len(counts)), counts), key=itemgetter(1))
print(uniques[idxmax])

输出

[0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0]

- Dani Mesejo

1

如果您将列表中的列表元素转换为元组（将列表转换为元组以便进行计数），则可以使用collections.Counter.most_common

from collections import Counter

A = [[0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0],
     [0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0],
     [0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0]]

c = Counter(tuple(x) for x in A)
print(c.most_common()[0]) # ((0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0), 2)

这将返回一个包含最常见列表和出现次数的元组。

- b-fg

1

一个真正快速简单的解决方案：

A = [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
     [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
     [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]

print(max(A, key=A.count))

这会打印出什么：

[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0]

如果你需要关注运行时或想要优化你的代码，这不是你想要走的路。然而，如果你只需要一个快速解决方案，记住这个一行代码可能有所帮助。（A.tolist() 可以从 np.ndarray 中获取列表，如果你需要的话。）

- finefoot

0

from collections import Counter
A = [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
     [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
     [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]
most_common = [Counter(i).most_common(1).pop()[0] for i in A]
most_common
   [0, 0, 0]

- cph_sto

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- Venkatachalam · Accepted Answer

要找到最常见的组合（rows，意味着 axis=0），你可以尝试这个！

A = np.array([[1,0,0,0],
             [1,0,0,1],
             [1,0,0,0]])

unique_rows,counts = np.unique(A, return_counts=True,axis=0)
unique_rows[np.argmax(counts)]

如果你在问题中提到的数组是你的目标变量，那么它就是一个多标签数据的例子。

这个链接可能对你理解多类和多标签有帮助。