NumPy - Find the most frequent element in each row

Question

4

I have a matrix in NumPy like this:

array([[0, 0, 1, 1],
       [1, 1, 0, 2],
       [0, 0, 1, 0],
       [0, 2, 1, 1],
       [1, 1, 1, 0],
       [1, 0, 2, 2]])

I want to get the most common value in each row. In other words, I want a vector like this:

array([0, 1, 0, 1, 1, 2])

I managed to solve this with SciPy's mode function, as follows:

scipy.stats.mode(data, axis=1)[0].flatten()

However, I'm looking for a solution that uses only NumPy. The solution also needs to work with negative integer values.

- David Lasry

3 Answers

1

If your labels run from 0 to n_labels - 1, you can use the following:

labels_onehot = m[..., None] == np.arange(n_labels)[None, None, :]  # (n_rows, n_cols, n_labels), one-hot encoded
labels_count = np.count_nonzero(labels_onehot, axis=1)              # (n_rows, n_labels), number of occurrences of each label in a row
most_frequent = np.argmax(labels_count, axis=-1)                    # (n_rows,), the most frequent label in each row

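This code is fully vectorized (no list comprehension, no apply_along_axis), so it should be faster than the loop-based solutions in the other answers (and it is also simpler). The trade-off is a temporary boolean array of shape (n_rows, n_cols, n_labels).

If your labels do not run from 0 to n_labels - 1, you can replace np.arange(n_labels) with an array holding your actual label values. The argmax then gives a position in that array, so index the array with it to recover the label itself.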
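As a minimal sketch of that generalization (not part of the original answer): the label array is built here with np.unique, which is an assumption about how you obtain your labels, and row_mode_onehot is a hypothetical helper name. Because np.unique returns sorted values, ties resolve to the smallest label.

```python
import numpy as np

def row_mode_onehot(m):
    """Most frequent value per row, for arbitrary (including negative) integer labels."""
    labels = np.unique(m)                            # sorted distinct values found in m
    onehot = m[..., None] == labels[None, None, :]   # (n_rows, n_cols, n_labels)
    counts = np.count_nonzero(onehot, axis=1)        # (n_rows, n_labels)
    # argmax gives a position in `labels`; index back to get the label itself
    return labels[np.argmax(counts, axis=-1)]

A = np.array([[0, 0, 1, 1],
              [1, 1, 0, 2],
              [0, 0, 1, 0],
              [0, 2, 1, 1],
              [1, 1, 1, 0],
              [1, 0, 2, 2]])

print(row_mode_onehot(A))      # [0 1 0 1 1 2]
print(row_mode_onehot(A - 1))  # [-1  0 -1  0  0  1]
```

Memory grows with the number of distinct labels, so this is best suited to data with a small label set.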

- Nephanth

0

I adapted Def_Os's answer from the following post:

Most efficient way to find mode in numpy array

The function below uses only NumPy and works with negative numbers.

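import numpy as np

def mode_row(ar):
    # np.bincount only accepts non-negative integers, so shift the data if needed
    _min = np.min(ar)
    adjusted = False
    if _min < 0:
        ar = ar - _min
        adjusted = True
    # Count occurrences within each row and take the most frequent value
    ans = np.apply_along_axis(lambda x: np.bincount(x).argmax(), axis=1, arr=ar)
    if adjusted:
        ans = ans + _min  # undo the shift
    return ans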

A = np.array([[0, 0, 1, 1],
              [1, 1, 0, 2],
              [0, 0, 1, 0],
              [0, 2, 1, 1],
              [1, 1, 1, 0],
              [1, 0, 2, 2]])

B = A - 1

mode_row(A)
mode_row(B)

array([0, 1, 0, 1, 1, 2], dtype=int64)

array([-1, 0, -1, 0, 0, 1], dtype=int64)
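If the per-row apply_along_axis call becomes a bottleneck, the same shift-then-bincount idea can be done in a single np.bincount call. This is only a sketch and not part of the original answer; mode_row_vectorized is a hypothetical name, and it assumes a non-empty 2-D integer array whose value range is small enough that an (n_rows, range) count table fits in memory.

```python
import numpy as np

def mode_row_vectorized(ar):
    """Row-wise mode with one bincount call; works with negative integers."""
    ar = np.asarray(ar)
    lo = ar.min()
    shifted = ar - lo                       # values now lie in 0 .. span - 1
    span = int(shifted.max()) + 1
    n_rows = ar.shape[0]
    # Give every row its own block of `span` bins so the counts do not mix
    offsets = np.arange(n_rows)[:, None] * span
    counts = np.bincount((shifted + offsets).ravel(), minlength=n_rows * span)
    counts = counts.reshape(n_rows, span)   # (n_rows, span) count table
    return counts.argmax(axis=1) + lo       # undo the shift

A = np.array([[0, 0, 1, 1],
              [1, 1, 0, 2],
              [0, 0, 1, 0],
              [0, 2, 1, 1],
              [1, 1, 1, 0],
              [1, 0, 2, 2]])

print(mode_row_vectorized(A))      # [0 1 0 1 1 2]
print(mode_row_vectorized(A - 1))  # [-1  0 -1  0  0  1]
```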

- Self Dot

- Borja_042 · Accepted Answer

Assuming m is the name of your matrix:

most_f = np.array([np.bincount(row).argmax() for row in m])

I hope this solves your problem.
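Note that np.bincount rejects negative values, so the one-liner above only covers non-negative integers. One possible variation that keeps the same per-row loop but handles negatives is sketched below; it is not part of the original answer, and row_mode_unique is a hypothetical helper name. It relies on np.unique with return_counts=True, which returns the sorted distinct values together with their counts.

```python
import numpy as np

def row_mode_unique(row):
    """Most frequent value in a 1-D array; ties go to the smallest value."""
    values, counts = np.unique(row, return_counts=True)
    return values[counts.argmax()]

m = np.array([[0, 0, 1, 1],
              [1, 1, 0, 2],
              [0, 0, 1, 0],
              [0, 2, 1, 1],
              [1, 1, 1, 0],
              [1, 0, 2, 2]]) - 1   # shifted so that negative values appear

most_f = np.array([row_mode_unique(row) for row in m])
print(most_f)  # [-1  0 -1  0  0  1]
```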