pandas: How to get the most frequent item in a pandas Series?

Question

7

How can I get the most frequent item in a pandas Series?

Consider the Series s:

s = pd.Series("1 5 3 3 3 5 2 1 8 10 2 3 3 3".split()).astype(int)

The returned value should be 3.

- mommomonthewind

4 Answers

7

Use value_counts and select the first value of the resulting index:

val = s.value_counts().index[0]

Or Counter.most_common:

from collections import Counter

val = Counter(s).most_common(1)[0][0]

Or a NumPy solution:

_, idx, counts = np.unique(s, return_index=True, return_counts=True)
index = idx[np.argmax(counts)]
val = s.iloc[index]  # idx holds positions, so select by position rather than by label
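For reference, the three approaches above can be put side by side in one self-contained script. This is only a sketch on the sample Series from the question; the variable names are illustrative, and the NumPy variant uses `.iloc` because `return_index=True` yields positions rather than index labels.

```python
from collections import Counter

import numpy as np
import pandas as pd

s = pd.Series("1 5 3 3 3 5 2 1 8 10 2 3 3 3".split()).astype(int)

# value_counts sorts by count in descending order, so the first label is the most frequent value
by_value_counts = s.value_counts().index[0]

# most_common(1) returns a list holding one (value, count) pair
by_counter = Counter(s).most_common(1)[0][0]

# np.unique returns the sorted unique values, their first positions, and their counts
_, first_pos, counts = np.unique(s, return_index=True, return_counts=True)
by_numpy = s.iloc[first_pos[np.argmax(counts)]]

print(by_value_counts, by_counter, by_numpy)
```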

- jezrael

1

What about series.mode()? - anky

1

@anky_91 - It is slow :( - jezrael

3

`pandas.factorize` 和 `numpy.bincount`

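This is very similar to @jezrael's NumPy answer, except that it uses factorize instead of numpy.unique.

factorize returns an integer encoding of the values together with the unique values
bincount counts how many times each unique value occurs
argmax identifies the most frequent bin, i.e. the most frequent factor
The bin position returned by argmax is then used to look up the most frequent value in the array of unique values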

i, r = s.factorize()
r[np.bincount(i).argmax()]

3
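To make the steps above concrete, here is a small sketch that names each intermediate result for the sample Series from the question. The variable names (`codes`, `uniques`, `counts`) are illustrative. It assumes the Series has no missing values: by default factorize encodes them as -1, which np.bincount rejects.

```python
import numpy as np
import pandas as pd

s = pd.Series("1 5 3 3 3 5 2 1 8 10 2 3 3 3".split()).astype(int)

# codes: one integer label per element, assigned in order of first appearance
# uniques: the distinct values, addressable by those labels
codes, uniques = s.factorize()

# counts[k] is the number of elements that carry label k
counts = np.bincount(codes)

# the label with the highest count points at the most frequent value
most_frequent = uniques[counts.argmax()]

print(list(uniques), counts.tolist(), most_frequent)
```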

- piRSquared

Yes, it should be. But to be honest, I had not noticed your NumPy answer until just now. I will delete this and leave a comment under your answer. - piRSquared

I just added timings for this version. It looks very fast, but it does not seem to beat pd.Series.mode. - jpp

1

from scipy import stats
import pandas as pd
x=[1,5,3,3,3,5,2,1,8,10,2,3,3,3]
data=pd.DataFrame({"values":x})


print(stats.mode(data["values"]))

Output: ModeResult(mode=array([3], dtype=int64), count=array([6]))

- ramakrishnareddy


- jpp · Accepted Answer

You can use the pd.Series.mode method directly and extract the first value:

res = s.mode().iloc[0]
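One detail worth knowing when taking the first value: mode returns every value that shares the highest count, sorted, so `.iloc[0]` picks the smallest one when there is a tie. A minimal sketch, using made-up data rather than the Series from the question:

```python
import pandas as pd

# hypothetical data: 2 and 5 both appear twice
tied = pd.Series([5, 5, 2, 2, 9])

modes = tied.mode()    # Series holding all modal values, in sorted order
first = modes.iloc[0]  # the smallest of the tied values

print(modes.tolist(), first)
```

If a different tie-breaking rule is needed, inspect the full result of mode instead of taking only its first element.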

Using mode is not necessarily inefficient. As always, test with your own data to see which approach suits it best.

import numpy as np, pandas as pd
from scipy.stats.mstats import mode
from collections import Counter

np.random.seed(0)

s = pd.Series(np.random.randint(0, 100, 100000))

def jez_np(s):
    _, idx, counts = np.unique(s, return_index=True, return_counts=True)
    index = idx[np.argmax(counts)]
    val = s.iloc[index]  # idx holds positions, not labels
    return val

def pir(s):
    i, r = s.factorize()
    return r[np.bincount(i).argmax()]

%timeit s.mode().iloc[0]                 # 1.82 ms
%timeit pir(s)                           # 2.21 ms
%timeit s.value_counts().index[0]        # 2.52 ms
%timeit mode(s).mode[0]                  # 5.64 ms
%timeit jez_np(s)                        # 8.26 ms
%timeit Counter(s).most_common(1)[0][0]  # 8.27 ms
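The `%timeit` lines above only run inside IPython or Jupyter. As a rough sketch, a comparable measurement in a plain Python script could use the standard `timeit` module. The helper `best_time_ms` and the `by_*` function names are hypothetical, only three of the approaches are included, and the numbers will differ from those above depending on machine, library versions, and data.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
s = pd.Series(rng.integers(0, 100, 100_000))

def by_mode(s):
    return s.mode().iloc[0]

def by_value_counts(s):
    return s.value_counts().index[0]

def by_factorize(s):
    codes, uniques = s.factorize()
    return uniques[np.bincount(codes).argmax()]

def best_time_ms(func, s, number=20, repeat=3):
    # best average time per call, in milliseconds, over `repeat` runs of `number` calls
    timer = timeit.Timer(lambda: func(s))
    return min(timer.repeat(repeat=repeat, number=number)) / number * 1000

timings = {f.__name__: best_time_ms(f, s) for f in (by_mode, by_value_counts, by_factorize)}

# print the fastest approach first
for name, ms in sorted(timings.items(), key=lambda kv: kv[1]):
    print(f"{name:16s} {ms:.2f} ms")
```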
