Python - Count the most frequent elements in lists of the same length

Question

4

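I've been searching for an answer for the past few hours but couldn't find what I was looking for, so I decided to ask here.

Say I have a list of data items that all have the same length, for example: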

0004000000350
0000090033313
0004000604363
040006203330b
0004000300a3a
0004000403833
00000300333a9
0004000003a30

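What would be the most efficient way to find the most frequently occurring character at each position?

An example output could look like this: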

0 0 0 4 0 0 0 0 0 3 3 3 3

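Edit: Thanks to everyone who answered, this gave me exactly what I was looking for! :)

Edit 2: I think the best approach might be to add totals and some kind of percentage. Since the dataset is large, just listing the most common occurrence isn't as clear as I'd hoped.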
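The totals-and-percentage idea from Edit 2 could be sketched roughly as below. This is only an illustration, not something taken from the answers: the helper name `column_summary` and the output layout are assumptions, and it reports only the single most common character per position.

```python
from collections import Counter

# Sample rows copied from the question; every row has the same length.
rows = [
    "0004000000350",
    "0000090033313",
    "0004000604363",
    "040006203330b",
    "0004000300a3a",
    "0004000403833",
    "00000300333a9",
    "0004000003a30",
]

def column_summary(rows):
    """Hypothetical helper: for each position, return
    (most common character, its count, its share in percent)."""
    summary = []
    for column in zip(*rows):  # zip(*rows) yields one tuple per position
        char, count = Counter(column).most_common(1)[0]
        summary.append((char, count, 100.0 * count / len(column)))
    return summary

for position, (char, count, pct) in enumerate(column_summary(rows), start=1):
    print(f"position {position:2d}: {char!r} appears {count} of {len(rows)} times ({pct:.1f}%)")
```

Printing the count next to the share makes it easier to see how dominant the winning character is at each position, which may matter more on a large dataset than the winner alone.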

- M-Phoenix16

Why is the fourth element of the expected output 4 and not 0? - zabop

1

In this example dataset, the most frequent character at position 4 is 4. It appears 5 times, while 0 appears only 3 times. - M-Phoenix16

5 Answers

2

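Zip the list of strings to "transpose" them, so that each column is presented as one item of the same iterator. Apply collections.Counter to each column and use the most_common method, discarding the data you don't need.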

data="""0004000000350
0000090033313
0004000604363
040006203330b
0004000300a3a
0004000403833
00000300333a9
0004000003a30"""

import collections

counts = [collections.Counter(x).most_common(1)[0][0] for x in zip(*data.splitlines())]

This yields:

['0', '0', '0', '4', '0', '0', '0', '0', '0', '3', '3', '3', '3']

If needed, you can use "".join(counts) to combine the characters back into a single string.
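To make the individual steps visible, here is a small sketch on a deliberately tiny input. The three short rows are made up for illustration and are not the question's data.

```python
from collections import Counter

# Tiny illustrative input (an assumption, not the question's dataset).
rows = ["0004", "0000", "0004"]

# Step 1: zip(*rows) "transposes" the rows into per-position tuples.
columns = list(zip(*rows))
# columns is [('0', '0', '0'), ('0', '0', '0'), ('0', '0', '0'), ('4', '0', '4')]

# Step 2: most_common(1) returns a one-element list of (element, count) pairs,
# so [0] selects the pair and a further [0] would select the element alone.
top_pairs = [Counter(column).most_common(1)[0] for column in columns]

# Step 3: keep only the characters and join them back into one string.
result = "".join(char for char, _count in top_pairs)
print(result)
```

Keeping the (element, count) pair around until the last step leaves the counts available if they are needed later.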

- Jean-François Fabre

0

from collections import Counter
''.join(Counter(i).most_common(1)[0][0] for i in zip(*l))

where l is your list of strings.

- Benoît P

1

Expected str instance, tuple found. Missing a [0]? - tobias_k

0

Without any imports, I would do it like this:

data = [
"0004000000350",
"0000090033313",
"0004000604363",
"040006203330b",
"0004000300a3a",
"0004000403833",
"00000300333a9",
"0004000003a30",
]

# return the most common element in an iterable
most_common = lambda ite: max(ite, key=ite.count)

# print the most common element in each column
# (list() is needed in Python 3, where map returns a lazy iterator)
print(list(map(most_common, zip(*data))))

# ['0', '0', '0', '4', '0', '0', '0', '0', '0', '3', '3', '3', '3']

- cdrom

1

It's worth noting that max with this key has O(n²) time complexity. - tobias_k
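The quadratic cost comes from ite.count rescanning the whole tuple once per element. A possible import-free variant that counts each column in a single pass is sketched below. It is not part of the original answer, and the function name `most_common_linear` is an assumption.

```python
data = [
    "0004000000350",
    "0000090033313",
    "0004000604363",
    "040006203330b",
    "0004000300a3a",
    "0004000403833",
    "00000300333a9",
    "0004000003a30",
]

def most_common_linear(items):
    """Hypothetical helper: most common element, counted in one pass."""
    counts = {}
    for item in items:
        counts[item] = counts.get(item, 0) + 1
    # max scans only the distinct elements; on a tie it keeps the element
    # that was inserted first, since dicts preserve insertion order.
    return max(counts, key=counts.get)

print(list(map(most_common_linear, zip(*data))))
```

For columns of only eight characters the difference is negligible, but it should matter once each column holds many thousands of entries.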

0

Since nobody has used pandas yet: with pandas you can do this easily and efficiently.

a = """0004000000350
0000090033313
0004000604363
040006203330b
0004000300a3a
0004000403833
00000300333a9
0004000003a30"""

import pandas as pd
df = pd.DataFrame([list(j) for j in a.strip().split('\n')])
result = df.mode().to_string(header=False, index=False)
print(result)

""" output 
 0  0  0  4  0  0  0  0  0  3  3  3  3
"""

- sahasrara62

- yatu · Accepted Answer

You can use zip to group the characters at the same position in each string. Then use scipy.stats.mode to get the mode of each tuple, and finally join the strings produced by the generator expression.

l = ['0004000000350', '0000090033313', '0004000604363', '040006203330b', 
     '0004000300a3a', '0004000403833', '00000300333a9', '0004000003a30']

from scipy.stats import mode
''.join(mode(i).mode[0] for i in list(zip(*l)))

Output

'0004000003333'
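One caveat, stated as an assumption rather than a verified fact for every version: newer SciPy releases appear to restrict scipy.stats.mode to numeric input, so the snippet above may only run on older installations. A rough standard-library alternative, swapping in statistics.mode, could look like this:

```python
from statistics import mode

l = ['0004000000350', '0000090033313', '0004000604363', '040006203330b',
     '0004000300a3a', '0004000403833', '00000300333a9', '0004000003a30']

# statistics.mode accepts non-numeric data such as characters.
# On Python 3.8+ a tie resolves to the value encountered first;
# older versions raise StatisticsError when there is no unique mode.
result = ''.join(mode(column) for column in zip(*l))
print(result)
```

This swaps the technique rather than reproducing the accepted answer, so treat it as a sketch of the same idea without the SciPy dependency.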