Count the number of all elements in a nested list

Question

6

I have a list of lists and want to create a data frame containing the counts of all unique elements. Here is my test data:

test = [["P1", "P1", "P1", "P2", "P2", "P1", "P1", "P3"],
        ["P1", "P1", "P1"],
        ["P1", "P1", "P1", "P2"],
        ["P4"],
        ["P1", "P4", "P2"],
        ["P1", "P1", "P1"]]

I can do something like this using Counter and a for loop:

from collections import Counter
for item in test:
     print(Counter(item))

But how can I aggregate the results of this loop into a new data frame?

The expected output is a data frame:

P1 P2 P3 P4
15 4  1  2

- ThomasJohnson

4 Answers

5

For better performance, you should use one of the following:

collections.Counter with itertools.chain.from_iterable:

>>> from collections import Counter
>>> from itertools import chain

>>> Counter(chain.from_iterable(test))
Counter({'P1': 15, 'P2': 4, 'P4': 2, 'P3': 1})

Or use collections.Counter with a list comprehension (one less import, since itertools is not needed, with about the same performance):

>>> from collections import Counter

>>> Counter([x for a in test for x in a])
Counter({'P1': 15, 'P2': 4, 'P4': 2, 'P3': 1})

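Read on for more alternatives and a performance comparison. Feel free to skip the rest if you don't need it.

Approach 1: Concatenate the sublists into a single list, then use collections.Counter to find the counts.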

Solution 1: Concatenate list using itertools.chain.from_iterable and find the count using collections.Counter as:

test = [
    ["P1", "P1", "P1", "P2", "P2", "P1", "P1", "P3"],
    ["P1", "P1", "P1"],
    ["P1", "P1", "P1", "P2"],
    ["P4"],
    ["P1", "P4", "P2"],
    ["P1", "P1", "P1"]
]

from itertools import chain 
from collections import Counter

my_counter = Counter(chain.from_iterable(test))

Solution 2: Combine list using list comprehension as:

from collections import Counter

my_counter = Counter([x for a in test for x in a])

Solution 3: Concatenate list using sum

from collections import Counter

my_counter = Counter(sum(test, []))

Approach 2: Count the elements of each sublist with collections.Counter, then sum the resulting Counter objects.

Solution 4: Count objects of each sublist using collections.Counter and map as:

from collections import Counter

my_counter = sum(map(Counter, test), Counter())

Solution 5: Count objects of each sublist using list comprehension as:

from collections import Counter

my_counter = sum([Counter(t) for t in test], Counter())

In all of the solutions above, my_counter will hold the value:

>>> my_counter
Counter({'P1': 15, 'P2': 4, 'P4': 2, 'P3': 1})
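
As a quick sanity check, the five solutions can be run side by side on the question's test data and compared against the value shown above. This is only a minimal sketch; the dictionary name results and its labels are illustrative and not part of the original answer.

```python
from collections import Counter
from itertools import chain

test = [
    ["P1", "P1", "P1", "P2", "P2", "P1", "P1", "P3"],
    ["P1", "P1", "P1"],
    ["P1", "P1", "P1", "P2"],
    ["P4"],
    ["P1", "P4", "P2"],
    ["P1", "P1", "P1"],
]

# One entry per solution; the labels are illustrative names only.
results = {
    "1: chain.from_iterable": Counter(chain.from_iterable(test)),
    "2: list comprehension": Counter([x for a in test for x in a]),
    "3: sum of lists": Counter(sum(test, [])),
    "4: sum of Counters via map": sum(map(Counter, test), Counter()),
    "5: sum of Counters via comprehension": sum([Counter(t) for t in test], Counter()),
}

expected = Counter({"P1": 15, "P2": 4, "P4": 2, "P3": 1})

# Counter equality compares keys and counts, not insertion order.
for name, counter in results.items():
    print(name, counter == expected)
```

Every line should print True, since the solutions differ only in how they get to the final counts.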

Performance comparison

Below is a timeit comparison in Python 3 for 1000 sublists with 100 elements each:

Fastest using chain.from_iterable (17.1 msec)

mquadri$ python3 -m timeit "from collections import Counter; from itertools import chain; my_list = [list(range(100)) for i in range(1000)]" "Counter(chain.from_iterable(my_list))"
100 loops, best of 3: 17.1 msec per loop

Second on the list is using list comprehension to combine the list and then do the Count (similar result as above but without the additional import of itertools) (18.36 msec)

mquadri$ python3 -m timeit "from collections import Counter; my_list = [list(range(100)) for i in range(1000)]" "Counter([x for a in my_list for x in a])"
100 loops, best of 3: 18.36 msec per loop

Third in terms of performance is using Counter on sublists within list comprehension : (162 msec)

mquadri$ python3 -m timeit "from collections import Counter; my_list = [list(range(100)) for i in range(1000)]" "sum([Counter(t) for t in my_list], Counter())"
10 loops, best of 3: 162 msec per loop

Fourth on the list is via using Counter with map (results are quite similar to the one using list comprehension above) (176 msec)

mquadri$ python3 -m timeit "from collections import Counter; my_list = [list(range(100)) for i in range(1000)]" "sum(map(Counter, my_list), Counter())"
10 loops, best of 3: 176 msec per loop

Solution using sum to concatenate the list is too slow (526 msec)

mquadri$ python3 -m timeit "from collections import Counter; my_list = [list(range(100)) for i in range(1000)]" "Counter(sum(my_list, []))"
10 loops, best of 3: 526 msec per loop
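
The same comparison can also be scripted with the timeit module rather than the command line. The sketch below is an assumption-laden rerun, not the measurement quoted above: it uses a smaller input (200 sublists of 50 elements) so it finishes quickly, and the names candidates and timings are illustrative. Absolute numbers will vary by machine and Python version, so no expected output is shown.

```python
import timeit
from collections import Counter
from itertools import chain

# Smaller than the 1000 x 100 input used above, so the sketch runs fast.
my_list = [list(range(50)) for _ in range(200)]

# Each candidate is a zero-argument callable that timeit can invoke.
candidates = {
    "chain.from_iterable": lambda: Counter(chain.from_iterable(my_list)),
    "list comprehension": lambda: Counter([x for a in my_list for x in a]),
    "sum of Counters (comprehension)": lambda: sum([Counter(t) for t in my_list], Counter()),
    "sum of Counters (map)": lambda: sum(map(Counter, my_list), Counter()),
    "sum of lists": lambda: Counter(sum(my_list, [])),
}

calls = 5  # calls per timing run
timings = {
    # Best of 3 runs, divided by the number of calls, gives seconds per call.
    name: min(timeit.repeat(func, number=calls, repeat=3)) / calls
    for name, func in candidates.items()
}

# Print fastest first.
for name, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
    print(f"{name:<35} {seconds * 1000:.3f} msec per call")
```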

- Moinuddin Quadri

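Yes, but then it would not sum the "Counters"; it would run a single Counter over the combined list. Still, I think I should mention it in the answer as well (a nice way to skip the itertools import with the same performance). - Moinuddin Quadri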
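
The comment contrasts summing per-sublist Counters with counting one combined list. A third option, not mentioned in the thread, is to keep the question's for loop and accumulate into a single Counter with its update method, which adds counts in place. A minimal sketch, where the name total is illustrative:

```python
from collections import Counter

test = [
    ["P1", "P1", "P1", "P2", "P2", "P1", "P1", "P3"],
    ["P1", "P1", "P1"],
    ["P1", "P1", "P1", "P2"],
    ["P4"],
    ["P1", "P4", "P2"],
    ["P1", "P1", "P1"],
]

total = Counter()
for item in test:
    # update() adds the counts from each sublist to the running total,
    # without building a flat list or creating intermediate Counters.
    total.update(item)

print(total)
# Counter({'P1': 15, 'P2': 4, 'P4': 2, 'P3': 1})
```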

1

Here is another way to do this, using itertools.groupby.

>>> from itertools import groupby, chain

>>> out = [(k,len(list(g))) for k,g in groupby(sorted(chain(*test)))]
>>> out
[('P1', 15), ('P2', 4), ('P3', 1), ('P4', 2)]

To convert it into a dictionary-like format:

>>> dict(out)
{'P2': 4, 'P3': 1, 'P1': 15, 'P4': 2}

To convert it to a data frame, use:

>>> import pandas as pd

>>> pd.DataFrame(dict(out), index=[0])
   P1  P2  P3  P4
0  15   4   1   2
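
The sorted call in this answer matters: groupby only groups consecutive equal items, so on unsorted data the same key shows up in several separate runs. The sketch below illustrates this on the question's data; the names flat, unsorted_runs, and sorted_runs are illustrative.

```python
from itertools import chain, groupby

test = [
    ["P1", "P1", "P1", "P2", "P2", "P1", "P1", "P3"],
    ["P1", "P1", "P1"],
    ["P1", "P1", "P1", "P2"],
    ["P4"],
    ["P1", "P4", "P2"],
    ["P1", "P1", "P1"],
]

flat = list(chain.from_iterable(test))

# Without sorting, each run of consecutive equal items becomes its own group.
unsorted_runs = [(k, len(list(g))) for k, g in groupby(flat)]
print(unsorted_runs[:4])
# [('P1', 3), ('P2', 2), ('P1', 2), ('P3', 1)]

# After sorting, equal items are adjacent, so each key appears exactly once.
sorted_runs = [(k, len(list(g))) for k, g in groupby(sorted(flat))]
print(sorted_runs)
# [('P1', 15), ('P2', 4), ('P3', 1), ('P4', 2)]
```

Sorting costs O(n log n), which is one reason the Counter-based answers are usually the simpler choice for plain counting.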

- Sohaib Farooqi

0

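The set function keeps only the unique elements of a list. So with len(set(mylist)) you get the number of unique elements in the list. Then you just need to iterate over the sublists.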

dict_nb_item = {}
i = 0
for test_item in test:
    dict_nb_item[i] = len(set(test_item))
    i += 1
print(dict_nb_item)

- Alizé

How does this produce the result the OP wants? - Ma0

This outputs {0: 3, 1: 1, 2: 2, 3: 1, 4: 3, 5: 1} (Python 3), which is clearly not what the OP expects. - rollstuhlfahrer
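
To make the commenters' point concrete: a set can tell you which elements are distinct, but it discards how often each one occurs. The sketch below reproduces the per-sublist output quoted in the comment and shows what a set over the whole nested list yields. The variable names are illustrative.

```python
from itertools import chain

test = [
    ["P1", "P1", "P1", "P2", "P2", "P1", "P1", "P3"],
    ["P1", "P1", "P1"],
    ["P1", "P1", "P1", "P2"],
    ["P4"],
    ["P1", "P4", "P2"],
    ["P1", "P1", "P1"],
]

# Number of distinct elements in each sublist (what the answer computes).
unique_per_sublist = {i: len(set(sub)) for i, sub in enumerate(test)}
print(unique_per_sublist)
# {0: 3, 1: 1, 2: 2, 3: 1, 4: 3, 5: 1}

# Distinct elements across all sublists: the column names, but no counts.
unique_overall = sorted(set(chain.from_iterable(test)))
print(unique_overall)
# ['P1', 'P2', 'P3', 'P4']
```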

- jpp · Accepted Answer

Here is one way.

from collections import Counter
from itertools import chain

test = [["P1", "P1", "P1", "P2", "P2", "P1", "P1", "P3"],
        ["P1", "P1", "P1"],
        ["P1", "P1", "P1", "P2"],
        ["P4"],
        ["P1", "P4", "P2"],
        ["P1", "P1", "P1"]]

c = Counter(chain.from_iterable(test))

for k, v in c.items():
    print(k, v)

# P1 15
# P2 4
# P3 1
# P4 2

Output as a data frame:

import pandas as pd

df = pd.DataFrame.from_dict(c, orient='index').transpose()

#    P1 P2 P3 P4
# 0  15  4  1  2