How to count the number of occurrences of a set in a list in Python?

6
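I'm trying to implement the Apriori algorithm and have managed to extract the subsets that occur together across all transactions. Here is what I have: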
subsets = [set(['Breakfast & Brunch', 'Restaurants']), set(['American (Traditional)', 'Breakfast & Brunch']), set(['American (Traditional)', 'Restaurants']), set(['American (Traditional)', 'Breakfast & Brunch']), set(['Breakfast & Brunch', 'Restaurants']), set(['American (Traditional)', 'Restaurants'])]

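For example, set(['Breakfast & Brunch', 'Restaurants']) occurs twice, and I need to keep track of the number of occurrences together with the corresponding pattern.
I tried using: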
from collections import Counter

support_set = Counter()
# some code that generated the list above

support_set.update(subsets)

But it produces the following error:
  supported = itemsets_support(transactions, candidates)
  File "apriori.py", line 77, in itemsets_support
    support_set.update(subsets)
  File "/usr/local/Cellar/python/2.7.12/Frameworks/Python.framework/Versions/2.7/lib/python2.7/collections.py", line 567, in update
    self[elem] = self_get(elem, 0) + 1
TypeError: unhashable type: 'set'

Any ideas?

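This is probably no longer Apriori, but a naive and inefficient approximation of the "frequent itemset" idea. Please benchmark on some larger data sets against, for example, ELKI or R's arules package. Putting everything into a Counter will not scale. Try other data sets as well, such as a supermarket data set. - Has QUIT--Anony-Mousse
This is part of Apriori. Whether it scales is a separate question. It has not been built for production yet! - flamenco
No, it is not. Apriori is about an efficient implementation, not an inefficient one. If you ignore the efficiency aspect, it is no longer Apriori. - Has QUIT--Anony-Mousse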
1 Answer

8
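A set is mutable and therefore unhashable, so it cannot be used as a Counter key. You can convert each set to a frozenset instance, which is hashable: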
>>> from collections import Counter
>>> subsets = [set(['Breakfast & Brunch', 'Restaurants']), set(['American (Traditional)', 'Breakfast & Brunch']), set(['American (Traditional)', 'Restaurants']), set(['American (Traditional)', 'Breakfast & Brunch']), set(['Breakfast & Brunch', 'Restaurants']), set(['American (Traditional)', 'Restaurants'])]
>>> c = Counter(frozenset(s) for s in subsets)
>>> c
Counter({frozenset(['American (Traditional)', 'Restaurants']): 2, frozenset(['Breakfast & Brunch', 'Restaurants']): 2, frozenset(['American (Traditional)', 'Breakfast & Brunch']): 2})
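
To keep track of each pattern together with its count, the same idea can be dropped into the original support_set.update(...) call. The following is only a minimal sketch in Python 3 syntax; the names pattern and count are illustrative and not taken from the original code:

```python
from collections import Counter

# Same data as in the question, written with Python 3 set literals.
subsets = [
    {'Breakfast & Brunch', 'Restaurants'},
    {'American (Traditional)', 'Breakfast & Brunch'},
    {'American (Traditional)', 'Restaurants'},
    {'American (Traditional)', 'Breakfast & Brunch'},
    {'Breakfast & Brunch', 'Restaurants'},
    {'American (Traditional)', 'Restaurants'},
]

# frozenset is immutable and therefore hashable, so it can serve as a key.
support_set = Counter()
support_set.update(frozenset(s) for s in subsets)

# To look up a pattern, wrap it in frozenset as well (hypothetical names).
pattern = frozenset(['Restaurants', 'Breakfast & Brunch'])
count = support_set[pattern]

# Walk over every pattern with its count; sorted() only stabilizes the printout.
for itemset, n in support_set.items():
    print(sorted(itemset), n)
```

Element order inside a set is irrelevant, so frozenset(['Restaurants', 'Breakfast & Brunch']) hits the same key as frozenset(['Breakfast & Brunch', 'Restaurants']).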
