How to reduce a list of tuples in Python

Question

3

I have an array, and I want to count how many times each item appears in it.

I have already used the map function to generate a list of tuples.

def mapper(a):
    return (a, 1)

r = list(map(lambda a: mapper(a), arr))

# output example:
# [(11817685, 1), (2014036792, 1), (2014047115, 1), (11817685, 1)]

I was hoping the reduce function could help me group the counts by the first number (the id) in each tuple. For example:

(11817685, 2), (2014036792, 1), (2014047115, 1)

I tried

cnt = reduce(lambda a, b: a + b, r);

as well as some other approaches, but none of them solved the problem.
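For reference, here is a minimal sketch of what that attempt actually computes. It assumes Python 3 (where reduce has to be imported from functools) and uses the sample pairs from the output example above. Adding two tuples with + concatenates them, so the reduction only flattens the pairs into one long tuple and never groups by id:

```python
from functools import reduce

r = [(11817685, 1), (2014036792, 1), (2014047115, 1), (11817685, 1)]

# Tuple + tuple is concatenation, not element-wise addition.
cnt = reduce(lambda a, b: a + b, r)
print(cnt)
# (11817685, 1, 2014036792, 1, 2014047115, 1, 11817685, 1)
```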

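Note: I really appreciate suggestions for other ways to solve this problem, but I am just learning Python and how to implement Map-Reduce in it. I have greatly simplified my actual business problem to make it easier to understand, so please kindly show me the correct way to do Map-Reduce.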

- Lee

5

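lambda a: mapper(a)? Why not just pass mapper directly? Also, what is your expected output? - internet_user

Thanks for your comment. Yes, I could pass mapper directly; I was just trying out other things. I have added my expected output. - Lee

Do you need r, or is it just an intermediate? - internet_user

Just an intermediate. - Lee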

2

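Neither reduce nor map really helps you with this task. This kind of task is why collections.Counter exists (and, for the special case where the input is already sorted, itertools.groupby). The Map/Reduce strategy is meant for running many mappers in parallel that feed many reducers in parallel; blindly applying the same pattern to purely single-threaded code is wasteful (it is wasteful in the Map/Reduce case too, you just rely on absurd levels of parallelism to make up for the overhead). - ShadowRanger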
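A minimal sketch of the itertools.groupby route mentioned in this comment, assuming the sample ids from the question. groupby only merges adjacent equal items, so the list is sorted first; the variable names are illustrative:

```python
from itertools import groupby

arr = [11817685, 2014036792, 2014047115, 11817685]

# Sort so that equal ids sit next to each other, then count each run.
counts = [(key, sum(1 for _ in group)) for key, group in groupby(sorted(arr))]
print(counts)
# [(11817685, 2), (2014036792, 1), (2014047115, 1)]
```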

4 Answers

1

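After answering another question, I remembered this post and thought it would be helpful to write a similar answer here. Here is a way to get the desired output using reduce.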

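from functools import reduce  # reduce is not a builtin in Python 3

arr = [11817685, 2014036792, 2014047115, 11817685]

def mapper(a):
    return (a, 1)

def reducer(x, y):
    if isinstance(x, dict):
        # x is the running dict of counts; fold the next (key, count) pair into it
        ykey, yval = y
        if ykey not in x:
            x[ykey] = yval
        else:
            x[ykey] += yval
        return x
    else:
        # first call: x and y are both (key, count) tuples, so start the dict here
        xkey, xval = x
        ykey, yval = y
        a = {xkey: xval}
        if ykey in a:
            a[ykey] += yval
        else:
            a[ykey] = yval
        return a

mapred = reduce(reducer, map(mapper, arr))

print(list(mapred.items()))

This prints (on Python 3.7+, where dicts keep insertion order):

[(11817685, 2), (2014036792, 1), (2014047115, 1)]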

See the linked answer for a more detailed explanation.
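As a variation on the same idea (not the code from this answer), reduce accepts an optional third argument that serves as the initial accumulator. Passing an empty dict means the accumulator is always a dict, so the reducer needs no isinstance branch, and lists with zero or one element also work. This is only a sketch; the names acc and pair are illustrative:

```python
from functools import reduce

arr = [11817685, 2014036792, 2014047115, 11817685]

def mapper(a):
    return (a, 1)

def reducer(acc, pair):
    # acc is always a dict because reduce is given {} as its initial value.
    key, val = pair
    acc[key] = acc.get(key, 0) + val
    return acc

counts = reduce(reducer, map(mapper, arr), {})
print(list(counts.items()))
```

On Python 3.7+ the items come out in first-seen order, starting with (11817685, 2).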

- pault

1

Without using any external modules, you can do this with a little logic:

track={}
for intr in arr:  # arr is the list from the question
    if intr not in track:
        track[intr]=1
    else:
        track[intr]+=1

Sample code:

There is a common pattern for this kind of list problem.

Suppose you have a list:

a=[(2006,1),(2007,4),(2008,9),(2006,5)]

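You want to convert it into a dictionary in which the first element of each tuple is the key and the second element is the value. Something like: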

{2008: [9], 2006: [5], 2007: [4]}

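But there is a catch: if the keys are the same but the values differ, as with (2006,1) and (2006,5), you also want those values appended to the same key, so the expected output is: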

{2008: [9], 2006: [1, 5], 2007: [4]}

For this type of problem, we usually take the following steps:

First create a new dictionary, then follow this pattern:

if item[0] not in new_dict:
    new_dict[item[0]]=[item[1]]
else:
    new_dict[item[0]].append(item[1])

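So we first check whether the key is already in the new dictionary. If it is not, we store the value in a new list under that key; if it is, we append the duplicate key's value to the existing list:

Full code: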

a=[(2006,1),(2007,4),(2008,9),(2006,5)]

new_dict={}

for item in a:
    if item[0] not in new_dict:
        new_dict[item[0]]=[item[1]]
    else:
        new_dict[item[0]].append(item[1])

print(new_dict)

Output (the key order may differ depending on the Python version):

{2008: [9], 2006: [1, 5], 2007: [4]}
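The same check-then-append pattern can be written more compactly with dict.setdefault, which returns the existing list for a key or stores and returns a new empty one. This is a sketch of an equivalent formulation, not part of the original answer; the name grouped is illustrative:

```python
a = [(2006, 1), (2007, 4), (2008, 9), (2006, 5)]

grouped = {}
for key, value in a:
    # setdefault inserts [] the first time a key is seen, then returns the list.
    grouped.setdefault(key, []).append(value)

print(grouped)
```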

- Aaditya Ura

0

If you only need cnt, then a dict is probably a better fit here than a list of tuples (if you do need that format, use dict.items). The collections module has a useful data structure for this, defaultdict.

from collections import defaultdict
cnt = defaultdict(int) # create a default dict where the default value is
                       # the result of calling int
for key in arr:
  cnt[key] += 1 # if key is not in cnt, it will put in the default

# cnt_list = list(cnt.items())

- internet_user

- scope · Accepted Answer

You can use Counter:

from collections import Counter
arr = [11817685, 2014036792, 2014047115, 11817685]
counter = Counter(arr)
print(list(zip(counter.keys(), counter.values())))

Edit:

As @ShadowRanger pointed out, Counter has an items() method:

from collections import Counter
arr = [11817685, 2014036792, 2014047115, 11817685]
print(list(Counter(arr).items()))
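Since the question asks specifically for a Map-Reduce shape, here is one possible sketch that combines it with Counter. It assumes Python 3 and is meant for illustration only. Each item is mapped to a one-element Counter, and the reduce step relies on Counter + Counter summing counts key by key. Every addition builds a new Counter, so this is slower than calling Counter(arr) directly:

```python
from collections import Counter
from functools import reduce
from operator import add

arr = [11817685, 2014036792, 2014047115, 11817685]

# Map: wrap each item in a Counter holding a single count of 1.
mapped = map(lambda a: Counter([a]), arr)

# Reduce: adding two Counters sums the counts for matching keys.
total = reduce(add, mapped, Counter())
print(list(total.items()))
```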