I have the following data as key/value tuples:
[(13, 'D'), (14, 'T'), (32, '6'), (45, 'T'), (47, '2'), (48, '0'), (49, '2'), (50, '0'), (51, 'T'), (53, '2'), (54, '0'), (13, 'A'), (14, 'T'), (32, '6'), (45, 'A'), (47, '2'), (48, '0'), (49, '2'), (50, '0'), (51, 'X')]
What I want to do is count, for each key, how many times each value (a single-character string) occurs. So I first did a map:
.map(lambda x: (x[0], [x[1], 1]))
Now I have it as key/tuple pairs:
[(13, ['D', 1]), (14, ['T', 1]), (32, ['6', 1]), (45, ['T', 1]), (47, ['2', 1]), (48, ['0', 1]), (49, ['2', 1]), (50, ['0', 1]), (51, ['T', 1]), (53, ['2', 1]), (54, ['0', 1]), (13, ['A', 1]), (14, ['T', 1]), (32, ['6', 1]), (45, ['A', 1]), (47, ['2', 1]), (48, ['0', 1]), (49, ['2', 1]), (50, ['0', 1]), (51, ['X', 1])]
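The part I can't figure out is the last step: how to count the occurrences of each letter per key. For example, key 13 would have 1 D and 1 A, while key 14 would have 2 Ts, and so on.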
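One common way to get per-key letter counts is to key on the whole `(key, letter)` pair instead of on the key alone, sum the 1s, then regroup by the original key. Below is a minimal plain-Python sketch of that idea (no Spark needed to run it). The comments name the PySpark RDD operations each step stands in for (`map`, `reduceByKey`); `rdd` is a hypothetical name for the original RDD of `(key, letter)` tuples, and `count_per_key` is an illustrative helper, not part of any library.

```python
from collections import defaultdict

# The (key, letter) tuples from the question
pairs = [(13, 'D'), (14, 'T'), (32, '6'), (45, 'T'), (47, '2'), (48, '0'),
         (49, '2'), (50, '0'), (51, 'T'), (53, '2'), (54, '0'), (13, 'A'),
         (14, 'T'), (32, '6'), (45, 'A'), (47, '2'), (48, '0'), (49, '2'),
         (50, '0'), (51, 'X')]

def count_per_key(pairs):
    # Step 1, like rdd.map(lambda x: (x, 1)): use the whole (key, letter) pair as the key
    keyed = [((k, c), 1) for k, c in pairs]
    # Step 2, like .reduceByKey(operator.add): sum the 1s for each (key, letter)
    totals = defaultdict(int)
    for pair, one in keyed:
        totals[pair] += one
    # Step 3: regroup by the original key into {letter: count}
    result = defaultdict(dict)
    for (k, c), n in totals.items():
        result[k][c] = n
    return dict(result)

counts = count_per_key(pairs)
print(counts[13])  # {'D': 1, 'A': 1}
print(counts[14])  # {'T': 2}
```

In PySpark, step 3 could be done with another `map` to `(key, (letter, count))` followed by `groupByKey`, if a grouped result is needed rather than flat `((key, letter), count)` records.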
Comment: `groupByKey`, then count the grouped characters. - mattsilver
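The comment's suggestion can be sketched locally as "group the letters by key, then count within each group". This is a plain-Python stand-in, assuming the PySpark version would look roughly like `rdd.groupByKey().mapValues(lambda chars: dict(Counter(chars)))`, where `rdd` is a hypothetical name for the original RDD of `(key, letter)` tuples; the helper names below are illustrative only.

```python
from collections import Counter, defaultdict

# The (key, letter) tuples from the question
pairs = [(13, 'D'), (14, 'T'), (32, '6'), (45, 'T'), (47, '2'), (48, '0'),
         (49, '2'), (50, '0'), (51, 'T'), (53, '2'), (54, '0'), (13, 'A'),
         (14, 'T'), (32, '6'), (45, 'A'), (47, '2'), (48, '0'), (49, '2'),
         (50, '0'), (51, 'X')]

def group_by_key(pairs):
    # Local stand-in for rdd.groupByKey(): collect every letter seen for each key
    groups = defaultdict(list)
    for k, c in pairs:
        groups[k].append(c)
    return groups

def count_grouped(groups):
    # Stand-in for .mapValues(...): count each letter within its key's group
    return {k: dict(Counter(chars)) for k, chars in groups.items()}

counts = count_grouped(group_by_key(pairs))
print(counts[13])  # {'D': 1, 'A': 1}
print(counts[51])  # {'T': 1, 'X': 1}
```

This is the simplest to read, but `groupByKey` ships every value for a key across the cluster before counting. For large data, the `reduceByKey` approach on a `(key, letter)` composite key usually shuffles less, because Spark can combine partial counts before the shuffle.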