Need to remove duplicates from a list of dictionaries and alter the data of the remaining duplicate (Python)

3
Consider this short Python list of dictionaries (the first dictionary item is a string, the second is a Widget object):
raw_results =  
     [{'src': 'tag', 'widget': <Widget: to complete a form today>},   # dupe 1a
      {'src': 'tag', 'widget': <Widget: a newspaper>},                # dupe 2a
      {'src': 'zip', 'widget': <Widget: to complete a form today>},   # dupe 1b
      {'src': 'zip', 'widget': <Widget: the new Jack Johnson album>},
      {'src': 'zip', 'widget': <Widget: a newspaper>},                # dupe 2b
      {'src': 'zip', 'widget': <Widget: premium dog food >}]

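I want to walk through that list and remove the duplicates, which this SO question has already answered for me:

Removing duplicates from a list while preserving order (Python)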


    known_widgets= set()
    processed_results = []

    for x in raw_results:
        widget = x['widget']
        if widget in known_widgets: 
            continue
        else:
            processed_results.append(x)
            known_widgets.add(widget)

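However, after removing a duplicate row (e.g. dupe 1b), I want to alter the "src" data of the item that remains (e.g. dupe 1a): the "src" of the deleted duplicate should be appended to the original. This is the result I am after: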

processed_results =  
    [{'src': 'tag-zip', 'widget': <Widget: to complete a form today>},  # dupe 1a
     {'src': 'tag-zip', 'widget': <Widget: a newspaper>},               # dupe 2a
     {'src': 'zip', 'widget': <Widget: the new Jack Johnson album>},
     {'src': 'zip', 'widget': <Widget: premium dog food >}]

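I'm sure this is easy to do, but after too much coffee and many hours circling this problem, my head is spinning. I would really appreciate some expert help. Thanks!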


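You're not "removing" duplicates, you're merging them, right? - S.Lott
Yes, I suppose that is more accurate, since I am merging the two "src" fields of the duplicates. - mitchf
Are you merging on src: all the items with src='tag-zip' in one group and all the items with src='zip' in another? - hughdbrown
2 Answers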

2
def find_widget(widget, L):
    # Return the index of the entry in L that holds this widget
    for i, v in enumerate(L):
        if v['widget'] == widget:
            return i

known_widgets = set()
processed_results = []

for x in raw_results:
    widget = x['widget']
    if widget in known_widgets:
        # Duplicate: append its src to the entry that was kept earlier
        processed_results[find_widget(widget, processed_results)]['src'] += '-%s' % x['src']
        continue
    else:
        processed_results.append(x)
        known_widgets.add(widget)

This could probably be done better, since find_widget has to scan processed_results a second time for every duplicate widget.
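
One way to avoid that second scan is to remember, for each widget, the dictionary that was kept, so a duplicate can update it directly. The following is only a sketch under assumptions: the `Widget` class here is a hypothetical stand-in for the one in the question (it is assumed to be hashable and to compare equal when the descriptions match), and `merge_duplicates` is a name made up for illustration.

```python
# Hypothetical stand-in for the question's Widget class. Equality and
# hashing are based on the description so duplicates can be detected.
class Widget(object):
    def __init__(self, desc):
        self.desc = desc
    def __repr__(self):
        return "<Widget: %s>" % self.desc
    def __eq__(self, other):
        return isinstance(other, Widget) and self.desc == other.desc
    def __hash__(self):
        return hash(self.desc)

def merge_duplicates(raw_results):
    """Keep the first entry per widget and append later src values to it."""
    first_seen = {}          # widget -> the dict kept in processed_results
    processed_results = []
    for x in raw_results:
        kept = first_seen.get(x['widget'])
        if kept is None:
            kept = dict(x)   # shallow copy, so the input dicts are not modified
            first_seen[x['widget']] = kept
            processed_results.append(kept)
        else:
            # Duplicate: merge its src into the entry kept earlier
            kept['src'] += '-%s' % x['src']
    return processed_results

raw_results = [
    {'src': 'tag', 'widget': Widget('to complete a form today')},
    {'src': 'tag', 'widget': Widget('a newspaper')},
    {'src': 'zip', 'widget': Widget('to complete a form today')},
    {'src': 'zip', 'widget': Widget('the new Jack Johnson album')},
    {'src': 'zip', 'widget': Widget('a newspaper')},
    {'src': 'zip', 'widget': Widget('premium dog food')},
]

processed_results = merge_duplicates(raw_results)
for r in processed_results:
    print(r)
```

Note that if the same src shows up more than once for a widget, this simple version would repeat it (for example 'zip-zip'); whether that matters depends on the data.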


Thanks for your help ikanobori, I appreciate it! - mitchf
If it works, please accept my post as the answer by clicking the checkmark icon to its left. - supakeen

1
Assuming you want lists of widgets keyed by the repeated src values, this is what you want:
class Widget(object):
    def __init__(self, desc):
        self.desc = desc
    def __str__(self):
        return "Widget(%s)" % self.desc

raw_results = [
    {'src':'tag-zip', 'widget':Widget('to complete a form today')},
    {'src':'tag-zip', 'widget':Widget('a newspaper')},
    {'src':'zip', 'widget':Widget('the new Jack Johnson album')},
    {'src':'zip', 'widget':Widget('premium dog food')}
]

from collections import defaultdict
known_widgets = defaultdict(list)
for x in raw_results:
    k, v = x['src'], x['widget']
    known_widgets[k].append(v)

for k, v in known_widgets.items():
    print("%s: %s" % (k, ",".join(str(w) for w in v)))

If you want to eliminate the duplicate widgets, do the following:

class Widget(object):
    def __init__(self, desc):
        self.desc = desc
    def __str__(self):
        return "Widget(%s)" % self.desc
    def __hash__(self):
        return hash(self.desc)
    def __eq__(self, other):
        # Widgets with the same description count as the same widget
        return isinstance(other, Widget) and self.desc == other.desc

raw_results = [
    {'src':'tag-zip', 'widget':Widget('to complete a form today')},
    {'src':'tag-zip', 'widget':Widget('a newspaper')},
    {'src':'zip', 'widget':Widget('the new Jack Johnson album')},
    {'src':'zip', 'widget':Widget('premium dog food')},
    {'src':'tag-zip', 'widget':Widget('to complete a form today')},
    {'src':'tag-zip', 'widget':Widget('a newspaper')},
    {'src':'zip', 'widget':Widget('the new Jack Johnson album')},
    {'src':'zip', 'widget':Widget('premium dog food')},
]

from collections import defaultdict
known_widgets = defaultdict(set)
for x in raw_results:
    k, v = x['src'], x['widget']
    known_widgets[k].add(v)

for k, v in known_widgets.items():
    print("%s: %s" % (k, ",".join(str(w) for w in v)))
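
The set-based version only removes duplicates because Widget defines its own hashing and comparison; by default Python treats two separate instances as different objects even when their contents match. A small sketch of the difference, using two hypothetical classes (`PlainWidget` and `HashableWidget` are names invented for this illustration):

```python
class PlainWidget(object):
    """No __eq__/__hash__: instances are compared by identity."""
    def __init__(self, desc):
        self.desc = desc

class HashableWidget(object):
    """Compared and hashed by description."""
    def __init__(self, desc):
        self.desc = desc
    def __eq__(self, other):
        return isinstance(other, HashableWidget) and self.desc == other.desc
    def __hash__(self):
        return hash(self.desc)

plain = {PlainWidget('a newspaper'), PlainWidget('a newspaper')}
hashable = {HashableWidget('a newspaper'), HashableWidget('a newspaper')}

print(len(plain))     # both instances are kept
print(len(hashable))  # the second instance is treated as a duplicate
```

Whether the real Widget class already behaves like the second case depends on how it is defined, which the question does not show.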
