Fastest way to find common elements across lists in Python

4

I have a list like the one below.

mylist = 
[  
   [  
      [  
         "chocolate_pudding",
         920.8000000000001
      ],
      [  
         "caramel_pudding",
         345.59999999999997
      ],
      [  
         "pudding",
         248.0
      ],
      [  
         "banana_pudding",
         27.599999999999998
      ]
   ],
   [  
      [  
         "biscuits",
         190.8
      ],
      [  
         "chocolates",
         33.599999999999994
      ],
      [  
         "chocolate_pudding",
         920.8000000000001
      ]
   ],
   [  
      [  
         "tiramusu",
         145.8
      ]
   ],
   [  
      [  
         "cakes",
         139.29999999999998
      ]
   ],
   [  
      [  
         "butter_cakes",
         133.0
      ]
   ],
   [  
      [  
         "chocolate_pudding",
         920.8000000000001
      ]
   ]
]

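I want to find the elements that appear more than once in the list (for example, ["chocolate_pudding", 920.8000000000001]) and remove the duplicates while keeping the first occurrence.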

So my output should look like this.

mylist = 
[  
   [  
      [  
         "chocolate_pudding",
         920.8000000000001
      ],
      [  
         "caramel_pudding",
         345.59999999999997
      ],
      [  
         "pudding",
         248.0
      ],
      [  
         "banana_pudding",
         27.599999999999998
      ]
   ],
   [  
      [  
         "biscuits",
         190.8
      ],
      [  
         "chocolates",
         33.599999999999994
      ]
   ],
   [  
      [  
         "tiramusu",
         145.8
      ]
   ],
   [  
      [  
         "cakes",
         139.29999999999998
      ]
   ],
   [  
      [  
         "butter_cakes",
         133.0
      ]
   ]
]

The code I have been trying is as follows.

mylist_copy = mylist

for item in mylist:
    myindex = mylist.index(item)
    #print(item)

    for single_item in item:
        #print(single_item)
        for item_copy in mylist_copy:
            if mylist_copy.index(item_copy) != myindex:
                if single_item in item_copy:
                    print(single_item)

Since it has so many for loops, I am looking for a more efficient way to do this. Note: I also tried the following:

mylist_copy = mylist

for item in mylist:
    myindex = mylist.index(item)
    for item_copy in mylist_copy:
        if mylist_copy.index(item_copy) != myindex:
            print(set(item).intersection(item_copy))

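However, this fails because the inner lists are unhashable, so they cannot be placed in a set for the intersection.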

Is there a simple and fast way to do this in Python?


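Just a heads-up: your mylist_copy = mylist does not actually copy the list. Lists are mutable objects in Python and both names refer to the same one, so changing mylist also changes mylist_copy. - Arne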
3 Answers

2

Using a set() object and preserving the order of the sublists:

mylist = [[["chocolate_pudding", 920.8000000000001], ["caramel_pudding", 345.59999999999997], 
          ["pudding", 248.0], ["banana_pudding", 27.599999999999998]], [["biscuits", 190.8], 
          ["chocolates", 33.599999999999994], ["chocolate_pudding", 920.8000000000001]], 
          [["tiramusu", 145.8]], [["cakes", 139.29999999999998]], [["butter_cakes", 133.0]], 
          [["chocolate_pudding", 920.8000000000001]]]

result, foods = [], set()
for sub_l in mylist:
    new_sublist = []
    for i in sub_l:
        if i[0] not in foods:     # on the 1st occurrence of `foodstuff` name
            new_sublist.append(i)
            foods.add(i[0])       # add `foodstuff` into set of unique foods
    if new_sublist: result.append(new_sublist)

print(result)

Output:
[[['chocolate_pudding', 920.8000000000001], ['caramel_pudding', 345.59999999999997], ['pudding', 248.0], ['banana_pudding', 27.599999999999998]], [['biscuits', 190.8], ['chocolates', 33.599999999999994]], [['tiramusu', 145.8]], [['cakes', 139.29999999999998]], [['butter_cakes', 133.0]]]
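
The loop above treats two entries as duplicates when their names match, regardless of the number that follows. If the whole [name, value] pair should be compared instead, as in the question's example, one possible variation is sketched below. The function name `dedupe_pairs` and the sample data are made up for illustration, and the sketch assumes every inner entry holds only hashable values (strings and floats here).

```python
def dedupe_pairs(nested):
    """Keep the first occurrence of each inner [name, value] pair."""
    seen = set()      # pairs already kept, stored as hashable tuples
    result = []
    for group in nested:
        kept = []
        for pair in group:
            key = tuple(pair)   # lists are unhashable, tuples are not
            if key not in seen:
                seen.add(key)
                kept.append(pair)
        if kept:                # drop groups that became empty
            result.append(kept)
    return result


sample = [[["a", 1.0], ["b", 2.0]], [["c", 3.0], ["a", 1.0]], [["a", 1.0]], [["a", 9.0]]]
print(dedupe_pairs(sample))
# [[['a', 1.0], ['b', 2.0]], [['c', 3.0]], [['a', 9.0]]]
```

Note that ["a", 9.0] survives here because its value differs, whereas a name-only check would drop it.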

1
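You can flatten the inner lists and put them all into a set. Sets cannot contain duplicates, so you do not even need to check for them; the set handles that for you quickly. The only caveat is that sets cannot contain lists, so the inner lists have to be converted to tuples first. If you are fine with these two type conversions, a simple set comprehension does the job and should be quite fast: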
no_duplicates = {tuple(inner) for outer in mylist for inner in outer}

Or you can change the types back afterwards:

no_dupe_lists = list(map(list, no_duplicates))
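
One thing to keep in mind with the set-based version is that a set is unordered and the comprehension flattens the outer grouping, so the result does not have the nested shape asked for in the question. If a flat result in first-seen order is acceptable, `dict.fromkeys` can stand in for the set, since dicts preserve insertion order in Python 3.7 and later. A minimal sketch, with a shortened list named `sample` standing in for `mylist`:

```python
sample = [
    [["chocolate_pudding", 920.8], ["pudding", 248.0]],
    [["biscuits", 190.8], ["chocolate_pudding", 920.8]],
    [["chocolate_pudding", 920.8]],
]

# dict keys are unique and keep insertion order, so the first occurrence wins
unique_in_order = list(dict.fromkeys(
    tuple(inner) for outer in sample for inner in outer
))
flat_no_dupes = [list(pair) for pair in unique_in_order]

print(flat_no_dupes)
# [['chocolate_pudding', 920.8], ['pudding', 248.0], ['biscuits', 190.8]]
```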

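You did not ask about this, but if you want to copy a list, you have to use one of the following copying techniques:

mylist_copy = list(mylist)
mylist_copy = mylist[:]
mylist_copy = [element for element in mylist]

The first one is the recommended approach. Since your list contains nested lists, those need to be copied as well: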
mylist_copy = [[list(inner) for inner in outer] for outer in mylist]
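
To see how these options differ, the sketch below compares plain assignment, a shallow copy, and a deep copy on a small stand-in list. `copy.deepcopy` from the standard library is used here as an alternative to the nested comprehension; it is not part of the answer above, and the variable names are illustrative.

```python
import copy

original = [[["pudding", 248.0]], [["cakes", 139.3]]]

alias = original                 # same object under a second name
shallow = list(original)         # new outer list, inner lists still shared
deep = copy.deepcopy(original)   # every nesting level is copied

original[0][0][1] = 0.0                 # mutate an innermost value
original.append([["extra", 1.0]])       # mutate the outer list

print(alias is original)         # True
print(len(shallow), len(deep))   # 2 2
print(shallow[0][0][1])          # 0.0
print(deep[0][0][1])             # 248.0
```

The shallow copy is protected from the `append` but still sees the changed inner value, which is why nested data needs a copy at every level.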

1

A great man once said: take only what you need, why delete anything? Now two people have said it:

mylist = [[["chocolate_pudding", 920.8000000000001], ["caramel_pudding", 345.59999999999997],
          ["pudding", 248.0], ["banana_pudding", 27.599999999999998]], [["biscuits", 190.8],
          ["chocolates", 33.599999999999994], ["chocolate_pudding", 920.8000000000001]],
          [["tiramusu", 145.8]], [["cakes", 139.29999999999998]], [["butter_cakes", 133.0]],
          [["chocolate_pudding", 920.8000000000001]]]


result = []
track = []    # entries already seen
for i in mylist:
    sublist = []
    for k in i:
        if k not in track:
            track.append(k)
            sublist.append(k)
    if sublist:
        result.append(sublist)

print(result)

Output:

[[['chocolate_pudding', 920.8000000000001], ['caramel_pudding', 345.59999999999997], ['pudding', 248.0], ['banana_pudding', 27.599999999999998]], [['biscuits', 190.8], ['chocolates', 33.599999999999994]], [['tiramusu', 145.8]], [['cakes', 139.29999999999998]], [['butter_cakes', 133.0]]]
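
A possible performance consideration: `k not in track` scans a list, so each membership test takes time proportional to the number of entries seen so far, while a set lookup takes constant time on average. The sketch below compares the two on synthetic data. The function names and the data size are made up for illustration, and actual timings depend on the machine, so none are shown.

```python
import timeit


def dedupe_with_list(nested):
    track, result = [], []
    for group in nested:
        kept = []
        for pair in group:
            if pair not in track:    # linear scan of everything seen so far
                track.append(pair)
                kept.append(pair)
        if kept:
            result.append(kept)
    return result


def dedupe_with_set(nested):
    seen, result = set(), []
    for group in nested:
        kept = []
        for pair in group:
            key = tuple(pair)        # hashable stand-in for the list
            if key not in seen:      # average constant-time lookup
                seen.add(key)
                kept.append(pair)
        if kept:
            result.append(kept)
    return result


# Synthetic data: 200 groups of 10 pairs, with names repeating every 500 items
data = [[[f"item_{(g * 10 + i) % 500}", 1.0] for i in range(10)] for g in range(200)]

# Both approaches should agree on the result before their speed is compared
assert dedupe_with_list(data) == dedupe_with_set(data)
print(timeit.timeit(lambda: dedupe_with_list(data), number=5))
print(timeit.timeit(lambda: dedupe_with_set(data), number=5))
```

For a list as small as the one in the question the difference is unlikely to matter; it tends to show up only as the number of unique entries grows.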

Content provided by Stack Overflow.