合并元组,如果它们有一个共同的元素

6

请考虑以下列表:

tuple_list = [('c', 'e'), ('c', 'd'), ('a', 'b'), ('d', 'e')]

我该如何实现这个目标?
new_tuple_list = [('c', 'e', 'd'), ('a', 'b')]

我尝试过:

for tuple in tuple_list:
    for tup in tuple_list:
        if tuple[0] == tup[0]:
            new_tup = (tuple[0],tuple[1],tup[1])
            new_tuple_list.append(new_tup)

但是这只有在我按照特定顺序拥有元组的元素时才有效,这意味着结果将会是这样:

new_tuple_list = [('c', 'e', 'd'), ('a', 'b'), ('d', 'e')]

1
你的合并策略不清晰。 - Nir Alfasi
我想合并每个具有共同元素的元组:('c', 'e')('c', 'd'),因为它们都有 'c' 共同元素,这将给我们 ('c', 'e', 'd'),然后再将其与 ('d', 'e') 合并,因为它们都有 'd' 和 'e' 共同元素,这将得到 ('c', 'e', 'd') - Meryem
你能否借助以下示例建立起答复非常相似的问题:http://stackoverflow.com/questions/9118312/finding-tuples-with-a-common-element? - fedepad
7个回答

10

你可以将元组视为图中的边,并将目标视为查找图中的连通分量。然后,你只需循环遍历顶点(元组中的项),对于每个尚未访问过的顶点执行DFS以生成一个连通分量:

from collections import defaultdict

def dfs(adj_list, visited, vertex, result, key):
    visited.add(vertex)
    result[key].append(vertex)
    for neighbor in adj_list[vertex]:
        if neighbor not in visited:
            dfs(adj_list, visited, neighbor, result, key)

edges = [('c', 'e'), ('c', 'd'), ('a', 'b'), ('d', 'e')]

adj_list = defaultdict(list)
for x, y in edges:
    adj_list[x].append(y)
    adj_list[y].append(x)

result = defaultdict(list)
visited = set()
for vertex in adj_list:
    if vertex not in visited:
        dfs(adj_list, visited, vertex, result, vertex)

print(result.values())

输出:

[['a', 'b'], ['c', 'e', 'd']]
请注意,在上述示例中,组件内的组件和元素都是随机排列的。

6
如果您不需要重复的值(例如保留['a', 'a', 'b']),那么使用集合是实现您想要的简单快捷的方法:
iset = set([frozenset(s) for s in tuple_list])  # Convert to a set of sets
result = []
while(iset):                  # While there are sets left to process:
    nset = set(iset.pop())      # Pop a new set
    check = len(iset)           # Does iset contain more sets
    while check:                # Until no more sets to check:
        check = False
        for s in iset.copy():       # For each other set:
            if nset.intersection(s):  # if they intersect:
                check = True            # Must recheck previous sets
                iset.remove(s)          # Remove it from remaining sets
                nset.update(s)          # Add it to the current set
    result.append(tuple(nset))  # Convert back to a list of tuples

提供

[('c', 'e', 'd'), ('a', 'b')]

1

我曾经在集合方面遇到过这个问题,所以我将我的解决方案贡献出来。它将具有一个或多个共同元素的集合尽可能地合并。

以下是我的示例数据:

data = [['A','B','C'],['B','C','D'],['D'],['X'],['X','Y'],['Y','Z'],['M','N','O'],['M','N','O'],['O','A']]
data = list(map(set,data))

解决问题的我的代码:

oldlen = len(data)+1
while len(data)<oldlen:
    oldlen = len(data)
    for i in range(len(data)):
        for j in range(i+1,len(data)):
                if len(data[i]&data[j]):
                    data[i] = data[i]|data[j]
                    data[j] = set()
    data = [data[i] for i in range(len(data)) if data[i]!= set()]

结果:

[{'A', 'B', 'C', 'D', 'M', 'N', 'O'}, {'X', 'Y', 'Z'}]

1
这个性能很差,因为列表包含检查是O(n),但它非常简短。
result = []

for tup in tuple_list:
    for idx, already in enumerate(result):
        # check if any items are equal
        if any(item in already for item in tup):
            # tuples are immutable so we need to set the result item directly
            result[idx] = already + tuple(item for item in tup if item not in already)
            break
    else:
        # else in for-loops are executed only if the loop wasn't terminated by break
        result.append(tup)

这有一个好处是顺序被保留:
>>> result
[('c', 'e', 'd'), ('a', 'b')]

1

使用图形操作库 NetworkX,这个任务变得微不足道。与 @niemmi 的 这个答案 类似,您需要找到 连通组件

import networkx as nx

tuple_list = [('c', 'e'), ('c', 'd'), ('a', 'b'), ('d', 'e')]
graph = nx.Graph(tuple_list)
result = list(nx.connected_components(graph))
print(result)
# [{'e', 'c', 'd'}, {'b', 'a'}]

将结果作为元组列表获取:
result = list(map(tuple, nx.connected_components(G)))
print(result)
# [('d', 'e', 'c'), ('a', 'b')]

0

在解决共指时,我遇到了这个问题,我需要合并一个具有共同元素的集合列表中的集合:

import copy

def merge(list_of_sets):
    # init states
    list_of_sets = copy.deepcopy(list_of_sets)
    result = []
    indices = find_fist_overlapping_sets(list_of_sets)
    while indices:
        # Keep other sets
        result = [
            s
            for idx, s in enumerate(list_of_sets)
            if idx not in indices
        ]
        # Append merged set
        result.append(
            list_of_sets[indices[0]].union(list_of_sets[indices[1]])
        )
        # Update states
        list_of_sets = result
        indices = find_fist_overlapping_sets(list_of_sets)
    return list_of_sets

def find_fist_overlapping_sets(list_of_sets):
    for i, i_set in enumerate(list_of_sets):
        for j, j_set in enumerate(list_of_sets[i+1:]):
            if i_set.intersection(j_set):
                return i, i+j+1

@MSeifert的代码更加优雅,但是对于list_of_sets = [{1, 2}, {2, 3}, {4}, {5, 6}, {4, 5}, {3, 7}],输出结果为[{1, 2, 3, 7}, {4, 5}, {5, 6}],有点令人困惑,而这种方法则给出了[{4, 5, 6}, {1, 2, 3, 7}]。也许适用于不同的用例。 - Zhang Fan

0
使用集合。您正在检查(最初很小的)集合的重叠和累积,而Python有一个专门的数据类型:
#!python3

#tuple_list = [('c', 'e'), ('c', 'd'), ('a', 'b'), ('d', 'e')]
tuple_list = [(1,2), (3,4), (5,), (1,3,5), (3,'a'),
        (9,8), (7,6), (5,4), (9,'b'), (9,7,4),
        ('c', 'e'), ('e', 'f'), ('d', 'e'), ('d', 'f'),
        ('a', 'b'),
        ]
set_list = []

print("Tuple list:", tuple_list)
for t in tuple_list:
    #print("Set list:", set_list)
    tset = set(t)
    matched = []
    for s in set_list:
        if tset & s:
            s |= tset
            matched.append(s)

    if not matched:
        #print("No matches. New set: ", tset)
        set_list.append(tset)

    elif len(matched) > 1:
        #print("Multiple Matches: ", matched)
        for i,iset in enumerate(matched):
            if not iset:
                continue
            for jset in matched[i+1:]:
                if iset & jset:
                    iset |= jset
                    jset.clear()

set_list = [s for s in set_list if s]
print('\n'.join([str(s) for s in set_list]))

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接