如何在networkx图中合并和删除节点?

3

我使用networkx制作了一个图表,它有三个属性(id、出版作者、出版物标题)。 我想要检查节点属性(作者),如果两个节点之间有共同的节点属性大于1,我想要添加这两个节点的属性并从图中删除第二个节点。例如,我有以下图表:

[(0, {'publication_authors': {'Bob Johnson', 'Stephen Michell', 'Andy J. Wellings', 'Jorg Kienzle', 'Thomas Wolf', 'Bo Sanden'}, 'title': 'Object-Oriented Programming and Protected Objects in Ada 95'}), (1, {'publication_authors': {'Bob Johnson'}, 'title': 'UNIX Metrics: Is The Data In Open Systems The Same From Platform To Platform?'}), (2, {'publication_authors': {'Bob Johnson'}, 'title': 'Triad Of Computing In The 21st Century Or, Back To The Future Again'}), (3, {'publication_authors': {'Bob Johnson'}, 'title': 'User-centeredness, situatedness, and designing the media of computer documentation'}), (4, {'publication_authors': {'Bob Johnson'}, 'title': "Introduction to commentaries on 'Spurious Coin: A History of Science, Management, and Technical Writing' by Bernadette Longo"}), (5, {'publication_authors': {'Brian Lawrence', 'Bob Johnson'}, 'title': 'Manager: The Project Scoping Gamble'}), (6, {'publication_authors': {'Bob Johnson', 'Stephen Michell', 'Andy J. Wellings', 'Jorg Kienzle', 'Thomas Wolf', 'Bo Sanden'}, 'title': 'Integrating object-oriented programming and protected objects in Ada 95'}), (7, {'publication_authors': {'Robert Johnson', 'Michael Hackett', 'Bob Johnson', 'Hung Quoc Nguyen'}, 'title': 'Testing Applications on the Web: Test Planning for Mobile and Internet-Based Systems, 2 edition'}), (8, {'publication_authors': {'Bob Johnson'}, 'title': 'The Wired Neighborhood: An Extended Multimedia Conversation'}), (9, {'publication_authors': {'Bob Johnson'}, 'title': 'Introduction to the book commentaries'}), (10, {'publication_authors': {'Bob Johnson'}, 'title': 'The cult of ISDN'})]

我想取第一个节点并将其与其余的10个节点进行比较,如果在任意两个节点(0、6)之间,“publication_authors”的值大于1,则我想修改第一个节点(在这种情况下是0)并将节点6的属性合并到0中。然后从图形中删除节点号为6的节点。我已经实现了以下代码,但它给出了错误。请有人帮助我纠正它。提前致谢。

import networkx as nx

ground_truth_file = 'C:\\Bob Johnson.txt'
G = nx.DiGraph()


f = open(ground_truth_file, mode='r')
lines = f.readlines()
i=0

for line in lines:
    line.strip()
    pub_authors = set()
    tokens = line.split('<>')
    authors = tokens[1]
    title = tokens[2]
    venue = tokens[3]
    num_of_authors = authors.split(',')
    for author in num_of_authors:
        pub_authors.add(author)
    G.add_node(i,publication_authors=pub_authors, title=title)
    i=i+1
num_nodes =G.number_of_nodes()
for node in range (num_nodes-1):
    for next_node in range (node+1,num_nodes):
        a = G.node[node]['publication_authors']
        b = G.node[next_node]['publication_authors']
        common_authors = a.intersection(b)
        if (len(common_authors)>1):
            c = G.node[node]['title']
            d = G.node[next_node]['title']
            cluster_authors = a.union(b)
            cluster_title = c+d
            G.node[node]['publication_authors']=cluster_authors
            G.node[node]['title']=cluster_title
            G.remove_node(next_node)
        else:
            print ('Not enough common authors')

print(G.nodes(data=True))`

我的文本文件是

0<>Andy J. Wellings,Bob Johnson,Bo Sanden,Jorg Kienzle,Thomas Wolf,Stephen Michell<>Object-Oriented Programming and Protected Objects in Ada 95<>Ada-Europe<>2000<>null
1<>Bob Johnson<>UNIX Metrics: Is The Data In Open Systems The Same From Platform To Platform?<>Int. CMG Conference<>1995<>null
1<>Bob Johnson<>Triad Of Computing In The 21st Century Or, Back To The Future Again<>Int. CMG Conference<>1995<>null
2<>Bob Johnson<>User-centeredness, situatedness, and designing the media of computer documentation<>SIGDOC<>1990<>Miami University of Ohio
3<>Bob Johnson<>Introduction to commentaries on 'Spurious Coin: A History of Science, Management, and Technical Writing' by Bernadette Longo<>ACM Journal of Computer Documentation<>2001<>Michigan Technological University
4<>Brian Lawrence,Bob Johnson<>Manager: The Project Scoping Gamble<>IEEE Software<>1997<>null
0<>Andy J. Wellings,Bob Johnson,Bo Sanden,Jorg Kienzle,Thomas Wolf,Stephen Michell<>Integrating object-oriented programming and protected objects in Ada 95<>ACM Trans. Program. Lang. Syst.<>2000<>null
5<>Hung Quoc Nguyen,Bob Johnson,Robert Johnson,Michael Hackett<>Testing Applications on the Web: Test Planning for Mobile and Internet-Based Systems, 2 edition<>null<>2002<>null
2<>Bob Johnson<>The Wired Neighborhood: An Extended Multimedia Conversation<>ACM SIGDOC Asterisk Journal of Computer Documentation<>1997<>Miami University, Oxford, OH
2<>Bob Johnson<>Introduction to the book commentaries<>ACM SIGDOC Asterisk Journal of Computer Documentation<>1998<>Miami University, Oxford, OH
6<>Bob Johnson<>The cult of ISDN<>PC/Computing<>1989<>null

如果您提供一个最小的示例和最少的图表来重现您的问题,那么帮助您将会变得更加容易。阅读整个数据库(尤其是纯文本)以尝试弄清楚您需要什么是不可能的。 - Imanol Luengo
0<>ABC 1<>AGF 2<>JT 3<>U 4<>A 5<>ABTF 6<>AFT 这个图的第一个节点具有属性ABC。如果在该图的任何其他节点中有两个字符匹配,则必须将这些字符合并到此节点中,并删除其他节点。例如,节点0具有ABC属性,节点5具有ABTF,因此在这两个节点中,AB是共同的,因此我们必须将这些节点合并为0,并从networkx图中删除节点5。合并后,它将变为: 0<>ABCTF 1<>AGF 2<>JHT 3<>UJHG 4<>A 6<>AFT 现在节点0与节点6有三个相同之处。因此,节点0将被修改,节点6将被删除。 - paras
@Imanol Luengo 0<>ABC, 1<>AGF, 2<>JT, 3<>U, 4<>A, 5<>ABTF, 6<> 这些都是 networkx 图中的单个节点。 - paras
请有人回复。 - paras
2个回答


0

如果我理解正确,您想要递归地合并节点,直到边之间没有重叠。我的想法是从一个完全连接的图开始,“递归”合并节点。这里是一个“假”的递归实现。编辑:我不太确定为什么在这里需要networkx,只用字典也可以(可能更清晰)。

enter image description here

import networkx as nx

# file you provided
with open('temp.txt', 'r') as f:
    lines = f.readlines()



nodes = {}
for idx, line in enumerate(lines):
    authors, title, venue = line.split('<>')[1:4]
    authors = set(authors.split(','))
    nodes[idx] = dict(authors = authors, title = (title, ))

G = nx.complete_graph(len(nodes))
nx.set_node_attributes(G, nodes)


def merge_recursive(G, target = 'authors'):
    """
    Keeps merging if there is overlap between nodes
    """
    # check edges
    while G.edges():
        for (i, j) in G.edges():
            overlap = G.nodes()[i][target].intersection(G.nodes()[j][target])
            # copy values
            if overlap:
                tmp = {}
                for k, v in G.nodes()[i].items():
                    if isinstance(v, set):
                        tmp[k] = v.union(G.nodes()[j][k])
                    else:
                        tmp[k] = v + G.nodes()[j][k]

                nx.set_node_attributes(G, {i: tmp})
                G.remove_node(j)
            # no overlap remove edge
            else:
                G.remove_edge(i, j)
            break
    return G

merged = merge_recursive(G.copy())

from matplotlib.pyplot import subplots

fig, (left, right) = subplots(1, 2, figsize = (10,  5))
nx.draw(G, ax = left, with_labels = 1)
nx.draw(merged, ax = right, with_labels = 1)

left.set_title('Before merging')
right.set_title('After merging')
fig.show()

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接