我使用networkx制作了一个图表,它有三个属性(id、出版作者、出版物标题)。 我想要检查节点属性(作者),如果两个节点之间有共同的节点属性大于1,我想要添加这两个节点的属性并从图中删除第二个节点。例如,我有以下图表:
[(0, {'publication_authors': {'Bob Johnson', 'Stephen Michell', 'Andy J. Wellings', 'Jorg Kienzle', 'Thomas Wolf', 'Bo Sanden'}, 'title': 'Object-Oriented Programming and Protected Objects in Ada 95'}), (1, {'publication_authors': {'Bob Johnson'}, 'title': 'UNIX Metrics: Is The Data In Open Systems The Same From Platform To Platform?'}), (2, {'publication_authors': {'Bob Johnson'}, 'title': 'Triad Of Computing In The 21st Century Or, Back To The Future Again'}), (3, {'publication_authors': {'Bob Johnson'}, 'title': 'User-centeredness, situatedness, and designing the media of computer documentation'}), (4, {'publication_authors': {'Bob Johnson'}, 'title': "Introduction to commentaries on 'Spurious Coin: A History of Science, Management, and Technical Writing' by Bernadette Longo"}), (5, {'publication_authors': {'Brian Lawrence', 'Bob Johnson'}, 'title': 'Manager: The Project Scoping Gamble'}), (6, {'publication_authors': {'Bob Johnson', 'Stephen Michell', 'Andy J. Wellings', 'Jorg Kienzle', 'Thomas Wolf', 'Bo Sanden'}, 'title': 'Integrating object-oriented programming and protected objects in Ada 95'}), (7, {'publication_authors': {'Robert Johnson', 'Michael Hackett', 'Bob Johnson', 'Hung Quoc Nguyen'}, 'title': 'Testing Applications on the Web: Test Planning for Mobile and Internet-Based Systems, 2 edition'}), (8, {'publication_authors': {'Bob Johnson'}, 'title': 'The Wired Neighborhood: An Extended Multimedia Conversation'}), (9, {'publication_authors': {'Bob Johnson'}, 'title': 'Introduction to the book commentaries'}), (10, {'publication_authors': {'Bob Johnson'}, 'title': 'The cult of ISDN'})]
我想取第一个节点并将其与其余的10个节点进行比较,如果在任意两个节点(0、6)之间,“publication_authors”的值大于1,则我想修改第一个节点(在这种情况下是0)并将节点6的属性合并到0中。然后从图形中删除节点号为6的节点。我已经实现了以下代码,但它给出了错误。请有人帮助我纠正它。提前致谢。
import networkx as nx
ground_truth_file = 'C:\\Bob Johnson.txt'
G = nx.DiGraph()
f = open(ground_truth_file, mode='r')
lines = f.readlines()
i=0
for line in lines:
line.strip()
pub_authors = set()
tokens = line.split('<>')
authors = tokens[1]
title = tokens[2]
venue = tokens[3]
num_of_authors = authors.split(',')
for author in num_of_authors:
pub_authors.add(author)
G.add_node(i,publication_authors=pub_authors, title=title)
i=i+1
num_nodes =G.number_of_nodes()
for node in range (num_nodes-1):
for next_node in range (node+1,num_nodes):
a = G.node[node]['publication_authors']
b = G.node[next_node]['publication_authors']
common_authors = a.intersection(b)
if (len(common_authors)>1):
c = G.node[node]['title']
d = G.node[next_node]['title']
cluster_authors = a.union(b)
cluster_title = c+d
G.node[node]['publication_authors']=cluster_authors
G.node[node]['title']=cluster_title
G.remove_node(next_node)
else:
print ('Not enough common authors')
print(G.nodes(data=True))`
我的文本文件是
0<>Andy J. Wellings,Bob Johnson,Bo Sanden,Jorg Kienzle,Thomas Wolf,Stephen Michell<>Object-Oriented Programming and Protected Objects in Ada 95<>Ada-Europe<>2000<>null
1<>Bob Johnson<>UNIX Metrics: Is The Data In Open Systems The Same From Platform To Platform?<>Int. CMG Conference<>1995<>null
1<>Bob Johnson<>Triad Of Computing In The 21st Century Or, Back To The Future Again<>Int. CMG Conference<>1995<>null
2<>Bob Johnson<>User-centeredness, situatedness, and designing the media of computer documentation<>SIGDOC<>1990<>Miami University of Ohio
3<>Bob Johnson<>Introduction to commentaries on 'Spurious Coin: A History of Science, Management, and Technical Writing' by Bernadette Longo<>ACM Journal of Computer Documentation<>2001<>Michigan Technological University
4<>Brian Lawrence,Bob Johnson<>Manager: The Project Scoping Gamble<>IEEE Software<>1997<>null
0<>Andy J. Wellings,Bob Johnson,Bo Sanden,Jorg Kienzle,Thomas Wolf,Stephen Michell<>Integrating object-oriented programming and protected objects in Ada 95<>ACM Trans. Program. Lang. Syst.<>2000<>null
5<>Hung Quoc Nguyen,Bob Johnson,Robert Johnson,Michael Hackett<>Testing Applications on the Web: Test Planning for Mobile and Internet-Based Systems, 2 edition<>null<>2002<>null
2<>Bob Johnson<>The Wired Neighborhood: An Extended Multimedia Conversation<>ACM SIGDOC Asterisk Journal of Computer Documentation<>1997<>Miami University, Oxford, OH
2<>Bob Johnson<>Introduction to the book commentaries<>ACM SIGDOC Asterisk Journal of Computer Documentation<>1998<>Miami University, Oxford, OH
6<>Bob Johnson<>The cult of ISDN<>PC/Computing<>1989<>null