NetworkX - Setting node attributes from a dataframe

Question

I'm having trouble adding attributes from the columns of a dataframe to the nodes of my network.

Below is a sample of my dataframe. It has about 10 columns in total, but I only use the 5 columns shown below when creating the network.

Currently I can only get my network working with edge attributes, as shown below:

g = nx.from_pandas_dataframe(df, 'node_from', 'node_to', edge_attr=['attribute1','attribute2','attribute3'])

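The network will be a directed network. The attributes shown in the dataframe below are attributes of the 'node_from' node. A 'node_to' node sometimes also appears as a 'node_from' node. All nodes that can appear in the network, together with their respective attributes, are listed in the df_attributes_only table.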

df_relationship:

node_from:  node_to: ........ attribute1:   attribute2:   attribute3:
    jim      john    ........    tall          red             fat
    ...

The values in all columns are words, not numbers.
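A minimal sketch of the modern equivalent of the call above, assuming NetworkX 2.x or later: from_pandas_dataframe was renamed from_pandas_edgelist in 2.0, and since the network is directed, passing create_using=nx.DiGraph() builds a directed graph. The two sample rows are made up to match the df_relationship layout. Note that edge_attr only fills edge attributes; the nodes themselves still have none.

```python
import networkx as nx
import pandas as pd

# Hypothetical sample rows in the shape of df_relationship (only the 5 columns used).
df = pd.DataFrame({
    'node_from': ['jim', 'john'],
    'node_to': ['john', 'jim'],
    'attribute1': ['tall', 'small'],
    'attribute2': ['red', 'blue'],
    'attribute3': ['fat', 'fat'],
})

# from_pandas_edgelist is the 2.x name of from_pandas_dataframe;
# create_using=nx.DiGraph() makes the resulting graph directed.
g = nx.from_pandas_edgelist(
    df, 'node_from', 'node_to',
    edge_attr=['attribute1', 'attribute2', 'attribute3'],
    create_using=nx.DiGraph(),
)

print(g.is_directed())         # True
print(g.edges['jim', 'john'])  # the columns ended up on the edge ...
print(g.nodes['jim'])          # ... while the node itself is still empty: {}
```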

I also have another dataframe that contains every possible node and its attributes:

df_attributes_only:

id:   attribute1:   attribute2:     attribute3:
jim      tall          red             fat
john     small         blue            fat
...

I need to assign the three attributes above to their respective ids, so that every node has its 3 attributes attached to it.

I would appreciate any help getting node attributes working with my network.

- user7057659

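A quick question about the attributes. Do they describe the nodes they connect to, or do they somehow describe the relationship? For example, is Jim tall and fat? Or does this somehow describe the relationship between Jim and something else? Are there cases with multiple attributes, e.g. another entry showing a relationship but listing Jim as short and fat? Will Jim have multiple relationships? - Polkaguy6000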

Please check my answer, it's simpler. @dataframed - DataFramed

4 answers

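The idea of nx.from_pandas_dataframe (and of from_pandas_edgelist in the latest stable release, 2.2) is to convert an edge list into a graph. That is, each row in the dataframe represents an edge, which is a pair of 2 different nodes.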

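Node attributes cannot be read with this API. That makes sense: every row contains two different nodes, and reserving specific columns for each of them would be cumbersome and could lead to inconsistencies. For example, consider the following dataframe: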

node_from node_to src_attr_1 tgt_attr_1
  a         b         0         3
  a         c         2         4

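Should node a's 'src_attr_1' value be 0 or 2? Moreover, every attribute would need two columns (since both nodes of each edge should have it). In my opinion supporting this would be a bad design, and I guess that's why the NetworkX API doesn't support it.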
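To make the ambiguity concrete, here is a small sketch using the two-row table above: if you simply copy src_attr_1 onto the source node row by row, each row overwrites the previous one and the last row silently wins.

```python
import networkx as nx
import pandas as pd

# The two-row edge list from the table above.
df = pd.DataFrame({
    'node_from': ['a', 'a'],
    'node_to': ['b', 'c'],
    'src_attr_1': [0, 2],
    'tgt_attr_1': [3, 4],
})
G = nx.from_pandas_edgelist(df, 'node_from', 'node_to')

# Naive row-by-row copy: node 'a' is the source in both rows,
# so the second assignment replaces the first.
for _, row in df.iterrows():
    G.nodes[row['node_from']]['src_attr_1'] = row['src_attr_1']

# Node 'a' ends up with src_attr_1 == 2; the value 0 is lost without warning.
print(G.nodes['a'])
```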

After converting the df to a graph, you can still read the node attributes as follows:

import networkx as nx
import pandas as pd

# Build a sample dataframe (with 2 edges: 0 -> 1, 0 -> 2, node 0 has attr_1 value of 'a', node 1 has 'b', node 2 has 'c')
d = {'node_from': [0, 0], 'node_to': [1, 2], 'src_attr_1': ['a','a'], 'tgt_attr_1': ['b', 'c']}
df = pd.DataFrame(data=d)
G = nx.from_pandas_edgelist(df, 'node_from', 'node_to')

# Iterate over df rows and set the source and target nodes' attributes for each row:
for index, row in df.iterrows():
    G.nodes[row['node_from']]['attr_1'] = row['src_attr_1']
    G.nodes[row['node_to']]['attr_1'] = row['tgt_attr_1']

print(G.edges())
print(G.nodes(data=True))
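As a hedged alternative to iterrows(), you can build {node: value} dicts column-wise and pass them to nx.set_node_attributes with an attribute name. This sketch reuses the sample data above. For this data the result matches the loop; in general, if a node appears as a source in one row and as a target in another, the loop applies row order while this applies all source values before all target values, so the two can differ.

```python
import networkx as nx
import pandas as pd

d = {'node_from': [0, 0], 'node_to': [1, 2], 'src_attr_1': ['a', 'a'], 'tgt_attr_1': ['b', 'c']}
df = pd.DataFrame(data=d)
G = nx.from_pandas_edgelist(df, 'node_from', 'node_to')

# {node: value} dicts built column-wise; for a repeated key, dict(zip(...))
# keeps the last value.
src_values = dict(zip(df['node_from'], df['src_attr_1']))
tgt_values = dict(zip(df['node_to'], df['tgt_attr_1']))

# Set source values first, then target values, under the attribute name 'attr_1'.
nx.set_node_attributes(G, src_values, 'attr_1')
nx.set_node_attributes(G, tgt_values, 'attr_1')

print(G.nodes(data=True))
```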

Edit:

If you want a long list of attributes for the source nodes, you can extract a dictionary of those columns automatically as follows:

# List of desired source attributes (each must be a column of df):
src_attributes = ['src_attr_1', 'src_attr_2', 'src_attr_3']

# Iterate over df rows and set source node attributes:
for index, row in df.iterrows():
    # Select just these columns from the row and convert them to a dict
    src_attr_dict = row[src_attributes].to_dict()
    G.nodes[row['node_from']].update(src_attr_dict)
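If iterating over rows is too slow, a sketch of a row-free variant is to keep one row per source node and convert it with to_dict('index'). The sample columns src_attr_2 and src_attr_3 and their values are hypothetical. keep='last' is chosen to mirror the loop above, where later rows overwrite earlier ones.

```python
import networkx as nx
import pandas as pd

# Hypothetical edge list with three source-node attribute columns.
df = pd.DataFrame({
    'node_from': [0, 0, 1],
    'node_to': [1, 2, 2],
    'src_attr_1': ['a', 'a', 'b'],
    'src_attr_2': ['x', 'y', 'z'],
    'src_attr_3': ['p', 'q', 'r'],
})
src_attributes = ['src_attr_1', 'src_attr_2', 'src_attr_3']
G = nx.from_pandas_edgelist(df, 'node_from', 'node_to')

# One row per source node (last occurrence wins, like the loop),
# turned into {node: {attr: value}}.
node_attrs = (
    df.drop_duplicates('node_from', keep='last')
      .set_index('node_from')[src_attributes]
      .to_dict('index')
)
nx.set_node_attributes(G, node_attrs)

print(G.nodes(data=True))
```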

- zohar.kom

Comments are not for extended discussion; this conversation has been moved to chat. - Samuel Liew

Please check my answer, it's simpler. @dataframed - DataFramed

Answer:

Goal: generate a network with nodes, edges and node attributes from a dataframe object.

Suppose we want to generate a network with nodes and node attributes. Each node has three attributes: attr1, attr2 and attr3.

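Given a dataframe named df whose first and second columns are from_node and to_node, followed by the attribute columns attr1, attr2 and attr3, the code below adds the required edges, nodes and node attributes from the dataframe.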

import networkx as nx

# Add edges (df has columns from_node, to_node, attr1, attr2, attr3)
g = nx.from_pandas_edgelist(df, 'from_node', 'to_node')

# Iterate over df rows and merge each row's attribute columns (3rd column onward)
# into its source node; later rows for the same from_node overwrite earlier ones.
for _, row in df.iterrows():
    g.nodes[row['from_node']].update(row.iloc[2:].to_dict())

print(list(g.edges())[0:5])
print(list(g.nodes(data=True))[0:5])
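A note on storing the attributes: assigning a whole dict to one key nests it under that key, whereas update() merges it so each column becomes its own node attribute. This small sketch (node names made up) shows the difference, including how nx.get_node_attributes sees each form.

```python
import networkx as nx

g = nx.Graph()
g.add_nodes_from(['jim', 'john'])
attrs = {'attr1': 'tall', 'attr2': 'red', 'attr3': 'fat'}

# Nested: the node gets a single attribute named 'attr_dict'.
g.nodes['jim']['attr_dict'] = attrs
# Merged: attr1..attr3 become top-level node attributes.
g.nodes['john'].update(attrs)

print(g.nodes['jim'])   # {'attr_dict': {'attr1': 'tall', 'attr2': 'red', 'attr3': 'fat'}}
print(g.nodes['john'])  # {'attr1': 'tall', 'attr2': 'red', 'attr3': 'fat'}

# Only the merged form is visible when looking up an attribute by name.
print(nx.get_node_attributes(g, 'attr1'))  # {'john': 'tall'}
```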

- DataFramed

This builds on @zohar.kom's answer. There is a way to solve this without iterating, so that answer can be optimized. I'm assuming the attributes describe node_from.

Start by creating a graph from the edge list (as in @zohar.kom's answer):

 G = nx.from_pandas_edgelist(df, 'node_from', 'node_to')

You can then add the nodes and their attributes.

 # Mask that keeps only the first row for each node_from
 mask = ~df['node_from'].duplicated()
 # Get one row per node with its attributes
 nodes = df[mask][['node_from','attribute1','attribute2','attribute3']]
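To illustrate what the mask does, here is a sketch on a hypothetical three-row edge list where jim appears twice as a source; drop_duplicates(subset='node_from') selects the same rows in a single call.

```python
import pandas as pd

# Hypothetical edge list where 'jim' is a source twice.
df = pd.DataFrame({
    'node_from': ['jim', 'jim', 'john'],
    'node_to': ['john', 'ann', 'jim'],
    'attribute1': ['tall', 'tall', 'small'],
    'attribute2': ['red', 'red', 'blue'],
    'attribute3': ['fat', 'fat', 'fat'],
})
cols = ['node_from', 'attribute1', 'attribute2', 'attribute3']

# duplicated() is True for every repeat after the first occurrence,
# so ~duplicated() keeps the first row for each node_from.
mask = ~df['node_from'].duplicated()
print(mask.tolist())  # [True, False, True]
nodes = df[mask][cols]

# Equivalent selection with drop_duplicates (also keeps the first row by default).
same = df.drop_duplicates(subset='node_from')[cols]
print(nodes.equals(same))  # True
```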

The method for adding nodes from a dataframe comes from this answer.

 # Add the attributes one at a time.
 attr_dict = nodes.set_index('node_from')['attribute1'].to_dict()
 nx.set_node_attributes(G,attr_dict,'attr1')

 attr_dict = nodes.set_index('node_from')['attribute2'].to_dict()
 nx.set_node_attributes(G,attr_dict,'attr2')

 attr_dict = nodes.set_index('node_from')['attribute3'].to_dict()
 nx.set_node_attributes(G,attr_dict,'attr3')
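The three repeated blocks can also be written as a loop over a column-to-attribute mapping. A minimal sketch, with a made-up nodes table and graph standing in for the ones above:

```python
import networkx as nx
import pandas as pd

# Hypothetical stand-ins for `nodes` and `G` from the snippets above.
nodes = pd.DataFrame({
    'node_from': ['jim', 'john'],
    'attribute1': ['tall', 'small'],
    'attribute2': ['red', 'blue'],
    'attribute3': ['fat', 'fat'],
})
G = nx.DiGraph([('jim', 'john')])

# Dataframe column -> node attribute name (same names as in the answer).
rename = {'attribute1': 'attr1', 'attribute2': 'attr2', 'attribute3': 'attr3'}

indexed = nodes.set_index('node_from')
for column, attr_name in rename.items():
    # {node: value} for one column, stored under attr_name
    nx.set_node_attributes(G, indexed[column].to_dict(), attr_name)

print(G.nodes(data=True))
```

Adding another attribute then only means adding one entry to the mapping.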

The result is similar to @zohar.kom's, but with fewer iterations.

- Polkaguy6000

What is the column name for "node from"? That name needs to be filled in there. - Polkaguy6000

Addendum: I realized I forgot the quotes when posting. I have edited them into the answer. - Polkaguy6000

Yes, but your example doesn't show where the details for node_to appear. Which fields hold the data for node_to? - Polkaguy6000

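In this example you list Jim and John in a single row of the source data, with tall, red and fat in the attribute fields. Does that mean both Jim and John have these attributes? If not, which column name stores the attributes of node_to? - Polkaguy6000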

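Yes, you need to add data for the "node_to" nodes. No code can invent data that doesn't exist. Once it's added, you can use exactly the same code, replacing "node_from" with "node_to" and replacing the old fields that describe "node_from" with the new fields that describe "node_to". If you are querying the data and are able to extract these columns, I suspect there may be a better way to handle the whole process, but we don't have enough information to make that call. - Polkaguy6000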


- busybear · Accepted Answer

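As of NetworkX 2.0, you can pass a dictionary of dictionaries to nx.set_node_attributes to set attributes for multiple nodes at once. This is much simpler than iterating over each node manually. The outer dictionary's keys are the nodes, and the inner dictionaries' keys are the attributes you want to set for each node. Like this: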

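import networkx as nx

G = nx.Graph()
G.add_nodes_from(['node0', 'node1', 'node2'])
# Outer keys are nodes; inner dicts map attribute names to values.
attrs = {
    'node0': {'attr0': 'val00', 'attr1': 'val01'},
    'node1': {'attr0': 'val10', 'attr1': 'val11'},
    'node2': {'attr0': 'val20', 'attr1': 'val21'},
}
nx.set_node_attributes(G, attrs)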

You can find more details in the documentation.
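Besides the dict-of-dicts form, set_node_attributes also accepts a flat {node: value} dict together with name=, which is the form used in the answer above that sets attributes one at a time. A short sketch with made-up values, also reading one attribute back with nx.get_node_attributes:

```python
import networkx as nx

G = nx.DiGraph([('jim', 'john')])

# Form 1: {node: value} plus a name sets a single attribute.
nx.set_node_attributes(G, {'jim': 'tall', 'john': 'small'}, name='attribute1')

# Form 2: {node: {attr: value}} with no name sets one or more attributes per node.
nx.set_node_attributes(G, {'jim': {'attribute2': 'red'}, 'john': {'attribute2': 'blue'}})

# get_node_attributes reads one attribute back as {node: value}.
print(nx.get_node_attributes(G, 'attribute1'))  # {'jim': 'tall', 'john': 'small'}
print(G.nodes['jim'])  # {'attribute1': 'tall', 'attribute2': 'red'}
```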

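Using your example, and assuming id identifies each node, you can convert your node-attribute dataframe df_attributes_only into this format and add it to your graph: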

df_attributes_only = pd.DataFrame(
    [['jim', 'tall', 'red', 'fat'], ['john', 'small', 'blue', 'fat']],
    columns=['id', 'attribute1', 'attribute2', 'attribute3']
)
node_attr = df_attributes_only.set_index('id').to_dict('index')
nx.set_node_attributes(g, node_attr)

>>> g.nodes['jim']
{'attribute1': 'tall', 'attribute2': 'red', 'attribute3': 'fat'}
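One thing to watch for: set_node_attributes only updates nodes that are already in the graph and silently skips the rest, so a node listed in df_attributes_only that never appears in an edge is not added. If you want such nodes in the graph, one option is G.add_nodes_from, which accepts (node, attribute_dict) pairs. A sketch, with hypothetical tables shaped like the question's plus an extra unconnected node 'ann':

```python
import networkx as nx
import pandas as pd

# Hypothetical sample data; 'ann' has attributes but no edges.
df_relationship = pd.DataFrame({'node_from': ['jim'], 'node_to': ['john']})
df_attributes_only = pd.DataFrame(
    [['jim', 'tall', 'red', 'fat'],
     ['john', 'small', 'blue', 'fat'],
     ['ann', 'short', 'green', 'thin']],
    columns=['id', 'attribute1', 'attribute2', 'attribute3'],
)

g = nx.from_pandas_edgelist(df_relationship, 'node_from', 'node_to',
                            create_using=nx.DiGraph())
node_attr = df_attributes_only.set_index('id').to_dict('index')

# set_node_attributes skips 'ann' because it is not in g yet.
nx.set_node_attributes(g, node_attr)
print('ann' in g)  # False

# add_nodes_from creates missing nodes and merges attributes into existing ones.
g.add_nodes_from(node_attr.items())
print('ann' in g)  # True
print(g.nodes['ann'])
```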