Avoiding NaN attributes when building a NetworkX graph

Question

4

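I want to use Pandas to read a CSV file that contains nodes and their attributes. Not every node has every attribute, and missing attributes are simply left blank in the CSV file. When Pandas reads the file, the missing values show up as nan. I want to add the nodes from the DataFrame in bulk, but without adding the nan attributes.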

For example, here is a sample CSV file called mwe.csv:

Name,Cost,Depth,Class,Mean,SD,CST,SL,Time
Manuf_0001,39.00,1,Manuf,,,12,,10.00
Manuf_0002,36.00,1,Manuf,,,8,,10.00
Part_0001,12.00,2,Part,,,,,28.00
Part_0002,5.00,2,Part,,,,,15.00
Part_0003,9.00,2,Part,,,,,10.00
Retail_0001,0.00,0,Retail,253,36.62,0,0.95,0.00
Retail_0002,0.00,0,Retail,45,1,0,0.95,0.00
Retail_0003,0.00,0,Retail,75,2,0,0.95,0.00

Here is how I currently handle it:

import pandas as pd
import numpy as np
import networkx as nx

node_df = pd.read_csv('mwe.csv')

graph = nx.DiGraph()
graph.add_nodes_from(node_df['Name'])
nx.set_node_attributes(graph, dict(zip(node_df['Name'], node_df['Cost'])), 'nodeCost')
nx.set_node_attributes(graph, dict(zip(node_df['Name'], node_df['Mean'])), 'avgDemand')
nx.set_node_attributes(graph, dict(zip(node_df['Name'], node_df['SD'])), 'sdDemand')
nx.set_node_attributes(graph, dict(zip(node_df['Name'], node_df['CST'])), 'servTime')
nx.set_node_attributes(graph, dict(zip(node_df['Name'], node_df['SL'])), 'servLevel')

# Loop through all nodes and all attributes and remove NaNs.
for i in graph.nodes:
    for k, v in list(graph.nodes[i].items()):
        if np.isnan(v):
            del graph.nodes[i][k]

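It works, but it is clunky. Is there a better way, for example one that avoids adding the nans when the nodes are added, rather than deleting them afterwards?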

- LarrySnyder610

2 Answers

0

Use keep_default_na when importing the CSV into Pandas:

pd.read_csv('data.csv', keep_default_na=False)

See also: How to get pandas.read_csv to read empty values as empty strings instead of NaN
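To make the effect of this option concrete, here is a minimal sketch that reads the same text twice, once with the defaults and once with keep_default_na=False. The two-row CSV and the variable names are made up for illustration. Note that the blank cell comes back as an empty string and the rest of that column is read as text, so the attribute would still be set on the node; only the value to filter on changes.

```python
import io

import pandas as pd

# Hypothetical two-row sample, cut down from mwe.csv for illustration.
csv_text = """Name,Cost,Mean
Manuf_0001,39.00,
Retail_0001,0.00,253
"""

# Default behaviour: the blank Mean cell is parsed as NaN.
default_df = pd.read_csv(io.StringIO(csv_text))

# With keep_default_na=False the blank cell stays an empty string,
# and the other values in that column are kept as strings too.
no_na_df = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)

default_is_nan = pd.isna(default_df.loc[0, "Mean"])
blank_value = no_na_df.loc[0, "Mean"]
filled_value = no_na_df.loc[1, "Mean"]

print(default_is_nan, repr(blank_value), repr(filled_value))
```

If the numeric columns are needed as numbers later, they would have to be converted back after filtering out the empty strings.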

- crocefisso

- Gambit1614 · Accepted Answer

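You can take advantage of Pandas to do this. I wrote this function, which turns your DataFrame into a Series with the key column as the index and the value column as the values, drops the elements that are NaN, and finally converts the result to a dictionary.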

def create_node_attribs(key_col, val_col):
    # Up to you whether to pass the DataFrame as an argument.
    # In your case, since this was the only df, I only passed the columns.
    global node_df
    return Series(node_df[val_col].values,
                  index=node_df[key_col]).dropna().to_dict()

Here is the complete code:

import pandas as pd
import networkx as nx
from pandas import Series

node_df = pd.read_csv('mwe.csv')

graph = nx.DiGraph()

def create_node_attribs(key_col, val_col):
    # Up to you whether to pass the DataFrame as an argument.
    # In your case, since this was the only df, I only passed the columns.
    global node_df
    return Series(node_df[val_col].values,
                  index=node_df[key_col]).dropna().to_dict()

graph.add_nodes_from(node_df['Name'])
nx.set_node_attributes(graph, create_node_attribs('Name', 'Cost'), 'nodeCost')
nx.set_node_attributes(graph, create_node_attribs('Name', 'Mean'), 'avgDemand')
nx.set_node_attributes(graph, create_node_attribs('Name', 'SD'), 'sdDemand')
nx.set_node_attributes(graph, create_node_attribs('Name', 'CST'), 'servTime')
nx.set_node_attributes(graph, create_node_attribs('Name', 'SL'), 'servLevel')

Link to the Google Colab notebook containing the code.

Also, see this answer for more information on a timing comparison of the method you are currently using.
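As a possible variation on the idea above (not part of the original answer), the nodes and their non-NaN attributes can also be added in a single add_nodes_from call, since that method accepts (node, attribute_dict) tuples. This is only a sketch: the helper node_records and the column_to_attr mapping are hypothetical names, and the CSV is inlined and shortened to three rows so the example runs on its own.

```python
import io

import networkx as nx
import pandas as pd

# Three rows of mwe.csv, inlined so the sketch is self-contained.
csv_text = """Name,Cost,Depth,Class,Mean,SD,CST,SL,Time
Manuf_0001,39.00,1,Manuf,,,12,,10.00
Part_0001,12.00,2,Part,,,,,28.00
Retail_0001,0.00,0,Retail,253,36.62,0,0.95,0.00
"""
node_df = pd.read_csv(io.StringIO(csv_text))

# Hypothetical mapping from CSV column name to node attribute name.
column_to_attr = {
    "Cost": "nodeCost",
    "Mean": "avgDemand",
    "SD": "sdDemand",
    "CST": "servTime",
    "SL": "servLevel",
}


def node_records(df, key_col, column_to_attr):
    """Yield (node, attrs) pairs, leaving out attributes whose value is NaN."""
    for row in df.to_dict("records"):
        attrs = {
            attr: row[col]
            for col, attr in column_to_attr.items()
            if pd.notna(row[col])
        }
        yield row[key_col], attrs


graph = nx.DiGraph()
graph.add_nodes_from(node_records(node_df, "Name", column_to_attr))

print(dict(graph.nodes["Part_0001"]))
```

Iterating row by row is likely slower than the Series-based approach on large files, so this trades some speed for having the filtering in one place.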
