Converting Neo4j Cypher query results to a Pandas DataFrame

Question

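I am trying to read a CSV file that contains node IDs and the relationships between them. The first two columns represent nodes, and the third column represents the relationship between them. So far I have been able to create the database in Neo4j, but I am not sure which Cypher query to use to get the required data into a pandas DataFrame.

I will use a subset of a large dataset here to illustrate my problem. The original dataset contains thousands of nodes and relationships.

My CSV file (Node1_id, Node2_id, relation_id) looks like this: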

0   1   1
4   2   1
44  3   1
0   4   1
0   5   1
4   10173   3
4   10191   2
4   10192   2
6   10193   2
8   10194   2
3   10195   2
6   10196   2

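Here is how I create the nodes and define the relationships between them by loading the ids from the CSV file. (I think the graph is correct, but please let me know if you notice any problems.) I assigned a property named "id" to the nodes and relationships, using their IDs from the CSV file.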

LOAD CSV WITH HEADERS FROM  'file:///edges.csv' AS row FIELDTERMINATOR ","
WITH row
WHERE row.relation_id = '1'
MERGE (paper:Paper{id:(row.Node1_id)})
MERGE (author:Author{id:(row.Node2_id)})
CREATE (paper)-[au:AUTHORED{id: '1'}]->(author);

So far I have tried something like the following:

    query = ''' MATCH (paper)-[au:AUTHORED{id: '1'}]->(author) RETURN paper,author LIMIT 3; '''
    result = session.run(query)
    df = DataFrame(result)

    for row in df.itertuples(index=False):
        print(row)

It returns the following:

0   1
0   (id)    (id)
1   (id)    (id)
2   (id)    (id)

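Expected result:

I want to query the data from the graph DB, iterate over the result row by row, and get it into a pandas DataFrame in the same node ID and relationship ID format as defined in the CSV file.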

0   1   1
4   2   1
44  3   1
0   4   1
0   5   1
4   10173   3
4   10191   2
4   10192   2
6   10193   2
8   10194   2
3   10195   2
6   10196   2

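I would also like to know what the return type of the Cypher query result is in this case. It is a pandas.core.frame.DataFrame, but how do I access the properties of each node and relationship when running a Cypher query? This is the main question.

Please feel free to explain in detail; I would really appreciate your help.

Neo4j version used: 4.2.1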

- Sniper

1 Answer

- jose_bacoy · Accepted Answer

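I am using py2neo. If you are using a different library, you can adapt this, or tell me which Neo4j library you are using and I will edit my answer.

#1: Expected result

I want to query the data from the graph DB and get the result into a pandas DataFrame in the node ID and relationship ID format defined in the CSV file, iterating over the result row by row.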

    from py2neo import Graph
    from pandas import DataFrame

    # connect to the database (a py2neo Graph, named "session" to match the question)
    session = Graph("bolt://localhost:7687", auth=("neo4j", "****"))
    # return the id properties in your query instead of the whole nodes;
    # to get all rows, remove the {id: '1'} filter and the LIMIT clause
    query = ''' MATCH (paper)-[au:AUTHORED{id: '1'}]->(author) RETURN paper.id, author.id, au.id LIMIT 3; '''
    # access the result data as a list of dictionaries
    result = session.run(query).data()
    # convert result into pandas dataframe
    df = DataFrame(result)
    df.head()

Result:

0   1   1
4   2   1
44  3   1
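
The question also asks for the CSV's column layout and for row-by-row iteration, which the snippet above stops short of. Below is a minimal sketch, assuming the query returns the keys `paper.id`, `author.id` and `au.id` as in the answer. The helper name `to_edge_frame`, the `COLUMN_MAP` mapping and the hard-coded sample rows (standing in for the output of `session.run(query).data()`) are illustrative assumptions, not part of py2neo.

```python
import pandas as pd

# Hypothetical mapping from the Cypher return keys to the CSV column names.
COLUMN_MAP = {
    "paper.id": "Node1_id",
    "author.id": "Node2_id",
    "au.id": "relation_id",
}


def to_edge_frame(records):
    """Build a DataFrame in the CSV's column order from a list of dicts."""
    df = pd.DataFrame(records, columns=list(COLUMN_MAP))
    return df.rename(columns=COLUMN_MAP)


# Stand-in for session.run(query).data(). The values are strings because
# LOAD CSV reads every field as a string unless it is converted explicitly.
sample = [
    {"paper.id": "0", "author.id": "1", "au.id": "1"},
    {"paper.id": "4", "author.id": "2", "au.id": "1"},
    {"paper.id": "44", "author.id": "3", "au.id": "1"},
]

edges = to_edge_frame(sample)

# Iterate row by row; with name=None each row is a plain tuple.
for node1, node2, relation in edges.itertuples(index=False, name=None):
    print(node1, node2, relation)
```

If integer ids are preferred, something like `edges.astype(int)` could be applied afterwards, provided every value is numeric.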

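#2: The other question

How do I access the individual properties of nodes and relationships when running a Cypher query? Answer: the properties of a node behave like a dictionary, so use the get function.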

    # Note that we are returning the nodes and not ids
    query = ''' MATCH (paper)-[au:AUTHORED{id: '1'}]->(author) RETURN paper, author, au LIMIT 3; '''
    result = session.run(query).data()
    print("What is data type of result? ", type(result))
    print("What is the data type of each item? ", type(result[0]))
    print("What are the keys of the dictionary? ", result[0].keys())
    print("What is the class of the node? ", type(result[0].get('paper')))
    print("How to access the first node? ", result[0].get('paper'))
    print("How to access values inside the node? ", result[0].get('paper',{}).get('id'))

Result:
What is data type of result?  <class 'list'>
What is the data type of each item?  <class 'dict'>
What are the keys of the dictionary?  dict_keys(['paper', 'author', 'au'])
What is the class of the node?  <class 'py2neo.data.Node'>
How to access the first node?  (_888:paper {id: '1'})
How to access values inside the node?  '1'
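
Building on the dictionary-style access shown above, whole nodes can be flattened into plain rows before they reach pandas. This is only a sketch, assuming each record carries the `paper`, `author` and `au` keys from the query and that relationship objects expose `.get` the same way nodes do. The function name `flatten_records` is made up for illustration, and plain dicts stand in for the py2neo objects so the snippet runs without a database.

```python
def flatten_records(records, prop="id"):
    """Turn records of node/relationship objects into flat dict rows."""
    rows = []
    for record in records:
        rows.append({
            # .get() yields None instead of raising when a property is missing
            "Node1_id": record["paper"].get(prop),
            "Node2_id": record["author"].get(prop),
            "relation_id": record["au"].get(prop),
        })
    return rows


# Plain dicts standing in for py2neo Node / Relationship objects.
sample = [
    {"paper": {"id": "0"}, "author": {"id": "1"}, "au": {"id": "1"}},
    {"paper": {"id": "4"}, "author": {"id": "2"}, "au": {}},
]

rows = flatten_records(sample)
print(rows)
```

The resulting list of dictionaries can be passed straight to `DataFrame(rows)`, which gives one column per key.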