Speeding up edge creation between nodes in Neo4j using Py2Neo

Question

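I am trying to build a huge database in Neo4j with around 2 million nodes and around 4 million edges. I was able to speed up node creation by creating the nodes in batches of 1000. However, when I try to create edges between these nodes, the process slows down and then times out. At first I thought it was slow because I was matching on node names, but it is still slow when I match on ids, which I had to create manually. The data and code snippets below should make the problem clearer.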

1. Node.csv - this file contains the details of the nodes

NodeName  NodeType      NodeId
Sachin    Person        1
UNO       Organisation  2
Obama     Person        3
Cricket   Sports        4
Tennis    Sports        5
USA       Place         6
India     Place         7

2. Edges.csv - this file contains only the node ids and the relationship between them

Node1Id  Relationship  Node2Id
1        Plays         4
3        PresidentOf   6
1        CitizenOf     7

Here is the code that creates the nodes.

from py2neo import Graph

graph = Graph()
tx = graph.cypher.begin()
for i in range(len(Node)):
    # Node[i] is one row of Node.csv: (NodeName, NodeType, NodeId)
    statement = "CREATE (n {name:{A}, label:{C}, id:{B}})"
    tx.append(statement, {"A": Node[i][0], "C": str(Node[i][1]), "B": str(Node[i][2])})
    if i % 1000 == 0:
        # Commit the current batch and open a new transaction
        print(str(i) + " Node Created")
        tx.commit()
        tx = graph.cypher.begin()
tx.commit()  # commit any statements left over from the last batch

The code above works very well and finished creating the 2 million nodes in only 5 minutes. Here is the code that creates the edges:

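tx = graph.cypher.begin()
for i in range(len(Edges)):
    # Edges[i] is one row of Edges.csv: (Node1Id, Relationship, Node2Id)
    statement = "MATCH (a {id:{A}}), (b {id:{B}}) CREATE (a)-[:" + Edges[i][1] + "]->(b)"
    tx.append(statement, {"A": Edges[i][0], "B": Edges[i][2]})
    if i % 1000 == 0:
        # Commit the current batch and open a new transaction
        print(str(i) + " Relationship Created")
        tx.commit()
        tx = graph.cypher.begin()
tx.commit()  # commit any statements left over from the last batch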

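The code above creates the first 1000 relationships without trouble, but after that it takes a very long time and the connection times out.

I urgently need to fix this, and any help that speeds up relationship creation would be much appreciated.

Please note: I am not using Neo4j's CSV import or the Neo4j shell import, because they assume that the relationship between nodes is fixed. In my case the relationships vary, and importing one relationship type at a time is not feasible, since it would mean running the import manually almost 2000 times.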

- Pawan

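Do you have an index (ideally a uniqueness constraint) on the id property? - Martin Preusse

Michael suggested the same thing in his answer below, and it worked like magic. Thanks! - Pawan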

2 Answers

Here is an updated version of the code, since much of the transaction API was deprecated in py2neo (v3+). The code also incorporates Michael's solution.

Nodes:

import csv
from py2neo import Graph

def insert_nodes_to_neodb():
    queries_per_transaction = 1000  # 1000 seems to work faster
    node_path = "path_to_csv"

    graph = Graph("bolt://localhost:7687", user="neo4j", password="pswd")
    trans_action = graph.begin()

    with open(node_path) as csvfile:
        next(csvfile)   # labels of the columns
        node_csv = csv.reader(csvfile, delimiter=',')
        for idx, row in enumerate(node_csv):
            statement = "CREATE (n:Entity {id:{A} , label:{B}, name:{B}})"  # name is for visual reasons (neo4j)
            trans_action.run(statement, {"A": row[0], "B": row[1]})

            if idx % queries_per_transaction == 0:
                trans_action.commit()
                trans_action = graph.begin()

        # Commit the left over queries
        trans_action.commit()

        # We need to create indexes on a separate transaction
        # neo4j.exceptions.Forbidden: Cannot perform schema updates in a transaction that has performed data updates.
        trans_action = graph.begin()
        trans_action.run("CREATE CONSTRAINT ON (o:Entity) ASSERT o.id IS UNIQUE;")
        trans_action.commit()

Edges:

def insert_edges_to_neodb(neo_graph):
    # Note: neo_graph is unused; the function opens its own connection below.
    queries_per_transaction = 1000  # 1000 seems to work faster
    edge_path = "path_to_csv"
    graph = Graph("bolt://localhost:7687", user="neo4j", password="pswd")

    trans_action = graph.begin()

    with open(edge_path) as csvfile:
        next(csvfile)   # labels of the columns
        edge_csv = csv.reader(csvfile, delimiter=',')
        for idx, row in enumerate(edge_csv):
            statement = """MATCH (a:Entity),(b:Entity)
                        WHERE a.id = {A} AND b.id = {B}
                        CREATE (a)-[r:CO_APPEARS { name: a.name + '<->' + b.name, weight: {C} }]->(b)
                        RETURN type(r), r.name"""
            trans_action.run(statement, {"A": row[0], "B": row[1], "C": row[2]})

            if idx % queries_per_transaction == 0:
                trans_action.commit()
                trans_action = graph.begin()

        trans_action.commit()

- Anoroah
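
A possible variation on the edge loader above, sketched under assumptions that are not in the original answers: Cypher cannot take a relationship type as a parameter, so the rows of Edges.csv can be grouped by type first, and each group sent in batches through a single UNWIND statement. The helper names (chunked, group_edges_by_type, edge_statement), the Entity label default, and the newer $rows parameter syntax (rather than the {A} form used above) are illustrative choices, not part of py2neo.

```python
from collections import defaultdict
from itertools import islice


def chunked(rows, size):
    """Yield successive lists of at most `size` items from `rows`."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def group_edges_by_type(edges):
    """Map each relationship type to a list of {"a": id, "b": id} pairs.

    `edges` is assumed to hold (Node1Id, Relationship, Node2Id) rows.
    """
    grouped = defaultdict(list)
    for node1_id, rel_type, node2_id in edges:
        grouped[rel_type].append({"a": node1_id, "b": node2_id})
    return dict(grouped)


def edge_statement(rel_type, label="Entity"):
    """Build one UNWIND statement for a single relationship type.

    The type is embedded in the query text (quoted with backticks) because
    it cannot be supplied as a Cypher parameter.
    """
    if "`" in rel_type:
        raise ValueError("relationship type must not contain a backtick")
    return (
        "UNWIND $rows AS row "
        "MATCH (a:{label} {{id: row.a}}), (b:{label} {{id: row.b}}) "
        "CREATE (a)-[:`{rel}`]->(b)"
    ).format(label=label, rel=rel_type)


if __name__ == "__main__":
    # Sample rows taken from the Edges.csv excerpt in the question.
    sample = [("1", "Plays", "4"), ("3", "PresidentOf", "6"), ("1", "CitizenOf", "7")]
    for rel_type, pairs in group_edges_by_type(sample).items():
        for batch in chunked(pairs, 1000):
            print(edge_statement(rel_type), len(batch))
```

Each batch could then be handed to the same call the answer already uses, for example trans_action.run(statement, {"rows": batch}), once the uniqueness constraint on id is in place so that the MATCH can use the index.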

- Michael Hunger · Accepted Answer

You forgot to use labels for your nodes and then to create a constraint on label + id.

create constraint on (o:Organization) assert o.id is unique;
create constraint on (p:Person) assert p.id is unique;

Create(n:Person {name:{A} ,id:{B}})
Create(n:Organization {name:{A} ,id:{B}})

match (p:Person {id:{p_id}}), (o:Organization {id:{o_id}})
create (p)-[:WORKS_FOR]->(o);
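
Since the NodeType column in Node.csv already names a label for every node, the statements above could be generated from the data rather than written by hand. The following is a minimal sketch under that assumption. The function names are hypothetical, the constraint syntax and {A}/{B} parameters are the ones used in this answer, and labels are embedded in the query text because Cypher does not accept them as parameters.

```python
def constraint_statements(node_rows):
    """Return one uniqueness-constraint statement per distinct node label.

    `node_rows` is assumed to hold (NodeName, NodeType, NodeId) rows.
    """
    labels = sorted({row[1] for row in node_rows})
    return [
        "CREATE CONSTRAINT ON (n:`{0}`) ASSERT n.id IS UNIQUE".format(label)
        for label in labels
    ]


def node_statement(label):
    """Return a parameterised CREATE statement for nodes with this label."""
    if "`" in label:
        raise ValueError("label must not contain a backtick")
    return "CREATE (n:`{0}` {{name: {{A}}, id: {{B}}}})".format(label)


def label_by_id(node_rows):
    """Map each node id to its label, so edge rows can look up both endpoints."""
    return {row[2]: row[1] for row in node_rows}


if __name__ == "__main__":
    # Sample rows taken from the Node.csv excerpt in the question.
    nodes = [("Sachin", "Person", "1"), ("UNO", "Organisation", "2"), ("Obama", "Person", "3")]
    for statement in constraint_statements(nodes):
        print(statement)
    print(node_statement("Person"))
```

Because Edges.csv holds only ids, an id-to-label lookup like the one above would be needed to put the right labels into each MATCH. Giving every node one shared label, as the other answer does with Entity, avoids that lookup.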