Neo4j: How to find unique nodes from a collection of paths

4

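I am using Neo4j to solve a real-time normalization problem. Suppose I have 3 places from 2 different sources. One source, feed 45, gives me 2 places that are actually duplicates of each other, and the other source, feed 55, gives me 1 place with the correct identifier. For any place identifier (duplicate or not), I want to find the closest set of places, unique by feed identifier. My data looks like this: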

CREATE (a: Place {feedId:45, placeId: 123, name:"Empire State", address: "350 5th Ave", city: "New York", state: "NY", zip: "10118" })
CREATE (b: Place {feedId:45, placeId: 456, name:"Empire State Building", address: "350 5th Ave", city: "New York", state: "NY"})
CREATE (c: Place {feedId:55, placeId: 789, name:"Empire State", address: "350 5th Ave", city: "New York", state: "NY", zip: "10118"})

I connect these nodes through Matching nodes so that I can normalize the data. For example:

MERGE (m1: Matching:NameAndCity { attr: "EmpireStateBuildingNewYork", cost: 5.0 })
MERGE (a)-[:MATCHES]-(m1)
MERGE (b)-[:MATCHES]-(m1)
MERGE (c)-[:MATCHES]-(m1)
MERGE (m2: Matching:CityAndZip { attr: "NewYork10118", cost: 7.0 })
MERGE (a)-[:MATCHES]-(m2)
MERGE (c)-[:MATCHES]-(m2)

When I want to find the closest matches to a starting place id, I can match all paths from the starting node and rank them by cost, for example:

MATCH p=(a:Place {placeId:789, feedId:55})-[*..4]-(d:Place)
WHERE NONE (n IN nodes(p)
        WHERE size(filter(x IN nodes(p)
                          WHERE n = x))> 1)
WITH    p,
        reduce(costAccum = 0, n in filter(n in nodes(p) where has(n.cost)) | costAccum+n.cost) AS costAccum
        order by costAccum
RETURN p, costAccum

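However, because there are multiple paths to the same place, this query returns the same nodes several times. Is it possible to collect the nodes and their costs and then return only a distinct subset (for example, the best result from each of feeds 45 and 55)?
How should I return a distinct set of paths, ranked by cost and unique by feed identifier? Or is my problem structured incorrectly?
Please help!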

Could you provide a more complete example graph using http://console.neo4j.org? - Stefan Armbruster
Not quite sure what you want, but do you know about "ORDER BY" and "DISTINCT"? (http://neo4j.com/docs/stable/query-order.html#order-by-order-nodes-in-descending-order and http://neo4j.com/docs/stable/query-aggregation.html#aggregation-distinct) - Thomas
1 Answer

0

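You can collect all the paths for each place d, and then simply take the best path from each collection (the paths are sorted by cost before they are collected, so the first one is the cheapest):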

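MATCH p=(a:Place {placeId:789, feedId:55})-[*..4]-(d:Place)
// compute the cost per path and sort first, so that collect() sees the cheapest path first
WITH d, p,
        reduce(costAccum = 0, n in filter(n in nodes(p) where has(n.cost)) | costAccum+n.cost) AS costAccum
        ORDER BY costAccum
// now aggregate per place d; head(paths) is the cheapest path to d
WITH d, collect(p) AS paths, min(costAccum) AS costAccum
RETURN head(paths) AS p, costAccum
ORDER BY costAccum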
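
The queries above use `filter()` and `has()`, which belong to older Cypher and are not accepted by recent Neo4j releases. Below is a minimal sketch of the same "cheapest path per place" idea using a list comprehension and `IS NOT NULL` instead. It assumes the sample data from the question; the `WHERE d <> a` condition and the aliases `total`, `cost` and `best` are my own additions, not part of the original query.

```cypher
MATCH p = (a:Place {placeId: 789, feedId: 55})-[*..4]-(d:Place)
// assumption: the starting place itself is not a useful result
WHERE d <> a
// sum the cost of every node on the path that carries a cost property
WITH d, p,
     reduce(total = 0.0,
            n IN [x IN nodes(p) WHERE x.cost IS NOT NULL] | total + n.cost) AS cost
ORDER BY cost
// rows arrive sorted by cost, so the first collected entry per place is the cheapest
WITH d, head(collect({path: p, cost: cost})) AS best
RETURN d.feedId AS feedId, d.placeId AS placeId, best.path AS p, best.cost AS cost
ORDER BY cost
```

This returns one row per place. If what you want is one result per feed, grouping by `d.feedId` instead of `d` in the second `WITH` should keep only the cheapest place from each feed, which looks closer to "unique by feed identifier" as the question puts it.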
