这两个Cypher查询有什么区别?

5

我有些困惑。

在我的数据库中,我有这样的关系:

(u:User)-[r1:LISTENS_TO]->(a:Artist)<-[r2:LISTENS_TO]-(u2:User)

我希望查询一个给定用户和其他所有用户之间共同喜欢的艺术家。

为了让你了解我的数据库规模,我大约有600个用户,47,546个艺术家,以及184,211个用户和艺术家之间的关系。

我尝试的第一个查询如下:

START me=node(553314), other=node:userLocations("withinDistance:[38.89037,-77.03196,80.467]")

OPTIONAL MATCH 
    pMutualArtists=(me:User)-[ar1:LISTENS_TO]->(a:Artist)<-[ar2:LISTENS_TO]-(other:User)
WHERE
    other:User

WITH other, COUNT(DISTINCT pMutualArtists) AS mutualArtists

ORDER BY mutualArtists DESC
LIMIT 10
RETURN other.username, mutualArtists

这个查询大约需要20秒才能返回结果。该查询的概要如下:
+----------------------+-------+--------+------------------------+------------------------------------------------------------------------------------------------+
|             Operator |  Rows | DbHits |            Identifiers |                                                                                          Other |
+----------------------+-------+--------+------------------------+------------------------------------------------------------------------------------------------+
|      ColumnFilter(0) |    10 |      0 |                        |                                                     keep columns other.username, mutualArtists |
|              Extract |    10 |     20 |                        |                                                                                 other.username |
|      ColumnFilter(1) |    10 |      0 |                        |                                                              keep columns other, mutualArtists |
|                  Top |    10 |      0 |                        | {  AUTOINT0}; Cached(  INTERNAL_AGGREGATEb6facb18-1c5d-45a6-83bf-a75c25ba6baf of type Integer) |
|     EagerAggregation |   563 |      0 |                        |                                                                                          other |
|        OptionalMatch | 52806 |      0 |                        |                                                                                                |
|             Eager(0) |   563 |      0 |                        |                                                                                                |
|  NodeByIndexQuery(1) |   563 |    564 |           other, other |                             Literal(withinDistance:[38.89037,-77.03196,80.467]); userLocations |
|          NodeById(1) |     1 |      1 |                 me, me |                                                                          Literal(List(553314)) |
|             Eager(1) |    82 |      0 |                        |                                                                                                |
|          ExtractPath |    82 |      0 |         pMutualArtists |                                                                                                |
|            Filter(0) |    82 |     82 |                        |                                                    (hasLabel(a:Artist(1)) AND NOT(ar1 == ar2)) |
| SimplePatternMatcher |    82 |     82 | a, me, ar2, ar1, other |                                                                                                |
|            Filter(1) |     1 |      3 |                        |               ((hasLabel(me:User(3)) AND hasLabel(other:User(3))) AND hasLabel(other:User(3))) |
|  NodeByIndexQuery(1) |   563 |    564 |           other, other |                             Literal(withinDistance:[38.89037,-77.03196,80.467]); userLocations |
|          NodeById(1) |     1 |      1 |                 me, me |                                                                          Literal(List(553314)) |
+----------------------+-------+--------+------------------------+------------------------------------------------------------------------------------------------+

我感到沮丧。这似乎不应该需要20秒钟。

后来,我回到这个问题上,并尝试从头开始调试它。

我开始分解查询,并注意到我得到了更快的结果。没有Neo4J空间查询,我大约可以在1.5秒内得到结果。

最终,我添加了一些东西,并得出了以下查询:

START u=node(553314), u2=node:userLocations("withinDistance:[38.89037,-77.03196,80.467]")

OPTIONAL MATCH 
    pMutualArtists=(u:User)-[ar1:LISTENS_TO]->(a:Artist)<-[ar2:LISTENS_TO]-(u2:User)
WHERE
    u2:User

WITH u2, COUNT(DISTINCT pMutualArtists) AS mutualArtists

ORDER BY mutualArtists DESC
LIMIT 10
RETURN u2.username, mutualArtists

这个查询只用了4240毫秒,提升了5倍!该查询的概要如下:

+----------------------+-------+--------+--------------------+------------------------------------------------------------------------------------------------+
|             Operator |  Rows | DbHits |        Identifiers |                                                                                          Other |
+----------------------+-------+--------+--------------------+------------------------------------------------------------------------------------------------+
|      ColumnFilter(0) |    10 |      0 |                    |                                                        keep columns u2.username, mutualArtists |
|              Extract |    10 |     20 |                    |                                                                                    u2.username |
|      ColumnFilter(1) |    10 |      0 |                    |                                                                 keep columns u2, mutualArtists |
|                  Top |    10 |      0 |                    | {  AUTOINT0}; Cached(  INTERNAL_AGGREGATEbdf86ac1-8677-4d45-967f-c2dd594aba49 of type Integer) |
|     EagerAggregation |   563 |      0 |                    |                                                                                             u2 |
|        OptionalMatch | 52806 |      0 |                    |                                                                                                |
|             Eager(0) |   563 |      0 |                    |                                                                                                |
|  NodeByIndexQuery(1) |   563 |    564 |             u2, u2 |                             Literal(withinDistance:[38.89037,-77.03196,80.467]); userLocations |
|          NodeById(1) |     1 |      1 |               u, u |                                                                          Literal(List(553314)) |
|             Eager(1) |    82 |      0 |                    |                                                                                                |
|          ExtractPath |    82 |      0 |     pMutualArtists |                                                                                                |
|            Filter(0) |    82 |     82 |                    |                                                    (hasLabel(a:Artist(1)) AND NOT(ar1 == ar2)) |
| SimplePatternMatcher |    82 |     82 | a, u2, u, ar2, ar1 |                                                                                                |
|            Filter(1) |     1 |      3 |                    |                      ((hasLabel(u:User(3)) AND hasLabel(u2:User(3))) AND hasLabel(u2:User(3))) |
|  NodeByIndexQuery(1) |   563 |    564 |             u2, u2 |                             Literal(withinDistance:[38.89037,-77.03196,80.467]); userLocations |
|          NodeById(1) |     1 |      1 |               u, u |                                                                          Literal(List(553314)) |
+----------------------+-------+--------+--------------------+------------------------------------------------------------------------------------------------+

为了证明我连续运行它们并得到非常不同的结果:

neo4j-sh (?)$ START u=node(553314), u2=node:userLocations("withinDistance:[38.89037,-77.03196,80.467]")
>
> OPTIONAL MATCH
>     pMutualArtists=(u:User)-[ar1:LISTENS_TO]->(a:Artist)<-[ar2:LISTENS_TO]-(u2:User)
> WHERE
>     u2:User
>
> WITH u2, COUNT(DISTINCT pMutualArtists) AS mutualArtists
> ORDER BY mutualArtists DESC
> LIMIT 10
> RETURN u2.username, mutualArtists
> ;
+------------------------------+
| u2.username  | mutualArtists |
+------------------------------+
| "573904765"  | 644           |
| "28600291"   | 601           |
| "1092510304" | 558           |
| "1367963461" | 521           |
| "1508790199" | 455           |
| "1335360028" | 447           |
| "18200866"   | 444           |
| "1229430376" | 435           |
| "748318333"  | 434           |
| "5612902"    | 431           |
+------------------------------+
10 rows
4240 ms
neo4j-sh (?)$ START me=node(553314), other=node:userLocations("withinDistance:[38.89037,-77.03196,80.467]")
>
> OPTIONAL MATCH
>     pMutualArtists=(me:User)-[ar1:LISTENS_TO]->(a:Artist)<-[ar2:LISTENS_TO]-(other:User)
> WHERE
>     other:User
>
> WITH other, COUNT(DISTINCT pMutualArtists) AS mutualArtists
> ORDER BY mutualArtists DESC
> LIMIT 10
> RETURN other.username, mutualArtists;
+--------------------------------+
| other.username | mutualArtists |
+--------------------------------+
| "573904765"    | 644           |
| "28600291"     | 601           |
| "1092510304"   | 558           |
| "1367963461"   | 521           |
| "1508790199"   | 455           |
| "1335360028"   | 447           |
| "18200866"     | 444           |
| "1229430376"   | 435           |
| "748318333"    | 434           |
| "5612902"      | 431           |
+--------------------------------+
10 rows
20418 ms

除非我已经疯了,否则这两个查询之间唯一的区别就是节点名称(我将 "我" 改为 "u","其他人" 改为 "u2")。
为什么会导致5倍的提高?!如果有人对此有任何见解,我将感激不尽。
谢谢,
-Adam
编辑 8.1.14
根据 @ulkas 的建议,我尝试简化查询。
结果如下:
START u=node(553314), u2=node:userLocations("withinDistance:[38.89037,-77.03196,80.467]")
OPTIONAL MATCH pMutualArtists=(u:User)-[ar1:LISTENS_TO]->(a:Artist)<-[ar2:LISTENS_TO]-(u2:User)
RETURN u2.username, COUNT(DISTINCT pMutualArtists) as mutualArtists
ORDER BY mutualArtists DESC
LIMIT 10

~4 秒钟

START me=node(553314), other=node:userLocations("withinDistance:[38.89037,-77.03196,80.467]")
OPTIONAL MATCH pMutualArtists=(me:User)-[ar1:LISTENS_TO]->(a:Artist)<-[ar2:LISTENS_TO]-(other:User)
RETURN other.username, COUNT(DISTINCT pMutualArtists) as mutualArtists
ORDER BY mutualArtists DESC
LIMIT 10

大约20秒

太奇怪了,好像“other”和“me”的命名节点导致查询时间大幅跳升。我很困惑。

谢谢, -Adam

2个回答

1
使用OPTIONAL MATCH跟随WHERE other:User没有意义,因为结束节点other (u2)必须匹配。尝试在没有optional matchwhere以及最后一个with的情况下执行查询,简单地进行查询。
START me=node(553314), other=node:userLocations("withinDistance[38.89037,-77.03196,80.467]")
 MATCH
     pMutualArtists=(me:User)-[ar1:LISTENS_TO]->(a:Artist)<-[ar2:LISTENS_TO]-(other:User)   
 RETURN other.username, count(DISTINCT pMutualArtists) as mutualArtists
 ORDER BY mutualArtists DESC
 LIMIT 10

感谢您的回复。我同意“WHERE”和“WITH”子句是不必要的。它们只是我尝试的东西。然而,对于我的特定用例,“OPTIONAL MATCH”是必需的,因为我需要返回与其他用户没有共同艺术家的用户,而“MATCH”不能返回这些用户。我将编辑我的帖子并告诉您基于您的评论我尝试了什么......简而言之......同样的问题。 - Adam Miskiewicz
我看到了,我也很困惑为什么会这样。正在关注这个问题。 - ulkas

1
那听起来像是你看到了缓存的效果。第一次访问时,缓存没有被填充。随后查询命中相同图形将会更快,因为节点/关系已经在缓存中可用。

1
谢谢您的回复。这正是我所认为的。但是,我可以连续运行长查询(包括我的和其他人的),并获得相同的结果,多次运行另一个查询也会产生相同的结果。此外,这些结果在服务器重新启动后仍然存在。 - Adam Miskiewicz

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接