这两个Cypher查询有什么区别？

Question

这两个Cypher查询有什么区别？

5

我有些困惑。

在我的数据库中，我有这样的关系：

(u:User)-[r1:LISTENS_TO]->(a:Artist)<-[r2:LISTENS_TO]-(u2:User)

我希望查询一个给定用户和其他所有用户之间共同喜欢的艺术家。

为了让你了解我的数据库规模，我大约有600个用户，47,546个艺术家，以及184,211个用户和艺术家之间的关系。

我尝试的第一个查询如下：

START me=node(553314), other=node:userLocations("withinDistance:[38.89037,-77.03196,80.467]")

OPTIONAL MATCH 
    pMutualArtists=(me:User)-[ar1:LISTENS_TO]->(a:Artist)<-[ar2:LISTENS_TO]-(other:User)
WHERE
    other:User

WITH other, COUNT(DISTINCT pMutualArtists) AS mutualArtists

ORDER BY mutualArtists DESC
LIMIT 10
RETURN other.username, mutualArtists

这个查询大约需要20秒才能返回结果。该查询的概要如下：

+----------------------+-------+--------+------------------------+------------------------------------------------------------------------------------------------+
|             Operator |  Rows | DbHits |            Identifiers |                                                                                          Other |
+----------------------+-------+--------+------------------------+------------------------------------------------------------------------------------------------+
|      ColumnFilter(0) |    10 |      0 |                        |                                                     keep columns other.username, mutualArtists |
|              Extract |    10 |     20 |                        |                                                                                 other.username |
|      ColumnFilter(1) |    10 |      0 |                        |                                                              keep columns other, mutualArtists |
|                  Top |    10 |      0 |                        | {  AUTOINT0}; Cached(  INTERNAL_AGGREGATEb6facb18-1c5d-45a6-83bf-a75c25ba6baf of type Integer) |
|     EagerAggregation |   563 |      0 |                        |                                                                                          other |
|        OptionalMatch | 52806 |      0 |                        |                                                                                                |
|             Eager(0) |   563 |      0 |                        |                                                                                                |
|  NodeByIndexQuery(1) |   563 |    564 |           other, other |                             Literal(withinDistance:[38.89037,-77.03196,80.467]); userLocations |
|          NodeById(1) |     1 |      1 |                 me, me |                                                                          Literal(List(553314)) |
|             Eager(1) |    82 |      0 |                        |                                                                                                |
|          ExtractPath |    82 |      0 |         pMutualArtists |                                                                                                |
|            Filter(0) |    82 |     82 |                        |                                                    (hasLabel(a:Artist(1)) AND NOT(ar1 == ar2)) |
| SimplePatternMatcher |    82 |     82 | a, me, ar2, ar1, other |                                                                                                |
|            Filter(1) |     1 |      3 |                        |               ((hasLabel(me:User(3)) AND hasLabel(other:User(3))) AND hasLabel(other:User(3))) |
|  NodeByIndexQuery(1) |   563 |    564 |           other, other |                             Literal(withinDistance:[38.89037,-77.03196,80.467]); userLocations |
|          NodeById(1) |     1 |      1 |                 me, me |                                                                          Literal(List(553314)) |
+----------------------+-------+--------+------------------------+------------------------------------------------------------------------------------------------+

我感到沮丧。这似乎不应该需要20秒钟。

后来，我回到这个问题上，并尝试从头开始调试它。

我开始分解查询，并注意到我得到了更快的结果。没有Neo4J空间查询，我大约可以在1.5秒内得到结果。

最终，我添加了一些东西，并得出了以下查询：

START u=node(553314), u2=node:userLocations("withinDistance:[38.89037,-77.03196,80.467]")

OPTIONAL MATCH 
    pMutualArtists=(u:User)-[ar1:LISTENS_TO]->(a:Artist)<-[ar2:LISTENS_TO]-(u2:User)
WHERE
    u2:User

WITH u2, COUNT(DISTINCT pMutualArtists) AS mutualArtists

ORDER BY mutualArtists DESC
LIMIT 10
RETURN u2.username, mutualArtists

这个查询只用了4240毫秒，提升了5倍！该查询的概要如下：

+----------------------+-------+--------+--------------------+------------------------------------------------------------------------------------------------+
|             Operator |  Rows | DbHits |        Identifiers |                                                                                          Other |
+----------------------+-------+--------+--------------------+------------------------------------------------------------------------------------------------+
|      ColumnFilter(0) |    10 |      0 |                    |                                                        keep columns u2.username, mutualArtists |
|              Extract |    10 |     20 |                    |                                                                                    u2.username |
|      ColumnFilter(1) |    10 |      0 |                    |                                                                 keep columns u2, mutualArtists |
|                  Top |    10 |      0 |                    | {  AUTOINT0}; Cached(  INTERNAL_AGGREGATEbdf86ac1-8677-4d45-967f-c2dd594aba49 of type Integer) |
|     EagerAggregation |   563 |      0 |                    |                                                                                             u2 |
|        OptionalMatch | 52806 |      0 |                    |                                                                                                |
|             Eager(0) |   563 |      0 |                    |                                                                                                |
|  NodeByIndexQuery(1) |   563 |    564 |             u2, u2 |                             Literal(withinDistance:[38.89037,-77.03196,80.467]); userLocations |
|          NodeById(1) |     1 |      1 |               u, u |                                                                          Literal(List(553314)) |
|             Eager(1) |    82 |      0 |                    |                                                                                                |
|          ExtractPath |    82 |      0 |     pMutualArtists |                                                                                                |
|            Filter(0) |    82 |     82 |                    |                                                    (hasLabel(a:Artist(1)) AND NOT(ar1 == ar2)) |
| SimplePatternMatcher |    82 |     82 | a, u2, u, ar2, ar1 |                                                                                                |
|            Filter(1) |     1 |      3 |                    |                      ((hasLabel(u:User(3)) AND hasLabel(u2:User(3))) AND hasLabel(u2:User(3))) |
|  NodeByIndexQuery(1) |   563 |    564 |             u2, u2 |                             Literal(withinDistance:[38.89037,-77.03196,80.467]); userLocations |
|          NodeById(1) |     1 |      1 |               u, u |                                                                          Literal(List(553314)) |
+----------------------+-------+--------+--------------------+------------------------------------------------------------------------------------------------+

为了证明我连续运行它们并得到非常不同的结果：

neo4j-sh (?)$ START u=node(553314), u2=node:userLocations("withinDistance:[38.89037,-77.03196,80.467]")
>
> OPTIONAL MATCH
>     pMutualArtists=(u:User)-[ar1:LISTENS_TO]->(a:Artist)<-[ar2:LISTENS_TO]-(u2:User)
> WHERE
>     u2:User
>
> WITH u2, COUNT(DISTINCT pMutualArtists) AS mutualArtists
> ORDER BY mutualArtists DESC
> LIMIT 10
> RETURN u2.username, mutualArtists
> ;
+------------------------------+
| u2.username  | mutualArtists |
+------------------------------+
| "573904765"  | 644           |
| "28600291"   | 601           |
| "1092510304" | 558           |
| "1367963461" | 521           |
| "1508790199" | 455           |
| "1335360028" | 447           |
| "18200866"   | 444           |
| "1229430376" | 435           |
| "748318333"  | 434           |
| "5612902"    | 431           |
+------------------------------+
10 rows
4240 ms
neo4j-sh (?)$ START me=node(553314), other=node:userLocations("withinDistance:[38.89037,-77.03196,80.467]")
>
> OPTIONAL MATCH
>     pMutualArtists=(me:User)-[ar1:LISTENS_TO]->(a:Artist)<-[ar2:LISTENS_TO]-(other:User)
> WHERE
>     other:User
>
> WITH other, COUNT(DISTINCT pMutualArtists) AS mutualArtists
> ORDER BY mutualArtists DESC
> LIMIT 10
> RETURN other.username, mutualArtists;
+--------------------------------+
| other.username | mutualArtists |
+--------------------------------+
| "573904765"    | 644           |
| "28600291"     | 601           |
| "1092510304"   | 558           |
| "1367963461"   | 521           |
| "1508790199"   | 455           |
| "1335360028"   | 447           |
| "18200866"     | 444           |
| "1229430376"   | 435           |
| "748318333"    | 434           |
| "5612902"      | 431           |
+--------------------------------+
10 rows
20418 ms

除非我已经疯了，否则这两个查询之间唯一的区别就是节点名称（我将 "我" 改为 "u"，"其他人" 改为 "u2"）。

为什么会导致5倍的提高？！如果有人对此有任何见解，我将感激不尽。

谢谢，

-Adam

编辑 8.1.14

根据 @ulkas 的建议，我尝试简化查询。

结果如下：

START u=node(553314), u2=node:userLocations("withinDistance:[38.89037,-77.03196,80.467]")
OPTIONAL MATCH pMutualArtists=(u:User)-[ar1:LISTENS_TO]->(a:Artist)<-[ar2:LISTENS_TO]-(u2:User)
RETURN u2.username, COUNT(DISTINCT pMutualArtists) as mutualArtists
ORDER BY mutualArtists DESC
LIMIT 10

~4 秒钟

START me=node(553314), other=node:userLocations("withinDistance:[38.89037,-77.03196,80.467]")
OPTIONAL MATCH pMutualArtists=(me:User)-[ar1:LISTENS_TO]->(a:Artist)<-[ar2:LISTENS_TO]-(other:User)
RETURN other.username, COUNT(DISTINCT pMutualArtists) as mutualArtists
ORDER BY mutualArtists DESC
LIMIT 10

大约20秒

太奇怪了，好像“other”和“me”的命名节点导致查询时间大幅跳升。我很困惑。

谢谢， -Adam

- Adam Miskiewicz

2个回答

1

那听起来像是你看到了缓存的效果。第一次访问时，缓存没有被填充。随后查询命中相同图形将会更快，因为节点/关系已经在缓存中可用。

- Stefan Armbruster

1

谢谢您的回复。这正是我所认为的。但是，我可以连续运行长查询（包括我的和其他人的），并获得相同的结果，多次运行另一个查询也会产生相同的结果。此外，这些结果在服务器重新启动后仍然存在。 - Adam Miskiewicz

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- ulkas · Accepted Answer

使用OPTIONAL MATCH跟随WHERE other:User没有意义，因为结束节点other (u2)必须匹配。尝试在没有optional match和where以及最后一个with的情况下执行查询，简单地进行查询。

START me=node(553314), other=node:userLocations("withinDistance[38.89037,-77.03196,80.467]")
 MATCH
     pMutualArtists=(me:User)-[ar1:LISTENS_TO]->(a:Artist)<-[ar2:LISTENS_TO]-(other:User)   
 RETURN other.username, count(DISTINCT pMutualArtists) as mutualArtists
 ORDER BY mutualArtists DESC
 LIMIT 10