Mongodb - 没有结果返回时性能差

17

我有一个包含大约700万个表示地点的Mongodb集合。

我的查询是根据某个特定位置附近以某个前缀开头的地点的名称进行搜索。

我们已经创建了下面描述的复合索引来加速搜索。

当查询找到匹配项(即使只有一个)时,查询速度非常快(约20毫秒)。但是,当没有匹配项时,查询执行可能需要30秒。

请协助解决问题。

具体而言:

每个地点(geoData)都有以下字段:

"loc" - a GeoJSON point that represent the location
"categoriesIds" - array of int ids
"name" - the name of the placee

此集合上定义了以下索引:

{
  "loc" : "2dsphere",
  "categoriesIds" : 1,
  "name" : 1
}

查询内容为:

db.geoData.find({
  "loc":{
    "$near":{
      "$geometry":{
        "type": "Point" ,
        "coordinates": [ -0.10675191879272461 , 51.531600743186644]
      },
      "$maxDistance": 5000.0
    }
  }, 
  "categoriesIds":{
    "$in": [ 1 , 2 , 71 , 70 , 74 , 72 , 73 , 69 , 44 , 26 , 27 , 33 , 43 , 45 , 53 , 79]
  }, 
  "name":{ "$regex": "^Cafe Ne"}
})

执行统计信息 (链接到完整的解释结果)

    "executionStats" : {
    "executionSuccess" : true,
    "nReturned" : 1,
    "executionTimeMillis" : 169,
    "totalKeysExamined" : 14333,
    "totalDocsExamined" : 1,
    "executionStages" : {
        "stage" : "GEO_NEAR_2DSPHERE",
        "nReturned" : 1,
        "executionTimeMillisEstimate" : 60,
        "works" : 14354,
        "advanced" : 1,
        "needTime" : 14351,
        "needFetch" : 0,
        "saveState" : 361,
        "restoreState" : 361,
        "isEOF" : 1,
        "invalidates" : 0,
        "keyPattern" : {
            "loc" : "2dsphere",
            "categoriesIds" : 1,
            "name" : 1
        },
        "indexName" : "loc_2dsphere_categoriesIds_1_name_1",
        "searchIntervals" : [ 
            {
                "minDistance" : 0,
                "maxDistance" : 3408.329295346151,
                "maxInclusive" : false
            }, 
            {
                "minDistance" : 3408.329295346151,
                "maxDistance" : 5000,
                "maxInclusive" : true
            }
        ],
        "inputStages" : [ 
            {
                "stage" : "FETCH",
                "nReturned" : 1,
                "executionTimeMillisEstimate" : 20,
                "works" : 6413,
                "advanced" : 1,
                "needTime" : 6411,
                "needFetch" : 0,
                "saveState" : 361,
                "restoreState" : 361,
                "isEOF" : 1,
                "invalidates" : 0,
                "docsExamined" : 1,
                "alreadyHasObj" : 0,
                "inputStage" : {
                    "stage" : "IXSCAN",
                    "filter" : {
                        "TwoDSphereKeyInRegionExpression" : true
                    },
                    "nReturned" : 1,
                    "executionTimeMillisEstimate" : 20,
                    "works" : 6413,
                    "advanced" : 1,
                    "needTime" : 6411,
                    "needFetch" : 0,
                    "saveState" : 361,
                    "restoreState" : 361,
                    "isEOF" : 1,
                    "invalidates" : 0,
                    "keyPattern" : {
                        "loc" : "2dsphere",
                        "categoriesIds" : 1,
                        "name" : 1
                    },
                    "indexName" : "loc_2dsphere_categoriesIds_1_name_1",
                    "isMultiKey" : true,
                    "direction" : "forward",
                    "indexBounds" : {
                        "loc" : [ 
                            "[\"2f1003230\", \"2f1003230\"]", 
                            "[\"2f10032300\", \"2f10032300\"]", 
                            "[\"2f100323000\", \"2f100323000\"]", 
                            "[\"2f1003230001\", \"2f1003230001\"]", 
                            "[\"2f10032300012\", \"2f10032300013\")", 
                            "[\"2f1003230002\", \"2f1003230002\"]", 
                            "[\"2f10032300021\", \"2f10032300022\")", 
                            "[\"2f10032300022\", \"2f10032300023\")", 
                            "[\"2f100323003\", \"2f100323003\"]", 
                            "[\"2f1003230031\", \"2f1003230031\"]", 
                            "[\"2f10032300311\", \"2f10032300312\")", 
                            "[\"2f10032300312\", \"2f10032300313\")", 
                            "[\"2f10032300313\", \"2f10032300314\")", 
                            "[\"2f1003230032\", \"2f1003230032\"]", 
                            "[\"2f10032300320\", \"2f10032300321\")", 
                            "[\"2f10032300321\", \"2f10032300322\")"
                        ],
                        "categoriesIds" : [ 
                            "[1.0, 1.0]", 
                            "[2.0, 2.0]", 
                            "[26.0, 26.0]", 
                            "[27.0, 27.0]", 
                            "[33.0, 33.0]", 
                            "[43.0, 43.0]", 
                            "[44.0, 44.0]", 
                            "[45.0, 45.0]", 
                            "[53.0, 53.0]", 
                            "[69.0, 69.0]", 
                            "[70.0, 70.0]", 
                            "[71.0, 71.0]", 
                            "[72.0, 72.0]", 
                            "[73.0, 73.0]", 
                            "[74.0, 74.0]", 
                            "[79.0, 79.0]"
                        ],
                        "name" : [ 
                            "[\"Cafe Ne\", \"Cafe Nf\")", 
                            "[/^Cafe Ne/, /^Cafe Ne/]"
                        ]
                    },
                    "keysExamined" : 6412,
                    "dupsTested" : 0,
                    "dupsDropped" : 0,
                    "seenInvalidated" : 0,
                    "matchTested" : 1
                }
            }, 
            {
                "stage" : "FETCH",
                "nReturned" : 0,
                "executionTimeMillisEstimate" : 40,
                "works" : 7922,
                "advanced" : 0,
                "needTime" : 7921,
                "needFetch" : 0,
                "saveState" : 261,
                "restoreState" : 261,
                "isEOF" : 1,
                "invalidates" : 0,
                "docsExamined" : 0,
                "alreadyHasObj" : 0,
                "inputStage" : {
                    "stage" : "IXSCAN",
                    "filter" : {
                        "TwoDSphereKeyInRegionExpression" : true
                    },
                    "nReturned" : 0,
                    "executionTimeMillisEstimate" : 40,
                    "works" : 7922,
                    "advanced" : 0,
                    "needTime" : 7921,
                    "needFetch" : 0,
                    "saveState" : 261,
                    "restoreState" : 261,
                    "isEOF" : 1,
                    "invalidates" : 0,
                    "keyPattern" : {
                        "loc" : "2dsphere",
                        "categoriesIds" : 1,
                        "name" : 1
                    },
                    "indexName" : "loc_2dsphere_categoriesIds_1_name_1",
                    "isMultiKey" : true,
                    "direction" : "forward",
                    "indexBounds" : {
                        "loc" : [ 
                            "[\"2f1003230\", \"2f1003230\"]", 
                            "[\"2f10032300\", \"2f10032300\"]", 
                            "[\"2f100323000\", \"2f100323000\"]", 
                            "[\"2f1003230001\", \"2f1003230001\"]", 
                            "[\"2f10032300011\", \"2f10032300012\")", 
                            "[\"2f10032300012\", \"2f10032300013\")", 
                            "[\"2f1003230002\", \"2f1003230002\"]", 
                            "[\"2f10032300021\", \"2f10032300022\")", 
                            "[\"2f10032300022\", \"2f10032300023\")", 
                            "[\"2f100323003\", \"2f100323003\"]", 
                            "[\"2f1003230031\", \"2f1003230032\")", 
                            "[\"2f1003230032\", \"2f1003230032\"]", 
                            "[\"2f10032300320\", \"2f10032300321\")", 
                            "[\"2f10032300321\", \"2f10032300322\")", 
                            "[\"2f10032300322\", \"2f10032300323\")"
                        ],
                        "categoriesIds" : [ 
                            "[1.0, 1.0]", 
                            "[2.0, 2.0]", 
                            "[26.0, 26.0]", 
                            "[27.0, 27.0]", 
                            "[33.0, 33.0]", 
                            "[43.0, 43.0]", 
                            "[44.0, 44.0]", 
                            "[45.0, 45.0]", 
                            "[53.0, 53.0]", 
                            "[69.0, 69.0]", 
                            "[70.0, 70.0]", 
                            "[71.0, 71.0]", 
                            "[72.0, 72.0]", 
                            "[73.0, 73.0]", 
                            "[74.0, 74.0]", 
                            "[79.0, 79.0]"
                        ],
                        "name" : [ 
                            "[\"Cafe Ne\", \"Cafe Nf\")", 
                            "[/^Cafe Ne/, /^Cafe Ne/]"
                        ]
                    },
                    "keysExamined" : 7921,
                    "dupsTested" : 0,
                    "dupsDropped" : 0,
                    "seenInvalidated" : 0,
                    "matchTested" : 0
                }
            }
        ]
    },

搜索"CafeNeeNNN"而不是"Cafe Ne"时的执行统计数据 (链接整个解释结果

 "executionStats" : {
    "executionSuccess" : true,
    "nReturned" : 0,
    "executionTimeMillis" : 2537,
    "totalKeysExamined" : 232259,
    "totalDocsExamined" : 162658,
    "executionStages" : {
        "stage" : "FETCH",
        "filter" : {
            "$and" : [ 
                {
                    "name" : /^CafeNeeNNN/
                }, 
                {
                    "categoriesIds" : {
                        "$in" : [ 
                            1, 
                            2, 
                            26, 
                            27, 
                            33, 
                            43, 
                            44, 
                            45, 
                            53, 
                            69, 
                            70, 
                            71, 
                            72, 
                            73, 
                            74, 
                            79
                        ]
                    }
                }
            ]
        },
        "nReturned" : 0,
        "executionTimeMillisEstimate" : 1330,
        "works" : 302752,
        "advanced" : 0,
        "needTime" : 302750,
        "needFetch" : 0,
        "saveState" : 4731,
        "restoreState" : 4731,
        "isEOF" : 1,
        "invalidates" : 0,
        "docsExamined" : 70486,
        "alreadyHasObj" : 70486,
        "inputStage" : {
            "stage" : "GEO_NEAR_2DSPHERE",
            "nReturned" : 70486,
            "executionTimeMillisEstimate" : 1290,
            "works" : 302751,
            "advanced" : 70486,
            "needTime" : 232264,
            "needFetch" : 0,
            "saveState" : 4731,
            "restoreState" : 4731,
            "isEOF" : 1,
            "invalidates" : 0,
            "keyPattern" : {
                "loc" : "2dsphere"
            },
            "indexName" : "loc_2dsphere",
            "searchIntervals" : [ 
                {
                    "minDistance" : 0,
                    "maxDistance" : 3408.329295346151,
                    "maxInclusive" : false
                }, 
                {
                    "minDistance" : 3408.329295346151,
                    "maxDistance" : 5000,
                    "maxInclusive" : true
                }
            ],
            "inputStages" : [ 
                {
                    "stage" : "FETCH",
                    "nReturned" : 44540,
                    "executionTimeMillisEstimate" : 110,
                    "works" : 102690,
                    "advanced" : 44540,
                    "needTime" : 58149,
                    "needFetch" : 0,
                    "saveState" : 4731,
                    "restoreState" : 4731,
                    "isEOF" : 1,
                    "invalidates" : 0,
                    "docsExamined" : 44540,
                    "alreadyHasObj" : 0,
                    "inputStage" : {
                        "stage" : "IXSCAN",
                        "filter" : {
                            "TwoDSphereKeyInRegionExpression" : true
                        },
                        "nReturned" : 44540,
                        "executionTimeMillisEstimate" : 90,
                        "works" : 102690,
                        "advanced" : 44540,
                        "needTime" : 58149,
                        "needFetch" : 0,
                        "saveState" : 4731,
                        "restoreState" : 4731,
                        "isEOF" : 1,
                        "invalidates" : 0,
                        "keyPattern" : {
                            "loc" : "2dsphere"
                        },
                        "indexName" : "loc_2dsphere",
                        "isMultiKey" : false,
                        "direction" : "forward",
                        "indexBounds" : {
                            "loc" : [ 
                                "[\"2f1003230\", \"2f1003230\"]", 
                                "[\"2f10032300\", \"2f10032300\"]", 
                                "[\"2f100323000\", \"2f100323000\"]", 
                                "[\"2f1003230001\", \"2f1003230001\"]", 
                                "[\"2f10032300012\", \"2f10032300013\")", 
                                "[\"2f1003230002\", \"2f1003230002\"]", 
                                "[\"2f10032300021\", \"2f10032300022\")", 
                                "[\"2f10032300022\", \"2f10032300023\")", 
                                "[\"2f100323003\", \"2f100323003\"]", 
                                "[\"2f1003230031\", \"2f1003230031\"]", 
                                "[\"2f10032300311\", \"2f10032300312\")", 
                                "[\"2f10032300312\", \"2f10032300313\")", 
                                "[\"2f10032300313\", \"2f10032300314\")", 
                                "[\"2f1003230032\", \"2f1003230032\"]", 
                                "[\"2f10032300320\", \"2f10032300321\")", 
                                "[\"2f10032300321\", \"2f10032300322\")"
                            ]
                        },
                        "keysExamined" : 102689,
                        "dupsTested" : 0,
                        "dupsDropped" : 0,
                        "seenInvalidated" : 0,
                        "matchTested" : 44540
                    }
                }, 
                {
                    "stage" : "FETCH",
                    "nReturned" : 47632,
                    "executionTimeMillisEstimate" : 250,
                    "works" : 129571,
                    "advanced" : 47632,
                    "needTime" : 81938,
                    "needFetch" : 0,
                    "saveState" : 2556,
                    "restoreState" : 2556,
                    "isEOF" : 1,
                    "invalidates" : 0,
                    "docsExamined" : 47632,
                    "alreadyHasObj" : 0,
                    "inputStage" : {
                        "stage" : "IXSCAN",
                        "filter" : {
                            "TwoDSphereKeyInRegionExpression" : true
                        },
                        "nReturned" : 47632,
                        "executionTimeMillisEstimate" : 230,
                        "works" : 129571,
                        "advanced" : 47632,
                        "needTime" : 81938,
                        "needFetch" : 0,
                        "saveState" : 2556,
                        "restoreState" : 2556,
                        "isEOF" : 1,
                        "invalidates" : 0,
                        "keyPattern" : {
                            "loc" : "2dsphere"
                        },
                        "indexName" : "loc_2dsphere",
                        "isMultiKey" : false,
                        "direction" : "forward",
                        "indexBounds" : {
                            "loc" : [ 
                                "[\"2f1003230\", \"2f1003230\"]", 
                                "[\"2f10032300\", \"2f10032300\"]", 
                                "[\"2f100323000\", \"2f100323000\"]", 
                                "[\"2f1003230001\", \"2f1003230001\"]", 
                                "[\"2f10032300011\", \"2f10032300012\")", 
                                "[\"2f10032300012\", \"2f10032300013\")", 
                                "[\"2f1003230002\", \"2f1003230002\"]", 
                                "[\"2f10032300021\", \"2f10032300022\")", 
                                "[\"2f10032300022\", \"2f10032300023\")", 
                                "[\"2f100323003\", \"2f100323003\"]", 
                                "[\"2f1003230031\", \"2f1003230032\")", 
                                "[\"2f1003230032\", \"2f1003230032\"]", 
                                "[\"2f10032300320\", \"2f10032300321\")", 
                                "[\"2f10032300321\", \"2f10032300322\")", 
                                "[\"2f10032300322\", \"2f10032300323\")"
                            ]
                        },
                        "keysExamined" : 129570,
                        "dupsTested" : 0,
                        "dupsDropped" : 0,
                        "seenInvalidated" : 0,
                        "matchTested" : 47632
                    }
                }
            ]
        }
    },

集合上的索引

{
"0" : {
    "v" : 1,
    "key" : {
        "_id" : 1
    },
    "name" : "_id_",
    "ns" : "wego.geoData"
},
"1" : {
    "v" : 1,
    "key" : {
        "srcId" : 1
    },
    "name" : "srcId_1",
    "ns" : "wego.geoData"
},
"2" : {
    "v" : 1,
    "key" : {
        "loc" : "2dsphere"
    },
    "name" : "loc_2dsphere",
    "ns" : "wego.geoData",
    "2dsphereIndexVersion" : 2
},
"3" : {
    "v" : 1,
    "key" : {
        "name" : 1
    },
    "name" : "name_1",
    "ns" : "wego.geoData"
},
"4" : {
    "v" : 1,
    "key" : {
        "loc" : "2dsphere",
        "categoriesIds" : 1,
        "name" : 1
    },
    "name" : "loc_2dsphere_categoriesIds_1_name_1",
    "ns" : "wego.geoData",
    "2dsphereIndexVersion" : 2
},
"5" : {
    "v" : 1,
    "key" : {
        "loc" : "2dsphere",
        "categoriesIds" : 1,
        "keywords" : 1
    },
    "name" : "loc_2dsphere_categoriesIds_1_keywords_1",
    "ns" : "wego.geoData",
    "2dsphereIndexVersion" : 2
}
}

收集统计链接


你能否也发布两个查询的explain()中的“queryPlanner”部分? - joao
我给整个“解释”结果添加了链接。 - Eliezer
我在集合上添加了索引列表。 - Eliezer
尝试移除 loc_2dsphere_categoriesIds_1_name_1 索引,并改为在 locname 上创建复合索引。我认为这样可以提供足够的选择性,使索引有效,而无需包含 categoriesIds 的多键膨胀。 - JohnnyHK
你尝试过使用不同的索引吗,就像建议的那样? - Laizer
显示剩余10条评论
4个回答

10

我要进行一些猜测,然后评论一下你的设计。

首先,当你在键上创建一个包含值数组的索引时,你会为数组的每个元素创建一个记录:

为了对保存数组值的字段创建索引,MongoDB 为数组中的每个元素创建一个索引键。

这是来自MongoDB有关索引的文档

所以, 如果你的典型记录超过了几个类别并且你有7百万条记录,那么你的索引会变得很大,并且扫描索引本身以找到索引不包含所需内容的时间也会很长。它仍然比集合扫描快,但与查找现有记录所需的时间相比,它确实很慢。

现在,让我评论一下你的架构设计。这是一种风格问题,所以可以忽略这部分。

你有一个可能属于17个类别的记录。这有点压倒性,也滥用了术语“类别”。一个类别是一个具体的分类,一种快速将东西与一组东西联系起来的方式。什么东西属于这么多组呢?举个例子,你的记录Cafe Ne。我假设在现实世界中 - 请记住,编程和应用程序在解决现实世界问题时的效果最好 - Cafe Ne可能是一家餐厅、咖啡厅、爵士酒吧、晚餐。除非在我不知道的语言中咖啡馆就是汽车,否则肯定不是车库。我几乎无法想象它是银行或牙科诊所。我必须真正努力才能找到10个以上有意义的类别,让用户通过它们搜索咖啡馆。

我的观点是,即使mongodb允许你设计这样的东西,也并不意味着你必须这样做。尝试缩小你拥有和寻找的类别数量,你将拥有更好的性能。


1
索引大小不是问题。对于现有文档,响应非常快。问题在于当它无法通过索引找到文档时,它会扫描数万个文档,而不是放弃并返回无结果。 - Eliezer
关于每个文档的类别数量,平均每个文档大约有1.1个类别,因此它不会使索引膨胀太多。当一个文档在查询列表中至少有一个类别时,“$in”条件为真。 - Eliezer
@Eliezer,如果你的搜索条件模糊,那么搜索索引的时间也会相应增加。你能否发布一下你的索引大小? - oz123
我在问题末尾添加了一个链接,指向geoData.stats()输出。"loc_2dsphere_categoriesIds_1_name_1"索引比"id"索引大约2.5倍。 - Eliezer
1
实际上,我进行了一些测试。使用 此 mgenerate 模板 生成的文档,平均索引条目大小为 3732 字节,总共达到惊人的 26.1GB 索引大小。我认为这解决了这个子问题。扩展 @Oz123 的观点,我猜工作集(也称为“现有”文档)可能包含大多数要返回的文档,而游标(按需加载文档)只是在后台读取其余文档。如果没有匹配项,则在找到第一个文档后,没有游标可以返回,因此我们需要进行一些磁盘操作。 - Markus W Mahlberg
@MarkusWMahlberg 看起来你的评论可以作为一个答案。 - Tarlog

2
正如JohnnyHK在评论中建议的,以及Oz123在他的回答中指出的,这里的问题似乎是一个索引已经变得过大,无法作为索引良好地执行。我认为除了已经指出的类别扩展问题之外,索引中字段的排序也会造成麻烦。复合索引是按字段顺序构建的,将name放在categoriesIds之后使得在name上查询更加昂贵。
很明显,你需要调整你的索引。如何调整取决于你期望支持的查询类型。特别是,我不确定你是否会从locname的复合索引中看到更好的性能,还是从单独的索引中获得更好的性能,一个用于loc,一个用于name。Mongo本身对于何时最好使用复合索引,何时最好使用单独的索引并依靠索引交集有些含糊不清。
我的直觉是,单独的索引会表现更好,但我会测试两种情况。如果你预计也需要按类别查询,而没有可以缩小查询范围的“名称”或“位置”字段,则最好创建一个单独的“categoriesIds”索引。

1

复合索引中字段的顺序非常重要。如果没有实际数据和使用模式的访问权限,很难进行诊断,但是该关键字可能会增加(或不增加)仅使用索引匹配文档的几率:

{
  "loc" : "2dsphere",
  "name" : 1,
  "categoriesIds" : 1
}

0
不确定是否完全相同,但我们遇到了一个类似的问题,涉及一个 多键索引 ,当没有结果时性能很差。
这实际上是一个Mongo bug,在v3.3.8中已得到修复。 https://jira.mongodb.org/browse/SERVER-15086 在升级Mongo并重建索引后,我们解决了问题。

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接