MongoDB - Index on date field not being used

Question

The events collection contains a userId and an array of events; each element of the array is an embedded document. For example:

{
    "_id" : ObjectId("4f8f48cf5f0d23945a4068ca"),
    "events" : [
            {
                    "eventType" : "profile-updated",
                    "eventId" : "247266",
                    "eventDate" : ISODate("1938-04-27T23:05:51.451Z"),
            },
           {
                   "eventType" : "login",
                   "eventId" : "64531",
                   "eventDate" : ISODate("1948-05-15T23:11:37.413Z"),
           }
    ],
    "userId" : "junit-19568842"
}

The following query is used to find events generated within the last 30 days:

db.events.find( { events : { $elemMatch: { "eventId" : 201, 
"eventDate" : {$gt : new Date(1231657163876) } } } }  ).explain()

The query plan shows that the index on "events.eventDate" is used when the test data contains few events (around 20):

{
    "cursor" : "BtreeCursor events.eventDate_1",
    "nscanned" : 0,
    "nscannedObjects" : 0,
    "n" : 0,
    "millis" : 0,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "isMultiKey" : true,
    "indexOnly" : false,
    "indexBounds" : {
            "events.eventDate" : [
                    [
                            ISODate("2009-01-11T06:59:23.876Z"),
                            ISODate("292278995-01--2147483647T07:12:56.808Z")
                    ]
            ]
    }
}

However, when the number of events is large (around 500), the index is not used:

{
    "cursor" : "BasicCursor",
    "nscanned" : 4,
    "nscannedObjects" : 4,
    "n" : 0,
    "millis" : 0,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "isMultiKey" : false,
    "indexOnly" : false,
    "indexBounds" : {

    }
}

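Why is the index not used when there are many events? Could it be that, with a large number of events, MongoDB finds it more efficient to scan all the documents than to use the index?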

- dsatish

Are you complaining that the optimizer isn't using an index for a query that returns in 0ms? :) - Eve Freeman

The explain output above is from a test collection. With about 20M documents, the query takes about 8 seconds. - dsatish

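A range query like this can be slow if it matches a large share of the documents in the collection. You can use a hint to force the index and compare speed, but I suspect it will be just as slow with an index scan. You should post explain output from the production data, with and without the hint. The problem is that if a few million documents match, checking them takes time. - Eve Freeman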

2 Answers

When $hint is used to force the "events.eventDate" index, nscannedObjects is higher than without the index.

Pseudocode for what happens when the index is used:

for(all entries in index matching the criteria) {
  get user object and scan to see if the eventId criteria is met
}

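About "all entries in index matching the criteria": every event is one entry in the index, so the index holds more entries than there are users. Suppose there are 4 user objects and a total of 7 events that match the criteria. The user objects are then scanned 7 times (the for loop runs 7 times). When the index is not used, each of the 4 user objects is inspected only once. So with the index, user objects are scanned more times than without it. Is this understanding correct?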
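The counting argument above can be sketched as a small simulation. This is a toy model of the mental model in this answer, not MongoDB's actual executor (which may, for instance, deduplicate documents reached through a multikey index). The sample data, `collectionScan`, and `naiveIndexScan` are hypothetical names made up for illustration; only the cutoff timestamp and the 4-users / 7-events figures come from the post.

```javascript
// Toy model only: NOT MongoDB's real query executor.
const cutoff = new Date(1231657163876); // same cutoff as in the query
const d = (iso) => new Date(iso);

// Hypothetical data: 4 user documents, 7 events after the cutoff in total.
const users = [
  { userId: "u1", events: [
    { eventId: "1", eventDate: d("2010-01-01T00:00:00Z") },
    { eventId: "2", eventDate: d("2011-01-01T00:00:00Z") } ] },
  { userId: "u2", events: [
    { eventId: "3", eventDate: d("2010-02-01T00:00:00Z") },
    { eventId: "4", eventDate: d("2011-02-01T00:00:00Z") } ] },
  { userId: "u3", events: [
    { eventId: "5", eventDate: d("2010-03-01T00:00:00Z") },
    { eventId: "6", eventDate: d("2011-03-01T00:00:00Z") },
    { eventId: "7", eventDate: d("1948-05-15T23:11:37Z") } ] }, // before cutoff
  { userId: "u4", events: [
    { eventId: "8", eventDate: d("2010-04-01T00:00:00Z") } ] },
];

// The $elemMatch condition: one array element must satisfy both clauses.
const matches = (e) => e.eventId === 201 && e.eventDate > cutoff;

// Collection scan: every document is fetched exactly once.
function collectionScan(docs) {
  let scannedObjects = 0;
  let n = 0;
  for (const doc of docs) {
    scannedObjects++;
    if (doc.events.some(matches)) n++;
  }
  return { scannedObjects, n };
}

// Naive multikey index scan: one index entry per array element, and the
// owning document is fetched once per entry inside the date bounds.
function naiveIndexScan(docs) {
  const index = [];
  for (const doc of docs) {
    for (const e of doc.events) index.push({ key: e.eventDate, doc });
  }
  index.sort((a, b) => a.key - b.key);

  let scannedObjects = 0;
  const matched = new Set();
  for (const entry of index) {
    if (entry.key <= cutoff) continue; // outside the index bounds
    scannedObjects++;                  // fetch the document and re-check it
    if (entry.doc.events.some(matches)) matched.add(entry.doc);
  }
  return { scannedObjects, n: matched.size };
}

console.log(collectionScan(users)); // 4 objects scanned
console.log(naiveIndexScan(users)); // 7 objects scanned
```

Under these assumptions the date bound alone selects 7 index entries spread over 4 documents, so the index path touches more objects than the plain scan, which is the pattern the explain output shows.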

db.events.find( { events : { $elemMatch: { "eventId" : 201, 
"eventDate" : {$gt : new Date(1231657163876) } } } }  )
._addSpecial("$hint",{"events.eventDate":1}).explain()

{
    "cursor" : "BasicCursor",
    "nscanned" : 7,
    "nscannedObjects" : 7,
    "n" : 0,
    "millis" : 0,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "isMultiKey" : false,
    "indexOnly" : false,
    "indexBounds" : {

    }
}

- dsatish

- Sergio Tulentsev · Accepted Answer

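MongoDB's query optimizer works in a somewhat unusual way. Instead of calculating the cost of each query plan, it starts all available plans and picks whichever returns first. It treats that plan as the optimal one and uses it for subsequent queries.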

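As the application grows and the data grows and changes, the chosen plan may stop being optimal at some point. Mongo therefore repeats the plan selection process periodically.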

It looks like in this particular case a basic scan was the most efficient.
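To make the "run every plan, keep the first to finish" idea concrete, here is a toy sketch. It is not MongoDB's implementation: `plan`, `racePlans`, `choosePlan`, the cache, and the work-unit counts are all hypothetical, and each plan is reduced to a number of scan steps. The plan names are borrowed from the explain output in the question.

```javascript
// Toy model only: NOT MongoDB's real optimizer.
// A plan is modelled as a generator that yields once per unit of scan work.
function* plan(workUnits) {
  for (let i = 0; i < workUnits; i++) yield;
  return "done";
}

// Step all candidate plans in lockstep; the first one to finish wins.
function racePlans(candidates) {
  const running = Object.entries(candidates).map(
    ([name, units]) => ({ name, it: plan(units) })
  );
  for (;;) {
    for (const p of running) {
      if (p.it.next().done) return p.name;
    }
  }
}

// Remember the winner per query shape so later queries skip the race.
const planCache = new Map();
function choosePlan(queryShape, candidates) {
  if (!planCache.has(queryShape)) {
    planCache.set(queryShape, racePlans(candidates));
  }
  return planCache.get(queryShape);
}

const shape = "events $elemMatch {eventId, eventDate $gt}";

// Small test collection: 7 index entries to check vs 4 documents to scan.
const small = { "BtreeCursor events.eventDate_1": 7, "BasicCursor": 4 };
const first = choosePlan(shape, small);

// The data grows, but the cached winner is still reused...
const grown = { "BtreeCursor events.eventDate_1": 50, "BasicCursor": 20000 };
const cached = choosePlan(shape, grown);

// ...until the selection is repeated (modelled here by clearing the cache).
planCache.delete(shape);
const reselected = choosePlan(shape, grown);

console.log(first, "|", cached, "|", reselected);
```

In this model the basic scan wins on the small data set because it needs fewer steps, and the index plan only takes over once the race is rerun against the larger data.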

Link: http://www.mongodb.org/display/DOCS/Query+Optimizer