Cannot create index in MongoDB, "key too large to index"

Question

I am creating an index in MongoDB on a collection with 10 million records, but I get the following error:

db.logcollection.ensureIndex({"Module":1})
{
        "createdCollectionAutomatically" : false,
        "numIndexesBefore" : 3,
        "ok" : 0,
        "errmsg" : "Btree::insert: key too large to index, failing play.logcollection.$Module_1 1100 { : \"RezGainUISystem.Net.WebException: The request was aborted: The request was canceled.\r\n   at System.Net.ConnectStream.InternalWrite(Boolean async, Byte...\" }",
        "code" : 17282
}

Please help me figure out how to create this index in MongoDB.

- Sandeep.maurya

You are trying to index "Module". I think its content is too large for a normal index to handle. - NHK

This can also be caused by having both a text index and a standard index on the same field. Dropping one of them may solve the problem. - Gabriel Fair

5 Answers

You can turn off this behavior by starting the mongod instance with the following option:

mongod --setParameter failIndexKeyTooLong=false

or by running the following command in the mongo shell:

db.getSiblingDB('admin').runCommand( { setParameter: 1, failIndexKeyTooLong: false } )

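If you are sure that your field only rarely exceeds the limit, one way to work around it is to split the field that pushes the index over the limit into several parts, each under 1 KB in byte length. For example, I would split a field val into a tuple of fields val_1, val_2, and so on. MongoDB stores text as valid UTF-8, which means you need a function that splits a UTF-8 string correctly: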

def split_utf8(s, n):
    """
    Split the string s into UTF-8 encoded chunks of at most n bytes each,
    never cutting a multi-byte character in half (n must be at least 4).

    (b & 0xc0) == 0x80 checks whether byte b is a continuation byte (the tail
    of a multi-byte character) rather than a byte that starts a character.

    An interesting aside, by the way. You can classify bytes in a UTF-8 stream as follows:

    With the high bit set to 0, it's a single byte value.
    With the two high bits set to 10, it's a continuation byte.
    Otherwise, it's the first byte of a multi-byte sequence and the number of leading 1 bits indicates how many bytes there are in total for this sequence (110... means two bytes, 1110... means three bytes, etc).
    """
    s = s.encode('utf-8')
    while len(s) > n:
        k = n
        # Back up until s[k] starts a character, so that s[:k] ends on a
        # character boundary. In Python 3, indexing bytes already yields an int.
        while (s[k] & 0xc0) == 0x80:
            k -= 1
        yield s[:k]
        s = s[k:]
    yield s
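A short usage sketch of the approach above in Python 3. It re-declares split_utf8 so the block runs on its own, and adds a hypothetical helper split_field that writes the chunks into val_1, val_2, ... fields. The helper name and the 1000-byte chunk size are assumptions for illustration; each chunk is decoded back to str so every stored field holds valid UTF-8 text.

```python
def split_utf8(s, n):
    """Yield UTF-8 byte chunks of at most n bytes (n >= 4), never splitting a character."""
    data = s.encode("utf-8")
    while len(data) > n:
        k = n
        # Back up while data[k] is a continuation byte (10xxxxxx).
        while (data[k] & 0xC0) == 0x80:
            k -= 1
        yield data[:k]
        data = data[k:]
    yield data


def split_field(doc, field, n=1000):
    """Hypothetical helper: replace doc[field] with field_1, field_2, ... chunks."""
    out = {k: v for k, v in doc.items() if k != field}
    for i, chunk in enumerate(split_utf8(doc[field], n), start=1):
        # Each chunk ends on a character boundary, so decoding cannot fail.
        out[f"{field}_{i}"] = chunk.decode("utf-8")
    return out


# "a" shifts every 2-byte "é" by one byte, forcing the first split to back up.
parts = split_field({"_id": 1, "val": "a" + "é" * 1500}, "val")
print([len(parts[f"val_{i}"].encode("utf-8")) for i in range(1, 5)])
# → [999, 1000, 1000, 2]
```

The first chunk is 999 bytes rather than 1000 because byte 1000 would have fallen in the middle of an "é".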

Then you can define a compound index:

db.coll.ensureIndex({val_1: 1, val_2: 1, ...}, {background: true})

or multiple indexes, one per val_i:

db.coll.ensureIndex({val_1: 1}, {background: true})
db.coll.ensureIndex({val_2: 1}, {background: true})
...
db.coll.ensureIndex({val_i: 1}, {background: true})

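Important: if you plan to use these fields in a compound index, pay attention to the second argument of split_utf8. For each document you must subtract the total byte size of the other field values that make up the index key; for example, for an index (a: 1, b: 1, val: 1), use 1024 - sizeof(value(a)) - sizeof(value(b)).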
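A rough Python sketch of that per-document budget, assuming the other indexed fields hold strings and counting only their UTF-8 byte length. Real index entries also carry BSON type and structural overhead, so the true budget is somewhat smaller; the helper name chunk_budget and its margin parameter are hypothetical.

```python
INDEX_KEY_LIMIT = 1024  # bytes, the limit discussed in this thread


def utf8_len(value):
    """Byte length of a string as stored in UTF-8."""
    return len(value.encode("utf-8"))


def chunk_budget(doc, other_fields, margin=0):
    """Hypothetical helper: bytes left for the val_i chunk in an index such as
    (a: 1, b: 1, val: 1), i.e. 1024 - sizeof(a) - sizeof(b) - margin.

    Assumes every field in other_fields holds a string. `margin` is an
    assumed allowance for BSON overhead, not an official figure.
    """
    used = sum(utf8_len(doc[f]) for f in other_fields)
    return INDEX_KEY_LIMIT - used - margin


doc = {"a": "north", "b": "2015-06-01", "val": "x" * 5000}
print(chunk_budget(doc, ["a", "b"]))             # 1024 - 5 - 10 = 1009
print(chunk_budget(doc, ["a", "b"], margin=24))  # 985
```

The result would then be passed as n to split_utf8 for that document.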

In all other cases, use a hashed or text index.

- Rustem K

Creating a compound index won't work, because the 1024-byte limit applies to the size of the whole index key, not to each field in it. - JohnnyHK

@JohnnyHK You're right. See the "Important" note; I've edited the answer. - Rustem K

In my project I have indexes with 4-5 dimensions, and this approach works very well :) - Rustem K

Folks, shall we keep this as a must-have answer? - Rustem K

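As others have pointed out in their answers, the error key too large to index means you are trying to create an index on a field (or fields) whose value exceeds 1024 bytes.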

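In ASCII, where each character takes one byte, 1024 bytes is roughly 1024 characters; text containing multi-byte UTF-8 characters reaches the limit with fewer characters.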
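To see why the character count is only a rough guide, here is a small Python sketch that measures a value's UTF-8 byte length and flags documents whose field would hit the limit. It ignores BSON overhead, so it can under-report slightly; find_oversized is a hypothetical helper, not a MongoDB API.

```python
LIMIT = 1024  # index key limit in bytes discussed in this thread


def utf8_len(s):
    return len(s.encode("utf-8"))


def find_oversized(docs, field, limit=LIMIT):
    """Return _ids of documents whose string `field` has >= limit UTF-8 bytes.

    Only the raw string bytes are counted; BSON overhead is ignored,
    so values just under the limit may still fail in MongoDB.
    """
    return [d["_id"] for d in docs
            if isinstance(d.get(field), str) and utf8_len(d[field]) >= limit]


print(utf8_len("a" * 1024))  # 1024: ASCII is 1 byte per character
print(utf8_len("中" * 342))  # 1026: these CJK characters take 3 bytes each
docs = [{"_id": 0, "Module": "abc"}, {"_id": 1, "Module": "中" * 342}]
print(find_oversized(docs, "Module"))  # [1]
```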

This is an inherent limit set by MongoDB, so there is no way around it, as the MongoDB Limits and Thresholds page states:

The total size of an index entry, which can include structural overhead depending on the BSON type, must be less than 1024 bytes.

Setting failIndexKeyTooLong to false is not a solution either, as the server parameters manual page explains:

...the operations would successfully insert or modify a document, but the index or indexes would not include references to the document.

In other words, the offending document is left out of the index, so it can silently go missing from query results.

For example:

> db.test.insert({_id: 0, a: "abc"})

> db.test.insert({_id: 1, a: "def"})

> db.test.insert({_id: 2, a: <string more than 1024 characters long>})

> db.adminCommand( { setParameter: 1, failIndexKeyTooLong: false } )

> db.test.createIndex({a: 1})

> db.test.find()
{"_id": 0, "a": "abc"}
{"_id": 1, "a": "def"}
{"_id": 2, "a": <string more than 1024 characters long>}
Fetched 3 record(s) in 2ms

> db.test.find({a: {$ne: "abc"}})
{"_id": 1, "a": "def"}
Fetched 1 record(s) in 1ms

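Because MongoDB was told to ignore the failIndexKeyTooLong error, the last query does not return the offending document (the document with _id: 2 is missing from the result), so the query produced an incorrect result set.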
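The effect above can be modeled with a toy in-memory index in Python. This is only a simulation, assuming an index that silently skips keys of 1024 bytes or more (the failIndexKeyTooLong: false behavior from the quoted manual page); it is not how MongoDB actually stores indexes or plans queries.

```python
LIMIT = 1024

docs = [
    {"_id": 0, "a": "abc"},
    {"_id": 1, "a": "def"},
    {"_id": 2, "a": "x" * 2000},  # stands in for the >1024-character string
]

# Toy index on "a": value -> list of _ids. Oversized keys are skipped, the way
# failIndexKeyTooLong: false lets the write succeed but leaves no index entry.
index = {}
for d in docs:
    if len(d["a"].encode("utf-8")) < LIMIT:
        index.setdefault(d["a"], []).append(d["_id"])


def scan_ne(value):
    """Collection scan: checks every document."""
    return [d["_id"] for d in docs if d["a"] != value]


def index_ne(value):
    """Index-backed query: can only return documents that have index entries."""
    return sorted(i for k, ids in index.items() if k != value for i in ids)


print(scan_ne("abc"))   # [1, 2]
print(index_ne("abc"))  # [1]  <- _id 2 is silently missing
```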

- kevinadi

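When you hit the "index key limit", the solution depends on what your schema needs. Only in very rare cases is key matching on values larger than 1024 bytes an actual design requirement. In fact, nearly every database imposes an index key limit, though in traditional relational databases (Oracle/MySQL/PostgreSQL) it is usually configurable, which makes it easy to shoot yourself in the foot.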

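For fast searching, a "text" index is designed to optimize search and pattern matching over long text fields and fits that use case well. More often, however, what you need is a uniqueness constraint on long text values, and a "text" index does not behave like a unique scalar value with the unique flag set (it behaves more like an array of all the text strings in the field).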

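Inspired by MongoDB's GridFS, you can easily implement a uniqueness check by adding an "md5" field to the document and creating a unique scalar index on it, a bit like a custom unique hashed index. This lets you search text fields of practically unlimited length (up to the ~16 MB document size) and keep them unique across the whole collection.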
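The same idea in Python, as a minimal sketch: compute an MD5 hex digest of the text into an md5 field (the field name follows the answer) and enforce uniqueness on that short, fixed-size value. The set-based class below only stands in for a unique index on md5; with MongoDB you would create that index instead. A digest match means the texts are almost certainly identical, not guaranteed so.

```python
import hashlib


def with_md5(doc):
    """Return a copy of doc with an md5 field derived from doc["text"]."""
    digest = hashlib.md5(doc["text"].encode("utf-8")).hexdigest()
    return {**doc, "md5": digest}


class UniqueMd5Store:
    """Toy stand-in for a collection with a unique index on md5."""

    def __init__(self):
        self.seen = set()
        self.docs = []

    def insert(self, doc):
        doc = with_md5(doc)
        if doc["md5"] in self.seen:
            raise ValueError("duplicate key on md5")  # like a unique-index error
        self.seen.add(doc["md5"])
        self.docs.append(doc)
        return doc


store = UniqueMd5Store()
long_text = "lorem ipsum " * 10000  # 120,000 characters, far above 1024 bytes
print(len(store.insert({"text": long_text})["md5"]))  # 32 hex characters
```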

const md5 = require('md5');
const mongoose = require('mongoose');

let Schema = new mongoose.Schema({
  text: {
    type: String,
    required: true,
    trim: true,
    // Setter: whenever text is assigned, also store its MD5 digest in md5.
    set: function(v) {
        this.md5 = md5(v);
        return v;
    }
  },
  md5: {
    type: String,
    required: true,
    trim: true
  }
});

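// Unique index on the short, fixed-size digest enforces uniqueness of text.
Schema.index({ md5: 1 }, { unique: true });
// Text index on the full text for searching.
Schema.index({ text: "text" }, { background: true });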

- JoelABair

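In my case, I was trying to index a large array of subdocuments. When I looked at my query, I found it was actually querying a sub-property of a sub-property, so I changed the index to target that nested property, and then it worked.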

Here `goals` is the large array of subdocuments, the index that failed with "key too large" was `{"goals": 1, "emailsDisabled": 1, "priorityEmailsDisabled": 1}`, and the query looks like this:

emailsDisabled: {$ne: true},
priorityEmailsDisabled: {$ne: true},
goals: {
  $elemMatch: {
    "topPriority.ymd": ymd,
  }
}

After I changed the index to `{"goals.topPriority.ymd": 1, "emailsDisabled": 1, "priorityEmailsDisabled": 1}`, it worked.

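To be clear, what I have confirmed is that it let me create the index. Whether the index actually serves this query is a separate question that I haven't answered yet.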
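One way to see why the narrower index fits: in an index on an array field, each array element contributes its own key, so indexing `goals` makes every whole subdocument a key. This Python sketch is a simplified model (no BSON encoding, no null handling for missing fields) that lists the keys a dotted path would produce; index_keys is a hypothetical helper and the sample document is made up, with JSON length used only as a rough proxy for key size.

```python
import json


def index_keys(value, path):
    """Collect values reachable via a dotted path, descending into arrays
    the way a multikey index does (simplified model)."""
    if not path:
        return value if isinstance(value, list) else [value]
    head, _, rest = path.partition(".")
    if isinstance(value, list):
        return [k for item in value for k in index_keys(item, path)]
    if isinstance(value, dict) and head in value:
        return index_keys(value[head], rest)
    return []


doc = {
    "emailsDisabled": False,
    "goals": [  # made-up sample data
        {"title": "g1", "notes": "x" * 800, "topPriority": {"ymd": "2020-01-01"}},
        {"title": "g2", "notes": "y" * 1100, "topPriority": {"ymd": "2020-01-02"}},
    ],
}

for path in ("goals", "goals.topPriority.ymd"):
    sizes = [len(json.dumps(k)) for k in index_keys(doc, path)]
    # Whole subdocuments vs. short date strings.
    print(path, max(sizes))
```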

- MalcolmOcean

Page content provided by Stack Overflow.

- anhlc · Accepted Answer

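MongoDB will not create an index on a collection if the index entry for any existing document exceeds the index key limit (1024 bytes). However, you can create a hashed index or a text index instead: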

db.logcollection.createIndex({"Module":"hashed"})

or

db.logcollection.createIndex({"Module":"text"})
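Why a hashed index avoids the limit: it indexes a fixed-size hash of the value instead of the value itself, so the key size no longer depends on the input. The Python sketch below only illustrates that property; taking the first 8 bytes of an MD5 digest as a 64-bit integer is an assumption for illustration, not a claim about MongoDB's exact hashing function. The trade-off is that a hash preserves equality but not ordering, so a hashed index on Module can serve equality matches but not range queries.

```python
import hashlib
import struct


def toy_hash_key(value: str) -> int:
    """Map any string to a signed 64-bit integer (illustrative stand-in)."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    # Interpret the first 8 digest bytes as a little-endian signed int64.
    return struct.unpack("<q", digest[:8])[0]


short_key = toy_hash_key("RezGainUISystem")
long_key = toy_hash_key("x" * 10000)

# Both keys fit in 8 bytes regardless of input length.
print(-2**63 <= short_key < 2**63, -2**63 <= long_key < 2**63)  # True True
# Equal inputs give equal keys, so equality lookups still work.
print(toy_hash_key("abc") == toy_hash_key("abc"))  # True
```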