Fastest way to query a large list in MongoDB

Question

6

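I want to fetch the details of a large number of users from MongoDB. The list of users has more than 100,000 entries.

Since MongoDB does not support querying a very large amount of data in one go, I would like to know the best way to fetch this data.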

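1. Split the list into groups and fetch the data

# groups_of_list holds the userIds split into batches of 10,000
data = []
for group in groups_of_list:
    curr_data = db.collection.find({'userId': {'$in': group}})
    data.extend(curr_data)  # extend() consumes the cursor; append() would store the cursor object itself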
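
The question does not show how groups_of_list is built. A minimal sketch, assuming the userIds are available as any Python iterable; the helper name `chunked` and the sample ids are hypothetical and used only for illustration:

```python
from itertools import islice


def chunked(user_ids, size=10000):
    """Yield successive lists of at most `size` ids from any iterable."""
    iterator = iter(user_ids)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


# Hypothetical sample data: 25 ids split into batches of 10
groups_of_list = list(chunked(range(25), size=10))
print([len(group) for group in groups_of_list])  # [10, 10, 5]
```

Because `chunked` is a generator, the batches can also be consumed one at a time instead of being collected into a list first.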

2. Loop through the collection

data = []
for doc in db.collection.find({}):
    if doc['userId'] in set_of_userIds:
        data.append(doc)

I am looking for the fastest approach. If there is a better way, please point it out.

- Dheeraj Pande

Could you add some details about the structure of db.collection? That would make it easier to help. - learn2day

2 Answers

3

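You can use a cursor with a fixed limit and iterate over the results with it. You can find more information here - https://docs.mongodb.com/v3.2/tutorial/iterate-a-cursor/ The actual code depends on the language you are using. For example, in a Spring/Java application you can use a Pageable request, something like: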

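// First page (index 0) with 50 documents per page.
// On Spring Data 2.x and later, use PageRequest.of(0, 50) instead of the constructor.
Pageable pageable = new PageRequest(0, 50);
Query query = new Query();
query.with(pageable);

List<User> users = mongoTemplate.find(query, User.class);

// Get the next page, then apply it to the query and run find() again
pageable = pageable.next();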

Keep in mind that if you update the data while iterating, you may get inconsistent results. In that case you have to use a snapshot query: https://docs.mongodb.com/manual/reference/method/cursor.snapshot/
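
Since the question uses Python, the same paging idea might look roughly like the sketch below. It assumes `collection` is a PyMongo Collection and that ordering by `_id` is acceptable; the function name `iter_pages` is hypothetical. Note that `skip()` has to walk past the skipped documents, so later pages tend to get slower on large collections.

```python
def iter_pages(collection, page_size=50):
    """Yield lists of documents, one page at a time, ordered by _id."""
    page = 0
    while True:
        cursor = (
            collection.find({})
            .sort('_id', 1)           # stable order so pages do not overlap
            .skip(page * page_size)   # jump past the pages already returned
            .limit(page_size)         # fixed limit per page
        )
        docs = list(cursor)
        if not docs:
            return
        yield docs
        page += 1
```

Each yielded page can be processed and discarded before the next one is requested, so only `page_size` documents are held in memory at a time.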

Hope this helps!

- Puran

Thanks for the help, Puran. - Dheeraj Pande

If you found the answer helpful, you should accept it :) - Puran


- learn2day · Accepted Answer

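In my opinion, you should split the list into "appropriately sized" chunks, as in your method 1. The point is not to work around a MongoDB limit but to stay within the memory limits of your own machine.

It should probably look something like this: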

def get_user_slice_data(groups_of_list):
    for group in groups_of_list:
        yield list(db.collection.find({'userId': {'$in': group}}))

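This generator function can be used like this:

for user_slice_data in get_user_slice_data(groups_of_list):
    # user_slice_data is a list of documents for one group of userIds
    ...  # do stuff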

This way you avoid holding a large amount of data in memory at once, and you also keep each MongoDB query and its result set small.
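
A variation on the same idea, sketched here under the assumption that `collection` is a PyMongo Collection: instead of yielding one list per group, it streams documents one at a time, so the caller never holds a full group in memory. The function name `iter_user_docs` and its `chunk_size` parameter are hypothetical.

```python
def iter_user_docs(collection, user_ids, chunk_size=10000):
    """Yield matching documents one by one, querying one chunk of ids at a time."""
    ids = list(user_ids)
    for start in range(0, len(ids), chunk_size):
        group = ids[start:start + chunk_size]
        # One $in query per chunk; the cursor is consumed lazily
        for doc in collection.find({'userId': {'$in': group}}):
            yield doc
```

A caller would loop with `for doc in iter_user_docs(db.collection, user_ids):` and process each document as it arrives.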

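Tip: you will probably want to add an index on "userId" first, for example:

# create_index replaces ensure_index, which was deprecated in PyMongo 3.0 and removed in 4.0
db.collection.create_index('userId')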