在 MongoDB 结果中以块为单位迭代

Question

在 MongoDB 结果中以块为单位迭代

c#mongodb

3

我有一个MongoDB查询，可以提取大约50,000个大型文档。
这对我的RAM来说太多了，因此电脑变慢了。

现在我想要遍历MongoDB结果的分块。
我想先获取前1000个文档并处理它们，然后再获取下一个1000个文档。

这是处理大量数据的最佳方式吗？还是我应该一个接一个地处理？

我尝试了以下方法：

MongoCursor<MyDocument> items = collection.Find(query);
long count = items.Count();
int stepSize = 1000;

for (int i = 0; i < Math.Ceiling((double)count / stepSize); i++)
{
    List<MyDocument> list = items.SetSkip(i * stepSize).SetLimit(stepSize).ToList();
    // process the 1000 ...
}

但是这段代码没有起作用，我收到了以下错误：

一旦MongoCursor对象被冻结，就无法修改。

如何解决这个问题？

- Sammy

2个回答

3

感谢profesor79的提示。

现在我能够按块获取它，所以我进行了一项测量，结果出乎意料。

结果集共有39,500个文档

第一个测试：每次处理1000个文档：

int stepSize = 1000;
long start = DateTime.Now.Ticks;
for (int i = 0; i < Math.Ceiling((double)count / stepSize); i++)
{
    List<MyDocument> list = collection.Find(query).SetSkip(i * stepSize).SetLimit(stepSize).ToList();
}
long end = DateTime.Now.Ticks;
Debug.WriteLine("Step 1000 --> " + new TimeSpan(end - start));
// Step 1000 --> 00:00:41.1731168

第二次测试：步长为2000个文档：

int stepSize = 2000;
long start = DateTime.Now.Ticks;
for (int i = 0; i < Math.Ceiling((double)count / stepSize); i++)
{
    List<MyDocument> list = collection.Find(query).SetSkip(i * stepSize).SetLimit(stepSize).ToList();
}
long end = DateTime.Now.Ticks;
Debug.WriteLine("Step 2000 --> " + new TimeSpan(end - start));
// Step 2000 --> 00:00:42.1772173

第三个测试：每次处理5000篇文档的步长：

int stepSize = 5000;
long start = DateTime.Now.Ticks;
for (int i = 0; i < Math.Ceiling((double)count / stepSize); i++)
{
    List<MyDocument> list = collection.Find(query).SetSkip(i * stepSize).SetLimit(stepSize).ToList();
}
long end = DateTime.Now.Ticks;
Debug.WriteLine("Step 5000 --> " + new TimeSpan(end - start));
// Step 5000 --> 00:00:40.9530949

上次测试：步长为1的文档。

long start = DateTime.Now.Ticks;
foreach (MyDocument item in collection.Find(query))
{

}
long end = DateTime.Now.Ticks;
Debug.WriteLine("Step 1 --> " + new TimeSpan(end - start));
// Step 1 --> 00:00:39.6329629

所以似乎使用块没有意义。
一切都大致相同快速。
再次运行测试时，我收到了类似的时间。

唯一使用大量RAM的是这样做：

collection.Find(query)).ToList();

ToList已经将所有内容缓存在RAM中。

希望这能帮助其他人。

- Sammy

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- profesor79 · Accepted Answer

你可以使用这种方法

        var count = collection.Find(new BsonDocument()).Count();
        var stepSize = 1000;

        for (int i = 0; i < Math.Ceiling((double)count / stepSize); i++)
        {
            // put your query here      \/\/\/\/\/\/
            var list = collection.Find(new BsonDocument()).Skip(i * stepSize).Limit(stepSize).ToList();
            // process the 1000 ...
        }

由于您对获取和处理2.2.4版驱动程序感兴趣