Multiple aggregations in a single MongoDB operation

Question

12

I have an item collection containing the following documents.

{ "item" : "i1", "category" : "c1", "brand" : "b1" }  
{ "item" : "i2", "category" : "c2", "brand" : "b1" }  
{ "item" : "i3", "category" : "c1", "brand" : "b2" }  
{ "item" : "i4", "category" : "c2", "brand" : "b1" }  
{ "item" : "i5", "category" : "c1", "brand" : "b2" }

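I want separate aggregate counts by category and by brand. Note that this is not a count by the pair (category, brand).

I am able to do this with MapReduce using the following code.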

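// Emit two keys per document: one for its category, one for its brand.
var map = function () {
    emit({ type: "category", category: this.category }, 1);
    emit({ type: "brand", brand: this.brand }, 1);
};
// Sum the emitted 1s for each key.
var reduce = function (key, values) {
    return Array.sum(values);
};
db.item.mapReduce(map, reduce, { out: { inline: 1 } });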

The result is:

{
        "results" : [
                {
                        "_id" : {
                                "type" : "brand",
                                "brand" : "b1"
                        },
                        "value" : 3
                },
                {
                        "_id" : {
                                "type" : "brand",
                                "brand" : "b2"
                        },
                        "value" : 2
                },
                {
                        "_id" : {
                                "type" : "category",
                                "category" : "c1"
                        },
                        "value" : 3
                },
                {
                        "_id" : {
                                "type" : "category",
                                "category" : "c2"
                        },
                        "value" : 2
                }
        ],
        "timeMillis" : 21,
        "counts" : {
                "input" : 5,
                "emit" : 10,
                "reduce" : 4,
                "output" : 4
        },
        "ok" : 1,
}

I can get the same result by issuing the following two separate aggregation commands.

db.item.aggregate({$group:{_id:"$category",count:{$sum:1}}})
db.item.aggregate({$group:{_id:"$brand",count:{$sum:1}}})

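Is there a way to do the same thing with a single aggregation command using the aggregation framework?

I have simplified my case here; in reality I need to group on fields inside an array of subdocuments. Assume the structure above is the result after I perform an unwind.

This is a real-time query (someone is waiting for the response), so execution time matters even though the dataset is small.

I am using MongoDB 2.4.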

- Poorna Subhash

2 Answers

7

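Over large data sets I would say that your current mapReduce approach is the best one, because the aggregation technique for this does not work well with large data. Over a reasonably small amount of data, though, the following pipeline might be just what you need: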

db.items.aggregate([
    { "$group": {
        "_id": null,
        "categories": { "$push": "$category" },
        "brands": { "$push": "$brand" }
    }},
    { "$project": {
        "_id": {
            "categories": "$categories",
            "brands": "$brands"
        },
        "categories": 1
    }},
    { "$unwind": "$categories" },
    { "$group": {
        "_id": {
            "brands": "$_id.brands",
            "category": "$categories"
        },
        "count": { "$sum": 1 }
    }},
    { "$group": {
        "_id": "$_id.brands",
        "categories": { "$push": {
            "category": "$_id.category",
            "count": "$count"
        }},
    }},
    { "$project": {
        "_id": "$categories",
        "brands": "$_id"
    }},
    { "$unwind": "$brands" },
    { "$group": {
        "_id": {
            "categories": "$_id",
            "brand": "$brands"
        },
        "count": { "$sum": 1 }
    }},
    { "$group": {
        "_id": null,
        "categories": { "$first": "$_id.categories" },
        "brands": { "$push": {
            "brand": "$_id.brand",
            "count": "$count"
        }}
    }}
])

This is not exactly the same as the mapReduce output, and you could add more stages to change the output format, but it should be usable:

{
    "_id" : null,
    "categories" : [
            {
                    "category" : "c2",
                    "count" : 2
            },
            {
                    "category" : "c1",
                    "count" : 3
            }
    ],
    "brands" : [
            {
                    "brand" : "b2",
                    "count" : 2
            },
            {
                    "brand" : "b1",
                    "count" : 3
            }
    ]
}

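As you can see, this involves a fair amount of shuffling between arrays in order to group each set of either "category" or "brand" within the same pipeline process. Again, this will not do well for large data, but for something like "items in an order" it would probably do nicely.

Of course, as you say you have simplified somewhat, so the first grouping key of null will either become something else, or the input will be narrowed down by an early $match stage so that grouping on null is still appropriate, which is probably what you want to do.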
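As a rough sketch of the early $match idea, the first stage of the pipeline above could be preceded by a filter. The `orderId` field and its value are hypothetical and do not come from the question; the snippet only builds and prints the first two stages, so it runs without a database.

```javascript
// Hypothetical filter: `orderId` is an assumed field, not one from the question.
const matchStage = { $match: { orderId: 42 } };

// First stage of the pipeline above, unchanged.
const collectStage = {
  $group: {
    _id: null,
    categories: { $push: "$category" },
    brands: { $push: "$brand" }
  }
};

// Putting $match first means only the matching documents reach $group,
// which keeps the pushed arrays small.
const pipelineHead = [matchStage, collectStage];
console.log(JSON.stringify(pipelineHead, null, 2));
```

The remaining stages would follow these two unchanged when the full array is passed to aggregate().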

- Neil Lunn

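Great! It works in theory! But 9 pipeline stages is unintuitive and hard to manage. It is like doing a self-join several times, which needs a lot of memory and CPU. A quick measurement shows it is 3 times slower than calling aggregate twice. It is not the right choice for my case, because my use case needs to do this across orders within a given time range and compute quantity, price sums, and so on. - Poorna Subhash

@Poorna Yes, probably, but I put the disclaimer at the start: the main problem is always size, and large arrays are a big performance problem. But I also note that anything beyond what you actually asked is not part of your question, right? So if you want a solution that really solves your actual problem, it is best to post a question that actually presents that problem. - Neil Lunn

I like your solution; before posting my question I could not think of anything even close to it. I was just explaining why it does not fit my case. - Poorna Subhash
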
- Xavier Guihot · Accepted Answer

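Starting in Mongo 3.4, the $facet aggregation stage greatly simplifies this type of use case by processing multiple aggregation pipelines within a single stage on the same set of input documents: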

// { "item" : "i1", "category" : "c1", "brand" : "b1" }
// { "item" : "i2", "category" : "c2", "brand" : "b1" }
// { "item" : "i3", "category" : "c1", "brand" : "b2" }
// { "item" : "i4", "category" : "c2", "brand" : "b1" }
// { "item" : "i5", "category" : "c1", "brand" : "b2" }
db.collection.aggregate([
  { $facet: {
      categories: [{ $group: { _id: "$category", count: { "$sum": 1 } } }],
      brands:     [{ $group: { _id: "$brand",    count: { "$sum": 1 } } }]
  }}
])
// {
//   "categories" : [
//     { "_id" : "c1", "count" : 3 },
//     { "_id" : "c2", "count" : 2 }
//   ],
//   "brands" : [
//     { "_id" : "b1", "count" : 3 },
//     { "_id" : "b2", "count" : 2 }
//   ]
// }
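
If the flat list that mapReduce returned in the question is needed, the $facet document can be reshaped on the client. Below is a minimal sketch in plain JavaScript; `facetToResults` and the `facetTypes` mapping are hypothetical names introduced here, not part of MongoDB, and the input is the $facet output above pasted in as a literal.

```javascript
// Hypothetical helper: flattens one $facet result document into entries shaped
// like the mapReduce results, i.e. { _id: { type, <type>: key }, value }.
// `facetTypes` maps each facet name to its singular type label (an assumption).
function facetToResults(facetDoc, facetTypes) {
  const results = [];
  for (const [facetName, type] of Object.entries(facetTypes)) {
    for (const row of facetDoc[facetName] || []) {
      results.push({ _id: { type: type, [type]: row._id }, value: row.count });
    }
  }
  return results;
}

// The $facet output shown above, as a plain object.
const facetDoc = {
  categories: [{ _id: "c1", count: 3 }, { _id: "c2", count: 2 }],
  brands: [{ _id: "b1", count: 3 }, { _id: "b2", count: 2 }]
};

const results = facetToResults(facetDoc, { categories: "category", brands: "brand" });
console.log(JSON.stringify(results, null, 2));
```

The same reshaping could probably be done with extra pipeline stages instead, but doing it on the client keeps the pipeline to a single $facet stage.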