嵌套文档/数组字段的平均值计算

7
我想计算这个对象中数组评分(ratings)内评分字段(rating)的平均值(rating_average)。你能帮助我理解如何使用聚合($avg)吗?
{
    "title": "The Hobbit",
    "rating_average": "???",
    "ratings": [
        {
            "title": "best book ever",
            "rating": 5
        },
        {
            "title": "good book",
            "rating": 3.5
        }
    ]
}
3个回答

12

MongoDB 3.4及更高版本中的聚合框架提供了强大的$reduce运算符,可高效地计算总数,无需额外的管道。考虑将其作为表达式返回总评分,并使用$size获取评分数量。结合$addFields,可以使用算术运算符$divide计算平均值,公式如下:average = 总评分/评分数量

db.collection.aggregate([
    { 
        "$addFields": { 
            "rating_average": {
                "$divide": [
                    { // expression returns total
                        "$reduce": {
                            "input": "$ratings",
                            "initialValue": 0,
                            "in": { "$add": ["$$value", "$$this.rating"] }
                        }
                    },
                    { // expression returns ratings count
                        "$cond": [
                            { "$ne": [ { "$size": "$ratings" }, 0 ] },
                            { "$size": "$ratings" }, 
                            1
                        ]
                    }
                ]
            }
        }
    }           
])

示例输出

{
    "_id" : ObjectId("58ab48556da32ab5198623f4"),
    "title" : "The Hobbit",
    "ratings" : [ 
        {
            "title" : "best book ever",
            "rating" : 5.0
        }, 
        {
            "title" : "good book",
            "rating" : 3.5
        }
    ],
    "rating_average" : 4.25
}

使用旧版本,您需要首先在初始聚合管道步骤上对ratings数组字段应用$unwind运算符。这将从输入文档中解构ratings数组字段,以输出每个元素的文档。每个输出文档都将数组替换为一个元素值。
第二个流水线阶段将是 $group 运算符,它通过 _idtitle 键标识表达式分组输入文档,并对每个组应用所需的 $avg 累加器表达式计算平均值。还有另一个累加器运算符 $push 通过返回应用于上述组中每个文档的表达式得到的所有值的数组来保留原始评分数组字段。
最后一个流水线步骤是 $project 运算符,然后重新塑造流中的每个文档,例如添加新的字段 ratings_average
因此,如果您的集合中有一个示例文档(如上下文中所示):
db.collection.insert({
    "title": "The Hobbit",

    "ratings": [
        {
            "title": "best book ever",
            "rating": 5
        },
        {
            "title": "good book",
            "rating": 3.5
        }
    ]
})

为了计算评分数组的平均值并将该值投射到另一个字段ratings_average中,您可以应用以下聚合管道:
db.collection.aggregate([
    {
        "$unwind": "$ratings"
    },
    {
        "$group": {
            "_id": {
                "_id": "$_id",
                "title": "$title"
            },
            "ratings":{
                "$push": "$ratings"
            },
            "ratings_average": {
                "$avg": "$ratings.rating"
            }
        }
    },
    {
        "$project": {
            "_id": 0,
            "title": "$_id.title",
            "ratings_average": 1,
            "ratings": 1
        }
    }
])

结果:

/* 1 */
{
    "result" : [ 
        {
            "ratings" : [ 
                {
                    "title" : "best book ever",
                    "rating" : 5
                }, 
                {
                    "title" : "good book",
                    "rating" : 3.5
                }
            ],
            "ratings_average" : 4.25,
            "title" : "The Hobbit"
        }
    ],
    "ok" : 1
}

7

这段内容其实可以更简洁,即使当时编写时也是如此。如果你需要一个“平均值”,只需使用$avg

db.collection.aggregate([
  { "$addFields": {
    "rating_average": { "$avg": "$ratings.rating" }
  }}
])

这是因为从MongoDB 3.2开始,$avg操作符增加了两个功能:

  1. The ability to process an "array" of arguments in a "expression" form rather than solely as an accumulator to $group

  2. Benefits from the features of MongoDB 3.2 that allowed the "shorthand" notation of array expressions. Being either in composition:

    { "array": [ "$fielda", "$fieldb" ] }
    

    or in notating a single property from the array as an array of the values of that property:

    { "$avg": "$ratings.rating" } // equal to { "$avg": [ 5, 3.5 ] }
    
在早期版本中,您必须使用$map来访问每个数组元素内部的"rating"属性。现在不需要了。
值得一提的是,甚至$reduce的使用也可以简化:
db.collection.aggregate([
  { "$addFields": {
    "rating_average": {
      "$reduce": {
        "input": "$ratings",
        "initialValue": 0,
        "in": {
          "$add": [ 
            "$$value",
            { "$divide": [ 
              "$$this.rating", 
              { "$size": { "$ifNull": [ "$ratings", [] ] } }
            ]}
          ]
        }
      }
    }
  }}
])

是的,正如所述,这只是重新实现现有的$avg功能,因此既然该运算符可用,那么应该使用它。


2

由于你的待计算平均值数据在数组中,首先需要将其展开。使用聚合管道中的$unwind来完成:

{$unwind: "$ratings"}

接下来,您可以将数组的每个元素作为嵌入式文档访问,其中在聚合结果文档中使用键ratings。然后,您只需要按title进行$group并计算$avg

{$group: {_id: "$title", ratings: {$push: "$ratings"}, average: {$avg: "$ratings.rating"}}}

然后只需恢复您的title字段:
{$project: {_id: 0, title: "$_id", ratings: 1, average: 1}}

以下是您的结果聚合管道:

db.yourCollection.aggregate([
                               {$unwind: "$ratings"}, 
                               {$group: {_id: "$title", 
                                         ratings: {$push: "$ratings"}, 
                                         average: {$avg: "$ratings.rating"}
                                        }
                               },
                               {$project: {_id: 0, title: "$_id", ratings: 1, average: 1}}
                            ])

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接