What is the difference between count and total_count in an Elasticsearch range facet?

Question

I am running a search with a range facet:

{
"query": {
    "match_all": {}
},
"facets": {
    "prices": {
        "range": {
            "field": "product_price",
            "ranges": [
                {"from": 0, "to": 200},
                {"from": 200, "to": 400},
                {"from": 400, "to": 600},
                {"from": 600, "to": 800},
                {"from": 800}
            ]
        }
    }
}
}

I get the expected ranges in the response:

[
  {
    "from": 0.0,
    "to": 200.0,
    "count": 0,
    "total_count": 0,
    "total": 0.0,
    "mean": 0.0
  },
  {
    "from": 200.0,
    "to": 400.0,
    "count": 1,
    "min": 399.0,
    "max": 399.0,
    "total_count": 1,
    "total": 399.0,
    "mean": 399.0
  },
  {
    "from": 400.0,
    "to": 600.0,
    "count": 5,
    "min": 499.0,
    "max": 599.0,
    "total_count": 5,
    "total": 2886.0,
    "mean": 577.2
  },
  {
    "from": 600.0,
    "to": 800.0,
    "count": 3,
    "min": 690.0,
    "max": 790.0,
    "total_count": 3,
    "total": 2179.0,
    "mean": 726.3333333333334
  },
  {
    "from": 800.0,
    "count": 2,
    "min": 899.0,
    "max": 990.0,
    "total_count": 2,
    "total": 1889.0,
    "mean": 944.5
  }
]

In every entry of the response, count and total_count are the same. Does anyone know the difference between them? Which one should I use?

- Lucas Cavalcanti

1 Answer

- javanna · Accepted Answer

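Great question! This part is tricky because most of the time you will see the same values. But when you use key_field and value_field, you can compute the ranges on one field and the aggregated data (min, max, total_count, total and mean) on another. For example, you can compute the ranges on a popularity field and look at the aggregated data on a price field, to see what kind of prices you have in each popularity range; maybe people like cheap products, or maybe not?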

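Now suppose your products can have more than one price, for example a different price per country. That is the case where count differs from total_count. Let's look at an example.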

We index a couple of documents that contain a popularity field and a price field, where price can hold multiple values:

{
  "popularity": 50,
  "price": [28,30,32]
}

and

{
    "popularity": 120,
    "price": [50,54]
}

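Now let's run the following search request, which builds a range facet using the popularity field as the key and the price field as the value: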

{
    "query": {
        "match_all": {}
    },
    "facets": {
        "popularity_prices": {
            "range": {
                "key_field": "popularity",
                "value_field": "price",
                "ranges": [
                    {"to": 100},
                    {"from": 100}
                ]
            }
        }
    }
}

Here is the resulting facet:

{
    "popularity_prices": {
      "_type": "range",
      "ranges": [
        {
          "to": 100,
          "count": 1,
          "min": 28,
          "max": 32,
          "total_count": 3,
          "total": 90,
          "mean": 30
        },
        {
          "from": 100,
          "count": 1,
          "min": 50,
          "max": 54,
          "total_count": 2,
          "total": 104,
          "mean": 52
        }
      ]
    }
}

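It should now be clearer what total_count is. It relates to the value_field (price): 3 different price values fall into the first range, but they all come from the same document. count, on the other hand, is the number of documents that fall into the range.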
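To make the distinction concrete, here is a minimal Python sketch that mimics how the numbers above could be derived from the two example documents. It is an illustration only, not Elasticsearch code: the function name `range_facet` is hypothetical, and it assumes a single-valued key field with `from` inclusive and `to` exclusive.

```python
# The two documents from the example; "price" is multi-valued.
docs = [
    {"popularity": 50, "price": [28, 30, 32]},
    {"popularity": 120, "price": [50, 54]},
]


def range_facet(docs, key_field, value_field, ranges):
    """Hypothetical re-implementation of a key/value range facet.

    Assumes the key field is single-valued, "from" is inclusive
    and "to" is exclusive.
    """
    results = []
    for r in ranges:
        lo, hi = r.get("from"), r.get("to")
        count = 0    # number of documents whose key falls into the range
        values = []  # every value of value_field from those documents
        for doc in docs:
            key = doc[key_field]
            if (lo is None or key >= lo) and (hi is None or key < hi):
                count += 1
                values.extend(doc[value_field])
        entry = dict(r)
        entry["count"] = count
        entry["total_count"] = len(values)
        if values:
            entry["min"] = min(values)
            entry["max"] = max(values)
        entry["total"] = sum(values)
        entry["mean"] = sum(values) / len(values) if values else 0.0
        results.append(entry)
    return results


facets = range_facet(docs, "popularity", "price", [{"to": 100}, {"from": 100}])
for entry in facets:
    print(entry)
```

For the first range this gives count 1 and total_count 3: one document matched, and it contributed three price values. Note that mean is total divided by total_count, not by count.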

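Now that we understand that count is about documents and total_count is about field values, we would expect the same behaviour from the plain range facet when the field contains multiple values... right? Unfortunately that does not currently happen: the plain range facet only considers the first value of each field. I'm not sure whether that is a bug. As a result, count and total_count are always the same in that case.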
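The behaviour described above for the plain range facet can be modelled the same way. The sketch below is again hypothetical Python rather than Elasticsearch code: `plain_range_facet` simply encodes the observation that only the first value of the field is looked at, with `from` assumed inclusive and `to` assumed exclusive, and the ranges on price are made up for the illustration.

```python
# Same example documents; "price" is multi-valued.
docs = [
    {"popularity": 50, "price": [28, 30, 32]},
    {"popularity": 120, "price": [50, 54]},
]


def plain_range_facet(docs, field, ranges):
    """Hypothetical model of a single-field range facet that only
    looks at the first value of the field in each document."""
    results = []
    for r in ranges:
        lo, hi = r.get("from"), r.get("to")
        picked = []  # the first value of each matching document
        for doc in docs:
            raw = doc[field]
            values = raw if isinstance(raw, list) else [raw]
            if not values:
                continue
            first = values[0]  # later values are ignored
            if (lo is None or first >= lo) and (hi is None or first < hi):
                picked.append(first)
        entry = dict(r)
        # One value per document, so both counters are always equal.
        entry["count"] = len(picked)
        entry["total_count"] = len(picked)
        entry["total"] = sum(picked)
        entry["mean"] = sum(picked) / len(picked) if picked else 0.0
        results.append(entry)
    return results


plain = plain_range_facet(docs, "price", [{"to": 40}, {"from": 40}])
for entry in plain:
    print(entry)
```

Because each document contributes at most one value under this model, count and total_count cannot diverge, which matches what the question observed.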