ElasticSearch post_filter and filtered aggregations do not behave the same way

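I've spent a whole week trying to figure this out and have gotten nowhere. I'm following this (rather old) article on e-commerce search and faceted filtering, among others, and so far it has worked well (the search results are great, and the aggregations work well when the filters are applied in the query). I'm using ElasticSearch 6.1.1.

But because I want to let users make multiple selections within a facet, I moved the filters into the post_filter section. This still works well: it filters the results correctly and shows accurate aggregation counts across the whole document set.

After reading this question on StackOverflow, I realized that I have to do some crazy acrobatics with "filtered" aggregations together with "special" aggregations so that they prune one another, in order to show correct counts and still allow multiple filters on them. I asked for some clarification on that question but have had no response so far (it's an old question).

The problem I've been struggling with is how to get a set of filtered aggregations on a nested field, where every facet is filtered by all the filters.

My plan is to use the general (unfiltered) aggregations and keep the selected facet's aggregation unfiltered (so that I can select multiple entries), but filter all the other aggregations by the currently selected facets, so that I only show filters I can still apply.

However, if I take the same filters I use on the documents (which work fine) and put them into a filter aggregation, they don't work as expected. The counts are all wrong. I know that aggregations are computed before post_filter is applied, which is why I'm duplicating the filters on the aggregations I want.

Here's my query: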
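To make the plan concrete, here is a minimal Python sketch that only builds the request fragments as plain dicts (no client library involved). The shape of `selected` (facet name mapped to a list of chosen values), the `agg_except_...` naming scheme and the inner `facets` aggregation name are all hypothetical; only the field names, `string_facets` and `agg_all_facets_filtered` come from my mapping and queries. It produces one aggregation filtered by every selection (for the facets nobody has touched) plus, for each selected facet, one aggregation filtered by all the other selections.

```python
import json


def nested_facet_filter(name, values):
    """Nested query: one string_facets object has this name AND one of the values."""
    return {
        "nested": {
            "path": "string_facets",
            "query": {"bool": {"filter": [
                {"term": {"string_facets.facet_name": name}},
                {"terms": {"string_facets.facet_value": values}},
            ]}},
        }
    }


def facet_terms():
    """The same nested name/value terms aggregation used in agg_string_facets."""
    return {
        "nested": {"path": "string_facets"},
        "aggregations": {"facet_name": {
            "terms": {"field": "string_facets.facet_name"},
            "aggregations": {"facet_value": {
                "terms": {"field": "string_facets.facet_value"}}},
        }},
    }


def filtered_by(clauses):
    """Wrap the facet aggregation in a filter aggregation over the given clauses."""
    query = {"bool": {"must": clauses}} if clauses else {"match_all": {}}
    return {"filter": query, "aggs": {"facets": facet_terms()}}


def build_facet_aggs(selected):
    """selected maps facet name -> list of chosen values (hypothetical shape)."""
    # Facets with no selection: filtered by every active selection.
    aggs = {"agg_all_facets_filtered": filtered_by(
        [nested_facet_filter(n, v) for n, v in selected.items()])}
    for facet in selected:
        # A selected facet is filtered by all selections except its own,
        # so its remaining values stay visible for multi-select.
        others = [nested_facet_filter(n, v)
                  for n, v in selected.items() if n != facet]
        # Hypothetical naming scheme; read only the `facet` bucket from this one.
        aggs["agg_except_" + facet.lower().replace(" ", "_")] = filtered_by(others)
    return aggs


if __name__ == "__main__":
    selection = {"Cover colour": ["Green"], "Binding": ["Leather"]}
    print(json.dumps(build_facet_aggs(selection), indent=2))
```

On the client side, the bucket for a selected facet would be read from its own `agg_except_...` aggregation and every other bucket from `agg_all_facets_filtered`; that merging step is not shown here.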

  "query": {
    "bool": {
      "must": [
        {
          "multi_match": {
            "fields": [
              "search_data.full_text_boosted^7",
              "search_data.full_text^2"
            ],
            "type": "cross_fields",
            "analyzer": "full_text_search_analyzer",
            "query": "some book"
          }
        }
      ]
    }
  }

Nothing special here; it works well and returns relevant results.

Here's my filter (in post_filter):

"post_filter" : {
    "bool" : {
      "must" : [
      {
        "nested": {
          "path": "string_facets",
            "query": {
              "bool" : {
                "filter" : 
                [
                  { "term" : { "string_facets.facet_name" : "Cover colour" } },
                  { "terms" : { "string_facets.facet_value" : [ "Green" ] } }
                ]
              }
            }
          }
        }

      ]
    }
  }

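Let me stress: this works fine. I see the correct results (in this case, 13 results are shown, all matching the right field, 'Cover colour' = 'Green').

Here's my general (unfiltered) aggregation, which returns all facets with correct counts for all products: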

    "agg_string_facets": {
  "nested": {
    "path": "string_facets"
  },
  "aggregations": {
      "facet_name": {
        "terms": {
          "field": "string_facets.facet_name"
        },
        "aggregations": {
          "facet_value": {
            "terms": {
              "field": "string_facets.facet_value"
            }
          }
        }
      }
  }
}

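This works perfectly too! I can see aggregations with accurate facet counts for all documents matching my query.

Now, check this out: I'm creating an aggregation filtered on the same nested field, so that I can get the aggregations and facets that "survive" my filter: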

"agg_all_facets_filtered" : {

           "filter" : {
             "bool" : {
               "must" : [
                {
                   "nested": {
                     "path": "string_facets",
                     "query": {
                       "bool" : {
                         "filter" : [
                           { "term" : { "string_facets.facet_name" : "Cover colour" } },
                           { "terms" : { "string_facets.facet_value" : [ "Green" ] } }
                          ]
                       }
                    }
                  }
              }]
            }
        },
        "aggs" : {
         "agg_all_facets_filtered" : {
           "nested": { "path": "string_facets" },
           "aggregations": {
            "facet_name": {
              "terms": { "field": "string_facets.facet_name" },
              "aggregations": {
                    "facet_value": {
                      "terms": { "field": "string_facets.facet_value" }
                    }
                  }
                }
            }  
         }

       }

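Note that the filter I'm using in this aggregation is the same one that filters my results in the first place (in post_filter).

But for some reason, the aggregations returned are all wrong, namely the facet counts. For example, in this search I get 13 results, but the aggregation returned from 'agg_all_facets_filtered' has only one count: 'Cover colour' = 4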

{
  "key": "Cover colour",
  "doc_count": 4,
  "facet_value": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
        {
          "key": "Green",
          "doc_count": 4
        }
    ]
  }
}

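After checking why the number is 4, I found that 3 of the documents contain the facet 'Cover colour' twice: once as 'Green' and once as 'Some other colour'... So my aggregation only seems to count entries that have that facet name twice, or that share it with other documents. This is why I think the filter on my aggregation is wrong. I've read a lot about AND vs OR in matches/filters and tried 'filter', 'should', etc., but nothing fixed it.

Sorry for the long question, but:

How should I write the aggregation filter so that it returns facets with correct counts, given that my filter works fine on its own?

Thanks a lot, everyone.

UPDATE: As requested, here's the full query (note the filter in post_filter and the same filter in the filtered aggregation):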

{
  "size" : 0,
  "query": {
    "bool": {
      "must": [
        {
          "multi_match": {
            "fields": [
              "search_data.full_text_boosted^7",
              "search_data.full_text^2"
            ],
            "type": "cross_fields",
            "analyzer": "full_text_search_analyzer",
            "query": "bible"
          }
        }
      ]
    }
  },

  "post_filter" : {

    "bool" : {
      "must" : [
      {
        "nested": {
          "path": "string_facets",
            "query": {
              "bool" : {
                "filter" : 
                [
                  { "term" : { "string_facets.facet_name" : "Cover colour" } },
                  { "terms" : { "string_facets.facet_value" : [ "Green" ] } }
                ]
              }
            }
          }
        }

      ]
    }

  },

  "aggregations": {

        "agg_string_facets": {
      "nested": {
        "path": "string_facets"
      },
      "aggregations": {
          "facet_name": {
            "terms": {
              "field": "string_facets.facet_name"
            },
            "aggregations": {
              "facet_value": {
                "terms": {
                  "field": "string_facets.facet_value"
                }
              }
            }
          }
      }
    },

    "agg_all_facets_filtered" : {

           "filter" : {
             "bool" : {
               "must" : [
                {
                   "nested": {
                     "path": "string_facets",
                     "query": {
                       "bool" : {
                         "filter" : [
                           { "term" : { "string_facets.facet_name" : "Cover colour" } },
                           { "terms" : { "string_facets.facet_value" : [ "Green" ] } }
                          ]
                       }
                    }
                  }
              }]
            }
        },
        "aggs" : {
         "agg_all_facets_filtered" : {
           "nested": { "path": "string_facets" },
           "aggregations": {
            "facet_name": {
              "terms": { "field": "string_facets.facet_name" },
              "aggregations": {
                    "facet_value": {
                      "terms": { "field": "string_facets.facet_value" }
                    }
                  }
                }
            }  
         }

       }


    }

  }
}

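The results returned are correct (as far as the documents go). Here are the aggregations (unfiltered, from the results, for 'agg_string_facets'; note that 'Green' shows 13 documents, which is correct):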

{
            "key": "Cover colour",
            "doc_count": 483,
            "facet_value": {
              "doc_count_error_upper_bound": 0,
              "sum_other_doc_count": 111,
              "buckets": [
                {
                  "key": "Black",
                  "doc_count": 87
                },
                {
                  "key": "Brown",
                  "doc_count": 75
                },
                {
                  "key": "Blue",
                  "doc_count": 45
                },
                {
                  "key": "Burgundy",
                  "doc_count": 43
                },
                {
                  "key": "Pink",
                  "doc_count": 30
                },
                {
                  "key": "Teal",
                  "doc_count": 27
                },
                {
                  "key": "Tan",
                  "doc_count": 20
                },
                {
                  "key": "White",
                  "doc_count": 18
                },
                {
                  "key": "Chocolate",
                  "doc_count": 14
                },
                {
                  "key": "Green",
                  "doc_count": 13
                }
              ]
            }
          }

And here are the aggregations (filtered with the same filter, from 'agg_all_facets_filtered'), showing only 4 results for 'Green':

{
              "key": "Cover colour",
              "doc_count": 4,
              "facet_value": {
                "doc_count_error_upper_bound": 0,
                "sum_other_doc_count": 0,
                "buckets": [
                  {
                    "key": "Green",
                    "doc_count": 4
                  }
                ]
              }
            }

UPDATE 2: Here are some sample documents returned by the query:

"hits": {
    "total": 13,
    "max_score": 17.478987,
    "hits": [
      {
        "_index": "redacted",
        "_type": "product",
        "_id": "33107",
        "_score": 17.478987,
        "_source": {
          "type": "product",
          "document_id": 33107,
          "search_data": {
            "full_text": "hcsb compact ultrathin bible mint green leathertouch  holman bible staff leather binding 9781433617751 ",
            "full_text_boosted": "HCSB Compact Ultrathin Bible Mint Green Leathertouch Holman Bible Staff "
          },
          "search_result_data": {
            "name": "HCSB Compact Ultrathin Bible, Mint Green Leathertouch (Leather)",
            "preview_image": "/images/products/medium/0.jpg",
            "url": "/Products/ViewOne.aspx?ProductId=33107"
          },
          "string_facets": [
            {
              "facet_name": "Binding",
              "facet_value": "Leather"
            },
            {
              "facet_name": "Bible size",
              "facet_value": "Compact"
            },
            {
              "facet_name": "Bible size",
              "facet_value": "Ultrathin"
            },
            {
              "facet_name": "Bible version",
              "facet_value": "HCSB"
            },
            {
              "facet_name": "Cover colour",
              "facet_value": "Green"
            }
          ]
        }
      },
      {
        "_index": "redacted",
        "_type": "product",
        "_id": "17240",
        "_score": 17.416323,
        "_source": {
          "type": "product",
          "document_id": 17240,
          "search_data": {
            "full_text": "kjv thinline bible compact  leather binding 9780310439189 ",
            "full_text_boosted": "KJV Thinline Bible Compact "
          },
          "search_result_data": {
            "name": "KJV Thinline Bible, Compact (Leather)",
            "preview_image": "/images/products/medium/17240.jpg",
            "url": "/Products/ViewOne.aspx?ProductId=17240"
          },
          "string_facets": [
            {
              "facet_name": "Binding",
              "facet_value": "Leather"
            },
            {
              "facet_name": "Bible size",
              "facet_value": "Compact"
            },
            {
              "facet_name": "Bible size",
              "facet_value": "Thinline"
            },
            {
              "facet_name": "Bible version",
              "facet_value": "KJV"
            },
            {
              "facet_name": "Cover colour",
              "facet_value": "Green"
            }
          ]
        }
      },
      {
        "_index": "redacted",
        "_type": "product",
        "_id": "17243",
        "_score": 17.416323,
        "_source": {
          "type": "product",
          "document_id": 17243,
          "search_data": {
            "full_text": "kjv busy mom's bible  leather binding 9780310439134 ",
            "full_text_boosted": "KJV Busy Mom'S Bible "
          },
          "search_result_data": {
            "name": "KJV Busy Mom's Bible (Leather)",
            "preview_image": "/images/products/medium/17243.jpg",
            "url": "/Products/ViewOne.aspx?ProductId=17243"
          },
          "string_facets": [
            {
              "facet_name": "Binding",
              "facet_value": "Leather"
            },
            {
              "facet_name": "Bible size",
              "facet_value": "Pocket"
            },
            {
              "facet_name": "Bible size",
              "facet_value": "Thinline"
            },
            {
              "facet_name": "Bible version",
              "facet_value": "KJV"
            },
            {
              "facet_name": "Cover colour",
              "facet_value": "Pink"
            },
            {
              "facet_name": "Cover colour",
              "facet_value": "Green"
            }
          ]
        }
      },
      {
        "_index": "redacted",
        "_type": "product",
        "_id": "33030",
        "_score": 15.674053,
        "_source": {
          "type": "product",
          "document_id": 33030,
          "search_data": {
            "full_text": "apologetics study bible for students grass green leathertou  mcdowell sean; holman bible s leather binding 9781433617720 ",
            "full_text_boosted": "Apologetics Study Bible For Students Grass Green Leathertou Mcdowell Sean; Holman Bible S"
          },
          "search_result_data": {
            "name": "Apologetics Study Bible For Students, Grass Green Leathertou (Leather)",
            "preview_image": "/images/products/medium/33030.jpg",
            "url": "/Products/ViewOne.aspx?ProductId=33030"
          },
          "string_facets": [
            {
              "facet_name": "Binding",
              "facet_value": "Leather"
            },
            {
              "facet_name": "Bible designation",
              "facet_value": "Study Bible"
            },
            {
              "facet_name": "Bible designation",
              "facet_value": "Students"
            },
            {
              "facet_name": "Bible feature",
              "facet_value": "Indexed"
            },
            {
              "facet_name": "Cover colour",
              "facet_value": "Green"
            }
          ]
        }
      },
      {
        "_index": "redacted",
        "_type": "product",
        "_id": "33497",
        "_score": 15.674053,
        "_source": {
          "type": "product",
          "document_id": 33497,
          "search_data": {
            "full_text": "hcsb life essentials study bible brown / green  getz gene a.; holman bible st imitation leather 9781586400446 ",
            "full_text_boosted": "HCSB Life Essentials Study Bible Brown  Green Getz Gene A ; Holman Bible St"
          },
          "search_result_data": {
            "name": "HCSB Life Essentials Study Bible Brown / Green (Imitation Leather)",
            "preview_image": "/images/products/medium/33497.jpg",
            "url": "/Products/ViewOne.aspx?ProductId=33497"
          },
          "string_facets": [
            {
              "facet_name": "Binding",
              "facet_value": "Imitation Leather"
            },
            {
              "facet_name": "Bible designation",
              "facet_value": "Study Bible"
            },
            {
              "facet_name": "Bible version",
              "facet_value": "HCSB"
            },
            {
              "facet_name": "Binding",
              "facet_value": "Imitation leather"
            },
            {
              "facet_name": "Cover colour",
              "facet_value": "Brown"
            },
            {
              "facet_name": "Cover colour",
              "facet_value": "Green"
            }
          ]
        }
      }
    ]
}
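As a sanity check on what the filtered aggregation ought to return, the counting can be simulated by hand on the five sample documents above. This is a minimal Python sketch, not Elasticsearch itself. It assumes that the `filter` aggregation selects whole parent documents and that the `nested` terms buckets then count every `string_facets` object of those parents. Only the 'Binding' and 'Cover colour' entries are copied from the documents; the other facets are left out for brevity, and the function names are made up for illustration.

```python
from collections import Counter

# Document id -> (facet_name, facet_value) pairs, copied from the sample hits.
# Facets other than Binding and Cover colour are omitted for brevity.
docs = {
    "33107": [("Binding", "Leather"), ("Cover colour", "Green")],
    "17240": [("Binding", "Leather"), ("Cover colour", "Green")],
    "17243": [("Binding", "Leather"), ("Cover colour", "Pink"),
              ("Cover colour", "Green")],
    "33030": [("Binding", "Leather"), ("Cover colour", "Green")],
    "33497": [("Binding", "Imitation Leather"), ("Cover colour", "Brown"),
              ("Cover colour", "Green")],
}


def matches_nested(facets, name, values):
    """Model of the nested filter: one object must match name AND value."""
    return any(n == name and v in values for n, v in facets)


def filtered_nested_counts(documents, name, values):
    """Count nested objects per facet name and per (name, value) pair,
    looking only at parent documents that pass the filter."""
    per_name, per_pair = Counter(), Counter()
    for facets in documents.values():
        if not matches_nested(facets, name, values):
            continue
        for n, v in facets:
            per_name[n] += 1
            per_pair[(n, v)] += 1
    return per_name, per_pair


if __name__ == "__main__":
    per_name, per_pair = filtered_nested_counts(docs, "Cover colour", ["Green"])
    print(per_name["Cover colour"])              # 7
    print(per_pair[("Cover colour", "Green")])   # 5
    print(per_pair[("Cover colour", "Pink")])    # 1
    print(per_pair[("Cover colour", "Brown")])   # 1
```

Under that assumption, all five documents pass the filter, so 'Green' should come out as 5 for this sample rather than being reduced to the documents that carry the facet twice. These are the same numbers reported for this sample in the comments below.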

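Can you explain in more detail, with sample documents, the difference between the current results and the expected results? - Nishant
What I expected you to add was some of the documents out of those 13. - Nishant
Sorry, I thought you wanted to see samples of the aggregations. I've added some of the documents returned. - Cristian Cotovan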
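I created sample data from the five documents you added to the question, plus one document that does not match the post filter. In the nested aggregation I got the correct result, i.e. {"key":"Cover colour","doc_count":7,"facet_value":{"doc_count_error_upper_bound":0,"sum_other_doc_count":0,"buckets":[{"key":"Green","doc_count":5},{"key":"Brown","doc_count":1},{"key":"Pink","doc_count":1}]}} - Nishant
Hmm, did you use my query? Could it be that the ES version I'm using has a bug? - Cristian Cotovan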
1 Answer

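Mystery solved! Thanks for your input. It turns out that the version I was using (6.1.1) has a bug. I don't know exactly what the bug is, but I installed ElasticSearch 6.5, reindexed my data and, with no changes to the queries or the mappings, it all works!
Now, I don't know whether I should file a bug report with ES or just leave it, since this is an older version and they have moved on.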

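I'm working on the same problem. I went through the Stack Overflow question you mentioned, the related Medium and other articles, and read your comments on the post about this problem (again, as an article) and on StackOverflow. Glad to see you found a way to solve it, and it showed me where to start on my own problem. - Satyaaditya
How do you get the currently selected facet information? I checked the requests of some e-commerce sites, and none of them sends the currently selected facet; they send all selected facets instead. How do we identify which facet needs to be persisted without applying the filter? - Satyaaditya
Not sure I understand your question. You run the filters in post_filter: this still restricts the results to the selected facets, but also returns all facets to you. You need to keep track of the selected facets yourself. - Cristian Cotovan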
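To illustrate what "keep track of the selected facets yourself" could look like, here is a small Python sketch. It assumes the application holds the selection as a mapping from facet name to a list of chosen values (a hypothetical shape) and rebuilds the request body from it on every search. The function names are made up; the field names, analyzer and aggregation come from the queries in the question. Values within one facet are ORed through `terms`, and different facets are ANDed through `must`.

```python
import json


def post_filter_from_selection(selected):
    """Build the post_filter from application-held state, or None if empty."""
    clauses = [
        {"nested": {"path": "string_facets", "query": {"bool": {"filter": [
            {"term": {"string_facets.facet_name": name}},
            {"terms": {"string_facets.facet_value": values}},
        ]}}}}
        for name, values in selected.items() if values  # skip empty selections
    ]
    return {"bool": {"must": clauses}} if clauses else None


def build_request(text, selected):
    """Full-text query plus unfiltered facets; selections go to post_filter."""
    body = {
        "query": {"bool": {"must": [{"multi_match": {
            "fields": ["search_data.full_text_boosted^7",
                       "search_data.full_text^2"],
            "type": "cross_fields",
            "analyzer": "full_text_search_analyzer",
            "query": text,
        }}]}},
        "aggregations": {"agg_string_facets": {
            "nested": {"path": "string_facets"},
            "aggregations": {"facet_name": {
                "terms": {"field": "string_facets.facet_name"},
                "aggregations": {"facet_value": {
                    "terms": {"field": "string_facets.facet_value"}}},
            }},
        }},
    }
    post_filter = post_filter_from_selection(selected)
    if post_filter is not None:
        body["post_filter"] = post_filter
    return body


if __name__ == "__main__":
    state = {"Cover colour": ["Green", "Pink"], "Binding": []}
    print(json.dumps(build_request("bible", state), indent=2))
```

Because the aggregation sits outside post_filter, the response still lists every facet value for the query, and the UI can mark the ones present in `state` as checked.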
