elasticsearch-dsl aggregations return only 10 results. How do I change this?

Question

11

I am using the elasticsearch-dsl Python library to connect to Elasticsearch and run aggregations.

I am using the following code:

search.aggs.bucket('per_date', 'terms', field='date')\
        .bucket('response_time_percentile', 'percentiles', field='total_time',
                percents=percentiles, hdr={"number_of_significant_value_digits": 1})
response = search.execute()

This works, but response.aggregations.per_date.buckets contains only 10 results.

I want all of the results.

I tried one solution, size=0, as suggested in this question:

search.aggs.bucket('per_ts', 'terms', field='ts', size=0)\
        .bucket('response_time_percentile', 'percentiles', field='total_time',
                percents=percentiles, hdr={"number_of_significant_value_digits": 1})

response = search.execute()

But this results in an error:

TransportError(400, u'parsing_exception', u'[terms] failed to parse field [size]')

- hard coder

1

Did you find a solution? I am facing the same problem. - Gaurav Singhal

I have the same problem too. - Soony

3 Answers

1

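This is an older question, but I ran into the same issue. What I wanted was basically an iterator that I could use to go through all the aggregation buckets I got back (I also have a lot of unique results).

The best approach I found was to create a Python generator like this: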

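from elasticsearch_dsl import A, Search

def scan_aggregation_results():
    # `elastic` is an existing Elasticsearch client/connection
    partitions = 20
    for i in range(partitions):
        # size=0 on the search skips the hits; only the aggregation is needed
        s = Search(using=elastic, index='my_index').extra(size=0)
        agg = A('terms', field='my_field.keyword', size=999999,
                include={"partition": i, "num_partitions": partitions})
        s.aggs.bucket('my_agg', agg)
        result = s.execute()

        # each bucket exposes the term as `key`
        for item in result.aggregations.my_agg.buckets:
            yield item.key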

# in other parts of the code just do
for item in scan_aggregation_results():
    print(item)  # or do whatever you want with it

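The magic here is that Elasticsearch automatically splits the results into 20 partitions, i.e. the number of partitions I defined. I only have to set size to something large enough to hold a single partition, so in this case the result can contain up to about 20 million items (20*999999). If you have far fewer items to return, as I did (around 20000), each query will return only about 1000 buckets, no matter how much larger the size you defined.

With the generator construct outlined above you can even hide all of that and build your own scanner, so to speak, which iterates over all results one by one. That is exactly what I wanted.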
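
The partition arithmetic above can be sketched without a cluster. The following is only an illustration under stated assumptions: the helper names `choose_num_partitions` and `partitioned_terms_aggs` are hypothetical, the unique-term count is assumed to be known already (for example, estimated with a `cardinality` aggregation), and the bodies are plain dicts in the JSON shape of a `terms` aggregation with `include.partition` / `include.num_partitions`.

```python
import math


def choose_num_partitions(unique_terms, buckets_per_request):
    """Smallest partition count that keeps each request near the target size."""
    return max(1, math.ceil(unique_terms / buckets_per_request))


def partitioned_terms_aggs(field, unique_terms, buckets_per_request):
    """Yield one terms-aggregation body (a plain dict) per partition."""
    num_partitions = choose_num_partitions(unique_terms, buckets_per_request)
    # Terms are distributed across partitions by hashing, so a partition can
    # hold more than the average; the factor 2 is an arbitrary safety margin.
    size = buckets_per_request * 2
    for partition in range(num_partitions):
        yield {
            "terms": {
                "field": field,
                "size": size,
                "include": {
                    "partition": partition,
                    "num_partitions": num_partitions,
                },
            }
        }


# 20000 unique terms at roughly 1000 buckets per request, as in the answer
bodies = list(partitioned_terms_aggs("my_field.keyword", 20000, 1000))
print(len(bodies))  # 20
```

Each body would then be sent as its own request, just as the generator above issues one search per partition.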

- Peter Kunszt

-2

You should read the documentation.

So in your case, it should look like this:

search.aggs.bucket('per_date', 'terms', field='date')\
            .bucket('response_time_percentile', 'percentiles', field='total_time',
                    percents=percentiles, hdr={"number_of_significant_value_digits": 1})[0:50]
response = search.execute()

- fmdaboville

I tried this, but I get an error: percents=percentiles, hdr={"number_of_significant_value_digits": 1})[0:20] TypeError: 'Percentiles' object has no attribute '__getitem__'. - hard coder

Is your search initialized like this: search = Search()? - fmdaboville

- Soony · Accepted Answer

I had the same issue. I finally found this solution:

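from elasticsearch_dsl import A, Search

# `client` is an Elasticsearch connection and `keywords` is the search text
s = Search(using=client, index="jokes").query("match", jks_content=keywords).extra(size=0)
a = A('terms', field='jks_title.keyword', size=999999)
s.aggs.bucket('by_title', a)
response = s.execute()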

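After 2.x, size=0 no longer returns all bucket results; please refer to this thread. In my example I simply set size to 999999. You can pick a large number that suits your case.

It is recommended to explicitly set a reasonable value for size, a number between 1 and 2147483647.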
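
As a small guard against the `size=0` error from the question, the size could be checked before the request is built. This is a minimal sketch, not part of elasticsearch-dsl: `terms_agg_body` and `MAX_TERMS_SIZE` are hypothetical names, and the bounds are simply the ones quoted above.

```python
MAX_TERMS_SIZE = 2147483647  # upper bound quoted above (largest signed 32-bit int)


def terms_agg_body(field, size):
    """Return a terms-aggregation body as a plain dict, rejecting bad sizes."""
    if not isinstance(size, int) or not 1 <= size <= MAX_TERMS_SIZE:
        raise ValueError(
            f"size must be an integer between 1 and {MAX_TERMS_SIZE}, got {size!r}"
        )
    return {"terms": {"field": field, "size": size}}


body = terms_agg_body("jks_title.keyword", 999999)
print(body)
```

The same field and size values are what the answer passes as keyword arguments to `A('terms', ...)`.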

Hope this helps.