Elasticsearch scroll/scan query does not return all documents: the first batch is missing

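I'm trying to scroll through an ES index and fetch all of its documents, but I keep missing the first batch of documents returned by the initial scroll request. For example, if my scroll size is 10 and my query matches 100 documents in total, I only end up with 90 documents after scrolling. Any suggestions?
Here is what I have tried so far: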
$json = '{"query":{"bool":{"must":[{"match_all":{}}]}}}';

$params = [
    "scroll" => "1m",
    "size" => 50,
    "index" => "myindex",
    "type" => "mytype",
    "body" => $json 
];

$results = $client->search($params);
$scroll_size = $results['hits']['total']; // returns total docs that match query
$s_id = $results['_scroll_id'];

print " total results:   " . $scroll_size;

//scroll
$count = 0;
while ($scroll_size > 0) {
    print "  SCROLLING...";
    $scroll_results = $client->scroll([
        'scroll_id' => $s_id,
        'scroll' => '1m'
    ]);

    // get number of results returned in the last scroll
    $scroll_size = sizeof($scroll_results['hits']['hits']);
    print "  scroll size: " . $scroll_size;

    // do something with results
    for ($i=0; $i<$scroll_size; $i++) {
        $count++;
    }
}
print " total id count: " . $count;
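To make the symptom concrete, here is a minimal sketch that swaps the Elasticsearch client for a hypothetical in-memory stand-in (`FakeScrollClient`, which is not part of elasticsearch-php) serving 100 documents in pages of 10. If you count only what `scroll()` returns, as the loop above does, you get 90. The first page arrived with the `search()` response and was never counted.

```php
<?php
// Hypothetical in-memory stand-in for the Elasticsearch client (NOT the real
// elasticsearch-php API): search() returns the first page, scroll() the rest.
class FakeScrollClient
{
    private array $docs;
    private int $pageSize;
    private int $offset = 0;

    public function __construct(int $total, int $pageSize)
    {
        $this->docs = range(1, $total);
        $this->pageSize = $pageSize;
    }

    private function nextPage(): array
    {
        $page = array_slice($this->docs, $this->offset, $this->pageSize);
        $this->offset += count($page);
        return [
            '_scroll_id' => 'fake-scroll-id',
            'hits' => ['total' => count($this->docs), 'hits' => $page],
        ];
    }

    public function search(array $params): array { return $this->nextPage(); }
    public function scroll(array $params): array { return $this->nextPage(); }
}

$client = new FakeScrollClient(100, 10);
$results = $client->search(['scroll' => '1m', 'size' => 10]);
$s_id = $results['_scroll_id'];

// Same shape as the question's loop: only scroll() responses are counted.
$count = 0;
$scroll_size = $results['hits']['total'];
while ($scroll_size > 0) {
    $scroll_results = $client->scroll(['scroll_id' => $s_id, 'scroll' => '1m']);
    $scroll_size = count($scroll_results['hits']['hits']);
    $count += $scroll_size;
}
echo $count, "\n"; // 90: the 10 hits returned by search() were never counted
```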
2 Answers


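The first query does more than report how many documents match: it also returns documents. The initial search sets up the scroll and returns the first batch of results. Once you have processed that first batch, use the scroll ID to fetch the next page, and so on.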
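Here is a minimal sketch of that flow. It again uses a hypothetical `FakeScrollClient` stand-in rather than the real client. The loop processes whatever the current response holds, fetches the next page using the latest `_scroll_id`, and stops on the first empty page. Because the `search()` response goes through the same loop body as every `scroll()` response, the first page cannot be skipped.

```php
<?php
// Hypothetical in-memory stand-in for the Elasticsearch client (NOT the real
// elasticsearch-php API): search() returns the first page, scroll() the rest.
class FakeScrollClient
{
    private array $docs;
    private int $pageSize;
    private int $offset = 0;

    public function __construct(int $total, int $pageSize)
    {
        $this->docs = range(1, $total);
        $this->pageSize = $pageSize;
    }

    private function nextPage(): array
    {
        $page = array_slice($this->docs, $this->offset, $this->pageSize);
        $this->offset += count($page);
        return [
            '_scroll_id' => 'scroll-' . $this->offset, // may change between calls
            'hits' => ['total' => count($this->docs), 'hits' => $page],
        ];
    }

    public function search(array $params): array { return $this->nextPage(); }
    public function scroll(array $params): array { return $this->nextPage(); }
}

$client = new FakeScrollClient(100, 10);

// The initial search opens the scroll AND returns the first page of hits.
$response = $client->search(['scroll' => '1m', 'size' => 10]);

$seen = [];
while (count($response['hits']['hits']) > 0) {
    // Process the current page first (this includes the search() page).
    foreach ($response['hits']['hits'] as $hit) {
        $seen[] = $hit;
    }
    // Always continue from the most recent scroll ID.
    $response = $client->scroll([
        'scroll_id' => $response['_scroll_id'],
        'scroll' => '1m',
    ]);
}
echo count($seen), "\n"; // 100
```

With a single loop like this, the processing code exists in one place. You do not need a separate copy for the first page.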



Thanks @Ramdev. Yes, I realized that after digging into it further. For anyone else, here is what ended up working for me:

$json = '{"query":{"bool":{"must":[{"match_all":{}}]}}}';
$count = 0;
$params = [
    "scroll" => "1m",
    "size" => 50,
    "index" => "myindex",
    "type" => "mytype",
    "body" => $json 
];

$results = $client->search($params);
$scroll_size = $results['hits']['total']; // returns total docs that match query
$s_id = $results['_scroll_id'];

print " total results:   " . $scroll_size;

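// first set of scroll results: the initial search() response already holds
// the first page of hits, so count (process) those before scrolling
$first_page_size = count($results['hits']['hits']);
for ($i=0; $i<$first_page_size; $i++) {
    $count++;
}
//scroll
while ($scroll_size > 0) {
    print "  SCROLLING...";
    $scroll_results = $client->scroll([
        'scroll_id' => $s_id,
        'scroll' => '1m'
    ]);

    // the scroll ID may change between requests; always use the latest one
    $s_id = $scroll_results['_scroll_id'];

    // get number of results returned in the last scroll
    $scroll_size = sizeof($scroll_results['hits']['hits']);
    print "  scroll size: " . $scroll_size;

    // do something with results
    for ($i=0; $i<$scroll_size; $i++) {
        $count++;
    }
}
print " total id count: " . $count;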
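One caveat about reading `$results['hits']['total']`. The `type` parameter suggests Elasticsearch 6.x or earlier, where `hits.total` is a plain integer. From Elasticsearch 7.0 on, it is by default an object such as `{"value": 100, "relation": "eq"}`, which PHP decodes to an array, so printing it directly will not show the number. The small helper below reads either shape. It is hypothetical and not part of elasticsearch-php.

```php
<?php
// Hypothetical helper: read hits.total from a search response whether it is
// an integer (Elasticsearch 6.x and earlier) or an object (7.0+).
function totalHits(array $response): int
{
    $total = $response['hits']['total'];
    if (is_array($total)) {
        // 7.0+: ['value' => N, 'relation' => 'eq' | 'gte']; with 'gte', N is only a lower bound.
        return (int) $total['value'];
    }
    return (int) $total;
}

echo totalHits(['hits' => ['total' => 100]]), "\n"; // 100
echo totalHits(['hits' => ['total' => ['value' => 100, 'relation' => 'eq']]]), "\n"; // 100
```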

Content provided by Stack Overflow; see the original English post for the source.