PostgreSQL does not use btree_gist index

Question


3

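I have a huge table with a primary key and a btree_gist index. When I query the columns covered by the btree_gist index, I would expect that index to be used and the query to be fast. However, the optimizer always performs an index scan on the primary key and then applies a filter.

Example: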

create table test1 (
    id1 bigint not null,
    id2 bigint not null,
    validtime tstzrange not null,
    data float);
alter table test1 add constraint pk_test1 primary key (id1, id2, validtime);
alter table test1 add constraint ex_test1_validtime exclude using gist (id1 with =, id2 with =, validtime with &&);

The table contains about 1.2 billion rows. The query I want to run returns only a few hundred rows, yet it takes a long time:

select * from test1 where id1=1 and id2=1 and validtime && '[2020-01-01,2020-02-01)';
(about 3s)

Query plan:

explain select * from test1 where id1=1 and id2=1 and validtime && '[2020-01-01,2020-02-01)';
                                        QUERY PLAN                                         
-------------------------------------------------------------------------------------------
 Index Scan using pk_test1 on test1  (cost=0.70..24.68 rows=1 width=46)
   Index Cond: ((id1 = 1) AND (id2 = 1))
   Filter: (validtime && '["2020-01-01 00:00:00+00","2020-02-01 00:00:00+00")'::tstzrange)

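The poor performance is evidently caused by reading thousands of rows that match the id conditions and then filtering them on the time condition.

I would like to know why Postgres does not use the btree_gist index.

I have another, slightly different table where the btree_gist index is used, but in a very different way from what I expected. That table has about 160 million rows.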

create table test2 (
    id1 bigint not null,
    validtime tstzrange not null);
alter table test2 add constraint pk_test2 primary key (id1, validtime);
alter table test2 add constraint ex_test2_validtime exclude using gist (id1 with =, validtime with &&);

Here the execution plan looks like this:

explain select * from test2 where id1=1 and validtime && '[2020-01-01,2020-02-01)';
                                                   QUERY PLAN                                                    
-----------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on test2  (cost=1933.19..1937.20 rows=1 width=62)
   Recheck Cond: ((id1 = 1) AND (validtime && '["2020-01-01 00:00:00+00","2020-02-01 00:00:00+00")'::tstzrange))
   ->  BitmapAnd  (cost=1933.19..1933.19 rows=1 width=0)
         ->  Bitmap Index Scan on pk_test2  (cost=0.00..574.20 rows=11417 width=0)
               Index Cond: (id1 = 1)
         ->  Bitmap Index Scan on ex_test2_validtime  (cost=0.00..1358.74 rows=17019 width=0)
               Index Cond: (validtime && '["2020-01-01 00:00:00+00","2020-02-01 00:00:00+00")'::tstzrange)

Why are two bitmap index scans needed? Couldn't this be done with a single scan of the btree_gist index?

- paul

2

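Please post the EXPLAIN (ANALYZE, BUFFERS) output. Regarding your first query: if you drop the other index, is the desired index used? - Laurenz Albe

Unfortunately, I cannot test much on test1 at the moment, because I am building a new index and the table is locked. I created another test table without data to try the case without a primary key. The query plan shows a bitmap index scan on the btree_gist index, but using only the time condition, followed by a bitmap heap scan on the table that filters on the id1 and id2 conditions. I guess that would be a disaster on the real billion-row table (though still better than a full table scan). What I do not understand is why the id1 and id2 conditions do not go into the btree_gist index scan. I think I am missing something. - paul

2 Answers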
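The diagnostics requested in these comments could be gathered roughly as follows. This is a sketch under assumptions, not something taken from the thread: dropping the primary key inside a transaction and rolling back is one way to see which plan the planner picks without that index, but ALTER TABLE holds an ACCESS EXCLUSIVE lock until the transaction ends, so it only suits a system where briefly blocking other sessions is acceptable.

```sql
-- Actual run time and buffer usage for the slow query.
-- Note that EXPLAIN (ANALYZE) really executes the statement.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM test1
WHERE id1 = 1 AND id2 = 1
  AND validtime && '[2020-01-01,2020-02-01)';

-- See which plan is chosen when the primary key index is not available.
-- DDL is transactional in PostgreSQL, so ROLLBACK restores the constraint,
-- but the table is locked against all other access until then.
BEGIN;
ALTER TABLE test1 DROP CONSTRAINT pk_test1;
EXPLAIN
SELECT * FROM test1
WHERE id1 = 1 AND id2 = 1
  AND validtime && '[2020-01-01,2020-02-01)';
ROLLBACK;
```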

2

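Your answer is correct, but I would like to add some background on why this happens.

In PostgreSQL, an index can only support operators that belong to its operator family. For a GiST index on bigint (as provided by btree_gist), these are the operators in that family: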

SELECT ao.amoplefttype::regtype,
       op.oprname,
       ao.amoprighttype::regtype
FROM pg_opfamily AS of
   JOIN pg_am AS am ON of.opfmethod = am.oid
   JOIN pg_amop AS ao ON of.oid = ao.amopfamily
   JOIN pg_operator AS op ON ao.amopopr = op.oid
WHERE am.amname = 'gist'
  AND ao.amoplefttype = 'bigint'::regtype;

 amoplefttype │ oprname │ amoprighttype 
══════════════╪═════════╪═══════════════
 bigint       │ <       │ bigint
 bigint       │ <=      │ bigint
 bigint       │ =       │ bigint
 bigint       │ >=      │ bigint
 bigint       │ >       │ bigint
 bigint       │ <>      │ bigint
 bigint       │ <->     │ bigint
(7 rows)

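That explains why you have to cast the constant to bigint for the index to be used.

This may come as a surprise if you are used to PostgreSQL, because no such cast is needed with B-tree indexes. The explanation is that the btree operator family has more operators: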

SELECT ao.amoplefttype::regtype,
       op.oprname,
       ao.amoprighttype::regtype
FROM pg_opfamily AS of
   JOIN pg_am AS am ON of.opfmethod = am.oid
   JOIN pg_amop AS ao ON of.oid = ao.amopfamily
   JOIN pg_operator AS op ON ao.amopopr = op.oid
WHERE am.amname = 'btree'
  AND ao.amoplefttype = 'bigint'::regtype;

 amoplefttype │ oprname │ amoprighttype 
══════════════╪═════════╪═══════════════
 bigint       │ <       │ bigint
 bigint       │ <=      │ bigint
 bigint       │ =       │ bigint
 bigint       │ >=      │ bigint
 bigint       │ >       │ bigint
 bigint       │ <       │ smallint
 bigint       │ <=      │ smallint
 bigint       │ =       │ smallint
 bigint       │ >=      │ smallint
 bigint       │ >       │ smallint
 bigint       │ <       │ integer
 bigint       │ <=      │ integer
 bigint       │ =       │ integer
 bigint       │ >=      │ integer
 bigint       │ >       │ integer
(15 rows)

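One of these is the equality comparison between bigint and integer.

If you wrote the condition with >= and < instead of &&, you could use a regular B-tree index to support the query, which would make the cast unnecessary. Of course, you would not want to create a second index if the one from the exclusion constraint already exists.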
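To check this for one specific operator rather than reading through both lists, a query along these lines could be used. It is a sketch that reuses the catalog joins from above; the alias opf and the output column names are my own choices. It looks for the cross-type operator =(bigint,integer) in the btree and gist operator families.

```sql
SELECT am.amname                    AS access_method,
       opf.opfname                  AS operator_family,
       ao.amopopr::regoperator      AS operator
FROM pg_opfamily AS opf
   JOIN pg_am AS am ON opf.opfmethod = am.oid
   JOIN pg_amop AS ao ON opf.oid = ao.amopfamily
WHERE am.amname IN ('btree', 'gist')
  -- Only the cross-type equality operator bigint = integer.
  AND ao.amopopr = '=(bigint,integer)'::regoperator;
```

Going by the listings above, this should return a row for btree only, which is why a condition such as id1 = 1 can be pushed into a B-tree index scan but not into the GiST one.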
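As a rough illustration of that alternative, the overlap test can be spelled out with plain comparisons on the range bounds. Everything here beyond the table definition is an assumption: the index name test1_id_lower_idx is hypothetical, and the rewrite only matches && if every validtime is non-empty, bounded on both sides, and half-open '[)'.

```sql
-- Hypothetical expression index; name and column choice are assumptions.
CREATE INDEX test1_id_lower_idx
    ON test1 (id1, id2, lower(validtime));

-- Overlap with [2020-01-01, 2020-02-01) written with < and >.
-- [l, u) overlaps [a, b) exactly when l < b and u > a.
SELECT * FROM test1
WHERE id1 = 1 AND id2 = 1
  AND lower(validtime) < '2020-02-01'::timestamptz
  AND upper(validtime) > '2020-01-01'::timestamptz;
```

With such an index only the lower(validtime) condition can narrow the index scan. The upper(validtime) condition is applied as a filter afterwards, so rows for the same id pair with earlier ranges are still read.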

- Laurenz Albe


- paul · Accepted Answer

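Finally found it:

The index was not used because of a type mismatch between the query and the index. This has actually been mentioned in many places, but I had only skimmed over it.

The literal 1 is an integer, not a bigint! Interestingly, the conversion happens automatically with the btree primary key, but not with btree_gist.

Anyway, with this query everything works as expected: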
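The type PostgreSQL assigns to a numeric literal can be checked with pg_typeof. The column aliases in this small check are just labels I picked.

```sql
SELECT pg_typeof(1)          AS plain_literal,  -- integer
       pg_typeof(1::bigint)  AS cast_literal,   -- bigint
       pg_typeof(3000000000) AS large_literal;  -- bigint, too large for integer
```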

select * from test1
where id1=1::bigint and id2=1::bigint
and validtime && '[2020-01-01,2020-02-01)';

It took me several hours to learn this, and I will never forget it!
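The ::bigint cast is not the only way to give the constants the right type. The sketch below shows two other spellings that should behave the same way: a typed literal, and a prepared statement whose parameters are declared as bigint. The statement name test1_by_ids is hypothetical, and I have not verified the resulting plans against the table from the question.

```sql
-- Typed literal instead of casting an integer literal.
SELECT * FROM test1
WHERE id1 = bigint '1' AND id2 = bigint '1'
  AND validtime && '[2020-01-01,2020-02-01)';

-- Prepared statement: the parameters are bigint by declaration,
-- so the comparisons are bigint = bigint without an explicit cast.
PREPARE test1_by_ids(bigint, bigint, tstzrange) AS
SELECT * FROM test1
WHERE id1 = $1 AND id2 = $2 AND validtime && $3;

EXPLAIN EXECUTE test1_by_ids(1, 1, '[2020-01-01,2020-02-01)');
```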