Postgres GIN索引未被使用,总是执行顺序扫描

3
我已经创建了一个表格来进行令牌搜索。
CREATE TABLE tsvector_business_objects (
    id int4 NULL,
    "type" varchar NULL,
    value varchar NULL,
    label varchar NULL,
    synonyms _text NULL,
    label_tsvector tsvector NULL
);
CREATE INDEX label_tsvector_idx ON tsvector_business_objects USING gin (label_tsvector);
CREATE INDEX lower_case_synonym_tsvector_business_objects ON tsvector_business_objects USING gin (synonyms);
CREATE INDEX txt_label_tsvector_business_objects ON tsvector_business_objects USING gin (to_tsvector('english'::regconfig, lower((label)::text)));
CREATE INDEX txt_value_tsvector_business_objects ON tsvector_business_objects USING gin (to_tsvector('english'::regconfig, lower((value)::text)));
CREATE INDEX type_tsvector_business_objects ON tsvector_business_objects USING btree (type);
CREATE INDEX type_value_label_lower_case_tsvector_business_objects ON tsvector_business_objects USING btree (lower((type)::text), lower((value)::text), lower((label)::text));

我在数据中有一些类似于OID-0127820的文本,当我使用ts_vector Postgres搜索OID时,它总是使用序列扫描而不使用GIN索引。

有500k条记录包含OID文本。但是当我禁用序列扫描时,索引被使用。

SET enable_seqscan = ON;
explain (analyze,verbose,buffers,timing,costs) 
SELECT TYPE, value, label, synonyms, ((to_tsvector('english', lower(value)) @@ plainto_tsquery('english', 'OID') AND numnode(plainto_tsquery('english', 'OID')) > 0) OR (lower(value) LIKE '% OID %' AND numnode(plainto_tsquery('english', 'OID')) = 0) OR ((to_tsvector('english', lower(label))) @@ plainto_tsquery('english', 'OID') AND numnode(plainto_tsquery('english', 'OID')) > 0) OR (lower(label) LIKE '% OID %' AND numnode(plainto_tsquery('english', 'OID')) = 0)) as partial_value_label_match, (lower(value) ='OID' OR lower(label) ='OID') as exact_value_label_match, (synonyms @> '{OID}') is true as synonym_match FROM tsvector_business_objects AS business_objects_alias
WHERE synonyms @> '{OID}'
OR (to_tsvector('english', lower(value)) @@ plainto_tsquery('english', 'OID') AND numnode(plainto_tsquery('english', 'OID')) > 0)
OR (lower(value) LIKE '% OID %' AND numnode(plainto_tsquery('english', 'OID')) = 0)
OR ((to_tsvector('english', lower(label))) @@ plainto_tsquery('english', 'OID') AND numnode(plainto_tsquery('english', 'OID')) > 0) 
OR (lower(label) LIKE '% OID %' AND numnode(plainto_tsquery('english', 'OID')) = 0) OR (lower(value) ='OID' OR lower(label) ='OID')
limit 30



QUERY PLAN                                                                                                                                                                                                                                                     |
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
Limit  (cost=0.00..41.91 rows=30 width=82) (actual time=3804.998..3805.163 rows=30 loops=1)                                                                                                                                                                    |
  Output: type, value, label, synonyms, (((to_tsvector('english'::regconfig, lower((value)::text)) @@ '''oid'''::tsquery) OR (to_tsvector('english'::regconfig, lower((label)::text)) @@ '''oid'''::tsquery))), (((lower((value)::text) = 'OID'::text) OR (lowe|
  Buffers: shared hit=21217                                                                                                                                                                                                                                    |
  ->  Seq Scan on mdlz_performancebenchmarking.tsvector_business_objects business_objects_alias  (cost=0.00..717278.98 rows=513425 width=82) (actual time=3804.997..3805.159 rows=30 loops=1)                                                                  |
        Output: type, value, label, synonyms, ((to_tsvector('english'::regconfig, lower((value)::text)) @@ '''oid'''::tsquery) OR (to_tsvector('english'::regconfig, lower((label)::text)) @@ '''oid'''::tsquery)), ((lower((value)::text) = 'OID'::text) OR (l|
        Filter: ((business_objects_alias.synonyms @> '{OID}'::text[]) OR (to_tsvector('english'::regconfig, lower((business_objects_alias.value)::text)) @@ '''oid'''::tsquery) OR (to_tsvector('english'::regconfig, lower((business_objects_alias.label)::tex|
        Rows Removed by Filter: 575042                                                                                                                                                                                                                         |
        Buffers: shared hit=21217                                                                                                                                                                                                                              |
Planning time: 0.226 ms                                                                                                                                                                                                                                        |
Execution time: 3805.210 ms   

SET enable_seqscan = OFF;
explain (analyze,verbose,buffers,timing,costs) 
SELECT TYPE, value, label, synonyms, ((to_tsvector('english', lower(value)) @@ plainto_tsquery('english', 'OID') AND numnode(plainto_tsquery('english', 'OID')) > 0) OR (lower(value) LIKE '% OID %' AND numnode(plainto_tsquery('english', 'OID')) = 0) OR ((to_tsvector('english', lower(label))) @@ plainto_tsquery('english', 'OID') AND numnode(plainto_tsquery('english', 'OID')) > 0) OR (lower(label) LIKE '% OID %' AND numnode(plainto_tsquery('english', 'OID')) = 0)) as partial_value_label_match, (lower(value) ='OID' OR lower(label) ='OID') as exact_value_label_match, (synonyms @> '{OID}') is true as synonym_match FROM tsvector_business_objects AS business_objects_alias
WHERE synonyms @> '{OID}'
OR (to_tsvector('english', lower(value)) @@ plainto_tsquery('english', 'OID') AND numnode(plainto_tsquery('english', 'OID')) > 0)
OR (lower(value) LIKE '% OID %' AND numnode(plainto_tsquery('english', 'OID')) = 0)
OR ((to_tsvector('english', lower(label))) @@ plainto_tsquery('english', 'OID') AND numnode(plainto_tsquery('english', 'OID')) > 0) 
OR (lower(label) LIKE '% OID %' AND numnode(plainto_tsquery('english', 'OID')) = 0) OR (lower(value) ='OID' OR lower(label) ='OID')
limit 30



QUERY PLAN                                                                                                                                                                                                                                                     |
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
Limit  (cost=66996.67..67036.72 rows=30 width=82) (actual time=312.131..312.251 rows=30 loops=1)                                                                                                                                                               |
  Output: type, value, label, synonyms, (((to_tsvector('english'::regconfig, lower((value)::text)) @@ '''oid'''::tsquery) OR (to_tsvector('english'::regconfig, lower((label)::text)) @@ '''oid'''::tsquery))), (((lower((value)::text) = 'OID'::text) OR (lowe|
  Buffers: shared hit=45103                                                                                                                                                                                                                                    |
  ->  Bitmap Heap Scan on mdlz_performancebenchmarking.tsvector_business_objects business_objects_alias  (cost=66996.67..752444.39 rows=513425 width=82) (actual time=312.130..312.247 rows=30 loops=1)                                                        |
        Output: type, value, label, synonyms, ((to_tsvector('english'::regconfig, lower((value)::text)) @@ '''oid'''::tsquery) OR (to_tsvector('english'::regconfig, lower((label)::text)) @@ '''oid'''::tsquery)), ((lower((value)::text) = 'OID'::text) OR (l|
        Recheck Cond: ((business_objects_alias.synonyms @> '{OID}'::text[]) OR (to_tsvector('english'::regconfig, lower((business_objects_alias.value)::text)) @@ '''oid'''::tsquery) OR (to_tsvector('english'::regconfig, lower((business_objects_alias.label|
        Heap Blocks: exact=1                                                                                                                                                                                                                                   |
        Buffers: shared hit=45103                                                                                                                                                                                                                              |
        ->  BitmapOr  (cost=66996.67..66996.67 rows=666282 width=0) (actual time=310.732..310.732 rows=0 loops=1)                                                                                                                                              |
              Buffers: shared hit=45102                                                                                                                                                                                                                        |
              ->  Bitmap Index Scan on lower_case_synonym_tsvector_business_objects  (cost=0.00..189.20 rows=1 width=0) (actual time=0.008..0.008 rows=0 loops=1)                                                                                              |
                    Index Cond: (business_objects_alias.synonyms @> '{OID}'::text[])                                                                                                                                                                           |
                    Buffers: shared hit=2                                                                                                                                                                                                                      |
              ->  Bitmap Index Scan on txt_value_tsvector_business_objects  (cost=0.00..2638.25 rows=333140 width=0) (actual time=49.131..49.131 rows=575000 loops=1)                                                                                          |
                    Index Cond: (to_tsvector('english'::regconfig, lower((business_objects_alias.value)::text)) @@ '''oid'''::tsquery)                                                                                                                         |
                    Buffers: shared hit=86                                                                                                                                                                                                                     |
              ->  Bitmap Index Scan on txt_label_tsvector_business_objects  (cost=0.00..2638.25 rows=333140 width=0) (actual time=49.956..49.956 rows=575000 loops=1)                                                                                          |
                    Index Cond: (to_tsvector('english'::regconfig, lower((business_objects_alias.label)::text)) @@ '''oid'''::tsquery)                                                                                                                         |
                    Buffers: shared hit=86                                                                                                                                                                                                                     |
              ->  Bitmap Index Scan on type_value_label_lower_case_tsvector_business_objects  (cost=0.00..30444.59 rows=1 width=0) (actual time=93.168..93.169 rows=0 loops=1)                                                                                 |
                    Index Cond: (lower((business_objects_alias.value)::text) = 'OID'::text)                                                                                                                                                                    |
                    Buffers: shared hit=22464                                                                                                                                                                                                                  |
              ->  Bitmap Index Scan on type_value_label_lower_case_tsvector_business_objects  (cost=0.00..30444.59 rows=1 width=0) (actual time=118.465..118.465 rows=0 loops=1)                                                                               |
                    Index Cond: (lower((business_objects_alias.label)::text) = 'OID'::text)                                                                                                                                                                    |
                    Buffers: shared hit=22464                                                                                                                                                                                                                  |
Planning time: 0.249 ms                                                                                                                                                                                                                                        |
Execution time: 312.579 ms    

输出数据


type                 |value      |label      |synonyms|label_tsvector      |partial_value_label_match|exact_value_label_match|synonym_match|
---------------------|-----------|-----------|--------|--------------------|-------------------------|-----------------------|-------------|
orderid_1621409737948|OID-0127820|OID-0127820|NULL    |'-0127820':2 'oid':1|true                     |false                  |false        |
orderid_1621409737948|OID-0355880|OID-0355880|NULL    |'-0355880':2 'oid':1|true                     |false                  |false        |
orderid_1621409737948|OID-0041048|OID-0041048|NULL    |'-0041048':2 'oid':1|true                     |false                  |false        |
orderid_1621409737948|OID-0499716|OID-0499716|NULL    |'-0499716':2 'oid':1|true                     |false                  |false        |
orderid_1621409737948|OID-0219268|OID-0219268|NULL    |'-0219268':2 'oid':1|true                     |false                  |false        |
orderid_1621409737948|OID-0000560|OID-0000560|NULL    |'-0000560':2 'oid':1|true                     |false                  |false        |
orderid_1621409737948|OID-0355656|OID-0355656|NULL    |'-0355656':2 'oid':1|true                     |false                  |false        |
orderid_1621409737948|OID-0355628|OID-0355628|NULL    |'-0355628':2 'oid':1|true                     |false                  |false        |
orderid_1621409737948|OID-0219380|OID-0219380|NULL    |'-0219380':2 'oid':1|true                     |false                  |false        |
orderid_1621409737948|OID-0063896|OID-0063896|NULL    |'-0063896':2 'oid':1|true                     |false                  |false        |
orderid_1621409737948|OID-0054740|OID-0054740|NULL    |'-0054740':2 'oid':1|true                     |false                  |false        |
orderid_1621409737948|OID-0447020|OID-0447020|NULL    |'-0447020':2 'oid':1|true                     |false                  |false        |
orderid_1621409737948|OID-0040964|OID-0040964|NULL    |'-0040964':2 'oid':1|true                     |false                  |false        |
orderid_1621409737948|OID-0499744|OID-0499744|NULL    |'-0499744':2 'oid':1|true                     |false                  |false        |
orderid_1621409737948|OID-0219800|OID-0219800|NULL    |'-0219800':2 'oid':1|true                     |false                  |false        |
orderid_1621409737948|OID-0355180|OID-0355180|NULL    |'-0355180':2 'oid':1|true                     |false                  |false        |
orderid_1621409737948|OID-0530348|OID-0530348|NULL    |'-0530348':2 'oid':1|true                     |false                  |false        |
orderid_1621409737948|OID-0219996|OID-0219996|NULL    |'-0219996':2 'oid':1|true                     |false                  |false        |
orderid_1621409737948|OID-0220024|OID-0220024|NULL    |'-0220024':2 'oid':1|true                     |false                  |false        |
orderid_1621409737948|OID-0446936|OID-0446936|NULL    |'-0446936':2 'oid':1|true                     |false                  |false        |
orderid_1621409737948|OID-0220108|OID-0220108|NULL    |'-0220108':2 'oid':1|true                     |false                  |false        |
orderid_1621409737948|OID-0020692|OID-0020692|NULL    |'-0020692':2 'oid':1|true                     |false                  |false        |
orderid_1621409737948|OID-0354872|OID-0354872|NULL    |'-0354872':2 'oid':1|true                     |false                  |false        |
orderid_1621409737948|OID-0097496|OID-0097496|NULL    |'-0097496':2 'oid':1|true                     |false                  |false        |
orderid_1621409737948|OID-0354648|OID-0354648|NULL    |'-0354648':2 'oid':1|true                     |false                  |false        |
orderid_1621409737948|OID-0128268|OID-0128268|NULL    |'-0128268':2 'oid':1|true                     |false                  |false        |
orderid_1621409737948|OID-0354536|OID-0354536|NULL    |'-0354536':2 'oid':1|true                     |false                  |false        |
orderid_1621409737948|OID-0530432|OID-0530432|NULL    |'-0530432':2 'oid':1|true                     |false                  |false        |
orderid_1621409737948|OID-0128324|OID-0128324|NULL    |'-0128324':2 'oid':1|true                     |false                  |false        |
orderid_1621409737948|OID-0354256|OID-0354256|NULL    |'-0354256':2 'oid':1|true                     |false                  |false        |

random_page_cost 是 1.1。

我认为Postgres查询计划器认为使用序列扫描会更快,但事实并非如此。

2个回答

3

一个基本问题是,规划器认为在顺序扫描中会很早地找到30行数据,因此可以很早停止。但实际上这是错误的,可能因为符合条件的行在表的早期部分出现得很少。即使选择性估计值完美无缺,也很难仅通过改进选择性估计来解决该问题,因为行即使被均匀散布在整个表中,该错误也会发生。


2
问题在于许多条件高度相关,因为它们测试的几乎是相同的内容。由于优化器不知道这一点并将它们视为统计上独立的,所以会得出错误的估计值。
您可以尝试简化 WHERE 条件,或者升级到 PostgreSQL v14,在该版本中,提交a4d75c86bf1引入了表达式的扩展统计信息。

即使只有一个ts_vector where子句OR(to_tsvector('english',lower(value))@@ plainto_tsquery('english','OID')AND numnode(plainto_tsquery('english','OID'))> 0),Postgres也不使用索引。 - unnik
那是“与”(AND)而不是“或”(OR)。我想我不明白你想说什么。 - Laurenz Albe
抱歉,我的意思是如果我只在where子句中添加一个条件(to_tsvector('english', lower(value)) @@ plainto_tsquery('english', 'OID') AND numnode(plainto_tsquery('english', 'OID')) > 0),它仍然不使用索引。 - unnik
我明白了。嗯,第二个条件是多余的。但这不应该是问题。也许“OID”在你的文本中是一个非常频繁出现的值? - Laurenz Albe
是的,有超过500K个带OID的记录。 - unnik

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接