我有一张只读表格,其中包含了8000万行:
Column | Type | Modifiers | Storage | Stats target | Description
-------------+------------------------+-----------+----------+--------------+-------------
id | character(11) | not null | extended | |
gender | character(1) | | extended | |
postal_code | character varying(10) | | extended | |
operator | character varying(5) | | extended | |
Indexes:
"categorised_phones_pkey" PRIMARY KEY, btree (id)
"operator_idx" btree (operator)
"postal_code_trgm_idx" gin (postal_code gin_trgm_ops)
id
是主键,包含唯一的手机号码。表格行看起来像这样:
id | gender | postal_code | operator
----------------+--------------+----------------+------------
09567849087 | m | 7414776788 | mtn
09565649846 | f | 1268398732 | mci
09568831245 | f | 7412556443 | mtn
09469774390 | m | 5488312790 | mci
这个查询在第一次执行时需要大约65秒的时间,之后每次执行只需要大约8秒的时间。
select operator,count(*) from categorised_phones where postal_code like '1%' group by operator;
输出结果如下:
operator | count
----------+---------
mci | 4050314
mtn | 6235778
explain analyze
的输出结果:
HashAggregate (cost=1364980.61..1364980.63 rows=2 width=10) (actual time=8257.026..8257.026 rows=2 loops=1)
Group Key: operator
-> Bitmap Heap Scan on categorised_phones (cost=95969.17..1312915.34 rows=10413054 width=2) (actual time=1140.803..6332.534 rows=10286092 loops=1)
Recheck Cond: ((postal_code)::text ~~ '1%'::text)
Rows Removed by Index Recheck: 25105697
Heap Blocks: exact=50449 lossy=237243
-> Bitmap Index Scan on postal_code_trgm_idx (cost=0.00..93365.90 rows=10413054 width=0) (actual time=1129.270..1129.270 rows=10287127 loops=1)
Index Cond: ((postal_code)::text ~~ '1%'::text)
Planning time: 0.540 ms
Execution time: 8257.392 ms
如何使查询更快?
欢迎任何想法。
附:
我正在使用PostgreSQL 9.6.1
更新
我刚刚更新了问题。 我禁用了Parallel Query
,结果发生了变化。
postal_code
上有一个gin索引。它定义了哪些操作?你似乎没有使用任何会触发该索引的东西。 - jmeleskyLIKE '1%'
。 - Hamed Kamrava