按时间戳列排序的查询非常缓慢

Question

按时间戳列排序的查询非常缓慢

7

以下查询尽管有索引，并且id是主键，但返回数据需要约15秒钟。

select id from my_table order by insert_date offset 0 limit 1

以下是解释分析：

"Limit  (cost=1766417.72..1766417.72 rows=1 width=12) (actual time=32479.440..32479.441 rows=1 loops=1)"
"  ->  Sort  (cost=1766417.72..1797117.34 rows=12279848 width=12) (actual time=32479.437..32479.437 rows=1 loops=1)"
"        Sort Key: insert_date"
"        Sort Method: top-N heapsort  Memory: 25kB"
"        ->  Seq Scan on my_table  (cost=0.00..1705018.48 rows=12279848 width=12) (actual time=0.006..21338.401 rows=12108916 loops=1)"
"Total runtime: 32479.476 ms"

我的表格还有其他几列，但是insert_date的类型是

insert_date timestamp without time zone NOT NULL DEFAULT now(),

我在那个特定日期列上有一个索引，它是：

CREATE INDEX my_table_insert_date_indx
  ON my_table
  USING btree
  (insert_date)
TABLESPACE somexyz_idx_ts;

postgresql.conf文件中的几个值：

shared_buffers = more than 1GB    ## just for an example
temp_buffers = more than 1GB
work_mem = more than 1GB
maintenance_work_mem = more than 1GB
dynamic_shared_memory_type = posix
default_statistics_target = 10000
autovacuum = on
random_page_cost = 2.0
cpu_index_tuple_cost = 0.0005

我现在正在使用Postgres 9.3。

更新：

刚刚执行了以下查询：

select insert_date, count(*) from my_table group by insert_date

搜索结果的前几个为：

"2015-04-02 00:00:00";3718104
"2015-04-03 00:00:00";6410253
"2015-04-04 00:00:00";538247
"2015-04-05 00:00:00";1228877
"2015-04-06 00:00:00";131248

我在那张表上大约有1200万条记录。而上面的计数几乎接近于总数。

不确定，但是可能问题出在索引被创建在一个有大量重复值的列上。如果是这样，我们有什么解决方法吗？

- Sabuj Hassan

也许询问的更好的地方是: dba.stackexchange.com - Luc M

最近在SO上有一个类似的问题，我认为结论可能是9.4更擅长按索引列排序以避免排序。你可能想搜索一下那个问题。 - David Aldridge

请使用 set enable_seqscan = off; 测试相同的查询，并显示解释分析输出。你的索引和表有多大？在 psql 中使用 \di+ my_table_insert_date_indx、\dt+ my_table 命令可以显示大小。 - alexius

@alexius 禁用 seqscan 对分析结果没有任何影响。然而，在实际环境中，我不能仅为了改进这个特定的查询而禁用 seqscan。我的表大小为 23 GB，索引大小为 293 MB。 - Sabuj Hassan

1个回答

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- Mike Sherrill 'Cat Recall' · Accepted Answer

在我的机器上，使用PostgreSQL 9.3和9.4，您的查询速度大约快了160000倍。我的机器并不特别。

-- From PostgreSQL 9.4; 9.3 is similar.
show shared_buffers; -- 128MB
show temp_buffers; -- 8MB
show work_mem; -- 4MB
show maintenance_work_mem; -- 64MB
show dynamic_shared_memory_type; -- posix
show default_statistics_target; -- 100
show autovacuum; -- on
show random_page_cost; -- 4
show cpu_index_tuple_cost; -- 0.005

准备工作

让我们来建立一张表格。（你应该在你的问题中已经完成了这个步骤。）

create table my_table (
  id serial primary key,
  insert_date timestamp not null
);

-- Round numbers of rows.
insert into my_table(insert_date)
select timestamp '2015-04-02 00:00:00'
from generate_series(1, 3000000) n;

insert into my_table(insert_date)
select timestamp '2015-04-03 00:00:00'
from generate_series(1, 6000000) n;

insert into my_table(insert_date)
select timestamp '2015-04-04 00:00:00'
from generate_series(1, 500000) n;

insert into my_table(insert_date)
select timestamp '2015-04-05 00:00:00'
from generate_series(1, 1200000) n;

insert into my_table(insert_date)
select timestamp '2015-04-06 00:00:00'
from generate_series(1, 131000) n;

创建索引并更新统计信息。

create index on my_table (insert_date);
analyze my_table;

PostgreSQL 9.4

现在，您的第一个查询会得到什么样的执行计划呢？

explain analyze 
select id from my_table order by insert_date offset 0 limit 1;

"Limit  (cost=0.43..0.48 rows=1 width=12) (actual time=0.014..0.014 rows=1 loops=1)"
"  ->  Index Scan using my_table_insert_date_idx on my_table  (cost=0.43..540656.27 rows=11200977 width=12) (actual time=0.012..0.012 rows=1 loops=1)"
"计划时间: 0.195 毫秒"
"执行时间: 0.032 毫秒"

PostgreSQL 9.3

这是一个针对my_table的查询，使用了名为my_table_insert_date_idx的索引进行扫描。查询返回的结果集中只有一行数据，且查询的执行时间非常短。

explain analyze 
select id from my_table order by insert_date offset 0 limit 1;

“Limit (cost=0.43..0.47行=1宽度=12) (实际时间=0.058..0.059行=1循环=1)”
“ ->使用my_table_insert_date_idx在my_table上进行索引扫描(cost=0.43..339814.36行=10830995宽度=12) (实际时间=0.057..0.057行=1循环=1)”
“总运行时间: 0.098毫秒”

select id from my_table order by insert_date offset 0 limit 1;

这个查询结果是不确定的。有3百万行数据的插入日期最早（按照ORDER BY子句排序的第一个日期）。你可以从这三百万行中随便选取任意一行，但是PostgreSQL不能保证每次都返回同样的id。

如果您无所谓它返回的是这三百万个id中的哪一个，您也可以采用不同的查询方式。但我认为改变查询方式并不能使速度提高160倍。

您可以对某个特定的查询更改其中的一些设置，例如：

-- Don't commit or rollback . . . 
begin transaction;
set local work_mem = '8 MB';

explain analyze 
select id from my_table order by insert_date offset 0 limit 1;
-- Displays the result.

手动提交或回滚。

commit;

你的 work_mem 设置将返回到服务器启动时设置的值。

show work_mem; -- 4MB