PostgreSQL date query performance problem

Question

sql database postgresql database-partitioning

3

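I have a Postgres table with about 5 million rows. I want to know how many rows match start_time >= NOW(), but even though there is an index on start_time, the query is extremely slow (it takes hours).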

EXPLAIN SELECT COUNT(*) FROM core_event WHERE start_time >= NOW();
 Aggregate  (cost=449217.81..449217.82 rows=1 width=0)
   ->  Index Scan using core_event_start_time on core_event  (cost=0.00..447750.83 rows=586791 width=0)
         Index Cond: (start_time >= now())

Here is the schema of the table:

 id          | integer                  | not null default nextval('core_event_id_seq'::regclass)
 source      | character varying(100)   | not null
 external_id | character varying(100)   | 
 title       | character varying(250)   | not null
 location    | geometry                 | not null
 start_time  | timestamp with time zone | 
 stop_time   | timestamp with time zone | 
 thumb       | character varying(300)   | 
 image       | character varying(100)   | 
 image_thumb | character varying(100)   | 
 address     | character varying(300)   | 
 description | text                     | 
 venue_name  | character varying(100)   | 
 website     | character varying(300)   | 
 city_id     | integer                  | 
 category    | character varying(100)   | 
 phone       | character varying(50)    | 
 place_id    | integer                  | 
 image_url   | character varying(300)   | 
 event_type  | character varying(200)   | 
 hidden      | boolean                  | not null
 views       | integer                  | not null
 added       | timestamp with time zone |

I have indexes on the following fields:

city_id
external_id (unique)
location
location_id
place_id
start_time

Is there an easy way to speed up the query (for example, a partial index), or do I have to resort to partitioning the data by date?

- Ben Dowling

2

http://wiki.postgresql.org/wiki/Slow_Counting

Does that really apply? Judging from the wiki page, @StefanNch's link is only relevant when the count is not restricted by an indexed field. - user806549

Are your statistics up to date? - user806549

1

@StefanNch, I am applying a WHERE clause, so that is not really relevant.

@AndersUP I believe they are up to date. I am running ANALYZE core_event now to see whether it makes any difference.


2 Answers

0

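Are most of those columns populated for every row? If so, PostgreSQL has to read a lot of disk to test rows for liveness even after checking the index. Try, for example, creating a separate table that contains only id and start_time.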

create table core_event_start_time as select id, start_time from core_event;
alter table core_event_start_time add primary key(id);
alter table core_event_start_time add foreign key(id) references core_event(id);
create index on core_event_start_time(start_time);

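Now see how long it takes to count only the IDs in core_event_start_time. Of course, this approach takes up more buffers, at the expense of space for the actual core_event table...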
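
The count against the side table could be timed along these lines. This is only a sketch: it assumes the table and index from the statements above exist, and note that EXPLAIN ANALYZE actually executes the query.

```sql
-- Run the count against the narrow side table and report the actual run time.
EXPLAIN ANALYZE
SELECT count(id)
FROM   core_event_start_time
WHERE  start_time >= now();
```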

If that helps, you can add a trigger on core_event to keep the auxiliary table up to date.
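
Such a trigger might look roughly like the following. It is a minimal sketch, not a tested implementation: the function and trigger names are made up for illustration, and it assumes that core_event.id never changes. The delete is handled in a BEFORE trigger so that the side row is gone before the foreign key on core_event_start_time is checked.

```sql
-- Hypothetical helper: mirrors id and start_time into core_event_start_time.
CREATE OR REPLACE FUNCTION core_event_sync_start_time() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'DELETE' THEN
        -- Remove the side row first; the foreign key points at core_event.
        DELETE FROM core_event_start_time WHERE id = OLD.id;
        RETURN OLD;
    ELSIF TG_OP = 'UPDATE' THEN
        UPDATE core_event_start_time
        SET    start_time = NEW.start_time
        WHERE  id = NEW.id;
        RETURN NEW;
    ELSE  -- INSERT
        INSERT INTO core_event_start_time (id, start_time)
        VALUES (NEW.id, NEW.start_time);
        RETURN NEW;
    END IF;
END;
$$ LANGUAGE plpgsql;

-- Deletes: clean up the side table before the parent row disappears.
CREATE TRIGGER core_event_sync_start_time_del
    BEFORE DELETE ON core_event
    FOR EACH ROW EXECUTE PROCEDURE core_event_sync_start_time();

-- Inserts and changes to start_time: copy the new value afterwards.
CREATE TRIGGER core_event_sync_start_time_ins_upd
    AFTER INSERT OR UPDATE OF start_time ON core_event
    FOR EACH ROW EXECUTE PROCEDURE core_event_sync_start_time();
```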

(PostgreSQL 9.2 will introduce "index-only scans", which may help in this situation, but that is still in the future.)

- araqnid


- Erwin Brandstetter · Accepted Answer

Try adding a partial index like the following to improve things:

CREATE INDEX core_event_start_time_recent_idx ON core_event (start_time)
WHERE start_time >= '2011-01-12 0:0'::timestamptz;

This creates a comparatively small index. Creating it takes some time, but afterwards queries like this one will be much faster.

SELECT count(*) FROM core_event WHERE start_time >= now();
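
One caveat worth checking with EXPLAIN: the planner matches partial indexes at planning time, and now() is not a constant at that point, so it may not be able to prove that start_time >= now() implies the index predicate. A possible workaround, sketched here on the assumption that the index was created with the cutoff shown above, is to repeat that constant condition in the query:

```sql
-- The constant condition is logically redundant as long as the cutoff lies in
-- the past, but it lets the planner match the partial index's WHERE clause.
EXPLAIN
SELECT count(*)
FROM   core_event
WHERE  start_time >= '2011-01-12 0:0'::timestamptz  -- same constant as in the index
AND    start_time >= now();
```

If the plan already shows core_event_start_time_recent_idx without the extra condition, the workaround is not needed.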

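Because the index is used for queries against now(), its effectiveness deteriorates slowly over time as new rows are added. Occasionally update (= drop and recreate) the index with a more recent timestamp during off hours.
You can automate this with a plpgsql function that is called from a cron job or pgAgent.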
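
A function for that could be sketched as follows. The function name, the temporary index name and the choice of "start of today" as the new cutoff are all assumptions made for illustration. Dynamic SQL is used because an index predicate must be a constant expression; CREATE INDEX CONCURRENTLY cannot be used inside a function, so the plain CREATE INDEX here blocks writes to the table while it runs.

```sql
-- Hypothetical maintenance function: rebuilds the partial index with a newer cutoff.
CREATE OR REPLACE FUNCTION refresh_core_event_recent_idx() RETURNS void AS $$
DECLARE
    cutoff timestamptz := date_trunc('day', now());  -- assumed cutoff: start of today
BEGIN
    -- Build the replacement under a temporary name, so the old index
    -- stays available to queries until the new one is ready.
    EXECUTE 'CREATE INDEX core_event_start_time_recent_new'
         || ' ON core_event (start_time)'
         || ' WHERE start_time >= ' || quote_literal(cutoff) || '::timestamptz';

    -- Swap the new index in under the original name.
    DROP INDEX IF EXISTS core_event_start_time_recent_idx;
    ALTER INDEX core_event_start_time_recent_new
        RENAME TO core_event_start_time_recent_idx;
END;
$$ LANGUAGE plpgsql;

-- Called from the scheduler, for example once a night:
SELECT refresh_core_event_recent_idx();
```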

If it does not conflict with other requirements in your database, you might improve things further by running CLUSTER on the table.

CLUSTER core_event USING core_event_start_time;

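Yes, cluster on the full index, not the partial one. This takes a while and needs an exclusive lock, because it effectively rewrites the table. It also effectively does a VACUUM FULL on the table. Read more about it in the manual.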
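
Whether the physical row order actually follows start_time can be checked roughly like this. It is a sketch that relies on the correlation column of the pg_stats view; a value close to 1 (or -1) means the table is well ordered on that column, and the value tends to drift away from that as rows are inserted and updated after a CLUSTER.

```sql
-- Refresh the statistics, then look at how well the on-disk order of
-- core_event matches the order of start_time.
ANALYZE core_event;

SELECT attname, correlation
FROM   pg_stats
WHERE  tablename = 'core_event'
AND    attname   = 'start_time';
```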

You can also increase the statistics target for core_event.start_time.

ALTER TABLE core_event ALTER start_time SET STATISTICS 1000; -- example value

The default is just 100. Then:

ANALYZE core_event;
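
To see whether the change took effect, something along these lines may help. It is only a sketch: it reads the per-column target from the pg_attribute catalog and then asks the planner for its new row estimate, which can be compared with the rows=586791 in the plan from the question.

```sql
-- Confirm the per-column statistics target that is now in effect.
SELECT attname, attstattarget
FROM   pg_attribute
WHERE  attrelid = 'core_event'::regclass
AND    attname  = 'start_time';

-- Plain EXPLAIN only plans the query, so this returns immediately.
EXPLAIN SELECT count(*) FROM core_event WHERE start_time >= now();
```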

Of course, all the usual performance advice applies as well.