SELECT query on a PostgreSQL table is too slow

3

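I am working on a GPS tracking application. It has a table named gps_vehicle_data, where incoming GPS data is stored frequently. I query this table often to process it, since it holds only raw data. Recently I noticed long delays when running SELECT statements on the table. The EXPLAIN output is below. I also ran VACUUM and pasted its output below. What could be the cause?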

EXPLAIN (ANALYZE, BUFFERS) select * from gps_vehicle_data;

                                                           QUERY PLAN                                                            
---------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on gps_vehicle_data  (cost=0.00..130818.81 rows=1400881 width=1483) (actual time=209.129..62488.822 rows=9635 loops=1)
   Buffers: shared hit=13132 read=103678 dirtied=67 written=25
 Planning time: 0.050 ms
 Execution time: 62500.850 ms

VACUUM output:

VACUUM (VERBOSE,ANALYSE) gps_vehicle_data;
INFO:  vacuuming "public.gps_vehicle_data"
INFO:  index "gps_vehicle_data_pkey" now contains 1398939 row versions in 10509 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.07s/0.09u sec elapsed 9.38 sec.
INFO:  index "gps_vehicle_data_status_idx" now contains 1398939 row versions in 4311 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.03s/0.04u sec elapsed 4.50 sec.
INFO:  index "gps_vehicle_data_url_data_idx" now contains 1399004 row versions in 98928 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.76s/0.88u sec elapsed 82.74 sec.
INFO:  index "gps_vehicle_data_createdon_idx" now contains 1399007 row versions in 3946 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.02u sec elapsed 1.92 sec.
INFO:  "gps_vehicle_data": found 0 removable, 1402484 nonremovable row versions in 116884 out of 116884 pages
DETAIL:  1401490 dead row versions cannot be removed yet.
There were 143431 unused item pointers.
Skipped 0 pages due to buffer pins.
0 pages are entirely empty.
CPU 1.70s/2.38u sec elapsed 200.61 sec.
INFO:  vacuuming "pg_toast.pg_toast_17296"
INFO:  index "pg_toast_17296_index" now contains 0 row versions in 1 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.01 sec.
INFO:  "pg_toast_17296": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.01 sec.
INFO:  analyzing "public.gps_vehicle_data"
INFO:  "gps_vehicle_data": scanned 30000 of 116884 pages, containing 335 live rows and 359656 dead rows; 335 rows in sample, 1042851 estimated total rows
VACUUM

3
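Your table contains a large number of dead rows that cannot be cleaned up ("1401490 dead row versions cannot be removed yet"). The most likely cause is that you have connections in the "idle in transaction" state, which prevents old rows from being cleaned up. - user330315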
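
One way to check for such sessions is to look at pg_stat_activity. This is only a sketch: it assumes PostgreSQL 9.4 or later (for the backend_xmin column) and a role that is allowed to see other sessions' details. The second query covers prepared transactions, which can hold back VACUUM in the same way.

```sql
-- Sessions that currently have a transaction open, oldest first.
-- A very old xact_start with state = 'idle in transaction' is the
-- usual suspect when VACUUM reports rows that "cannot be removed yet".
SELECT pid,
       usename,
       application_name,
       state,
       xact_start,
       now() - xact_start AS xact_age,
       backend_xmin,
       query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
  AND pid <> pg_backend_pid()   -- ignore this session itself
ORDER BY xact_start;

-- Prepared (two-phase) transactions that were never committed or rolled back.
SELECT gid, prepared, owner, database
FROM pg_prepared_xacts
ORDER BY prepared;
```

If an offending session turns up, ending its transaction from the application side is the clean fix. pg_terminate_backend(pid) can close it from the server side, at the cost of rolling back whatever that session was doing.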
1 Answer

4
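You read more than 100,000 blocks to fetch roughly 10,000 rows, which means that your table consists mostly of dead data (it is "bloated").
The table must have held far more data at some point, and the bloat comes from most of that data having been deleted.
As @a_horse_with_no_name noted, some rows cannot be reclaimed because old transactions are still holding on to them. However, while VACUUM frees dead rows, it does not reorganize the table to get rid of the bloat.
The right fix in this case is VACUUM (FULL, ANALYZE) gps_vehicle_data (ANALYZE is included just to be safe, since your table statistics look off), which rewrites the table. Be aware that all access to the table is blocked while VACUUM (FULL) is running.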
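
To put a number on the bloat: the VACUUM output reports 116884 pages, which at the default 8 kB block size (an assumption; check with SHOW block_size) is about 913 MB of heap for fewer than 10,000 live rows. A rough before/after check around the rewrite could look like this:

```sql
-- Heap size and total size (heap + indexes + TOAST) before the rewrite.
SELECT pg_size_pretty(pg_relation_size('gps_vehicle_data'))       AS heap_size,
       pg_size_pretty(pg_total_relation_size('gps_vehicle_data')) AS total_size;

-- Rewrites the table into a new file and refreshes planner statistics.
-- Takes an ACCESS EXCLUSIVE lock for the whole run and needs enough free
-- disk space for the new copy. Cannot be run inside a transaction block.
VACUUM (FULL, ANALYZE) gps_vehicle_data;

-- Same size query again, to compare against the first result.
SELECT pg_size_pretty(pg_relation_size('gps_vehicle_data'))       AS heap_size,
       pg_size_pretty(pg_total_relation_size('gps_vehicle_data')) AS total_size;
```

The rewrite only shrinks the table as far as the dead rows have become removable, so it is worth clearing out any long-running transactions first.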

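Thanks a lot. I shut down all application servers connected to the database and ran VACUUM (VERBOSE, ANALYZE), and it removed the dead rows. The query is much faster now. I will also try running VACUUM (FULL, ANALYZE) during a maintenance window. - Vignesh Bhaskar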
1
Do not run VACUUM (FULL) routinely. Normally a table should not become bloated unless you do mass deletes. - Laurenz Albe
Yes, I do a bulk delete once the raw data has been processed. Is that okay in this case, or will it cause problems? - Vignesh Bhaskar
1
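That is no problem; only the table itself is locked while VACUUM (FULL) runs. If you are emptying the table, you can use TRUNCATE rather than DELETE. If you would rather not do that cleanup, you can also just leave the table bloated; that is not a problem either, since the next INSERTs will reuse the free space. - Laurenz Albe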
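
The two cleanup styles mentioned here might look roughly as follows. This is a sketch, not the poster's actual job: the createdon column is inferred from the index name gps_vehicle_data_createdon_idx, and both its timestamp type and the one-day cutoff are assumptions made purely for illustration.

```sql
-- Option A: every row has been processed, so empty the table outright.
-- TRUNCATE returns the disk space immediately and leaves no dead rows,
-- but it takes an ACCESS EXCLUSIVE lock on the table while it runs.
TRUNCATE TABLE gps_vehicle_data;

-- Option B: only part of the data can go.
-- "createdon" and the cutoff below are hypothetical; adapt the condition
-- to however processed rows are actually identified.
DELETE FROM gps_vehicle_data
WHERE createdon < now() - interval '1 day';

-- A plain VACUUM does not shrink the file, but it marks the space of the
-- deleted rows as reusable so that later INSERTs can fill it.
VACUUM (ANALYZE) gps_vehicle_data;
```

Option A suits a table that is drained completely after each processing run. Option B keeps the table readable and writable throughout, and leaves its on-disk size roughly stable as long as the VACUUM can actually remove the dead rows.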
