Join performance in PostgreSQL


I have two tables, scheduling_flownode and xact_message, that are only loosely related. I am trying to run the following query:

set search_path='ad_96d5be';
explain analyze 
SELECT f.id, f.target_object_id 
FROM "scheduling_flownode" f, 
     "xact_message" m 
where f.target_object_id = m.id 
and f.root_node=True 
AND f.state=1 
and m.state=4 
and m.templatelanguage_id IN (17, 18, 19, 20, 21, 22, 23, 24);

When I run it, I get the following query plan:

  Gather  (cost=252701.26..1711972.04 rows=374109 width=8) (actual time=17737.908..164181.063 rows=441130 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   Buffers: shared hit=35705 read=1346425, temp read=18190 written=18148
   ->  Hash Join  (cost=251701.26..1673561.14 rows=155879 width=8) (actual time=18805.587..163991.468 rows=147043 loops=3)
         Hash Cond: (f.target_object_id = m.id)
         Buffers: shared hit=35705 read=1346425, temp read=18190 written=18148
         ->  Parallel Bitmap Heap Scan on scheduling_flownode f  (cost=124367.21..1523127.76 rows=2061083 width=8) (actual time=963.910..155466.840 rows=1642157 loops=3)
               Recheck Cond: (state = 1)
               Rows Removed by Index Recheck: 44
               Filter: root_node
               Rows Removed by Filter: 12406874
               Heap Blocks: exact=10570 lossy=427078
               Buffers: shared read=1328631
               ->  Bitmap Index Scan on "root-node-and-state"  (cost=0.00..123130.57 rows=4946600 width=0) (actual time=955.044..955.045 rows=4926472 loops=1)
                     Index Cond: ((root_node = true) AND (state = 1))
                     Buffers: shared read=13464
         ->  Hash  (cost=120677.64..120677.64 rows=405712 width=4) (actual time=7124.131..7124.131 rows=441128 loops=3)
               Buckets: 131072  Batches: 8  Memory Usage: 2966kB
               Buffers: shared hit=35591 read=17793, temp written=3384
               ->  Bitmap Heap Scan on xact_message m  (cost=7893.56..120677.64 rows=405712 width=4) (actual time=61.307..6925.456 rows=441128 loops=3)
                     Recheck Cond: (state = 4)
                     Filter: (templatelanguage_id = ANY ('{17,18,19,20,21,22,23,24}'::integer[]))
                     Rows Removed by Filter: 4
                     Heap Blocks: exact=16585
                     Buffers: shared hit=35591 read=17793
                     ->  Bitmap Index Scan on "state-index"  (cost=0.00..7792.13 rows=421826 width=0) (actual time=58.781..58.781 rows=441132 loops=3)
                           Index Cond: (state = 4)
                           Buffers: shared hit=2420 read=1209
 Planning time: 1.382 ms
 Execution time: 164289.481 ms
(31 rows)


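The scheduling_flownode table has more than 400 million rows, and xact_message has about 5 million. I am on Postgres 10, and I would expect it to handle a load of this size easily. If that is the case, is there something wrong with my query?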


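The statistics are quite far off. Does running analyze scheduling_flownode; change anything? - user330315
That depends on your hardware, other load, and indexes. - Joel Coehoorn
The hardware is an AWS RDS server with 8 GB RAM and 200 GB memory, and the indexes are set up as shown in the query plan. There is absolutely no other load on the system; this single query is the only thing running. - Varsha Teckchandani
Updated the comment to remove datepart. Sorry for the confusion; I was just trying too many things. Also, there is already an index on target_object_id, but Postgres is not using it. - Varsha Teckchandani
Try creating the following two indexes to cover your query: { scheduling_flownode(state, root_node, target_object_id, id) } and { xact_message(state, template_language, id) }. Also try changing the template language predicate to m.templatelanguage_id BETWEEN 17 AND 24. - SQLRaptor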
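
SQLRaptor's suggestion could be written out roughly as follows. This is only a sketch: the index names are made up for illustration, and it assumes that the template_language column named in the comment is the templatelanguage_id column used in the query. The BETWEEN rewrite returns the same rows as the IN list only because the plan shows the list being cast to integer[], so the values 17 through 24 form a contiguous integer range.

```sql
SET search_path = 'ad_96d5be';

-- Hypothetical index names; the column order follows the comment.
CREATE INDEX ix_flownode_state_root_target
    ON scheduling_flownode (state, root_node, target_object_id, id);

-- Assumes "template_language" in the comment means templatelanguage_id.
CREATE INDEX ix_message_state_lang
    ON xact_message (state, templatelanguage_id, id);

-- Same query, with the IN list replaced by a range predicate.
EXPLAIN (ANALYZE, BUFFERS)
SELECT f.id, f.target_object_id
FROM scheduling_flownode f
JOIN xact_message m ON f.target_object_id = m.id
WHERE f.root_node = true
  AND f.state = 1
  AND m.state = 4
  AND m.templatelanguage_id BETWEEN 17 AND 24;
```

If the planner picks these indexes up, the plan should show Index Only Scan nodes. Index-only scans also depend on the visibility map being reasonably current, so a recent VACUUM on both tables may be needed before they pay off.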
1 Answer

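You haven't shown which indexes you have, but I strongly suggest that your indexes cover all the columns you filter on.
In Postgres 11 this can be done with a covering index. For example, on the scheduling_flownode table you would have an index like this: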
CREATE INDEX ix_scheduling_flownode_target_object_id 
  ON scheduling_flownode(target_object_id) 
    INCLUDE (state, root_node);

In Postgres 10, simply make the columns part of the index key:
CREATE INDEX ix_scheduling_flownode_target_object_id 
  ON scheduling_flownode(target_object_id, state, root_node);

Do the same for the xact_message table, using templatelanguage_id and state.
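
For xact_message, one possible reading of "the same" is to key the index on the join column id and carry the two filter columns along. The answer does not spell this out, so the index names and the column layout below are assumptions.

```sql
-- Create only the variant that matches the server version.

-- Postgres 11 or later: INCLUDE stores the filter columns as payload.
CREATE INDEX ix_xact_message_id_incl
    ON xact_message (id)
    INCLUDE (state, templatelanguage_id);

-- Postgres 10: INCLUDE is not available, so the filter columns
-- become trailing key columns instead.
CREATE INDEX ix_xact_message_id_state_lang
    ON xact_message (id, state, templatelanguage_id);
```

Either form lets the state and templatelanguage_id filters be checked from the index alone once a row has been located by id.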

We added this index, but Postgres does not seem to use it when running the query. - Varsha Teckchandani
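
When a new index is not picked up, it can help to refresh the statistics, as the first comment suggests, and then read the plan again. Two details in the posted plan are also worth testing: "Heap Blocks: exact=10570 lossy=427078" and "Batches: 8" with temp files written. Both usually mean work_mem is too small to hold the bitmap and the hash table. The sketch below raises it for the current session only. The value 128MB is a guess, not a recommendation, and on a server with 8 GB RAM it needs care because each parallel worker can use that much per plan node.

```sql
SET search_path = 'ad_96d5be';

-- Refresh planner statistics on both tables.
ANALYZE scheduling_flownode;
ANALYZE xact_message;

-- Session-level experiment only; 128MB is an assumed value.
SET work_mem = '128MB';

EXPLAIN (ANALYZE, BUFFERS)
SELECT f.id, f.target_object_id
FROM scheduling_flownode f
JOIN xact_message m ON f.target_object_id = m.id
WHERE f.root_node = true
  AND f.state = 1
  AND m.state = 4
  AND m.templatelanguage_id IN (17, 18, 19, 20, 21, 22, 23, 24);

-- Return to the configured default afterwards.
RESET work_mem;
```

In the new plan, check whether the lossy heap block count drops and whether the Hash node reports Batches: 1.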

Content provided by Stack Overflow.