存在逻辑变更 vs 存在限制1

11
我写了一个查询。
select * from table
where exists (select 1 from table1 where table.column = table1.column)

如果我将其更改为

select * from table
where exists (select 1 from table1 where table.column = table1.column limit 1)

这是否会改变逻辑?

我问这个问题是因为查询计划的成本已经发生了变化(从17000变成了2400)。我使用的是Postgres 9.4。

更新:两个查询的详细解释分析

explain (analyze, verbose)
select * from sr_srv_rendered r
where exists (select 1 from sr_res_group rg where rg.id = r.res_group_id and rg.responsible_id = 1)

limit 30

"Limit  (cost=62.06..74.63 rows=30 width=157) (actual time=0.017..0.017 rows=0 loops=1)"
"  Output: r.id, r.bdate, r.comment, r.cost, r.duration, r.edate, r.is_rendered, r.quantity, r.total_cost, r.contract_id, r.customer_id, r.funding_id, r.res_group_id, r.service_id, r.duration_measure_unit_id, r.begin_time, r.prototype_id, r.org_id, r.price_ (...)"
"  ->  Nested Loop  (cost=62.06..287707.96 rows=686607 width=157) (actual time=0.017..0.017 rows=0 loops=1)"
"        Output: r.id, r.bdate, r.comment, r.cost, r.duration, r.edate, r.is_rendered, r.quantity, r.total_cost, r.contract_id, r.customer_id, r.funding_id, r.res_group_id, r.service_id, r.duration_measure_unit_id, r.begin_time, r.prototype_id, r.org_id, r. (...)"
"        ->  Bitmap Heap Scan on public.sr_res_group rg  (cost=61.62..10093.63 rows=2734 width=4) (actual time=0.017..0.017 rows=0 loops=1)"
"              Output: rg.id, rg.bdate, rg.edate, rg.is_system, rg.name, rg.department_id, rg.org_id, rg.responsible_id, rg.is_available_in_electronic_queue, rg.label_id, rg.ignore_regclinic_check, rg.note, rg.blocked, rg.block_comment, rg.template_res_grou (...)"
"              Recheck Cond: (rg.responsible_id = 1)"
"              ->  Bitmap Index Scan on responsible_fk  (cost=0.00..60.94 rows=2734 width=0) (actual time=0.015..0.015 rows=0 loops=1)"
"                    Index Cond: (rg.responsible_id = 1)"
"        ->  Index Scan using fkb95967dd9f6b119a on public.sr_srv_rendered r  (cost=0.43..99.03 rows=251 width=157) (never executed)"
"              Output: r.id, r.bdate, r.comment, r.cost, r.duration, r.edate, r.is_rendered, r.quantity, r.total_cost, r.contract_id, r.customer_id, r.funding_id, r.res_group_id, r.service_id, r.duration_measure_unit_id, r.begin_time, r.prototype_id, r.org_ (...)"
"              Index Cond: (r.res_group_id = rg.id)"
"Planning time: 0.931 ms"
"Execution time: 0.355 ms"





explain (analyze, verbose)
select * from sr_srv_rendered r
where exists (select 1 from sr_res_group rg where rg.id = r.res_group_id and rg.responsible_id = 1 limit 1)

limit 30

"Limit  (cost=0.00..509.03 rows=30 width=157) (actual time=49392.352..49392.352 rows=0 loops=1)"
"  Output: r.id, r.bdate, r.comment, r.cost, r.duration, r.edate, r.is_rendered, r.quantity, r.total_cost, r.contract_id, r.customer_id, r.funding_id, r.res_group_id, r.service_id, r.duration_measure_unit_id, r.begin_time, r.prototype_id, r.org_id, r.price_ (...)"
"  ->  Seq Scan on public.sr_srv_rendered r  (cost=0.00..100177996.03 rows=5904050 width=157) (actual time=49392.340..49392.340 rows=0 loops=1)"
"        Output: r.id, r.bdate, r.comment, r.cost, r.duration, r.edate, r.is_rendered, r.quantity, r.total_cost, r.contract_id, r.customer_id, r.funding_id, r.res_group_id, r.service_id, r.duration_measure_unit_id, r.begin_time, r.prototype_id, r.org_id, r. (...)"
"        Filter: (SubPlan 1)"
"        Rows Removed by Filter: 11062881"
"        SubPlan 1"
"          ->  Limit  (cost=0.43..8.46 rows=1 width=0) (actual time=0.004..0.004 rows=0 loops=11062881)"
"                Output: (1)"
"                ->  Index Scan using sr_res_group_pk on public.sr_res_group rg  (cost=0.43..8.46 rows=1 width=0) (actual time=0.003..0.003 rows=0 loops=11062881)"
"                      Output: 1"
"                      Index Cond: (rg.id = r.res_group_id)"
"                      Filter: (rg.responsible_id = 1)"
"                      Rows Removed by Filter: 1"
"Planning time: 0.694 ms"
"Execution time: 49392.495 ms"

没有参数的解释更快


3
这不应该改变逻辑。也不应影响性能。为了获得最佳性能,您需要在“table1(column)”上建立索引。 - Gordon Linoff
这肯定是优化器的问题(如果不是一个漏洞的话)。你应该把它发布到Postgres邮件列表上。 - user330315
1个回答

14

根据这些结果,高层逻辑没有改变(在两种情况下都返回相同的空集),但计划有所改变,导致性能差异很大。

看起来正在发生的事情是PostgreSQL理解并且愿意将第一种情况(没有LIMIT内的EXISTS)转换为嵌套循环连接,而在第二种情况下(使用LIMIT内的EXISTS),PostgreSQL不知道如何将其转换为连接(由于LIMIT)并使用幼稚的方法实现- 在表中进行顺序扫描,并为每行运行子查询。

PostgreSQL了解EXISTS的工作原理,并且知道它只需要找到一行,添加“LIMIT 1”是不必要的,并且像在这种情况下一样,最终实际上是有害的。

PostgreSQL可能会被改进以意识到在EXISTS内部的LIMIT 1只是噪音,不应具有意义,但这将增加计划查询所需的时间,目前还不清楚是否值得花费这样的时间。


网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接