Optimizing the SQL "Where" clause for a query with subqueries

Question


sql · postgresql · subquery · where-clause


Suppose I have the following hypothetical data structure:


```sql
create table "country"
(
  country_id integer,
  country_name varchar(50),
  continent varchar(50),
  constraint country_pkey primary key (country_id)
);

create table "person"
(
  person_id integer,
  person_name varchar(100),
  country_id integer,
  constraint person_pkey primary key (person_id)
);

create table "event"
(
  event_id integer,
  event_desc varchar(100),
  country_id integer,
  constraint event_pkey primary key (event_id)
);
```

I want to get the number of person and event rows for each country, and I decided to use subqueries:

```sql
select c.country_name, sum(sub1.person_count) as person_count, sum(sub2.event_count) as event_count
from
  "country" c
  left join (select country_id, count(*) as person_count from "person" group by country_id) sub1
    on (c.country_id=sub1.country_id)
  left join (select country_id, count(*) as event_count from "event" group by country_id) sub2
    on (c.country_id=sub2.country_id)
group by c.country_name
```
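To see what this query returns, here is a minimal sketch that runs it unchanged against an in-memory SQLite database standing in for PostgreSQL. The helper `make_db` and all sample rows are hypothetical; only the table layout comes from the question.

```python
import sqlite3

DDL = """
create table "country" (country_id integer, country_name varchar(50),
  continent varchar(50), constraint country_pkey primary key (country_id));
create table "person" (person_id integer, person_name varchar(100),
  country_id integer, constraint person_pkey primary key (person_id));
create table "event" (event_id integer, event_desc varchar(100),
  country_id integer, constraint event_pkey primary key (event_id));
"""

def make_db():
    """Hypothetical helper: SQLite stand-in for PostgreSQL with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)
    conn.executemany('insert into "country" values (?, ?, ?)', [
        (1, "UNITED STATES", "NORTH AMERICA"),
        (2, "CANADA", "NORTH AMERICA"),
        (3, "FRANCE", "EUROPE"),
    ])
    conn.executemany('insert into "person" values (?, ?, ?)',
                     [(1, "Ann", 1), (2, "Bob", 1), (3, "Cid", 2), (4, "Dee", 3)])
    conn.executemany('insert into "event" values (?, ?, ?)',
                     [(1, "Expo", 1), (2, "Fair", 3), (3, "Gala", 3)])
    return conn

# The question's query, copied verbatim.
QUERY = """
select c.country_name, sum(sub1.person_count) as person_count, sum(sub2.event_count) as event_count
from
  "country" c
  left join (select country_id, count(*) as person_count from "person" group by country_id) sub1
    on (c.country_id=sub1.country_id)
  left join (select country_id, count(*) as event_count from "event" group by country_id) sub2
    on (c.country_id=sub2.country_id)
group by c.country_name
"""

conn = make_db()
# One entry per country: name -> (person_count, event_count).
counts = {name: (p, e) for name, p, e in conn.execute(QUERY)}
print(counts)
```

In this sample CANADA has no events, so the left join produces NULL and `sum()` returns NULL (`None` in Python); wrapping the sum in `coalesce(..., 0)` would report 0 instead.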

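I know you could do this with subselects in the select list, but the advantage of subqueries in the FROM clause is that I have more flexibility to change the SQL so that it aggregates at a different level and uses other fields. For example, to show the counts by continent instead, I only have to replace the field "c.country_name" with "c.continent".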
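A minimal sketch of that flexibility, again on a SQLite stand-in with made-up rows: the hypothetical function `counts_by` splices the grouping column into the otherwise unchanged query. Because the column name becomes part of the SQL text, it is checked against a whitelist rather than taken from user input.

```python
import sqlite3

DDL = """
create table "country" (country_id integer, country_name varchar(50),
  continent varchar(50), constraint country_pkey primary key (country_id));
create table "person" (person_id integer, person_name varchar(100),
  country_id integer, constraint person_pkey primary key (person_id));
create table "event" (event_id integer, event_desc varchar(100),
  country_id integer, constraint event_pkey primary key (event_id));
"""

def make_db():
    """Hypothetical helper: SQLite stand-in for PostgreSQL with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)
    conn.executemany('insert into "country" values (?, ?, ?)', [
        (1, "UNITED STATES", "NORTH AMERICA"),
        (2, "CANADA", "NORTH AMERICA"),
        (3, "FRANCE", "EUROPE"),
    ])
    conn.executemany('insert into "person" values (?, ?, ?)',
                     [(1, "Ann", 1), (2, "Bob", 1), (3, "Cid", 2), (4, "Dee", 3)])
    conn.executemany('insert into "event" values (?, ?, ?)',
                     [(1, "Expo", 1), (2, "Fair", 3), (3, "Gala", 3)])
    return conn

# Only these expressions may be spliced into the SQL text.
GROUP_COLUMNS = {"c.country_name", "c.continent"}

def counts_by(conn, group_col):
    """Return {group value: (person_count, event_count)} grouped by group_col."""
    if group_col not in GROUP_COLUMNS:
        raise ValueError(f"unsupported grouping column: {group_col}")
    sql = f"""
    select {group_col}, sum(sub1.person_count), sum(sub2.event_count)
    from
      "country" c
      left join (select country_id, count(*) as person_count from "person" group by country_id) sub1
        on (c.country_id=sub1.country_id)
      left join (select country_id, count(*) as event_count from "event" group by country_id) sub2
        on (c.country_id=sub2.country_id)
    group by {group_col}
    """
    return {key: (p, e) for key, p, e in conn.execute(sql)}

conn = make_db()
print(counts_by(conn, "c.continent"))
```

Grouping by continent is where `sum()` really adds up several per-country counts; `sum()` skips the NULL event count from CANADA.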

My question is about filtering. Suppose we add a WHERE clause:

```sql
select c.country_name,
  sum(sub1.person_count) as person_count,
  sum(sub2.event_count) as event_count
from
  "country" c
  left join (select country_id, count(*) as person_count from "person" group by country_id) sub1
    on (c.country_id=sub1.country_id)
  left join (select country_id, count(*) as event_count from "event" group by country_id) sub2
    on (c.country_id=sub2.country_id)
where c.country_name='UNITED STATES'
group by c.country_name
```

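The subqueries still seem to compute the counts for every country. Assume the person and event tables are huge and that I have already created an index on country_id in all the tables. It is very slow. Shouldn't the database execute the subqueries only for the filtered country? Do I have to repeat the country filter in every subquery (which is tedious and makes the code hard to modify)? By the way, I am using both PostgreSQL 8.3 and 9.0, but I suspect the same thing happens in other databases.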

- clj
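For reference, the "tedious" rewrite the question describes (repeating the country filter inside each subquery) can be sketched like this, on the same hypothetical SQLite stand-in with made-up rows. Restricting each subquery with `country_id in (select ... where country_name = ?)` is one possible way to write it, chosen here for illustration. SQLite cannot show PostgreSQL's performance difference; the sketch only checks that both forms return the same rows, so the rewrite does not change the result.

```python
import sqlite3

DDL = """
create table "country" (country_id integer, country_name varchar(50),
  continent varchar(50), constraint country_pkey primary key (country_id));
create table "person" (person_id integer, person_name varchar(100),
  country_id integer, constraint person_pkey primary key (person_id));
create table "event" (event_id integer, event_desc varchar(100),
  country_id integer, constraint event_pkey primary key (event_id));
"""

def make_db():
    """Hypothetical helper: SQLite stand-in for PostgreSQL with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)
    conn.executemany('insert into "country" values (?, ?, ?)', [
        (1, "UNITED STATES", "NORTH AMERICA"),
        (2, "CANADA", "NORTH AMERICA"),
        (3, "FRANCE", "EUROPE"),
    ])
    conn.executemany('insert into "person" values (?, ?, ?)',
                     [(1, "Ann", 1), (2, "Bob", 1), (3, "Cid", 2), (4, "Dee", 3)])
    conn.executemany('insert into "event" values (?, ?, ?)',
                     [(1, "Expo", 1), (2, "Fair", 3), (3, "Gala", 3)])
    return conn

# The question's query: the filter appears only in the outer WHERE.
OUTER_FILTER = """
select c.country_name, sum(sub1.person_count), sum(sub2.event_count)
from
  "country" c
  left join (select country_id, count(*) as person_count from "person" group by country_id) sub1
    on (c.country_id=sub1.country_id)
  left join (select country_id, count(*) as event_count from "event" group by country_id) sub2
    on (c.country_id=sub2.country_id)
where c.country_name = ?
group by c.country_name
"""

# Same query with the filter repeated inside each subquery, so each
# aggregate only has to look at the matching country's rows.
PUSHED_FILTER = """
select c.country_name, sum(sub1.person_count), sum(sub2.event_count)
from
  "country" c
  left join (select country_id, count(*) as person_count from "person"
             where country_id in (select country_id from "country" where country_name = ?)
             group by country_id) sub1
    on (c.country_id=sub1.country_id)
  left join (select country_id, count(*) as event_count from "event"
             where country_id in (select country_id from "country" where country_name = ?)
             group by country_id) sub2
    on (c.country_id=sub2.country_id)
where c.country_name = ?
group by c.country_name
"""

conn = make_db()
name = "UNITED STATES"
outer = conn.execute(OUTER_FILTER, (name,)).fetchall()
# Placeholders are bound in text order: sub1, sub2, then the outer WHERE.
pushed = conn.execute(PUSHED_FILTER, (name, name, name)).fetchall()
print(outer, pushed)
```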

What does the execution plan look like? - Kuberchaun

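This is just a hypothetical scenario. However, on a production database running a very similar query, it did sequential scans on the tables inside the subqueries. It never used the indexes, and those tables are big. - clj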

2 Answers


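Can you filter/group the rows by country_id instead of country_name? My guess is that you don't have an index on the name. The subqueries don't use any index because they have to scan the whole table. If you want to scan fewer rows, you should filter the data.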

- ravnur
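A minimal sketch of this suggestion under the same assumptions (SQLite stand-in, made-up rows, hypothetical helper `make_db`): look the id up once, then filter and group on the join key `c.country_id`. The sketch only shows that the result is unchanged; whether PostgreSQL then restricts the scans inside the subqueries depends on its planner and version, which EXPLAIN on the real tables would reveal.

```python
import sqlite3

DDL = """
create table "country" (country_id integer, country_name varchar(50),
  continent varchar(50), constraint country_pkey primary key (country_id));
create table "person" (person_id integer, person_name varchar(100),
  country_id integer, constraint person_pkey primary key (person_id));
create table "event" (event_id integer, event_desc varchar(100),
  country_id integer, constraint event_pkey primary key (event_id));
"""

def make_db():
    """Hypothetical helper: SQLite stand-in for PostgreSQL with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)
    conn.executemany('insert into "country" values (?, ?, ?)', [
        (1, "UNITED STATES", "NORTH AMERICA"),
        (2, "CANADA", "NORTH AMERICA"),
        (3, "FRANCE", "EUROPE"),
    ])
    conn.executemany('insert into "person" values (?, ?, ?)',
                     [(1, "Ann", 1), (2, "Bob", 1), (3, "Cid", 2), (4, "Dee", 3)])
    conn.executemany('insert into "event" values (?, ?, ?)',
                     [(1, "Expo", 1), (2, "Fair", 3), (3, "Gala", 3)])
    return conn

# Filter and group on the key used in the join conditions.
# country_name is listed in GROUP BY as well, as older PostgreSQL versions require.
BY_ID = """
select c.country_id, c.country_name, sum(sub1.person_count), sum(sub2.event_count)
from
  "country" c
  left join (select country_id, count(*) as person_count from "person" group by country_id) sub1
    on (c.country_id=sub1.country_id)
  left join (select country_id, count(*) as event_count from "event" group by country_id) sub2
    on (c.country_id=sub2.country_id)
where c.country_id = ?
group by c.country_id, c.country_name
"""

conn = make_db()
# Resolve the name to its id once, up front.
(cid,) = conn.execute('select country_id from "country" where country_name = ?',
                      ("UNITED STATES",)).fetchone()
rows = conn.execute(BY_ID, (cid,)).fetchall()
print(rows)
```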

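I do have an index on country_name. But the subqueries still scan the entire table instead of only the filtered country. - clj

Of course they do, because you are not filtering the data. An index is only used when the optimizer thinks you are retrieving a small number of rows (a fast scan using the index). When you read the whole table, it is faster to run through the entire table to get all the records. So... the optimizer is fine; you should read the documentation about indexing strategies :) - ravnur

I have the "where c.country_name ='UNITED STATES'" filter. So why do the subqueries still scan the whole table? - clj

Because of the optimizer: it analyzes the query from the bottom up. So in your case it first runs the two subqueries and gets two subsets. Then it joins the country table and filters the data. Run EXPLAIN on this query and you will see it. - ravnur

Is there any way to avoid putting the filter inside the subqueries? For example, if I wanted to turn this query into a view, using the view would be very slow whenever you filter on a specific country. Making such a view fast for a specific country is impossible, because it would mean changing the inner subqueries to include your filter. - clj

You can run the subqueries separately: 1) get person_count, 2) get event_count. It seems to me you are trying to reinvent the wheel. - ravnur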
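ravnur's last suggestion, sketched with the same hypothetical SQLite stand-in and made-up rows: two separate queries, each filtered on the country before counting. The function name `counts_for_country` is illustrative only.

```python
import sqlite3

DDL = """
create table "country" (country_id integer, country_name varchar(50),
  continent varchar(50), constraint country_pkey primary key (country_id));
create table "person" (person_id integer, person_name varchar(100),
  country_id integer, constraint person_pkey primary key (person_id));
create table "event" (event_id integer, event_desc varchar(100),
  country_id integer, constraint event_pkey primary key (event_id));
"""

def make_db():
    """Hypothetical helper: SQLite stand-in for PostgreSQL with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)
    conn.executemany('insert into "country" values (?, ?, ?)', [
        (1, "UNITED STATES", "NORTH AMERICA"),
        (2, "CANADA", "NORTH AMERICA"),
        (3, "FRANCE", "EUROPE"),
    ])
    conn.executemany('insert into "person" values (?, ?, ?)',
                     [(1, "Ann", 1), (2, "Bob", 1), (3, "Cid", 2), (4, "Dee", 3)])
    conn.executemany('insert into "event" values (?, ?, ?)',
                     [(1, "Expo", 1), (2, "Fair", 3), (3, "Gala", 3)])
    return conn

def counts_for_country(conn, country_name):
    """Return (person_count, event_count) using two independent filtered queries."""
    # 1) person_count: only rows joined to the requested country are counted.
    person_count = conn.execute(
        'select count(*) from "person" p join "country" c on c.country_id = p.country_id '
        'where c.country_name = ?', (country_name,)).fetchone()[0]
    # 2) event_count: same filter, applied to the event table.
    event_count = conn.execute(
        'select count(*) from "event" e join "country" c on c.country_id = e.country_id '
        'where c.country_name = ?', (country_name,)).fetchone()[0]
    return person_count, event_count

conn = make_db()
print(counts_for_country(conn, "UNITED STATES"))
```

Unlike the left-join version, `count(*)` over an empty match returns 0 rather than NULL, so CANADA comes back as `(1, 0)` here.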


- Mike Sherrill 'Cat Recall' · Accepted Answer

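Shouldn't the database execute the subqueries only for the filtered country?

No. For a query like yours, the first step is, in effect, to build a working table from all the table constructors in the FROM clause. Only then is the WHERE clause evaluated.

Imagine how you would do this if sub1 and sub2 were base tables instead of subselects. They both have two columns and one row per country_id. If you wanted to JOIN all the rows, you would write this: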

```sql
from
  "country" c
  left join sub1 on (c.country_id=sub1.country_id)
  left join sub2 on (c.country_id=sub2.country_id)
```

But if you wanted to JOIN on a single row, you would have to write something like this (each derived table needs its own alias):

```sql
from
  "country" c
  left join (select * from sub1 where country_id = ?) sub1
    on (c.country_id=sub1.country_id)
  left join (select * from sub2 where country_id = ?) sub2
    on (c.country_id=sub2.country_id)
```
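The two FROM clauses can be tried side by side in a minimal sketch, assuming a SQLite stand-in with made-up rows. Here sub1 and sub2 are materialized as real tables, as the answer asks us to imagine. The answer's fragments stop at FROM, so the select list and the `where c.country_id = ?` in the single-row version are hypothetical additions that make them complete queries; sqlite3 binds the `?` placeholders in text order.

```python
import sqlite3

DDL = """
create table "country" (country_id integer, country_name varchar(50),
  continent varchar(50), constraint country_pkey primary key (country_id));
create table "person" (person_id integer, person_name varchar(100),
  country_id integer, constraint person_pkey primary key (person_id));
create table "event" (event_id integer, event_desc varchar(100),
  country_id integer, constraint event_pkey primary key (event_id));
"""

def make_db():
    """Hypothetical helper: SQLite stand-in for PostgreSQL with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)
    conn.executemany('insert into "country" values (?, ?, ?)', [
        (1, "UNITED STATES", "NORTH AMERICA"),
        (2, "CANADA", "NORTH AMERICA"),
        (3, "FRANCE", "EUROPE"),
    ])
    conn.executemany('insert into "person" values (?, ?, ?)',
                     [(1, "Ann", 1), (2, "Bob", 1), (3, "Cid", 2), (4, "Dee", 3)])
    conn.executemany('insert into "event" values (?, ?, ?)',
                     [(1, "Expo", 1), (2, "Fair", 3), (3, "Gala", 3)])
    return conn

conn = make_db()
# Turn the two aggregates into ordinary tables with one row per country_id.
conn.executescript("""
create table sub1 as select country_id, count(*) as person_count from "person" group by country_id;
create table sub2 as select country_id, count(*) as event_count from "event" group by country_id;
""")

# First FROM clause: join every row of sub1 and sub2.
ALL_ROWS = """
select c.country_id, sub1.person_count, sub2.event_count
from
  "country" c
  left join sub1 on (c.country_id=sub1.country_id)
  left join sub2 on (c.country_id=sub2.country_id)
order by c.country_id
"""

# Second FROM clause: each derived table reads only the requested country_id.
ONE_ROW = """
select c.country_id, sub1.person_count, sub2.event_count
from
  "country" c
  left join (select * from sub1 where country_id = ?) sub1
    on (c.country_id=sub1.country_id)
  left join (select * from sub2 where country_id = ?) sub2
    on (c.country_id=sub2.country_id)
where c.country_id = ?
"""

all_rows = conn.execute(ALL_ROWS).fetchall()
one_row = conn.execute(ONE_ROW, (1, 1, 1)).fetchall()
print(all_rows)
print(one_row)
```

Note that the single-row version has to repeat the same parameter in every derived table, which is exactly the repetition the question hoped to avoid.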

Joe Celko, who was involved in developing the early SQL standards, often wrote on Usenet about the order in which SQL clauses are evaluated.