MySQL查询优化 - 内部查询

5

这是整个查询的示例代码:

SELECT s.*, (SELECT url FROM show_medias WHERE show_id = s.id AND is_primary = 1) AS media_url
FROM (shows As s)
WHERE `s`.`id` IN (
 SELECT DISTINCT st.show_id
 FROM show_time_schedules AS sts
 LEFT JOIN show_times AS st ON st.id = sts.show_time_id
 WHERE sts.schedule_date BETWEEN CAST('2012-01-10' AS date) AND CAST('2012-01-14' AS date)
 )
AND `s`.`is_active` = 1
ORDER BY s.name asc 

如果……
SELECT url FROM show_medias WHERE show_id = s.id AND is_primary = 1
(0.0004 sec)

并且...

 SELECT DISTINCT st.show_id
 FROM show_time_schedules AS sts
 LEFT JOIN show_times AS st ON st.id = sts.show_time_id
 WHERE sts.schedule_date BETWEEN CAST('2012-01-10' AS date) AND CAST('2012-01-14' AS date)
(0.0061 sec)

有没有明显的原因导致我的电脑运行速度变慢?

SELECT s.*, (inner query 1) AS media_url
FROM (shows As s)
WHERE `s`.`id` IN ( inner query 2 )
AND `s`.`is_active` = 1
ORDER BY s.name asc

需要花费 5.7245 秒 吗?

详细解释

id  select_type         table       type    possible_keys   key     key_len ref                     rows    filtered    Extra
1   PRIMARY             s           ALL     NULL            NULL    NULL    NULL                    151     100.00      Using where; Using filesort
3   DEPENDENT SUBQUERY  sts         ALL     NULL            NULL    NULL    NULL                    26290   100.00      Using where; Using temporary
3   DEPENDENT SUBQUERY  st          eq_ref  PRIMARY         PRIMARY 4       bvcdb.sts.show_time_id  1       100.00      Using where
2   DEPENDENT SUBQUERY  show_medias ALL     NULL            NULL    NULL    NULL                    159     100.00      Using where

你有没有考虑过通过(INNER, LEFT取决于你是否只想显示带有URL的节目)在show_medias上进行JOIN,而不是在选择列表中进行子查询来提高性能?我很想看看这个。查询的EXPLAIN中有什么信息吗? - dash
@dash非常感谢你迄今为止的帮助,我添加了EXPLAIN EXTENDED,并且你建议的查询产生了几乎相同,如果不是稍微长一点的6.x秒性能。 - jondavidjohn
你知道你的表上有哪些索引吗?特别是,在shows、show_time_schedules、show_times和show_medias中,是否有任何id列在你的查询中被索引了? - dash
有趣的是,去掉 ORDER BY 会加快查询速度吗? - dash
@dash 天哪...谢谢你提到了索引...这是我在学习将外键字段与主键一起索引之前做的一个旧项目...应用了外键索引后,查询时间只有0.0064秒...将此添加到你的答案中,我们就可以结束了...谢谢! - jondavidjohn
2个回答

3

您可以随时使用EXPLAIN或EXPLAIN EXTENDED来查看MySql查询的执行情况。

您还可以稍微改变一下查询方式,尝试以下方法:

SELECT        s.*, 
              sm.url AS media_url 
FROM          shows AS s
INNER JOIN    show_medias AS sm ON s.id = SM.show_id
WHERE `s`.`id` IN ( 
                        SELECT DISTINCT st.show_id 
                        FROM show_time_schedules AS sts 
                        LEFT JOIN show_times AS st ON st.id = sts.show_time_id 
                        WHERE sts.schedule_date BETWEEN CAST('2012-01-10' AS date) AND CAST('2012-01-14' AS date) 
                        ) 
AND            `s`.`is_active` = 1 
AND            sm.is_primary = 1
ORDER BY       s.name asc 

很有趣的是看看这样做的效果如何。我认为它会更快,因为目前我认为MySql将对每个节目运行内部查询1(所以一个查询将运行多次。连接应该更有效率)。

如果您想要所有没有show_medias行的节目,请使用LEFT JOIN替换INNER JOIN。

编辑:

我很快会查看您的EXPLAIN EXTENDED,我也想知道您是否要尝试以下操作:它将删除所有子查询:

SELECT        DISTINCT s.*,  
                       sm.url AS media_url  
FROM                   shows AS s 
INNER JOIN             show_medias AS sm ON s.id = SM.show_id
INNER JOIN             show_times AS st ON (s.id = st.show_id)
RIGHT JOIN             show_time_schedules AS sts ON (st.id = sts.show_time_id)

WHERE                  `s`.`is_active` = 1  
AND                    sm.is_primary = 1 
AND                    sts.schedule_date BETWEEN CAST('2012-01-10' AS date) AND CAST('2012-01-14' AS date)  
ORDER BY               s.name asc 

如果您可以在这些查询上使用EXPLAIN EXTENDED,那将是很好的(您可以将其添加到此评论中)。

进一步编辑:

关于您的EXPLAIN EXTENDED(了解如何阅读这些内容的良好起点在此处),USING FILESORT和USING TEMPORARY都是关键指标。希望我推荐的第二个查询应该会删除任何临时表(在子查询中)。然后尝试省略ORDER BY以查看是否有所不同(我们可以将其添加到目前为止的发现中:

我还可以看到该查询可能会错过许多索引查找;所有id列都是主要的索引匹配候选项(具有通常的索引警告)。我还建议您尝试添加这些索引,然后再次运行EXPLAIN EXTENDED,以查看当前的差异(如上述评论已经知道!)


2

以下是CTE解决方案:(抱歉,MySQL没有CTE,但问题太普遍了)

WITH RECURSIVE tree AS (
    SELECT t0.id
        , t0.study_start_time
        , t0.study_end_time
    FROM tab t0
    WHERE NOT EXISTS (SELECT * FROM tab nx
           WHERE nx.id=t0.id 
           AND nx.study_end_time = t0.study_start_time
           )
    UNION
    SELECT tt.id
        ,tt.study_start_time
        ,t1.study_end_time
    FROM tab t1
    JOIN tree tt ON t1.id=tt.id
                AND t1.study_start_time = tt.study_end_time
    )
SELECT * FROM tree
WHERE NOT EXISTS (SELECT * FROM tab nx 
                WHERE nx.id=tree.id
                AND tree.study_end_time = nx.study_start_time
                )
ORDER BY id
    ;

结果:

CREATE TABLE
INSERT 0 15
  id  | study_start_time | study_end_time 
------+------------------+----------------
 1234 |              168 |            480
 2345 |              175 |            233
 2345 |              400 |            425
 4567 |              200 |            225
 4567 |              250 |            289
 4567 |              300 |            310
 4567 |              320 |            340
 4567 |              360 |            390
(8 rows)

添加明显的主键和索引后的查询计划:

DROP TABLE
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "tab_pkey" for table "tab"
CREATE TABLE
CREATE INDEX
INSERT 0 15

                                                                QUERY PLAN                                                                 
-------------------------------------------------------------------------------------------------------------------------------------------
 Merge Anti Join  (cost=16209.59..16292.13 rows=6386 width=12) (actual time=0.189..0.193 rows=8 loops=1)
   Merge Cond: ((tree.id = nx.id) AND (tree.study_end_time = nx.study_start_time))
   CTE tree
     ->  Recursive Union  (cost=0.00..15348.09 rows=8515 width=12) (actual time=0.022..0.136 rows=15 loops=1)
           ->  Merge Anti Join  (cost=0.00..175.04 rows=1455 width=12) (actual time=0.019..0.041 rows=8 loops=1)
                 Merge Cond: ((t0.id = nx.id) AND (t0.study_start_time = nx.study_end_time))
                 ->  Index Scan using tab_pkey on tab t0  (cost=0.00..77.35 rows=1940 width=12) (actual time=0.010..0.018 rows=15 loops=1)
                 ->  Index Scan using sssss on tab nx  (cost=0.00..77.35 rows=1940 width=8) (actual time=0.003..0.008 rows=14 loops=1)
           ->  Merge Join  (cost=1297.04..1500.28 rows=706 width=12) (actual time=0.010..0.012 rows=1 loops=6)
                 Merge Cond: ((t1.id = tt.id) AND (t1.study_start_time = tt.study_end_time))
                 ->  Index Scan using tab_pkey on tab t1  (cost=0.00..77.35 rows=1940 width=12) (actual time=0.001..0.004 rows=9 loops=6)
                 ->  Sort  (cost=1297.04..1333.42 rows=14550 width=12) (actual time=0.006..0.006 rows=2 loops=6)
                       Sort Key: tt.id, tt.study_end_time
                       Sort Method: quicksort  Memory: 25kB
                       ->  WorkTable Scan on tree tt  (cost=0.00..291.00 rows=14550 width=12) (actual time=0.000..0.001 rows=2 loops=6)
   ->  Sort  (cost=726.15..747.44 rows=8515 width=12) (actual time=0.166..0.169 rows=15 loops=1)
         Sort Key: tree.id, tree.study_end_time
         Sort Method: quicksort  Memory: 25kB
         ->  CTE Scan on tree  (cost=0.00..170.30 rows=8515 width=12) (actual time=0.025..0.149 rows=15 loops=1)
   ->  Sort  (cost=135.34..140.19 rows=1940 width=8) (actual time=0.018..0.018 rows=15 loops=1)
         Sort Key: nx.id, nx.study_start_time
         Sort Method: quicksort  Memory: 25kB
         ->  Seq Scan on tab nx  (cost=0.00..29.40 rows=1940 width=8) (actual time=0.003..0.004 rows=15 loops=1)
 Total runtime: 0.454 ms
(24 rows)

MySQL支持这个吗?如果支持的话,那就太棒了!不过,我想你可能看错了问题;-) - dash
不好意思,它没有。我没有看到mysql标签,或者它是后来添加的。CTE是追踪链接列表的好方法(这就是为什么我重新标记为Islands-and-gaps;IAG基本上是链接列表,所以CTE对我来说就像膝反射一样)。我会添加一个查询计划,只是为了好玩... - wildplasser
因此,这就是我的回答;-) 但是,我很想知道这个问题属于哪个范畴,因为那是一个非常优秀、构思精巧且有充分支持的答案。 - dash
我认为其他过于复杂的贡献让我感到困惑... 在这里:http://stackoverflow.com/questions/8776944/reducing-table-to-avoid-space-inefficiency/8780227#8780227 - wildplasser

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接