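MySQL does not seem to be able to optimize a SELECT that joins against a GROUP BY subquery, and it ends up with long execution times. There must be a known optimization for such a common scenario.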
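Let's assume we are trying to return all orders from the database, with a flag indicating whether each one is the customer's first order.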
CREATE TABLE orders (`order` int, customer int, date date);
Retrieving the first order per customer is very fast.
SELECT customer, min(`order`) as first_order FROM orders GROUP BY customer;
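The grouped query is typically this fast only when an index lets MySQL read the minimum per group directly. The table definition above shows no indexes, so the following is a sketch under the assumption that a composite index is (or can be) present; the index name `idx_customer_order` is hypothetical.

```sql
-- Hypothetical index: lets MIN(`order`) ... GROUP BY customer be resolved
-- from the index alone (EXPLAIN may then show "Using index for group-by").
CREATE INDEX idx_customer_order ON orders (customer, `order`);

SELECT customer, MIN(`order`) AS first_order
FROM orders
GROUP BY customer;
```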
However, it becomes very slow as soon as we join it against the full set of orders using a subquery.
SELECT `order`, first_order FROM orders LEFT JOIN (
SELECT customer, min(`order`) as first_order FROM orders GROUP BY customer
) AS first_orders ON orders.`order`=first_orders.first_order;
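One way to see where the time goes is to run the same statement under `EXPLAIN`. This is only a diagnostic sketch: the exact output depends on the MySQL version and the data. As far as I know, versions before 5.6 materialize a derived table without any index, so the `LEFT JOIN` has to scan it once for every row of `orders`.

```sql
-- Look for a <derivedN> row with type = ALL and key = NULL:
-- that means the materialized subquery is scanned in full for each outer row.
EXPLAIN
SELECT `order`, first_order FROM orders LEFT JOIN (
  SELECT customer, MIN(`order`) AS first_order FROM orders GROUP BY customer
) AS first_orders ON orders.`order` = first_orders.first_order;
```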
I am hoping there is a simple trick we are missing, because otherwise it is about 1000x faster to do this:
CREATE TEMPORARY TABLE tmp_first_order AS
SELECT customer, min(`order`) as first_order FROM orders GROUP BY customer;
CREATE INDEX tmp_boost ON tmp_first_order (first_order);
SELECT `order`, first_order FROM orders LEFT JOIN tmp_first_order
ON orders.`order`=tmp_first_order.first_order;
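If the temporary table stays, the index can be declared in the same statement that fills it, since `CREATE TABLE ... SELECT` accepts index definitions ahead of the `SELECT`. A minimal sketch, reusing the names from above; whether it is measurably faster than a separate `CREATE INDEX` is an assumption that would need testing:

```sql
-- Create and populate the temporary table with its index in one statement.
CREATE TEMPORARY TABLE tmp_first_order (INDEX tmp_boost (first_order)) AS
SELECT customer, MIN(`order`) AS first_order FROM orders GROUP BY customer;

SELECT `order`, first_order FROM orders LEFT JOIN tmp_first_order
  ON orders.`order` = tmp_first_order.first_order;
```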
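Edit:
Inspired by the third option proposed by @ruakh, there is indeed a less ugly workaround using INNER JOIN and UNION, which has acceptable performance yet does not require temporary tables. However, it is somewhat specific to our case, and I am wondering whether a more generic optimization exists.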
-- Orders that are the first order of their customer
SELECT `order`, 'YES' as first FROM orders INNER JOIN (
SELECT min(`order`) as first_order FROM orders GROUP BY customer
) AS first_orders_1 ON orders.`order`=first_orders_1.first_order
UNION
-- Orders placed after the customer's first order
SELECT `order`, 'NO' as first FROM orders INNER JOIN (
SELECT customer, min(`order`) as first_order FROM orders GROUP BY customer
) AS first_orders_2 ON first_orders_2.customer = orders.customer
AND orders.`order` > first_orders_2.first_order;
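A possibly more generic shape, offered as a sketch rather than a verified fix: join the grouped subquery on `customer` with an `INNER JOIN` and compute the flag in the select list. Every order has a customer with a first order, so no rows are lost, and an inner join leaves the optimizer free to read the derived table first and look up `orders` by `customer`. This assumes an index on `orders` that starts with `customer`; without one, it may be no faster.

```sql
-- One pass, no UNION: each order is compared with its customer's first order.
SELECT o.`order`,
       IF(o.`order` = f.first_order, 'YES', 'NO') AS first
FROM orders AS o
INNER JOIN (
  SELECT customer, MIN(`order`) AS first_order
  FROM orders
  GROUP BY customer
) AS f ON f.customer = o.customer;
```

The two branches of the UNION above produce disjoint rows, so this single query should return the same set of orders and flags.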