How to optimize a nested SELECT query in MySQL

5

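OK, here is a query I am running against a table that has 45,000 records and is 65MB in size... and it is only going to get bigger (so I have to think about future performance as well):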

SELECT count(payment_id) as signup_count, sum(amount) as signup_amount
FROM payments p
WHERE tm_completed BETWEEN '2009-05-01' AND '2009-05-30'
AND completed > 0
AND tm_completed IS NOT NULL
AND member_id NOT IN (
    SELECT p2.member_id
    FROM payments p2
    WHERE p2.completed=1
    AND p2.tm_completed < '2009-05-01'
    AND p2.tm_completed IS NOT NULL
    GROUP BY p2.member_id)

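As you might or might not imagine, it chokes the MySQL server to a standstill...

What it does is simply pull the number of new members who have at least one completed payment, whose tm_completed is not empty (it is only populated for completed payments), and who (the embedded SELECT) have never had a "completed" payment before. That last condition means the member is new: the system does rebills and the like, so this is the only way to tell an existing member who just got rebilled apart from a new member who was charged for the first time.

Now, is there any way to optimize this query so it uses fewer resources and stops bringing my MySQL server to its knees...?

Am I missing any info that would clarify this further? Let me know...

EDIT: Here are the indexes already on that table:

PRIMARY       PRIMARY  46757  payment_id
member_id     INDEX    23378  member_id
payer_id      INDEX    11689  payer_id
coupon_id     INDEX    1      coupon_id
tm_added      INDEX    46757  tm_added, product_id
tm_completed  INDEX    46757  tm_completed, product_id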

Do you have indexes on the fields used in your search conditions? - James
3 Answers

7

In MySQL, that kind of IN subquery is rather slow. I would rephrase it like this:

SELECT COUNT(1) AS signup_count, SUM(amount) AS signup_amount
FROM   payments p
WHERE  tm_completed BETWEEN '2009-05-01' AND '2009-05-30'
AND    completed > 0
AND    NOT EXISTS (
           SELECT member_id
           FROM   payments
           WHERE  member_id = p.member_id
           AND    completed = 1
           AND    tm_completed < '2009-05-01');

Your BETWEEN condition already implies tm_completed IS NOT NULL, so the explicit check is unnecessary.
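A quick way to convince yourself of this, as a minimal sketch: the first statement needs no table at all, and the other two assume the payments table from the question (the alias in_range is only illustrative).

```sql
-- Comparing NULL with BETWEEN evaluates to NULL, not TRUE.
SELECT NULL BETWEEN '2009-05-01' AND '2009-05-30' AS in_range;

-- WHERE keeps only rows whose condition is TRUE, so rows with a NULL
-- tm_completed are dropped by BETWEEN alone. These two filters should
-- therefore return the same count:
SELECT COUNT(*) FROM payments
WHERE tm_completed BETWEEN '2009-05-01' AND '2009-05-30';

SELECT COUNT(*) FROM payments
WHERE tm_completed BETWEEN '2009-05-01' AND '2009-05-30'
  AND tm_completed IS NOT NULL;
```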

Also, make sure you have an index on the following columns:

(tm_completed, completed)
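As a rough sketch, such an index could be created as shown below. The index names are hypothetical. The second index is not part of this answer; it is an assumption that an index leading with member_id may help the correlated NOT EXISTS lookup, and it is worth checking with EXPLAIN before keeping it.

```sql
-- Composite index suggested in the answer (index name is made up).
CREATE INDEX idx_payments_tm_completed_completed
    ON payments (tm_completed, completed);

-- Assumption, not from the answer: supports the per-member lookup
-- "member_id = ? AND completed = 1 AND tm_completed < ?".
CREATE INDEX idx_payments_member_completed_tm
    ON payments (member_id, completed, tm_completed);
```

Every extra index adds some cost to INSERT and UPDATE statements, so it is reasonable to add one at a time and keep only those the optimizer actually uses.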

Beat me to it, +1 for speed. - Todd Gardner
Wow... I had no idea such a small change was all it took, just replacing "IN" with "EXISTS"... thank you! - Crazy Serb

7
I enjoyed devising this solution, which needs no subquery:
SELECT count(p1.payment_id) as signup_count, 
       sum(p1.amount)       as signup_amount  

  FROM payments p1
       LEFT JOIN payments p2 
       ON p1.member_id = p2.member_id
   AND p2.completed = 1
   AND p2.tm_completed < date '2009-05-01'

 WHERE p1.completed > 0
   AND p1.tm_completed between date '2009-05-01' and date '2009-05-30'
   AND p2.member_id IS NULL;

1
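This technique is very reliable, especially in MySQL (which has historically had trouble with subqueries). - dkretz
I like this answer too... apparently, when I run EXPLAIN on the two answers I picked, I get the same performance/resource usage (about 12,000 times faster than with the "IN" subquery). Awesome! Thanks... - Crazy Serb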
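The EXPLAIN comparison mentioned in the comment above can be reproduced along these lines. This is only a sketch against the payments table from the question; the plan you see depends on your MySQL version and indexes.

```sql
-- Prefix any of the candidate queries with EXPLAIN to see the chosen plan.
EXPLAIN
SELECT COUNT(p1.payment_id) AS signup_count,
       SUM(p1.amount)       AS signup_amount
FROM payments p1
LEFT JOIN payments p2
       ON p1.member_id = p2.member_id
      AND p2.completed = 1
      AND p2.tm_completed < '2009-05-01'
WHERE p1.completed > 0
  AND p1.tm_completed BETWEEN '2009-05-01' AND '2009-05-30'
  AND p2.member_id IS NULL;
```

In the output, the key column shows which index was picked for each table and rows gives the optimizer's estimate of how many rows it expects to examine. On older MySQL versions the original IN form often shows up with a select_type of DEPENDENT SUBQUERY, which means the inner query is re-evaluated for each outer row. EXPLAIN reports the plan rather than measured timings, so it is still worth timing the queries themselves.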

2
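Avoid using IN with a subquery; MySQL does not optimize it well (although there are pending optimizations for this in 5.4 and 6.0, see here for details). Rewriting it as a join will probably improve performance: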
SELECT count(payment_id) as signup_count, sum(amount) as signup_amount
FROM payments p
LEFT JOIN (SELECT p2.member_id
          FROM payments p2
          WHERE p2.completed=1
          AND p2.tm_completed < '2009-05-01'
          AND p2.tm_completed IS NOT NULL
          GROUP BY p2.member_id) foo
ON p.member_id = foo.member_id
WHERE tm_completed BETWEEN '2009-05-01' AND '2009-05-30'
AND completed > 0
AND tm_completed IS NOT NULL
AND foo.member_id IS NULL  -- anti-join test belongs in WHERE; inside the ON clause it would never filter out any row of p

Second, I would need to see your table schema; are you using indexes?
