MySQL query is too slow

3

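I'm trying to query some trend statistics, but the benchmark is very slow. The query takes about 134 seconds to execute.

I have a MySQL table named table_1.

Here is the CREATE statement: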

CREATE TABLE `table_1` (
  `id` bigint(11) NOT NULL AUTO_INCREMENT,
  `original_id` bigint(11) DEFAULT NULL,
  `invoice_num` bigint(11) DEFAULT NULL,
  `registration` timestamp NULL DEFAULT NULL,
  `paid_amount` decimal(10,6) DEFAULT NULL,
  `cost_amount` decimal(10,6) DEFAULT NULL,
  `profit_amount` decimal(10,6) DEFAULT NULL,
  `net_amount` decimal(10,6) DEFAULT NULL,
  `customer_id` bigint(11) DEFAULT NULL,
  `recipient_id` text,
  `cashier_name` text,
  `sales_type` text,
  `sales_status` text,
  `sales_location` text,
  `invoice_duration` text,
  `store_id` double DEFAULT NULL,
  `is_cash` int(11) DEFAULT NULL,
  `is_card` int(11) DEFAULT NULL,
  `brandid` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_registration_compound` (`id`,`registration`)
) ENGINE=InnoDB AUTO_INCREMENT=47420958 DEFAULT CHARSET=latin1;

I set up a composite index on id + registration.

Here is the query:

SELECT 

store_id,
            CONCAT('[',GROUP_CONCAT(tot SEPARATOR ','),']') timeline_transactions,
            SUM(tot) AS total_transactions,
            CONCAT('[',GROUP_CONCAT(totalRevenues SEPARATOR ','),']') timeline_revenues,
            SUM(totalRevenues) AS revenues,
            CONCAT('[',GROUP_CONCAT(totalProfit SEPARATOR ','),']') timeline_profit,
            SUM(totalProfit) AS profit,
            CONCAT('[',GROUP_CONCAT(totalCost SEPARATOR ','),']') timeline_costs,
            SUM(totalCost) AS costs



 FROM (select t1.md,
COALESCE(SUM(t1.amount+t2.revenues), 0) AS totalRevenues,
COALESCE(SUM(t1.amount+t2.profit), 0) AS totalProfit,
COALESCE(SUM(t1.amount+t2.costs), 0) AS totalCost,
COALESCE(SUM(t1.amount+t2.tot), 0) AS tot,
t1.store_id

from
(
 SELECT a.store_id,b.md,b.amount from ( SELECT DISTINCT store_id FROM  table_1) AS a
  CROSS JOIN 
 (
 SELECT
  DATE_FORMAT(a.DATE, "%m") as md,
  '0' as  amount
  from (
    select curdate() - INTERVAL (a.a + (10 * b.a) + (100 * c.a)) month as Date
    from (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as a
    cross join (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as b
    cross join (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as c
  ) a
  where a.Date >='2019-01-01' and a.Date <= '2019-01-14'
  group by md) AS b 
)t1
left join
(
  SELECT
                COUNT(epl.invoice_num) AS tot,
                SUM(paid_amount) AS revenues,
                SUM(profit_amount) AS profit,
                SUM(cost_amount) AS costs,
                store_id,
                date_format(epl.registration, '%m') md
                FROM table_1 epl
                GROUP BY store_id, date_format(epl.registration, '%m')
)t2
ON   t2.md=t1.md AND t2.store_id=t1.store_id
group BY t1.md, t1.store_id) AS t3 GROUP BY store_id  ORDER BY total_transactions desc

Here is the EXPLAIN output:

[two EXPLAIN screenshots, images not available]

Maybe I should change the registration column from timestamp to datetime.


2
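Out of curiosity, how many rows does this table currently have on your system? About 48 million, at a guess? - Chris Forrence
@ChrisForrence Yes, about 48 million rows. - UgoL
You should check your MySQL performance settings.
  • innodb_buffer_pool_size is the amount of memory used to cache tables, indexes, and a few other things.
  • You can configure multiple innodb_buffer_pool_instances in MySQL to increase read/write threads.
Try removing the ORDER BY from the query, or set up appropriate indexes. For more details, see the following sites: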
  1. https://www.percona.com/blog/2014/01/28/10-mysql-performance-tuning-settings-after-installation/
  2. https://www.saotn.org/mysql-innodb-performance-improvement/
- Priyesh
2 Answers

4

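About 90% of your execution time will be spent executing GROUP BY store_id, date_format(epl.registration, '%m').

Unfortunately, you cannot use an index to group by a derived value, and since this is vital for your report, you need to precalculate it. You can do this by adding the value to the table, e.g. using a generated column: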

alter table table_1 add md varchar(2) as (date_format(registration, '%m')) stored

I kept the varchar format you used here; you could also use a number (e.g. tinyint) for the month.
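As a hedged sketch of the numeric variant, assuming MySQL 5.7 or later and that no md column has been added yet, the month could be stored as a TINYINT using MONTH() instead of DATE_FORMAT(). The outer query would then also need to produce numeric months (for example MONTH(a.Date) rather than DATE_FORMAT(a.Date, "%m")) so that both sides of the join on md have the same type.

```sql
-- Sketch only: numeric month (1..12) as a stored generated column.
-- Assumes MySQL 5.7+ and that table_1 has no column named md yet.
ALTER TABLE table_1
  ADD md TINYINT UNSIGNED AS (MONTH(registration)) STORED;
```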

Generated columns require MySQL 5.7; on older versions you can use triggers to achieve the same thing:

alter table table_1 add md varchar(2) null;
create trigger tri_table_1 before insert on table_1
for each row set new.md = date_format(new.registration,'%m');
create trigger tru_table_1 before update on table_1
for each row set new.md = date_format(new.registration,'%m');
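The triggers only fill md for rows inserted or updated after they are created, so existing rows would still need a one-time backfill. A minimal sketch, assuming the trigger-based md column above; the batch size of one million ids is an arbitrary illustration, and the ranges would have to be repeated up to the table's maximum id:

```sql
-- One-time backfill for rows that existed before the triggers.
-- Run in id ranges so each transaction stays reasonably small;
-- the 1,000,000 step is an arbitrary example value.
UPDATE table_1
   SET md = DATE_FORMAT(registration, '%m')
 WHERE id BETWEEN 1 AND 1000000
   AND md IS NULL;

-- Repeat with the next range (1000001 to 2000000, and so on)
-- until the whole id range of the table has been covered.

-- Check for rows that still lack a value although registration is set
-- (this scans the table, so run it sparingly):
SELECT COUNT(*) AS missing_md
  FROM table_1
 WHERE md IS NULL
   AND registration IS NOT NULL;
```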

Then add an index, preferably a covering index, starting with store_id, md, e.g.:

create index idx_table_1_storeid_md on table_1 
   (store_id, md, invoice_num, paid_amount, profit_amount, cost_amount)

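If you have other similar reports, you may want to check whether they use additional columns and could benefit from covering more of them. The index will require about 1.5 GB of storage (and unless it is cached, the time your drive needs to read 1.5 GB will essentially define the execution time on its own).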
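To see whether the index can realistically stay cached, its size can be compared with the configured buffer pool. The following is a rough sketch, assuming InnoDB persistent statistics are enabled (the default since MySQL 5.6), that the index was created under the name idx_table_1_storeid_md as above, and that the account may read the mysql schema; the figures are estimates, not exact sizes:

```sql
-- Approximate size of the covering index, in MB
-- (stat_name = 'size' is the number of pages in the index).
SELECT stat_value * @@innodb_page_size / 1024 / 1024 AS index_size_mb
  FROM mysql.innodb_index_stats
 WHERE database_name = DATABASE()
   AND table_name = 'table_1'
   AND index_name = 'idx_table_1_storeid_md'
   AND stat_name = 'size';

-- Configured InnoDB buffer pool size, in MB, for comparison.
SELECT @@innodb_buffer_pool_size / 1024 / 1024 AS buffer_pool_mb;
```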

Then change your query to group by this new indexed column, e.g.:

      ...
            SUM(cost_amount) AS costs,
            store_id,
            md -- instead of date_format(epl.registration, '%m') md
            FROM table_1 epl
            GROUP BY store_id, md -- instead of date_format(epl.registration, '%m')
)t2   ...

This index will also take care of another 9% of your execution time, SELECT DISTINCT store_id FROM table_1, which will profit from an index starting with store_id.
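Whether the optimizer actually uses the new index can be checked with EXPLAIN. A sketch, assuming the md column and the index idx_table_1_storeid_md from above exist; if the index covers the query, the Extra column would be expected to contain "Using index", but the exact plan depends on the MySQL version and the table statistics:

```sql
-- Inner aggregation: should be answerable from the covering index alone.
EXPLAIN
SELECT COUNT(epl.invoice_num) AS tot,
       SUM(paid_amount)   AS revenues,
       SUM(profit_amount) AS profit,
       SUM(cost_amount)   AS costs,
       store_id,
       md
  FROM table_1 epl
 GROUP BY store_id, md;

-- The DISTINCT lookup can use the leading store_id column of the same index.
EXPLAIN
SELECT DISTINCT store_id FROM table_1;
```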

Now that 99% of the query is taken care of, here are some further remarks:

  • The subquery b and your date range where a.Date >='2019-01-01' and a.Date <= '2019-01-14' might not do what you think they do. You should run the part SELECT DATE_FORMAT(a.DATE, "%m") as md, ... group by md separately to see what it returns. In its current state, it gives you one row with the tuple '01', 0, representing "January", so it is basically a complicated way of doing select '01', 0. The exception is when today is the 15th or later: then it returns nothing (which is probably unintended).

  • In particular, it will not limit the invoice dates to that specific range, but to all invoices from (the whole of) January of any year. If that is what you intended, you should (additionally) add that filter directly, e.g. by using FROM table_1 epl where epl.md = '01' GROUP BY ..., reducing your execution time by an additional factor of about 12. So (apart from the 15th-and-later problem), with your current range you should get the same result if you use

      ...
            SUM(cost_amount) AS costs,
            store_id,
            md 
            FROM table_1 epl
            WHERE md = '01'
            GROUP BY store_id, md 
    )t2   ...
    

    For different date ranges you will have to adjust that term. And to emphasize my point, this is significantly different from filtering invoices by their date, e.g.

      ...
            SUM(cost_amount) AS costs,
            store_id,
            md 
            FROM table_1 epl
            WHERE epl.registration >='2019-01-01' 
               and epl.registration <= '2019-01-14'
            GROUP BY store_id, md 
    )t2   ...
    

    which you may (or may not) have tried to do. You would need a different index in that case though (and it would be a slightly different question).

  • There might be some additional optimizations, simplifications or beautifications in the rest of your query, e.g. group BY t1.md, t1.store_id looks redundant and/or wrong (indicating you are actually not on MySQL 5.7), and the b-subquery can only give you the values 1 to 12, so generating 1000 dates and reducing them again could be simplified. But since these parts operate on 100-ish rows, they will not affect execution time significantly, and I haven't checked them in detail. Some of it is probably due to getting the right output format or to generalizations (although, if you are dynamically grouping by formats other than month, you need other indexes/columns, but that would be a different question).
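To follow the suggestion in the first remark, the month-generating part can be run on its own. The block below is the subquery b from the question, only reformatted; what it returns depends on the current date, so no expected output is given:

```sql
-- Subquery b from the question, run in isolation.
-- It builds 1000 dates (today minus 0..999 months), keeps those inside
-- the date range, and reduces them to distinct month numbers.
SELECT DATE_FORMAT(a.Date, '%m') AS md,
       '0' AS amount
  FROM (
        SELECT CURDATE() - INTERVAL (a.a + (10 * b.a) + (100 * c.a)) MONTH AS Date
          FROM (SELECT 0 AS a UNION ALL SELECT 1 UNION ALL SELECT 2
                UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5
                UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8
                UNION ALL SELECT 9) AS a
         CROSS JOIN
               (SELECT 0 AS a UNION ALL SELECT 1 UNION ALL SELECT 2
                UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5
                UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8
                UNION ALL SELECT 9) AS b
         CROSS JOIN
               (SELECT 0 AS a UNION ALL SELECT 1 UNION ALL SELECT 2
                UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5
                UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8
                UNION ALL SELECT 9) AS c
       ) a
 WHERE a.Date >= '2019-01-01'
   AND a.Date <= '2019-01-14'
 GROUP BY md;
```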

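An alternative way to precalculate the values is to use a summary table. For example, you could run your inner query (the expensive group by) once a day, store the result in a table, and then reuse it (by selecting from that table instead of doing the group by). This is especially viable for data like invoices that never change (although otherwise you can use triggers to keep the summary table up to date). It is also more viable if you have several scenarios, e.g. if your users can decide to group by weekday, year, month, or zodiac sign, since otherwise you would need to add an index for each of those. It becomes less viable if you need to dynamically limit your invoice range (e.g. 2019-01-01 ... 2019-01-14). If you need to include the current day in your report, you can still precalculate and then add the values for the current date from the table (which should only involve a very limited number of rows, and is fast if you have an index starting with your date column), or use triggers to update your summary table on the fly.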
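The summary-table approach could look roughly like the following. This is a sketch under assumptions: the table name table_1_monthly_summary and its column types are made up for illustration, the refresh is a full rebuild rather than an incremental update, and the scheduling (for example a nightly job) is left out:

```sql
-- Hypothetical summary table holding the result of the expensive GROUP BY.
CREATE TABLE table_1_monthly_summary (
  store_id DOUBLE        NULL,
  md       VARCHAR(2)    NULL,
  tot      BIGINT        NOT NULL,
  revenues DECIMAL(32,6) NULL,
  profit   DECIMAL(32,6) NULL,
  costs    DECIMAL(32,6) NULL,
  KEY idx_summary_store_md (store_id, md)
) ENGINE=InnoDB;

-- Periodic refresh: rebuild the summary from the base table.
TRUNCATE TABLE table_1_monthly_summary;

INSERT INTO table_1_monthly_summary
       (store_id, md, tot, revenues, profit, costs)
SELECT store_id,
       DATE_FORMAT(registration, '%m'),
       COUNT(invoice_num),
       SUM(paid_amount),
       SUM(profit_amount),
       SUM(cost_amount)
  FROM table_1
 GROUP BY store_id, DATE_FORMAT(registration, '%m');
```

The report would then select from table_1_monthly_summary in place of the t2 subquery. With a full rebuild like this, the summary is briefly empty during the refresh, which may or may not matter for a given report.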

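Hi @Solarflare, this is MySQL 5.6.10. In that case, how should I add the triggers? Could you show the final query with all your suggestions taken into account? - UgoL
You can use your original query and just replace the column you group by (I forgot to mention that, so I have now added that part of the code explicitly). The remarks are just remarks. In particular, if your current query returns the correct result and is merely slow, you can ignore most of them (although the first remark could save you time if you really do intend to show only January data); but just grouping by md instead of date_format(... should already reduce the execution time significantly (to about 5-10 seconds if you have an SSD, maybe 30-40 seconds if you have an HDD). - Solarflare
@UgoL I went into detail on the date range part because it may not be what you want (you probably copied that code from a "fill in missing dates" question), so please check whether your current query actually gives the result you expect (which is a prerequisite for optimizing). - Solarflare
Thanks @Solarflare for all the suggestions. I tried running the ALTER TABLE to set up the md field, but it seems to keep running without finishing. - UgoL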

0
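  • With PRIMARY KEY(id), having INDEX(id, anything) is virtually useless.

  • See if you can avoid nesting subqueries.

  • Consider building that "date" table permanently and giving it a PRIMARY KEY(md). Currently, neither subquery has an index on the join column (md).

  • You may have the "explode-implode" syndrome. This is where JOINs expand the number of rows, only to have the GROUP BY collapse them.

  • Don't use COUNT(xx) unless you need to check xx for being NULL. Simply use COUNT(*).

  • store_id double -- Really?

  • TIMESTAMP vs DATETIME -- they perform about the same; don't bother changing it.

  • Since you are only looking at 2019-01, you can get rid of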

    date_format(epl.registration, '%m')
    

    That alone will speed things up a lot. (However, you lose generality.)
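The first point, together with a filter on the invoice date itself, could be acted on along these lines. A sketch only: the new index name idx_table_1_registration and its column list are assumptions, chosen to cover a query that filters on registration and aggregates per store; whether the optimizer prefers it depends on how selective the date range is:

```sql
-- The secondary index starts with the primary key column, so it adds
-- essentially nothing over PRIMARY KEY (id).
ALTER TABLE table_1 DROP INDEX idx_registration_compound;

-- Hypothetical replacement that starts with the date column, for queries
-- such as: WHERE registration >= '2019-01-01' AND registration < '2019-01-15'
CREATE INDEX idx_table_1_registration ON table_1
  (registration, store_id, invoice_num, paid_amount, profit_amount, cost_amount);
```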


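Do you have any suggestions for avoiding the subqueries? And are you sure that using count(*) is a good idea in terms of performance? - UgoL
@ugol - COUNT(*) is equivalent to COUNT(1), and it is faster than COUNT(x). Don't be fooled by the *; in this context it does not mean "all columns". - Rick James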
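The difference in meaning between the two forms can be seen on a tiny derived table that needs no existing data: COUNT(*) counts rows, while COUNT(x) counts only the rows where x is not NULL.

```sql
SELECT COUNT(*) AS all_rows,
       COUNT(x) AS non_null_x
  FROM (SELECT 1 AS x
        UNION ALL SELECT NULL
        UNION ALL SELECT 3) AS demo;
-- all_rows = 3, non_null_x = 2
```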
