MySQL data rollup technique - is there an UNGROUP BY?

Question


4

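I'd like to do some data reporting, but I do so little SQL trickery that I don't know which feature I'm looking for. I'd call it "ungrouping". I have a SELECT that outputs a list of monthly subscription items: when each was created, when it was closed, and how much it costs per month: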

+----+---------------------------------+------------+------------+------------+
| id | description                     | created_on | closed_on  | monthly    |
+----+---------------------------------+------------+------------+------------+
|  3 | Daily horoscope email           | 2012-01-01 | null       | 10000.0000 |
|  5 | Pet food delivery               | 2012-01-05 | null       |  3500.0000 |
|  6 | Dirty magazine subscription     | 2012-01-09 | null       |  1500.0000 |
|  7 | Stupid nuts posted in a box     | 2012-01-01 | 2012-01-04 |  1500.0000 |
  .... etc ...

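What I want is to calculate a "run rate" for each day, so that every day lists the running total of the monthly commitments active at that point. For example, the data above would map to: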

+------------+----------+
| date       | run_rate |
+------------+----------+
| 2012-01-01 | 11500    |
| 2012-01-02 | 11500    |
| 2012-01-03 | 11500    |
| 2012-01-04 | 10000    |
| 2012-01-05 | 13500    |
| 2012-01-06 | 13500    |
| 2012-01-07 | 13500    |
| 2012-01-08 | 13500    |
| 2012-01-09 | 15000    |

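I think one possible approach is to create a temporary table with one row per day, then write a LEFT JOIN / GROUP BY statement against the first table to build the output. But I can only see how to produce day-by-day "differences" that way, not a running total, and I would need to "ungroup" the first table into two entries per subscription: a positive entry when the subscription is created and a negative entry when it is closed.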
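The "ungrouping" described above can be sketched with a UNION ALL that emits one positive row per creation and one negative row per closure. This is only an illustration: the table name `subscriptions` is hypothetical (the question does not name the table), while the columns `created_on`, `closed_on` and `monthly` come from the sample output.

```sql
-- Sketch only: `subscriptions` is an assumed table name.
-- Each subscription becomes a +monthly event on its creation date
-- and, if it has been closed, a -monthly event on its closing date.
SELECT created_on AS event_date, monthly AS delta
FROM subscriptions
UNION ALL
SELECT closed_on AS event_date, -monthly AS delta
FROM subscriptions
WHERE closed_on IS NOT NULL
ORDER BY event_date;
```

Summing `delta` over all events up to and including a given day would then give that day's run rate.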

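I'd like to stick with MySQL and, if possible, do this in one super-statement. If that isn't possible, I can add stored procedures or temporary tables to my query framework. Or do I really need to process the data in Ruby? (I know exactly how to do that, but I was hoping to keep all the logic in one place, and I'm trying to improve on the slow calculation we currently do with ActiveRecord.)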

- Matthew Bloch

3 Answers

1

Try this:

select date,sum(monthly)
from     
(
   select created_on as date from yourtable 
   union 
   select closed_on from yourtable where closed_on is not null
) as alldates
left outer join yourtable
  on date >= created_on
 and (closed_on is null or date < closed_on)
where date between '2012-1-1' and '2012-1-31'
group by date order by 1

With the sample data you provided, the output is:

+------------+--------------+
| date       | sum(monthly) |
+------------+--------------+
| 2012-01-01 |     11500.00 |
| 2012-01-04 |     10000.00 |
| 2012-01-05 |     13500.00 |
| 2012-01-09 |     15000.00 |
+------------+--------------+
4 rows in set (0.00 sec)

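When a date is missing from this result, its run rate equals that of the nearest earlier date that is present. For example, the run rate for '2012-01-02' equals the run rate for '2012-01-01'.

Suppose you already have a table containing every date in the month. Call it "mydate", with a single column "date".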

mysql> select * from mydate where date >= '2012-1-1' and date <= '2012-1-31';
+------------+
| date       |
+------------+
| 2012-01-01 |
| 2012-01-02 |
| 2012-01-03 |
| 2012-01-04 |
| 2012-01-05 |
| 2012-01-06 |
| 2012-01-07 |
...
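If no such calendar table exists yet, one way to build it is a recursive common table expression. This is a sketch under assumptions: it requires MySQL 8.0 or later (recursive CTEs are not available in earlier versions), and the date range is simply the month used in the example.

```sql
-- Sketch only: assumes MySQL 8.0+ and the January 2012 range from the example.
CREATE TABLE mydate (
    date DATE PRIMARY KEY
);

-- Generate one row per day by repeatedly adding one day to the start date.
INSERT INTO mydate (date)
WITH RECURSIVE seq (d) AS (
    SELECT DATE('2012-01-01')
    UNION ALL
    SELECT d + INTERVAL 1 DAY
    FROM seq
    WHERE d < '2012-01-31'
)
SELECT d FROM seq;
```

For ranges longer than 1000 days, the session variable `cte_max_recursion_depth` would need to be raised above its default.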

Then replace

(
select created_on as date from yourtable 
union 
select closed_on from yourtable where closed_on is not null
) as alldates

with mydate

Done!
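Put together, the substitution might look like the sketch below. It assumes the `mydate` calendar table described above and keeps the placeholder name `yourtable`. The table aliases and the `COALESCE` are additions for illustration, so that a day with no active subscriptions reports 0 rather than NULL.

```sql
-- Sketch only: `mydate` is the assumed calendar table, `yourtable` the
-- placeholder name used in this answer.
SELECT d.date,
       COALESCE(SUM(t.monthly), 0) AS run_rate
FROM mydate AS d
LEFT OUTER JOIN yourtable AS t
  ON d.date >= t.created_on
 AND (t.closed_on IS NULL OR d.date < t.closed_on)  -- still open on that day
WHERE d.date BETWEEN '2012-01-01' AND '2012-01-31'
GROUP BY d.date
ORDER BY d.date;
```

Because every day in the calendar table produces a row, the gaps seen in the earlier four-row output are filled in by the query itself.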

- carl

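That's a clever use of the existing table to generate the date list, but I believe the union of the two date columns won't cover every possible date. It will certainly cover every day on which something happened, but the graphing layer would then have to do more work, and that's what I'm trying to avoid! - Matthew Bloch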

1

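Here is another approach: use INFORMATION_SCHEMA.COLLATION_CHARACTER_SET_APPLICABILITY as a row source to generate the dates, with no need to create a temporary table. The EXPLAIN plan can confirm which method suits you best.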

SQLFIDDLE DEMO

SET @rrate:=0;

SELECT X.rdate, (@rrate:=@rrate +
COALESCE(Y.summonthly,0) -
COALESCE(Z.summonthly,0)) as run_rate
FROM(
      SELECT date_add(P.createdon, interval `day` day)
      as rdate
      FROM 
          (SELECT @i:= @i + 1 AS `day`
           FROM   
           INFORMATION_SCHEMA.COLLATION_CHARACTER_SET_APPLICABILITY,
           (SELECT @i:= -1) AS i
           ) As D,
      rategroups P
      GROUP BY rdate
      HAVING rdate <= (SELECT MAX(createdon) FROM rategroups)
      ORDER BY rdate) X
LEFT JOIN 
      (SELECT createdon, sum(monthly) summonthly
       FROM rategroups
       GROUP BY createdon) Y
ON X.rdate = Y.createdon
LEFT JOIN 
      (SELECT closed_on, sum(monthly) summonthly
       FROM rategroups
       GROUP BY closed_on) Z
ON X.rdate = Z.closed_on
GROUP BY X.rdate
;

|                          RDATE | RUN_RATE |
---------------------------------------------
| January, 01 2012 00:00:00+0000 |    11500 |
| January, 02 2012 00:00:00+0000 |    11500 |
| January, 03 2012 00:00:00+0000 |    11500 |
| January, 04 2012 00:00:00+0000 |    10000 |
| January, 05 2012 00:00:00+0000 |    13500 |
| January, 06 2012 00:00:00+0000 |    13500 |
| January, 07 2012 00:00:00+0000 |    13500 |
| January, 08 2012 00:00:00+0000 |    13500 |
| January, 09 2012 00:00:00+0000 |    15000 |

- bonCodigo

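@MatthewBloch With this approach you don't have to create a temporary table. The date list is built internally, and you can supply the maximum date from your table as the end date to bound the range. :) Please comment once you've tried it, and don't forget to look at the EXPLAIN plan. - bonCodigo

I like that way of generating a date table, although that INFORMATION_SCHEMA table only has 130 rows. But you're right: as long as I can find any table with enough rows, I can use it to generate a sequence for another table. That was exactly the point where I was stuck. I'll double-check that it works and see how fast it is. - Matthew Bloch

Oh, I rather like using set @i=0; select date_add(now(), interval (select @i := @i+1) day) from big_table limit 100; to generate my date sequence. If it's going to be ugly, at least it can be short :) - Matthew Bloch

Most importantly, how fast is it on a larger table =) I'm glad you solved it. By the way, check the number of rows INFORMATION_SCHEMA.collation_character_set_applicability can generate -> 197 :D - bonCodigo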


- sgeddes · Accepted Answer

Try something like this; it should produce the result you want:

SET @runtot:=0;
SELECT
   mydates.seeddate,
    (@runtot := @runtot + IFNULL(m.amt,0) - IFNULL(m2.amt,0)) AS rt
FROM
   mydates left join 
    (Select createdon, SUM(monthly) amt
     FROM mytable 
     group by createdon
     ) m on mydates.seeddate = m.createdon
left join 
    (Select closed_on, SUM(monthly) amt
     FROM mytable 
     group by closed_on  
     ) m2 on mydates.seeddate = m2.closed_on

Here is the SQL Fiddle link.

Good luck.
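On MySQL 8.0 or later, the same running total could probably be written with a window function instead of a user variable; assigning user variables inside expressions is deprecated there. The sketch below is an alternative formulation rather than the accepted answer's own method. It reuses that answer's names (`mydates.seeddate`, `mytable.createdon`, `mytable.closed_on`, `monthly`) and assumes `seeddate` is unique.

```sql
-- Sketch only: assumes MySQL 8.0+ and one row per day in `mydates`.
SELECT
    d.seeddate,
    -- Cumulative sum of (amount opened - amount closed) up to each day.
    SUM(COALESCE(m.amt, 0) - COALESCE(m2.amt, 0))
        OVER (ORDER BY d.seeddate) AS rt
FROM mydates AS d
LEFT JOIN (
    SELECT createdon, SUM(monthly) AS amt
    FROM mytable
    GROUP BY createdon
) AS m ON d.seeddate = m.createdon
LEFT JOIN (
    SELECT closed_on, SUM(monthly) AS amt
    FROM mytable
    WHERE closed_on IS NOT NULL
    GROUP BY closed_on
) AS m2 ON d.seeddate = m2.closed_on
ORDER BY d.seeddate;
```

As with the user-variable version, the total only counts subscriptions created on or after the first date in `mydates`, so the calendar should start no later than the earliest creation date.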