Calculating Cumulative Sum in PostgreSQL

Question

sql postgresql window-functions analytic-functions cumulative-sum

154

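I want to find the cumulative or running amount of a field and insert it from staging into a table. My staging structure is something like this: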

ea_month    id       amount    ea_year    circle_id
April       92570    1000      2014        1
April       92571    3000      2014        2
April       92572    2000      2014        3
March       92573    3000      2014        1
March       92574    2500      2014        2
March       92575    3750      2014        3
February    92576    2000      2014        1
February    92577    2500      2014        2
February    92578    1450      2014        3

I want my target table to look something like this:

ea_month    id       amount    ea_year    circle_id    cum_amt
February    92576    1000      2014        1           1000 
March       92573    3000      2014        1           4000
April       92570    2000      2014        1           6000
February    92577    3000      2014        2           3000
March       92574    2500      2014        2           5500
April       92571    3750      2014        2           9250
February    92578    2000      2014        3           2000
March       92575    2500      2014        3           4500
April       92572    1450      2014        3           5950

I am really confused about how to achieve this result. I want to do it in PostgreSQL.

Can anyone suggest how to get this result set?

- Yousuf Sultan

2

How do you get 1000 for cum_amount in your target table? For circle_id, the amount appears to be 2000. - user1724295

1

@user1724295 The rows are being grouped by ea_year, circle_id, and ea_month; the goal is then to get cum_amt. - Rahmat Ali

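I think the amount column and the cumulative amount in the target table are wrong as well. For example, the amount for id 92576 should be 2000, not 1000. If I am right, please edit this - it is confusing. - Tomasz Brzezina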

1 Answer

- Erwin Brandstetter · Accepted Answer

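Basically, you need a window function. That is a standard feature nowadays. In addition to genuine window functions, you can use any aggregate function as a window function in Postgres by appending an OVER clause.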

The special difficulty here is getting the partitions and the sort order right:

SELECT ea_month, id, amount, ea_year, circle_id
     , sum(amount) OVER (PARTITION BY circle_id
                         ORDER BY ea_year, ea_month) AS cum_amt
FROM   tbl
ORDER  BY circle_id, ea_year, ea_month;

And no GROUP BY is needed.

The sum for each row is calculated from the first row in the partition to the current row, or, quoting the manual to be precise:

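The default framing option is RANGE UNBOUNDED PRECEDING, which is the same as RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. With ORDER BY, this sets the frame to be all rows from the partition start up through the current row's last ORDER BY peer.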

The emphasis on the last part of that quote is mine. This is the cumulative (or "running") sum you are after.

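In default RANGE mode, rows with the same rank in the sort order are "peers" - the same (circle_id, ea_year, ea_month) in this query. All peers show the same running sum, with every peer already added to it. But I assume your table is UNIQUE on (circle_id, ea_year, ea_month); then the sort order is deterministic and no row has peers. (And you might as well use the cheaper ROWS mode.)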
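To make the peer behaviour concrete, here is a small sketch that compares the default RANGE frame with an explicit ROWS frame. The inline data set (with a deliberate duplicate for circle 1 in month 2) and the names mon, range_sum and rows_sum are made up for illustration and are not part of the question's table.

```sql
-- Hypothetical inline data: two rows share (circle_id, mon) = (1, 2), so they are peers.
WITH t(circle_id, mon, amount) AS (
   VALUES (1, 1, 100)
        , (1, 2, 200)
        , (1, 2, 300)   -- peer of the row above
        , (1, 3, 400)
   )
SELECT circle_id, mon, amount
       -- default frame: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
     , sum(amount) OVER (PARTITION BY circle_id ORDER BY mon) AS range_sum
       -- explicit ROWS frame: stops at the current row, peers or not
     , sum(amount) OVER (PARTITION BY circle_id ORDER BY mon
                         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS rows_sum
FROM   t
ORDER  BY circle_id, mon;
```

With RANGE, both month-2 rows report the same total (600), because each one already includes its peer. With ROWS, the two peers get different totals, and which of them comes first is arbitrary unless the window's ORDER BY is extended with a tiebreaker.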

Postgres 11 added tools to include / exclude peers with the new frame_exclusion options. See:

Aggregating all values not in the same group

Now, ORDER BY ... ea_month won't work with strings for month names. Postgres would sort them alphabetically according to the locale setting.
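A quick way to see the problem, using only the three month names from the question as inline values (the alias m is just for this sketch):

```sql
-- Month names are compared as text, not as calendar months.
SELECT ea_month
FROM  (VALUES ('February'), ('March'), ('April')) AS m(ea_month)
ORDER  BY ea_month;
```

This should come back as April, February, March, so a running sum ordered this way would start with April instead of February.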

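If you have actual date values stored in your table, you can sort properly. If not, I suggest replacing ea_year and ea_month with a single column the_date of type date in your table.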

Transform what you have with to_date():

  to_date(ea_year || ea_month , 'YYYYMonth') AS the_date
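If you go for the redesign, a one-time migration could look roughly like the sketch below. It assumes the table is called tbl (as in the queries here), that it has no column named the_date yet, and that every row holds a valid year and an English month name. Test it on a copy first.

```sql
BEGIN;

-- Add the new column and fill it from the existing year / month-name pair.
ALTER TABLE tbl ADD COLUMN the_date date;

UPDATE tbl
SET    the_date = to_date(ea_year || ea_month, 'YYYYMonth');

-- Only sensible if every row could be converted.
ALTER TABLE tbl ALTER COLUMN the_date SET NOT NULL;

-- Optional, once nothing depends on the old columns any more:
-- ALTER TABLE tbl DROP COLUMN ea_year, DROP COLUMN ea_month;

COMMIT;
```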

For display, you can get the original strings back with to_char():

  to_char(the_date, 'Month') AS ea_month
  to_char(the_date, 'YYYY')  AS ea_year
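One detail worth knowing: the 'Month' pattern blank-pads the name to 9 characters. If that matters for your output, the FM prefix suppresses the padding. A small sketch with a literal date standing in for the_date:

```sql
SELECT to_char(date '2014-04-01', 'Month')   AS padded    -- 'April    ' (blank-padded to 9 characters)
     , to_char(date '2014-04-01', 'FMMonth') AS trimmed   -- 'April'
     , to_char(date '2014-04-01', 'YYYY')    AS ea_year;  -- '2014'
```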

While stuck with the unfortunate design, this will work:

SELECT ea_month, id, amount, ea_year, circle_id
     , sum(amount) OVER (PARTITION BY circle_id ORDER BY the_date) AS cum_amt
FROM   (SELECT *, to_date(ea_year || ea_month, 'YYYYMonth') AS the_date FROM tbl) sub
ORDER  BY circle_id, the_date;
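Since the question asks to insert the result from staging into a target table, the same SELECT can feed an INSERT. The table name target_tbl and its column list are assumptions for this sketch; adapt them to your actual target table.

```sql
-- target_tbl is a hypothetical name; its columns are assumed to match the desired output.
INSERT INTO target_tbl (ea_month, id, amount, ea_year, circle_id, cum_amt)
SELECT ea_month, id, amount, ea_year, circle_id
     , sum(amount) OVER (PARTITION BY circle_id ORDER BY the_date) AS cum_amt
FROM  (SELECT *, to_date(ea_year || ea_month, 'YYYYMonth') AS the_date FROM tbl) sub;
```

The outer ORDER BY is dropped here because a table has no inherent row order; add it back in the query that reads from the target table.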