Finding the intersection of date ranges in PostgreSQL

Question

sql postgresql relational-database window-functions gaps-and-islands

5

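I have records with two dates, check_in and check_out, and I want to find the time ranges during which more than one person was checked in at the same time.

For example, if I have the following check-in/check-out times:

Person A: 1PM - 6PM
Person B: 3PM - 10PM
Person C: 9PM - 11PM

I would want to get 3PM - 6PM (overlap of persons A and B) and 9PM - 10PM (overlap of persons B and C).

I can write an algorithm that does this in linear time in code. Is it possible to do this in linear time with a relational query in PostgreSQL as well?

It needs a minimal response, meaning that no overlapping ranges should be returned. For example, a result of 6PM - 9PM and 8PM - 10PM would be incorrect; it should return 6PM - 10PM instead.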

- Josh Horowitz

1

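Please provide your Postgres version, the complete table definition (including all constraints, or what you get with \d tbl in psql), and some sample data. Thanks. - Erwin Brandstetter

Yes, the version would help us answer; recent releases added new date range features that may apply. - Jasen

I guess the solution will involve window functions and possibly a recursive CTE. - Jasen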

2 Answers

1

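The idea is to divide time into periods and store each period as a string of bits with a chosen granularity.

0 - nobody is checked in during that slot
1 - somebody is checked in during that slot

Let's assume a granularity of 1 hour and a period of 1 day.

000000000000000000000000 means nobody is checked in on that day
000000000000000000000110 means somebody is checked in between 21:00 and 23:00
000000000000011111000000 means somebody is checked in between 13:00 and 18:00
000000000000000111111100 means somebody is checked in between 15:00 and 22:00

Then we apply a binary OR to all values in the range and we have our answer.

000000000000011111111110

This can be done in linear time. The example below is written for Oracle, but it is easy to convert to PostgreSQL.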

```
-- Oracle syntax; bin2dec() and dec2bin() are user-defined helpers, not Oracle built-ins.
with rec (checkin, checkout) as (
    select 13, 18 from dual
    union all
    select 15, 22 from dual
    union all
    select 21, 23 from dual
),
spanempty (empt) as (
    select '000000000000000000000000' from dual
),
spanfull (full) as (
    select '111111111111111111111111' from dual
),
-- one 24-character bit string per booking
bookingbin (binbook) as (
    select substr(empt, 1, checkin) ||
           substr(full, checkin, checkout - checkin) ||
           substr(empt, checkout, 24 - checkout)
    from rec
    cross join spanempty
    cross join spanfull
),
bookingInt (rn, intbook) as (
    select rownum, bin2dec(binbook) from bookingbin
),
-- bits shared by consecutive bookings
bitAndSum (bitAndSumm) as (
    select sum(bitand(b1.intbook, b2.intbook))
    from bookingInt b1
    join bookingInt b2 on b1.rn = b2.rn - 1
),
SumAll (sumall) as (
    select sum(bin2dec(binbook)) from bookingBin
)
select lpad(dec2bin(sumall - bitAndSumm), 24, '0')
from SumAll, bitAndSum
```

Result:

000000000000011111111110
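
As a rough idea of what a PostgreSQL conversion might look like (a sketch, not the answerer's code), the bit strings can be built with repeat(), cast to bit(24) and combined with the bit_or() aggregate. The CTE name rec and the hour values are carried over from the Oracle example; the column alias occupied is made up for illustration.

```sql
-- Sketch only: hours are plain integers 0..24, as in the Oracle example above.
WITH rec (check_in, check_out) AS (
   VALUES (13, 18), (15, 22), (21, 23)
)
SELECT bit_or((repeat('0', check_in)                 -- empty slots before check-in
            || repeat('1', check_out - check_in)     -- occupied slots
            || repeat('0', 24 - check_out)           -- empty slots after check-out
              )::bit(24)) AS occupied
FROM   rec;
```

Like the Oracle version, this marks every slot in which at least one person is checked in, so for the three sample rows it should again produce 000000000000011111111110.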

- dcieslak


- Erwin Brandstetter · Accepted Answer

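Assumptions

The solution depends heavily on the exact table definition, including all constraints. Lacking that information in the question, I'll assume this table: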

```
CREATE TABLE booking (
  booking_id serial PRIMARY KEY
, check_in   timestamptz NOT NULL
, check_out  timestamptz NOT NULL
, CONSTRAINT valid_range CHECK (check_out > check_in)
);
```

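So there are no NULL values, only valid ranges with an inclusive lower and an exclusive upper bound, and we don't care who checked in.

Also assuming Postgres 9.2 or later.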
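
As far as I know, the range types introduced in Postgres 9.2 follow the same convention by default: a range built with tstzrange(lower, upper) includes its lower bound and excludes its upper bound. A small sketch with made-up timestamps (not taken from the question) shows the effect: back-to-back stays do not count as overlapping, and the * operator returns the part two ranges share.

```sql
-- Sketch with invented timestamps; the column aliases are for illustration only.
SELECT tstzrange(timestamptz '2024-01-01 13:00+00', timestamptz '2024-01-01 18:00+00')
    && tstzrange(timestamptz '2024-01-01 18:00+00', timestamptz '2024-01-01 22:00+00')
       AS touching_overlap   -- false: 18:00 is excluded from the first range
     , tstzrange(timestamptz '2024-01-01 13:00+00', timestamptz '2024-01-01 18:00+00')
     * tstzrange(timestamptz '2024-01-01 15:00+00', timestamptz '2024-01-01 22:00+00')
       AS shared_part;       -- the range from 15:00 up to, but not including, 18:00
```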

Query

One way to do it in plain SQL, using UNION ALL and window functions:

```
SELECT ts AS check_in, next_ts AS check_out
FROM  (
   SELECT *, lead(ts) OVER (ORDER BY ts) AS next_ts
   FROM  (
      SELECT *, lag(people_ct, 1 , 0) OVER (ORDER BY ts) AS prev_ct
      FROM  (
         SELECT ts, sum(sum(change)) OVER (ORDER BY ts)::int AS people_ct
         FROM  (
            SELECT check_in AS ts, 1 AS change FROM booking
            UNION ALL
            SELECT check_out, -1 FROM booking
            ) sub1
         GROUP  BY 1
         ) sub2
      ) sub3
   WHERE  people_ct > 1 AND prev_ct < 2 OR  -- start overlap
          people_ct < 2 AND prev_ct > 1     -- end overlap
   ) sub4
WHERE  people_ct > 1 AND prev_ct < 2;
```

SQL Fiddle.

Explanation

In subquery sub1, derive a table with check_in and check_out in one column. check_in adds one to the crowd, check_out subtracts one.
In sub2, sum all events for the same point in time and compute a running count with a window function: that's the window function sum() over the aggregate sum(), cast to integer because it would otherwise return numeric:
```
   sum(sum(change)) OVER (ORDER BY ts)::int
```
In sub3, look at the count of the previous row.
In sub4, only keep rows where overlapping time ranges start or end, and pull the end of the time range into the same row with lead().
Finally, only keep rows where time ranges start.
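
To see the running count from sub2 at work, the two innermost subqueries can be run on their own against the three stays from the question. This is only a sketch: the CTE named booking stands in for the real table, and the date 2024-01-01 is invented so that the clock times from the question become timestamps.

```sql
-- Sketch: the CTE replaces the booking table; the date is an arbitrary assumption.
WITH booking (check_in, check_out) AS (
   VALUES (timestamptz '2024-01-01 13:00+00', timestamptz '2024-01-01 18:00+00')  -- person A
        , (timestamptz '2024-01-01 15:00+00', timestamptz '2024-01-01 22:00+00')  -- person B
        , (timestamptz '2024-01-01 21:00+00', timestamptz '2024-01-01 23:00+00')  -- person C
)
SELECT ts, sum(sum(change)) OVER (ORDER BY ts)::int AS people_ct
FROM  (
   SELECT check_in AS ts, 1 AS change FROM booking   -- arrival: +1
   UNION ALL
   SELECT check_out, -1 FROM booking                  -- departure: -1
   ) sub1
GROUP  BY 1
ORDER  BY 1;
```

For these rows people_ct should read 1, 2, 1, 2, 1, 0 at 13:00, 15:00, 18:00, 21:00, 22:00 and 23:00. The outer layers of the query then pick out the stretches where the count is above 1, which are 15:00 to 18:00 and 21:00 to 22:00.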

To optimize performance, I would walk through the table just once in a plpgsql function, as demonstrated in this related answer on dba.SE:

Calculate Difference in Overlapping Time in PostgreSQL / SSRS
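
What such a single-pass function might look like is sketched below. It is not the code from the linked answer, just one possible approach under the booking table assumed above; the function name f_booking_overlaps and all variable names are made up. It reads the rows ordered by check_in, remembers the latest check_out seen so far, and treats the part of each row that starts before that point as an overlap, merging overlaps that touch or intersect.

```sql
-- Sketch under the assumed table definition; function and variable names are hypothetical.
CREATE OR REPLACE FUNCTION f_booking_overlaps()
  RETURNS TABLE (overlap_start timestamptz, overlap_end timestamptz)
  LANGUAGE plpgsql AS
$func$
DECLARE
   r        record;
   _max_out timestamptz;   -- latest check_out among the rows seen so far
   _end     timestamptz;   -- end of the current row's overlap with earlier rows
BEGIN
   FOR r IN
      SELECT check_in, check_out FROM booking ORDER BY check_in
   LOOP
      IF r.check_in < _max_out THEN                  -- overlaps an earlier row (NULL for the first row)
         _end := least(r.check_out, _max_out);
         IF r.check_in <= overlap_end THEN           -- touches the pending range: extend it
            overlap_end := greatest(overlap_end, _end);
         ELSE
            IF overlap_start IS NOT NULL THEN
               RETURN NEXT;                          -- emit the finished range
            END IF;
            overlap_start := r.check_in;             -- start a new pending range
            overlap_end   := _end;
         END IF;
      END IF;
      _max_out := greatest(_max_out, r.check_out);   -- greatest() ignores NULL
   END LOOP;

   IF overlap_start IS NOT NULL THEN
      RETURN NEXT;                                   -- emit the last pending range
   END IF;
END
$func$;
```

Called as SELECT * FROM f_booking_overlaps(); with the three stays from the question, it should return two rows: 3PM to 6PM and 9PM to 10PM. The loop itself is a single pass; the ORDER BY still needs a sort unless an index on check_in can supply the order.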