Select date ranges where periods do not overlap

7

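I have two tables, each containing start and end dates for multiple periods. I am looking for an efficient way to find the periods (date ranges) that fall within the date ranges of the first table but not within the date ranges of the second table.

For example, if this is my first table (with the dates I want)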

start_date  end_date
2001-01-01  2010-01-01
2012-01-01  2015-01-01

and this is my second table (with the dates I do not want)

start_date  end_date
2002-01-01  2006-01-01
2003-01-01  2004-01-01
2005-01-01  2009-01-01
2014-01-01  2018-01-01

then the output should look like this:
start_date  end_date
2001-01-01  2001-12-31
2009-01-02  2010-01-01
2012-01-01  2013-12-31

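We can safely assume that periods in the first table do not overlap, but we cannot assume the same for the second table: its periods may overlap.
I already have a method for doing this, but it is an order of magnitude slower than I can accept, so I am hoping someone can suggest a faster approach.
My current method is:
1. Merge table 2 into non-overlapping periods
2. Find the inverse of table 2
3. Join the overlapping periods from table 1 and the inverse of table 2
I am sure there is a faster way if some of these steps can be merged together.
In more detail:
/* (1) merge overlapping periods */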
WITH
spell_starts AS (
    SELECT [start_date], [end_date]
    FROM table_2 s1
    WHERE NOT EXISTS (
        SELECT 1
        FROM table_2 s2
        WHERE s2.[start_date] < s1.[start_date] 
        AND s1.[start_date] <= s2.[end_date]
    )
),
spell_ends AS (
    SELECT [start_date], [end_date]
    FROM table_2 t1
    WHERE NOT EXISTS (
        SELECT 1 
        FROM table_2 t2
        WHERE t2.[start_date] <= t1.[end_date] 
        AND t1.[end_date] < t2.[end_date]
    )
)
SELECT s.[start_date], MIN(e.[end_date]) as [end_date]
FROM spell_starts s
INNER JOIN spell_ends e
ON s.[start_date] <= e.[end_date]
GROUP BY s.[start_date]

/* (2) inverse table 2 */
SELECT [start_date], [end_date]
FROM (
    /* all forward looking spells */
    SELECT DATEADD(DAY, 1, [end_date]) AS [start_date]
          ,LEAD(DATEADD(DAY, -1, [start_date]), 1, '9999-01-01') OVER ( ORDER BY [start_date] ) AS [end_date]
    FROM merge_table_2

    UNION ALL

    /* back looking spell (to 'origin of time') created separately */
    SELECT '1900-01-01' AS [start_date]
          ,DATEADD(DAY, -1, MIN([start_date])) AS [end_date]
    FROM merge_table_2
) k
WHERE [start_date] <= [end_date]
AND '1900-01-01' <= [start_date] 
AND [end_date] <= '9999-01-01'

/* (3) overlap spells */
SELECT IIF(t1.start_date < t2.start_date, t2.start_date, t1.start_date) AS start_date
      ,IIF(t1.end_date < t2.end_date, t1.end_date, t2.end_date) AS end_date
FROM table_1 t1
INNER JOIN inverse_merge_table_2 t2
ON t1.start_date < t2.end_date
AND t2.start_date < t1.end_date
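
For cross-checking results outside the database, the three steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the T-SQL from the question: the function names `merge_periods` and `subtract_periods` are made up here, periods are assumed to be inclusive `(start, end)` pairs, and adjacent periods in table 2 are merged as well as overlapping ones (which does not change the final result).

```python
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)


def merge_periods(periods):
    """Step 1: merge overlapping (or adjacent) inclusive (start, end) periods."""
    merged = []
    for start, end in sorted(periods):
        if merged and start <= merged[-1][1] + ONE_DAY:
            # Extends the previous period, so keep the later end date.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def subtract_periods(wanted, unwanted):
    """Steps 2 and 3 in one pass: parts of `wanted` not covered by `unwanted`."""
    blocked = merge_periods(unwanted)
    result = []
    for start, end in sorted(wanted):
        cursor = start  # first day of `wanted` not yet accounted for
        for b_start, b_end in blocked:
            if b_end < cursor:
                continue  # blocked period lies entirely before the cursor
            if b_start > end:
                break  # blocked period lies entirely after this wanted period
            if b_start > cursor:
                result.append((cursor, b_start - ONE_DAY))
            cursor = b_end + ONE_DAY
            if cursor > end:
                break
        if cursor <= end:
            result.append((cursor, end))
    return result


# Sample data from the question.
table_1 = [(date(2001, 1, 1), date(2010, 1, 1)),
           (date(2012, 1, 1), date(2015, 1, 1))]
table_2 = [(date(2002, 1, 1), date(2006, 1, 1)),
           (date(2003, 1, 1), date(2004, 1, 1)),
           (date(2005, 1, 1), date(2009, 1, 1)),
           (date(2014, 1, 1), date(2018, 1, 1))]

result = subtract_periods(table_1, table_2)
for start, end in result:
    print(start.isoformat(), end.isoformat())
```

On the sample data this prints the three rows of the expected output (2001-01-01 to 2001-12-31, 2009-01-02 to 2010-01-01, and 2012-01-01 to 2013-12-31), which makes it a convenient reference when comparing SQL variants.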

3 Answers

3
Hope this helps. I have commented the two CTEs I use, for explanation purposes. Here is what you need:
drop table table1

select cast('2001-01-01' as date) as start_date, cast('2010-01-01' as date) as end_date into table1
union select '2012-01-01',  '2015-01-01' 

drop table table2

select cast('2002-01-01' as date) as start_date, cast('2006-01-01' as date) as end_date into table2
union select '2003-01-01',  '2004-01-01'
union select '2005-01-01',  '2009-01-01'
union select '2014-01-01',  '2018-01-01'

/***** Solution *****/

-- This cte put all dates into one column
with cte as
(
    select t
    from
    (
        select start_date as t
        from table1
        union all
        select end_date
        from table1

        union all

        select dateadd(day,-1,start_date) -- for table 2 we bring the start date back one day to make sure we have nothing in the forbidden range
        from table2
        union all
        select  dateadd(day,1,end_date) -- for table 2 we bring the end date forward one day to make sure we have nothing in the forbidden range
        from table2
    )a
),
-- This one adds an end date using the lead function
cte2 as (
    select t as s, coalesce(lead(t, 1) over (order by t), t) as e
    from cte a
)
-- this query gets all intervals not in table2 but in table1
select s, e
from cte2 a
where not exists (
        select 1
        from table2 b
        where s between dateadd(day, -1, start_date) and dateadd(day, 1, end_date)
          and e between dateadd(day, -1, start_date) and dateadd(day, 1, end_date)
      )
  and exists (
        select 1
        from table1 b
        where s between start_date and end_date
          and e between start_date and end_date
      )
  and s <> e

Your query does not return the range "2009-01-02 2010-01-01". - JohnyL
Yes, it is fine now. - JohnyL

2
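If you want better performance, you need to use window functions.
The idea is:
- Combine the dates with flags marking entry into and exit from each of the two tables.
- Use cumulative sums to determine where dates start being in and out.
- You then have a gaps-and-islands problem, where you need to combine the results.
- Finally, filter on the particular periods you want.
Sample code: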
with dates as (
      select start_date as dte, 1 as in1, 0 as in2
      from table1
      union all
      select dateadd(day, 1, end_date), -1, 0
      from table1
      union all
      select start_date, 0, 1 as in2
      from table2
      union all
      select dateadd(day, 1, end_date), 0, -1
      from table2
     ),
     d as (
      select dte,
             sum(sum(in1)) over (order by dte) as ins_1,
             sum(sum(in2)) over (order by dte) as ins_2
      from dates
      group by dte
     )
select min(dte), max(next_dte)
from (select d.*, dateadd(day, -1, lead(dte) over (order by dte)) as next_dte, 
             row_number() over (order by dte) as seqnum,
             row_number() over (partition by case when ins_1 >= 1 and ins_2 = 0 then 'in' else 'out' end order by dte) as seqnum_2
      from d
     ) d
group by (seqnum - seqnum_2)
having max(ins_1) > 0 and max(ins_2) = 0
order by min(dte);

Here is a db<>fiddle.
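
The same flag-and-running-sum idea can be sketched outside SQL, which may help when reasoning about what the window functions compute. The Python below is an illustrative sketch only, not a translation of the query: the function name `sweep` is made up here, and periods are assumed to be inclusive `(start, end)` pairs, so each exit flag is placed on the day after the end date, as in the query.

```python
from collections import defaultdict
from datetime import date, timedelta


def sweep(table1, table2):
    """Return the periods that are covered by table1 and by no table2 period."""
    one_day = timedelta(days=1)

    # Per date: net change in the number of open table1 / table2 periods.
    deltas = defaultdict(lambda: [0, 0])
    for start, end in table1:
        deltas[start][0] += 1
        deltas[end + one_day][0] -= 1
    for start, end in table2:
        deltas[start][1] += 1
        deltas[end + one_day][1] -= 1

    result = []
    in1 = in2 = 0       # running sums, like the cumulative sums in the query
    open_start = None   # start of the island currently being built, if any
    for day in sorted(deltas):
        in1 += deltas[day][0]
        in2 += deltas[day][1]
        wanted = in1 > 0 and in2 == 0
        if wanted and open_start is None:
            open_start = day
        elif not wanted and open_start is not None:
            # The island ends the day before the state flips to "out".
            result.append((open_start, day - one_day))
            open_start = None
    return result


# Sample data from the question.
table_1 = [(date(2001, 1, 1), date(2010, 1, 1)),
           (date(2012, 1, 1), date(2015, 1, 1))]
table_2 = [(date(2002, 1, 1), date(2006, 1, 1)),
           (date(2003, 1, 1), date(2004, 1, 1)),
           (date(2005, 1, 1), date(2009, 1, 1)),
           (date(2014, 1, 1), date(2018, 1, 1))]

periods = sweep(table_1, table_2)
for start, end in periods:
    print(start.isoformat(), end.isoformat())
```

Because consecutive "in" breakpoints keep the same island open, the gaps-and-islands grouping happens implicitly in the loop. Overlapping periods in table 2 need no separate merge step, since the running sum only returns to zero once every overlapping period has ended.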

0
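Thanks to @zip and @Gordon for their answers. Both improve on my initial approach. However, in my environment and context, the following solution runs faster than either of them: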
WITH acceptable_starts AS (
    /* table1 start dates that do not fall inside a table2 period */
    SELECT [start_date] FROM table1 AS a
    WHERE NOT EXISTS (
        SELECT 1 FROM table2 AS b
        WHERE a.[start_date] BETWEEN b.[start_date] AND b.[end_date]
    )
    UNION ALL
    /* the day after each table2 period, unless another table2 period covers it */
    SELECT DATEADD(DAY, 1, [end_date]) FROM table2 AS a
    WHERE NOT EXISTS (
        SELECT 1 FROM table2 AS b
        WHERE DATEADD(DAY, 1, a.[end_date]) BETWEEN b.[start_date] AND b.[end_date]
    )
),
acceptable_ends AS (
    /* table1 end dates that do not fall inside a table2 period */
    SELECT [end_date] FROM table1 AS a
    WHERE NOT EXISTS (
        SELECT 1 FROM table2 AS b
        WHERE a.[end_date] BETWEEN b.[start_date] AND b.[end_date]
    )
    UNION ALL
    /* the day before each table2 period, unless another table2 period covers it */
    SELECT DATEADD(DAY, -1, [start_date]) FROM table2 AS a
    WHERE NOT EXISTS (
        SELECT 1 FROM table2 AS b
        WHERE DATEADD(DAY, -1, a.[start_date]) BETWEEN b.[start_date] AND b.[end_date]
    )
)
/* pair each acceptable start with the nearest acceptable end on or after it */
SELECT s.[start_date], MIN(e.[end_date]) AS [end_date]
FROM acceptable_starts AS s
INNER JOIN acceptable_ends AS e
ON s.[start_date] <= e.[end_date]
GROUP BY s.[start_date]

Content provided by Stack Overflow.