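I have two tables, each containing start and end dates for multiple periods. I am looking for an efficient way to find the periods (date ranges) that fall within the date ranges of the first table but not within the date ranges of the second table.
For example, if this is my first table (containing the dates I want):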
start_date end_date
2001-01-01 2010-01-01
2012-01-01 2015-01-01
and this is my second table (containing the dates I do not want):
start_date end_date
2002-01-01 2006-01-01
2003-01-01 2004-01-01
2005-01-01 2009-01-01
2014-01-01 2018-01-01
then the output should look like this:
start_date end_date
2001-01-01 2001-12-31
2009-01-02 2010-01-01
2012-01-01 2013-12-31
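
To make the example reproducible, the sample data above could be loaded roughly as follows. This is only a minimal sketch: the table names table_1 and table_2 are the ones my queries use, while the DATE column type and the NOT NULL constraints are assumptions.

```sql
/* Sample data for the example; DATE and NOT NULL are assumptions */
CREATE TABLE table_1 (start_date DATE NOT NULL, end_date DATE NOT NULL);
CREATE TABLE table_2 (start_date DATE NOT NULL, end_date DATE NOT NULL);

/* Table 1: periods I want (assumed non-overlapping) */
INSERT INTO table_1 (start_date, end_date) VALUES
    ('2001-01-01', '2010-01-01'),
    ('2012-01-01', '2015-01-01');

/* Table 2: periods I do not want (may overlap each other) */
INSERT INTO table_2 (start_date, end_date) VALUES
    ('2002-01-01', '2006-01-01'),
    ('2003-01-01', '2004-01-01'),
    ('2005-01-01', '2009-01-01'),
    ('2014-01-01', '2018-01-01');
```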
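We can safely assume that the periods in the first table do not overlap, but we cannot assume the same for the second table: its periods may overlap, as they do in the example.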
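
For what it is worth, the non-overlap assumption on the first table could be checked with a query along these lines, which should return no rows when it holds. This is a minimal sketch that treats start and end dates as inclusive (as the expected output does); it would not flag exact duplicate rows, since the table has no key to tell them apart.

```sql
/* Lists pairs of table_1 periods that overlap; an empty result means
   the non-overlap assumption holds (exact duplicate rows are not detected) */
SELECT a.start_date AS first_start
      ,a.end_date   AS first_end
      ,b.start_date AS second_start
      ,b.end_date   AS second_end
FROM table_1 a
INNER JOIN table_1 b
    ON  a.start_date <= b.start_date
    AND b.start_date <= a.end_date
    /* skip the pairing of a row with itself */
    AND NOT (a.start_date = b.start_date AND a.end_date = b.end_date);
```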
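I already have a way to do this, but it is an order of magnitude slower than I can accept, so I am hoping someone can suggest a faster approach.
My current approach is:
1. Merge table 2 into non-overlapping periods
2. Find the inverse of table 2 (the gaps between its merged periods)
3. Intersect the periods of table 1 with the inverted table 2
I am sure there is a faster way if some of these steps can be combined.
In more detail: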
/* (1) merge overlapping periods */
WITH
spell_starts AS (
    /* rows whose start is not covered by a period that began earlier */
    SELECT [start_date], [end_date]
    FROM table_2 s1
    WHERE NOT EXISTS (
        SELECT 1
        FROM table_2 s2
        WHERE s2.[start_date] < s1.[start_date]
        AND s1.[start_date] <= s2.[end_date]
    )
),
spell_ends AS (
    /* rows whose end is not covered by a period that ends later */
    SELECT [start_date], [end_date]
    FROM table_2 t1
    WHERE NOT EXISTS (
        SELECT 1
        FROM table_2 t2
        WHERE t2.[start_date] <= t1.[end_date]
        AND t1.[end_date] < t2.[end_date]
    )
)
/* pair each merged start with the earliest merged end on or after it */
SELECT s.[start_date], MIN(e.[end_date]) AS [end_date]
FROM spell_starts s
INNER JOIN spell_ends e
    ON s.[start_date] <= e.[end_date]
GROUP BY s.[start_date]

/* (2) inverse table 2 */
SELECT [start_date], [end_date]
FROM (
    /* all forward looking spells */
    SELECT DATEADD(DAY, 1, [end_date]) AS [start_date]
          ,LEAD(DATEADD(DAY, -1, [start_date]), 1, '9999-01-01') OVER ( ORDER BY [start_date] ) AS [end_date]
    FROM merge_table_2
    UNION ALL
    /* back looking spell (to 'origin of time') created separately */
    SELECT '1900-01-01' AS [start_date]
          ,DATEADD(DAY, -1, MIN([start_date])) AS [end_date]
    FROM merge_table_2
) k
WHERE [start_date] <= [end_date]
AND '1900-01-01' <= [start_date]
AND [end_date] <= '9999-01-01'

/* (3) overlap spells */
SELECT IIF(t1.start_date < t2.start_date, t2.start_date, t1.start_date) AS start_date
      ,IIF(t1.end_date < t2.end_date, t1.end_date, t2.end_date) AS end_date
FROM table_1 t1
INNER JOIN inverse_merge_table_2 t2
    /* start and end dates are inclusive, so use <= to keep single-day overlaps */
    ON t1.start_date <= t2.end_date
    AND t2.start_date <= t1.end_date