从一组重叠的日期范围中找到共同的日期范围

3
如何从给定日期范围的一组中找到重叠(公共)日期范围?需要考虑特定节目(PID)的所有事件(EID)的重叠(公共)日期范围。例如:节目(PID=13579)对于事件(EID=2)有两个日期范围。参考链接:link。已经查看过链接Link,但不实用。以下是示例模式和数据:Image for Visualizing Overlapped (Common) Date Ranges
CREATE TABLE #EventsTBL
(
    PID INT,
    EID INT,
    StartDate DATETIME,
    EndDate DATETIME
);

INSERT INTO #EventsTBL
VALUES
(13579, '1', '01 Jan 2018', '31 Mar 2019'),
(13579, '2', '01 Feb 2018', '31 May 2018'),
(13579, '2', '01 Jul 2018', '31 Jan 2019'),
(13579, '7', '01 Mar 2018', '31 Mar 2019'),
(13579, '5', '01 Feb 2018', '30 Apr 2018'),
(13579, '5', '01 Oct 2018', '31 Mar 2019'),
(13579, '8', '01 Jan 2018', '30 Apr 2018'),
(13579, '8', '01 Jun 2018', '31 Dec 2018'),
(13579, '13', '01 Jan 2018', '31 Mar 2019'),
(13579, '6', '01 Apr 2018', '31 May 2018'),
(13579, '6', '01 Sep 2018', '30 Nov 2018'),
(13579, '4', '01 Feb 2018', '31 Jan 2019'),
(13579, '19', '01 Mar 2018', '31 Jul 2018'),
(13579, '19', '01 Oct 2018', '28 Feb 2019'),
--
(13570, '16', '01 Feb 2018', '30 Jun 2018'),
(13570, '16', '01 Aug 2018', '31 Aug 2018'),
(13570, '16', '01 Oct 2018', '28 Feb 2019'),
(13570, '23', '01 Mar 2018', '30 Jun 2018'),
(13570, '23', '01 Nov 2018', '31 Jan 2019');

输出应该是:
PID     StartDate       EndDate
13579   01-Apr-2018     30-Apr-2018
13579   01-Oct-2018     30-Nov-2018
13570   01-Mar-2018     30-Jun-2018
13570   01-Nov-2018     31-Jan-2019

1
将期望的结果作为文本发布总是更好的 :) 对于你的第一个问题,做得相当不错!为什么马丁的答案不够呢? - S3S
1
我看不出来用什么规则来确定它们是否重叠。许多日期范围彼此重叠,但是被突出显示的那些重叠情况更多,但并非所有情况都如此 - 那么是如何决定的呢? - Cato
那么一个日期必须至少在所有PID的EID中出现一次吗? - Salman A
13570有意义,我看到了重叠之处 - 13579充满了未报告的重叠。 - Cato
需要找到特定程序ID(PID)的所有事件ID(EIDs)的日期范围重叠(公共)!例如,第二行没有从10月'18到11月'18的范围,但是相同的事件ID(eid = 2)具有另一个范围(第三行)从7月'18到1月'19,其中包括10月'18,11月'18。因此,在这里,第2行和第3行属于eid(事件ID)= 2。这意味着事件2具有10月'18和11月'18的公共日期范围。希望这很清楚。 - SaiKarthikChandra
显示剩余5条评论
2个回答

1

好的,所以我生成了一个CTE,其中包含考虑范围内所有日期的日期。

对于这些日期中的每一个,我生成了FLAG=1,表示我认为检测到了重叠。

然后我使用row_number()作为“岛屿”问题的标准解决方案,并输出flag=1的“岛屿”的开始和结束时间。

希望有帮助,我得到了您对13570的结果,但是根据我的理解,“重叠”应该是整个13579都重叠。也许需要进一步解释和调整。如果您能够按照您的规则生成FLAG,则排名部分仍将适用。

CREATE TABLE #EventsTBL
(
    PID INT,
    EID INT,
    StartDate DATETIME,
    EndDate DATETIME
);

INSERT INTO #EventsTBL
VALUES
(13579, '1', '01 Jan 2018', '31 Mar 2019'),
(13579, '2', '01 Feb 2018', '31 May 2018'),
(13579, '2', '01 Jul 2018', '31 Jan 2019'),
(13579, '7', '01 Mar 2018', '31 Mar 2019'),
(13579, '5', '01 Feb 2018', '30 Apr 2018'),
(13579, '5', '01 Oct 2018', '31 Mar 2019'),
(13579, '8', '01 Jan 2018', '30 Apr 2018'),
(13579, '8', '01 Jun 2018', '31 Dec 2018'),
(13579, '13', '01 Jan 2018', '31 Mar 2019'),
(13579, '6', '01 Apr 2018', '31 May 2018'),
(13579, '6', '01 Sep 2018', '30 Nov 2018'),
(13579, '4', '01 Feb 2018', '31 Jan 2019'),
(13579, '19', '01 Mar 2018', '31 Jul 2018'),
(13579, '19', '01 Oct 2018', '28 Feb 2019'),
--
(13570, '16', '01 Feb 2018', '30 Jun 2018'),
(13570, '16', '01 Aug 2018', '31 Aug 2018'),
(13570, '16', '01 Oct 2018', '28 Feb 2019'),
(13570, '23', '01 Mar 2018', '30 Jun 2018'),
(13570, '23', '01 Nov 2018', '31 Jan 2019');


SELECT count(enddate) FROM (SELECT CAST('19660423' as date) dt) A LEFT JOIN #EventsTBL B ON A.dt = b.StartDate;

WITH MIN_MAX AS (SELECT MIN(StartDate) S , MAX(EndDate) E FROM #EventsTBL ),
     ALL_DATES AS (SELECT S DT FROM MIN_MAX
                    UNION ALL
                    SELECT DATEADD(day,1,DT) FROM ALL_DATES WHERE DT < (SELECT E FROM MIN_MAX)
                  ),
     BuildFlags AS (SELECT  P.pid,
                            DT,
                            COUNT(e.PID ) CNT, 
                            CASE WHEN COUNT(e.pid) > 1 THEN 1 ELSE 0 END FLAG, 
                            row_number() OVER(partition by p.pid order by DT) RN
                        FROM ALL_DATES A CROSS JOIN (SELECT DISTINCT E2.pid FROM #EventsTBL E2) P
                        LEFT JOIN 
                            #EventsTBL E ON P.PID = E.pid AND
                            A.DT BETWEEN E.StartDate AND E.EndDate GROUP BY P.pid,DT),
    AddRanks AS (SELECT *,rn - row_number()over(partition by pid,flag order by dt) groupRank  FROM BuildFlags)

     select pid,min(dt) as start, max(dt) as ending from AddRanks 
        where flag = 1
        group by pid,grouprank
        order by pid,min(dt)
     option(maxrecursion 0)

编辑 - 我想我明白你的意思了,你想将pid和eid合并成唯一的pid和eid,并使用存在的日期。然后,您将所有pid和eid同时处于活动状态定义为重叠。因此,我提出了以下修改。

;WITH MIN_MAX AS (SELECT MIN(StartDate) S , MAX(EndDate) E FROM #EventsTBL ),
     ALL_DATES AS (SELECT S DT FROM MIN_MAX
                    UNION ALL
                    SELECT DATEADD(day,1,DT) FROM ALL_DATES WHERE DT < (SELECT E FROM MIN_MAX)
                  ),
     GROUPED AS (SELECT Q.pid,Q.eid,q.dt,case when max(tx.pid) is null then 0 else 1 end YES from (Select * FROM All_Dates cross join (select distinct pid,eid from #EventsTBL) AQ) Q
                                    LEFT JOIN  #EventsTBL TX ON TX.PID = Q.pid and tx.EID = Q.eid and 
                                                Q.DT BETWEEN TX.StartDate AND TX.EndDate GROUP BY q.pid,q.eid,q.dt
                ),                                       
     BuildFlags AS (SELECT g.pid,g.dt, row_number() OVER(partition by g.pid order by g.DT) RN,
          CASE WHEN WQ.tot = (SELECT count(distinct g2.eid)  FROM grouped g2 WHERE g2.PID = G.pid and g2.dt=g.dt and g2.yes=1) then 1 else 0 end FLAG
      FROM GROUPED G cross apply (select count(distinct E9.eid) tot FROM #EventsTBL E9 WHERE E9.PID = G.pid) WQ)
    ,AddRanks AS (SELECT *,rn - row_number()over(partition by pid,flag order by dt) groupRank  FROM BuildFlags)

     select pid,min(dt) as start, max(dt) as ending from AddRanks 
        where flag = 1
        group by pid,grouprank
        order by pid,min(dt)
     option(maxrecursion 0);

结果集看起来不错...谢谢Cato。 - SaiKarthikChandra

1
这个答案计算重叠区间的数量。它假设具有相同 EID 的日期范围不会重叠。以下是带有解释的查询:
DECLARE @EventsTBL TABLE (PID INT, EID INT, StartDate DATETIME, EndDate DATETIME);
INSERT INTO @EventsTBL VALUES
(13579, 1,  '01 Jan 2018', '31 Mar 2019'),
(13579, 2,  '01 Feb 2018', '31 May 2018'),
(13579, 2,  '01 Jul 2018', '31 Jan 2019'),
(13579, 7,  '01 Mar 2018', '31 Mar 2019'),
(13579, 5,  '01 Feb 2018', '30 Apr 2018'),
(13579, 5,  '01 Oct 2018', '31 Mar 2019'),
(13579, 8,  '01 Jan 2018', '30 Apr 2018'),
(13579, 8,  '01 Jun 2018', '31 Dec 2018'),
(13579, 13, '01 Jan 2018', '31 Mar 2019'),
(13579, 6,  '01 Apr 2018', '31 May 2018'),
(13579, 6,  '01 Sep 2018', '30 Nov 2018'),
(13579, 4,  '01 Feb 2018', '31 Jan 2019'),
(13579, 19, '01 Mar 2018', '31 Jul 2018'),
(13579, 19, '01 Oct 2018', '28 Feb 2019'),
(13570, 16, '01 Feb 2018', '30 Jun 2018'),
(13570, 16, '01 Aug 2018', '31 Aug 2018'),
(13570, 16, '01 Oct 2018', '28 Feb 2019'),
(13570, 23, '01 Mar 2018', '30 Jun 2018'),
(13570, 23, '01 Nov 2018', '31 Jan 2019');

WITH cte1 AS (
    /*
     * augment the data with the number of distinct EID per PID
     * we will need this later
     */
    SELECT e.PID, a.EIDCount, StartDate, EndDate
    FROM @EventsTBL AS e
    JOIN (
        SELECT PID, COUNT(DISTINCT EID) AS EIDCount
        FROM @EventsTBL
        GROUP BY PID
    ) AS a ON e.PID = a.PID
), cte2 AS (
    /*
     * build a list of "points in time" at which an event started or ended
     * and the number concurrent events changed
     * the zero value rows are required!
     */
    SELECT PID, EIDCount, StartDate AS pdate, 1 AS pval
    FROM cte1
    UNION ALL
    SELECT PID, EIDCount, EndDate, 0
    FROM cte1
    UNION ALL
    SELECT PID, EIDCount , DATEADD(DAY, 1, EndDate), -1
    FROM cte1
), cte3 AS (
    /*
     * calculate running sum of pval over dates; minus ones first
     */
    SELECT PID, EIDCount, pdate, SUM(pval) OVER (PARTITION BY PID ORDER BY pdate, pval) AS evtcount
    FROM cte2
), cte4 AS (
    /*
     * consolidate data for same dates and we are done with the main part
     */
    SELECT PID, EIDCount, pdate, MAX(evtcount) AS evtcount
    FROM cte3
    GROUP BY PID, EIDCount, pdate
), cte5 AS (
    /*
     * assign "change flag" to rows where number of concurrent events
     * enters or exits the required count w.r.t. previous row
     */
    SELECT PID, EIDCount, pdate, evtcount, CASE
        WHEN evtcount < EIDCount AND LAG(evtcount) OVER (PARTITION BY PID ORDER BY pdate) < EIDCount THEN 0
        WHEN evtcount = EIDCount AND LAG(evtcount) OVER (PARTITION BY PID ORDER BY pdate) = EIDCount THEN 0
        ELSE 1
    END AS chg
    FROM cte4
), cte6 AS (
    /*
     * convert "change flag" to "group numbers" over consecutive rows using running sum
     */
    SELECT PID, EIDCount, pdate, evtcount, SUM(chg) OVER (PARTITION BY PID ORDER BY pdate) AS grp
    FROM cte5
)
/*
 * group rows by pid and group numbers
 */
SELECT PID, MIN(pdate) AS StartDate, MAX(pdate) AS EndDate
FROM cte6
WHERE evtcount = EIDCount
GROUP BY PID, grp
ORDER BY PID, StartDate

db<>fiddle上的演示


看起来与ResultSet兼容且性能更好。谢谢Salman。 - SaiKarthikChandra

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接