使用SQL检测连续的日期范围

34

我想填写日历对象,需要起始日期和结束日期信息。我有一列包含一系列日期。其中一些日期是连续的(相差一天),而另一些日期则不是。

InfoDate  

2013-12-04  consecutive date [StartDate]
2013-12-05  consecutive date
2013-12-06  consecutive date [EndDate]

2013-12-09                   [startDate]
2013-12-10                   [EndDate]

2014-01-01                   [startDate]
2014-01-02 
2014-01-03                   [EndDate]

2014-01-06                   [startDate]
2014-01-07                   [EndDate]

2014-01-29                   [startDate]
2014-01-30 
2014-01-31                   [EndDate]

2014-02-03                   [startDate]
2014-02-04                   [EndDate]

我想要选择每个连续日期范围的开始和结束日期(块中的第一个和最后一个)。

StartDate     EndDate

2013-12-04    2013-12-06
2013-12-09    2013-12-10
2014-01-01    2014-01-03
2014-01-06    2014-01-07
2014-01-29    2014-01-31
2014-02-03    2014-02-04

我希望只使用SQL解决这个问题。


1
第二个清单中的空行是什么意思?你真的需要用SQL来解决这个问题吗?在SQL中表达这个问题似乎相当困难(至少在标准SQL中),显而易见的算法基本上是顺序的,可以很容易地用过程化语言编写。如果确实需要使用SQL,我会使用存储过程。 - Palec
9个回答

66

不需要连接或递归CTE。标准的间隔和岛屿解决方案是通过(value minus row_number)进行分组,因为它在连续序列中是不变的。开始和结束日期只是组的MIN()和MAX()。

WITH t AS (
  SELECT InfoDate d,ROW_NUMBER() OVER(ORDER BY InfoDate) i
  FROM @d
  GROUP BY InfoDate
)
SELECT MIN(d),MAX(d)
FROM t
GROUP BY DATEDIFF(day,i,d)

非常聪明的解决方案。谢谢! - Dan
1
我认为应该将GROUP BY更改为: GROUP BY DATEADD(day,-i,d) - BennyBechDk
@BennyBechDk 使用 GROUP BY DATEDIFF(day,i,d)GROUP BY DATEADD(day,-i,d) 将生成相同的分组。 - Anon
可能是因为你说“不需要使用CTE”,然后却使用了CTE,所以被downvote了。但是你当然可以将CTE替换为最终SELECT语句中的“t”,这样你仍然是正确的... - El Ronnoco
1
很棒的解决方案。另外,如果你将 ROW_NUMBER() 改为 DENSE_RANK(),那么这段代码就可以处理有重复日期的情况了。 - andrew
显示剩余2条评论

6

这里是你需要的内容:

;WITH CTEDATES
AS
(
    SELECT ROW_NUMBER() OVER (ORDER BY Infodate asc ) AS ROWNUMBER,infodate FROM YourTableName  

),
 CTEDATES1
AS
(
   SELECT ROWNUMBER, infodate, 1 as groupid FROM CTEDATES WHERE ROWNUMBER=1
   UNION ALL
   SELECT a.ROWNUMBER, a.infodate,case datediff(d, b.infodate,a.infodate) when 1 then b.groupid else b.groupid+1 end as gap FROM CTEDATES A INNER JOIN CTEDATES1 B ON A.ROWNUMBER-1 = B.ROWNUMBER
)

select min(mydate) as startdate, max(infodate) as enddate from CTEDATES1 group by groupid

2
你应该使用 OVER (ORDER BY Infodate) 而不是 OVER (ORDER BY (SELECT 1))。另外,将 min(mydate) 改为 min(infodate)。除此之外,这是一个好的答案。 - Lamak

1

--MS SQL

with cte as (
select start_date, end_date,
    dateadd(d, -row_number() over (order by start_date), start_date) as GRN
from projects)
select min(start_date), max(end_date) from cte group by grn order by grn;

--Oracle

with cte as(
select start_date, end_date, 
    start_date - row_number() over (order by start_date) as GRN 
    from projects)
select min(start_date), max(end_date) from cte  group by grn order by grn;

0

你可以这样做,这里是sqlfiddle

select
  min(ndate) as start_date,
  max(ndate) as end_date
from
(select
  ndate,
  dateadd(day, -row_number() over (order by ndate), ndate) as rnk
 from dates
 ) t
 group by
   rnk

0

我已经将这些值插入到名为#consec的表中,然后执行了以下操作:

select t1.*
,t2.infodate as binfod
into #temp1
from #consec t1
left join #consec t2 on dateadd(DAY,1,t1.infodate)=t2.infodate

select t1.*
,t2.infodate as binfod
into #temp2
from #consec t1
left join #consec t2 on dateadd(DAY,1,t2.infodate)=t1.infodate
;with cte as(
select infodate,  ROW_NUMBER() over(order by infodate asc) as seq from #temp1
where binfod is null
),
cte2 as(
select infodate, ROW_NUMBER() over(order by infodate asc) as seq from #temp2
where binfod is null
)

select t2.infodate as [start_date]
,t1.infodate as [end_date] from cte t1
left join cte2 t2 on t1.seq=t2.seq 

只要你的日期区间没有重叠,那对你来说就可以了。

0

这是我的带有测试数据的示例:

--required output
-- 01 - 03
-- 08 - 09
-- 12 - 14

DECLARE @maxRN int;
WITH #tmp AS (
                SELECT CAST('2013-01-01' AS date) DT
    UNION ALL   SELECT CAST('2013-01-02' AS date)
    UNION ALL   SELECT CAST('2013-01-03' AS date)
    UNION ALL   SELECT CAST('2013-01-05' AS date)
    UNION ALL   SELECT CAST('2013-01-08' AS date)
    UNION ALL   SELECT CAST('2013-01-09' AS date)
    UNION ALL   SELECT CAST('2013-01-12' AS date)
    UNION ALL   SELECT CAST('2013-01-13' AS date)
    UNION ALL   SELECT CAST('2013-01-14' AS date)
),
#numbered AS (
    SELECT 0 RN, CAST('1900-01-01' AS date) DT
    UNION ALL
    SELECT ROW_NUMBER() OVER (ORDER BY DT) RN, DT
    FROM #tmp
)

SELECT * INTO #tmpTable FROM #numbered;
SELECT @maxRN = MAX(RN) FROM #tmpTable;

INSERT INTO #tmpTable
SELECT @maxRN + 1, CAST('2100-01-01' AS date);

WITH #paired AS (
    SELECT 
    ROW_NUMBER() OVER(ORDER BY TStart.DT) RN, TStart.DT DTS, TEnd.DT DTE
    FROM #tmpTable TStart
    INNER JOIN #tmpTable TEnd 
    ON TStart.RN = TEnd.RN - 1
    AND DATEDIFF(dd,TStart.DT,TEnd.DT) > 1  
)

SELECT TS.DTE, TE.DTs 
FROM #paired TS
INNER JOIN #paired TE ON TS.RN = TE.RN -1
AND TS.DTE <> TE.DTs -- you could remove this filter if you want to have start and end on the same date

DROP TABLE #tmpTable

请使用实际表替换 #tmp 数据。


0
使用CTE(公共表达式): with cte as( select start_date, end_date, start_date - row_number() over (order by start_date) as GRN from projects) 从cte中选择最小的开始日期和最大的结束日期 按grn分组 按grn计数和最小的开始日期排序

2
谢谢您对Stack Overflow社区的贡献。这可能是一个正确的答案,但如果您能提供代码的附加解释,那将非常有用,让开发人员能够理解您的思路。对于不太熟悉语法或难以理解概念的新开发人员来说,这尤其有帮助。您是否可以友好地编辑您的回答,以包含额外的细节,以造福社区? - Jeremy Caney

0

这里可能还有另一个简单的解决方案 -

with tmp as 
(
select
datefield
, dateadd('day',-row_number() over(order by date asc),datefield) as date_group 
from table
)
select
min(datefield) as start_date
, max(datefield) as end_date 
from tmp
group by date_group

-1
SELECT InfoDate ,
    CASE
      WHEN TRUNC(InfoDate - 1) = TRUNC(lag(InfoDate,1,InfoDate) over (order by InfoDate))
      THEN NULL
      ELSE InfoDate
    END STARTDATE,
    CASE
      WHEN TRUNC(InfoDate + 1) = TRUNC(lead(InfoDate,1,InfoDate) over (order by InfoDate))
      THEN NULL
      ELSE InfoDate
    END ENDDATE
  FROM TABLE;

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接